IO Module

This module contains the functions pew uses for importing and exporting data.

Agilent

Import of line-by-line collected Agilent ‘.b’ batches. Both raw binaries and the ‘.csv’ exports are supported. Tested with Agilent 7500, 7700 and 8900 ICPs.

pewlib.io.agilent.collect_datafiles(path, methods)

Finds ‘.d’ datafiles in a directory.

A list of expected datafiles is created for each method in methods. Methods are tried in order until one successfully finds all expected datafiles.

Parameters:

path (str | Path) – path to directory
methods (list[str]) – list of methods to try, {‘alphabetical’, ‘acq_method_xml’, ‘batch_csv’, ‘batch_xml’}

Return type:

list[Path]

Returns:

A list of datafiles
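
The "try each method in order" behaviour can be pictured with the minimal sketch below. It is not pewlib's implementation: the helper first_complete_listing and both method functions are hypothetical, and returning an empty list when no method succeeds is an assumption.

```python
from pathlib import Path
from tempfile import TemporaryDirectory


def first_complete_listing(batch, methods):
    """Return the first listing whose expected datafiles all exist."""
    for method in methods:
        expected = method(batch)
        # Accept a method only if every datafile it expects is present.
        if expected and all(p.exists() for p in expected):
            return expected
    return []  # assumption: nothing is returned when every method fails


def from_stale_list(batch):
    # Hypothetical method: expects a datafile that is not on disk.
    return [batch / "001.d", batch / "missing.d"]


def alphabetical(batch):
    # Hypothetical method: every '.d' directory, sorted by name.
    return sorted(batch.glob("*.d"))


with TemporaryDirectory() as tmp:
    batch = Path(tmp)
    for name in ("002.d", "001.d"):
        (batch / name).mkdir()
    found = first_complete_listing(batch, [from_stale_list, alphabetical])
    names = [p.name for p in found]

print(names)
```

The first method is rejected because one of its expected datafiles is missing, so the second method supplies the result.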

pewlib.io.agilent.load(path, collection_methods=None, use_acq_for_names=True, counts_per_second=False, drop_names=None, full=False)

Imports an Agilent ‘.b’ batch.

First attempts a binary import, falling back to importing any ‘.csv’ files.

Parameters:

path (str | Path) – path to batch
collection_methods (list[str] | None) – list of datafile collection methods, default = [‘batch_xml’, ‘batch_csv’]
use_acq_for_names (bool) – read element names from ‘AcqMethod.xml’, only for csv
counts_per_second (bool) – return data in CPS, only for binary
drop_names (list[str] | None) – names to remove from final array
full (bool) – also return dict with scantime

Return type:

ndarray | tuple[ndarray, dict]

Returns:

structured array of data
dict of params if full
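
A usage sketch based only on the signature above. The wrapper name load_agilent_batch is hypothetical, and pewlib is imported inside the function so the snippet can be read and imported without pewlib installed.

```python
from pathlib import Path


def load_agilent_batch(batch: Path):
    """Load a '.b' batch in counts per second and return (data, params)."""
    # Imported lazily; requires pewlib to be installed when called.
    from pewlib.io import agilent

    # full=True requests the dict of params alongside the structured array.
    # counts_per_second only has an effect for the binary import.
    data, params = agilent.load(batch, counts_per_second=True, full=True)
    return data, params
```

Because the result is a numpy structured array, the imported element names should be available as data.dtype.names.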

pewlib.io.agilent.load_binary(path, collection_methods=None, counts_per_second=False, drop_names=None, full=False)

Imports an Agilent ‘.b’ batch.

Import is performed using the ‘MSScan.bin’, ‘MSProfile.bin’ binaries and ‘MSTS_XSpecific.xml’ document. By default drop_names drops the ‘Time’ field.

Parameters:

path (str | Path) – path to batch
collection_methods (list[str] | None) – list of datafile collection methods, default = [‘batch_xml’, ‘batch_csv’]
counts_per_second (bool) – return data in CPS
drop_names (list[str] | None) – names to remove from final array
full (bool) – also return dict with scantime

Return type:

ndarray | tuple[ndarray, dict]

Returns:

structured array of data
dict of params if full

Raises:

FileNotFoundError – ‘MSScan.bin’, ‘MSProfile.bin’ or ‘MSTS_XSpecific.xml’ not found
IOError – invalid binary format

See also

pewlib.io.agilent.collect_datafiles()

pewlib.io.agilent.load_csv(path, collection_methods=None, use_acq_for_names=True, drop_names=None, full=False)

Imports an Agilent ‘.b’ batch.

Import is performed using the ‘.csv’ files found in each ‘.d’ datafile. If a ‘.csv’ cannot be found then all data in the line is set to 0. To load properly formatted element names, set use_acq_for_names. By default drop_names drops the ‘Time_[Sec]’ field.

Parameters:

path (str | Path) – path to batch
collection_methods (list[str] | None) – list of datafile collection methods, default = [‘batch_xml’, ‘batch_csv’]
use_acq_for_names (bool) – read element names from ‘AcqMethod.xml’
drop_names (list[str] | None) – names to remove from final array
full (bool) – also return dict with scantime

Return type:

ndarray | tuple[ndarray, dict]

Returns:

structured array of data
dict of params if full

See also

pewlib.io.agilent.collect_datafiles()

pewlib.io.agilent.load_info(path)

Reads information from a batch.

Instrument info is read from the first Devices.xml found, batch info from the BatchLog.xml. An empty dictionary is returned if neither file can be read.

Possible keys:: Acquisition {Date,Name,Path,User} Instrument {Type,Model,Serial,Vendor}

Parameters:: path (str | Path) – path to batch
Return type:: dict[str, str]
Returns:: dict

CSV

Import of line-by-line data stored as a series of .csv files.

class pewlib.io.csv.GenericOption(drop_names=None, kw_genfromtxt=None, regex='.*\\.csv', drop_nan_rows=False, drop_nan_columns=False, transposed=False)

Options for instrument specific csv imports.

Options are used by pewlib.io.csv.load() to filter and sort paths, generate data and read parameters from csvs.

Parameters:

drop_names (list[str] | None) – columns dropped from imports
kw_genfromtxt (dict | None) – kwargs for numpy.genfromtxt
regex (str) – regex string for matching filenames

filter(paths)

Filter non matching paths.

Return type:: list[Path]

readParams(data)

Read parameters from data.

Return type:: dict

sort(paths)

Sort paths using ‘sortkey’.

Return type:: list[Path]

validForPath(path)

Checks if option is valid for a file or directory.

Return type:: bool

class pewlib.io.csv.NuOption

Option for Nu Instruments data.

readParams(data)

Read parameters from data.

Return type:: dict

sortkey(path)

Sorts files numerically.

Return type:: int
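
The filtering and numeric sorting described for these options can be sketched as follows. This is an illustration, not pewlib's code: the function names are hypothetical, the pattern is assumed to mirror the documented default regex, and using the last run of digits in the file name as the sort key is an assumption.

```python
import re
from pathlib import Path

CSV_REGEX = r".*\.csv"  # assumed to mirror the documented default pattern


def filter_paths(paths, regex=CSV_REGEX):
    # Keep only paths whose file name fully matches the pattern.
    return [p for p in paths if re.fullmatch(regex, p.name)]


def numeric_sortkey(path):
    # Use the last run of digits in the file stem; -1 if there is none.
    digits = re.findall(r"\d+", path.stem)
    return int(digits[-1]) if digits else -1


paths = [
    Path("line_10.csv"),
    Path("notes.txt"),
    Path("line_2.csv"),
    Path("line_1.csv"),
]
ordered = [p.name for p in sorted(filter_paths(paths), key=numeric_sortkey)]
print(ordered)
```

A numeric key matters because a plain alphabetical sort would place ‘line_10.csv’ before ‘line_2.csv’ and scramble the line order of the image.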

class pewlib.io.csv.ThermoLDROption

Option for Thermo iCAP LDR data.

readParams(data)

Read parameters from data.

Return type:: dict

sortkey(path)

Sorts files numerically.

Return type:: int

class pewlib.io.csv.TofwerkOption

Option for TOFWERK data.

readParams(data)

Read parameters from data.

Return type:: dict

sortkey(path)

Sorts files using the timestamp in name.

Return type:: float

pewlib.io.csv.is_valid_directory(path)

Tests if a directory contains at least one csv.

Return type:: bool

pewlib.io.csv.load(path, option=None, full=False)

Load a directory where lines are stored in separate .csv files.

Paths are filtered and sorted according to the option used, defaulting to the value of pewlib.io.csv.option_for_path().

Parameters:

path (str | Path) – directory
option (GenericOption | None) – import option, e.g. NuOption or TofwerkOption
full (bool) – also return parameters

Return type:

ndarray | tuple[ndarray, dict]

Returns:

structured array of data
dict of params if full

See also

pewlib.io.csv.GenericOption numpy.genfromtxt()

pewlib.io.csv.option_for_path(path)

Attempts to find the correct option for the directory. If no specific option is found then a GenericOption is returned.

Return type:: GenericOption

ImzML

Import of mass-spec imaging data in the imzML format. Each imzML file consists of an XML document (‘.imzML’) and an external binary (‘.ibd’).

class pewlib.io.imzml.ImzML(scan_settings, mz_params, intensity_params, spectra, external_binary)

Class for storing relevant data parsed from an imzML file.

To generate from a file use pewlib.io.imzml.ImzML.from_file().

Parameters:

scan_settings (ScanSettings) – a pewlib.io.imzml.ScanSettings
mz_params (ParamGroup) – a pewlib.io.imzml.ParamGroup for mz array
intensity_params (ParamGroup) – a pewlib.io.imzml.ParamGroup for intensities
spectra (list[Spectrum] | dict[tuple[int, int], Spectrum]) – either a list of pewlib.io.imzml.Spectrum or a dict mapping pixel positions to each pewlib.io.imzml.Spectrum.
external_binary (Path | str) – path to the ‘.ibd’ binary

binned_masses(mass_width_mz=0.1)

Summed intensities within a certain width.

Bins data across the entire mass range, with a bin width of mass_width_mz.

Parameters:: mass_width_mz (float) – width of each bin
Return type:: tuple[ndarray, ndarray]
Returns:: array of bins, binned intensity data (Y, X, N)
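
Fixed-width binning of a single spectrum can be sketched as below. This is a standard-library illustration of the idea only: the function bin_spectrum and its explicit low and high bounds are hypothetical, and the real method returns a (Y, X, N) image rather than one list.

```python
import math


def bin_spectrum(masses, intensities, low, high, width):
    """Sum intensities into fixed-width m/z bins covering [low, high)."""
    n_bins = int((high - low) / width + 0.5)  # round to avoid float truncation
    sums = [0.0] * n_bins
    for mz, value in zip(masses, intensities):
        index = math.floor((mz - low) / width)
        if 0 <= index < n_bins:  # ignore masses outside the range
            sums[index] += value
    left_edges = [low + i * width for i in range(n_bins)]
    return left_edges, sums


edges, sums = bin_spectrum(
    masses=[100.02, 100.07, 100.25, 100.48],
    intensities=[5.0, 3.0, 2.0, 1.0],
    low=100.0,
    high=100.5,
    width=0.1,
)
print(sums)
```

With numpy available, numpy.histogram with its weights argument performs the same summation.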

extract_masses(target_masses, mass_width_ppm=None, mass_width_mz=None)

Extracts image of one or more m/z.

Data within ± half of mass_width_ppm or mass_width_mz around each target mass is summed.

Parameters:

target_masses (ndarray | float) – m/z to extract
mass_width_ppm (float | None) – extraction width in ppm
mass_width_mz (float | None) – extraction width in m/z (Da)

Return type:

ndarray

Returns:

array of intensities, shape (Y, X, N)
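
The extraction window can be illustrated with the sketch below, which assumes the window is centred on the target and spans half the width on each side. The function names are hypothetical, and raising ValueError unless exactly one width is given is an assumption rather than documented behaviour.

```python
def mass_window(target_mz, width_ppm=None, width_mz=None):
    """Return (low, high) m/z bounds centred on target_mz."""
    if (width_ppm is None) == (width_mz is None):
        raise ValueError("give exactly one of width_ppm or width_mz")
    if width_ppm is not None:
        # A ppm width is relative to the target mass.
        width_mz = target_mz * width_ppm * 1e-6
    half = 0.5 * width_mz
    return target_mz - half, target_mz + half


def summed_intensity(masses, intensities, low, high):
    # Sum every intensity whose m/z falls inside the window.
    return sum(i for mz, i in zip(masses, intensities) if low <= mz <= high)


low, high = mass_window(500.0, width_ppm=10.0)
total = summed_intensity(
    [499.99, 499.999, 500.002, 500.01], [1.0, 2.0, 4.0, 8.0], low, high
)
print(low, high, total)
```

At m/z 500 a 10 ppm width corresponds to 0.005 Da, so only the two central peaks contribute to the sum.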

extract_tic()

The total-ion-chromatogram image.

Extracted from the cvParam MS:1000285 if available, otherwise the summed intensities.

Return type:: ndarray
Returns:: image of tic, shape (Y, X)

classmethod from_etree(et, external_binary, scan_number=1)

Create an ImzML class from a pre-parsed element tree.

Parameters:

et (ElementTree) – the element tree from parsing
external_binary (Path | str) – path to ‘.ibd’ file
scan_number (int) – scan number to import

Raises:

ValueError – when vital parameters are missing

Return type:

ImzML

classmethod from_file(path, external_binary=None, use_fast_parse=False)

Create an ImzML object from a file path. If external_binary is None, the imzML path with suffix ‘.ibd’ is used.

Parameters:

path (Path | str) – path to imzML file
external_binary (Path | str | None) – path to .ibd file

Raises:

FileNotFoundError – if path or external_binary do not exist

Return type:

ImzML

mass_range()

Maximum mass range.

Return type:: tuple[float, float]
Returns:: lowest m/z, highest m/z

untargeted_extraction(num=10, precision_mz=0.1, min_pixel_count=10, min_height_fraction=0.1, min_height_absolute=100.0)

Extracts the num most abundant masses for each spectrum. precision_mz specifies the width used for grouping similar masses.

Parameters:

num (int) – number of peaks per spectrum to test
precision_mz (float) – width in m/z used for grouping similar masses
min_pixel_count (int) – minimum number of pixels a mass must occur in
min_height_fraction (float) – minimum peak height as fraction of maximum image signal
min_height_absolute (float) – minimum peak height in counts

Return type:

tuple[ndarray, ndarray]

Returns:

array of (average) masses len N, image of size (Y, X, N)

class pewlib.io.imzml.ParamGroup(id, dtype, compressed=False, external=False)

Stores imzML referenceableParamGroup info.

Generate from an imzML <referenceableParamGroup> using pewlib.io.imzml.ParamGroup.from_xml_element().

Only un-compressed, external data is supported.

Parameters:

id (str) – id of the group, e.g. ‘mzArray’, ‘intensities’
dtype (type) – type of data referenced, e.g. np.float32
compressed (bool) – is the data compressed
external (bool) – is data external

classmethod from_xml_element(element)

Generate ParamGroup from an imzML <referenceableParamGroup> element.

Attempts to read the id, dtype, compression and if data is external.

Parameters:

element (Element) – the <referenceableParamGroup> element

Raises:

ValueError – when id or dtype not found
NotImplementedError – when data is compressed

Return type:

ParamGroup

class pewlib.io.imzml.ScanSettings(image_size, pixel_size)

Stores imzML scan settings.

Generate from a <scanSettings> imzML element using pewlib.io.imzml.ScanSettings.from_xml_element().

Parameters:

image_size (tuple[int, int] | None) – size in pixels (x, y), or None
pixel_size (tuple[float, float]) – pixel size in μm (x, y)

classmethod from_xml_element(element)

Generate ScanSettings from an imzML <scanSettings> element.

Attempts to read the image and pixel size from the imzML. Image size is absent in some imzML files.

Parameters:: element (Element) – the <scanSettings> element
Raises:: ValueError – when pixel size not found
Return type:: ScanSettings

class pewlib.io.imzml.Spectrum(pos, tic, offsets, lengths)

Stores imzML spectrum info.

Generate from a <spectrum> imzML element using pewlib.io.imzml.Spectrum.from_xml_element().

Parameters:

pos (tuple[int, int]) – pixel pos (x, y)
tic (float | None) – optional total-ion-chromatogram value
offsets (dict[str, int]) – dict of {pewlib.io.imzml.ParamGroup.id: external data offset in bytes}
lengths (dict[str, int]) – dict of {pewlib.io.imzml.ParamGroup.id: external data length in bytes}

classmethod from_xml_element(element, scan_number=1)

Generate Spectrum from an imzML <spectrum> element.

Attempts to read the pos, tic and external data byte offsets and lengths.

Parameters:: element (Element) – the <spectrum> element
Raises:: ValueError – when pos, offset or length are not found
Return type:: Spectrum

get_binary_data(reference_id, dtype, external_binary=None)

Reads data from external binary.

For faster access keep a BufferedReader active for all Spectrum reads, limiting the number of times the file is opened.

Parameters:

reference_id (str) – the pewlib.io.imzml.ParamGroup.id to read.
dtype (type) – data type, e.g. np.float32
external_binary (Path | BufferedReader | None) – path or file handle to .ibd

Return type:

ndarray

Returns:

array of data
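
Reading several spectra through one open handle, as recommended above, might look like the sketch below. It uses only the standard library and a temporary file; the helper read_floats, the file layout and the little-endian float32 encoding are assumptions for illustration.

```python
import struct
import tempfile
from pathlib import Path


def read_floats(handle, offset, length):
    """Read little-endian float32 values from `length` bytes at `offset`."""
    handle.seek(offset)
    raw = handle.read(length)
    return list(struct.unpack(f"<{length // 4}f", raw))


with tempfile.TemporaryDirectory() as tmp:
    binary = Path(tmp) / "example.ibd"
    # Two hypothetical spectra stored back to back as float32.
    binary.write_bytes(struct.pack("<5f", 1.0, 2.0, 3.0, 10.0, 20.0))
    spectra = [(0, 12), (12, 8)]  # (byte offset, byte length) per spectrum

    # One handle serves every read, so the file is opened only once.
    with binary.open("rb") as handle:
        values = [read_floats(handle, off, size) for off, size in spectra]

print(values)
```

Opening the file once and seeking avoids the per-spectrum open and close cost, which adds up over the many pixels of an image.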

pewlib.io.imzml.fast_parse_imzml(imzml, external_binary, callback=None)

Custom non-xml parser for imzML files.

Faster than etree.ElementTree.parse but less reliable. The current file position is reported at each <spectrum> import via the optional callback function. If callback returns False, the import is cancelled and a UserWarning is raised.

Parameters:

imzml (Path | str) – path to xml
external_binary (Path | str) – path to the .ibd
callback (Callable[[int], bool] | None) – optional callback function for progress

Return type:

ImzML

Returns:

ImzML class

Raises:

UserWarning – when callback returns False
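
A callback matching the documented Callable[[int], bool] shape could be built as in the sketch below. The parser here is a hypothetical stand-in (parse_stub) that only mimics the documented contract: it reports a position per spectrum and raises UserWarning when the callback returns False.

```python
def make_progress_callback(total_bytes, stop_after=None):
    """Build a callback taking a file position and returning False to cancel."""
    seen = []

    def callback(position):
        # Record progress as a whole-number percentage of the file.
        seen.append(round(100 * position / total_bytes))
        # Returning False requests cancellation.
        return stop_after is None or position < stop_after

    return callback, seen


def parse_stub(positions, callback=None):
    """Hypothetical stand-in for a parser reporting each <spectrum> position."""
    parsed = 0
    for position in positions:
        if callback is not None and not callback(position):
            raise UserWarning("import cancelled by callback")
        parsed += 1
    return parsed


callback, seen = make_progress_callback(total_bytes=1000, stop_after=600)
try:
    parse_stub([250, 500, 750, 1000], callback)
    cancelled = False
except UserWarning:
    cancelled = True

print(seen, cancelled)
```

Callers that allow cancellation should therefore wrap the import in a try/except for UserWarning.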

pewlib.io.imzml.load(imzml, external_binary, target_masses, mass_width_ppm=10.0)

Load data from an imzML.

Parameters:

imzml (Path | str | ImzML) – path to imzML, or pre-parsed tree
external_binary (Path | str) – path to binary data, usually .ibd file
target_masses (float | ndarray) – masses to import
mass_width_ppm (float) – width of imported regions

Return type:

tuple[ndarray, dict]

Returns:: image data, dict of parameters

Numpy NPZ

Import and export in pew’s custom file format, based on numpy’s compressed ‘.npz’. This format saves image data, laser parameters and calibrations in one file.

pewlib.io.npz.load(path)

Loads data from ‘.npz’ file.

Loads files created using pewlib.io.npz.save(). On load, a Laser or SRRLaser is reformed from the saved data.

Parameters:: path (str | Path) – path to ‘.npz’
Return type:: Laser
Returns:: Laser or SRRLaser
Raises:: ValueError – incompatible version

See also

numpy.load()

pewlib.io.npz.save(path, laser)

Saves data to ‘.npz’ file.

Converts a Laser or SRRLaser to a series of np.ndarray which are then saved to a compressed ‘.npz’ archive. The time and current version are also saved. If path does not end in ‘.npz’ it is appended.

Parameters:

path (str | Path) – path to save to
laser (Laser | SRRLaser) – Laser or SRRLaser

Return type:

None

See also

numpy.savez_compressed()

NWI Laser Logs

Synchronisation of laser parameters (ablation times and locations) with signal data. Data should be imported using the other pewlib.io modules then passed with the laser parameters file to these functions.

pewlib.io.laser.guess_delay_from_data(data, times)

Guess delay from laser firing to ICP-MS measurement.

Looks for a change of > 10% in the TIC, up to 1 second into data.

Parameters:

data (ndarray) – structured array of signals, flattened
times (ndarray) – array of times, same length as data

Return type:

float

Returns:

delay in ms
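
One possible reading of the delay heuristic is sketched below. The assumptions are significant: a "change" is taken to be a step between successive TIC values larger than 10% of the maximum TIC, the TIC is passed in directly instead of being summed from a structured array, the result is in the same units as times, and 0.0 is returned when no step is found. None of this is confirmed as pewlib's method.

```python
def guess_delay(tic, times, threshold=0.1, max_time=1.0):
    """Return the time of the first large TIC step within max_time seconds."""
    limit = threshold * max(tic)
    for i in range(1, len(tic)):
        if times[i] > max_time:
            break  # only the first max_time seconds are searched
        if abs(tic[i] - tic[i - 1]) > limit:
            return times[i]
    return 0.0  # assumption: no step found means no delay


tic = [10.0, 11.0, 10.0, 400.0, 900.0, 1000.0]
times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
delay = guess_delay(tic, times)
print(delay)
```

Here the signal first jumps at the fourth sample, so the guessed delay is the time of that sample.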

pewlib.io.laser.sync_data_nwi_laser_log(data, times, log_file, sequence=None, delay=None, squeeze=False)

Syncs ICP-MS data collected as a single line per raster with the laser log file.

Parameters:

data (ndarray) – 1d ICP-MS data
times (ndarray | float) – array of times (s) the same size as data, or pixel acquisition time
log_file – log data or path to LaserLog csv
sequence (ndarray | int | None) – select raster(s) to import, defaults to all
delay (float | None) – delay in s between laser and ICP-MS, default calculates from the TIC
squeeze (bool) – remove any rows and columns of all NaNs

Return type:

tuple[ndarray, dict]

PerkinElmer

Import of line-by-line PerkinElmer ELAN ‘XL’ directories.

pewlib.io.perkinelmer.is_valid_directory(path)

Tests if a directory contains PerkinElmer data.

Ensures the path exists, is a directory and contains at least one ‘.xl’ file.

Return type:: bool
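
The three checks described above can be sketched with pathlib. The function name has_xl_files is hypothetical, and matching the ‘.xl’ suffix case-insensitively is an assumption.

```python
from pathlib import Path
from tempfile import TemporaryDirectory


def has_xl_files(path):
    """True if path is an existing directory holding at least one '.xl' file."""
    path = Path(path)
    if not path.is_dir():  # also False when the path does not exist
        return False
    return any(p.suffix.lower() == ".xl" for p in path.iterdir())


with TemporaryDirectory() as tmp:
    root = Path(tmp)
    empty_ok = has_xl_files(root)       # directory exists but has no '.xl'
    (root / "line_001.xl").touch()
    filled_ok = has_xl_files(root)      # now contains one '.xl' file

missing_ok = has_xl_files(Path(tmp) / "gone")  # path no longer exists
print(empty_ok, filled_ok, missing_ok)
```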

pewlib.io.perkinelmer.load(path, import_parameters=True, full=False)

Loads PerkinElmer directory.

Searches the directory path for ‘.xl’ files and uses them to reconstruct data. If import_parameters is set and a ‘parameters.conf’ is found, the scantime, speed and spotsize are also imported.

Parameters:

path (str | Path) – path to directory
import_parameters (bool) – import params from ‘parameters.conf’
full (bool) – also return dict with params

Return type:

ndarray | tuple[ndarray, dict]

Returns:

structured array of data
dict of params if full

See also

pewlib.io.perkinelmer.collect_datafiles()

Text Image

Import and export of text-images, files where data is stored as delimited text values. Data is read in order from the first line.

pewlib.io.textimage.load(path, delimiter=None, comments='#', name=None)

Load text-image.

Loads 2d data from file. If delimiter is specified then all tabs and ‘;’ are converted to ‘,’ before import. If name is specified then a single-field structured array is returned.

Parameters:

path (str | Path) – path to file
delimiter (str | None) – file delimiter
comments (str) – file comment character
name (str | None) – return single name field structured array

Return type:

ndarray
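
The conversion of tabs and ‘;’ to ‘,’ can be pictured with the standard-library sketch below. The function read_text_image is hypothetical and returns nested lists rather than an ndarray; when the conversion is applied in pewlib is as described above and is not modelled here.

```python
import io


def read_text_image(lines, comments="#"):
    """Parse delimited text rows into a list of float rows."""
    rows = []
    for line in lines:
        # Drop anything after the comment character, then surrounding space.
        line = line.split(comments, 1)[0].strip()
        if not line:
            continue
        # Normalise tab and ';' separators to ',' before splitting.
        line = line.replace("\t", ",").replace(";", ",")
        rows.append([float(value) for value in line.split(",")])
    return rows


text = "# example image\n1.0;2.0;3.0\n4.0\t5.0\t6.0\n7.0,8.0,9.0\n"
image = read_text_image(io.StringIO(text))
print(image)
```

Rows are kept in file order, matching the statement that data is read in order from the first line.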

pewlib.io.textimage.save(path, data, header='')

Save data to csv.

See numpy.savetxt()

Parameters:

path (str | Path) – path to file
data (ndarray) – unstructured array
header (str) – file header

Return type:

None

VTK

Exports to VTK formats for use in programs such as Paraview.

pewlib.io.vtk.save(path, data, spacing)

Save data as a VTK ImageData XML.

Saves an array to a ‘.vti’ file. Data origin is set to (0, 0) and equally spaced using x, y, z of spacing. Data is raised to 3 dimensions if lower.

Parameters:

path (str | Path) – path to file
data (ndarray) – array
spacing (tuple[float, float, float]) – spacing of ‘.vti’

Return type:

None