IO Module

This module contains the functions pew uses for importing and exporting data.

Agilent

Import of line-by-line collected Agilent ‘.b’ batches. Both raw binaries and the ‘.csv’ exports are supported. Tested with Agilent 7500, 7700 and 8900 ICPs.

pewlib.io.agilent.collect_datafiles(path, methods)

Finds ‘.d’ datafiles in a directory.

A list of expected datafiles is created for each method in methods. Methods are tried in order until one successfully finds all of its expected datafiles.

Parameters:
  • path (str | Path) – path to directory

  • methods (list[str]) – list of methods to try, {‘alphabetical’, ‘acq_method_xml’, ‘batch_csv’, ‘batch_xml’}

Return type:

list[Path]

Returns:

A list of datafiles
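
The ordered fallback described above can be sketched as follows. This is a minimal illustration rather than pewlib's implementation: collect_with_fallback and both finder functions are hypothetical names, and each finder simply returns the datafile paths it expects to exist.

```python
import tempfile
from pathlib import Path
from typing import Callable


def collect_with_fallback(
    path: Path, finders: list[Callable[[Path], list[Path]]]
) -> list[Path]:
    """Try each finder in order; accept the first whose files all exist."""
    for finder in finders:
        expected = finder(path)
        if expected and all(p.exists() for p in expected):
            return expected
    return []


def from_listing(path: Path) -> list[Path]:
    # Hypothetical finder: a listing that names a datafile which is missing.
    return [path / "001.d", path / "missing.d"]


def alphabetical(path: Path) -> list[Path]:
    # Hypothetical finder: every '.d' entry in the directory, sorted by name.
    return sorted(path.glob("*.d"))


with tempfile.TemporaryDirectory() as tmp:
    batch = Path(tmp)
    for name in ("002.d", "001.d"):
        (batch / name).mkdir()
    found = [p.name for p in collect_with_fallback(batch, [from_listing, alphabetical])]
```

Here the first finder is rejected because one of its expected datafiles does not exist, so the alphabetical finder supplies the result.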

pewlib.io.agilent.load(path, collection_methods=None, use_acq_for_names=True, counts_per_second=False, drop_names=None, full=False)

Imports an Agilent ‘.b’ batch.

First attempts a binary import, falling back to importing any ‘.csv’ files.

Parameters:
  • path (str | Path) – path to batch

  • collection_methods (list[str] | None) – list of datafile collection methods, default = [‘batch_xml’, ‘batch_csv’]

  • use_acq_for_names (bool) – read element names from ‘AcqMethod.xml’, only for csv

  • counts_per_second (bool) – return data in CPS, only for binary

  • drop_names (list[str] | None) – names to remove from final array

  • full (bool) – also return dict with scantime

Return type:

ndarray | tuple[ndarray, dict]

Returns:

structured array of data; dict of params if full
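
A binary-first import with a csv fallback might look like the sketch below. It assumes the fallback is triggered by OSError (which covers both FileNotFoundError and IOError); the loader stubs and load_with_fallback are hypothetical and do not reflect how pewlib decides to fall back.

```python
from typing import Callable


def load_with_fallback(
    path: str, loaders: list[Callable[[str], list[float]]]
) -> list[float]:
    """Return the result of the first loader that succeeds."""
    errors = []
    for loader in loaders:
        try:
            return loader(path)
        except OSError as e:  # FileNotFoundError and IOError are both OSError
            errors.append(e)
    raise OSError(f"no loader could read {path!r}: {errors}")


def load_binary_stub(path: str) -> list[float]:
    # Stand-in for a binary import that cannot find its input files.
    raise FileNotFoundError("MSScan.bin")


def load_csv_stub(path: str) -> list[float]:
    # Stand-in for a csv import that succeeds.
    return [1.0, 2.0]


data = load_with_fallback("batch.b", [load_binary_stub, load_csv_stub])
```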

pewlib.io.agilent.load_binary(path, collection_methods=None, counts_per_second=False, drop_names=None, full=False)

Imports an Agilent ‘.b’ batch.

Import is performed using the ‘MSScan.bin’, ‘MSProfile.bin’ binaries and ‘MSTS_XSpecific.xml’ document. By default drop_names drops the ‘Time’ field.

Parameters:
  • path (str | Path) – path to batch

  • collection_methods (list[str] | None) – list of datafile collection methods, default = [‘batch_xml’, ‘batch_csv’]

  • counts_per_second (bool) – return data in CPS

  • drop_names (list[str] | None) – names to remove from final array

  • full (bool) – also return dict with scantime

Return type:

ndarray | tuple[ndarray, dict]

Returns:

structured array of data; dict of params if full

Raises:
  • FileNotFoundError – ‘MSScan.bin’, ‘MSProfile.bin’ or ‘MSTS_XSpecific.xml’ not found

  • IOError – invalid binary format

pewlib.io.agilent.load_csv(path, collection_methods=None, use_acq_for_names=True, drop_names=None, full=False)

Imports an Agilent ‘.b’ batch.

Import is performed using the ‘.csv’ files found in each ‘.d’ datafile. If a ‘.csv’ cannot be found then all data in that line is set to 0. To load properly formatted element names, set use_acq_for_names. By default drop_names drops the ‘Time_[Sec]’ field.

Parameters:
  • path (str | Path) – path to batch

  • collection_methods (list[str] | None) – list of datafile collection methods, default = [‘batch_xml’, ‘batch_csv’]

  • use_acq_for_names (bool) – read element names from ‘AcqMethod.xml’

  • drop_names (list[str] | None) – names to remove from final array

  • full (bool) – also return dict with scantime

Return type:

ndarray | tuple[ndarray, dict]

Returns:

structured array of data; dict of params if full

pewlib.io.agilent.load_info(path)

Reads information from a batch.

Instrument info is read from the first Devices.xml found, batch info from the BatchLog.xml. An empty dictionary is returned if neither file can be read.

Possible keys:

Acquisition {Date,Name,Path,User}
Instrument {Type,Model,Serial,Vendor}

Parameters:

path (str | Path) – path to batch

Return type:

dict[str, str]

Returns:

dict

CSV

Import of line-by-line data stored as a series of .csv files.

class pewlib.io.csv.GenericOption(drop_names=None, kw_genfromtxt=None, regex='.*\\\\.csv', drop_nan_rows=False, drop_nan_columns=False, transposed=False)

Options for instrument specific csv imports.

Options are used by pewlib.io.csv.load() to filter and sort paths, generate data and read parameters from csvs.

Parameters:
  • drop_names (list[str] | None) – columns dropped from imports

  • kw_genfromtxt (dict | None) – kwargs for numpy.genfromtxt

  • regex (str) – regex string for matching filenames

filter(paths)

Filter non matching paths.

Return type:

list[Path]

readParams(data)

Read parameters from data.

Return type:

dict

sort(paths)

Sort paths using ‘sortkey’.

Return type:

list[Path]

validForPath(path)

Checks if option is valid for a file or directory.

Return type:

bool
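
The filter and sort steps can be illustrated with a small stand-in class. SketchOption is hypothetical and is not pewlib's class; in particular, the numeric sort key assumes file names that end in a run of digits.

```python
import re
from pathlib import Path


class SketchOption:
    """Hypothetical stand-in for an import option; not pewlib's class."""

    def __init__(self, regex: str = r".*\.csv"):
        self.regex = re.compile(regex)

    def filter(self, paths: list[Path]) -> list[Path]:
        # Keep only paths whose file name fully matches the regex.
        return [p for p in paths if self.regex.fullmatch(p.name)]

    def sortkey(self, path: Path) -> int:
        # Sort numerically on the last run of digits in the stem (assumed naming).
        digits = re.findall(r"\d+", path.stem)
        return int(digits[-1]) if digits else 0

    def sort(self, paths: list[Path]) -> list[Path]:
        return sorted(paths, key=self.sortkey)


paths = [Path("line_10.csv"), Path("line_2.csv"), Path("notes.txt"), Path("line_1.csv")]
option = SketchOption()
ordered = [p.name for p in option.sort(option.filter(paths))]
```

A numeric key matters here because a plain string sort would place ‘line_10.csv’ before ‘line_2.csv’.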

class pewlib.io.csv.NuOption

Option for Nu Instruments data.

readParams(data)

Read parameters from data.

Return type:

dict

sortkey(path)

Sorts files numerically.

Return type:

int

class pewlib.io.csv.ThermoLDROption

Option for Thermo iCAP LDR data.

readParams(data)

Read parameters from data.

Return type:

dict

sortkey(path)

Sorts files numerically.

Return type:

int

class pewlib.io.csv.TofwerkOption

Option for TOFWERK data.

readParams(data)

Read parameters from data.

Return type:

dict

sortkey(path)

Sorts files using the timestamp in name.

Return type:

float

pewlib.io.csv.is_valid_directory(path)

Tests if a directory contains at least one csv.

Return type:

bool

pewlib.io.csv.load(path, option=None, full=False)

Load a directory where lines are stored in separate .csv files.

Paths are filtered and sorted according to the option used, defaulting to the value of pewlib.io.csv.option_for_path().

Parameters:
  • path (str | Path) – directory

  • option (GenericOption | None) – import option (e.g. NuOption, TofwerkOption); kwargs for numpy.genfromtxt are taken from the option's kw_genfromtxt

  • full (bool) – also return parameters

Return type:

ndarray | tuple[ndarray, dict]

Returns:

structured array of data; dict of params if full

See also

pewlib.io.csv.GenericOption numpy.genfromtxt()

pewlib.io.csv.option_for_path(path)

Attempts to find the correct option for the directory. If no specific option is found then a GenericOption is returned.

Return type:

GenericOption

ImzML

Import of mass-spec imaging data in the imzML format. Each imzML dataset consists of an XML file (‘.imzML’) and an external binary file (‘.ibd’).

class pewlib.io.imzml.ImzML(scan_settings, mz_params, intensity_params, spectra, external_binary)

Class for storing relevant data parsed from an imzML file.

To generate from a file use pewlib.io.imzml.ImzML.from_file().

binned_masses(mass_width_mz=0.1)

Summed intensities within a certain width.

Bins data across the entire mass range, with a bin width of mass_width_mz.

Parameters:

mass_width_mz (float) – width of each bin

Return type:

tuple[ndarray, ndarray]

Returns:

array of bins, binned intensity data (Y, X, N)
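
Fixed-width binning of a spectrum can be expressed as a weighted histogram. The sketch below works on a single hypothetical spectrum with made-up values and is not taken from pewlib's source.

```python
import numpy as np

# Hypothetical single-pixel spectrum: m/z values and matching intensities.
mz = np.array([100.02, 100.07, 100.31, 100.38, 100.95])
intensity = np.array([5.0, 3.0, 2.0, 4.0, 1.0])

mass_width_mz = 0.5
# Bin edges spanning the whole mass range in steps of mass_width_mz.
edges = np.arange(100.0, 101.0 + mass_width_mz, mass_width_mz)
# Weighted histogram: each bin holds the summed intensity of its m/z values.
binned, _ = np.histogram(mz, bins=edges, weights=intensity)
```

Applying the same edges to every pixel gives the N axis of a (Y, X, N) image.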

extract_masses(target_masses, mass_width_ppm=None, mass_width_mz=None)

Extracts image of one or more m/z.

Data within ±0.5 × mass_width_ppm or ±0.5 × mass_width_mz of each target mass is summed.

Parameters:
  • target_masses (ndarray | float) – m/z to extract

  • mass_width_ppm (float | None) – extraction width in ppm

  • mass_width_mz (float | None) – extraction width in m/z (Da)

Return type:

ndarray

Returns:

array of intensities, shape (Y, X, N)
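
The window arithmetic can be checked with a short sketch. window_sum is a hypothetical helper working on one spectrum; it assumes a ppm width is converted to an absolute width at the target m/z and that the window bounds are inclusive.

```python
import numpy as np


def window_sum(mz, intensity, target, width_ppm=None, width_mz=None):
    """Sum intensities within +/- half the window width of target."""
    if width_ppm is not None:
        # Convert a ppm width to an absolute width at this m/z.
        width_mz = target * width_ppm / 1e6
    if width_mz is None:
        raise ValueError("one of width_ppm or width_mz is required")
    lo, hi = target - width_mz / 2.0, target + width_mz / 2.0
    mask = (mz >= lo) & (mz <= hi)
    return float(intensity[mask].sum())


mz = np.array([499.990, 499.999, 500.001, 500.010])
intensity = np.array([1.0, 2.0, 4.0, 8.0])

# 10 ppm at m/z 500 is 0.005 wide, so the window is 499.9975 to 500.0025.
narrow = window_sum(mz, intensity, 500.0, width_ppm=10.0)
# An absolute width of 0.05 covers all four points.
wide = window_sum(mz, intensity, 500.0, width_mz=0.05)
```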

extract_tic()

The total-ion-chromatogram image.

Extracted from the cvParam MS:1000285 if available, otherwise from the summed intensities.

Return type:

ndarray

Returns:

image of tic, shape (Y, X)

classmethod from_etree(et, external_binary, scan_number=1)

Create an ImzML class from a pre-parsed element tree.

Parameters:
  • et (ElementTree) – the element tree from parsing

  • external_binary (Path | str) – path to ‘.ibd’ file

  • scan_number (int) – scan number to import

Raises:

ValueError – when vital parameters are missing

Return type:

ImzML

classmethod from_file(path, external_binary=None, use_fast_parse=False)

Create an ImzML object from a file path. If external_binary is None, the imzML path with suffix ‘.ibd’ is used.

Parameters:
  • path (Path | str) – path to imzML file

  • external_binary (Path | str | None) – path to .ibd file

Raises:

FileNotFoundError – if path or external_binary do not exist

Return type:

ImzML

mass_range()

Maximum mass range.

Return type:

tuple[float, float]

Returns:

lowest m/z, highest m/z

untargeted_extraction(num=10, precision_mz=0.1, min_pixel_count=10, min_height_fraction=0.1, min_height_absolute=100.0)

Extracts the num most abundant masses for each spectrum. The precision specifies the width to use for grouping similar masses.

Parameters:
  • num (int) – number of peaks per spectrum to test

  • precision_mz (float) – width in m/z used to group similar masses

  • min_pixel_count (int) – minimum number of pixels a mass must occur in

  • min_height_fraction (float) – minimum peak height as fraction of maximum image signal

  • min_height_absolute (float) – minimum peak height in counts

Return type:

tuple[ndarray, ndarray]

Returns:

array of (average) masses len N, image of size (Y, X, N)

class pewlib.io.imzml.ParamGroup(id, dtype, compressed=False, external=False)

Stores imzML referenceableParamGroup info.

Generate from an imzML <referenceableParamGroup> using pewlib.io.imzml.ParamGroup.from_xml_element().

Only un-compressed, external data is supported.

Parameters:
  • id (str) – id of the group, e.g. ‘mzArray’, ‘intensities’

  • dtype (type) – type of data referenced, e.g. np.float32

  • compressed (bool) – is the data compressed

  • external (bool) – is data external

classmethod from_xml_element(element)

Generate ParamGroup from an imzML <referenceableParamGroup> element.

Attempts to read the id, dtype, compression and if data is external.

Parameters:

element (Element) – the <referenceableParamGroup> element

Raises:
  • ValueError – when id or dtype is not found

  • NotImplementedError – when data is compressed

Return type:

ParamGroup

class pewlib.io.imzml.ScanSettings(image_size, pixel_size)

Stores imzML scan settings.

Generate from a <scanSettings> imzML element using pewlib.io.imzml.ScanSettings.from_xml_element().

Parameters:
  • image_size (tuple[int, int] | None) – size in pixels (x, y), or None

  • pixel_size (tuple[float, float]) – pixel size in μm (x, y)

classmethod from_xml_element(element)

Generate ScanSettings from an imzML <scanSettings> element.

Attempts to read the image and pixel size from the imzML. Image size is absent in some imzML files.

Parameters:

element (Element) – the <scanSettings> element

Raises:

ValueError – when pixel size is not found

Return type:

ScanSettings

class pewlib.io.imzml.Spectrum(pos, tic, offsets, lengths)

Stores imzML spectrum info.

Generate from a <spectrum> imzML element using pewlib.io.imzml.Spectrum.from_xml_element().

Parameters:
  • pos (tuple[int, int]) – pixel pos (x, y)

  • tic (float | None) – optional total-ion-chromatogram value

  • offsets (dict[str, int]) – dict of {pewlib.io.imzml.ParamGroup.id: external data offset in bytes}

  • lengths (dict[str, int]) – dict of {pewlib.io.imzml.ParamGroup.id: external data length in bytes}

classmethod from_xml_element(element, scan_number=1)

Generate Spectrum from an imzML <spectrum> element.

Attempts to read the pos, tic and external data byte offsets and lengths.

Parameters:

element (Element) – the <spectrum> element

Raises:

ValueError – when pos, offset or length are not found

Return type:

Spectrum

get_binary_data(reference_id, dtype, external_binary=None)

Reads data from external binary.

For faster access keep a BufferedReader active for all Spectrum reads, limiting the number of times the file is opened.

Parameters:
  • reference_id (str) – the pewlib.io.imzml.ParamGroup.id to read.

  • dtype (type) – data type, e.g. np.float32

  • external_binary (Path | BufferedReader | None) – path or file handle to .ibd

Return type:

ndarray

Returns:

array of data
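
Reading several spectra through one open handle can be sketched as below. An in-memory buffer stands in for the ‘.ibd’ file, read_block is a hypothetical helper, and lengths are taken to be in bytes as in the Spectrum parameters.

```python
import io

import numpy as np

# Build an in-memory stand-in for an '.ibd' file: two float32 arrays back to back.
first = np.array([1.0, 2.0, 3.0], dtype=np.float32)
second = np.array([10.0, 20.0], dtype=np.float32)
handle = io.BytesIO(first.tobytes() + second.tobytes())


def read_block(fh, offset: int, length: int, dtype) -> np.ndarray:
    """Read `length` bytes at `offset` from an open handle as `dtype`."""
    fh.seek(offset)
    return np.frombuffer(fh.read(length), dtype=dtype)


# Reuse one handle for every read rather than reopening the file each time.
a = read_block(handle, 0, first.nbytes, np.float32)
b = read_block(handle, first.nbytes, second.nbytes, np.float32)
```

With a real file, the handle would come from a single open(path, "rb") call wrapped around all of the reads.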

pewlib.io.imzml.fast_parse_imzml(imzml, external_binary, callback=None)

Custom non-xml parser for imzML files.

Faster than etree.ElementTree.parse but less reliable. The current file position is reported at each <spectrum> import via the optional callback function. If callback returns False, the import is cancelled and a UserWarning is raised.

Parameters:
  • imzml (Path | str) – path to xml

  • external_binary (Path | str) – path to the .ibd

  • callback (Callable[[int], bool] | None) – optional callback function for progress

Return type:

ImzML

Returns:

ImzML class

Raises:

UserWarning – when callback returns False
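
The cancellation contract of the callback can be sketched with a toy parser. parse_items is hypothetical and reports an item index instead of a file position; only the pattern of raising UserWarning when the callback returns False follows the description above.

```python
from typing import Callable, Optional


def parse_items(
    items: list[str], callback: Optional[Callable[[int], bool]] = None
) -> list[str]:
    """Process items, reporting the position to callback after each one."""
    parsed = []
    for position, item in enumerate(items):
        parsed.append(item.upper())
        if callback is not None and not callback(position):
            raise UserWarning("import cancelled by callback")
    return parsed


seen: list[int] = []


def cancel_at_second(position: int) -> bool:
    seen.append(position)
    return position < 1  # returning False cancels the import


try:
    parse_items(["a", "b", "c"], cancel_at_second)
    cancelled = False
except UserWarning:
    cancelled = True
```

A progress dialog would typically return False from the callback once the user presses cancel, and the caller would catch the UserWarning.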

pewlib.io.imzml.load(imzml, external_binary, target_masses, mass_width_ppm=10.0)

Load data from an imzML.

Parameters:
  • imzml (Path | str | ImzML) – path to imzML, or pre-parsed tree

  • external_binary (Path | str) – path to binary data, usually .ibd file

  • target_masses (float | ndarray) – masses to import

  • mass_width_ppm (float) – width of imported regions

Return type:

tuple[ndarray, dict]

Returns:

image data, dict of parameters

Numpy NPZ

Import and export in pew’s custom file format, based on numpy’s compressed ‘.npz’. This format saves image data, laser parameters and calibrations in one file.

pewlib.io.npz.load(path)

Loads data from ‘.npz’ file.

Loads files created using pewlib.io.npz.save(). On load, a Laser or SRRLaser is rebuilt from the saved data.

Parameters:

path (str | Path) – path to ‘.npz’

Return type:

Laser

Returns:

Laser or SRRLaser

Raises:

ValueError – incompatible version

See also

numpy.load()

pewlib.io.npz.save(path, laser)

Saves data to ‘.npz’ file.

Converts a Laser or SRRLaser to a series of np.ndarray which are then saved to a compressed ‘.npz’ archive. The time and current version are also saved. If path does not end in ‘.npz’ it is appended.

Parameters:
  • path (str | Path) – path to save to

  • laser (Laser | SRRLaser) – Laser or SRRLaser

Return type:

None

See also

numpy.savez_compressed()
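
The underlying numpy round trip is sketched below. The key names ‘data’ and ‘version’ and the version string are illustrative only and are not pew's actual file layout.

```python
import tempfile
from pathlib import Path

import numpy as np

# A small structured array standing in for image data.
data = np.zeros((2, 3), dtype=[("Ca44", np.float64), ("Fe56", np.float64)])
data["Ca44"] = 1.5

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "image"  # no suffix; numpy appends '.npz'
    # The key names 'data' and 'version' are illustrative, not pew's layout.
    np.savez_compressed(path, data=data, version=np.array("0.0.0"))
    saved = path.with_suffix(".npz")
    exists = saved.exists()
    with np.load(saved) as npz:
        restored = npz["data"]
        version = str(npz["version"])
```

Storing a version string next to the arrays is what allows a loader to reject files written by an incompatible release.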

NWI Laser Logs

Synchronisation of laser parameters (ablation times and locations) with signal data. Data should be imported using the other pewlib.io modules then passed with the laser parameters file to these functions.

pewlib.io.laser.guess_delay_from_data(data, times)

Guess delay from laser firing to ICP-MS measurement.

Looks for a change of > 10% in the TIC, up to 1 second into data.

Parameters:
  • data (ndarray) – structured array of signals, flattened

  • times (ndarray) – array of times, same length as data

Return type:

float

Returns:

delay in ms
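
One way to locate such a change is sketched below. The criterion used here (the first sample-to-sample TIC difference above 10% of the maximum TIC, with times in seconds) is an assumption made for illustration, and first_jump_time is a hypothetical helper, not pewlib's algorithm.

```python
import numpy as np


def first_jump_time(tic: np.ndarray, times: np.ndarray, fraction=0.1, limit=1.0):
    """Time of the first sample-to-sample TIC change above `fraction` of the max."""
    window = times <= limit
    tic, times = tic[window], times[window]
    jumps = np.abs(np.diff(tic)) > fraction * tic.max()
    if not jumps.any():
        return 0.0
    # np.diff shortens the array by one, so index i refers to sample i + 1.
    return float(times[np.argmax(jumps) + 1])


times = np.arange(8) * 0.25  # 0.0 s to 1.75 s
tic = np.array([10.0, 10.0, 11.0, 100.0, 100.0, 100.0, 100.0, 100.0])
delay = first_jump_time(tic, times)
```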

pewlib.io.laser.sync_data_nwi_laser_log(data, times, log_file, sequence=None, delay=None, squeeze=False)

Syncs ICP-MS data collected as a single line per raster with the laser log file.

Parameters:
  • data (ndarray) – 1d ICP-MS data

  • times (ndarray | float) – array of times (s) the same size as data, or pixel acquisition time

  • log_file – log data or path to LaserLog csv

  • sequence (ndarray | int | None) – select raster(s) to import, defaults to all

  • delay (float | None) – delay in s between laser and ICP-MS, default calculates from the TIC

  • squeeze (bool) – remove any rows and columns of all NaNs

Return type:

tuple[ndarray, dict]

PerkinElmer

Import of line-by-line PerkinElmer ELAN ‘XL’ directories.

pewlib.io.perkinelmer.is_valid_directory(path)

Tests if a directory contains PerkinElmer data.

Ensures the path exists, is a directory and contains at least one ‘.xl’ file.

Return type:

bool
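
The three checks can be written with pathlib as in this sketch. has_xl_files is a hypothetical name, and matching the suffix case-insensitively is an assumption.

```python
import tempfile
from pathlib import Path


def has_xl_files(path: Path) -> bool:
    """True if path exists, is a directory and holds at least one '.xl' file."""
    if not path.exists() or not path.is_dir():
        return False
    # Case-insensitive suffix match is an assumption of this sketch.
    return any(p.suffix.lower() == ".xl" for p in path.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    empty = has_xl_files(root)
    (root / "line_001.xl").touch()
    filled = has_xl_files(root)
    missing = has_xl_files(root / "nope")
```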

pewlib.io.perkinelmer.load(path, import_parameters=True, full=False)

Loads PerkinElmer directory.

Searches the directory path for ‘.xl’ files and uses them to reconstruct the data. If import_parameters is set and a ‘parameters.conf’ is present, the scantime, speed and spotsize are also imported.

Parameters:
  • path (str | Path) – path to directory

  • import_parameters (bool) – import params from ‘parameters.conf’

  • full (bool) – also return dict with params

Return type:

ndarray | tuple[ndarray, dict]

Returns:

structured array of data; dict of params if full

See also

pewlib.io.perkinelmer.collect_datafiles()

Text Image

Import and export of text-images, files where data is stored as delimited text values. Data is read in order from the first line.

pewlib.io.textimage.load(path, delimiter=None, comments='#', name=None)

Load text-image.

Loads 2d data from file. If delimiter is not specified then all tabs and ‘;’ are converted to ‘,’ before import. If name is specified then a single field structured array is returned.

Parameters:
  • path (str | Path) – path to file

  • delimiter (str | None) – file delimiter

  • comments (str) – file comment character

  • name (str | None) – return single name field structured array

Return type:

ndarray
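
The separator conversion and the single-field wrapping can be sketched with numpy.genfromtxt. The input lines and the field name ‘Ca44’ are made up, and this only illustrates the two steps, not pewlib's code.

```python
import numpy as np

lines = [
    "# exported image",
    "1.0;2.0;3.0",
    "4.0\t5.0\t6.0",
]
# Normalise tab and ';' separators to ',' so one delimiter reads every row.
normalised = (line.replace(";", ",").replace("\t", ",") for line in lines)
image = np.genfromtxt(normalised, delimiter=",", comments="#", dtype=np.float64)

# Wrap the 2d array as a single-field structured array named 'Ca44'.
named = np.empty(image.shape, dtype=[("Ca44", np.float64)])
named["Ca44"] = image
```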

pewlib.io.textimage.save(path, data, header='')

Save data to csv.

See numpy.savetxt()

Parameters:
  • path (str | Path) – path to file

  • data (ndarray) – unstructured array

  • header (str) – file header

Return type:

None

VTK

Exports to VTK formats for use in programs such as Paraview.

pewlib.io.vtk.save(path, data, spacing)

Save data as a VTK ImageData XML.

Saves an array to a ‘.vti’ file. Data origin is set to (0, 0) and equally spaced using x, y, z of spacing. Data is raised to 3 dimensions if lower.

Parameters:
  • path (str | Path) – path to file

  • data (ndarray) – array

  • spacing (tuple[float, float, float]) – spacing of ‘.vti’

Return type:

None