IO Module
This module contains the functions pew uses for importing and exporting data.
Agilent
Import of line-by-line collected Agilent ‘.b’ batches. Both raw binaries and the ‘.csv’ exports are supported. Tested with Agilent 7500, 7700 and 8900 ICPs.
- pewlib.io.agilent.collect_datafiles(path, methods)
Finds ‘.d’ datafiles in a directory.
A list of expected datafiles is created for each method in methods. Methods are tried in order until one successfully finds all of its expected datafiles.
- Parameters:
path (str | Path) – path to directory
methods (list[str]) – list of methods to try, {‘alphabetical’, ‘acq_method_xml’, ‘batch_csv’, ‘batch_xml’}
- Return type:
list[Path]
- Returns:
A list of datafiles
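The ordered fallback described for collect_datafiles can be sketched in plain Python. This is only an illustration of the idea, not pewlib's implementation: `collect_with_fallback`, the `listers` mapping and the `exists` predicate are hypothetical names, and returning an empty list when every method fails is an assumption.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, Iterable


def collect_with_fallback(
    methods: Iterable[str],
    listers: dict[str, Callable[[], list[Path]]],
    exists: Callable[[Path], bool] = Path.exists,
) -> list[Path]:
    """Return the datafiles of the first method whose files ALL exist.

    `listers` maps a method name to a function producing the datafiles
    that method expects (hypothetical stand-ins for the real methods).
    """
    for method in methods:
        expected = listers[method]()
        # A method succeeds only if it lists something and nothing is missing.
        if expected and all(exists(p) for p in expected):
            return expected
    # Assumption: no method succeeded, so nothing is returned.
    return []
```

Methods listed earlier take priority, so a method that may list files missing from disk can be placed first and a simpler method such as ‘alphabetical’ kept as the last resort.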
- pewlib.io.agilent.load(path, collection_methods=None, use_acq_for_names=True, counts_per_second=False, drop_names=None, flatten=False)
Imports an Agilent ‘.b’ batch.
First attempts a binary import, falling back to importing any ‘.csv’ files.
- Parameters:
path (str | Path) – path to batch
collection_methods (list[str] | None) – list of datafile collection methods, default = [‘batch_xml’, ‘batch_csv’]
use_acq_for_names (bool) – read element names from ‘AcqMethod.xml’, only for csv
counts_per_second (bool) – return data in CPS, only for binary
drop_names (list[str] | None) – names to remove from final array
flatten (bool) – flatten into a single line, useful for multi image batches
- Return type:
tuple[ndarray,dict]
- Returns:
structured array of data, dict of params
- pewlib.io.agilent.load_binary(path, collection_methods=None, counts_per_second=False, drop_names=None, flatten=False)
Imports an Agilent ‘.b’ batch.
Import is performed using the ‘MSScan.bin’, ‘MSProfile.bin’ binaries and ‘MSTS_XSpecific.xml’ document. By default drop_names drops the ‘Time’ field.
- Parameters:
path (str | Path) – path to batch
collection_methods (list[str] | None) – list of datafile collection methods, default = [‘batch_xml’, ‘batch_csv’]
counts_per_second (bool) – return data in CPS
drop_names (list[str] | None) – names to remove from final array
flatten (bool) – return as a flat array
- Return type:
tuple[ndarray,dict]
- Returns:
structured array of data, dict of params
- Raises:
FileNotFoundError – ‘MSScan.bin’, ‘MSProfile.bin’ or ‘MSTS_XSpecific.xml’ not found
IOError – invalid binary format
- pewlib.io.agilent.load_csv(path, collection_methods=None, use_acq_for_names=True, drop_names=None, flatten=False)
Imports an Agilent ‘.b’ batch.
Import is performed using the ‘.csv’ files found in each ‘.d’ datafile. If a ‘.csv’ cannot be found then all data in that line is set to 0. To load properly formatted element names, set use_acq_for_names. By default drop_names drops the ‘Time_[Sec]’ field.
- Parameters:
path (str | Path) – path to batch
collection_methods (list[str] | None) – list of datafile collection methods, default = [‘batch_xml’, ‘batch_csv’]
use_acq_for_names (bool) – read element names from ‘AcqMethod.xml’
drop_names (list[str] | None) – names to remove from final array
flatten (bool) – return as a flat array
full – also return dict with scantime
- Return type:
tuple[ndarray,dict]
- Returns:
structured array of data, dict of params
- pewlib.io.agilent.load_info(path)
Reads information from a batch.
Instrument info is read from the first Devices.xml found, batch info from the BatchLog.xml. An empty dictionary is returned if neither file can be read.
- Possible keys:
Acquisition {Date,Name,Path,User}
Instrument {Type,Model,Serial,Vendor}
- Parameters:
path (str | Path) – path to batch
- Return type:
dict[str,str]
- Returns:
dict
CSV
Import of line-by-line data stored as a series of .csv files.
- class pewlib.io.csv.GenericOption(drop_names=None, kw_genfromtxt=None, regex='.*\\\\.csv', drop_nan_rows=False, drop_nan_columns=False, transposed=False)
Options for instrument specific csv imports.
Options are used by pewlib.io.csv.load() to filter and sort paths, generate data and read parameters from csvs.
- Parameters:
drop_names (list[str] | None) – columns dropped from imports
kw_genfromtxt (dict | None) – kwargs for numpy.genfromtxt
regex (str) – regex string for matching filenames
- filter(paths)
Filter non matching paths.
- Return type:
list[Path]
- readParams(data)
Read parameters from data.
- Return type:
dict
- sort(paths)
Sort paths using ‘sortkey’.
- Return type:
list[Path]
- validForPath(path)
Checks if option is valid for a file or directory.
- Return type:
bool
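As a rough illustration of how an option's regex filter and numeric ‘sortkey’ might combine, the sketch below uses only the standard library. `filter_paths` and `numeric_sortkey` are hypothetical helpers, and both matching on the file name and sorting on the last run of digits are assumptions rather than pewlib's documented behaviour.

```python
import re
from pathlib import Path


def filter_paths(paths: list[Path], regex: str = r".*\.csv") -> list[Path]:
    """Keep only paths whose file name fully matches `regex`."""
    pattern = re.compile(regex)
    return [p for p in paths if pattern.fullmatch(p.name)]


def numeric_sortkey(path: Path) -> int:
    """Sort key from the last run of digits in the stem (assumed naming)."""
    digits = re.findall(r"\d+", path.stem)
    return int(digits[-1]) if digits else -1


paths = [Path("line10.csv"), Path("notes.txt"), Path("line2.csv")]
# Numeric sorting places 'line2' before 'line10', unlike a plain string sort.
ordered = sorted(filter_paths(paths), key=numeric_sortkey)
```

A numeric key matters for line-by-line data because lexical ordering would interleave lines once the line count passes nine.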
- class pewlib.io.csv.NuOption
Option for Nu Instruments data.
- readParams(data)
Read parameters from data.
- Return type:
dict
- sortkey(path)
Sorts files numerically.
- Return type:
int
- class pewlib.io.csv.ThermoLDROption
Option for Thermo iCAP LDR data.
- readParams(data)
Read parameters from data.
- Return type:
dict
- sortkey(path)
Sorts files numerically.
- Return type:
int
- class pewlib.io.csv.TofwerkOption
Option for TOFWERK data.
- readParams(data)
Read parameters from data.
- Return type:
dict
- sortkey(path)
Sorts files using the timestamp in name.
- Return type:
float
- pewlib.io.csv.is_valid_directory(path)
Tests if a directory contains at least one csv.
- Return type:
bool
- pewlib.io.csv.load(path, option=None, full=False)
Load a directory where lines are stored in separate .csv files.
Paths are filtered and sorted according to the option used, defaulting to the value of pewlib.io.csv.option_for_path().
- Parameters:
path (str | Path) – directory
hint – type hint (NuHint, TofwerkHint)
genfromtxtkws – kwargs for numpy.genfromtxt
full (bool) – also return parameters
- Return type:
ndarray | tuple[ndarray,dict]
- Returns:
structured array of data, dict of params if full
See also
pewlib.io.csv.GenericOption
numpy.genfromtxt()
- pewlib.io.csv.option_for_path(path)
Attempts to find the correct type hint for the directory. If no specific type hint is found then a GenericOption is returned.
- Return type:
GenericOption
ImzML
Import of mass-spec imaging data in the imzML format. Each imzML file consists of an XML document (‘.imzML’) and an external binary (‘.ibd’).
- class pewlib.io.imzml.ImzML(scan_settings, mz_params, intensity_params, spectra, external_binary)
Class for storing relevant data parsed from an imzML file.
To generate from a file use pewlib.io.imzml.ImzML.from_file().
- Parameters:
scan_settings (ScanSettings) – a pewlib.io.imzml.ScanSettings
mz_params (ParamGroup) – a pewlib.io.imzml.ParamGroup for mz array
intensity_params (ParamGroup) – a pewlib.io.imzml.ParamGroup for intensities
spectra (list[Spectrum] | dict[tuple[int,int],Spectrum]) – either a list of pewlib.io.imzml.Spectrum or a dict mapping pixel positions to each pewlib.io.imzml.Spectrum
external_binary (Path | str) – path to the ‘.ibd’ binary
- binned_masses(mass_width_mz=0.1)
Summed intensities within a certain width.
Bins data across the entire mass range, with a bin width of mass_width_mz.
- Parameters:
mass_width_mz (float) – width of each bin
- Return type:
tuple[ndarray,ndarray]
- Returns:
array of bins, binned intensity data (Y, X, N)
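To make the idea of fixed-width binning concrete, here is a minimal pure-Python sketch for a single spectrum. It is not the library's code: `bin_spectrum` is a hypothetical helper, and starting the first bin at the lowest mass is an assumption about where the bin edges fall.

```python
import math


def bin_spectrum(masses, intensities, mass_width_mz=0.1):
    """Sum intensities into fixed-width m/z bins; returns (bin_starts, sums)."""
    lo = min(masses)
    # Enough bins to cover the whole mass range of this spectrum.
    n_bins = int(math.floor((max(masses) - lo) / mass_width_mz)) + 1
    sums = [0.0] * n_bins
    for mass, intensity in zip(masses, intensities):
        # Each peak is added to the bin its mass falls into.
        sums[int((mass - lo) / mass_width_mz)] += intensity
    starts = [lo + k * mass_width_mz for k in range(n_bins)]
    return starts, sums
```

Applying the same bins to every pixel is what would give the (Y, X, N) layout mentioned above, with N the number of bins.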
- extract_masses(target_masses, mass_width_ppm=None, mass_width_mz=None)
Extracts image of one or more m/z.
Data within +/- 0.5 mass_width_ppm or mass_width_mz is summed.
- Parameters:
target_masses (ndarray | float) – m/z to extract
mass_width_ppm (float | None) – extraction width in ppm
mass_width_mz (float | None) – extraction width in m/z (Da)
- Return type:
ndarray
- Returns:
array of intensities, shape (Y, X, N)
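The relationship between a ppm width and the resulting m/z window can be worked through with a short sketch. `extraction_window` is a hypothetical helper, and requiring exactly one of the two widths is an assumption made here for clarity, not documented behaviour.

```python
from __future__ import annotations


def extraction_window(
    target_mz: float,
    mass_width_ppm: float | None = None,
    mass_width_mz: float | None = None,
) -> tuple[float, float]:
    """Return the (low, high) m/z limits centred on `target_mz`."""
    # Assumption: exactly one of the two widths must be given.
    if (mass_width_ppm is None) == (mass_width_mz is None):
        raise ValueError("give exactly one of mass_width_ppm or mass_width_mz")
    if mass_width_ppm is not None:
        # A ppm width is relative to the target mass: 1 ppm = target * 1e-6.
        width = target_mz * mass_width_ppm / 1e6
    else:
        width = mass_width_mz
    # Half of the width is taken on each side of the target.
    return target_mz - 0.5 * width, target_mz + 0.5 * width
```

For a target of m/z 1000 and a width of 10 ppm the full window is 0.01 Da wide, so the limits fall 0.005 Da either side of the target.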
- extract_tic()
The total-ion-chromatogram image.
Extracted from the cvParam MS:1000285 if available, otherwise the summed intensities.
- Return type:
ndarray
- Returns:
image of tic, shape (Y, X)
- classmethod from_etree(et, external_binary, scan_number=1)
Create an ImzML class from a pre-parsed element tree.
- Parameters:
et (ElementTree) – the element tree from parsing
external_binary (Path | str) – path to ‘.ibd’ file
scan_number (int) – scan number to import
- Raises:
ValueError – when vital parameters are missing
- Return type:
ImzML
- classmethod from_file(path, external_binary=None, use_fast_parse=False)
Create an ImzML object from a file path. If external_binary is None, the imzML path with suffix ‘.ibd’ is used.
- Parameters:
path (Path | str) – path to imzML file
external_binary (Path | str | None) – path to .ibd file
- Raises:
FileNotFoundError – if path or external_binary do not exist
- Return type:
ImzML
- mass_range()
Maximum mass range.
- Return type:
tuple[float,float]
- Returns:
lowest m/z, highest m/z
- untargeted_extraction(num=10, precision_mz=0.1, min_pixel_count=10, min_height_fraction=0.1, min_height_absolute=100.0)
Extracts the num most abundant masses for each spectrum. The precision specifies the width to use for grouping similar masses.
- Parameters:
num (int) – number of peaks per spectrum to test
precision – number of decimals for grouping m/z
min_pixel_count (int) – minimum number of pixels a mass must occur in
min_height_fraction (float) – minimum peak height as fraction of maximum image signal
min_height_absolute (float) – minimum peak height in counts
- Return type:
tuple[ndarray,ndarray]
- Returns:
array of (average) masses len N, image of size (Y, X, N)
- class pewlib.io.imzml.ParamGroup(id, dtype, compressed=False, external=False)
Stores imzML referenceableParamGroup info.
Generate from an imzML <referenceableParamGroup> using pewlib.io.imzml.ParamGroup.from_xml_element(). Only un-compressed, external data is supported.
- Parameters:
id (str) – id of the group, e.g. ‘mzArray’, ‘intensities’
dtype (type) – type of data referenced, e.g. np.float32
compressed (bool) – is the data compressed
external (bool) – is data external
- classmethod from_xml_element(element)
Generate ParamGroup from an imzML <referenceableParamGroup> element.
Attempts to read the id, dtype, compression and if data is external.
- Parameters:
element (Element) – the <referenceableParamGroup> element
- Raises:
ValueError – when id or type not found
NotImplementedError – when data is compressed
- Return type:
ParamGroup
- class pewlib.io.imzml.ScanSettings(image_size, pixel_size)
Stores imzML scan settings.
Generate from a <scanSettings> imzML element using pewlib.io.imzml.ScanSettings.from_xml_element().
- Parameters:
image_size (tuple[int,int] | None) – size in pixels (x, y), or None
pixel_size (tuple[float,float]) – pixel size in μm (x, y)
- classmethod from_xml_element(element)
Generate ScanSettings from an imzML <scanSettings> element.
Attempts to read the image and pixel size from the imzML. Image size is absent in some imzML files.
- Parameters:
element (Element) – the <scanSettings> element
- Raises:
ValueError – when pixel size not found
- Return type:
ScanSettings
- class pewlib.io.imzml.Spectrum(pos, tic, offsets, lengths)
Stores an imzML spectrum info.
Generate from a <spectrum> imzML element using pewlib.io.imzml.Spectrum.from_xml_element().
- Parameters:
pos (tuple[int,int]) – pixel pos (x, y)
tic (float | None) – optional total-ion-chromatogram value
offsets (dict[str,int]) – dict of {pewlib.io.imzml.ParamGroup.id: external data offset in bytes}
lengths (dict[str,int]) – dict of {pewlib.io.imzml.ParamGroup.id: external data length in bytes}
- classmethod from_xml_element(element, scan_number=1)
Generate Spectrum from an imzML <spectrum> element.
Attempts to read the pos, tic and external data byte offsets and lengths.
- Parameters:
element (Element) – the <spectrum> element
- Raises:
ValueError – when pos, offset or length are not found
- Return type:
Spectrum
- get_binary_data(reference_id, dtype, external_binary=None)
Reads data from external binary.
For faster access keep a BufferedReader active for all Spectrum reads, limiting the number of times the file is opened.
- Parameters:
reference_id (str) – the pewlib.io.imzml.ParamGroup.id to read
dtype (type) – data type, e.g. np.float32
external_binary (Path | BufferedReader | None) – path or file handle to .ibd
- Return type:
ndarray
- Returns:
array of data
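The advice to keep one reader open across spectra can be illustrated with the standard library alone. The sketch below is not pewlib code: `read_floats` is a hypothetical helper, an in-memory `io.BytesIO` stands in for the ‘.ibd’ file, and little-endian float32 data is assumed.

```python
import io
import struct
from typing import BinaryIO


def read_floats(fp: BinaryIO, offset: int, length: int) -> tuple[float, ...]:
    """Read `length` bytes at `offset`, decoded as little-endian float32."""
    fp.seek(offset)
    raw = fp.read(length)
    count = length // 4  # 4 bytes per float32 value
    return struct.unpack(f"<{count}f", raw)


# In-memory stand-in for an '.ibd' file: 16 header bytes, then two spectra.
ibd = io.BytesIO(
    bytes(16) + struct.pack("<3f", 1.0, 2.0, 3.0) + struct.pack("<2f", 4.5, 5.5)
)

# One handle is reused for every read instead of reopening the file each time.
first = read_floats(ibd, offset=16, length=12)
second = read_floats(ibd, offset=28, length=8)
```

With a real file the same pattern would wrap all spectrum reads in a single `open(path, "rb")` block, which avoids the cost of opening the binary once per pixel.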
- pewlib.io.imzml.fast_parse_imzml(imzml, external_binary, callback=None)
Custom non-xml parser for imzML files.
Faster than etree.ElementTree.parse but less reliable. The current file position is reported at each <spectrum> import via the optional callback function. If callback returns False, the import is cancelled and a UserWarning raised.
- Parameters:
imzml (Path | str) – path to xml
external_binary (Path | str) – path to the .ibd
callback (Callable[[int],bool] | None) – optional callback function for progress
- Return type:
ImzML
- Returns:
ImzML class
- Raises:
UserWarning – when callback returns False
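The callback contract (called with the current file position, cancels when it returns False) can be mimicked with a small stand-in loop. Neither `parse_with_progress` nor `stop_after` exists in pewlib; both are hypothetical and only show how a caller might write and use such a callback.

```python
from typing import Callable, Iterable, Optional


def parse_with_progress(
    positions: Iterable[int],
    callback: Optional[Callable[[int], bool]] = None,
) -> int:
    """Visit each (hypothetical) spectrum position, reporting progress."""
    count = 0
    for pos in positions:
        # Cancel as soon as the callback explicitly returns False.
        if callback is not None and callback(pos) is False:
            raise UserWarning("import cancelled by callback")
        count += 1
    return count


def stop_after(limit: int) -> Callable[[int], bool]:
    """Build a callback that cancels once the file position passes `limit`."""
    return lambda pos: pos <= limit
```

A caller would typically wrap the import in `try`/`except UserWarning` so that a cancelled import can be handled without treating it as a failure.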
- pewlib.io.imzml.load(imzml, external_binary, target_masses, mass_width_ppm=10.0)
Load data from an imzML.
- Parameters:
imzml (Path | str | ImzML) – path to imzML, or pre-parsed tree
external_binary (Path | str) – path to binary data, usually .ibd file
target_masses (float | ndarray) – masses to import
mass_width_ppm (float) – width of imported regions
- Return type:
tuple[ndarray,dict]
- Returns:
image data, dict of parameters
Iolite Laser Logs
Synchronisation of laser parameters (ablation times and locations) with signal data. Data should be imported using the other pewlib.io modules then passed with the laser parameters file to these functions.
- pewlib.io.laser.guess_delay_from_data(data, times)
Guess delay from laser firing to ICP-MS measurement.
Looks for a change of > 10% in the TIC, up to 1 second into data.
- Parameters:
data (ndarray) – structured array of signals, flattened
times (ndarray) – array of times, same length as data
- Return type:
float
- Returns:
delay in ms
- pewlib.io.laser.read_iolite_laser_log(log_path, log_style='raw')
Reads an Iolite style log. Different vendors have slightly different styles of log, so passing ‘log_style’ is recommended to reduce the log to only laser start and end events. Currently NWL ActiveView2 and Teledyne Chromium2 are supported. Passing ‘raw’ as a style will prevent processing.
- Parameters:
log_path (Path | str) – path to Iolite log
log_style (str) – style of log (‘activeview2’, ‘chromium2’, ‘raw’)
- Return type:
ndarray
- Returns:
log as a numpy array, trimmed to useful lines
- pewlib.io.laser.sync_data_with_laser_log(data, times, log, sequence=None, delay=None, squeeze=False)
Syncs ICP-MS data collected as a single line per raster with the laser log file. Times in the log are modified to start at 0.
- Parameters:
data (ndarray) – 1d ICP-MS data
times (ndarray | float) – array of times (s) the same size as data, or pixel acquisition time
log (ndarray) – log data or path to LaserLog csv
sequence (ndarray | int | None) – select raster(s) to import, defaults to all
delay (float | None) – delay in s between laser and ICP-MS, default calculates from the TIC
squeeze (bool) – remove any rows and columns of all NaNs
- Return type:
tuple[ndarray,dict]
Nu Instruments
Nu Instruments data import.
- pewlib.io.nu.apply_autoblanking(autob_events, signals, masses, num_acc, start_coef, end_coef)
Apply the auto-blanking to the integrated data. There must be one cycle / segment and no missing acquisitions / data!
- Parameters:
autob – list of events from read_nu_autob_binary
signals (ndarray) – 2d array of signals from get_signals_from_nu_data
masses (ndarray) – 1d array of masses, from get_masses_from_nu_data
num_acc (int) – number of accumulations per acquisition
start_coef (tuple[float,float]) – blanker open coefs ‘BlMassCalStartCoef’
end_coef (tuple[float,float]) – blanker close coefs ‘BlMassCalEndCoef’
- Return type:
ndarray
- Returns:
blanked data
- pewlib.io.nu.apply_trigger_correction(times, corrections)
Return times with trigger time removed.
- Parameters:
times (ndarray) – times in seconds
corrections (dict) – corrections from TriggerCorrections.dat
- Return type:
ndarray
- Returns:
corrected times
- pewlib.io.nu.blanking_regions_from_autob(autob_events, num_acc, start_coef, end_coef)
Extract blanking regions from autoblank data.
- Parameters:
autob – list of events from read_nu_autob_binary
num_acc (int) – number of accumulations per acquisition
start_coef (tuple[float,float]) – blanker open coefs ‘BlMassCalStartCoef’
end_coef (tuple[float,float]) – blanker close coefs ‘BlMassCalEndCoef’
- Return type:
tuple[list[tuple[int,int]],list[ndarray]]
- Returns:
list of (start, end) of each region, array of (start, end) masses
- pewlib.io.nu.eventtime_from_info(info)
Reads the dwelltime (total acquisition time) from run.info. Rounds to the nearest ns.
- Parameters:
info (dict) – dict of parameters, as returned by read_nu_directory
- Return type:
float
- Returns:
dwelltime in s
- pewlib.io.nu.is_nu_acquisition_directory(path)
Checks that path is a directory containing a ‘run.info’ and an ‘integrated.index’.
- Return type:
bool
- pewlib.io.nu.is_nu_image_directory(path)
Checks if directory has a ‘laser.info’ file and some acquisitions.
- Return type:
bool
- pewlib.io.nu.masses_from_integ(integ, cal_coef, segment_delays)
Converts Nu peak centers into masses.
- Parameters:
integ (ndarray) – from read_integ_binary
cal_coef (tuple[float,float]) – from run.info ‘MassCalCoefficients’
segment_delays (dict[int,float]) – dict of segment nums and delays from SegmentInfo
- Return type:
ndarray
- Returns:
2d array of masses
- pewlib.io.nu.read_binaries_in_index(root, index, binary_ext, binary_read_fn, cyc_number=None, seg_number=None)
Reads Nu binaries listed in an index file.
- Parameters:
root (Path) – directory containing files and index
index (list[dict]) – list of indices from json.loads
binary_ext (str) – extension of binary files, e.g. ‘.integ’
binary_read – function to read binary file
cyc_number (int | None) – restrict to cycle
seg_number (int | None) – restrict to segments
- Return type:
list[ndarray]
- Returns:
binary data as a list of arrays
- pewlib.io.nu.read_laser_acquisition(path, autoblank=True, cycle=None, segment=None, raw=False, max_integs=None)
Read the Nu Instruments raw data directory, returning data and run info.
Directory must contain ‘run.info’, ‘integrated.index’ and at least one ‘.integ’ file. Data is read from ‘.integ’ files listed in the ‘integrated.index’ and is checked for correct starting cycle, segment and acquisition numbers.
- Parameters:
path (str | Path) – path to data directory
max_integ_files – maximum number of files to read
autoblank (bool) – apply autoblanking to overrange regions
cycle (int | None) – limit import to cycle
segment (int | None) – limit import to segment
raw (bool) – return raw ADC counts
max_integs (int | None) – only read the first n integ files
- Return type:
tuple[ndarray,ndarray,ndarray,ndarray,dict]
- Returns:
signals in counts, masses from first acquisition, times in s, laser pulse data in s, dict of parameters from run.info
- pewlib.io.nu.read_laser_image(path)
Read a laser image from a Nu Vitesse ICP-TOF-MS.
Calls read_laser_acquisition on valid sub directories and concatenates.
- Parameters:
path (Path | str) – path to the Image directory
- Return type:
tuple[ndarray,ndarray,ndarray,ndarray,dict]
- Returns:
signals, masses, times, pulses, laser_info
Numpy NPZ
Import and export in pew’s custom file format, based on numpy’s compressed ‘.npz’. This format saves image data, laser parameters and calibrations in one file.
- pewlib.io.npz.load(path)
Loads data from ‘.npz’ file.
Loads files created using pewlib.io.npz.save(). On load, the Laser or SRRLaser is re-formed from the saved data.
- Parameters:
path (str | Path) – path to ‘.npz’
- Return type:
Laser | SRRLaser
- Returns:
Laser or SRRLaser
- Raises:
ValueError – incompatible version
See also
numpy.load()
- pewlib.io.npz.save(path, laser)
Saves data to ‘.npz’ file.
Converts a Laser or SRRLaser to a series of np.ndarray which are then saved to a compressed ‘.npz’ archive. The time and current version are also saved. If path does not end in ‘.npz’ it is appended.
- Parameters:
path (str | Path) – path to save to
laser (Laser | SRRLaser) – Laser or SRRLaser
- Return type:
None
See also
numpy.savez_compressed()
PerkinElmer
Import of line-by-line PerkinElmer ELAN ‘XL’ directories.
- pewlib.io.perkinelmer.is_valid_directory(path)
Tests if a directory contains PerkinElmer data.
Ensures the path exists, is a directory and contains at least one ‘.xl’ file.
- Return type:
bool
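The three checks named above (path exists, is a directory, holds at least one ‘.xl’ file) map directly onto `pathlib`. The function below is a hypothetical stand-in written for illustration, not the library's code, and matching the suffix case-insensitively is an assumption.

```python
import tempfile
from pathlib import Path


def looks_like_xl_directory(path: Path) -> bool:
    """True if `path` is an existing directory with at least one '.xl' file."""
    if not path.is_dir():  # False for missing paths and for plain files
        return False
    # Assumption: the suffix is compared case-insensitively.
    return any(p.suffix.lower() == ".xl" for p in path.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    empty = looks_like_xl_directory(root)  # no '.xl' files yet
    (root / "line_001.xl").touch()
    filled = looks_like_xl_directory(root)
    missing = looks_like_xl_directory(root / "does_not_exist")
```

Because `Path.is_dir()` already returns False for a path that does not exist, the existence and directory checks collapse into a single call.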
- pewlib.io.perkinelmer.load(path, import_parameters=True, full=False)
Loads PerkinElmer directory.
Searches the directory path for ‘.xl’ files and uses them to reconstruct data. If import_parameters is set and a ‘parameters.conf’ is found then the scantime, speed and spotsize can be imported.
- Parameters:
path (str | Path) – path to directory
import_parameters (bool) – import params from ‘parameters.conf’
full (bool) – also return dict with params
- Return type:
ndarray | tuple[ndarray,dict]
- Returns:
structured array of data, dict of params if full
See also
pewlib.io.perkinelmer.collect_datafiles()
Text Image
Import and export of text-images, files where data is stored as delimited text values. Data is read in order from the first line.
- pewlib.io.textimage.load(path, delimiter=None, comments='#', name=None)
Load text-image.
Loads 2d data from file. If delimiter is specified then all tabs and ‘;’ are converted to ‘,’ before import. If name is specified then a single field structured array is returned.
- Parameters:
path (str | Path) – path to file
delimiter (str | None) – file delimiter
comments (str) – file comment character
name (str | None) – return single name field structured array
- Return type:
ndarray
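The delimiter conversion mentioned above can be pictured with a small standard-library sketch. `parse_text_image` is a hypothetical helper that returns nested lists rather than an ndarray, and it applies the conversion unconditionally, which may differ from when the library applies it.

```python
def parse_text_image(text: str, comments: str = "#") -> list[list[float]]:
    """Parse delimited text into rows of floats, first line first."""
    rows = []
    for line in text.splitlines():
        # Drop anything after the comment character, then surrounding space.
        line = line.split(comments, 1)[0].strip()
        if not line:
            continue
        # Tabs and ';' are converted to ',' so one delimiter covers all cases.
        line = line.replace("\t", ",").replace(";", ",")
        rows.append([float(value) for value in line.split(",")])
    return rows
```

Normalising to a single delimiter first means files exported with tabs, semicolons or commas can all go through the same parsing step.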
- pewlib.io.textimage.save(path, data, header='')
Save data to csv.
See numpy.savetxt()
- Parameters:
path (str | Path) – path to file
data (ndarray) – unstructured array
header (str) – file header
- Return type:
None
VTK
Exports to VTK formats for use in programs such as Paraview.
- pewlib.io.vtk.save(path, data, spacing)
Save data as a VTK ImageData XML.
Saves an array to a ‘.vti’ file. Data origin is set to (0, 0) and points are equally spaced using the x, y, z of spacing. Data is raised to 3 dimensions if lower.
- Parameters:
path (str | Path) – path to file
data (ndarray) – array
spacing (tuple[float,float,float]) – spacing of ‘.vti’
- Return type:
None