IO Module
This module contains the functions pew uses for importing and exporting data.
Agilent
Import of line-by-line collected Agilent ‘.b’ batches. Both raw binaries and the ‘.csv’ exports are supported. Tested with Agilent 7500, 7700 and 8900 ICPs.
- pewlib.io.agilent.collect_datafiles(path, methods)
Finds ‘.d’ datafiles in a directory.
A list of expected datafiles is created for each method in methods. Methods are tested in order until one successfully finds ALL expected datafiles.
- Parameters:
path (str | Path) – path to directory
methods (list[str]) – list of methods to try, {‘alphabetical’, ‘acq_method_xml’, ‘batch_csv’, ‘batch_xml’}
- Return type:
list[Path]
- Returns:
A list of datafiles
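The try-in-order behaviour described above can be sketched with the standard library alone. This is an illustration of the idea, not pewlib's implementation: the two collection functions, the ‘listing.txt’ file and `collect_with_fallback` are all hypothetical names invented for the example.

```python
import tempfile
from pathlib import Path


def expected_alphabetical(batch):
    """Hypothetical method: expect every '.d' entry, in alphabetical order."""
    return sorted(batch.glob("*.d"))


def expected_from_listing(batch):
    """Hypothetical method: expect the names in an assumed 'listing.txt'."""
    listing = batch / "listing.txt"
    if not listing.exists():
        return []
    return [batch / name for name in listing.read_text().split()]


def collect_with_fallback(batch, methods):
    """Return the datafiles of the first method whose files ALL exist."""
    for method in methods:
        datafiles = method(batch)
        if datafiles and all(p.exists() for p in datafiles):
            return datafiles
    return []


with tempfile.TemporaryDirectory() as tmp:
    batch = Path(tmp)
    for name in ("002.d", "001.d"):
        (batch / name).mkdir()
    # The listing names a datafile that is missing, so that method is rejected
    (batch / "listing.txt").write_text("001.d\n002.d\n003.d\n")
    found = collect_with_fallback(
        batch, [expected_from_listing, expected_alphabetical]
    )
    found_names = [p.name for p in found]
```

Here the listing-based method fails because ‘003.d’ does not exist, so the alphabetical method supplies the result.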
- pewlib.io.agilent.load(path, collection_methods=None, use_acq_for_names=True, counts_per_second=False, drop_names=None, full=False)
Imports an Agilent ‘.b’ batch.
First attempts a binary import, falling back to importing any ‘.csv’ files.
- Parameters:
path (str | Path) – path to batch
collection_methods (list[str] | None) – list of datafile collection methods, default = [‘batch_xml’, ‘batch_csv’]
use_acq_for_names (bool) – read element names from ‘AcqMethod.xml’, only for csv
counts_per_second (bool) – return data in CPS, only for binary
drop_names (list[str] | None) – names to remove from final array
full (bool) – also return dict with scantime
- Return type:
ndarray | tuple[ndarray, dict]
- Returns:
structured array of data; dict of params if full
- pewlib.io.agilent.load_binary(path, collection_methods=None, counts_per_second=False, drop_names=None, full=False)
Imports an Agilent ‘.b’ batch.
Import is performed using the ‘MSScan.bin’, ‘MSProfile.bin’ binaries and ‘MSTS_XSpecific.xml’ document. By default drop_names drops the ‘Time’ field.
- Parameters:
path (str | Path) – path to batch
collection_methods (list[str] | None) – list of datafile collection methods, default = [‘batch_xml’, ‘batch_csv’]
counts_per_second (bool) – return data in CPS
drop_names (list[str] | None) – names to remove from final array
full (bool) – also return dict with scantime
- Return type:
ndarray | tuple[ndarray, dict]
- Returns:
structured array of data; dict of params if full
- Raises:
FileNotFoundError – ‘MSScan.bin’, ‘MSProfile.bin’ or ‘MSTS_XSpecific.xml’ not found
IOError – invalid binary format
- pewlib.io.agilent.load_csv(path, collection_methods=None, use_acq_for_names=True, drop_names=None, full=False)
Imports an Agilent ‘.b’ batch.
Import is performed using the ‘.csv’ files found in each ‘.d’ datafile. If a ‘.csv’ can not be found then all data in the line is set to 0. To load properly formatted element names use use_acq_for_names. By default drop_names drops the ‘Time_[Sec]’ field.
- Parameters:
path (str | Path) – path to batch
collection_methods (list[str] | None) – list of datafile collection methods, default = [‘batch_xml’, ‘batch_csv’]
use_acq_for_names (bool) – read element names from ‘AcqMethod.xml’
drop_names (list[str] | None) – names to remove from final array
full (bool) – also return dict with scantime
- Return type:
ndarray | tuple[ndarray, dict]
- Returns:
structured array of data; dict of params if full
- pewlib.io.agilent.load_info(path)
Reads information from a batch.
Instrument info is read from the first Devices.xml found, batch info from the BatchLog.xml. An empty dictionary is returned if neither file can be read.
- Possible keys:
Acquisition {Date,Name,Path,User}
Instrument {Type,Model,Serial,Vendor}
- Parameters:
path (str | Path) – path to batch
- Return type:
dict[str, str]
- Returns:
dict
CSV
Import of line-by-line data stored as a series of .csv files.
- class pewlib.io.csv.GenericOption(drop_names=None, kw_genfromtxt=None, regex='.*\\\\.csv', drop_nan_rows=False, drop_nan_columns=False, transposed=False)
Options for instrument specific csv imports.
Options are used by pewlib.io.csv.load() to filter and sort paths, generate data and read parameters from csvs.
- Parameters:
drop_names (list[str] | None) – columns dropped from imports
kw_genfromtxt (dict | None) – kwargs for numpy.genfromtxt
regex (str) – regex string for matching filenames
- filter(paths)
Filter non matching paths.
- Return type:
list[Path]
- readParams(data)
Read parameters from data.
- Return type:
dict
- sort(paths)
Sort paths using ‘sortkey’.
- Return type:
list[Path]
- validForPath(path)
Checks if option is valid for a file or directory.
- Return type:
bool
- class pewlib.io.csv.NuOption
Option for Nu Instruments data.
- readParams(data)
Read parameters from data.
- Return type:
dict
- sortkey(path)
Sorts files numerically.
- Return type:
int
- class pewlib.io.csv.ThermoLDROption
Option for Thermo iCAP LDR data.
- readParams(data)
Read parameters from data.
- Return type:
dict
- sortkey(path)
Sorts files numerically.
- Return type:
int
- class pewlib.io.csv.TofwerkOption
Option for TOFWERK data.
- readParams(data)
Read parameters from data.
- Return type:
dict
- sortkey(path)
Sorts files using the timestamp in name.
- Return type:
float
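The filter and sort steps of the option classes above can be pictured as a regex match on each file name followed by a keyed sort. The sketch below is an assumption about the general approach, not the library's code: `filter_paths` and `numeric_sortkey` are hypothetical helpers, and whether pewlib matches the whole name or only a prefix is not stated in this page.

```python
import re
from pathlib import Path


def filter_paths(paths, regex=r".*\.csv"):
    """Keep only paths whose file name matches the regex in full."""
    pattern = re.compile(regex)
    return [p for p in paths if pattern.fullmatch(p.name)]


def numeric_sortkey(path):
    """Hypothetical sort key: the last run of digits in the file stem."""
    digits = re.findall(r"\d+", path.stem)
    return int(digits[-1]) if digits else -1


paths = [
    Path("line_10.csv"),
    Path("line_2.csv"),
    Path("notes.txt"),
    Path("line_1.csv"),
]
ordered = sorted(filter_paths(paths), key=numeric_sortkey)
names = [p.name for p in ordered]
```

A numeric key keeps ‘line_10.csv’ after ‘line_2.csv’, which a plain alphabetical sort would not.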
- pewlib.io.csv.is_valid_directory(path)
Tests if a directory contains at least one csv.
- Return type:
bool
- pewlib.io.csv.load(path, option=None, full=False)
Load a directory where lines are stored in separate .csv files.
Paths are filtered and sorted according to the option used, defaulting to the value of pewlib.io.csv.option_for_path().
- Parameters:
path (str | Path) – directory
hint – type hint (NuHint, TofwerkHint)
genfromtxtkws – kwargs for numpy.genfromtxt
full (bool) – also return parameters
- Return type:
ndarray | tuple[ndarray, dict]
- Returns:
structured array of data; dict of params if full
See also
pewlib.io.csv.GenericOption
numpy.genfromtxt()
- pewlib.io.csv.option_for_path(path)
Attempts to find the correct type hint for the directory. If no specific type hint is found then a GenericOption is returned.
- Return type:
GenericOption
ImzML
Import of mass-spec imaging data in the imzML format. Each imzML file consists of an XML document (‘.imzML’) and an external binary (‘.ibd’).
- class pewlib.io.imzml.ImzML(scan_settings, mz_params, intensity_params, spectra, external_binary)
Class for storing relevant data parsed from an imzML file.
To generate from a file use pewlib.io.imzml.ImzML.from_file().
- Parameters:
scan_settings (ScanSettings) – a pewlib.io.imzml.ScanSettings
mz_params (ParamGroup) – a pewlib.io.imzml.ParamGroup for mz array
intensity_params (ParamGroup) – a pewlib.io.imzml.ParamGroup for intensities
spectra (list[Spectrum] | dict[tuple[int, int], Spectrum]) – either a list of pewlib.io.imzml.Spectrum or a dict mapping pixel positions to each pewlib.io.imzml.Spectrum
external_binary (Path | str) – path to the ‘.ibd’ binary
- binned_masses(mass_width_mz=0.1)
Summed intensities within a certain width.
Bins data across the entire mass range, with a bin width of mass_width_mz.
- Parameters:
mass_width_mz (float) – width of each bin
- Return type:
tuple[ndarray, ndarray]
- Returns:
array of bins, binned intensity data (Y, X, N)
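As a rough picture of equal-width binning, the sketch below sums the intensities of a single spectrum into bins of a fixed m/z width. It is a plain-Python illustration under stated assumptions: `bin_spectrum` is a hypothetical helper, it handles one spectrum rather than a (Y, X, N) image, and the edge handling (the top of the range falls in the last bin) is a choice made for the example.

```python
import math


def bin_spectrum(mz, intensities, mass_range, width=0.1):
    """Sum intensities into equal-width bins spanning mass_range."""
    low, high = mass_range
    nbins = max(1, math.ceil((high - low) / width))
    edges = [low + i * width for i in range(nbins + 1)]
    sums = [0.0] * nbins
    for m, value in zip(mz, intensities):
        if low <= m <= high:
            # Clamp so that m == high falls in the last bin
            idx = min(int((m - low) / width), nbins - 1)
            sums[idx] += value
    return edges, sums


edges, sums = bin_spectrum(
    mz=[100.1, 100.4, 101.6, 102.0],
    intensities=[1.0, 2.0, 4.0, 8.0],
    mass_range=(100.0, 102.0),
    width=0.5,
)
```

The first two peaks share the 100.0 to 100.5 bin, and the last two share the 101.5 to 102.0 bin.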
- extract_masses(target_masses, mass_width_ppm=None, mass_width_mz=None)
Extracts image of one or more m/z.
Data within +/- 0.5 mass_width_ppm or mass_width_mz is summed.
- Parameters:
target_masses (ndarray | float) – m/z to extract
mass_width_ppm (float | None) – extraction width in ppm
mass_width_mz (float | None) – extraction width in m/z (Da)
- Return type:
ndarray
- Returns:
array of intensities, shape (Y, X, N)
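The ± half-width window can be worked through for a single target mass. The sketch assumes a ppm width is converted to m/z as target × ppm × 1e-6 and that the window bounds are inclusive; `extraction_window` and `extract_intensity` are hypothetical helpers, and raising when neither width is given is an assumption of this example only.

```python
def extraction_window(target_mz, mass_width_ppm=None, mass_width_mz=None):
    """Return (low, high) m/z bounds centred on target_mz."""
    if mass_width_ppm is not None:
        # A ppm width scales with the target mass
        width = target_mz * mass_width_ppm * 1e-6
    elif mass_width_mz is not None:
        width = mass_width_mz
    else:
        raise ValueError("one of mass_width_ppm or mass_width_mz is required")
    return target_mz - width / 2.0, target_mz + width / 2.0


def extract_intensity(mz, intensities, target_mz, **width):
    """Sum the intensities that fall inside the extraction window."""
    low, high = extraction_window(target_mz, **width)
    return sum(v for m, v in zip(mz, intensities) if low <= m <= high)


low, high = extraction_window(500.0, mass_width_ppm=10.0)
total = extract_intensity(
    [499.9970, 499.9990, 500.0020, 500.0040],
    [1.0, 2.0, 4.0, 8.0],
    500.0,
    mass_width_ppm=10.0,
)
```

At m/z 500 a width of 10 ppm is 0.005, so the window runs from about 499.9975 to 500.0025 and only the two middle peaks are summed.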
- extract_tic()
The total-ion-chromatogram image.
Extracted from the cvParam MS:1000285 if available, otherwise the summed intensities.
- Return type:
ndarray
- Returns:
image of tic, shape (Y, X)
- classmethod from_etree(et, external_binary, scan_number=1)
Create an ImzML class from a pre-parsed element tree.
- Parameters:
et (ElementTree) – the element tree from parsing
external_binary (Path | str) – path to ‘.ibd’ file
scan_number (int) – scan number to import
- Raises:
ValueError – when vital parameters are missing
- Return type:
ImzML
- classmethod from_file(path, external_binary=None, use_fast_parse=False)
Create an ImzML object from a file path. If external_binary is None, the imzML path with suffix ‘.ibd’ is used.
- Parameters:
path (Path | str) – path to imzML file
external_binary (Path | str | None) – path to .ibd file
- Raises:
FileNotFoundError – if path or external_binary do not exist
- Return type:
ImzML
- mass_range()
Maximum mass range.
- Return type:
tuple[float, float]
- Returns:
lowest m/z, highest m/z
- untargeted_extraction(num=10, precision_mz=0.1, min_pixel_count=10, min_height_fraction=0.1, min_height_absolute=100.0)
Extracts the num most abundant masses for each spectrum. The precision specifies the width to use for grouping similar masses.
- Parameters:
num (int) – number of peaks per spectrum to test
precision – number of decimals for grouping m/z
min_pixel_count (int) – minimum number of pixels a mass must occur in
min_height_fraction (float) – minimum peak height as fraction of maximum image signal
min_height_absolute (float) – minimum peak height in counts
- Return type:
tuple[ndarray, ndarray]
- Returns:
array of (average) masses len N, image of size (Y, X, N)
- class pewlib.io.imzml.ParamGroup(id, dtype, compressed=False, external=False)
Stores imzML referenceableParamGroup info.
Generate from an imzML <referenceableParamGroup> using pewlib.io.imzml.ParamGroup.from_xml_element(). Only un-compressed, external data is supported.
- Parameters:
id (str) – id of the group, e.g. ‘mzArray’, ‘intensities’
dtype (type) – type of data referenced, e.g. np.float32
compressed (bool) – is the data compressed
external (bool) – is data external
- classmethod from_xml_element(element)
Generate ParamGroup from an imzML <referenceableParamGroup> element.
Attempts to read the id, dtype, compression and if data is external.
- Parameters:
element (Element) – the <referenceableParamGroup> element
- Raises:
ValueError – when id or type not found
NotImplementedError – when data is compressed
- Return type:
ParamGroup
- class pewlib.io.imzml.ScanSettings(image_size, pixel_size)
Stores imzML scan settings.
Generate from a <scanSettings> imzML element using pewlib.io.imzml.ScanSettings.from_xml_element().
- Parameters:
image_size (tuple[int, int] | None) – size in pixels (x, y), or None
pixel_size (tuple[float, float]) – pixel size in μm (x, y)
- classmethod from_xml_element(element)
Generate ScanSettings from an imzML <scanSettings> element.
Attempts to read the image and pixel size from the imzML. Image size is absent in some imzML files.
- Parameters:
element (Element) – the <scanSettings> element
- Raises:
ValueError – when pixel size not found
- Return type:
ScanSettings
- class pewlib.io.imzml.Spectrum(pos, tic, offsets, lengths)
Stores an imzML spectrum info.
Generate from a <spectrum> imzML element using pewlib.io.imzml.Spectrum.from_xml_element().
- Parameters:
pos (tuple[int, int]) – pixel pos (x, y)
tic (float | None) – optional total-ion-chromatogram value
offsets (dict[str, int]) – dict of {pewlib.io.imzml.ParamGroup.id: external data offset in bytes}
lengths (dict[str, int]) – dict of {pewlib.io.imzml.ParamGroup.id: external data length in bytes}
- classmethod from_xml_element(element, scan_number=1)
Generate Spectrum from an imzML <spectrum> element.
Attempts to read the pos, tic and external data byte offsets and lengths.
- Parameters:
element (Element) – the <spectrum> element
- Raises:
ValueError – when pos, offset or length are not found
- Return type:
Spectrum
- get_binary_data(reference_id, dtype, external_binary=None)
Reads data from external binary.
For faster access keep a BufferedReader active for all Spectrum reads, limiting the number of times the file is opened.
- Parameters:
reference_id (str) – the pewlib.io.imzml.ParamGroup.id to read.
dtype (type) – data type, e.g. np.float32
external_binary (Path | BufferedReader | None) – path or file handle to .ibd
- Return type:
ndarray
- Returns:
array of data
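Reading with byte offsets and lengths from one shared file handle can be sketched with `struct`. This is a minimal illustration, not the library's reader: `read_floats` is a hypothetical helper, the data is assumed to be little-endian float32, and the file layout (a 16 byte placeholder followed by two arrays) is invented for the example.

```python
import struct
import tempfile
from pathlib import Path


def read_floats(handle, offset, length):
    """Read `length` bytes of little-endian float32 starting at `offset`."""
    handle.seek(offset)
    raw = handle.read(length)
    return list(struct.unpack(f"<{length // 4}f", raw))


with tempfile.TemporaryDirectory() as tmp:
    binary = Path(tmp) / "example.ibd"
    # 16 byte placeholder header followed by two float32 arrays
    binary.write_bytes(
        bytes(16)
        + struct.pack("<3f", 1.0, 2.0, 3.0)
        + struct.pack("<2f", 0.5, 0.25)
    )
    offsets = {"mzArray": 16, "intensities": 28}
    lengths = {"mzArray": 12, "intensities": 8}
    # One buffered handle is opened once and shared by every read
    with binary.open("rb") as handle:
        mz = read_floats(handle, offsets["mzArray"], lengths["mzArray"])
        intensities = read_floats(
            handle, offsets["intensities"], lengths["intensities"]
        )
```

Keeping the handle open across reads avoids reopening the file for every spectrum, which is the access pattern recommended above.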
- pewlib.io.imzml.fast_parse_imzml(imzml, external_binary, callback=None)
Custom non-xml parser for imzML files.
Faster than etree.ElementTree.parse but less reliable. The current file position is reported at each <spectrum> import via the optional callback function. If callback returns False, the import is cancelled and a UserWarning raised.
- Parameters:
imzml (Path | str) – path to xml
external_binary (Path | str) – path to the .ibd
callback (Callable[[int], bool] | None) – optional callback function for progress
- Return type:
ImzML
- Returns:
ImzML class
- Raises:
UserWarning – when callback returns False
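The callback contract (report progress, cancel on False) follows a common pattern, sketched below with a stand-in parser. Nothing here is pewlib code: `parse_with_progress` and `stop_after_two` are hypothetical, and the position passed to the callback is an item index rather than a file position.

```python
def parse_with_progress(items, callback=None):
    """Process items, reporting the position to callback after each one."""
    parsed = []
    for position, item in enumerate(items):
        parsed.append(item.upper())  # stand-in for parsing one <spectrum>
        # A falsy return value from the callback cancels the import
        if callback is not None and not callback(position):
            raise UserWarning("parsing cancelled by callback")
    return parsed


def stop_after_two(position):
    """Allow the first item, then ask for cancellation at the second."""
    return position < 1


try:
    parse_with_progress(["a", "b", "c"], callback=stop_after_two)
    cancelled = False
except UserWarning:
    cancelled = True

completed = parse_with_progress(["a", "b", "c"], callback=lambda position: True)
```

A caller that wants a cancellable import would wrap the call in `try`/`except UserWarning`, as in the first call here.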
- pewlib.io.imzml.load(imzml, external_binary, target_masses, mass_width_ppm=10.0)
Load data from an imzML.
- Parameters:
imzml (Path | str | ImzML) – path to imzML, or pre-parsed tree
external_binary (Path | str) – path to binary data, usually .ibd file
target_masses (float | ndarray) – masses to import
mass_width_ppm (float) – width of imported regions
- Return type:
tuple[ndarray, dict]
- Returns:
image data, dict of parameters
Numpy NPZ
Import and export in pew’s custom file format, based on numpy’s compressed ‘.npz’. This format saves image data, laser parameters and calibrations in one file.
- pewlib.io.npz.load(path)
Loads data from ‘.npz’ file.
Loads files created using pewlib.io.npz.save(). On load the Laser or SRRLaser is reformed from the saved data.
- Parameters:
path (str | Path) – path to ‘.npz’
- Return type:
Laser | SRRLaser
- Returns:
Laser or SRRLaser
- Raises:
ValueError – incompatible version
See also
numpy.load()
- pewlib.io.npz.save(path, laser)
Saves data to ‘.npz’ file.
Converts a Laser or SRRLaser to a series of np.ndarray which are then saved to a compressed ‘.npz’ archive. The time and current version are also saved. If path does not end in ‘.npz’ it is appended.
- Parameters:
path (str | Path) – path to save to
laser (Laser | SRRLaser) – Laser or SRRLaser
- Return type:
None
See also
numpy.savez_compressed()
NWI Laser Logs
Synchronisation of laser parameters (ablation times and locations) with signal data. Data should be imported using the other pewlib.io modules then passed with the laser parameters file to these functions.
- pewlib.io.laser.guess_delay_from_data(data, times)
Guess delay from laser firing to ICP-MS measurement.
Looks for a change of > 10% in the TIC, up to 1 second into data.
- Parameters:
data (ndarray) – structured array of signals, flattened
times (ndarray) – array of times, same length as data
- Return type:
float
- Returns:
delay in ms
- pewlib.io.laser.sync_data_nwi_laser_log(data, times, log_file, sequence=None, delay=None, squeeze=False)
Syncs ICP-MS data collected as a single line per raster with the laser log file.
- Parameters:
data (ndarray) – 1d ICP-MS data
times (ndarray | float) – array of times (s) the same size as data, or pixel acquisition time
log – log data or path to LaserLog csv
sequence (ndarray | int | None) – select raster(s) to import, defaults to all
delay (float | None) – delay in s between laser and ICP-MS, default calculates from the TIC
squeeze (bool) – remove any rows and columns of all NaNs
- Return type:
tuple[ndarray, dict]
PerkinElmer
Import of line-by-line PerkinElmer ELAN ‘XL’ directories.
- pewlib.io.perkinelmer.is_valid_directory(path)
Tests if a directory contains PerkinElmer data.
Ensures the path exists, is a directory and contains at least one ‘.xl’ file.
- Return type:
bool
- pewlib.io.perkinelmer.load(path, import_parameters=True, full=False)
Loads PerkinElmer directory.
Searches the directory path for ‘.xl’ files and uses them to reconstruct data. If import_parameters is set and a ‘parameters.conf’ is found then the scantime, speed and spotsize can be imported.
- Parameters:
path (str | Path) – path to directory
import_parameters (bool) – import params from ‘parameters.conf’
full (bool) – also return dict with params
- Return type:
ndarray | tuple[ndarray, dict]
- Returns:
structured array of data; dict of params if full
See also
pewlib.io.perkinelmer.collect_datafiles()
Text Image
Import and export of text-images, files where data is stored as delimited text values. Data is read in order from the first line.
- pewlib.io.textimage.load(path, delimiter=None, comments='#', name=None)
Load text-image.
Loads 2d data from file. If delimiter is specified then all tab and ‘;’ are converted to ‘,’ before import. If name is specified then a single field structured array is returned.
- Parameters:
path (str | Path) – path to file
delimiter (str | None) – file delimiter
comments (str) – file comment character
name (str | None) – return single name field structured array
- Return type:
ndarray
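The conversion of tabs and ‘;’ to ‘,’ mentioned above can be illustrated without numpy. This sketch only shows the normalisation step on an in-memory string; `read_text_image` is a hypothetical helper that returns nested lists rather than an ndarray, and it does not reproduce the rule for when pewlib applies the conversion.

```python
import csv
import io


def read_text_image(text, comments="#"):
    """Parse delimited text into rows of floats, treating tab and ';' as ','."""
    lines = (
        line.replace(";", ",").replace("\t", ",")
        for line in io.StringIO(text)
        # Skip blank lines and lines starting with the comment character
        if line.strip() and not line.startswith(comments)
    )
    return [[float(value) for value in row] for row in csv.reader(lines)]


image = read_text_image("# header\n1;2;3\n4\t5\t6\n")
```

Rows are returned in file order, matching the note that data is read in order from the first line.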
- pewlib.io.textimage.save(path, data, header='')
Save data to csv.
See
numpy.savetxt()
- Parameters:
path (str | Path) – path to file
data (ndarray) – unstructured array
header (str) – file header
- Return type:
None
VTK
Exports to VTK formats for use in programs such as Paraview.
- pewlib.io.vtk.save(path, data, spacing)
Save data as a VTK ImageData XML.
Saves an array to a ‘.vti’ file. Data origin is set to (0, 0) and equally spaced using x, y, z of spacing. Data is raised to 3 dimensions if lower.
- Parameters:
path (str | Path) – path to file
data (ndarray) – array
spacing (tuple[float, float, float]) – spacing of ‘.vti’
- Return type:
None