NAME
pymvpa2-preproc - apply preprocessing steps to a PyMVPA dataset
SYNOPSIS
pymvpa2 preproc [--version] [-h] -i DATASET [DATASET ...] [--chunks CHUNKS_ATTR]
[--strip-invariant-features] [--poly-detrend DEG] [--detrend-chunks CHUNKS_ATTR]
[--detrend-coords COORDS_ATTR] [--detrend-regrs ATTR [ATTR ...]]
[--filter-passband FREQ [FREQ ...]] [--filter-stopband FREQ [FREQ ...]]
[--sampling-rate FREQ] [--filter-passloss dB] [--filter-stopattenuation dB] [--zscore]
[--zscore-chunks CHUNKS_ATTR] [--zscore-params PARAM PARAM] -o OUTPUT
[--hdf5-compression TYPE]
DESCRIPTION
Preprocess a PyMVPA dataset.
This command can apply a number of preprocessing steps to a dataset. The currently
supported steps are:
1. Polynomial de-trending
2. Spectral filtering
3. Feature-wise Z-scoring
All preprocessing steps are applied in the above order. If a different order is required,
preprocessing has to be split into two separate command calls.
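Why the order matters can be seen in a small NumPy sketch. It is not PyMVPA code: a plain linear fit (`np.polyfit`) stands in for the tool's Legendre detrending, and the helpers `detrend_linear` and `zscore` are hypothetical. When Z-scoring runs last, as it does in this command, every feature ends up with unit variance; running detrending after Z-scoring instead shrinks the variance again.

```python
import numpy as np


def detrend_linear(x):
    """Remove a per-feature constant and linear trend (samples x features)."""
    t = np.arange(x.shape[0])
    coef = np.polyfit(t, x, deg=1)  # shape (2, n_features): slopes, intercepts
    return x - (np.outer(t, coef[0]) + coef[1])


def zscore(x):
    """Feature-wise Z-scoring: zero mean, unit standard deviation."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


rng = np.random.default_rng(0)
ramp = np.arange(100)[:, None]
data = 0.05 * ramp + rng.normal(size=(100, 3))  # linear drift plus noise

# Order used by this command: detrend first, then Z-score.
a = zscore(detrend_linear(data))
# Reversed order, which would require two separate command calls.
b = detrend_linear(zscore(data))

print(np.allclose(a.std(axis=0), 1.0))  # True: Z-scoring ran last
print(np.allclose(b.std(axis=0), 1.0))  # False: detrending removed variance afterwards
```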
POLYNOMIAL DE-TRENDING
This type of de-trending can be used to regress out arbitrary signals. In addition to
polynomials of any degree, arbitrary time courses stored as sample attributes in a dataset
can be used as confound regressors. In contrast to the spectral filtering implementation,
this detrending functionality is also applicable to sparsely sampled data with
potentially irregular inter-sample intervals.
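The general technique can be sketched with NumPy as an ordinary least-squares fit: Legendre polynomials evaluated at each sample's coordinate, optionally joined by confound time courses, form a design matrix whose fit is subtracted from the data. This is an illustration under assumptions, not PyMVPA's implementation; in particular, the rescaling of coordinates to [-1, 1] and the `poly_detrend` helper are hypothetical. Because the polynomials are evaluated at the actual coordinates, irregular sampling needs no special handling.

```python
import numpy as np
from numpy.polynomial import legendre


def poly_detrend(x, coords, degree, regressors=None):
    """Regress Legendre polynomials (and optional confounds) out of x.

    x          : (n_samples, n_features) data
    coords     : (n_samples,) sample positions, possibly irregularly spaced
    degree     : highest polynomial order to remove (0 = baseline only)
    regressors : optional (n_samples,) or (n_samples, k) confound time courses
    """
    coords = np.asarray(coords, dtype=float)
    # Map coordinates onto [-1, 1], the natural domain of Legendre polynomials.
    span = coords.max() - coords.min()
    scaled = 2.0 * (coords - coords.min()) / span - 1.0
    design = legendre.legvander(scaled, degree)  # columns P0 .. P_degree
    if regressors is not None:
        design = np.column_stack([design, regressors])
    beta, *_ = np.linalg.lstsq(design, x, rcond=None)
    return x - design @ beta


rng = np.random.default_rng(1)
times = np.sort(rng.uniform(0, 300, size=50))  # irregular sampling times
motion = rng.normal(size=50)                   # hypothetical motion confound
signal = 3.0 + 0.01 * times + 2.0 * motion     # offset + linear drift + confound
clean = poly_detrend(signal[:, None], times, degree=1, regressors=motion)
print(np.allclose(clean, 0.0))  # True: everything was explained by the regressors
```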
SPECTRAL FILTERING
Several options are provided for constructing a Butterworth low-, high-, or band-pass
filter. It is advised to inspect the filtered data carefully, as inappropriate filter
settings can lead to unintended side effects. Only datasets with a fixed sampling rate
are supported, and the sampling rate must be provided (see --sampling-rate).
OPTIONS
--version
show program's version and license information and exit
-h, --help, --help-np
show this help message and exit. --help-np forcefully disables the use of a pager
for displaying the help.
-i DATASET [DATASET ...], --input DATASET [DATASET ...]
path(s) to one or more PyMVPA dataset files. All datasets will be merged into a
single dataset (vstack'ed) in order of specification. In some cases this option may
need to be specified more than once if multiple, but separate, input datasets are
required.
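Merging by "vstack" means that samples (rows) are concatenated in the order the files are given, while the feature dimension must agree. In this NumPy sketch, plain arrays stand in for PyMVPA datasets, and the per-sample `chunks` array is a hypothetical attribute showing how sample attributes would travel with their rows.

```python
import numpy as np

run1 = np.zeros((3, 4))  # first input: 3 samples, 4 features
run2 = np.ones((2, 4))   # second input: 2 samples, the same 4 features

# Rows are stacked in order of specification; feature counts must match.
merged = np.vstack([run1, run2])
# A per-sample attribute is concatenated in the same order as the rows.
chunks = np.concatenate([np.full(3, 0), np.full(2, 1)])

print(merged.shape)  # (5, 4)
```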
Common options for all preprocessing:
--chunks CHUNKS_ATTR
shortcut option to enable uniform chunk-wise processing for all relevant
preprocessing steps (see --zscore-chunks, --detrend-chunks). This global setting
can be overridden by additionally specifying the corresponding individual "chunks"
options.
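The precedence rule just described can be written down as a tiny resolver. The `resolve_chunks` function is hypothetical and only mirrors the documented behaviour (a step-specific option wins, otherwise the global --chunks value applies); it is not taken from the tool's source.

```python
def resolve_chunks(chunks=None, detrend_chunks=None, zscore_chunks=None):
    """Return the chunk attribute each step should use.

    A step-specific option takes precedence; otherwise the global --chunks
    value applies (None means no chunk-wise processing).
    """
    return {
        "detrend": detrend_chunks if detrend_chunks is not None else chunks,
        "zscore": zscore_chunks if zscore_chunks is not None else chunks,
    }


print(resolve_chunks(chunks="chunks", zscore_chunks="runs"))
# {'detrend': 'chunks', 'zscore': 'runs'}
```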
--strip-invariant-features
After all pre-processing steps are done, strip all invariant features from the
dataset.
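A minimal NumPy sketch of stripping invariant features, assuming "invariant" means a feature whose value is identical across all samples (an interpretation, not confirmed by the text above). The `strip_invariant_features` helper is hypothetical.

```python
import numpy as np


def strip_invariant_features(x):
    """Drop features (columns) whose value is identical across all samples."""
    keep = ~np.all(x == x[0], axis=0)  # True where at least one sample differs
    return x[:, keep], keep


data = np.array([[1.0, 5.0, 0.0],
                 [2.0, 5.0, 0.0],
                 [3.0, 5.0, 0.0]])
stripped, kept = strip_invariant_features(data)
print(kept)  # [ True False False]
```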
Options for data detrending:
--poly-detrend DEG
Order of the Legendre polynomial to remove from the data. This will remove every
polynomial up to and including the provided value. For example, 3 will remove 0th,
1st, 2nd, and 3rd order polynomials from the data. N.B.: The 0th polynomial is the
baseline shift, the 1st is the linear trend. If you specify a single int and
chunk-wise detrending is enabled (see --detrend-chunks), this value is used for each
chunk. You can also specify a different polynomial order for each chunk by providing
a list or ndarray of orders with a length equal to the number of chunks.
Constraints: value must be convertible to type 'int'. [Default: 1]
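To see what "up to and including" means in terms of regressors, NumPy's `legvander` builds one column per Legendre order. The sample positions below are illustrative and assumed to be scaled to [-1, 1]; DEG = 3 yields four columns, where column 0 is the constant baseline and column 1 is the linear trend.

```python
import numpy as np
from numpy.polynomial import legendre

deg = 3                             # as in --poly-detrend 3
t = np.linspace(-1.0, 1.0, 5)       # sample positions, assumed scaled to [-1, 1]
basis = legendre.legvander(t, deg)  # one column per order 0..deg

print(basis.shape)  # (5, 4): orders 0, 1, 2 and 3
# basis[:, 0] is constant (baseline shift), basis[:, 1] equals t (linear trend),
# basis[:, 2] is (3 t^2 - 1) / 2 and basis[:, 3] is (5 t^3 - 3 t) / 2.
```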
--detrend-chunks CHUNKS_ATTR
If None, the whole dataset is detrended at once. Otherwise, the given samples
attribute (given by its name) is used to define chunks of the dataset that are
processed individually. In that case, all the samples within a chunk should be in
contiguous order and the chunks should be sorted in order from low to high,
unless the dataset provides information about the coordinate of each sample in the
space that should be spanned by the polynomials (see --detrend-coords).
Constraints: value must be `None`, or value must be a string. [Default: None]
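Chunk-wise detrending fits a separate set of polynomials to each chunk. The sketch below is hypothetical (`detrend_by_chunk` is not a PyMVPA function) and assumes that, without explicit coordinates, a sample's position within its chunk serves as its coordinate, which is why it rejects non-contiguous chunks. Two runs with different offsets and slopes are both fully removed by per-chunk linear fits, which a single fit over the whole dataset would not achieve.

```python
import numpy as np
from numpy.polynomial import legendre


def detrend_by_chunk(x, chunks, degree=1):
    """Detrend each contiguous chunk separately with its own polynomials."""
    chunks = np.asarray(chunks)
    out = np.empty_like(x, dtype=float)
    for value in np.unique(chunks):
        idx = np.flatnonzero(chunks == value)
        # Position within the chunk is the only coordinate here, so the
        # samples of a chunk must form one contiguous block.
        if np.any(np.diff(idx) != 1):
            raise ValueError(f"chunk {value!r} is not contiguous")
        t = np.linspace(-1.0, 1.0, idx.size)
        design = legendre.legvander(t, degree)
        beta, *_ = np.linalg.lstsq(design, x[idx], rcond=None)
        out[idx] = x[idx] - design @ beta
    return out


chunks = np.repeat([0, 1], 4)
x = np.concatenate([np.arange(4.0) + 10, -2 * np.arange(4.0)])[:, None]
print(np.allclose(detrend_by_chunk(x, chunks), 0.0))  # True
```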
--detrend-coords COORDS_ATTR
name of a samples attribute that is added to the preprocessed dataset storing the
coordinates of each sample in the space spanned by the polynomials. If an attribute
of such name is already present in the dataset its values are interpreted as sample
coordinates in the space spanned by the polynomials. This can be used to detrend
datasets with irregular sample spacing.
--detrend-regrs ATTR [ATTR ...]
List of sample attribute names that should be used as additional regressors. An
example use would be to regress out motion parameters. Constraints: value must be
`None`, or value must be convertible to list(str). [Default: None]
Options for spectral filtering:
--filter-passband FREQ [FREQ ...]
critical frequencies of a Butterworth filter's pass band. Critical frequencies need
to match the unit of the specified sampling rate (see --sampling-rate). For a
band-pass filter, the low and high frequency cutoffs need to be specified (in this
order). For low- and high-pass filters, a single cutoff frequency must be provided.
The type of filter (low- or high-pass) is determined from its relation to the stop
band frequency (--filter-stopband).
--filter-stopband FREQ [FREQ ...]
Analogous setting to --filter-passband for specifying the filter's stop band.
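The rule for choosing the filter type can be sketched as follows. The single-frequency case follows the description above (pass band below stop band means low-pass, above means high-pass); the band-pass check, which expects the pass band to lie inside the stop band edges, is an assumption, and `butterworth_type` is a hypothetical helper.

```python
def butterworth_type(passband, stopband):
    """Infer the filter type from --filter-passband / --filter-stopband values."""
    if len(passband) == 1 and len(stopband) == 1:
        # Single cutoff: the relation to the stop band decides the type.
        return "lowpass" if passband[0] < stopband[0] else "highpass"
    if len(passband) == 2 and len(stopband) == 2:
        low, high = passband  # specified in this order: low, then high
        if stopband[0] < low < high < stopband[1]:
            return "bandpass"
    raise ValueError("unsupported combination of pass and stop bands")


print(butterworth_type([0.1], [0.2]))                # lowpass
print(butterworth_type([0.01, 0.1], [0.005, 0.15]))  # bandpass
```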
--sampling-rate FREQ
sampling rate of the dataset. All frequency specifications need to match the unit
of the sampling rate.
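Because all frequencies share the unit of the sampling rate, a filter design routine typically rescales them by the Nyquist frequency (half the sampling rate). That the tool does this internally is an assumption based on the common SciPy convention; `to_normalized` is a hypothetical helper, and the TR of 2 s is only an example value.

```python
def to_normalized(freqs, sampling_rate):
    """Express frequencies as fractions of the Nyquist frequency."""
    nyquist = sampling_rate / 2.0
    normalized = [f / nyquist for f in freqs]
    if not all(0.0 < w < 1.0 for w in normalized):
        raise ValueError("frequencies must lie between 0 and the Nyquist frequency")
    return normalized


# Example: fMRI with a TR of 2 s gives 0.5 Hz sampling, 0.25 Hz Nyquist.
print(to_normalized([0.01, 0.1], 0.5))  # [0.04, 0.4]
```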
--filter-passloss dB
maximum loss in the passband (dB). Default: 1 dB
--filter-stopattenuation dB
minimum attenuation in the stopband (dB). Default: 30 dB
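The pass-band loss and stop-band attenuation limits determine how steep, and therefore how high-order, the Butterworth filter must be. The sketch below computes the minimum order with the textbook formula that SciPy's `scipy.signal.buttord` also uses for digital low- and high-pass filters (edges pre-warped for the bilinear transform). Whether pymvpa2-preproc derives its filter in exactly this way is an assumption, and `butterworth_min_order` is a hypothetical helper.

```python
import math


def butterworth_min_order(wp, ws, gpass=1.0, gstop=30.0):
    """Smallest digital Butterworth low/high-pass order meeting the specs.

    wp, ws : pass- and stop-band edges, normalized to Nyquist (0 < w < 1)
    gpass  : maximum pass-band loss in dB (--filter-passloss)
    gstop  : minimum stop-band attenuation in dB (--filter-stopattenuation)
    """
    # Pre-warp the digital edges for the bilinear transform.
    passb = math.tan(math.pi * wp / 2.0)
    stopb = math.tan(math.pi * ws / 2.0)
    # Selectivity ratio; max/min covers both low-pass and high-pass cases.
    ratio = max(passb, stopb) / min(passb, stopb)
    gain_ratio = (10 ** (0.1 * gstop) - 1) / (10 ** (0.1 * gpass) - 1)
    return math.ceil(math.log10(gain_ratio) / (2 * math.log10(ratio)))


# Narrow transition band with the default 1 dB / 30 dB limits.
print(butterworth_min_order(0.2, 0.3))  # 10
```

A wider transition band or a looser attenuation requirement lowers the order, which also reduces the risk of the side effects mentioned above.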
Options for data normalization:
--zscore
perform feature normalization by Z-scoring.
--zscore-chunks CHUNKS_ATTR
name of a dataset sample attribute defining chunks of samples that shall be
Z-scored independently. By default no chunk-wise normalization is done.
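Chunk-wise Z-scoring computes the mean and standard deviation separately within each chunk, so runs with very different scales become comparable. This NumPy sketch (`zscore_by_chunk` is hypothetical) assumes the population standard deviation (ddof=0); PyMVPA's exact estimator is not stated in the text above.

```python
import numpy as np


def zscore_by_chunk(x, chunks=None):
    """Feature-wise Z-scoring, optionally done separately per chunk."""
    x = np.asarray(x, dtype=float)
    if chunks is None:
        return (x - x.mean(axis=0)) / x.std(axis=0)
    chunks = np.asarray(chunks)
    out = np.empty_like(x)
    for value in np.unique(chunks):
        mask = chunks == value
        block = x[mask]
        out[mask] = (block - block.mean(axis=0)) / block.std(axis=0)
    return out


x = np.array([[1.0], [3.0], [10.0], [30.0]])
runs = np.array([0, 0, 1, 1])
print(zscore_by_chunk(x, runs).ravel())  # each run becomes -1, 1
```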
--zscore-params PARAM PARAM
define a fixed parameter set (mean, std) for Z-scoring, instead of computing these
parameters from the actual data.
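With fixed parameters the transformation is simply (x - mean) / std using the given pair. In this sketch, applying the same pair to every value is an assumption suggested by the single PARAM PARAM argument; `zscore_fixed` is a hypothetical helper.

```python
import numpy as np


def zscore_fixed(x, mean, std):
    """Z-score with a fixed (mean, std) pair instead of data-derived values."""
    return (np.asarray(x, dtype=float) - mean) / std


print(zscore_fixed([90.0, 100.0, 110.0], mean=100.0, std=10.0))  # values -1.0, 0.0, 1.0
```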
Output options:
-o OUTPUT, --output OUTPUT
output filename ('.hdf5' extension is added automatically if necessary). NOTE: The
output format is suitable for data exchange between PyMVPA commands, but is not
recommended for long-term storage or exchange as its specific content may vary
depending on the actual software environment. For long-term storage consider
conversion into other data formats (see 'dump' command).
--hdf5-compression TYPE
compression type for HDF5 storage. Available values depend on the specific HDF5
installation. Typical values are: 'gzip', 'lzf', 'szip', or integers from 1 to 9
indicating gzip compression levels.
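How a TYPE value could map onto HDF5 settings is sketched below. The function is hypothetical and does not reproduce the tool's argument handling; it returns a (compression, compression_opts) pair shaped like the keyword arguments of h5py's `create_dataset`, treating integers 1 to 9 as gzip levels as described above.

```python
def parse_hdf5_compression(value):
    """Map a --hdf5-compression value to (compression, compression_opts).

    Integers 1-9 are gzip levels; other strings such as 'gzip', 'lzf' or
    'szip' are passed through as filter names.
    """
    if value is None:
        return None, None
    try:
        level = int(value)
    except ValueError:
        return value, None  # a filter name
    if not 1 <= level <= 9:
        raise ValueError("gzip level must be between 1 and 9")
    return "gzip", level


print(parse_hdf5_compression("4"))    # ('gzip', 4)
print(parse_hdf5_compression("lzf"))  # ('lzf', None)
```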
EXAMPLES
Normalize all features in a dataset by Z-scoring
$ pymvpa2 preproc --zscore -o ds_preprocessed -i dataset.hdf5
Perform Z-scoring and quadratic detrending of all features, but process all samples
sharing a unique value of the "chunks" sample attribute individually
$ pymvpa2 preproc --chunks "chunks" --poly-detrend 2 --zscore -o ds_pp2 -i ds.hdf5
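To apply steps in a different order than the built-in one (for example, Z-scoring before detrending), the work can be split into two calls, with the first call's output as the second call's input. This Python sketch only builds and prints the command lines using options documented above; `preproc_cmd` and the file names are hypothetical, and it relies on the documented automatic '.hdf5' extension for the intermediate file.

```python
import shlex


def preproc_cmd(inputs, output, *options):
    """Build an argument list for one pymvpa2 preproc call."""
    return ["pymvpa2", "preproc", *options, "-o", output, "-i", *inputs]


# Z-score first, then detrend: reverses the built-in order with two calls.
step1 = preproc_cmd(["ds.hdf5"], "ds_z", "--zscore")
step2 = preproc_cmd(["ds_z.hdf5"], "ds_zd", "--poly-detrend", "2")
for cmd in (step1, step2):
    print(shlex.join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` so that the second call only starts after the first one has succeeded.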