ml4chem.data package
Submodules
ml4chem.data.handler module
ml4chem.data.parser module
ml4chem.data.preprocessing module
- class ml4chem.data.preprocessing.Preprocessing(preprocessor, purpose)
Bases: object
A wrapper for preprocessing data with sklearn
This class is intended as a thin wrapper around sklearn. The idea is to make it easier to preprocess data without placing much burden on users.
- Parameters:
preprocessor (tuple) – Tuple with structure: (‘name’, {kwargs}).
purpose (str) – Supported purposes are: ‘training’, ‘inference’.
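To illustrate the (‘name’, {kwargs}) tuple structure, here is a minimal sketch of how such a tuple could be resolved to a scikit-learn object. The helper `build_preprocessor` is hypothetical and is not part of ML4Chem. The sketch also assumes the name matches a class in `sklearn.preprocessing`, which this section does not state explicitly.

```python
from sklearn import preprocessing as sk_preprocessing


def build_preprocessor(spec):
    """Hypothetical helper: turn a ('name', {kwargs}) tuple into a sklearn object.

    Assumes 'name' is the name of a class in sklearn.preprocessing.
    """
    name, kwargs = spec
    cls = getattr(sk_preprocessing, name)  # e.g. sk_preprocessing.MinMaxScaler
    return cls(**kwargs)


# Example tuple: the class name plus the keyword arguments passed to it.
scaler = build_preprocessor(("MinMaxScaler", {"feature_range": (-1, 1)}))
```

A tuple keeps the request declarative: the caller names the preprocessor and its options without importing sklearn directly.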
Notes
The preprocessing modules available in sklearn, along with their options, are listed at:
https://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing
If you need a preprocessor that is not implemented yet, create a bug report or follow the structure shown below to implement it yourself (PRs are very welcome). In principle, all preprocessors can be implemented.
- fit(stacked_features, scheduler)
Fit features
- Parameters:
stacked_features (list) – List of stacked features.
scheduler (str) – The dask scheduler to use.
- Returns:
scaled_features – Features scaled with the requested preprocessor.
- Return type:
list
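The fitting step can be sketched with scikit-learn alone. This is an illustration under stated assumptions, not the ML4Chem implementation: the input layout (a list of per-structure 2D arrays) and the choice of `StandardScaler` are assumed, and the code runs eagerly without a dask scheduler.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Assumed layout: one (n_atoms, n_features) array per structure.
stacked_features = [
    np.array([[0.0, 1.0], [2.0, 3.0]]),
    np.array([[4.0, 5.0]]),
]

# Stack everything into one 2D array so the scaler's statistics
# are computed over all rows at once.
stacked = np.vstack(stacked_features)

scaler = StandardScaler()
scaled_features = scaler.fit_transform(stacked)
```

After fitting, each column of `scaled_features` has zero mean and unit variance, and the fitted `scaler` can be reused to transform new data at inference time.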
- save_to_file(preprocessor, path)
Save the preprocessor object to file
- Parameters:
preprocessor (obj) – Preprocessing object
path (str) – Path to save .prep file.
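Persisting a fitted preprocessor can be sketched as follows. The on-disk format of a .prep file is not described in this section, so the use of `joblib` here is an assumption made for illustration. The file name `scaler.prep` is likewise hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Fit a scaler on toy data so there is state worth saving.
scaler = MinMaxScaler().fit(np.array([[0.0], [5.0], [10.0]]))

# Assumption: joblib is used for persistence; the file name is illustrative.
path = os.path.join(tempfile.mkdtemp(), "scaler.prep")
joblib.dump(scaler, path)

# Reload the scaler, as one would before running inference.
restored = joblib.load(path)
sample = np.array([[2.5]])
```

Saving the fitted object matters because inference must apply exactly the same scaling that was learned during training.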
ml4chem.data.serialization module
- ml4chem.data.serialization.dump(data, filename='data.db')
Serialize data
This function dumps data and ML4Chem dictionaries, serialized with msgpack or torch (depending on the models).
- Parameters:
data (dict or array) – A dictionary or array containing data to be saved to file using msgpack.
filename (str) – Name of the file to save on disk.
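As a rough sketch of msgpack-based serialization, the snippet below writes a dictionary to disk and reads it back with the `msgpack` package directly. It does not call `ml4chem.data.serialization.dump`, and the dictionary contents are made up for illustration.

```python
import os
import tempfile

import msgpack

# Illustrative data: msgpack handles dicts, lists, strings and numbers natively.
data = {"energies": [-1.5, -2.25], "symbols": ["H", "O"]}
filename = os.path.join(tempfile.mkdtemp(), "data.db")

# Serialize to bytes and write them to the file in binary mode.
with open(filename, "wb") as f:
    f.write(msgpack.packb(data, use_bin_type=True))

# Read the bytes back and decode them into Python objects.
with open(filename, "rb") as f:
    loaded = msgpack.unpackb(f.read(), raw=False)
```

Note that plain msgpack does not serialize NumPy arrays without extra handling, so arrays would need converting (for example to lists) before packing in a sketch like this one.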