ml4chem.data package
Submodules
ml4chem.data.handler module
ml4chem.data.parser module
ml4chem.data.preprocessing module
- class ml4chem.data.preprocessing.Preprocessing(preprocessor, purpose)
- Bases: object
- A wrapper for preprocessing data with sklearn. The idea is to make it easier to preprocess data without placing too much burden on users.
- Parameters:
- preprocessor (tuple) – Tuple with structure: (‘name’, {kwargs}). 
- purpose (str) – Supported purposes are: ‘training’ and ‘inference’.
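The `preprocessor` tuple pairs the name of a preprocessor with its keyword arguments. The following is a minimal sketch of how such a tuple could be resolved into a scikit-learn object. It assumes that the name matches a class in `sklearn.preprocessing` and that the kwargs are passed straight to its constructor. The helper `build_preprocessor` is hypothetical and not part of ML4Chem's API.

```python
import numpy as np
from sklearn import preprocessing


def build_preprocessor(preprocessor):
    """Hypothetical helper: turn ('name', {kwargs}) into an sklearn object."""
    name, kwargs = preprocessor
    # Assumption: the name is a class defined in sklearn.preprocessing.
    cls = getattr(preprocessing, name, None)
    if cls is None:
        raise NotImplementedError(f"Preprocessor {name!r} is not available")
    # Assumption: kwargs are forwarded unchanged to the class constructor.
    return cls(**kwargs)


scaler = build_preprocessor(("MinMaxScaler", {"feature_range": (-1, 1)}))
features = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
# Each column is rescaled independently to the interval [-1, 1].
scaled = scaler.fit_transform(features)
```

Looking the class up by name keeps the tuple format generic, so any sklearn preprocessor whose constructor accepts the given kwargs could be used without extra code.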
 
 - Notes: The list of preprocessing modules available in sklearn, and their options, can be found at https://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing. If you need a preprocessor that is not implemented yet, open a bug report or implement it yourself by following the structure of the existing implementation in the source code (PRs are very welcome). In principle, any sklearn preprocessor can be implemented.
- fit(stacked_features, scheduler)
- Fit the features.
- Parameters:
- stacked_features (list) – List of stacked features. 
- scheduler (str) – The dask scheduler to be used.
 
- Returns:
- scaled_features – Features scaled with the requested preprocessor.
- Return type:
- list 
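Conceptually, fitting stacks the feature vectors into a matrix, fits the scaler, and returns the scaled rows as a list. The following is a minimal sketch under these assumptions: every feature vector is numeric and has the same length, dask is used only to materialize the vectors with the requested scheduler, and `StandardScaler` stands in for whichever preprocessor was configured. `fit_sketch` is hypothetical and does not reproduce ML4Chem's implementation.

```python
import dask
import numpy as np
from sklearn.preprocessing import StandardScaler


def fit_sketch(stacked_features, scheduler="single-threaded"):
    """Hypothetical stand-in for Preprocessing.fit."""
    # Wrap each feature vector in a delayed task and evaluate all of them
    # with the requested dask scheduler ("threads", "processes", ...).
    tasks = [dask.delayed(np.asarray)(f, dtype=float) for f in stacked_features]
    arrays = dask.compute(*tasks, scheduler=scheduler)
    # Stack the 1-D vectors into a (n_samples, n_features) matrix.
    matrix = np.vstack(arrays)
    scaled = StandardScaler().fit_transform(matrix)
    # Return a list of rows, mirroring the documented return type.
    return list(scaled)
```

For example, `fit_sketch([[1.0, 2.0], [3.0, 4.0]], scheduler="threads")` centers and scales each column, giving rows close to `[-1, -1]` and `[1, 1]`.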
 
 - save_to_file(preprocessor, path)
- Save the preprocessor object to a file.
- Parameters:
- preprocessor (obj) – Preprocessing object 
- path (str) – Path where the .prep file is saved.
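The on-disk format of the `.prep` file is not described here. The following is a minimal sketch that assumes the fitted preprocessor is picklable and uses the standard-library `pickle` module (ML4Chem may use a different serializer). `save_prep_sketch` and `load_prep_sketch` are hypothetical helpers, and the dictionary below merely stands in for a fitted preprocessor object.

```python
import os
import pickle
import tempfile


def save_prep_sketch(preprocessor, path):
    """Hypothetical: write a fitted preprocessor to a .prep file."""
    # Assumption: append the .prep extension if the caller omitted it.
    if not path.endswith(".prep"):
        path += ".prep"
    with open(path, "wb") as fh:
        pickle.dump(preprocessor, fh)
    return path


def load_prep_sketch(path):
    """Hypothetical: read a preprocessor back from a .prep file."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in object; a fitted sklearn scaler would be pickled the same way.
    saved = save_prep_sketch({"name": "MinMaxScaler", "min": 0.0},
                             os.path.join(tmp, "model"))
    restored = load_prep_sketch(saved)
```

Saving the fitted object, rather than refitting later, lets inference reuse exactly the scaling learned during training.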
 
 
 
ml4chem.data.serialization module
- ml4chem.data.serialization.dump(data, filename='data.db')
- Serialize data.
- This function dumps data and ML4Chem dictionaries serialized with msgpack or torch (depending on the models).
- Parameters:
- data (dict or array) – A dictionary or array containing data to be saved to file using msgpack.
- filename (str) – Name of the file to save to disk.
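The following is a minimal sketch of the msgpack path only. It assumes that the `msgpack` package is installed and that NumPy arrays are converted to nested lists before packing. The actual `dump` may handle arrays differently, and torch serialization is not shown. `dump_sketch` and `load_dump_sketch` are hypothetical helpers.

```python
import os
import tempfile

import msgpack
import numpy as np


def _to_builtin(obj):
    """Recursively convert NumPy arrays into msgpack-friendly lists."""
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, dict):
        return {k: _to_builtin(v) for k, v in obj.items()}
    return obj


def dump_sketch(data, filename="data.db"):
    """Hypothetical: serialize data to filename with msgpack."""
    with open(filename, "wb") as fh:
        fh.write(msgpack.packb(_to_builtin(data), use_bin_type=True))


def load_dump_sketch(filename="data.db"):
    """Hypothetical: read msgpack-serialized data back (str keys/values)."""
    with open(filename, "rb") as fh:
        return msgpack.unpackb(fh.read(), raw=False)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.db")
    dump_sketch({"energies": np.array([1.5, -2.0]), "name": "H2O"}, path)
    restored = load_dump_sketch(path)
```

Passing `use_bin_type=True` when packing and `raw=False` when unpacking keeps Python strings as `str` on the round trip, while arrays come back as plain lists.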