ml4chem.atomistic.features package
Submodules
ml4chem.atomistic.features.aev module
- class ml4chem.atomistic.features.aev.AEV(cutoff=6.5, cutofffxn=None, normalized=True, preprocessor=('MinMaxScaler', None), custom=None, save_preprocessor='ml4chem', scheduler='distributed', filename='features.db', overwrite=True, angular_type='G3', weighted=False, batch_size=None)[source]
Bases:
Gaussian
Atomic environment vector
This class builds atomic environment vectors as shown in the ANI-1 potentials.
- Parameters:
cutoff (float) – Cutoff radius used for computing features.
cutofffxn (object) – A Cutoff function object.
normalized (bool) – Set it to true if the features are being normalized with respect to the cutoff radius.
preprocessor (str) – Use some scaling method to preprocess the data. Default MinMaxScaler.
custom (dict, opt) –
Create custom symmetry functions, and override defaults. Default is None. The structure of the dictionary is as follows:
>>> custom = {'G2': {'etas': etas, 'Rs': rs}, 'G3': {'etas': a_etas, 'zetas': zetas, 'thetas': thetas, 'Rs': rs}}
save_preprocessor (str) – Save preprocessor to file.
scheduler (str) – The scheduler to be used with the dask backend.
filename (str) – Path to save database. Note that if the filename exists, the features will be loaded without being recomputed.
overwrite (bool) – If overwrite is set to True, ml4chem will not try to load existing databases. Default is True.
angular_type (str) – Compute “G3” or “G4” angular symmetry functions.
weighted (bool) – True to apply the weighted variant of the Gaussian symmetry functions. See Ref. 2 of the Gaussian class.
batch_size (int) – Number of data points per batch to use for training. Default is None.
References
1. Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 8, 3192–3203 (2017).
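The custom dictionary described above can be assembled from plain Python lists. The following is a minimal sketch; the grid sizes and numeric values are placeholders chosen for illustration and are not the ANI-1 or ML4Chem defaults.

```python
import math

# Placeholder settings (assumptions, not library defaults).
cutoff = 6.5
n_radial = 4
n_angular = 4

# Radial shifts evenly spaced in [0, cutoff).
rs = [cutoff * i / n_radial for i in range(n_radial)]
# Angular shifts evenly spaced in [0, pi).
thetas = [math.pi * i / n_angular for i in range(n_angular)]

etas = [4.0]    # widths of the radial Gaussians
a_etas = [2.0]  # widths of the angular Gaussians
zetas = [8.0]   # angular resolution

# Same key layout as the structure documented for the custom keyword.
custom = {
    "G2": {"etas": etas, "Rs": rs},
    "G3": {"etas": a_etas, "zetas": zetas, "thetas": thetas, "Rs": rs},
}
```

A dictionary built this way would be passed as the custom keyword argument of the class.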
- NAME = 'AEV'
- get_atomic_features
Delayed class method to compute atomic features
- Parameters:
atom (object) – An ASE atom object.
image (ase object, list) – List of atoms in an image.
index (int) – Index of atom in atoms object.
symbol (str) – Chemical symbol of atom in atoms object.
n_symbols (ndarray of str) – Array of neighbors’ symbols.
neighborpositions (ndarray of float) – Array of Cartesian atomic positions.
image_molecule (ase object, list) – List of atoms in an image.
weighted (bool) – True to apply the weighted variant of the Gaussian symmetry functions. See Ref. 2 of the Gaussian class.
- get_symmetry_functions(type, symbols, etas=None, zetas=None, Rs=None, Rs_a=None, thetas=None)[source]
Get requested symmetry functions
- Parameters:
type (str) – The desired symmetry function: ‘G2’, ‘G3’, or ‘G4’.
symbols (list) – List of chemical symbols.
etas (list) – List of etas to build the Gaussian function.
zetas (list) – List of zetas to build the Gaussian function.
Rs (list) – List to shift the center of the gaussian distributions.
Rs_a (list) – List to shift the center of the gaussian distributions of angular symmetry functions.
thetas (list) – Number of shifts in the angular environment.
- ml4chem.atomistic.features.aev.calculate_G2(n_numbers, neighborsymbols, neighborpositions, center_symbol, eta, Rs, cutoff, cutofffxn, Ri, image_molecule=None, n_indices=None, normalized=True, weighted=False)[source]
Calculate G2 symmetry function.
These correspond to 2 body, or radial interactions.
- Parameters:
n_numbers (list of int) – List of neighbors’ atomic numbers.
neighborsymbols (list of str) – List of symbols of all neighbor atoms.
neighborpositions (list of list of floats) – List of Cartesian atomic positions.
center_symbol (str) – Chemical symbol of the center atom.
eta (float) – Parameter of Gaussian symmetry functions.
Rs (float) – Parameter to shift the center of the peak.
cutoff (float) – Cutoff radius.
cutofffxn (object) – Cutoff function.
Ri (list) – Position of the center atom. Should be fed as a list of three floats.
normalized (bool) – Whether or not the symmetry function is normalized.
image_molecule (ase object, list) – List of atoms in an image.
n_indices (list) – List of indices of neighboring atoms from the image object.
weighted (bool) – True to apply the weighted variant of the Gaussian symmetry functions. See Ref. 2 of the Gaussian class.
- Returns:
feature – Radial feature.
- Return type:
float
- ml4chem.atomistic.features.aev.calculate_G4(n_numbers, neighborsymbols, neighborpositions, G_elements, theta, zeta, eta, Rs, cutoff, cutofffxn, Ri, normalized=True, image_molecule=None, n_indices=None, weighted=False)[source]
Calculate G4 symmetry function.
These are 3 body or angular interactions.
- Parameters:
n_numbers (list of int) – List of neighbors’ atomic numbers.
neighborsymbols (list of str) – List of symbols of neighboring atoms.
neighborpositions (list of list of floats) – List of Cartesian atomic positions of neighboring atoms.
G_elements (list of str) – A list of two members, each member is the chemical species of one of the neighboring atoms forming the triangle with the center atom.
theta (float) – Parameter of Gaussian symmetry functions.
zeta (float) – Parameter of Gaussian symmetry functions.
eta (float) – Parameter of Gaussian symmetry functions.
Rs (float) – Parameter to shift the center of the peak.
cutoff (float) – Cutoff radius.
cutofffxn (object) – Cutoff function.
Ri (list) – Position of the center atom. Should be fed as a list of three floats.
normalized (bool) – Whether or not the symmetry function is normalized.
image_molecule (ase object, list) – List of atoms in an image.
n_indices (list) – List of indices of neighboring atoms from the image object.
weighted (bool) – True to apply the weighted variant of the Gaussian symmetry functions. See Ref. 2 of the Gaussian class.
- Returns:
feature – G4 feature value.
- Return type:
float
Notes
The difference between the calculate_G3 and the calculate_G4 function is that calculate_G4 accounts for bond angles of 180 degrees.
ml4chem.atomistic.features.autoencoders module
- class ml4chem.atomistic.features.autoencoders.LatentFeatures(encoder=None, scheduler='distributed', filename='latent.db', preprocessor=None, features=None, save_preprocessor='latentfeatures.scaler')[source]
Bases:
AtomisticFeatures
Extraction of features using AutoEncoder model class.
The latent space is a feature space in which an AutoEncoder model encodes what it finds relevant about the underlying structure of the input data. This class takes images in ASE format, already hashed to be used by ML4Chem, and converts them into latent feature vectors using the encoder layer of an AutoEncoder model. It also allows interoperability with the Potentials() class.
- Parameters:
encoder (dict) –
- Dictionary with structure:
>>> encoder = {'model': file.ml4c, 'params': file.params}
scheduler (str) – The scheduler to be used with the dask backend.
filename (str) – Name to save on disk of serialized database.
preprocessor (tuple) – Use some scaling method to preprocess the data.
features (tuple) – Users can set the features keyword argument to a tuple with the structure (‘Name’, {kwargs})
save_preprocessor (str) – Save preprocessor to file.
- NAME = 'LatentFeatures'
- calculate(images, purpose='training', data=None, svm=False)[source]
Return features per atom in an atoms object
- Parameters:
images (dict) – Hashed images using the Data class.
purpose (str) – The supported purposes are: ‘training’, ‘inference’.
data (obj) – data object
svm (bool) – Whether or not these features are going to be used for kernel methods.
- Returns:
feature_space – A dictionary with key hash and value as a list with the following structure: {‘hash’: [(‘H’, [vector])]}
- Return type:
dict
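The returned structure maps each image hash to a list of (symbol, vector) pairs, one per atom. The sketch below shows one way to traverse it; the hash key, the symbols, and the plain-list vectors are made-up stand-ins, and the real vectors may be tensors or arrays.

```python
# Hypothetical feature space for a single hashed image (a water molecule).
feature_space = {
    "hash": [
        ("O", [0.1, 0.2, 0.3]),
        ("H", [0.4, 0.5, 0.6]),
        ("H", [0.7, 0.8, 0.9]),
    ]
}


def group_by_symbol(feature_space):
    """Collect the feature vectors of every atom, grouped by chemical symbol."""
    grouped = {}
    for image_hash, atoms in feature_space.items():
        for symbol, vector in atoms:
            grouped.setdefault(symbol, []).append(vector)
    return grouped


grouped = group_by_symbol(feature_space)
```

Grouping by symbol is only one possible use; it mirrors the common case where a separate model is applied per chemical element.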
- load_encoder(encoder, **kwargs)[source]
Load an autoencoder in eval() mode
- Parameters:
encoder (dict) –
Dictionary with structure:
>>> encoder = {'model': file.ml4c, 'params': file.params}
data (obj) – data object
svm (bool) – Whether or not these features are going to be used for kernel methods.
- Returns:
autoencoder.eval() – Autoencoder model object in eval mode to get the latent space.
- Return type:
obj
ml4chem.atomistic.features.base module
- class ml4chem.atomistic.features.base.AtomisticFeatures(**kwargs)[source]
Bases:
ABC
- restack_atom(image_index, atom, scaled_feature_space)[source]
Restack atoms to a raveled list to use with SVM
- Parameters:
image_index (int) – Index of original hashed image.
atom (object) – An atom object.
scaled_feature_space (np.array) – A numpy array with the scaled features
- Returns:
symbol, features – The hashed key image and its corresponding features.
- Return type:
tuple
- restack_image(index, image, scaled_feature_space, svm)[source]
Restack images to correct dictionary’s structure to train
- Parameters:
index (int) – Index of original hashed image.
image (obj) – An ASE image object.
scaled_feature_space (np.array) – A numpy array with scaled features.
- Returns:
hash, features – Hash of image and its corresponding features.
- Return type:
tuple
ml4chem.atomistic.features.cartesian module
- class ml4chem.atomistic.features.cartesian.Cartesian(scheduler='distributed', filename='cartesians.db', preprocessor=('Normalizer', None), save_preprocessor='ml4chem', overwrite=True)[source]
Bases:
AtomisticFeatures
Cartesian Coordinates
Cartesian coordinates are features, too (though not very useful ones). This class takes images in ASE format and returns them hashed to be used by ML4Chem.
- Parameters:
scheduler (str) – The scheduler to be used with the dask backend.
filename (str) – Name to save on disk of serialized database.
preprocessor (tuple) – Use some scaling method to preprocess the data. Default Normalizer.
save_preprocessor (str) – Save preprocessor to file.
overwrite (bool) – If overwrite is set to True, ml4chem will not try to load existing databases. Default is True.
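The interplay between filename and overwrite can be pictured as a load-or-compute pattern. This is a sketch under stated assumptions, not ML4Chem's implementation: load_or_compute and compute are hypothetical names, and pickle stands in for whatever serialization the library actually uses.

```python
import os
import pickle
import tempfile


def load_or_compute(filename, compute, overwrite=True):
    """Load features from filename unless overwrite is True or no file exists."""
    if not overwrite and os.path.isfile(filename):
        with open(filename, "rb") as handle:
            return pickle.load(handle)
    # Either overwrite was requested or nothing is cached yet: recompute and save.
    features = compute()
    with open(filename, "wb") as handle:
        pickle.dump(features, handle)
    return features


calls = []


def compute():
    """Stand-in for an expensive featurization; records each invocation."""
    calls.append(1)
    return {"hash": [("H", [0.0, 0.0, 0.0])]}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cartesians.db")
    first = load_or_compute(path, compute, overwrite=True)    # computes and saves
    second = load_or_compute(path, compute, overwrite=False)  # loads from disk
```

With overwrite=False the second call reads the saved file, so compute runs only once.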
- NAME = 'Cartesian'
- calculate(images=None, purpose='training', data=None, svm=False)[source]
Return features per atom in an atoms object
- Parameters:
images (dict) – Hashed images using the Data class.
purpose (str) – The supported purposes are: ‘training’, ‘inference’.
data (obj) – data object
svm (bool) – Whether or not these features are going to be used for kernel methods.
- Returns:
feature_space – A dictionary with key hash and value as a list with the following structure: {‘hash’: [(‘H’, [vector])]}
- Return type:
dict
- get_atomic_features
Delayed class method to get atomic features
- Parameters:
atom (object) – An ASE atom object.
svm (bool) – Whether or not these features are going to be used for kernel methods.
- restack_atom
Restack atoms to a raveled list to use with SVM
- Parameters:
image_index (int) – Index of original hashed image.
atom (object) – An atom object.
scaled_feature_space (np.array) – A numpy array with the scaled features
- Returns:
symbol, features – The hashed key image and its corresponding features.
- Return type:
tuple
- restack_image
Restack images to correct dictionary’s structure to train
- Parameters:
index (int) – Index of original hashed image.
image (obj) – An ASE image object.
scaled_feature_space (np.array) – A numpy array with the scaled features
- Returns:
key, features – The hashed key image and its corresponding features.
- Return type:
tuple
ml4chem.atomistic.features.coulombmatrix module
- class ml4chem.atomistic.features.coulombmatrix.CoulombMatrix(preprocessor=None, batch_size=None, filename='features.db', scheduler='distributed', save_preprocessor='ml4chem', overwrite=True, **kwargs)[source]
Bases:
AtomisticFeatures, CoulombMatrix
Coulomb Matrix features
- Parameters:
filename (str) – Path to save database. Note that if the filename exists, the features will be loaded without being recomputed.
preprocessor (str) – Use some scaling method to preprocess the data. Default None.
batch_size (int) – Number of data points per batch to use for training. Default is None.
scheduler (str) – The scheduler to be used with the dask backend.
overwrite (bool) – If overwrite is set to True, ml4chem will not try to load existing databases. Default is True.
save_preprocessor (str) – Save preprocessor to file.
Notes
This class computes Coulomb matrix features using the dscribe module. As mentioned in ML4Chem’s paper, we avoid duplication of efforts and this module serves as a demonstration.
- NAME = 'CoulombMatrix'
- calculate(images=None, purpose='training', data=None, svm=False)[source]
Calculate the features per atom in an atoms object
- Parameters:
images (dict) – Hashed images using the Data class.
purpose (str) – The supported purposes are: ‘training’, ‘inference’.
data (obj) – data object
svm (bool) – Whether or not these features are going to be used for kernel methods.
- Returns:
feature_space (dict) – A dictionary with key hash and value as a list with the following structure: {‘hash’: [(‘H’, [vector])]}
reference_space (dict) – A reference space useful for SVM models.
ml4chem.atomistic.features.cutoff module
ml4chem.atomistic.features.gaussian module
- class ml4chem.atomistic.features.gaussian.Gaussian(cutoff=6.5, cutofffxn=None, normalized=True, preprocessor=('MinMaxScaler', None), custom=None, save_preprocessor='ml4chem', scheduler='distributed', filename='features.db', overwrite=True, angular_type='G3', weighted=False, batch_size=None)[source]
Bases:
AtomisticFeatures
Behler-Parrinello symmetry functions
This class builds local chemical environments for atoms based on the Behler-Parrinello Gaussian-type symmetry functions. It is modular enough that it can be used just for creating feature spaces.
- Parameters:
cutoff (float) – Cutoff radius used for computing features.
cutofffxn (object) – A Cutoff function object.
normalized (bool) – Set it to true if the features are being normalized with respect to the cutoff radius.
preprocessor (str) – Use some scaling method to preprocess the data. Default MinMaxScaler.
custom (dict, opt) –
Create custom symmetry functions, and override defaults. Default is None. The structure of the dictionary is as follows:
>>> custom = {'G2': {'etas': etas}, 'G3': {'etas': a_etas, 'zetas': zetas, 'gammas': gammas}}
save_preprocessor (str) – Save preprocessor to file.
scheduler (str) – The scheduler to be used with the dask backend.
filename (str) – Path to save database. Note that if the filename exists, the features will be loaded without being recomputed.
overwrite (bool) – If overwrite is set to True, ml4chem will not try to load existing databases. Default is True.
angular_type (str) – Compute “G3” or “G4” angular symmetry functions.
weighted (bool) – True if applying weighted feature of Gaussian function. See Ref. 2.
batch_size (int) – Number of data points per batch to use for training. Default is None.
References
Behler, J. Atom-centered symmetry functions for constructing high-dimensional neural network potentials. J. Chem. Phys. 134, 074106 (2011).
Gastegger, M., Schwiedrzik, L., Bittermann, M., Berzsenyi, F. & Marquetand, P. wACSF—Weighted atom-centered symmetry functions as descriptors in machine learning potentials. J. Chem. Phys. 148, 241709 (2018).
- NAME = 'Gaussian'
- calculate(images=None, purpose='training', data=None, svm=False, GP=None)[source]
Calculate the features per atom in an atoms object
- Parameters:
images (dict) – Hashed images using the Data class.
purpose (str) – The supported purposes are: ‘training’, ‘inference’.
data (obj) – data object
svm (bool) – Whether or not these features are going to be used for kernel methods.
- Returns:
feature_space (dict) – A dictionary with key hash and value as a list with the following structure: {‘hash’: [(‘H’, [vector])]}
reference_space (dict) – A reference space useful for SVM models.
- get_atomic_features
Delayed class method to compute atomic features
- Parameters:
atom (object) – An ASE atom object.
image (ase object, list) – List of atoms in an image.
index (int) – Index of atom in atoms object.
symbol (str) – Chemical symbol of atom in atoms object.
n_symbols (ndarray of str) – Array of neighbors’ symbols.
neighborpositions (ndarray of float) – Array of Cartesian atomic positions.
image_molecule (ase object, list) – List of atoms in an image.
weighted (bool) – True if applying weighted feature of Gaussian function. See Ref. 2.
- get_symmetry_functions(type, symbols, etas=None, zetas=None, gammas=None)[source]
Get requested symmetry functions
- Parameters:
type (str) – The desired symmetry function: ‘G2’, ‘G3’, or ‘G4’.
symbols (list) – List of chemical symbols.
etas (list) – List of etas to build the Gaussian function.
zetas (list) – List of zetas to build the Gaussian function.
gammas (list) – List of gammas to build the Gaussian function.
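To make the role of these lists concrete, the sketch below enumerates one parameter set per combination of values. It is an illustration under assumptions, not the library's code: the function name, the dictionary keys, and the choice to pair symbols with combinations_with_replacement are hypothetical. The argument kind plays the role of type.

```python
from itertools import combinations_with_replacement


def enumerate_symmetry_functions(kind, symbols, etas=None, zetas=None, gammas=None):
    """Enumerate parameter dictionaries for the requested symmetry function."""
    functions = []
    if kind == "G2":
        # Radial functions: one per (eta, neighbor symbol).
        for eta in etas:
            for symbol in symbols:
                functions.append({"type": kind, "symbol": symbol, "eta": eta})
    elif kind in ("G3", "G4"):
        # Angular functions: one per (eta, zeta, gamma, unordered symbol pair).
        pairs = list(combinations_with_replacement(symbols, 2))
        for eta in etas:
            for zeta in zetas:
                for gamma in gammas:
                    for pair in pairs:
                        functions.append(
                            {"type": kind, "symbols": pair,
                             "eta": eta, "zeta": zeta, "gamma": gamma}
                        )
    else:
        raise ValueError(f"Unsupported symmetry function: {kind}")
    return functions


radial = enumerate_symmetry_functions("G2", ["H", "O"], etas=[0.05, 4.0])
angular = enumerate_symmetry_functions(
    "G3", ["H", "O"], etas=[0.005], zetas=[1.0, 4.0], gammas=[1.0, -1.0]
)
```

For two symbols there are three unordered pairs (H-H, H-O, O-O), so the angular count grows as the product of the list lengths times the number of pairs.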
- make_symmetry_functions(symbols, custom=None, angular_type='G3')[source]
Function to make symmetry functions
This method needs at least unique symbols and defaults set to true.
- Parameters:
symbols (list) – List of unique chemical symbols.
>>> symbols = ['H', 'O']
custom (dict, opt) –
Create custom symmetry functions, and override defaults. Default is None. The structure of the dictionary is as follows:
>>> custom = {'G2': {'etas': etas}, 'G3': {'etas': a_etas, 'zetas': zetas, 'gammas': gammas}}
angular_type (str) – Compute “G3” or “G4” angular symmetry functions.
- Returns:
GP – Symmetry function parameters.
- Return type:
dict
- ml4chem.atomistic.features.gaussian.calculate_G2(n_numbers, neighborsymbols, neighborpositions, center_symbol, eta, cutoff, cutofffxn, Ri, image_molecule=None, n_indices=None, normalized=True, weighted=False)[source]
Calculate G2 symmetry function.
These correspond to 2 body, or radial interactions.
- Parameters:
n_numbers (list of int) – List of neighbors’ atomic numbers.
neighborsymbols (list of str) – List of symbols of all neighbor atoms.
neighborpositions (list of list of floats) – List of Cartesian atomic positions.
center_symbol (str) – Chemical symbol of the center atom.
eta (float) – Parameter of Gaussian symmetry functions.
cutoff (float) – Cutoff radius.
cutofffxn (object) – Cutoff function.
Ri (list) – Position of the center atom. Should be fed as a list of three floats.
normalized (bool) – Whether or not the symmetry function is normalized.
image_molecule (ase object, list) – List of atoms in an image.
n_indices (list) – List of indices of neighboring atoms from the image object.
weighted (bool) – True if applying weighted feature of Gaussian function. See Ref. 2.
- Returns:
feature – Radial feature.
- Return type:
float
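As a rough illustration of the radial term, the sketch below sums a Gaussian of the neighbor distance multiplied by a cutoff function, following the form in Ref. 1 (Behler, 2011). It makes several assumptions: a cosine cutoff stands in for the cutofffxn object, normalized is taken to mean that distances are scaled by the cutoff radius, and the weighted option and neighbor symbols are ignored. The names cosine_cutoff and radial_g2 are hypothetical.

```python
import math


def cosine_cutoff(r, cutoff):
    """Cosine cutoff: decays smoothly from 1 at r = 0 to 0 at r = cutoff."""
    if r > cutoff:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / cutoff) + 1.0)


def radial_g2(neighborpositions, Ri, eta, cutoff, normalized=True):
    """Sum exp(-eta * Rij^2 / Rc^2) * fc(Rij) over all neighbors j."""
    # Assumption: normalization divides squared distances by cutoff^2.
    rc = cutoff if normalized else 1.0
    feature = 0.0
    for Rj in neighborpositions:
        rij = math.dist(Ri, Rj)
        feature += math.exp(-eta * rij ** 2 / rc ** 2) * cosine_cutoff(rij, cutoff)
    return feature


center = [0.0, 0.0, 0.0]
inside = radial_g2([[1.0, 0.0, 0.0]], center, eta=0.0, cutoff=2.0)
outside = radial_g2([[3.0, 0.0, 0.0]], center, eta=0.0, cutoff=2.0)
```

With eta set to zero the Gaussian equals one, so the value reduces to the cutoff function alone: a neighbor halfway to the cutoff contributes about 0.5 and a neighbor beyond the cutoff contributes nothing.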
- ml4chem.atomistic.features.gaussian.calculate_G3(n_numbers, neighborsymbols, neighborpositions, G_elements, gamma, zeta, eta, cutoff, cutofffxn, Ri, normalized=True, image_molecule=None, n_indices=None, weighted=False)[source]
Calculate G3 symmetry function.
These are 3 body or angular interactions.
- Parameters:
n_numbers (list of int) – List of neighbors’ atomic numbers.
neighborsymbols (list of str) – List of symbols of neighboring atoms.
neighborpositions (list of list of floats) – List of Cartesian atomic positions of neighboring atoms.
G_elements (list of str) – A list of two members, each member is the chemical species of one of the neighboring atoms forming the triangle with the center atom.
gamma (float) – Parameter of Gaussian symmetry functions.
zeta (float) – Parameter of Gaussian symmetry functions.
eta (float) – Parameter of Gaussian symmetry functions.
cutoff (float) – Cutoff radius.
cutofffxn (object) – Cutoff function.
Ri (list) – Position of the center atom. Should be fed as a list of three floats.
normalized (bool) – Whether or not the symmetry function is normalized.
image_molecule (ase object, list) – List of atoms in an image.
n_indices (list) – List of indices of neighboring atoms from the image object.
weighted (bool) – True if applying weighted feature of Gaussian function. See Ref. 2.
- Returns:
feature – G3 feature value.
- Return type:
float
- ml4chem.atomistic.features.gaussian.calculate_G4(n_numbers, neighborsymbols, neighborpositions, G_elements, gamma, zeta, eta, cutoff, cutofffxn, Ri, normalized=True, image_molecule=None, n_indices=None, weighted=False)[source]
Calculate G4 symmetry function.
These are 3 body or angular interactions.
- Parameters:
n_numbers (list of int) – List of neighbors’ atomic numbers.
neighborsymbols (list of str) – List of symbols of neighboring atoms.
neighborpositions (list of list of floats) – List of Cartesian atomic positions of neighboring atoms.
G_elements (list of str) – A list of two members, each member is the chemical species of one of the neighboring atoms forming the triangle with the center atom.
gamma (float) – Parameter of Gaussian symmetry functions.
zeta (float) – Parameter of Gaussian symmetry functions.
eta (float) – Parameter of Gaussian symmetry functions.
cutoff (float) – Cutoff radius.
cutofffxn (object) – Cutoff function.
Ri (list) – Position of the center atom. Should be fed as a list of three floats.
normalized (bool) – Whether or not the symmetry function is normalized.
image_molecule (ase object, list) – List of atoms in an image.
n_indices (list) – List of indices of neighboring atoms from the image object.
weighted (bool) – True if applying weighted feature of Gaussian function. See Ref. 2.
- Returns:
feature – G4 feature value.
- Return type:
float
Notes
The difference between the calculate_G3 and the calculate_G4 function is that calculate_G4 accounts for bond angles of 180 degrees.
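One reading of this note, assuming the two functions differ as the two angular functions in Ref. 1 (Behler, 2011) do, is that the G3 form includes the neighbor-neighbor distance Rjk in both the Gaussian and the cutoff product, while the G4 form drops it. The sketch below is an illustration under that assumption, with a cosine cutoff and hypothetical function names; it is not the library's implementation.

```python
import math
from itertools import combinations


def cosine_cutoff(r, cutoff):
    """Cosine cutoff: decays smoothly from 1 at r = 0 to 0 at r = cutoff."""
    if r > cutoff:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / cutoff) + 1.0)


def angular(neighborpositions, Ri, gamma, zeta, eta, cutoff, include_rjk=True):
    """Angular feature; include_rjk=True keeps the Rjk terms (assumed G3 form)."""
    feature = 0.0
    for Rj, Rk in combinations(neighborpositions, 2):
        rij, rik, rjk = math.dist(Ri, Rj), math.dist(Ri, Rk), math.dist(Rj, Rk)
        dot = sum((a - c) * (b - c) for a, b, c in zip(Rj, Rk, Ri))
        # Clamp to guard against round-off pushing the cosine outside [-1, 1].
        cos_theta = max(-1.0, min(1.0, dot / (rij * rik)))
        squared = rij ** 2 + rik ** 2
        damping = cosine_cutoff(rij, cutoff) * cosine_cutoff(rik, cutoff)
        if include_rjk:
            squared += rjk ** 2
            damping *= cosine_cutoff(rjk, cutoff)
        feature += ((1.0 + gamma * cos_theta) ** zeta
                    * math.exp(-eta * squared / cutoff ** 2) * damping)
    return 2.0 ** (1.0 - zeta) * feature


# Linear j-i-k arrangement (180 degrees): Rjk = 3.0 exceeds the cutoff of 2.0.
neighbors = [[1.5, 0.0, 0.0], [-1.5, 0.0, 0.0]]
center = [0.0, 0.0, 0.0]
with_rjk = angular(neighbors, center, gamma=-1.0, zeta=1.0, eta=0.005, cutoff=2.0)
without_rjk = angular(neighbors, center, gamma=-1.0, zeta=1.0, eta=0.005,
                      cutoff=2.0, include_rjk=False)
```

In the linear arrangement the two neighbors are farther apart than the cutoff, so the form that includes Rjk vanishes while the form without it still yields a nonzero contribution.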
- ml4chem.atomistic.features.gaussian.weighted_h(image_molecule, n_indices)[source]
Calculate the atomic numbers of the neighboring atoms in a molecule, then multiply the neighbors’ atomic numbers by each other.
- Parameters:
image_molecule (ase object, list) – List of atoms in an image.
n_indices (list) – List of indices of neighboring atoms from the image object.
Module contents
- class ml4chem.atomistic.features.Cartesian(scheduler='distributed', filename='cartesians.db', preprocessor=('Normalizer', None), save_preprocessor='ml4chem', overwrite=True)[source]
Bases:
AtomisticFeatures
Cartesian Coordinates
Cartesian coordinates are features, too (though not very useful ones). This class takes images in ASE format and returns them hashed to be used by ML4Chem.
- Parameters:
scheduler (str) – The scheduler to be used with the dask backend.
filename (str) – Name to save on disk of serialized database.
preprocessor (tuple) – Use some scaling method to preprocess the data. Default Normalizer.
save_preprocessor (str) – Save preprocessor to file.
overwrite (bool) – If overwrite is set to True, ml4chem will not try to load existing databases. Default is True.
- NAME = 'Cartesian'
- calculate(images=None, purpose='training', data=None, svm=False)[source]
Return features per atom in an atoms object
- Parameters:
images (dict) – Hashed images using the Data class.
purpose (str) – The supported purposes are: ‘training’, ‘inference’.
data (obj) – data object
svm (bool) – Whether or not these features are going to be used for kernel methods.
- Returns:
feature_space – A dictionary with key hash and value as a list with the following structure: {‘hash’: [(‘H’, [vector])]}
- Return type:
dict
- get_atomic_features
Delayed class method to get atomic features
- Parameters:
atom (object) – An ASE atom object.
svm (bool) – Whether or not these features are going to be used for kernel methods.
- restack_atom
Restack atoms to a raveled list to use with SVM
- Parameters:
image_index (int) – Index of original hashed image.
atom (object) – An atom object.
scaled_feature_space (np.array) – A numpy array with the scaled features
- Returns:
symbol, features – The hashed key image and its corresponding features.
- Return type:
tuple
- restack_image
Restack images to correct dictionary’s structure to train
- Parameters:
index (int) – Index of original hashed image.
image (obj) – An ASE image object.
scaled_feature_space (np.array) – A numpy array with the scaled features
- Returns:
key, features – The hashed key image and its corresponding features.
- Return type:
tuple
- class ml4chem.atomistic.features.CoulombMatrix(preprocessor=None, batch_size=None, filename='features.db', scheduler='distributed', save_preprocessor='ml4chem', overwrite=True, **kwargs)[source]
Bases:
AtomisticFeatures, CoulombMatrix
Coulomb Matrix features
- Parameters:
filename (str) – Path to save database. Note that if the filename exists, the features will be loaded without being recomputed.
preprocessor (str) – Use some scaling method to preprocess the data. Default None.
batch_size (int) – Number of data points per batch to use for training. Default is None.
scheduler (str) – The scheduler to be used with the dask backend.
overwrite (bool) – If overwrite is set to True, ml4chem will not try to load existing databases. Default is True.
save_preprocessor (str) – Save preprocessor to file.
Notes
This class computes Coulomb matrix features using the dscribe module. As mentioned in ML4Chem’s paper, we avoid duplication of efforts and this module serves as a demonstration.
- NAME = 'CoulombMatrix'
- calculate(images=None, purpose='training', data=None, svm=False)[source]
Calculate the features per atom in an atoms object
- Parameters:
images (dict) – Hashed images using the Data class.
purpose (str) – The supported purposes are: ‘training’, ‘inference’.
data (obj) – data object
svm (bool) – Whether or not these features are going to be used for kernel methods.
- Returns:
feature_space (dict) – A dictionary with key hash and value as a list with the following structure: {‘hash’: [(‘H’, [vector])]}
reference_space (dict) – A reference space useful for SVM models.
- class ml4chem.atomistic.features.Gaussian(cutoff=6.5, cutofffxn=None, normalized=True, preprocessor=('MinMaxScaler', None), custom=None, save_preprocessor='ml4chem', scheduler='distributed', filename='features.db', overwrite=True, angular_type='G3', weighted=False, batch_size=None)[source]
Bases:
AtomisticFeatures
Behler-Parrinello symmetry functions
This class builds local chemical environments for atoms based on the Behler-Parrinello Gaussian-type symmetry functions. It is modular enough that it can be used just for creating feature spaces.
- Parameters:
cutoff (float) – Cutoff radius used for computing features.
cutofffxn (object) – A Cutoff function object.
normalized (bool) – Set it to true if the features are being normalized with respect to the cutoff radius.
preprocessor (str) – Use some scaling method to preprocess the data. Default MinMaxScaler.
custom (dict, opt) –
Create custom symmetry functions, and override defaults. Default is None. The structure of the dictionary is as follows:
>>> custom = {'G2': {'etas': etas}, 'G3': {'etas': a_etas, 'zetas': zetas, 'gammas': gammas}}
save_preprocessor (str) – Save preprocessor to file.
scheduler (str) – The scheduler to be used with the dask backend.
filename (str) – Path to save database. Note that if the filename exists, the features will be loaded without being recomputed.
overwrite (bool) – If overwrite is set to True, ml4chem will not try to load existing databases. Default is True.
angular_type (str) – Compute “G3” or “G4” angular symmetry functions.
weighted (bool) – True if applying weighted feature of Gaussian function. See Ref. 2.
batch_size (int) – Number of data points per batch to use for training. Default is None.
References
Behler, J. Atom-centered symmetry functions for constructing high-dimensional neural network potentials. J. Chem. Phys. 134, 074106 (2011).
Gastegger, M., Schwiedrzik, L., Bittermann, M., Berzsenyi, F. & Marquetand, P. wACSF—Weighted atom-centered symmetry functions as descriptors in machine learning potentials. J. Chem. Phys. 148, 241709 (2018).
- NAME = 'Gaussian'
- calculate(images=None, purpose='training', data=None, svm=False, GP=None)[source]
Calculate the features per atom in an atoms object
- Parameters:
images (dict) – Hashed images using the Data class.
purpose (str) – The supported purposes are: ‘training’, ‘inference’.
data (obj) – data object
svm (bool) – Whether or not these features are going to be used for kernel methods.
- Returns:
feature_space (dict) – A dictionary with key hash and value as a list with the following structure: {‘hash’: [(‘H’, [vector])]}
reference_space (dict) – A reference space useful for SVM models.
- get_atomic_features
Delayed class method to compute atomic features
- Parameters:
atom (object) – An ASE atom object.
image (ase object, list) – List of atoms in an image.
index (int) – Index of atom in atoms object.
symbol (str) – Chemical symbol of atom in atoms object.
n_symbols (ndarray of str) – Array of neighbors’ symbols.
neighborpositions (ndarray of float) – Array of Cartesian atomic positions.
image_molecule (ase object, list) – List of atoms in an image.
weighted (bool) – True if applying weighted feature of Gaussian function. See Ref. 2.
- get_symmetry_functions(type, symbols, etas=None, zetas=None, gammas=None)[source]
Get requested symmetry functions
- Parameters:
type (str) – The desired symmetry function: ‘G2’, ‘G3’, or ‘G4’.
symbols (list) – List of chemical symbols.
etas (list) – List of etas to build the Gaussian function.
zetas (list) – List of zetas to build the Gaussian function.
gammas (list) – List of gammas to build the Gaussian function.
- make_symmetry_functions(symbols, custom=None, angular_type='G3')[source]
Function to make symmetry functions
This method needs at least unique symbols and defaults set to true.
- Parameters:
symbols (list) – List of unique chemical symbols.
>>> symbols = ['H', 'O']
custom (dict, opt) –
Create custom symmetry functions, and override defaults. Default is None. The structure of the dictionary is as follows:
>>> custom = {'G2': {'etas': etas}, 'G3': {'etas': a_etas, 'zetas': zetas, 'gammas': gammas}}
angular_type (str) – Compute “G3” or “G4” angular symmetry functions.
- Returns:
GP – Symmetry function parameters.
- Return type:
dict
- class ml4chem.atomistic.features.LatentFeatures(encoder=None, scheduler='distributed', filename='latent.db', preprocessor=None, features=None, save_preprocessor='latentfeatures.scaler')[source]
Bases:
AtomisticFeatures
Extraction of features using AutoEncoder model class.
The latent space is a feature space in which an AutoEncoder model encodes what it finds relevant about the underlying structure of the input data. This class takes images in ASE format, already hashed to be used by ML4Chem, and converts them into latent feature vectors using the encoder layer of an AutoEncoder model. It also allows interoperability with the Potentials() class.
- Parameters:
encoder (dict) –
- Dictionary with structure:
>>> encoder = {'model': file.ml4c, 'params': file.params}
scheduler (str) – The scheduler to be used with the dask backend.
filename (str) – Name to save on disk of serialized database.
preprocessor (tuple) – Use some scaling method to preprocess the data.
features (tuple) – Users can set the features keyword argument to a tuple with the structure (‘Name’, {kwargs})
save_preprocessor (str) – Save preprocessor to file.
- NAME = 'LatentFeatures'
- calculate(images, purpose='training', data=None, svm=False)[source]
Return features per atom in an atoms object
- Parameters:
images (dict) – Hashed images using the Data class.
purpose (str) – The supported purposes are: ‘training’, ‘inference’.
data (obj) – data object
svm (bool) – Whether or not these features are going to be used for kernel methods.
- Returns:
feature_space – A dictionary with key hash and value as a list with the following structure: {‘hash’: [(‘H’, [vector])]}
- Return type:
dict
- load_encoder(encoder, **kwargs)[source]
Load an autoencoder in eval() mode
- Parameters:
encoder (dict) –
Dictionary with structure:
>>> encoder = {'model': file.ml4c, 'params': file.params}
data (obj) – data object
svm (bool) – Whether or not these features are going to be used for kernel methods.
- Returns:
autoencoder.eval() – Autoencoder model object in eval mode to get the latent space.
- Return type:
obj