ml4chem.atomistic.features package

Submodules

ml4chem.atomistic.features.aev module

class ml4chem.atomistic.features.aev.AEV(cutoff=6.5, cutofffxn=None, normalized=True, preprocessor=('MinMaxScaler', None), custom=None, save_preprocessor='ml4chem', scheduler='distributed', filename='features.db', overwrite=True, angular_type='G3', weighted=False, batch_size=None)[source]

Bases: Gaussian

Atomic environment vector

This class builds atomic environment vectors (AEVs) as introduced for the ANI-1 potentials.

Parameters:
  • cutoff (float) – Cutoff radius used for computing features.

  • cutofffxn (object) – A Cutoff function object.

  • normalized (bool) – Whether to normalize the features with respect to the cutoff radius.

  • preprocessor (tuple) – Scaling method used to preprocess the data. Default is ('MinMaxScaler', None).

  • custom (dict, opt) –

    Create custom symmetry functions, and override defaults. Default is None. The structure of the dictionary is as follows:

    >>> custom = {'G2': {'etas': etas, 'Rs': rs},
                  'G3': {'etas': a_etas, 'zetas': zetas, 'thetas': thetas, 'Rs': rs}}
    

  • save_preprocessor (str) – Save preprocessor to file.

  • scheduler (str) – The scheduler to be used with the dask backend.

  • filename (str) – Path to save database. If the file already exists and overwrite is False, the features are loaded from it instead of being recomputed.

  • overwrite (bool) – If overwrite is set to True, ml4chem will not try to load existing databases. Default is True.

  • angular_type (str) – Compute “G3” or “G4” angular symmetry functions.

  • weighted (bool) – True if applying weighted feature of Gaussian function. See Ref. 2.

  • batch_size (int) – Number of data points per batch to use for training. Default is None.

References

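1. Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 8, 3192–3203 (2017).

2. Gastegger, M., Schwiedrzik, L., Bittermann, M., Berzsenyi, F. & Marquetand, P. wACSF—Weighted atom-centered symmetry functions as descriptors in machine learning potentials. J. Chem. Phys. 148, 241709 (2018).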

NAME = 'AEV'
get_atomic_features

Delayed class method to compute atomic features

Parameters:
  • atom (object) – An ASE atom object.

  • image (ase object, list) – List of atoms in an image.

  • index (int) – Index of atom in atoms object.

  • symbol (str) – Chemical symbol of atom in atoms object.

  • n_symbols (ndarray of str) – Array of neighbors’ symbols.

  • neighborpositions (ndarray of float) – Array of Cartesian atomic positions.

  • image_molecule (ase object, list) – List of atoms in an image.

  • weighted (bool) – True if applying weighted feature of Gaussian function. See Ref. 2.

get_symmetry_functions(type, symbols, etas=None, zetas=None, Rs=None, Rs_a=None, thetas=None)[source]

Get requested symmetry functions

Parameters:
  • type (str) – The desired symmetry function: ‘G2’, ‘G3’, or ‘G4’.

  • symbols (list) – List of chemical symbols.

  • etas (list) – List of etas to build the Gaussian function.

  • zetas (list) – List of zetas to build the Gaussian function.

  • Rs (list) – List of shifts for the centers of the Gaussian distributions.

  • Rs_a (list) – List of shifts for the centers of the Gaussian distributions of the angular symmetry functions.

  • thetas (list) – List of angular shifts in the angular environment.

classmethod name()[source]

Returns name of class

print_features_params(GP)[source]

Print features parameters

ml4chem.atomistic.features.aev.calculate_G2(n_numbers, neighborsymbols, neighborpositions, center_symbol, eta, Rs, cutoff, cutofffxn, Ri, image_molecule=None, n_indices=None, normalized=True, weighted=False)[source]

Calculate G2 symmetry function.

These correspond to two-body, or radial, interactions.

Parameters:
  • n_numbers (list of int) – List of neighbors’ atomic numbers.

  • neighborsymbols (list of str) – List of symbols of all neighbor atoms.

  • neighborpositions (list of list of floats) – List of Cartesian atomic positions.

  • center_symbol (str) – Chemical symbol of the center atom.

  • eta (float) – Parameter of Gaussian symmetry functions.

  • Rs (float) – Parameter to shift the center of the peak.

  • cutoff (float) – Cutoff radius.

  • cutofffxn (object) – Cutoff function.

  • Ri (list) – Position of the center atom. Should be fed as a list of three floats.

  • normalized (bool) – Whether or not the symmetry function is normalized.

  • image_molecule (ase object, list) – List of atoms in an image.

  • n_indices (list) – List of indices of neighboring atoms from the image object.

  • weighted (bool) – True if applying weighted feature of Gaussian function. See Ref. 2.

Returns:

feature – Radial feature.

Return type:

float
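For orientation, the ANI-1 radial term (Ref. 1) sums a shifted Gaussian over the neighbors of the center atom, damped by a cutoff function: G2 = Σ_j exp(−η (R_ij − R_s)²) f_c(R_ij). The pure-Python sketch below assumes a cosine cutoff and ignores the normalized and weighted options; the helper names are hypothetical and this is not ml4chem's implementation.

```python
import math


def cosine_cutoff(r, rc):
    """Cosine cutoff: 0.5 * (cos(pi * r / rc) + 1) for r <= rc, else 0."""
    if r > rc:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / rc) + 1.0)


def shifted_radial_g2(center, neighbor_positions, eta, rs, rc):
    """Sum exp(-eta * (R_ij - Rs)**2) * f_c(R_ij) over all neighbors j."""
    total = 0.0
    for position in neighbor_positions:
        r_ij = math.dist(center, position)  # Euclidean distance (Python 3.8+)
        total += math.exp(-eta * (r_ij - rs) ** 2) * cosine_cutoff(r_ij, rc)
    return total


# One neighbor exactly at the shift Rs and one outside the cutoff radius.
value = shifted_radial_g2((0.0, 0.0, 0.0), [(1.0, 0.0, 0.0), (7.0, 0.0, 0.0)],
                          eta=16.0, rs=1.0, rc=6.5)
```

The neighbor beyond the cutoff contributes nothing, so the result equals f_c(1.0) for the neighbor sitting at the peak of its Gaussian.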

ml4chem.atomistic.features.aev.calculate_G4(n_numbers, neighborsymbols, neighborpositions, G_elements, theta, zeta, eta, Rs, cutoff, cutofffxn, Ri, normalized=True, image_molecule=None, n_indices=None, weighted=False)[source]

Calculate G4 symmetry function.

These correspond to three-body, or angular, interactions.

Parameters:
  • n_numbers (list of int) – List of neighbors’ atomic numbers.

  • neighborsymbols (list of str) – List of symbols of neighboring atoms.

  • neighborpositions (list of list of floats) – List of Cartesian atomic positions of neighboring atoms.

  • G_elements (list of str) – A two-member list with the chemical species of the two neighboring atoms that form the triangle with the center atom.

  • theta (float) – Angular shift parameter of the symmetry function.

  • zeta (float) – Parameter of Gaussian symmetry functions.

  • eta (float) – Parameter of Gaussian symmetry functions.

  • Rs (float) – Parameter to shift the center of the peak.

  • cutoff (float) – Cutoff radius.

  • cutofffxn (object) – Cutoff function.

  • Ri (list) – Position of the center atom. Should be fed as a list of three floats.

  • normalized (bool) – Whether or not the symmetry function is normalized.

  • image_molecule (ase object, list) – List of atoms in an image.

  • n_indices (list) – List of indices of neighboring atoms from the image object.

  • weighted (bool) – True if applying weighted feature of Gaussian function. See Ref. 2.

Returns:

feature – G4 feature value.

Return type:

float

Notes

The difference between the calculate_G3 and the calculate_G4 function is that calculate_G4 accounts for bond angles of 180 degrees.
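As a reference point, the ANI-1 paper (Ref. 1) defines a modified angular term G = 2^(1−ζ) Σ (1 + cos(θ_ijk − θ_s))^ζ exp(−η ((R_ij + R_ik)/2 − R_s)²) f_c(R_ij) f_c(R_ik). The sketch below evaluates that formula in pure Python, assuming a cosine cutoff and summing over unordered neighbor pairs (an ordered-pair convention doubles the result). The function names are hypothetical, and this is not claimed to be how calculate_G4 is implemented.

```python
import itertools
import math


def cosine_cutoff(r, rc):
    """Cosine cutoff: 0.5 * (cos(pi * r / rc) + 1) for r <= rc, else 0."""
    if r > rc:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / rc) + 1.0)


def ani_angular(center, neighbor_positions, zeta, theta_s, eta, rs, rc):
    """ANI-1-style angular term summed over unordered neighbor pairs (j, k)."""
    total = 0.0
    for pj, pk in itertools.combinations(neighbor_positions, 2):
        v_ij = [a - b for a, b in zip(pj, center)]
        v_ik = [a - b for a, b in zip(pk, center)]
        r_ij = math.hypot(*v_ij)
        r_ik = math.hypot(*v_ik)
        cos_theta = sum(a * b for a, b in zip(v_ij, v_ik)) / (r_ij * r_ik)
        theta = math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp rounding noise
        angular = (1.0 + math.cos(theta - theta_s)) ** zeta
        radial = math.exp(-eta * ((r_ij + r_ik) / 2.0 - rs) ** 2)
        total += angular * radial * cosine_cutoff(r_ij, rc) * cosine_cutoff(r_ik, rc)
    return 2.0 ** (1.0 - zeta) * total


# Two neighbors at distance 1 forming a 90 degree angle with the center atom.
value = ani_angular((0.0, 0.0, 0.0), [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
                    zeta=1.0, theta_s=math.pi / 2, eta=4.0, rs=1.0, rc=6.5)
```

With θ_s equal to the actual angle and R_s equal to the mean distance, both the angular and radial factors sit at their maxima, so the result is 2 f_c(1)².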

ml4chem.atomistic.features.autoencoders module

class ml4chem.atomistic.features.autoencoders.LatentFeatures(encoder=None, scheduler='distributed', filename='latent.db', preprocessor=None, features=None, save_preprocessor='latentfeatures.scaler')[source]

Bases: AtomisticFeatures

Extraction of features using AutoEncoder model class.

The latent space is a feature space in which an AutoEncoder model captures what it finds relevant about the underlying structure of the input data. This class takes images in ASE format, hashed for use by ML4Chem, and converts them into latent feature vectors using the encoder layer of a trained AutoEncoder model. It also allows interoperability with the Potentials() class.

Parameters:
  • encoder (dict) –

    Dictionary with structure:

    >>> encoder = {'model': 'file.ml4c', 'params': 'file.params'}
    

  • scheduler (str) – The scheduler to be used with the dask backend.

  • filename (str) – Name to save on disk of serialized database.

  • preprocessor (tuple) – Use some scaling method to preprocess the data.

  • features (tuple) – Users can set the features keyword argument to a tuple with the structure (‘Name’, {kwargs})

  • save_preprocessor (str) – Save preprocessor to file.

NAME = 'LatentFeatures'
calculate(images, purpose='training', data=None, svm=False)[source]

Return features per atom in an atoms object

Parameters:
  • images (dict) – Hashed images using the Data class.

  • purpose (str) – The supported purposes are: ‘training’, ‘inference’.

  • data (obj) – data object

  • svm (bool) – Whether or not these features are going to be used for kernel methods.

Returns:

feature_space – A dictionary with key hash and value as a list with the following structure: {‘hash’: [(‘H’, [vector])]}

Return type:

dict

load_encoder(encoder, **kwargs)[source]

Load an autoencoder in eval() mode

Parameters:
  • encoder (dict) –

    Dictionary with structure:

    >>> encoder = {'model': 'file.ml4c', 'params': 'file.params'}
    

  • data (obj) – data object

  • svm (bool) – Whether or not these features are going to be used for kernel methods.

Returns:

autoencoder.eval() – Autoencoder model object in eval mode to get the latent space.

Return type:

obj

classmethod name()[source]

Returns name of class

to_pandas()[source]

Convert features to pandas DataFrame

ml4chem.atomistic.features.base module

class ml4chem.atomistic.features.base.AtomisticFeatures(**kwargs)[source]

Bases: ABC

abstract calculate(**kwargs)[source]

Calculate features

abstract name()[source]

Return name of the class

restack_atom(image_index, atom, scaled_feature_space)[source]

Restack atoms to a raveled list to use with SVM

Parameters:
  • image_index (int) – Index of original hashed image.

  • atom (object) – An atom object.

  • scaled_feature_space (np.array) – A numpy array with the scaled features

Returns:

symbol, features – The atom's chemical symbol and its corresponding features.

Return type:

tuple
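After scaling, features live in a flat 2-D array with one row per atom, and restacking maps those rows back to per-image and per-atom entries. The sketch below shows that bookkeeping in pure Python, assuming rows are ordered image by image and atom by atom; restack_images and ravel_atoms are hypothetical names, not ml4chem's code.

```python
def restack_images(hashes, symbols_per_image, scaled_rows):
    """Rebuild {hash: [(symbol, features), ...]} from a flat list of scaled rows."""
    restacked = {}
    row = 0
    for key, symbols in zip(hashes, symbols_per_image):
        atoms = []
        for symbol in symbols:
            atoms.append((symbol, list(scaled_rows[row])))
            row += 1
        restacked[key] = atoms
    return restacked


def ravel_atoms(restacked):
    """Flatten to a list of (symbol, features) pairs, one per atom, e.g. for an SVM."""
    return [atom for atoms in restacked.values() for atom in atoms]


rows = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
restacked = restack_images(["hash_a", "hash_b"], [["O", "H"], ["H"]], rows)
raveled = ravel_atoms(restacked)
```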

restack_image(index, image, scaled_feature_space, svm)[source]

Restack images into the dictionary structure used for training

Parameters:
  • index (int) – Index of original hashed image.

  • image (obj) – An ASE image object.

  • scaled_feature_space (np.array) – A numpy array with scaled features.

  • svm (bool) – Whether or not these features are going to be used for kernel methods.

Returns:

hash, features – Hash of image and its corresponding features.

Return type:

tuple

abstract to_pandas()[source]

Convert features to pandas DataFrame
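A new featurizer implements the three abstract methods listed above. To keep the example self-contained, the sketch below defines a local stand-in ABC that mirrors the documented interface instead of importing ml4chem. FeaturesInterface and CountFeatures are hypothetical names, and the per-atom feature (the number of atoms in the image) is a toy chosen only to show the returned structure.

```python
from abc import ABC, abstractmethod


class FeaturesInterface(ABC):
    """Local stand-in mirroring the abstract methods documented for AtomisticFeatures."""

    @abstractmethod
    def calculate(self, **kwargs):
        """Calculate features."""

    @abstractmethod
    def name(self):
        """Return the name of the class."""

    @abstractmethod
    def to_pandas(self):
        """Convert features to a pandas DataFrame."""


class CountFeatures(FeaturesInterface):
    """Toy featurizer: each atom's feature vector is [number of atoms in its image]."""

    NAME = "CountFeatures"

    def __init__(self):
        self.feature_space = {}

    def calculate(self, images=None, **kwargs):
        # images: {hash: [symbol, ...]} -> {hash: [(symbol, [vector]), ...]}
        self.feature_space = {
            key: [(symbol, [float(len(symbols))]) for symbol in symbols]
            for key, symbols in images.items()
        }
        return self.feature_space

    @classmethod
    def name(cls):
        return cls.NAME

    def to_pandas(self):
        import pandas as pd  # imported lazily so the rest of the sketch needs no pandas

        rows = [
            {"hash": key, "symbol": symbol, **{f"f{i}": v for i, v in enumerate(vector)}}
            for key, atoms in self.feature_space.items()
            for symbol, vector in atoms
        ]
        return pd.DataFrame(rows)
```

Because name() is abstract in the base, instantiating the base class directly raises TypeError; overriding it as a classmethod, as the concrete classes in this package do, satisfies the ABC.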

ml4chem.atomistic.features.cartesian module

class ml4chem.atomistic.features.cartesian.Cartesian(scheduler='distributed', filename='cartesians.db', preprocessor=('Normalizer', None), save_preprocessor='ml4chem', overwrite=True)[source]

Bases: AtomisticFeatures

Cartesian Coordinates

Cartesian coordinates are features, too (though not very useful ones). This class takes images in ASE format and returns them hashed for use by ML4Chem.

Parameters:
  • scheduler (str) – The scheduler to be used with the dask backend.

  • filename (str) – Name to save on disk of serialized database.

  • preprocessor (tuple) – Use some scaling method to preprocess the data. Default Normalizer.

  • save_preprocessor (str) – Save preprocessor to file.

  • overwrite (bool) – If overwrite is set to True, ml4chem will not try to load existing databases. Default is True.

NAME = 'Cartesian'
calculate(images=None, purpose='training', data=None, svm=False)[source]

Return features per atom in an atoms object

Parameters:
  • images (dict) – Hashed images using the Data class.

  • purpose (str) – The supported purposes are: ‘training’, ‘inference’.

  • data (obj) – data object

  • svm (bool) – Whether or not these features are going to be used for kernel methods.

Returns:

feature_space – A dictionary with key hash and value as a list with the following structure: {‘hash’: [(‘H’, [vector])]}

Return type:

dict
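To illustrate the returned structure, the sketch below builds a Cartesian-style feature space in which each atom's feature vector is simply its position. The hash (SHA-1 of the symbols and positions) is a hypothetical stand-in for the hashes produced by ML4Chem's Data class, and no preprocessor such as Normalizer is applied.

```python
import hashlib


def hypothetical_image_hash(symbols, positions):
    """Stand-in hash; ML4Chem's Data class computes its own image hashes."""
    text = repr((tuple(symbols), tuple(tuple(p) for p in positions)))
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def cartesian_feature_space(images):
    """Map each (symbols, positions) image to {hash: [(symbol, [x, y, z]), ...]}."""
    feature_space = {}
    for symbols, positions in images:
        key = hypothetical_image_hash(symbols, positions)
        feature_space[key] = [(s, list(p)) for s, p in zip(symbols, positions)]
    return feature_space


water = (["O", "H", "H"], [(0.0, 0.0, 0.119), (0.0, 0.763, -0.477), (0.0, -0.763, -0.477)])
feature_space = cartesian_feature_space([water])
```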

get_atomic_features

Delayed class method to get atomic features

Parameters:
  • atom (object) – An ASE atom object.

  • svm (bool) – Whether or not these features are going to be used for kernel methods.

classmethod name()[source]

Returns name of class

restack_atom

Restack atoms to a raveled list to use with SVM

Parameters:
  • image_index (int) – Index of original hashed image.

  • atom (object) – An atom object.

  • scaled_feature_space (np.array) – A numpy array with the scaled features

Returns:

symbol, features – The atom's chemical symbol and its corresponding features.

Return type:

tuple

restack_image

Restack images into the dictionary structure used for training

Parameters:
  • index (int) – Index of original hashed image.

  • image (obj) – An ASE image object.

  • scaled_feature_space (np.array) – A numpy array with the scaled features

Returns:

key, features – The hashed key image and its corresponding features.

Return type:

tuple

to_pandas()[source]

Convert features to pandas DataFrame

ml4chem.atomistic.features.coulombmatrix module

class ml4chem.atomistic.features.coulombmatrix.CoulombMatrix(preprocessor=None, batch_size=None, filename='features.db', scheduler='distributed', save_preprocessor='ml4chem', overwrite=True, **kwargs)[source]

Bases: AtomisticFeatures, CoulombMatrix

Coulomb Matrix features

Parameters:
  • filename (str) – Path to save database. If the file already exists and overwrite is False, the features are loaded from it instead of being recomputed.

  • preprocessor (str) – Use some scaling method to preprocess the data. Default None.

  • batch_size (int) – Number of data points per batch to use for training. Default is None.

  • scheduler (str) – The scheduler to be used with the dask backend.

  • overwrite (bool) – If overwrite is set to True, ml4chem will not try to load existing databases. Default is True.

  • save_preprocessor (str) – Save preprocessor to file.

Notes

This class computes Coulomb matrix features using the dscribe module. As mentioned in ML4Chem’s paper, we avoid duplicating efforts, and this module serves as a demonstration.
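For reference, the standard Coulomb matrix has entries 0.5 Z_i^2.4 on the diagonal and Z_i Z_j / |R_i − R_j| off the diagonal. The pure-Python sketch below computes that plain matrix only; dscribe's CoulombMatrix additionally offers options such as padding and permutation handling, which are not reproduced here, and coulomb_matrix is a hypothetical helper name.

```python
import math


def coulomb_matrix(atomic_numbers, positions):
    """Plain Coulomb matrix: 0.5 * Z_i**2.4 on the diagonal, Z_i * Z_j / |R_i - R_j| elsewhere."""
    n = len(atomic_numbers)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 0.5 * atomic_numbers[i] ** 2.4
            else:
                distance = math.dist(positions[i], positions[j])
                matrix[i][j] = atomic_numbers[i] * atomic_numbers[j] / distance
    return matrix


# H2 with a bond length of 0.74 (in whatever distance unit the positions use).
cm = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 0.74)])
```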

NAME = 'CoulombMatrix'
calculate(images=None, purpose='training', data=None, svm=False)[source]

Calculate the features per atom in an atoms object

Parameters:
  • images (dict) – Hashed images using the Data class.

  • purpose (str) – The supported purposes are: ‘training’, ‘inference’.

  • data (obj) – data object

  • svm (bool) – Whether or not these features are going to be used for kernel methods.

Returns:

  • feature_space (dict) – A dictionary with key hash and value as a list with the following structure: {‘hash’: [(‘H’, [vector])]}

  • reference_space (dict) – A reference space useful for SVM models.

classmethod name()[source]

Returns name of class

stack_features(symbols, image_index, stacked_features)[source]

Stack features

to_pandas()[source]

Convert features to pandas DataFrame

ml4chem.atomistic.features.cutoff module

class ml4chem.atomistic.features.cutoff.Cosine(cutoff)[source]

Bases: object

Cosine cutoff function

Parameters:

cutoff (float) – The cutoff radius.
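The usual cosine cutoff from the Behler-Parrinello literature is f_c(R) = 0.5 [cos(πR/R_c) + 1] for R ≤ R_c and 0 otherwise, which smoothly takes neighbor contributions to zero at the cutoff radius. The sketch below assumes ml4chem's Cosine follows this standard form; CosineCutoffSketch is a hypothetical stand-in, and the real class may expose a different call interface.

```python
import math


class CosineCutoffSketch:
    """Callable cosine cutoff: 0.5 * (cos(pi * r / cutoff) + 1) for r <= cutoff, else 0."""

    def __init__(self, cutoff):
        self.cutoff = cutoff

    def __call__(self, r):
        if r > self.cutoff:
            return 0.0
        return 0.5 * (math.cos(math.pi * r / self.cutoff) + 1.0)


fc = CosineCutoffSketch(6.5)
```

The function equals 1 at R = 0, 0.5 at R_c/2, and 0 at and beyond R_c.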

ml4chem.atomistic.features.gaussian module

class ml4chem.atomistic.features.gaussian.Gaussian(cutoff=6.5, cutofffxn=None, normalized=True, preprocessor=('MinMaxScaler', None), custom=None, save_preprocessor='ml4chem', scheduler='distributed', filename='features.db', overwrite=True, angular_type='G3', weighted=False, batch_size=None)[source]

Bases: AtomisticFeatures

Behler-Parrinello symmetry functions

This class builds local chemical environments for atoms based on the Behler-Parrinello Gaussian-type symmetry functions. It is modular enough that it can also be used solely for creating feature spaces.

Parameters:
  • cutoff (float) – Cutoff radius used for computing features.

  • cutofffxn (object) – A Cutoff function object.

  • normalized (bool) – Whether to normalize the features with respect to the cutoff radius.

  • preprocessor (tuple) – Scaling method used to preprocess the data. Default is ('MinMaxScaler', None).

  • custom (dict, opt) –

    Create custom symmetry functions, and override defaults. Default is None. The structure of the dictionary is as follows:

    >>> custom = {'G2': {'etas': etas},
                  'G3': {'etas': a_etas, 'zetas': zetas, 'gammas': gammas}}
    

  • save_preprocessor (str) – Save preprocessor to file.

  • scheduler (str) – The scheduler to be used with the dask backend.

  • filename (str) – Path to save database. If the file already exists and overwrite is False, the features are loaded from it instead of being recomputed.

  • overwrite (bool) – If overwrite is set to True, ml4chem will not try to load existing databases. Default is True.

  • angular_type (str) – Compute “G3” or “G4” angular symmetry functions.

  • weighted (bool) – True if applying weighted feature of Gaussian function. See Ref. 2.

  • batch_size (int) – Number of data points per batch to use for training. Default is None.

References

  1. Behler, J. Atom-centered symmetry functions for constructing high-dimensional neural network potentials. J. Chem. Phys. 134, 074106 (2011).

  2. Gastegger, M., Schwiedrzik, L., Bittermann, M., Berzsenyi, F. & Marquetand, P. wACSF—Weighted atom-centered symmetry functions as descriptors in machine learning potentials. J. Chem. Phys. 148, 241709 (2018).

NAME = 'Gaussian'
calculate(images=None, purpose='training', data=None, svm=False, GP=None)[source]

Calculate the features per atom in an atoms object

Parameters:
  • images (dict) – Hashed images using the Data class.

  • purpose (str) – The supported purposes are: ‘training’, ‘inference’.

  • data (obj) – data object

  • svm (bool) – Whether or not these features are going to be used for kernel methods.

  • GP (dict) – Symmetry function parameters. Default is None.

Returns:

  • feature_space (dict) – A dictionary with key hash and value as a list with the following structure: {‘hash’: [(‘H’, [vector])]}

  • reference_space (dict) – A reference space useful for SVM models.

get_atomic_features

Delayed class method to compute atomic features

Parameters:
  • atom (object) – An ASE atom object.

  • image (ase object, list) – List of atoms in an image.

  • index (int) – Index of atom in atoms object.

  • symbol (str) – Chemical symbol of atom in atoms object.

  • n_symbols (ndarray of str) – Array of neighbors’ symbols.

  • neighborpositions (ndarray of float) – Array of Cartesian atomic positions.

  • image_molecule (ase object, list) – List of atoms in an image.

  • weighted (bool) – True if applying weighted feature of Gaussian function. See Ref. 2.

get_symmetry_functions(type, symbols, etas=None, zetas=None, gammas=None)[source]

Get requested symmetry functions

Parameters:
  • type (str) – The desired symmetry function: ‘G2’, ‘G3’, or ‘G4’.

  • symbols (list) – List of chemical symbols.

  • etas (list) – List of etas to build the Gaussian function.

  • zetas (list) – List of zetas to build the Gaussian function.

  • gammas (list) – List of gammas to build the Gaussian function.

make_symmetry_functions(symbols, custom=None, angular_type='G3')[source]

Function to make symmetry functions

This method needs at least the unique chemical symbols; default parameters are used unless they are overridden with custom.

Parameters:
  • symbols (list) – List of unique chemical symbols, for example:

    >>> symbols = ['H', 'O']

  • custom (dict, opt) –

    Create custom symmetry functions, and override defaults. Default is None. The structure of the dictionary is as follows:

    >>> custom = {'G2': {'etas': etas},
                  'G3': {'etas': a_etas, 'zetas': zetas, 'gammas': gammas}}

  • angular_type (str) – Compute “G3” or “G4” angular symmetry functions. Default is “G3”.

Returns:

GP – Symmetry function parameters.

Return type:

dict
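To see how the number of features grows with the number of elements, the sketch below enumerates, for each center element, one radial function per (neighbor element, eta) and one angular function per (unordered element pair, eta, zeta, gamma). The tuple layout and parameter values are hypothetical placeholders, not ml4chem's defaults, and the GP dictionary returned by make_symmetry_functions may be organized differently.

```python
from itertools import combinations_with_replacement


def sketch_symmetry_params(symbols, etas, a_etas, zetas, gammas):
    """Enumerate radial and angular symmetry-function parameter sets per center element."""
    params = {}
    for center in symbols:
        radial = [("G2", neighbor, eta) for neighbor in symbols for eta in etas]
        angular = [
            ("G3", pair, eta, zeta, gamma)
            for pair in combinations_with_replacement(symbols, 2)
            for eta in a_etas
            for zeta in zetas
            for gamma in gammas
        ]
        params[center] = radial + angular
    return params


params = sketch_symmetry_params(
    ["H", "O"], etas=[0.05, 4.0, 20.0, 80.0], a_etas=[0.005], zetas=[1.0, 4.0], gammas=[1.0, -1.0]
)
```

For two elements this yields 2 × 4 = 8 radial and 3 × 1 × 2 × 2 = 12 angular entries per center element; the angular count grows quadratically with the number of elements because of the element pairs.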

classmethod name()[source]

Returns name of class

print_features_params(GP)[source]

Print features parameters

stack_features(indices, stacked_features)[source]

Stack features

to_pandas()[source]

Convert features to pandas DataFrame

ml4chem.atomistic.features.gaussian.calculate_G2(n_numbers, neighborsymbols, neighborpositions, center_symbol, eta, cutoff, cutofffxn, Ri, image_molecule=None, n_indices=None, normalized=True, weighted=False)[source]

Calculate G2 symmetry function.

These correspond to two-body, or radial, interactions.

Parameters:
  • n_numbers (list of int) – List of neighbors’ atomic numbers.

  • neighborsymbols (list of str) – List of symbols of all neighbor atoms.

  • neighborpositions (list of list of floats) – List of Cartesian atomic positions.

  • center_symbol (str) – Chemical symbol of the center atom.

  • eta (float) – Parameter of Gaussian symmetry functions.

  • cutoff (float) – Cutoff radius.

  • cutofffxn (object) – Cutoff function.

  • Ri (list) – Position of the center atom. Should be fed as a list of three floats.

  • normalized (bool) – Whether or not the symmetry function is normalized.

  • image_molecule (ase object, list) – List of atoms in an image.

  • n_indices (list) – List of indices of neighboring atoms from the image object.

  • weighted (bool) – True if applying weighted feature of Gaussian function. See Ref. 2.

Returns:

feature – Radial feature.

Return type:

float
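For orientation, Behler's radial function (Ref. 1) with no shift is G2 = Σ_j exp(−η R_ij²) f_c(R_ij), and wACSFs (Ref. 2) multiply each neighbor's term by a weight such as its atomic number. The sketch below assumes that normalized=True divides R_ij² by R_c² and uses a cosine cutoff; g2_sketch is a hypothetical name and this is not ml4chem's implementation.

```python
import math


def cosine_cutoff(r, rc):
    """Cosine cutoff: 0.5 * (cos(pi * r / rc) + 1) for r <= rc, else 0."""
    if r > rc:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / rc) + 1.0)


def g2_sketch(center, neighbor_positions, eta, rc, normalized=True, weights=None):
    """Radial symmetry function with optional per-neighbor weights (e.g. atomic numbers)."""
    total = 0.0
    for index, position in enumerate(neighbor_positions):
        r_ij = math.dist(center, position)
        # Assumption: normalization expresses distances in units of the cutoff radius.
        exponent = eta * r_ij**2 / rc**2 if normalized else eta * r_ij**2
        weight = 1.0 if weights is None else weights[index]
        total += weight * math.exp(-exponent) * cosine_cutoff(r_ij, rc)
    return total


neighbors = [(0.0, 0.0, 1.0), (0.0, 0.0, -2.0)]
plain = g2_sketch((0.0, 0.0, 0.0), neighbors, eta=4.0, rc=6.5)
weighted = g2_sketch((0.0, 0.0, 0.0), neighbors, eta=4.0, rc=6.5, weights=[8, 8])
```

With equal weights of 8 (two oxygen neighbors, say), the weighted value is exactly eight times the unweighted one.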

ml4chem.atomistic.features.gaussian.calculate_G3(n_numbers, neighborsymbols, neighborpositions, G_elements, gamma, zeta, eta, cutoff, cutofffxn, Ri, normalized=True, image_molecule=None, n_indices=None, weighted=False)[source]

Calculate G3 symmetry function.

These correspond to three-body, or angular, interactions.

Parameters:
  • n_numbers (list of int) – List of neighbors’ atomic numbers.

  • neighborsymbols (list of str) – List of symbols of neighboring atoms.

  • neighborpositions (list of list of floats) – List of Cartesian atomic positions of neighboring atoms.

  • G_elements (list of str) – A two-member list with the chemical species of the two neighboring atoms that form the triangle with the center atom.

  • gamma (float) – Parameter of Gaussian symmetry functions.

  • zeta (float) – Parameter of Gaussian symmetry functions.

  • eta (float) – Parameter of Gaussian symmetry functions.

  • cutoff (float) – Cutoff radius.

  • cutofffxn (object) – Cutoff function.

  • Ri (list) – Position of the center atom. Should be fed as a list of three floats.

  • normalized (bool) – Whether or not the symmetry function is normalized.

  • image_molecule (ase object, list) – List of atoms in an image.

  • n_indices (list) – List of indices of neighboring atoms from the image object.

  • weighted (bool) – True if applying weighted feature of Gaussian function. See Ref. 2.

Returns:

feature – G3 feature value.

Return type:

float

ml4chem.atomistic.features.gaussian.calculate_G4(n_numbers, neighborsymbols, neighborpositions, G_elements, gamma, zeta, eta, cutoff, cutofffxn, Ri, normalized=True, image_molecule=None, n_indices=None, weighted=False)[source]

Calculate G4 symmetry function.

These correspond to three-body, or angular, interactions.

Parameters:
  • n_numbers (list of int) – List of neighbors’ atomic numbers.

  • neighborsymbols (list of str) – List of symbols of neighboring atoms.

  • neighborpositions (list of list of floats) – List of Cartesian atomic positions of neighboring atoms.

  • G_elements (list of str) – A two-member list with the chemical species of the two neighboring atoms that form the triangle with the center atom.

  • gamma (float) – Parameter of Gaussian symmetry functions.

  • zeta (float) – Parameter of Gaussian symmetry functions.

  • eta (float) – Parameter of Gaussian symmetry functions.

  • cutoff (float) – Cutoff radius.

  • cutofffxn (object) – Cutoff function.

  • Ri (list) – Position of the center atom. Should be fed as a list of three floats.

  • normalized (bool) – Whether or not the symmetry function is normalized.

  • image_molecule (ase object, list) – List of atoms in an image.

  • n_indices (list) – List of indices of neighboring atoms from the image object.

  • weighted (bool) – True if applying weighted feature of Gaussian function. See Ref. 2.

Returns:

feature – G4 feature value.

Return type:

float

Notes

The difference between the calculate_G3 and the calculate_G4 function is that calculate_G4 accounts for bond angles of 180 degrees.
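One way to read this note, assuming the G3 and G4 functions follow Behler's two angular forms (Ref. 1): the G3-like form also applies a Gaussian and a cutoff to the j–k distance, while the G4-like form uses only the two center–neighbor distances. In a linear j–i–k arrangement where both neighbors are inside the cutoff but R_jk is not, only the G4-like term survives. The sketch below illustrates this under those assumptions, without the normalization option; angular_term is a hypothetical name.

```python
import math


def cosine_cutoff(r, rc):
    """Cosine cutoff: 0.5 * (cos(pi * r / rc) + 1) for r <= rc, else 0."""
    if r > rc:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / rc) + 1.0)


def angular_term(center, pj, pk, eta, zeta, gamma, rc, include_jk):
    """One (j, k) contribution to a Behler-type angular symmetry function."""
    v_ij = [a - b for a, b in zip(pj, center)]
    v_ik = [a - b for a, b in zip(pk, center)]
    r_ij, r_ik, r_jk = math.dist(center, pj), math.dist(center, pk), math.dist(pj, pk)
    cos_theta = sum(a * b for a, b in zip(v_ij, v_ik)) / (r_ij * r_ik)
    if include_jk:
        # G3-like form: Gaussian and cutoff also applied to the j-k distance.
        r_sq = r_ij**2 + r_ik**2 + r_jk**2
        fc = cosine_cutoff(r_ij, rc) * cosine_cutoff(r_ik, rc) * cosine_cutoff(r_jk, rc)
    else:
        # G4-like form: only the two center-neighbor distances enter.
        r_sq = r_ij**2 + r_ik**2
        fc = cosine_cutoff(r_ij, rc) * cosine_cutoff(r_ik, rc)
    return 2.0 ** (1.0 - zeta) * (1.0 + gamma * cos_theta) ** zeta * math.exp(-eta * r_sq) * fc


# Linear j-i-k arrangement: both neighbors at 4.0, so r_jk = 8.0 exceeds rc = 6.5.
args = dict(center=(0.0, 0.0, 0.0), pj=(-4.0, 0.0, 0.0), pk=(4.0, 0.0, 0.0),
            eta=0.005, zeta=1.0, gamma=-1.0, rc=6.5)
with_jk = angular_term(include_jk=True, **args)
without_jk = angular_term(include_jk=False, **args)
```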

ml4chem.atomistic.features.gaussian.weighted_h(image_molecule, n_indices)[source]

Get the atomic numbers of the neighboring atoms in a molecule, then multiply the neighbors’ atomic numbers by one another.

Parameters:
  • image_molecule (ase object, list) – List of atoms in an image.

  • n_indices (list) – List of indices of neighboring atoms from the image object.
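In wACSFs (Ref. 2), angular terms can be weighted by products of neighbor atomic numbers. The sketch below shows that arithmetic on a list of chemical symbols, assuming pairwise products Z_j Z_k over unordered neighbor pairs. The small atomic-number table and the function name are hypothetical, and weighted_h's exact return layout may differ.

```python
from itertools import combinations

# Hypothetical lookup covering only the elements used in this example.
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8}


def neighbor_weight_products(symbols, n_indices):
    """Return Z_j * Z_k for every unordered pair of neighbors (j, k)."""
    numbers = [ATOMIC_NUMBERS[symbols[i]] for i in n_indices]
    return [z_j * z_k for z_j, z_k in combinations(numbers, 2)]


water = ["O", "H", "H"]
around_oxygen = neighbor_weight_products(water, [1, 2])    # two H neighbors
around_hydrogen = neighbor_weight_products(water, [0, 2])  # O and the other H
```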

Module contents

class ml4chem.atomistic.features.Cartesian(scheduler='distributed', filename='cartesians.db', preprocessor=('Normalizer', None), save_preprocessor='ml4chem', overwrite=True)[source]

Bases: AtomisticFeatures

Package-level export of ml4chem.atomistic.features.cartesian.Cartesian; see the cartesian module above for the full documentation.

class ml4chem.atomistic.features.CoulombMatrix(preprocessor=None, batch_size=None, filename='features.db', scheduler='distributed', save_preprocessor='ml4chem', overwrite=True, **kwargs)[source]

Bases: AtomisticFeatures, CoulombMatrix

Package-level export of ml4chem.atomistic.features.coulombmatrix.CoulombMatrix; see the coulombmatrix module above for the full documentation.

class ml4chem.atomistic.features.Gaussian(cutoff=6.5, cutofffxn=None, normalized=True, preprocessor=('MinMaxScaler', None), custom=None, save_preprocessor='ml4chem', scheduler='distributed', filename='features.db', overwrite=True, angular_type='G3', weighted=False, batch_size=None)[source]

Bases: AtomisticFeatures

Package-level export of ml4chem.atomistic.features.gaussian.Gaussian; see the gaussian module above for the full documentation.

class ml4chem.atomistic.features.LatentFeatures(encoder=None, scheduler='distributed', filename='latent.db', preprocessor=None, features=None, save_preprocessor='latentfeatures.scaler')[source]

Bases: AtomisticFeatures

Package-level export of ml4chem.atomistic.features.autoencoders.LatentFeatures; see the autoencoders module above for the full documentation.