ml4chem package

Subpackages

Submodules

ml4chem.active module

class ml4chem.active.ActiveLearning(labeled, unlabeled, atomistic=True)[source]

Bases: object

Active Learning

Parameters:
  • labeled (list) – List of graphs or objects.

  • unlabeled (object) – List of graphs or objects.

  • atomistic (bool, optional) – Whether to use atomistic similarities, by default True.

run(kernel, max_variance=10, max_iter=None)[source]

Run the ActiveLearning class

Parameters:
  • kernel (object) – A kernel to measure similarity.

  • max_variance (float, optional) – Maximum variance allowed, by default 10.

  • max_iter (int, optional) – Maximum number of iterations allowed, by default None.

ml4chem.metrics module

ml4chem.metrics.compute_mae(outputs, targets, atoms_per_image=None)[source]

Compute MAE

Useful when using futures.

Parameters:
  • outputs (list) – List of outputs.

  • targets (list) – List of targets.

  • atoms_per_image (list) – List of atoms per image.

Returns:

mae – Mean absolute error.

Return type:

float

ml4chem.metrics.compute_mse(outputs, targets, atoms_per_image=None)[source]

Compute MSE

Useful when using futures.

Parameters:
  • outputs (list) – List of outputs.

  • targets (list) – List of targets.

  • atoms_per_image (list) – List of atoms per image.

Returns:

mse – Mean squared error.

Return type:

float

ml4chem.metrics.compute_rmse(outputs, targets, atoms_per_image=None)[source]

Compute RMSE

Useful when using futures.

Parameters:
  • outputs (list) – List of outputs.

  • targets (list) – List of targets.

  • atoms_per_image (list) – List of atoms per image.

Returns:

rmse – Root-mean squared error.

Return type:

float
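
The three metrics above can be sketched in plain Python. This is an illustrative stand-in, not the library's code: the helper names (mae, mse, rmse, _per_atom) are hypothetical, and dividing each output and target by its image's atom count is an assumption about how atoms_per_image is applied.

```python
import math


def _per_atom(values, atoms_per_image):
    """Divide each value by its image's atom count (assumed normalization)."""
    if atoms_per_image is None:
        return list(values)
    return [v / n for v, n in zip(values, atoms_per_image)]


def mae(outputs, targets, atoms_per_image=None):
    """Mean absolute error between outputs and targets."""
    outputs = _per_atom(outputs, atoms_per_image)
    targets = _per_atom(targets, atoms_per_image)
    return sum(abs(o - t) for o, t in zip(outputs, targets)) / len(outputs)


def mse(outputs, targets, atoms_per_image=None):
    """Mean squared error between outputs and targets."""
    outputs = _per_atom(outputs, atoms_per_image)
    targets = _per_atom(targets, atoms_per_image)
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


def rmse(outputs, targets, atoms_per_image=None):
    """Root-mean-squared error: the square root of the MSE."""
    return math.sqrt(mse(outputs, targets, atoms_per_image))
```

Passing atoms_per_image turns total-energy errors into per-atom errors, which makes images of different sizes comparable.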

ml4chem.utils module

ml4chem.utils.convert_elapsed_time(seconds)[source]

Convert elapsed time in seconds to HH:MM:SS format
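
A minimal sketch of the conversion, using divmod to split seconds into hours, minutes, and seconds. The function name format_elapsed is hypothetical, and returning a formatted string is an assumption: this page does not state the return type of convert_elapsed_time.

```python
def format_elapsed(seconds):
    """Format a duration in seconds as HH:MM:SS (hypothetical helper)."""
    # Fractional seconds are truncated in this sketch.
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


print(format_elapsed(3661))  # prints 01:01:01
```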

ml4chem.utils.dynamic_import(name, package, alt_name=None)[source]

A dynamic module importer

Parameters:
  • name (str) – Name of the module to be imported.

  • package (str) – Path to package. Example: ml4chem.atomistic.features

  • alt_name (str) – Alternative module_name.

Returns:

_class – A class object.

Return type:

obj
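
Dynamic importing of this kind can be sketched with importlib from the standard library. The function load_class below is a hypothetical stand-in. It assumes the class lives in a submodule named after the lowercased class name, with alt_name overriding that submodule name. Standard-library packages are used here so the sketch runs without ml4chem installed.

```python
import importlib


def load_class(name, package, alt_name=None):
    """Import `name` from a submodule of `package` (hypothetical helper).

    The submodule is assumed to be the lowercased class name unless
    `alt_name` is given.
    """
    module_name = "." + (alt_name or name).lower()
    module = importlib.import_module(module_name, package=package)
    return getattr(module, name)


# Class name and submodule name differ, so alt_name is needed.
decoder_cls = load_class("JSONDecoder", "json", alt_name="decoder")

# Submodule name matches the lowercased class name.
parser_cls = load_class("Parser", "email")
```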

ml4chem.utils.get_chunks(sequence, chunk_size, svm=True)[source]

A function that yields a list in chunks

Parameters:
  • sequence (list or dictionary) – A list or a dictionary to be split.

  • chunk_size (int) – Number of elements in each group.

  • svm (bool) – Whether or not these chunks are going to be used for kernel methods.
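
Chunking a list or dictionary can be sketched as a generator. The name chunked is hypothetical, and the sketch ignores the svm flag. Yielding (key, value) pairs for dictionaries is an assumption.

```python
def chunked(sequence, chunk_size):
    """Yield successive lists of at most `chunk_size` items (hypothetical helper)."""
    if isinstance(sequence, dict):
        # Assumption: dictionaries are split into (key, value) pairs.
        sequence = sequence.items()
    chunk = []
    for item in sequence:
        chunk.append(item)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # The last chunk may hold fewer than `chunk_size` items.
        yield chunk


for group in chunked([1, 2, 3, 4, 5], 2):
    print(group)
```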

ml4chem.utils.get_hash(image)[source]

Get the SHA1 hash of an image object

Parameters:

image (object) – An image to be hashed.

Returns:

_hash – Hash of image in string format

Return type:

str
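
The idea of hashing a structure with SHA1 can be sketched with hashlib. Which image properties get_hash actually feeds into the hash is not stated here; the choice of chemical symbols and positions below, and the name sha1_of_structure, are assumptions for illustration.

```python
import hashlib


def sha1_of_structure(symbols, positions):
    """Return a SHA1 hex digest for symbols and positions (hypothetical helper)."""
    parts = []
    for symbol, (x, y, z) in zip(symbols, positions):
        # Fixed-width formatting keeps the digest stable across runs.
        parts.append(f"{symbol}:{x:.8f},{y:.8f},{z:.8f}")
    payload = ";".join(parts).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


water = sha1_of_structure(
    ["O", "H", "H"],
    [(0.0, 0.0, 0.0), (0.757, 0.586, 0.0), (-0.757, 0.586, 0.0)],
)
```

Identical structures produce identical digests, so the hash can serve as a dictionary key for cached features.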

ml4chem.utils.get_header_message()[source]

Function that returns ML4Chem header

ml4chem.utils.get_neighborlist(image, cutoff)[source]

Get the list of neighbors

Parameters:

image (object) – ASE image.

Returns:

A list of neighbors with offset distances.

ml4chem.utils.get_number_of_parameters(model)[source]

Get the number of parameters

Parameters:

model (obj) – PyTorch model to perform forward() and get gradients.

Returns:

(total_params, train_params) – Tuple with the total number of parameters and the number of trainable parameters.

ml4chem.utils.lod_to_list(data, svm=False, requires_grad=False)[source]

List Of Dict (lod) to list

Parameters:
  • data (list) – A list of ml4chem dictionaries, such as those produced by get_chunks().

  • svm (bool, optional) – Whether or not these chunks are going to be used for kernel methods, by default False.

  • requires_grad (bool, optional) – Whether gradients are required, by default False.

Returns:

_list – A list of tensors or a list of floats.

Return type:

list

ml4chem.utils.logger(filename=None, level=None, format=None, filemode='a')[source]

A wrapper to the logging python module

This wrapper is useful when messages must be logged to different files from inside a for loop. It also leaves room for the logging format to evolve later.

Parameters:
  • filename (str, optional) – Name of logfile. If no filename is provided, we output to stdout.

  • level (str, optional) – Level of logging messages, by default ‘info’. Supported are: ‘info’ and ‘debug’.

  • format (str, optional) – Format of logging messages, by default ‘%(message)s’.

  • filemode (str, optional) – If filename is specified, open the file in this mode. Defaults to “a”. Supported modes are: “r” (read), “w” (write), “a” (append).

Returns:

A logger object.

Return type:

logger
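
A wrapper with the parameters listed above can be sketched on top of the standard logging module. The name make_logger, the extra name argument, and the handler reset are assumptions for illustration. The library's own implementation may differ.

```python
import logging
import sys

# Level names accepted by this sketch, mapped to logging constants.
LEVELS = {"info": logging.INFO, "debug": logging.DEBUG}


def make_logger(filename=None, level="info", fmt="%(message)s",
                filemode="a", name="sketch"):
    """Return a configured logger (hypothetical helper)."""
    log = logging.getLogger(name)
    log.setLevel(LEVELS[level])
    # Remove handlers from earlier calls so repeated calls inside a
    # loop do not duplicate messages.
    for handler in list(log.handlers):
        log.removeHandler(handler)
        handler.close()
    if filename is None:
        handler = logging.StreamHandler(sys.stdout)
    else:
        handler = logging.FileHandler(filename, mode=filemode)
    handler.setFormatter(logging.Formatter(fmt))
    log.addHandler(handler)
    log.propagate = False
    return log
```

Calling make_logger(filename=...) with a new filename on each loop iteration redirects output to that file, which is the use case described above.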

ml4chem.visualization module

ml4chem.visualization.parity(predictions, true, scores=False, filename=None, **kwargs)[source]

A parity plot function

Parameters:
  • predictions (list or ndarray) – Model predictions in a list.

  • true (list or ndarray) – Targets or true values.

  • scores (bool) – Print scores in parity plot.

  • filename (str) – Name of the file to save the plot to. If no filename is given, plt.show() is called instead.

Notes

kwargs accepts all valid keyword arguments for matplotlib.pyplot.savefig.

ml4chem.visualization.plot_atomic_features(latent_space, method='PCA', dimensions=2, backend='seaborn', data_only=False, preprocessor=None, backend_kwargs=None, **kwargs)[source]

Plot high dimensional atomic feature vectors

This function can take a feature space dictionary, or a database file and plot the atomic features using PCA or t-SNE.

$ ml4chem --plot tsne --file path.db

Parameters:
  • latent_space (dict or str) – Dictionary of atomic features or path to database file.

  • method (str, optional) – Dimensionality reduction method to employ, by default “PCA”. Supported are: “PCA” and “TSNE”.

  • dimensions (int, optional) – Number of dimensions to reduce the high dimensional atomic feature vectors, by default 2.

  • backend (str, optional) – Select the backend to plot features. Supported are “plotly” and “seaborn”, by default “seaborn”.

  • preprocessor (obj) – One of the preprocessors supported by sklearn e.g.: StandardScaler(), Normalizer().

  • backend_kwargs (dict) –

    Dictionary with extra keyword arguments to extend functionality of backends that cannot be set with the defaults keyword arguments of the plot_atomic_features function.

    For more information see:

  • data_only (bool) – If set to True, this function returns only data in a dataframe with the following structure:

ml4chem.visualization.read_log(logfile, metric='loss', refresh=None, data_only=False)[source]

Read the logfile

Parameters:
  • logfile (str) – Path to logfile.

  • metric (str) –

    Metric to read from the logfile. Supported values are:

    • “loss”: Loss function values.

    • “training”: Training error.

    • “test”: Test error.

    • “combined”: Training and test errors in the same plot.

  • refresh (float) – Interval in seconds before refreshing log file plot.

  • data_only (bool) – If set to True, this function returns only data in a dataframe with the following structure:

    >>> df.head()
       epochs      loss  training      test
    0       1  33779.46  815.6884  793.3943

Returns:

If data_only is true we return dataframe, otherwise a figure.

Return type:

pandas.DataFrame or matplotlib.pyplot object

Module contents