ml4chem.optim package
Submodules
ml4chem.optim.LBFGS module
- class ml4chem.optim.LBFGS.FullBatchLBFGS(params, lr=1, history_size=10, line_search='Wolfe', dtype=torch.float32, debug=False)[source]
Bases:
LBFGS
Implements a full-batch (deterministic) L-BFGS algorithm, compatible with Powell damping. Intended for use when evaluating a deterministic function and gradient. Wraps the LBFGS optimizer and performs the two-loop recursion, parameter update, and curvature update in a single step.
Implemented by Hao-Jun Michael Shi and Dheevatsa Mudigere. Last edited 11/15/18.
Warning
- Does not support per-parameter options and parameter groups.
- All parameters have to be on a single device.
- Inputs:
lr (float): steplength or learning rate (default: 1)
history_size (int): update history size (default: 10)
line_search (str): designates line search to use (default: ‘Wolfe’)
- Options:
‘None’: uses steplength designated in algorithm
‘Armijo’: uses Armijo backtracking line search
‘Wolfe’: uses Armijo-Wolfe bracketing line search
dtype: data type (default: torch.float)
debug (bool): debugging mode (default: False)
- step(options={})[source]
Performs a single optimization step.
- Inputs:
options (dict): contains options for performing line search
- General Options:
‘eps’ (float): constant for curvature pair rejection or damping (default: 1e-2)
‘damping’ (bool): flag for using Powell damping (default: False)
- Options for Armijo backtracking line search:
‘closure’ (callable): reevaluates model and returns function value
‘current_loss’ (tensor): objective value at current iterate (default: F(x_k))
‘gtd’ (tensor): inner product g_Ok’d in line search (default: g_Ok’d)
‘eta’ (tensor): factor for decreasing steplength > 0 (default: 2)
‘c1’ (tensor): sufficient decrease constant in (0, 1) (default: 1e-4)
‘max_ls’ (int): maximum number of line search steps permitted (default: 10)
‘interpolate’ (bool): flag for using interpolation (default: True)
‘inplace’ (bool): flag for inplace operations (default: True)
‘ls_debug’ (bool): debugging mode for line search
- Options for Wolfe line search:
‘closure’ (callable): reevaluates model and returns function value
‘current_loss’ (tensor): objective value at current iterate (default: F(x_k))
‘gtd’ (tensor): inner product g_Ok’d in line search (default: g_Ok’d)
‘eta’ (float): factor for extrapolation (default: 2)
‘c1’ (float): sufficient decrease constant in (0, 1) (default: 1e-4)
‘c2’ (float): curvature condition constant in (0, 1) (default: 0.9)
‘max_ls’ (int): maximum number of line search steps permitted (default: 10)
‘interpolate’ (bool): flag for using interpolation (default: True)
‘inplace’ (bool): flag for inplace operations (default: True)
‘ls_debug’ (bool): debugging mode for line search
- Outputs (depends on line search):
- No line search:
t (float): steplength
- Armijo backtracking line search:
F_new (tensor): loss function at new iterate
t (tensor): final steplength
ls_step (int): number of backtracks
closure_eval (int): number of closure evaluations
desc_dir (bool): descent direction flag
True: p_k is a descent direction with respect to the line search function
False: p_k is not a descent direction with respect to the line search function
- fail (bool): failure flag
True: line search reached maximum number of iterations, failed
False: line search succeeded
- Wolfe line search:
F_new (tensor): loss function at new iterate
g_new (tensor): gradient at new iterate
t (float): final steplength
ls_step (int): number of backtracks
closure_eval (int): number of closure evaluations
grad_eval (int): number of gradient evaluations
desc_dir (bool): descent direction flag
True: p_k is a descent direction with respect to the line search function
False: p_k is not a descent direction with respect to the line search function
- fail (bool): failure flag
True: line search reached maximum number of iterations, failed
False: line search succeeded
Notes
- If a line search failure is encountered in the deterministic setting, try increasing the maximum number of line search steps max_ls.
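The Armijo backtracking loop documented above can be sketched in plain Python. This is a simplified, scalar-valued illustration of the technique, not the ml4chem implementation; the function name `armijo_backtrack` is hypothetical, and only the options eta, c1, and max_ls from the list above are modeled:

```python
def armijo_backtrack(f, x, p, g, t=1.0, eta=2.0, c1=1e-4, max_ls=10):
    """Simplified Armijo backtracking line search (illustration only).

    f: objective (the 'closure'), x: current iterate, p: search direction,
    g: gradient at x. Returns (F_new, t, ls_step, fail), mirroring a subset
    of the outputs documented above.
    """
    F_k = f(x)
    gtd = g * p                      # directional derivative g'p
    ls_step = 0
    F_new = f(x + t * p)
    # backtrack until sufficient decrease: F(x + t*p) <= F(x) + c1 * t * g'p
    while F_new > F_k + c1 * t * gtd:
        if ls_step >= max_ls:
            return F_new, t, ls_step, True   # line search failed
        t /= eta
        F_new = f(x + t * p)
        ls_step += 1
    return F_new, t, ls_step, False
```

For f(x) = x² at x = 1 with the steepest-descent direction p = -2, the initial step t = 1 overshoots and one backtrack lands exactly at the minimizer, returning (0.0, 0.5, 1, False).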
- class ml4chem.optim.LBFGS.LBFGS(params, lr=1, history_size=10, line_search='Wolfe', dtype=torch.float32, debug=False)[source]
Bases:
Optimizer
Implements the L-BFGS algorithm. Compatible with multi-batch and full-overlap L-BFGS implementations and (stochastic) Powell damping. Partly based on the original L-BFGS implementation in PyTorch, Mark Schmidt’s minFunc MATLAB code, and Michael Overton’s weak Wolfe line search MATLAB code.
Implemented by Hao-Jun Michael Shi and Dheevatsa Mudigere. Last edited 12/6/18.
Warning
- Does not support per-parameter options and parameter groups.
- All parameters have to be on a single device.
- Inputs:
lr (float): steplength or learning rate (default: 1)
history_size (int): update history size (default: 10)
line_search (str): designates line search to use (default: ‘Wolfe’)
- Options:
‘None’: uses steplength designated in algorithm
‘Armijo’: uses Armijo backtracking line search
‘Wolfe’: uses Armijo-Wolfe bracketing line search
dtype: data type (default: torch.float)
debug (bool): debugging mode (default: False)
References:
[1] Berahas, Albert S., Jorge Nocedal, and Martin Takáč. “A Multi-Batch L-BFGS Method for Machine Learning.” Advances in Neural Information Processing Systems. 2016.
[2] Bollapragada, Raghu, et al. “A Progressive Batching L-BFGS Method for Machine Learning.” International Conference on Machine Learning. 2018.
[3] Lewis, Adrian S., and Michael L. Overton. “Nonsmooth Optimization via Quasi-Newton Methods.” Mathematical Programming 141.1-2 (2013): 135-163.
[4] Liu, Dong C., and Jorge Nocedal. “On the Limited Memory BFGS Method for Large Scale Optimization.” Mathematical Programming 45.1-3 (1989): 503-528.
[5] Nocedal, Jorge. “Updating Quasi-Newton Matrices With Limited Storage.” Mathematics of Computation 35.151 (1980): 773-782.
[6] Nocedal, Jorge, and Stephen J. Wright. “Numerical Optimization.” Springer New York.
[7] Schmidt, Mark. “minFunc: Unconstrained Differentiable Multivariate Optimization in Matlab.” Software available at http://www.cs.ubc.ca/~schmidtm/Software/minFunc.html (2005).
[8] Schraudolph, Nicol N., Jin Yu, and Simon Günter. “A Stochastic Quasi-Newton Method for Online Convex Optimization.” Artificial Intelligence and Statistics. 2007.
[9] Wang, Xiao, et al. “Stochastic Quasi-Newton Methods for Nonconvex Stochastic Optimization.” SIAM Journal on Optimization 27.2 (2017): 927-956.
- curvature_update(flat_grad, eps=0.01, damping=False)[source]
Performs curvature update.
- Inputs:
flat_grad (tensor): 1-D tensor of the flattened gradient, used to compute the gradient difference with the previously stored gradient
eps (float): constant for curvature pair rejection or damping (default: 1e-2)
damping (bool): flag for using Powell damping (default: False)
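The idea behind curvature pair rejection and Powell damping can be sketched in plain Python. This is a hedged illustration, not ml4chem's code: the function `accept_curvature_pair` is hypothetical, vectors are plain lists, and the damping sketch uses the identity as the stand-in Hessian approximation B (the real update uses the current L-BFGS matrix):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def accept_curvature_pair(s, y, eps=1e-2, damping=False):
    """Simplified curvature-pair test (illustration only).

    Without damping, the pair (s, y) is rejected unless y's is sufficiently
    positive relative to ||s||^2 (the exact test in ml4chem may differ).
    With Powell damping, y is blended toward s instead of being rejected.
    Returns (accepted, possibly-modified y).
    """
    sTy = dot(s, y)
    sTs = dot(s, s)
    if damping:
        # Powell damping sketch with B = I: if s'y < eps * s's, replace y
        # with theta*y + (1 - theta)*s so that s'y_new = eps * s's exactly
        if sTy < eps * sTs:
            theta = (1.0 - eps) * sTs / (sTs - sTy)
            y = [theta * yi + (1.0 - theta) * si for yi, si in zip(y, s)]
        return True, y
    return sTy > eps * sTs, y
```

A well-conditioned pair such as s = [1.0], y = [2.0] is accepted unchanged; a pair with negative curvature (y = [-1.0]) is damped so that s'y_new equals eps * s's.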
- line_search(line_search)[source]
Switches line search option.
- Inputs:
- line_search (str): designates line search to use
- Options:
‘None’: uses steplength designated in algorithm
‘Armijo’: uses Armijo backtracking line search
‘Wolfe’: uses Armijo-Wolfe bracketing line search
- step(p_k, g_Ok, g_Sk=None, options={})[source]
Performs a single optimization step (parameter update).
- Parameters:
closure (Callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.
Note
Unless otherwise specified, this function should not modify the .grad field of the parameters.
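The search direction p_k passed to step is typically produced by the classic L-BFGS two-loop recursion over the stored curvature pairs (s, y). A minimal, self-contained sketch of that recursion, using plain Python lists as vectors (an illustration of the algorithm, not the ml4chem implementation):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def two_loop_recursion(grad, s_hist, y_hist):
    """Compute the L-BFGS direction -H_k * grad from curvature pairs.

    s_hist[i], y_hist[i] are the i-th stored parameter/gradient differences,
    oldest first. Returns the search direction as a list.
    """
    q = list(grad)
    alphas = [0.0] * len(s_hist)
    # first loop: newest to oldest
    for i in reversed(range(len(s_hist))):
        rho = 1.0 / dot(y_hist[i], s_hist[i])
        alphas[i] = rho * dot(s_hist[i], q)
        q = [qj - alphas[i] * yj for qj, yj in zip(q, y_hist[i])]
    # initial Hessian scaling gamma = s'y / y'y (identity when history is empty)
    if s_hist:
        gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1])
    else:
        gamma = 1.0
    r = [gamma * qj for qj in q]
    # second loop: oldest to newest
    for i in range(len(s_hist)):
        rho = 1.0 / dot(y_hist[i], s_hist[i])
        beta = rho * dot(y_hist[i], r)
        r = [rj + s_hist[i][j] * (alphas[i] - beta) for j, rj in enumerate(r)]
    return [-rj for rj in r]
```

With an empty history this reduces to steepest descent (direction = -grad); for f(x) = x² with one stored pair s = [1.0], y = [2.0] and gradient [2.0], it recovers the exact Newton step [-1.0].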
- ml4chem.optim.LBFGS.is_legal(v)[source]
Checks that tensor is not NaN or Inf.
- Inputs:
v (tensor): tensor to be checked
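The check performed by is_legal amounts to verifying that every entry is finite. A stdlib-only analogue over a list of floats (an illustration of the same test, not the tensor-based ml4chem code):

```python
import math

def is_legal(values):
    """True if no entry is NaN or Inf (list analogue of the tensor check)."""
    return all(math.isfinite(v) for v in values)
```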
- ml4chem.optim.LBFGS.polyinterp(points, x_min_bound=None, x_max_bound=None, plot=False)[source]
Gives the minimizer and minimum of the interpolating polynomial over given points based on function and derivative information. Defaults to bisection if no critical points are valid.
Based on polyinterp.m Matlab function in minFunc by Mark Schmidt with some slight modifications.
Implemented by Hao-Jun Michael Shi and Dheevatsa Mudigere. Last edited 12/6/18.
- Inputs:
points (ndarray): two-dimensional array in which each row has the form [x f g]
x_min_bound (float): minimum value that brackets minimum (default: minimum of points)
x_max_bound (float): maximum value that brackets minimum (default: maximum of points)
plot (bool): plot interpolating polynomial (default: False)
- Outputs:
x_sol (float): minimizer of interpolating polynomial
F_min (float): minimum of interpolating polynomial
Note
Set f or g to np.nan if they are unknown.
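The simplest case polyinterp handles is quadratic interpolation through two points of the form [x, f, g], using f and g at the first point and f at the second. A hedged, stdlib-only sketch of that special case (the real function fits higher-order polynomials and falls back to bisection; `quad_interp_min` is a hypothetical name):

```python
def quad_interp_min(p0, p1):
    """Minimizer of the quadratic through two [x, f, g] points.

    Uses f0, g0 at p0 and f1 at p1. Returns (x_sol, F_min), mirroring the
    outputs documented above. Illustration only.
    """
    x0, f0, g0 = p0
    x1, f1, _ = p1
    h = x1 - x0
    # q(t) = f0 + g0*t + c*t^2 interpolates f0, g0 at t = 0 and f1 at t = h
    c = (f1 - f0 - g0 * h) / (h * h)
    if c <= 0:
        t = 0.5 * h          # no interior minimizer: fall back to bisection
    else:
        t = -g0 / (2.0 * c)  # vertex of the quadratic
    return x0 + t, f0 + g0 * t + c * t * t
```

For the symmetric points (0, 0, -1) and (1, 0, 1), the quadratic model places the minimizer at x = 0.5 with model value -0.25.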
ml4chem.optim.handler module
- ml4chem.optim.handler.get_lr(optimizer)[source]
Get the current learning rate.
- Parameters:
optimizer (obj) – An optimizer object.
- Returns:
Current learning rate.
- Return type:
lr
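Standard PyTorch optimizers store their learning rate in optimizer.param_groups, a list of dicts with an 'lr' key, which is presumably what get_lr reads. A stdlib-only sketch, with a hypothetical StubOptimizer standing in for a real torch.optim object:

```python
def get_lr(optimizer):
    """Return the learning rate of the first parameter group (a sketch of
    what a helper like this does for standard PyTorch optimizers)."""
    for param_group in optimizer.param_groups:
        return param_group["lr"]

class StubOptimizer:
    """Stand-in for a torch.optim optimizer (hypothetical, illustration only)."""
    def __init__(self, lr):
        self.param_groups = [{"lr": lr}]
```

Usage: `get_lr(StubOptimizer(1e-2))` returns the configured rate 0.01.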
- ml4chem.optim.handler.get_lr_scheduler(optimizer, lr_scheduler)[source]
Get a learning rate scheduler
With a learning rate scheduler it is possible to perform training with an adaptive learning rate.
- Parameters:
optimizer (obj) – An optimizer object.
lr_scheduler (tuple) –
Tuple with structure: scheduler’s name and a dictionary with keyword arguments.
>>> scheduler = ('ReduceLROnPlateau', {'mode': 'min', 'patience': 10})
- Returns:
scheduler – A learning rate scheduler object that can be used to train models.
- Return type:
obj
Notes
For a list of schedulers and respective keyword arguments, please refer to https://pytorch.org/docs/stable/_modules/torch/optim/lr_scheduler.html
- ml4chem.optim.handler.get_optimizer(optimizer, params)[source]
Get an optimizer to train PyTorch models.
There are several optimizers available in PyTorch, and all of them take different parameters. This function takes as arguments an optimizer tuple with the following structure:
>>> optimizer = ('adam', {'lr': 1e-2, 'weight_decay': 1e-6})
and returns an optimizer object.
- Parameters:
optimizer (tuple) – Tuple with name of optimizer and keyword arguments of optimizer as shown above.
params (list) – Parameters obtained from model.parameters() method.
- Returns:
optimizer – An optimizer object.
- Return type:
obj
Notes
For a list of all supported optimizers please check:
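The tuple-based dispatch that get_optimizer performs can be sketched without torch: a name selects a class, and the keyword arguments are passed to its constructor. In this hedged illustration a plain dict registry and a FakeAdam class stand in for torch.optim; the real function resolves names against PyTorch's optimizer classes:

```python
class FakeAdam:
    """Stand-in for torch.optim.Adam (hypothetical, illustration only)."""
    def __init__(self, params, lr=1e-3, weight_decay=0.0):
        self.param_groups = [{"params": params, "lr": lr,
                              "weight_decay": weight_decay}]

# registry mapping lowercase optimizer names to classes
REGISTRY = {"adam": FakeAdam}

def get_optimizer(optimizer, params):
    """Resolve ('name', kwargs) to an instantiated optimizer object."""
    name, kwargs = optimizer
    cls = REGISTRY[name.lower()]
    return cls(params, **kwargs)
```

Usage mirrors the example above: `get_optimizer(('adam', {'lr': 1e-2, 'weight_decay': 1e-6}), params)` returns an object whose first parameter group carries the requested settings.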