The ability to differentiate your code is most often used in
combination with a gradient-based optimization algorithm, whose job
is to find the state vector x of input variables that
minimizes the scalar function J(x) by repeated
calculation of the value of J, its first derivative
∂J/∂x (a vector) and optionally its second derivative
∂²J/∂x² (the Hessian matrix). Since version 2.1, Adept has included several
minimization algorithms, all of which can be used with or without
box constraints on the values of x. Usage is described in
Chapter 4 of the User
Guide.
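To show concretely how a minimizer obtains these quantities, the sketch below uses Adept's core reverse-mode interface (adept::Stack and adept::adouble) to evaluate a cost function and its gradient, then takes naive fixed-step descent steps. It is only an illustration of the pattern: the quadratic cost function and the fixed step size are invented for this example, and in practice you would use Adept's built-in minimizers described in Chapter 4 of the User Guide.

    #include <iostream>
    #include <adept.h>

    using adept::adouble;

    // Illustrative cost function: a quadratic bowl with minimum at x = (3, -1)
    adouble cost_function(const adouble x[2]) {
      return (x[0] - 3.0) * (x[0] - 3.0) + 2.0 * (x[1] + 1.0) * (x[1] + 1.0);
    }

    // Evaluate J(x) and dJ/dx using Adept's reverse-mode differentiation
    double cost_and_gradient(const double x_val[2], double gradient[2]) {
      adept::Stack stack;                  // holds the recorded derivative information
      adouble x[2] = {x_val[0], x_val[1]}; // active copies of the input variables
      stack.new_recording();               // start recording
      adouble J = cost_function(x);
      J.set_gradient(1.0);                 // seed the reverse pass with dJ/dJ = 1
      stack.compute_adjoint();             // propagate derivatives back to the inputs
      gradient[0] = x[0].get_gradient();   // dJ/dx[0]
      gradient[1] = x[1].get_gradient();   // dJ/dx[1]
      return adept::value(J);
    }

    int main() {
      double x[2] = {0.0, 0.0};            // first guess
      double g[2];
      const double step = 0.1;             // fixed step; real minimizers use a line search
      for (int iter = 0; iter < 100; ++iter) {
        cost_and_gradient(x, g);
        x[0] -= step * g[0];               // steepest-descent update
        x[1] -= step * g[1];
      }
      std::cout << "Minimum near x = (" << x[0] << ", " << x[1] << ")\n";
    }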
The video below illustrates three of these algorithms minimizing
the two-dimensional Rosenbrock
"banana" function using Adept's test_minimizer program,
subject to the constraint that the solution cannot lie in the shaded
area on the left. The colours indicate the value of the cost
function J. In two dimensions Rosenbrock's function is
J(x₁, x₂) = (1 − x₁)² + 100(x₂ − x₁²)², with its global minimum of
zero at x = (1, 1); it is a challenging test of a minimization
algorithm because it is easy to find the valley (in yellow) but
difficult to locate the lowest point along it. Real-world problems
are usually of much higher dimensionality, and are therefore more
difficult to visualize.
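Written as code, the same function is a one-liner. Templating it on the numeric type (a common Adept idiom) means it can be evaluated with plain doubles for plotting, or with adept::adouble so that the gradient can be computed exactly as in the sketch above:

    // The two-dimensional Rosenbrock "banana" function. Templating on the
    // numeric type lets it be called with double, or with adept::adouble
    // when derivatives are needed.
    template <typename Type>
    Type rosenbrock(const Type x[2]) {
      Type a = 1.0 - x[0];
      Type b = x[1] - x[0] * x[0];
      return a * a + 100.0 * b * b;  // global minimum of 0 at x = (1, 1)
    }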
The three minimization algorithms demonstrated are:
Conjugate Gradient. This is the most memory-efficient algorithm and
so is suitable for problems of the highest dimensionality. Only the
gradient vector is used, not the full Hessian matrix. Search
directions are chosen to be orthogonal to the previous directions (up
to the number of dimensions), and the blue dots in the video
illustrate the samples taken during a line search along a new
direction until a sufficient decrease in the scalar function J is
found. Each successful line search is referred to as an iteration,
although the overall computational cost of the minimization is
proportional to the total number of samples. By default the
Polak-Ribière formula is used to compute the next search direction,
but the Fletcher-Reeves formula is available as an alternative; both
formulas are shown in the first sketch following this list.
Limited-memory BFGS (L-BFGS). This method also uses only the gradient
vector, but stores a small number (default 6) of gradients from
previous iterations, enabling it to estimate the Hessian matrix
implicitly. This improves the selection of each new search direction,
and also enables the optimum step size to be estimated. Thus,
typically fewer samples are needed to find the minimum than with the
Conjugate Gradient method, although more memory is used; the second
sketch following this list shows how the stored information is turned
into a search direction.
Levenberg-Marquardt. This method requires both the gradient and the
Hessian matrix, which restricts the number of dimensions that can be
handled without incurring a large computational cost. However, if the
scalar function is nearly quadratic then rapid progress is made
towards the minimum. The number of iterations required is typically
fewer than for the methods above, although each sample is more
expensive because of the additional cost of computing the Hessian
matrix. The related Levenberg method is also available; the third
sketch following this list shows a single damped step.
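The first sketch below shows the direction update at the heart of the Conjugate Gradient description above, with both the Polak-Ribière and Fletcher-Reeves formulas. It is written from the standard textbook formulas for illustration and is not Adept's internal implementation; the non-negativity safeguard on beta is a common convention, not something specified above.

    #include <vector>

    // Dot product of two vectors of equal length
    double dot(const std::vector<double>& a, const std::vector<double>& b) {
      double sum = 0.0;
      for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
      return sum;
    }

    // Next conjugate-gradient search direction from the current gradient g,
    // the previous gradient g_prev and the previous direction d_prev
    std::vector<double> next_direction(const std::vector<double>& g,
                                       const std::vector<double>& g_prev,
                                       const std::vector<double>& d_prev,
                                       bool polak_ribiere = true) {
      double beta;
      if (polak_ribiere) {
        // Polak-Ribiere: beta = g.(g - g_prev) / (g_prev.g_prev)
        beta = (dot(g, g) - dot(g, g_prev)) / dot(g_prev, g_prev);
        if (beta < 0.0) beta = 0.0;  // safeguard: restart from steepest descent
      } else {
        // Fletcher-Reeves: beta = (g.g) / (g_prev.g_prev)
        beta = dot(g, g) / dot(g_prev, g_prev);
      }
      std::vector<double> d(g.size());
      for (std::size_t i = 0; i < g.size(); ++i) {
        d[i] = -g[i] + beta * d_prev[i];  // direction to explore with a line search
      }
      return d;
    }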
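The second sketch shows the standard L-BFGS "two-loop recursion", which converts the current gradient into a search direction using stored differences of recent positions and gradients. Again this is the textbook algorithm, reusing the dot() helper from the sketch above, not Adept's internal code.

    #include <deque>
    #include <vector>

    // L-BFGS two-loop recursion: given a copy q of the current gradient and
    // the m most recent position differences s[k] = x_{k+1} - x_k and gradient
    // differences y[k] = g_{k+1} - g_k (newest last), return the search direction.
    std::vector<double> lbfgs_direction(std::vector<double> q,
                                        const std::deque<std::vector<double> >& s,
                                        const std::deque<std::vector<double> >& y) {
      const int m = static_cast<int>(s.size());
      std::vector<double> alpha(m), rho(m);
      for (int k = 0; k < m; ++k) rho[k] = 1.0 / dot(y[k], s[k]);
      // First loop, newest pair to oldest
      for (int k = m - 1; k >= 0; --k) {
        alpha[k] = rho[k] * dot(s[k], q);
        for (std::size_t i = 0; i < q.size(); ++i) q[i] -= alpha[k] * y[k][i];
      }
      // Scale by an estimate of the size of the inverse Hessian
      if (m > 0) {
        double gamma = dot(s[m-1], y[m-1]) / dot(y[m-1], y[m-1]);
        for (std::size_t i = 0; i < q.size(); ++i) q[i] *= gamma;
      }
      // Second loop, oldest pair to newest
      for (int k = 0; k < m; ++k) {
        double beta = rho[k] * dot(y[k], q);
        for (std::size_t i = 0; i < q.size(); ++i) q[i] += (alpha[k] - beta) * s[k][i];
      }
      // q now approximates (inverse Hessian) * gradient; step along its negative
      for (std::size_t i = 0; i < q.size(); ++i) q[i] = -q[i];
      return q;
    }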
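The third sketch shows a single damped step for a two-dimensional problem. One common formulation solves (H + λI) δx = −∂J/∂x for the step δx, decreasing λ after a successful step (approaching Newton's method) and increasing it after a failed one (approaching steepest descent); Marquardt's refinement scales the diagonal of H instead of adding a constant. The 2×2 solve below is illustrative only, not Adept's implementation.

    // One damped (Levenberg-style) step in two dimensions: solve
    // (H + lambda*I) dx = -g, where H is the 2x2 Hessian and g the gradient.
    // (Marquardt's variant scales the diagonal of H by 1 + lambda instead.)
    void damped_newton_step(const double H[2][2], const double g[2],
                            double lambda, double dx[2]) {
      // Damped matrix A = H + lambda * I
      double a = H[0][0] + lambda, b = H[0][1];
      double c = H[1][0],          d = H[1][1] + lambda;
      // Solve A * dx = -g with Cramer's rule (use a library solver in general)
      double det = a * d - b * c;
      dx[0] = (-g[0] * d + b * g[1]) / det;
      dx[1] = (-a * g[1] + c * g[0]) / det;
    }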
Adept does not include
any derivative-free
methods at present; if you are using Adept you will invariably have
the derivative available, in which case a minimization method that
uses it will achieve much faster convergence.