The ability to differentiate your code is most often used in
combination with a gradient-based optimization algorithm, whose job
is to find the state vector x of input variables that
minimizes the scalar function J(x) by repeated
calculation of the value of J, its first derivative
∂J/∂x (a vector) and optionally its second derivative
∂²J/∂x² (the Hessian matrix). Since version 2.1, Adept has included several
minimization algorithms, all of which can be used with or without
box constraints on the values of x. Usage is described in
Chapter 4 of the User
Guide.
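To show concretely how a minimizer obtains these quantities, the sketch below uses Adept's core reverse-mode interface (adept::Stack and adept::adouble) to evaluate a cost function and its gradient, then takes naive fixed-step descent steps. It is only an illustration of the pattern: the quadratic cost function and the fixed step size are invented for this example, and in practice you would use Adept's built-in minimizers described in Chapter 4 of the User Guide.

    #include <iostream>
    #include <adept.h>

    using adept::adouble;

    // Illustrative cost function: a quadratic bowl with minimum at x = (3, -1)
    adouble cost_function(const adouble x[2]) {
      return (x[0] - 3.0) * (x[0] - 3.0) + 2.0 * (x[1] + 1.0) * (x[1] + 1.0);
    }

    // Evaluate J(x) and dJ/dx using Adept's reverse-mode differentiation
    double cost_and_gradient(const double x_val[2], double gradient[2]) {
      adept::Stack stack;                  // holds the recorded derivative information
      adouble x[2] = {x_val[0], x_val[1]}; // active copies of the input variables
      stack.new_recording();               // start recording
      adouble J = cost_function(x);
      J.set_gradient(1.0);                 // seed the reverse pass with dJ/dJ = 1
      stack.compute_adjoint();             // propagate derivatives back to the inputs
      gradient[0] = x[0].get_gradient();   // dJ/dx[0]
      gradient[1] = x[1].get_gradient();   // dJ/dx[1]
      return adept::value(J);
    }

    int main() {
      double x[2] = {0.0, 0.0};            // first guess
      double g[2];
      const double step = 0.1;             // fixed step; real minimizers use a line search
      for (int iter = 0; iter < 100; ++iter) {
        cost_and_gradient(x, g);
        x[0] -= step * g[0];               // steepest-descent update
        x[1] -= step * g[1];
      }
      std::cout << "Minimum near x = (" << x[0] << ", " << x[1] << ")\n";
    }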
The video below illustrates three of these algorithms minimizing
the two-dimensional Rosenbrock
"banana" function using Adept's test_minimizer program,
subject to the constraint that the solution cannot lie in the shaded
area on the left. The colours indicate the value of the cost
function J. In two dimensions Rosenbrock's function is
J(x₁, x₂) = (1 − x₁)² + 100(x₂ − x₁²)², with its global minimum of
zero at x = (1, 1); it is a challenging test of a minimization
algorithm because it is easy to find the valley (in yellow) but
difficult to locate the lowest point along it. Real-world problems
are usually of much higher dimensionality, and are therefore more
difficult to visualize.
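Written as code, the same function is a one-liner. Templating it on the numeric type (a common Adept idiom) means it can be evaluated with plain doubles for plotting, or with adept::adouble so that the gradient can be computed exactly as in the sketch above:

    // The two-dimensional Rosenbrock "banana" function. Templating on the
    // numeric type lets it be called with double, or with adept::adouble
    // when derivatives are needed.
    template <typename Type>
    Type rosenbrock(const Type x[2]) {
      Type a = 1.0 - x[0];
      Type b = x[1] - x[0] * x[0];
      return a * a + 100.0 * b * b;  // global minimum of 0 at x = (1, 1)
    }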
The three minimization algorithms demonstrated are:
Conjugate Gradient. This is the most memory-efficient algorithm and
so is suitable for problems of the highest dimensionality. Only the
gradient vector is used, not the full Hessian matrix. Search
directions are chosen to be orthogonal to the previous directions (up
to the number of dimensions), and the blue dots in the video
illustrate the samples taken during a line search along a new
direction until a sufficient decrease in the scalar function J is
found. Each successful line search is referred to as an iteration,
although the overall computational cost of the minimization is
proportional to the total number of samples. By default the
Polak-Ribière formula is used to compute the next search direction,
but the Fletcher-Reeves formula is available as an alternative; both
formulas are shown in the first sketch following this list.
Limited-memory BFGS (L-BFGS). This method also uses only the gradient
vector, but stores a small number (default 6) of gradients from
previous iterations, enabling it to estimate the Hessian matrix
implicitly. This improves the selection of each new search direction,
and also enables the optimum step size to be estimated. Thus,
typically fewer samples are needed to find the minimum than with the
Conjugate Gradient method, although more memory is used; the second
sketch following this list shows how the stored information is turned
into a search direction.
Levenberg-Marquardt. This method requires both the gradient and the
Hessian matrix, which restricts the number of dimensions that can be
handled without incurring a large computational cost. However, if the
scalar function is nearly quadratic then rapid progress is made
towards the minimum. The number of iterations required is typically
fewer than for the methods above, although each sample is more
expensive because of the additional cost of computing the Hessian
matrix. The related Levenberg method is also available; the third
sketch following this list shows a single damped step.
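The first sketch below shows the direction update at the heart of the Conjugate Gradient description above, with both the Polak-Ribière and Fletcher-Reeves formulas. It is written from the standard textbook formulas for illustration and is not Adept's internal implementation; the non-negativity safeguard on beta is a common convention, not something specified above.

    #include <vector>

    // Dot product of two vectors of equal length
    double dot(const std::vector<double>& a, const std::vector<double>& b) {
      double sum = 0.0;
      for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
      return sum;
    }

    // Next conjugate-gradient search direction from the current gradient g,
    // the previous gradient g_prev and the previous direction d_prev
    std::vector<double> next_direction(const std::vector<double>& g,
                                       const std::vector<double>& g_prev,
                                       const std::vector<double>& d_prev,
                                       bool polak_ribiere = true) {
      double beta;
      if (polak_ribiere) {
        // Polak-Ribiere: beta = g.(g - g_prev) / (g_prev.g_prev)
        beta = (dot(g, g) - dot(g, g_prev)) / dot(g_prev, g_prev);
        if (beta < 0.0) beta = 0.0;  // safeguard: restart from steepest descent
      } else {
        // Fletcher-Reeves: beta = (g.g) / (g_prev.g_prev)
        beta = dot(g, g) / dot(g_prev, g_prev);
      }
      std::vector<double> d(g.size());
      for (std::size_t i = 0; i < g.size(); ++i) {
        d[i] = -g[i] + beta * d_prev[i];  // direction to explore with a line search
      }
      return d;
    }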
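The second sketch shows the standard L-BFGS "two-loop recursion", which converts the current gradient into a search direction using stored differences of recent positions and gradients. Again this is the textbook algorithm, reusing the dot() helper from the sketch above, not Adept's internal code.

    #include <deque>
    #include <vector>

    // L-BFGS two-loop recursion: given a copy q of the current gradient and
    // the m most recent position differences s[k] = x_{k+1} - x_k and gradient
    // differences y[k] = g_{k+1} - g_k (newest last), return the search direction.
    std::vector<double> lbfgs_direction(std::vector<double> q,
                                        const std::deque<std::vector<double> >& s,
                                        const std::deque<std::vector<double> >& y) {
      const int m = static_cast<int>(s.size());
      std::vector<double> alpha(m), rho(m);
      for (int k = 0; k < m; ++k) rho[k] = 1.0 / dot(y[k], s[k]);
      // First loop, newest pair to oldest
      for (int k = m - 1; k >= 0; --k) {
        alpha[k] = rho[k] * dot(s[k], q);
        for (std::size_t i = 0; i < q.size(); ++i) q[i] -= alpha[k] * y[k][i];
      }
      // Scale by an estimate of the size of the inverse Hessian
      if (m > 0) {
        double gamma = dot(s[m-1], y[m-1]) / dot(y[m-1], y[m-1]);
        for (std::size_t i = 0; i < q.size(); ++i) q[i] *= gamma;
      }
      // Second loop, oldest pair to newest
      for (int k = 0; k < m; ++k) {
        double beta = rho[k] * dot(y[k], q);
        for (std::size_t i = 0; i < q.size(); ++i) q[i] += (alpha[k] - beta) * s[k][i];
      }
      // q now approximates (inverse Hessian) * gradient; step along its negative
      for (std::size_t i = 0; i < q.size(); ++i) q[i] = -q[i];
      return q;
    }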
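The third sketch shows a single damped step for a two-dimensional problem. One common formulation solves (H + λI) δx = −∂J/∂x for the step δx, decreasing λ after a successful step (approaching Newton's method) and increasing it after a failed one (approaching steepest descent); Marquardt's refinement scales the diagonal of H instead of adding a constant. The 2×2 solve below is illustrative only, not Adept's implementation.

    // One damped (Levenberg-style) step in two dimensions: solve
    // (H + lambda*I) dx = -g, where H is the 2x2 Hessian and g the gradient.
    // (Marquardt's variant scales the diagonal of H by 1 + lambda instead.)
    void damped_newton_step(const double H[2][2], const double g[2],
                            double lambda, double dx[2]) {
      // Damped matrix A = H + lambda * I
      double a = H[0][0] + lambda, b = H[0][1];
      double c = H[1][0],          d = H[1][1] + lambda;
      // Solve A * dx = -g with Cramer's rule (use a library solver in general)
      double det = a * d - b * c;
      dx[0] = (-g[0] * d + b * g[1]) / det;
      dx[1] = (-a * g[1] + c * g[0]) / det;
    }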
Adept does not include
any derivative-free
methods at present; if you are using Adept you will invariably have
the derivative available, in which case a minimization method that
uses it will achieve much faster convergence.