Conjugate Gradient¶
Algorithm Description¶
The Nonlinear Conjugate Gradient (CG) algorithm is used to solve optimization problems of the form
\[\min_{x \in X} f(x)\]
where \(f\) is convex and (at least) once differentiable. The algorithm requires the gradient function to be known.
The updating rule for CG is described below. Let \(x^{(i)}\) denote the function input values at stage \(i\) of the algorithm.
Compute the descent direction using:
\[d^{(i)} = - [\nabla_x f(x^{(i)})] + \beta^{(i)} d^{(i-1)}\]
Determine if \(\beta^{(i)}\) should be reset (to zero), which occurs when
\[\dfrac{| [\nabla f(x^{(i)})]^\top [\nabla f(x^{(i-1)})] |}{ [\nabla f(x^{(i)})]^\top [\nabla f(x^{(i)})] } > \nu\]
where \(\nu\) is set via cg_settings.restart_threshold.
Compute the optimal step size using line search:
\[\alpha^{(i)} = \arg \min_{\alpha} f(x^{(i)} + \alpha \times d^{(i)})\]
Update the candidate solution vector using:
\[x^{(i+1)} = x^{(i)} + \alpha^{(i)} \times d^{(i)}\]
The algorithm stops when at least one of the following conditions is met:

- the norm of the gradient vector, \(\| \nabla f \|\), is less than grad_err_tol;
- the relative change between \(x^{(i+1)}\) and \(x^{(i)}\) is less than rel_sol_change_tol;
- the total number of iterations exceeds iter_max.
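To make the steps concrete, the self-contained sketch below assembles them into a single loop for the sphere function \(f(x) = x^\top x\). This is only an illustration, not the library's implementation: it hard-codes the Fletcher–Reeves rule from the next section, uses the closed-form line search available for this particular \(f\), and picks tolerances by hand.

#include <algorithm>
#include <cmath>
#include <armadillo>

// gradient of the sphere function f(x) = x.x
arma::vec sphere_grad(const arma::vec& x) { return 2.0 * x; }

int main()
{
    const double grad_err_tol = 1E-08; // tolerance on || grad f ||
    const double nu = 0.1;             // restart threshold (cg_settings.restart_threshold)
    const size_t iter_max = 100;       // iteration cap

    arma::vec x = arma::ones(5);       // x^{(0)} = (1,...,1)
    arma::vec grad_prev = sphere_grad(x);
    arma::vec d = -grad_prev;          // first direction: steepest descent

    for (size_t iter = 0; iter < iter_max; ++iter) {
        // step 3: for f(x) = x.x the line search has a closed form:
        // minimizing ||x + alpha*d||^2 over alpha gives alpha = -(x.d)/(d.d)
        double alpha = - arma::dot(x, d) / arma::dot(d, d);

        x += alpha * d;                // step 4: update the candidate solution

        arma::vec grad = sphere_grad(x);
        if (arma::norm(grad, 2) < grad_err_tol) {
            break;                     // gradient-norm stopping condition
        }

        // Fletcher-Reeves coefficient (cg_settings.method = 1)
        double beta = arma::dot(grad, grad) / arma::dot(grad_prev, grad_prev);

        // step 2: reset beta when successive gradients are far from orthogonal
        if (std::abs(arma::dot(grad, grad_prev)) / arma::dot(grad, grad) > nu) {
            beta = 0.0;
        }

        d = -grad + std::max(0.0, beta) * d; // step 1: new descent direction
        grad_prev = grad;
    }

    x.print("sketch solution:");
    return 0;
}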
Updating Rules¶
cg_settings.method = 1
Fletcher–Reeves (FR):
\[\beta_{\text{FR}} = \dfrac{ [\nabla_x f(x^{(i)})]^\top [\nabla_x f(x^{(i)})] }{ [\nabla_x f(x^{(i-1)})]^\top [\nabla_x f(x^{(i-1)})] }\]

cg_settings.method = 2
Polak-Ribiere (PR):
\[\beta_{\text{PR}} = \dfrac{ [\nabla_x f(x^{(i)})]^\top ( [\nabla_x f(x^{(i)})] - [\nabla_x f(x^{(i-1)})] ) }{ [\nabla_x f(x^{(i-1)})]^\top [\nabla_x f(x^{(i-1)})] }\]

cg_settings.method = 3
FR-PR Hybrid:
\[\beta = \begin{cases} - \beta_{\text{FR}} & \text{ if } \beta_{\text{PR}} < - \beta_{\text{FR}} \\ \beta_{\text{PR}} & \text{ if } |\beta_{\text{PR}}| \leq \beta_{\text{FR}} \\ \beta_{\text{FR}} & \text{ if } \beta_{\text{PR}} > \beta_{\text{FR}} \end{cases}\]

cg_settings.method = 4
Hestenes-Stiefel (HS):
\[\beta_{\text{HS}} = \dfrac{[\nabla_x f(x^{(i)})] \cdot ([\nabla_x f(x^{(i)})] - [\nabla_x f(x^{(i-1)})])}{([\nabla_x f(x^{(i)})] - [\nabla_x f(x^{(i-1)})]) \cdot d^{(i-1)}}\]

cg_settings.method = 5
Dai-Yuan (DY):
\[\beta_{\text{DY}} = \dfrac{[\nabla_x f(x^{(i)})] \cdot [\nabla_x f(x^{(i)})]}{([\nabla_x f(x^{(i)})] - [\nabla_x f(x^{(i-1)})]) \cdot d^{(i-1)}}\]

cg_settings.method = 6
Hager-Zhang (HZ):
\[\begin{aligned} \beta_{\text{HZ}} &= \left( y^{(i)} - 2 \times \dfrac{[y^{(i)}] \cdot y^{(i)}}{y^{(i)} \cdot d^{(i-1)}} \times d^{(i-1)} \right) \cdot \dfrac{[\nabla_x f(x^{(i)})]}{y^{(i)} \cdot d^{(i-1)}} \\ y^{(i)} &:= [\nabla_x f(x^{(i)})] - [\nabla_x f(x^{(i-1)})] \end{aligned}\]
Finally, we set:
\[\beta^{(i)} = \max \{ 0, \beta_{*} \}\]
where \(\beta_{*}\) denotes the coefficient computed by the chosen update method.
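For a concrete reading of the indexing, the sketch below computes the PR coefficient and the FR-PR hybrid from stored gradients; here grad and grad_prev stand for \(\nabla_x f(x^{(i)})\) and \(\nabla_x f(x^{(i-1)})\), and the function names are illustrative, not part of the library API.

#include <algorithm>
#include <cmath>
#include <armadillo>

// cg_settings.method = 1: Fletcher-Reeves
double beta_fr(const arma::vec& grad, const arma::vec& grad_prev)
{
    return arma::dot(grad, grad) / arma::dot(grad_prev, grad_prev);
}

// cg_settings.method = 2: Polak-Ribiere
double beta_pr(const arma::vec& grad, const arma::vec& grad_prev)
{
    return arma::dot(grad, grad - grad_prev) / arma::dot(grad_prev, grad_prev);
}

// cg_settings.method = 3: FR-PR hybrid, followed by the final clamp max{0, beta_*}
double beta_hybrid(const arma::vec& grad, const arma::vec& grad_prev)
{
    double b_fr = beta_fr(grad, grad_prev);
    double b_pr = beta_pr(grad, grad_prev);

    double b = (b_pr < -b_fr) ? -b_fr
             : (std::abs(b_pr) <= b_fr) ? b_pr
             : b_fr;

    return std::max(0.0, b);
}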
Function Declarations¶
-
bool cg(ColVec_t &init_out_vals, std::function<fp_t(const ColVec_t &vals_inp, ColVec_t *grad_out, void *opt_data)> opt_objfn, void *opt_data)¶
The Nonlinear Conjugate Gradient (CG) Optimization Algorithm.
- Parameters
init_out_vals – a column vector of initial values, which will be replaced by the solution upon successful completion of the optimization algorithm.
opt_objfn – the function to be minimized, taking three arguments:

- vals_inp: a vector of inputs;
- grad_out: a vector to store the gradient; and
- opt_data: additional data passed to the user-provided function.

opt_data – additional data passed to the user-provided function.
- Returns
a boolean value indicating successful completion of the optimization algorithm.
-
bool cg(ColVec_t &init_out_vals, std::function<fp_t(const ColVec_t &vals_inp, ColVec_t *grad_out, void *opt_data)> opt_objfn, void *opt_data, algo_settings_t &settings)¶
The Nonlinear Conjugate Gradient (CG) Optimization Algorithm.
- Parameters
init_out_vals – a column vector of initial values, which will be replaced by the solution upon successful completion of the optimization algorithm.
opt_objfn – the function to be minimized, taking three arguments:

- vals_inp: a vector of inputs;
- grad_out: a vector to store the gradient; and
- opt_data: additional data passed to the user-provided function.
opt_data – additional data passed to the user-provided function.
settings – parameters controlling the optimization routine.
- Returns
a boolean value indicating successful completion of the optimization algorithm.
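As an example of the second overload, the snippet below adjusts a few of the control parameters documented in the next section before calling the solver; it assumes the Armadillo wrappers, and objfn is a placeholder for any objective with the signature shown above.

optim::algo_settings_t settings;

settings.grad_err_tol = 1E-06;    // loosen the gradient-norm tolerance
settings.iter_max = 500;          // cap the number of iterations
settings.cg_settings.method = 4;  // Hestenes-Stiefel updates

arma::vec x = arma::ones(5, 1);   // initial values

bool success = optim::cg(x, objfn, nullptr, settings); // objfn: placeholder objective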
Optimization Control Parameters¶
The basic control parameters are:

- fp_t grad_err_tol: the error tolerance value controlling how small the \(L_2\) norm of the gradient vector \(\| \nabla f \|\) should be before ‘convergence’ is declared.
- fp_t rel_sol_change_tol: the error tolerance value controlling how small the proportional change in the solution vector should be before ‘convergence’ is declared. The relative change is computed using:
\[\left\| \dfrac{x^{(i)} - x^{(i-1)}}{ |x^{(i-1)}| + \epsilon } \right\|_1\]
where \(\epsilon\) is a small number added for numerical stability.
- size_t iter_max: the maximum number of iterations/updates before the algorithm exits.
- bool vals_bound: whether the search space of the algorithm is bounded (see the sketch after this list). If true, then:
  - ColVec_t lower_bounds: defines the lower bounds of the search space.
  - ColVec_t upper_bounds: defines the upper bounds of the search space.
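For instance, to constrain each element of the search space to \([-2, 2]\) (a minimal sketch using the Armadillo wrappers and the field names above):

optim::algo_settings_t settings;

settings.vals_bound = true; // enable box constraints

settings.lower_bounds = arma::zeros(5, 1) - 2.0; // each element >= -2
settings.upper_bounds = arma::zeros(5, 1) + 2.0; // each element <=  2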
Additional settings (a configuration sketch follows the list):

- int cg_settings.method: the update method, corresponding to the updating rules listed above. Default value: 2.
- fp_t cg_settings.restart_threshold: the parameter \(\nu\) from step 2 in the algorithm description. Default value: 0.1.
- bool use_rel_sol_change_crit: whether to enable the rel_sol_change_tol stopping criterion. Default value: false.
- fp_t cg_settings.wolfe_cons_1: line search tuning parameter that controls the tolerance on the Armijo sufficient decrease condition. Default value: 1E-03.
- fp_t cg_settings.wolfe_cons_2: line search tuning parameter that controls the tolerance on the curvature condition. Default value: 0.10.
- int print_level: the level of detail for printing updates on optimization progress.
  - Level 0: nothing (default).
  - Level 1: print the iteration count and current error values.
  - Level 2: Level 1 plus the current candidate solution values, \(x^{(i+1)}\).
  - Level 3: Level 2 plus the direction vector, \(d^{(i)}\), and the gradient vector, \(\nabla_x f(x^{(i+1)})\).
  - Level 4: Level 3 plus \(\beta^{(i)}\).
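Putting these together, a caller might configure the CG-specific settings as in the sketch below; the values shown are illustrative, not recommendations.

optim::algo_settings_t settings;

settings.cg_settings.method = 3;              // FR-PR hybrid updates
settings.cg_settings.restart_threshold = 0.2; // nu in step 2 of the algorithm
settings.cg_settings.wolfe_cons_1 = 1E-03;    // Armijo sufficient decrease tolerance
settings.cg_settings.wolfe_cons_2 = 0.10;     // curvature condition tolerance

settings.use_rel_sol_change_crit = true;      // also stop on small relative change
settings.print_level = 1;                     // report iteration count and error values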
Examples¶
Sphere Function¶
The sphere function is \(f(x) = x^\top x\), with a unique minimum at the zero vector. Code to run this example is given below.
Armadillo
#define OPTIM_ENABLE_ARMA_WRAPPERS
#include "optim.hpp"

#include <iostream>
inline
double
sphere_fn(const arma::vec& vals_inp, arma::vec* grad_out, void* opt_data)
{
double obj_val = arma::dot(vals_inp,vals_inp);
if (grad_out) {
*grad_out = 2.0*vals_inp;
}
return obj_val;
}
int main()
{
const int test_dim = 5;
arma::vec x = arma::ones(test_dim,1); // initial values (1,1,...,1)
bool success = optim::cg(x, sphere_fn, nullptr);
if (success) {
std::cout << "cg: sphere test completed successfully." << "\n";
} else {
std::cout << "cg: sphere test completed unsuccessfully." << "\n";
}
arma::cout << "cg: solution to sphere test:\n" << x << arma::endl;
return 0;
}
Eigen
#define OPTIM_ENABLE_EIGEN_WRAPPERS
#include "optim.hpp"

#include <iostream>
inline
double
sphere_fn(const Eigen::VectorXd& vals_inp, Eigen::VectorXd* grad_out, void* opt_data)
{
double obj_val = vals_inp.dot(vals_inp);
if (grad_out) {
*grad_out = 2.0*vals_inp;
}
return obj_val;
}
int main()
{
const int test_dim = 5;
Eigen::VectorXd x = Eigen::VectorXd::Ones(test_dim); // initial values (1,1,...,1)
bool success = optim::cg(x, sphere_fn, nullptr);
if (success) {
std::cout << "cg: sphere test completed successfully." << "\n";
} else {
std::cout << "cg: sphere test completed unsuccessfully." << "\n";
}
std::cout << "cg: solution to sphere test:\n" << x << std::endl;
return 0;
}
Booth’s Function¶
Booth’s function is \(f(x_1, x_2) = (x_1 + 2 x_2 - 7)^2 + (2 x_1 + x_2 - 5)^2\), with minimum at \((1, 3)\). Code to run this example is given below.
Armadillo Code
#define OPTIM_ENABLE_ARMA_WRAPPERS
#include "optim.hpp"

#include <iostream>
inline
double
booth_fn(const arma::vec& vals_inp, arma::vec* grad_out, void* opt_data)
{
double x_1 = vals_inp(0);
double x_2 = vals_inp(1);
double obj_val = std::pow(x_1 + 2*x_2 - 7.0,2) + std::pow(2*x_1 + x_2 - 5.0,2);
if (grad_out) {
(*grad_out)(0) = 2*(x_1 + 2*x_2 - 7.0) + 2*(2*x_1 + x_2 - 5.0)*2;
(*grad_out)(1) = 2*(x_1 + 2*x_2 - 7.0)*2 + 2*(2*x_1 + x_2 - 5.0);
}
return obj_val;
}
int main()
{
arma::vec x_2 = arma::zeros(2,1); // initial values (0,0)
bool success_2 = optim::cg(x_2, booth_fn, nullptr);
if (success_2) {
std::cout << "cg: Booth test completed successfully." << "\n";
} else {
std::cout << "cg: Booth test completed unsuccessfully." << "\n";
}
arma::cout << "cg: solution to Booth test:\n" << x_2 << arma::endl;
return 0;
}
Eigen Code
#define OPTIM_ENABLE_EIGEN_WRAPPERS
#include "optim.hpp"

#include <iostream>
inline
double
booth_fn(const Eigen::VectorXd& vals_inp, Eigen::VectorXd* grad_out, void* opt_data)
{
double x_1 = vals_inp(0);
double x_2 = vals_inp(1);
double obj_val = std::pow(x_1 + 2*x_2 - 7.0,2) + std::pow(2*x_1 + x_2 - 5.0,2);
if (grad_out) {
(*grad_out)(0) = 2*(x_1 + 2*x_2 - 7.0) + 2*(2*x_1 + x_2 - 5.0)*2;
(*grad_out)(1) = 2*(x_1 + 2*x_2 - 7.0)*2 + 2*(2*x_1 + x_2 - 5.0);
}
return obj_val;
}
int main()
{
Eigen::VectorXd x_2 = Eigen::VectorXd::Zero(2); // initial values (0,0)
bool success_2 = optim::cg(x_2, booth_fn, nullptr);
if (success_2) {
std::cout << "cg: Booth test completed successfully." << "\n";
} else {
std::cout << "cg: Booth test completed unsuccessfully." << "\n";
}
std::cout << "cg: solution to Booth test:\n" << x_2 << std::endl;
return 0;
}