POGS : Usage

This page contains general usage information about POGS. For specific instructions about installing and running the C++ code or MATLAB and R wrappers, see the Setup section on the left-hand side.

Input Variables

The solver takes 3 input variables.

1 A is the data matrix in the constraint \(y = Ax\). Currently only dense matrices are supported, but a sparse matrix implementation is on its way.
1 f and 1 g are structures representing the terms \(f(y)\) and \(g(x)\) in the objective function. It is assumed that \(f(y)\) and \(g(x)\) are separable and that each \(f_i(y_i)\) (resp. \(g_j(x_j)\)) can be represented by 5 parameters.

Parameters

The solver takes 6 parameters. They are all optional.

1 rel_tol (default 1 1e-3). The relative stopping criteria (see section below on stopping conditions).
1 abs_tol (default 1 1e-4). The absolute stopping criteria (see section below on stopping conditions).
1 rho (default 1 1.0). See section section below on choosing \( \rho \).
1 adaptive_rho (default 1 true). Adaptively choose 1 rho. If you have found a good value of rho that works for your problem, then it may make more sense to turn off adaptive rho updating. Otherwise it is highly recommended to leave this turned on.
1 max_iter (default 1 1000). The maximum number of iterations before the solver terminates. Most problems should terminate within ~500 iterations, but there is no guarantee.
1 quiet (default 1 false). Displays progress information every 10th iteration if set to 1 false.

Output Variables

The solver returns 3 solution variables.

1 x and 1 y are the primal solution variable \(x^\star\\text{pogs}\) and \(y^\star\\text{pogs}\).
1 l is the dual solution variable \(\lambda^\star\_\text{pogs}\) corresponding to the constraint \(Ax = y\).

Warm Starting / Factors

POGS pre-computes various factors relating to the matrix \(A\) before beginning to minimize \(f + g\) (this is known as factorization caching). As a result the first iteration will take a long time compared to subsequent iterations. Therefore, if you plan to solve multiple optimization problems with the same matrix \(A\) but different functions \(f\) and \(g\), then POGS can perform the factorization once and re-use it in subsequent iterations. To use this feature simply feed the factors back in the next time you call POGS. The exact process depends on whether you’re using C++ or one of the wrappers.

Stopping Criteria

The stopping criteria used by POGS is similar to the one used by DIMACS and the SeDuMi solver. They are

\[ \begin{aligned} \|r^{k}\| & \leq \epsilon_{\text{abs}} \sqrt{m} +\epsilon_{\text{rel}} \| (x^k, y^k) \| \\
\|s^{k}\| & \leq \epsilon_{\text{abs}} \sqrt{n} +\epsilon_{\text{rel}} \| (\lambda^k, \mu^k) \| \\
\ | p^k - d^k| & \leq \epsilon_{\text{abs}} + \epsilon_{\text{rel}} | p^k | \\
\end{aligned} \]

where \( r^k = Ax^k - y^k \) and \( s^k = A^T\lambda^k + \mu^k \) are the primal resp. dual residuals at the \( k \)’th iteration. The scalars \( p^k \) and \( d^k \) are the primal and dual optimal values, making \( p^k - d^k \) a pseduo-gap.

Choosing \( \rho \)

The convergence rate of POGS is tightly coupled with the choice of penalty parameter \( \rho \). Optimal choices of \( \rho \) can range from \( 10^{-3} \) to \( 10^{3} \) and in general objective functions with lower curvature are more sensitive to the choice of \( \rho \). We recommend letting POGS automatically choose \( \rho \). Advanced users may prefer to choose \( \rho \) themselves for best performance.

\(h\)-Functions

A total of 16 different functions that can be specified for \(h\). They are

kAbs – Absolute Value: Function: \(h(x) = |x|\).; Domain: \( \mathbf{R} \).; Uses: \(L_1\)-norm problems (eg. to promote sparsity).
kNegEntr – Negative Entropy: Function: \(h(x) = -x \log(x)\).; Domain: \( \mathbf{R}_+ \).; Uses: Useful in problems concerning probability distributions.
kExp – Exponential: Function: \(h(x) = e^x\).; Domain: \( \mathbf{R} \).; Uses: Useful as soft version of \(I(x \leq 0) \).
kHuber – Huber Loss: Function: \( h(x) = \left\{\begin{aligned} &(1/2)x^2 & |x| \leq 1 \\\ &|x| - (1/2) & |x| > 1 \end{aligned} \right. \).; Domain: \( \mathbf{R} \).; Uses: Robust estimation (use as robust alternative to \((1/2)x^2\)).
kIdentity – Identity: Function: \(h(x) = x\).; Domain: \( \mathbf{R} \).; Uses: Typical in linear programs and quadratic programs.
kIndBox01 – Indicator of [0, 1]-Box: Function: \(h(x) = I(0 \leq x \leq 1)\).; Domain: \( [0, 1] \).; Uses: Box constraint.
kIndEq0 – Indicator of the Origin: Function: \(h(x) = I(x = 0)\).; Domain: \( \{ 0 \} \).; Uses: Equality constraint.
kIndGe0 – Indicator of the Non-Negative Orthant: Function: \(h(x) = I(x \geq 0)\).; Domain: \( \mathbf{R}_+ \).; Uses: Inequality constraint.
kIndLe0 – Indicator of the Non-Positive Orthant: Function: \(h(x) = I(x \leq 0)\).; Domain: \( \mathbf{R}_- \).; Uses: Inequality constraint.
kLogistic – Integral of the Logistic Function: Function: \(h(x) = \log(1 + e^x)\).; Domain: \( \mathbf{R} \).; Uses: Logistic Regression.
kMaxNeg0 – Negative Part: Function: \( h(x) = \max(0, -x) \).; Domain: \( \mathbf{R} \).; Uses: Hingle loss (eg. SVM).
kMaxPos0 – Positive Part: Function: \( h(x) = \max(0, x) \).; Domain: \( \mathbf{R} \).; Uses: Hingle loss (eg. SVM).
kNegLog – Negative Log: Function: \( h(x) = -\log(x) \).; Domain: \( \mathbf{R}_+ \).; Uses: Analytic centering or barrier function (eg. for solving convex subproblem).
kSquare – Square: Function: \( h(x) = (1/2)x^2 \).; Domain: \( \mathbf{R} \).; Uses: \(L_2\)-norm problems (eg. least squares, ridge regression, etc.).
kRecipr – Reciprocal: Function: \( h(x) = 1/x \).; Domain: \( \mathbf{R}_+ \).; Uses: Analytic centering or barrier function (eg. for solving convex subproblem).
kZero – Zero: Function: \( h(x) = 0 \).; Domain: \( \mathbf{R} \).; Uses: Objective does not depend on specific term.

POGS ADMM-based graph form solver