Usage
This page contains general usage information about POGS. For specific instructions about installing and running the C++ code or MATLAB and R wrappers, see the Setup section on the left-hand side.
Input Variables
The solver takes 3 input variables.
is the data matrix in the constraint \(y = Ax\). Currently only dense matrices are supported, but a sparse matrix implementation is on its way.1
A
and1
f
are structures representing the terms \(f(y)\) and \(g(x)\) in the objective function. It is assumed that \(f(y)\) and \(g(x)\) are separable and that each \(f_i(y_i)\) (resp. \(g_j(x_j)\)) can be represented by 5 parameters.1
g
Parameters
The solver takes 6 parameters. They are all optional.
(default1
rel_tol
). The relative stopping criteria (see section below on stopping conditions).1
1e-3
(default1
abs_tol
). The absolute stopping criteria (see section below on stopping conditions).1
1e-4
(default1
rho
). See section section below on choosing \( \rho \).1
1.0
(default1
adaptive_rho
). Adaptively choose1
true
. If you have found a good value of rho that works for your problem, then it may make more sense to turn off adaptive rho updating. Otherwise it is highly recommended to leave this turned on.1
rho
(default1
max_iter
). The maximum number of iterations before the solver terminates. Most problems should terminate within ~500 iterations, but there is no guarantee.1
1000
(default1
quiet
). Displays progress information every 10th iteration if set to1
false
.1
false
Output Variables
The solver returns 3 solution variables.
and1
x
are the primal solution variable \(x^\star\\text{pogs}\) and \(y^\star\\text{pogs}\).1
y
is the dual solution variable \(\lambda^\star\_\text{pogs}\) corresponding to the constraint \(Ax = y\).1
l
Warm Starting / Factors
POGS pre-computes various factors relating to the matrix \(A\) before beginning to minimize \(f + g\) (this is known as factorization caching). As a result the first iteration will take a long time compared to subsequent iterations. Therefore, if you plan to solve multiple optimization problems with the same matrix \(A\) but different functions \(f\) and \(g\), then POGS can perform the factorization once and re-use it in subsequent iterations. To use this feature simply feed the factors back in the next time you call POGS. The exact process depends on whether you’re using C++ or one of the wrappers.
Stopping Criteria
The stopping criteria used by POGS is similar to the one used by DIMACS and the SeDuMi solver. They are
\[
\begin{aligned}
\|r^{k}\| & \leq \epsilon_{\text{abs}} \sqrt{m} +\epsilon_{\text{rel}} \| (x^k, y^k) \| \\
\|s^{k}\| & \leq \epsilon_{\text{abs}} \sqrt{n} +\epsilon_{\text{rel}} \| (\lambda^k, \mu^k) \| \\
\ | p^k - d^k| & \leq \epsilon_{\text{abs}} + \epsilon_{\text{rel}} | p^k | \\
\end{aligned}
\]
where \( r^k = Ax^k - y^k \) and \( s^k = A^T\lambda^k + \mu^k \) are the primal resp. dual residuals at the \( k \)’th iteration. The scalars \( p^k \) and \( d^k \) are the primal and dual optimal values, making \( p^k - d^k \) a pseduo-gap.
Choosing \( \rho \)
The convergence rate of POGS is tightly coupled with the choice of penalty parameter \( \rho \). Optimal choices of \( \rho \) can range from \( 10^{-3} \) to \( 10^{3} \) and in general objective functions with lower curvature are more sensitive to the choice of \( \rho \). We recommend letting POGS automatically choose \( \rho \). Advanced users may prefer to choose \( \rho \) themselves for best performance.
\(h\)-Functions
A total of 16 different functions that can be specified for \(h\). They are
- kAbs – Absolute Value
- Function: \(h(x) = |x|\).
- Domain: \( \mathbf{R} \).
- Uses: \(L_1\)-norm problems (eg. to promote sparsity).
- kNegEntr – Negative Entropy
- Function: \(h(x) = -x \log(x)\).
- Domain: \( \mathbf{R}_+ \).
- Uses: Useful in problems concerning probability distributions.
- kExp – Exponential
- Function: \(h(x) = e^x\).
- Domain: \( \mathbf{R} \).
- Uses: Useful as soft version of \(I(x \leq 0) \).
- kHuber – Huber Loss
- Function: \( h(x) = \left\{\begin{aligned} &(1/2)x^2 & |x| \leq 1 \\\ &|x| - (1/2) & |x| > 1 \end{aligned} \right. \).
- Domain: \( \mathbf{R} \).
- Uses: Robust estimation (use as robust alternative to \((1/2)x^2\)).
- kIdentity – Identity
- Function: \(h(x) = x\).
- Domain: \( \mathbf{R} \).
- Uses: Typical in linear programs and quadratic programs.
- kIndBox01 – Indicator of [0, 1]-Box
- Function: \(h(x) = I(0 \leq x \leq 1)\).
- Domain: \( [0, 1] \).
- Uses: Box constraint.
- kIndEq0 – Indicator of the Origin
- Function: \(h(x) = I(x = 0)\).
- Domain: \( \{ 0 \} \).
- Uses: Equality constraint.
- kIndGe0 – Indicator of the Non-Negative Orthant
- Function: \(h(x) = I(x \geq 0)\).
- Domain: \( \mathbf{R}_+ \).
- Uses: Inequality constraint.
- kIndLe0 – Indicator of the Non-Positive Orthant
- Function: \(h(x) = I(x \leq 0)\).
- Domain: \( \mathbf{R}_- \).
- Uses: Inequality constraint.
- kLogistic – Integral of the Logistic Function
- Function: \(h(x) = \log(1 + e^x)\).
- Domain: \( \mathbf{R} \).
- Uses: Logistic Regression.
- kMaxNeg0 – Negative Part
- Function: \( h(x) = \max(0, -x) \).
- Domain: \( \mathbf{R} \).
- Uses: Hingle loss (eg. SVM).
- kMaxPos0 – Positive Part
- Function: \( h(x) = \max(0, x) \).
- Domain: \( \mathbf{R} \).
- Uses: Hingle loss (eg. SVM).
- kNegLog – Negative Log
- Function: \( h(x) = -\log(x) \).
- Domain: \( \mathbf{R}_+ \).
- Uses: Analytic centering or barrier function (eg. for solving convex subproblem).
- kSquare – Square
- Function: \( h(x) = (1/2)x^2 \).
- Domain: \( \mathbf{R} \).
- Uses: \(L_2\)-norm problems (eg. least squares, ridge regression, etc.).
- kRecipr – Reciprocal
- Function: \( h(x) = 1/x \).
- Domain: \( \mathbf{R}_+ \).
- Uses: Analytic centering or barrier function (eg. for solving convex subproblem).
- kZero – Zero
- Function: \( h(x) = 0 \).
- Domain: \( \mathbf{R} \).
- Uses: Objective does not depend on specific term.