This is a high-level description of the Alternating Direction Method of Multipliers (ADMM) as it applies to graph form problems. For more detail we recommend reading the papers in the references section.

Augmented Lagrangian

Consider the convex optimization problem

\[ \begin{aligned} &\text{minimize} & & f(x) \\
& \text{subject to} & & x \in \mathcal{C}, \end{aligned} ~~~~~~~~~~\text{(1)} \]

where \( f \) is a convex function and \( \mathcal{C} \) is a convex set. This is equivalent to

\[ \begin{aligned} &\text{minimize} & & f(x) + I(z \in \mathcal{C}) \\
& \text{subject to} & & x = z, \end{aligned} ~~~~~~~~~~\text{(2)} \]

where \( I(z \in \mathcal{C}) \) is the indicator function of the convex set \( \mathcal{C} \). We can augment the objective with the term \( (\rho/2) \|x - z\|_2^2 \) without changing the problem, since this term vanishes whenever the constraint \( x = z \) holds:

\[ \begin{aligned} &\text{minimize} & & f(x) + I(z \in \mathcal{C}) + (\rho/2) \|x - z\|_2^2 \\
& \text{subject to} & & x = z. \end{aligned} \]

Introducing a Lagrange multiplier \( y \) for the constraint \( x = z \) yields the saddle-point problem

\[ \inf_{x,z} \sup_y~~f(x) + I(z \in \mathcal{C}) + y^T(x - z) + (\rho/2) \|x - z\|_2^2. \]

The function \( L_\rho(x,y,z) = f(x) + I(z \in \mathcal{C}) + y^T(x - z) + (\rho/2) \|x - z\|_2^2 \) is known as the augmented Lagrangian of (2).

Alternating Direction Method of Multipliers

Consider the problem of finding \( (x^\star, y^\star, z^\star) \) such that

\[ (x^\star, y^\star, z^\star) = \arg\inf_{x,z} \sup_y L_\rho(x, y, z). \]

Solving this problem is equivalent to solving the original problem and is therefore just as hard. ADMM consists of alternately minimizing \( L_\rho \) over \( x \), then over \( z \), and finally taking a gradient ascent step in \( y \). The algorithm can be written as

\[ \begin{aligned} x^{k+1} &=\arg\min_x~L_\rho(x, y^k, z^k) \\
z^{k+1} &=\arg\min_z~L_\rho(x^{k+1}, y^k, z) \\
y^{k+1} &= y^k + \rho (x^{k+1} - z^{k+1}), \end{aligned} \]

which can be simplified to

\[ \begin{aligned} x^{k+1} &=\mathbf{prox}_f(z^k - (1/\rho) y^k) \\
z^{k+1} &=\Pi_\mathcal{C}(x^{k+1} + (1/\rho)y^{k}) \\
y^{k+1} &= y^k + \rho (x^{k+1} - z^{k+1}), \end{aligned} \]

where \( \Pi_\mathcal{C}(\cdot) \) is the orthogonal projection onto the convex set \( \mathcal{C} \) and \( \mathbf{prox}_f\) is the proximal operator, defined as

\[ \mathbf{prox}_{f}(v) = \arg\min_{x}\Big(f(x) + (\rho/2) \|x - v\|_2^2\Big). \]
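
For example, for \( f(x) = |x| \) this definition gives the soft-thresholding operator

\[ \mathbf{prox}_{|\cdot|}(v) = \mathrm{sign}(v)\max\big(|v| - 1/\rho,~0\big), \]

which is applied element-wise when \( f(x) = \|x\|_1 \).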

Note that this is a simplified version of ADMM.
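
To make the iteration concrete, the following is a minimal NumPy sketch of the simplified loop above, applied to a box-constrained least-squares problem. The problem data, \( \rho \), and the iteration count are illustrative choices; this is a sketch of the method, not the POGS implementation.

    import numpy as np

    # A minimal sketch of the simplified ADMM iteration above (not the POGS
    # implementation). As a concrete instance we solve the box-constrained
    # least-squares problem
    #     minimize (1/2)||Dx - c||^2  subject to  lo <= x <= hi,
    # i.e. f(x) = (1/2)||Dx - c||^2 and C = {x : lo <= x <= hi}.
    # D, c, lo, hi, rho, and the iteration count are illustrative choices.
    rng = np.random.default_rng(0)
    m, n = 30, 10
    D = rng.standard_normal((m, n))
    c = rng.standard_normal(m)
    lo, hi = -0.5, 0.5
    rho = 1.0

    # prox_f(v) = argmin_x f(x) + (rho/2)||x - v||^2
    #           = (D^T D + rho I)^{-1} (D^T c + rho v).
    # The Cholesky factor is computed once and reused in every iteration
    # (factorization caching).
    L = np.linalg.cholesky(D.T @ D + rho * np.eye(n))

    def prox_f(v):
        return np.linalg.solve(L.T, np.linalg.solve(L, D.T @ c + rho * v))

    def proj_C(v):
        # Orthogonal projection onto the box C.
        return np.clip(v, lo, hi)

    x = z = y = np.zeros(n)
    for _ in range(200):
        x = prox_f(z - y / rho)   # x-update: proximal step on f
        z = proj_C(x + y / rho)   # z-update: projection onto C
        y = y + rho * (x - z)     # dual update

At convergence \( x \approx z \), and either can be returned as the solution.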

ADMM applied to Graph Form Problems

To apply ADMM to graph form problems, use the following equivalences:

\[ \begin{array}{ccc} \text{Problem (1)} & & \text{Graph Form} \\
x & \leftrightarrow & (x, y) \\
f & \leftrightarrow & f + g \\
\mathcal{C} & \leftrightarrow & \{ (x, y) ~|~ y = Ax \} \end{array} \]
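
If, as in the references, the graph form objective is \( f(y) + g(x) \) subject to \( y = Ax \), then the combined objective is separable across the two blocks \( x \) and \( y \), so its proximal operator splits as

\[ \mathbf{prox}_{f+g}(v_x, v_y) = \big(\mathbf{prox}_g(v_x),~\mathbf{prox}_f(v_y)\big). \]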

Further simplifications

  • The constraint set in graph form problems is affine, and as a result, the projection \( \Pi_\mathcal{C}(\cdot) \) can be expressed as a matrix multiplication (see the closed-form expression after this list).
  • We assume that \( f \) and \( g \) are separable, which means the proximal operator can be evaluated component-wise: \[ \mathbf{prox}_f(v) = \big(\mathbf{prox}_{f_1}(v_1), \ldots, \mathbf{prox}_{f_m}(v_m)\big). \] In POGS, we introduce a library of proximal operators (see <pogs>/src/prox_lib.h), thereby hiding the implementation details from the user.
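
The projection mentioned in the first item above has the following closed form (a standard identity, stated here for illustration rather than taken from the POGS source). For \( \mathcal{C} = \{ (x, y) ~|~ y = Ax \} \),

\[ \Pi_\mathcal{C}(c, d) = (\hat{x}, A\hat{x}), \qquad \hat{x} = (I + A^TA)^{-1}(c + A^Td), \]

so once \( I + A^TA \) has been factored, each projection reduces to matrix-vector products and back-substitutions.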

Pros and Cons of using ADMM

Reasons why ADMM can be much faster than conventional methods

  • Alternating minimization makes it possible to perform much of the work in parallel.
  • By using factorization caching, a large portion of the work can be reused between iterations.
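
For instance, in the NumPy sketch above, the Cholesky factorization of \( D^TD + \rho I \) is computed once, before the loop, so each subsequent \( x \)-update costs only a matrix-vector product and two triangular solves.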

Some considerations when using ADMM

  • The convergence rate is highly dependent on the choice of \( \rho \).
  • Convergence can be slow if you require more than 3-4 digits of accuracy.

References

  1. Block Splitting for Distributed Optimization – N. Parikh and S. Boyd
  2. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers – S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein
  3. Proximal Algorithms – N. Parikh and S. Boyd