Lasso Regression

Sparse linear regression with L1 regularization.


The Problem

Lasso solves:

\[ \text{minimize} \quad \frac{1}{2}\|Ax - b\|_2^2 + \lambda\|x\|_1 \]

The L1 penalty promotes sparse solutions: many coefficients become exactly zero.
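
This is a standard fact about the proximal operator of the L1 norm, the soft-thresholding map, which snaps small entries to exactly zero:

\[ \operatorname{prox}_{\lambda\|\cdot\|_1}(v)_i = \operatorname{sign}(v_i)\max(|v_i| - \lambda, 0) \]

Any coordinate with |vᵢ| ≤ λ is zeroed outright rather than merely shrunk.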

Use cases:

  • Feature selection (identify important predictors)
  • High-dimensional regression (n > m)
  • Interpretable models


Quick Example

from pogs import solve_lasso
import numpy as np

# Generate sparse problem
np.random.seed(42)
m, n = 500, 300

# Design matrix
A = np.random.randn(m, n)

# True sparse solution (only 10 nonzeros)
x_true = np.zeros(n)
x_true[:10] = np.random.randn(10)

# Observations with noise
b = A @ x_true + 0.1 * np.random.randn(m)

# Solve
result = solve_lasso(A, b, lambd=0.1)

print(f"Solve time: {result['solve_time']*1000:.1f}ms")
print(f"Iterations: {result['iter']}")
print(f"Nonzeros found: {np.sum(np.abs(result['x']) > 1e-4)}")
print(f"Recovery error: {np.linalg.norm(result['x'] - x_true):.4f}")

Output:

Solve time: 51.2ms
Iterations: 60
Nonzeros found: 10
Recovery error: 0.0234
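
To verify which features were recovered, compare the estimated support against the true one from the example:

support_true = np.nonzero(x_true)[0]
support_est = np.nonzero(np.abs(result['x']) > 1e-4)[0]
print(f"Support recovered exactly: {set(support_est) == set(support_true)}")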


Performance

On these Lasso benchmarks, POGS is roughly 3-9x faster than OSQP, SCS, and Clarabel:

Size       POGS     OSQP    SCS     Clarabel
200x100    3.6ms    32ms    23ms    21ms
500x300    51ms     399ms   206ms   186ms
1000x500   340ms    2.1s    1.3s    1.1s

Benchmarks on Apple M1, Python 3.12
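
To reproduce a comparison on your own machine, here is a minimal timing sketch (assuming SCS is installed and reachable through CVXPY; timings vary with hardware and library versions):

import time
import cvxpy as cp

# Time POGS directly on the Lasso from the example above
t0 = time.perf_counter()
solve_lasso(A, b, lambd=0.1)
print(f"POGS: {(time.perf_counter() - t0) * 1000:.1f}ms")

# Time an alternative solver on the same objective via CVXPY
x = cp.Variable(n)
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(A @ x - b) + 0.1 * cp.norm(x, 1)))
t0 = time.perf_counter()
prob.solve(solver=cp.SCS)
print(f"SCS:  {(time.perf_counter() - t0) * 1000:.1f}ms")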


Choosing Lambda

Lambda controls sparsity:

  • Large lambda (e.g., 1.0): Very sparse, many zeros
  • Small lambda (e.g., 0.01): Less sparse, closer to least squares
  • lambda = 0: Ordinary least squares (no regularization)
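
A useful reference point: for the objective above, any λ ≥ ‖Aᵀb‖∞ makes x = 0 optimal, so interesting values lie below that threshold:

lam_max = np.linalg.norm(A.T @ b, np.inf)  # smallest lambda at which x = 0 is optimal
print(f"lambda_max = {lam_max:.3f}")

The sweep below shows how sparsity varies with lambda:
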
import matplotlib.pyplot as plt

lambdas = [0.001, 0.01, 0.1, 0.5, 1.0]
nnz = []

for lam in lambdas:
    result = solve_lasso(A, b, lambd=lam)
    nnz.append(np.sum(np.abs(result['x']) > 1e-4))

plt.semilogx(lambdas, nnz, 'o-')
plt.xlabel('lambda')
plt.ylabel('Number of nonzeros')
plt.title('Lasso regularization path')
plt.show()

Cross-Validation

Find optimal lambda with cross-validation:

from sklearn.linear_model import LassoCV

# Use sklearn for CV, then solve with POGS
lasso_cv = LassoCV(cv=5, random_state=0)
lasso_cv.fit(A, b)

# sklearn minimizes (1/(2m))*||Ax - b||² + alpha*||x||₁, so rescale alpha
# to match the (1/2)*||Ax - b||² + λ*||x||₁ objective used above
best_lambda = lasso_cv.alpha_ * m

# Solve with POGS using best lambda
result = solve_lasso(A, b, lambd=best_lambda)
print(f"Best lambda: {best_lambda:.4f}")
print(f"Nonzeros: {np.sum(np.abs(result['x']) > 1e-4)}")

Tuning Solver Parameters

Tolerance

# High accuracy
result = solve_lasso(A, b, lambd=0.1, rel_tol=1e-6, abs_tol=1e-6)

# Fast (for warm-starting or prototyping)
result = solve_lasso(A, b, lambd=0.1, rel_tol=1e-3, abs_tol=1e-3)

Initialization

Warm-start with a previous solution:

# Solve first problem
result1 = solve_lasso(A, b, lambd=0.1)

# Warm-start next problem (faster)
result2 = solve_lasso(A, b, lambd=0.05, x_init=result1['x'])
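
Warm starts pay off most when sweeping a regularization path, since consecutive solutions are close. A sketch using the same x_init parameter:

lambdas = np.logspace(0, -3, 20)  # from strong to weak regularization

# Solve the first problem cold, then warm-start each subsequent one
result = solve_lasso(A, b, lambd=lambdas[0])
for lam in lambdas[1:]:
    result = solve_lasso(A, b, lambd=lam, x_init=result['x'])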

CVXPY Alternative

For more flexibility, use CVXPY:

import cvxpy as cp

x = cp.Variable(n)
objective = cp.Minimize(0.5 * cp.sum_squares(A @ x - b) + 0.1 * cp.norm(x, 1))
prob = cp.Problem(objective)
prob.solve(solver='POGS')

print(f"Optimal value: {prob.value:.4f}")
print(f"Nonzeros: {np.sum(np.abs(x.value) > 1e-4)}")

Variations

Elastic Net

Combine L1 and L2 penalties; correlated features then tend to be selected or dropped together:

from pogs import solve_elastic_net

# min ||Ax - b||² + λ₁||x||₁ + λ₂||x||²
result = solve_elastic_net(A, b, l1_ratio=0.5, lambd=0.1)
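
To control the two penalty weights directly instead of through l1_ratio, the CVXPY formulation makes the objective explicit (a sketch; lam1 and lam2 are illustrative values):

import cvxpy as cp

lam1, lam2 = 0.05, 0.05
x = cp.Variable(n)
objective = cp.Minimize(
    0.5 * cp.sum_squares(A @ x - b) + lam1 * cp.norm(x, 1) + lam2 * cp.sum_squares(x)
)
cp.Problem(objective).solve(solver='POGS')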

Non-Negative Lasso

Constrain coefficients to be nonnegative:

from pogs import solve_nonneg_ls

# min ||Ax - b||² s.t. x >= 0
result = solve_nonneg_ls(A, b)
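
Note that solve_nonneg_ls has no L1 term. To add one, use the fact that ‖x‖₁ = Σᵢ xᵢ when x ≥ 0 (a CVXPY sketch):

import cvxpy as cp

x = cp.Variable(n, nonneg=True)
objective = cp.Minimize(0.5 * cp.sum_squares(A @ x - b) + 0.1 * cp.sum(x))
cp.Problem(objective).solve(solver='POGS')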

Weighted Lasso

Different penalties per coefficient (via CVXPY):

import cvxpy as cp

weights = np.ones(n)
weights[:10] = 0.01  # Less penalty on first 10 features

x = cp.Variable(n)
objective = cp.Minimize(
    0.5 * cp.sum_squares(A @ x - b) + cp.norm(cp.multiply(weights, x), 1)
)
prob = cp.Problem(objective)
prob.solve(solver='POGS')

Troubleshooting

"Max iterations reached"

# Increase iterations
result = solve_lasso(A, b, lambd=0.1, max_iter=5000)

# A very small lambda can leave the problem nearly as ill-conditioned as plain least squares
print(f"Condition number: {np.linalg.cond(A):.1f}")

Slow convergence

Normalize your data:

# Standardize columns
A_scaled = (A - A.mean(axis=0)) / A.std(axis=0)
b_scaled = (b - b.mean()) / b.std()

result = solve_lasso(A_scaled, b_scaled, lambd=0.1)
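
Coefficients fitted on standardized data live on the standardized scale. To interpret them on the original scale, undo the scaling (a sketch; it also recovers the implicit intercept):

# Undo the column/target standardization elementwise
x_orig = result['x'] * b.std() / A.std(axis=0)
intercept = b.mean() - A.mean(axis=0) @ x_orig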

See Also