# Lasso Regression
Sparse linear regression with L1 regularization.
## The Problem
Lasso solves:
\[
\text{minimize} \quad \frac{1}{2}\|Ax - b\|_2^2 + \lambda\|x\|_1
\]
The L1 penalty promotes sparse solutions: most coefficients become exactly zero.
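The exact zeros come from the L1 proximal operator: ADMM-style solvers such as POGS handle the penalty through elementwise soft-thresholding, which maps any coordinate whose magnitude falls below the threshold to exactly zero. A minimal illustration (the `soft_threshold` helper is a standalone sketch, not part of the pogs API):

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of lam * ||.||_1: shrink each entry toward zero
    and map anything with |v_i| <= lam to exactly 0."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

v = np.array([2.0, 0.3, -0.05, -1.2])
print(soft_threshold(v, 0.5))  # [ 1.5  0.  -0.  -0.7]
```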
Use cases:

- Feature selection (identify important predictors)
- High-dimensional regression (n > m)
- Interpretable models
## Quick Example
```python
from pogs import solve_lasso
import numpy as np

# Generate sparse problem
np.random.seed(42)
m, n = 500, 300

# Design matrix
A = np.random.randn(m, n)

# True sparse solution (only 10 nonzeros)
x_true = np.zeros(n)
x_true[:10] = np.random.randn(10)

# Observations with noise
b = A @ x_true + 0.1 * np.random.randn(m)

# Solve
result = solve_lasso(A, b, lambd=0.1)

print(f"Solve time: {result['solve_time']*1000:.1f}ms")
print(f"Iterations: {result['iter']}")
print(f"Nonzeros found: {np.sum(np.abs(result['x']) > 1e-4)}")
print(f"Recovery error: {np.linalg.norm(result['x'] - x_true):.4f}")
```
## Performance
On the Lasso benchmarks below, POGS is roughly 3-9x faster than OSQP, SCS, and Clarabel:
| Size (m x n) | POGS | OSQP | SCS | Clarabel |
|---|---|---|---|---|
| 200x100 | 3.6ms | 32ms | 23ms | 21ms |
| 500x300 | 51ms | 399ms | 206ms | 186ms |
| 1000x500 | 340ms | 2.1s | 1.3s | 1.1s |
Benchmarks on Apple M1, Python 3.12
## Choosing Lambda
Lambda controls sparsity:
- Large lambda (e.g., 1.0): Very sparse, many zeros
- Small lambda (e.g., 0.01): Less sparse, closer to least squares
- lambda = 0: Ordinary least squares (no regularization)
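For the objective above there is also a closed-form upper limit: when \(\lambda \ge \|A^\top b\|_\infty\), the all-zero vector is optimal, so interesting values live strictly below that threshold. It is cheap to compute and makes a natural top end for a lambda grid:

```python
# For lambda >= ||A^T b||_inf the lasso solution is exactly zero
lam_max = np.linalg.norm(A.T @ b, np.inf)
print(f"lambda_max: {lam_max:.2f}")

# A typical grid spans a few decades below lambda_max
lam_grid = lam_max * np.logspace(-3, 0, 20)
```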
```python
import matplotlib.pyplot as plt

lambdas = [0.001, 0.01, 0.1, 0.5, 1.0]
nnz = []
for lam in lambdas:
    result = solve_lasso(A, b, lambd=lam)
    nnz.append(np.sum(np.abs(result['x']) > 1e-4))

plt.semilogx(lambdas, nnz, 'o-')
plt.xlabel('lambda')
plt.ylabel('Number of nonzeros')
plt.title('Lasso regularization path')
plt.show()
```
## Cross-Validation
Find optimal lambda with cross-validation:
```python
from sklearn.linear_model import LassoCV

# Use sklearn for CV, then solve with POGS
lasso_cv = LassoCV(cv=5, random_state=0)
lasso_cv.fit(A, b)

# sklearn minimizes (1/(2m))||Ax - b||^2 + alpha*||x||_1, while POGS
# uses (1/2)||Ax - b||^2 + lambda*||x||_1, so rescale alpha by m
best_lambda = lasso_cv.alpha_ * m

# Solve with POGS using the best lambda
result = solve_lasso(A, b, lambd=best_lambda)
print(f"Best lambda: {best_lambda:.4f}")
print(f"Nonzeros: {np.sum(np.abs(result['x']) > 1e-4)}")
```
## Tuning Solver Parameters

### Tolerance
```python
# High accuracy
result = solve_lasso(A, b, lambd=0.1, rel_tol=1e-6, abs_tol=1e-6)

# Fast (for warm-starting or prototyping)
result = solve_lasso(A, b, lambd=0.1, rel_tol=1e-3, abs_tol=1e-3)
```
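Looser tolerances return sooner at the cost of accuracy. A common pattern is to sweep a regularization path at around 1e-3, then re-solve once at the chosen lambda with tighter tolerances.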
### Initialization
Warm-start with a previous solution:
```python
# Solve first problem
result1 = solve_lasso(A, b, lambd=0.1)

# Warm-start next problem (faster)
result2 = solve_lasso(A, b, lambd=0.05, x_init=result1['x'])
```
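Warm starts pay off most when sweeping lambda from large to small, since each solution is a good initial guess for the next, slightly smaller penalty. A sketch of the pattern:

```python
lam_grid = np.logspace(0, -3, 20)

# First solve from scratch at the largest lambda
res = solve_lasso(A, b, lambd=lam_grid[0])
path = [(lam_grid[0], res['x'])]

# Every subsequent solve starts from the previous solution
for lam in lam_grid[1:]:
    res = solve_lasso(A, b, lambd=lam, x_init=path[-1][1])
    path.append((lam, res['x']))
```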
## CVXPY Alternative
For more flexibility, use CVXPY:
```python
import cvxpy as cp

x = cp.Variable(n)
objective = cp.Minimize(0.5 * cp.sum_squares(A @ x - b) + 0.1 * cp.norm(x, 1))
prob = cp.Problem(objective)
prob.solve(solver='POGS')

print(f"Optimal value: {prob.value:.4f}")
print(f"Nonzeros: {np.sum(np.abs(x.value) > 1e-4)}")
```
## Variations

### Elastic Net
Combine L1 and L2 penalties; the L2 term stabilizes the selection so correlated features tend to enter or leave the model together:
```python
from pogs import solve_elastic_net

# min ||Ax - b||² + λ₁||x||₁ + λ₂||x||²
result = solve_elastic_net(A, b, l1_ratio=0.5, lambd=0.1)
```
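The comment above uses two separate weights; how they map to `l1_ratio` and `lambd` depends on the library's convention. If pogs follows the same parametrization as sklearn's ElasticNet (an assumption, so check the API Reference), the split is:

\[
\lambda_1 = \texttt{l1\_ratio} \cdot \lambda, \qquad \lambda_2 = (1 - \texttt{l1\_ratio}) \cdot \lambda
\]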
### Non-Negative Lasso

Require nonnegative coefficients:
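A minimal CVXPY sketch of the constrained formulation (pogs may also ship a dedicated helper for this, see the API Reference):

```python
import cvxpy as cp

# Lasso with a nonnegativity constraint on the coefficients
x = cp.Variable(n, nonneg=True)
objective = cp.Minimize(0.5 * cp.sum_squares(A @ x - b) + 0.1 * cp.norm(x, 1))
prob = cp.Problem(objective)
prob.solve(solver='POGS')
```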
### Weighted Lasso
Different penalties per coefficient (via CVXPY):
```python
import cvxpy as cp

weights = np.ones(n)
weights[:10] = 0.01  # Less penalty on first 10 features

x = cp.Variable(n)
objective = cp.Minimize(
    0.5 * cp.sum_squares(A @ x - b) + cp.norm(cp.multiply(weights, x), 1)
)
prob = cp.Problem(objective)
prob.solve(solver='POGS')
```
## Troubleshooting

### "Max iterations reached"
```python
# Increase the iteration limit
result = solve_lasso(A, b, lambd=0.1, max_iter=5000)

# Or check whether lambda is too small for an ill-conditioned A
print(f"Condition number: {np.linalg.cond(A):.2e}")
```
### Slow convergence
Normalize your data:
```python
# Standardize columns of A and center/scale b
A_scaled = (A - A.mean(axis=0)) / A.std(axis=0)
b_scaled = (b - b.mean()) / b.std()

result = solve_lasso(A_scaled, b_scaled, lambd=0.1)
```
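Standardizing changes the problem's scale, so the same numerical lambda now selects a different sparsity level, and the coefficients are in standardized units. Mapping them back (a sketch, assuming no zero-variance columns):

```python
# Undo the standardization: b ≈ b.mean() + b.std() * (A_scaled @ x_scaled)
x_scaled = result['x']
x_orig = b.std() * x_scaled / A.std(axis=0)
intercept = b.mean() - A.mean(axis=0) @ x_orig
print(np.linalg.norm(A @ x_orig + intercept - b))  # residual in original units
```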
## See Also
- Logistic Regression - Classification with L1 penalty
- Ridge Regression - L2 regularization
- API Reference - Full function documentation