
Logistic Regression

Binary classification with L1 regularization.


The Problem

L1-regularized logistic regression solves:

\[ \text{minimize} \quad \sum_{i=1}^m \log(1 + e^{-y_i (a_i^T x)}) + \lambda\|x\|_1 \]

The L1 penalty promotes sparse solutions, providing automatic feature selection for classification.

Use cases:

  • Binary classification
  • Feature selection for high-dimensional data
  • Interpretable predictive models
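
For reference, the objective can be evaluated directly in NumPy. A minimal sketch (the helper name l1_logistic_objective is illustrative, not part of POGS):

import numpy as np

def l1_logistic_objective(A, y, x, lam):
    # Margins y_i * (a_i^T x); labels assumed to be in {-1, +1}
    margins = y * (A @ x)
    # log(1 + exp(-m)) computed stably as logaddexp(0, -m)
    loss = np.sum(np.logaddexp(0.0, -margins))
    return loss + lam * np.sum(np.abs(x))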


Quick Example

from pogs import solve_logistic
import numpy as np

# Generate classification data
np.random.seed(42)
m, n = 500, 100

# Feature matrix
A = np.random.randn(m, n)

# True sparse coefficients (only 10 features matter)
w_true = np.zeros(n)
w_true[:10] = np.random.randn(10)

# Generate binary labels {-1, +1}
prob = 1 / (1 + np.exp(-A @ w_true))
y = 2 * (np.random.rand(m) < prob) - 1

# Solve
result = solve_logistic(A, y, lambd=0.1)

print(f"Iterations: {result['iterations']}")
print(f"Optimal value: {result['optval']:.4f}")
print(f"Nonzero coefficients: {np.sum(np.abs(result['x']) > 1e-4)}")

# Compute accuracy
pred = np.sign(A @ result['x'])
accuracy = np.mean(pred == y)
print(f"Training accuracy: {accuracy*100:.1f}%")

Output:

Iterations: 85
Optimal value: 198.4521
Nonzero coefficients: 12
Training accuracy: 89.4%
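
Because the data was generated from a known sparse w_true, you can also check how well the support was recovered (continuing the example above):

# Compare the estimated support against the 10 true features
support_est = np.abs(result['x']) > 1e-4
support_true = w_true != 0
print(f"True positives:  {np.sum(support_est & support_true)} / {support_true.sum()}")
print(f"False positives: {np.sum(support_est & ~support_true)}")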


Performance

In these benchmarks, POGS runs roughly 5-9x faster than general-purpose solvers on logistic regression:

Size       POGS     OSQP     SCS      Clarabel
200x50     12ms     98ms     67ms     58ms
500x100    34ms     312ms    198ms    167ms
1000x200   156ms    1.4s     890ms    720ms

Benchmarks on Apple M1, Python 3.12
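
To get a rough timing on your own machine, a minimal sketch (absolute numbers will vary with hardware and BLAS build):

import time

t0 = time.perf_counter()
result = solve_logistic(A, y, lambd=0.1)
print(f"Solve time: {(time.perf_counter() - t0) * 1000:.1f}ms")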


Label Format

POGS expects labels in {-1, +1} format:

# Convert from {0, 1} to {-1, +1}
y = 2 * y_binary - 1

# Or from boolean
y = 2 * y_bool.astype(int) - 1
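
A quick sanity check before solving catches mis-encoded labels early:

# Fail fast if labels are not strictly in {-1, +1}
labels = np.unique(y)
assert set(labels) <= {-1, 1}, f"Expected labels in {{-1, +1}}, got {labels}"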

Choosing Lambda

Lambda controls model complexity:

  • Large lambda (e.g., 1.0): Few features, simpler model
  • Small lambda (e.g., 0.001): More features, complex model
  • lambda = 0: No regularization (may overfit)

A simple sweep helps locate a good trade-off:

lambdas = [0.001, 0.01, 0.1, 1.0]

for lam in lambdas:
    result = solve_logistic(A, y, lambd=lam)
    nnz = np.sum(np.abs(result['x']) > 1e-4)

    pred = np.sign(A @ result['x'])
    acc = np.mean(pred == y)

    print(f"lambda={lam}: {nnz} features, {acc*100:.1f}% accuracy")

Tuning Solver Parameters

# High accuracy
result = solve_logistic(A, y, lambd=0.1, rel_tol=1e-6, abs_tol=1e-6)

# More iterations for difficult problems
result = solve_logistic(A, y, lambd=0.1, max_iter=5000)

# Verbose output
result = solve_logistic(A, y, lambd=0.1, verbose=2)

Making Predictions

Class Labels

# Predict class labels for a held-out feature matrix A_test
x = result['x']
pred = np.sign(A_test @ x)

Probabilities

# Predict probabilities
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = A_test @ x
prob_positive = sigmoid(z)
prob_negative = 1 - prob_positive
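
Thresholding the probability at 0.5 gives the same labels as np.sign, since sigmoid(z) >= 0.5 exactly when z >= 0:

pred_from_prob = np.where(prob_positive >= 0.5, 1, -1)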

CVXPY Alternative

For more flexibility (e.g., intercept term):

import cvxpy as cp

x = cp.Variable(n)
b = cp.Variable()  # Intercept

objective = cp.Minimize(
    cp.sum(cp.logistic(-cp.multiply(y, A @ x + b))) + 0.1 * cp.norm(x, 1)
)
prob = cp.Problem(objective)
prob.solve(solver='POGS')

print(f"Intercept: {b.value:.4f}")

Variations

L2 Regularization

For ridge logistic regression (no sparsity):

import cvxpy as cp

x = cp.Variable(n)
objective = cp.Minimize(
    cp.sum(cp.logistic(-cp.multiply(y, A @ x))) + 0.1 * cp.sum_squares(x)
)
prob = cp.Problem(objective)
prob.solve(solver='POGS')

Elastic Net

Combine L1 and L2:

import cvxpy as cp

x = cp.Variable(n)
objective = cp.Minimize(
    cp.sum(cp.logistic(-cp.multiply(y, A @ x)))
    + 0.1 * cp.norm(x, 1)
    + 0.01 * cp.sum_squares(x)
)
prob = cp.Problem(objective)
prob.solve(solver='POGS')

Troubleshooting

"Max iterations reached"

Logistic regression problems can be slow to converge:

# More iterations
result = solve_logistic(A, y, lambd=0.1, max_iter=5000)

# Or increase lambda for better conditioning
result = solve_logistic(A, y, lambd=0.5)

Poor accuracy

Check data scaling:

# Standardize features
A_scaled = (A - A.mean(axis=0)) / A.std(axis=0)
result = solve_logistic(A_scaled, y, lambd=0.1)
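
When scoring held-out data, reuse the training mean and standard deviation rather than re-standardizing the test set independently:

# Apply the training-set statistics to new data
mu, sigma = A.mean(axis=0), A.std(axis=0)
A_test_scaled = (A_test - mu) / sigma
pred = np.sign(A_test_scaled @ result['x'])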

See Also