nlsq.loss_functions module¶

Robust loss functions for outlier handling.

Robust loss functions for outlier-resistant curve fitting.

This module provides JIT-compiled implementations of robust loss functions including Huber, Cauchy, soft_l1, and arctan for nonlinear least squares optimization. These functions reduce the influence of outliers compared to standard least squares.

Robust loss functions replace the squared residual z with a function ρ(z) that grows more slowly for large residuals, making the optimization less sensitive to outliers while preserving accuracy for well-behaved data.

Available Loss Functions:

‘linear’ (default): Standard least squares, ρ(z) = z
‘huber’: Quadratic for small residuals, linear for large (recommended)
‘soft_l1’: Smooth approximation to L1 loss
‘cauchy’: Heavy-tailed, very robust to outliers
‘arctan’: Bounded loss function

Example

>>> from nlsq import curve_fit
>>> import jax.numpy as jnp
>>>
>>> def model(x, a, b): return a * jnp.exp(-b * x)
>>>
>>> # Fit with Huber loss to handle outliers
>>> popt, pcov = curve_fit(model, x, y, p0=[2.0, 0.5], loss='huber')

Robust Loss Function Theory¶

Standard least squares minimizes sum of squared residuals, making it sensitive to outliers. Robust loss functions ρ(z) replace z (squared residuals) with functions that grow more slowly for large residuals:

Standard LS: min Σ f_i² Robust LS: min Σ ρ(f_i²/σ²)

where σ is the scaling parameter (f_scale) and z = (f/σ)².

Available Loss Functions¶

linear: Standard least squares (ρ(z) = z) - No outlier protection - Fastest computation - Optimal for clean data without outliers
huber: Huber loss function - ρ(z) = z if z ≤ 1, else 2√z - 1 - Quadratic for small residuals, linear for large ones - Good balance between efficiency and robustness - Recommended for data with moderate outliers
soft_l1: Soft L1 loss function - ρ(z) = 2(√(1+z) - 1) - Smooth approximation to L1 norm - More robust than Huber for severe outliers - Preserves differentiability everywhere
cauchy: Cauchy (Lorentzian) loss function - ρ(z) = ln(1 + z) - Extremely robust to outliers - Can handle heavy-tailed error distributions - May converge slowly for well-behaved data
arctan: Arctangent loss function - ρ(z) = arctan(z) - Bounded loss function - Very robust to extreme outliers - Useful for data with unknown error characteristics

Mathematical Implementation¶

Each loss function computes three quantities: - ρ(z): Loss function value - ρ’(z): First derivative for gradient computation - ρ’’(z): Second derivative for Hessian approximation

The derivatives are used in the optimization algorithm: - Gradient: g = J^T (rho’(z) ⊙ f) - Hessian: H ≈ J^T diag(rho’(z)) J + J^T diag(rho’’(z) ⊙ f²) J

Performance Characteristics¶

JIT Compilation: All functions compiled for GPU/TPU acceleration
Vectorized Operations: Efficient batch processing of residuals
Memory Optimization: In-place operations where possible
Numerical Stability: Careful handling of edge cases and overflow

Usage Example¶

from nlsq.core.loss_functions import LossFunctionsJIT

# Initialize loss function handler
loss_jit = LossFunctionsJIT()

# Get robust loss function
huber_loss = loss_jit.get_loss_function('huber')

# Apply to residuals
residuals = jnp.array([0.1, 5.0, 0.2, 10.0])  # Contains outliers
f_scale = 1.0
data_mask = jnp.ones_like(residuals, dtype=bool)

# Compute loss with derivatives
rho = huber_loss(residuals, f_scale, data_mask, cost_only=False)
# rho[0] = loss values, rho[1] = first derivatives, rho[2] = second derivatives

# Compute total cost only
cost = huber_loss(residuals, f_scale, data_mask, cost_only=True)

Loss Function Selection Guidelines¶

Clean Data: Use ‘linear’ for maximum efficiency
Few Outliers: Use ‘huber’ for balanced robustness
Many Outliers: Use ‘soft_l1’ or ‘cauchy’
Unknown Data Quality: Start with ‘huber’, upgrade if needed
Extreme Outliers: Use ‘cauchy’ or ‘arctan’

Scale Parameter (f_scale)¶

The scale parameter σ (f_scale) determines the transition point between quadratic and robust behavior: - Too Small: All residuals treated as outliers - Too Large: No outlier protection - Optimal: ~median absolute residual or robust MAD estimate - Adaptive: Can be estimated during optimization

Technical Implementation Details¶

All functions handle both scalar and vector inputs
Derivatives computed analytically for accuracy
Special handling for z=0 to avoid numerical issues
Efficient masking for missing data points
Compatible with JAX transformations (grad, jit, vmap)

__init__()[source]

create_stack_rhos()[source]

Create JIT-compiled function to stack rho values into array.

Returns a function that stacks rho0, rho1, rho2 into a (3, n) array for efficient loss and derivative computation.

get_empty_rhos(z)[source]

Return zero arrays for rho1 and rho2 when only cost is needed.

Used when cost_only=True to avoid computing unnecessary derivatives.

create_huber_funcs()[source]

Create JIT-compiled Huber loss functions.

Creates huber1 (rho0 and mask) and huber2 (rho1, rho2 derivatives). Huber loss is quadratic for |z| <= 1 and linear for |z| > 1.

huber(z, cost_only)[source]: Compute Huber loss rho values.

create_soft_l1_funcs()[source]

Create JIT-compiled soft L1 loss functions.

Creates soft_l1_1 (rho0 and intermediate t) and soft_l1_2 (derivatives). Soft L1 is a smooth approximation to L1 loss: rho(z) = 2*(sqrt(1+z) - 1).

soft_l1(z, cost_only)[source]: Compute soft L1 loss rho values.

create_cauchy_funcs()[source]

Create JIT-compiled Cauchy (Lorentzian) loss functions.

Creates cauchy1 (rho0) and cauchy2 (derivatives). Cauchy loss: rho(z) = ln(1 + z). Very robust to outliers.

cauchy(z, cost_only)[source]: Compute Cauchy loss rho values.

create_arctan_funcs()[source]

Create JIT-compiled arctan loss functions.

Creates arctan1 (rho0) and arctan2 (derivatives). Arctan loss: rho(z) = arctan(z). Bounded loss for extreme outliers.

arctan(z, cost_only)[source]: Compute arctan loss rho values.

create_zscale()[source]

Create JIT-compiled function to compute scaled squared residuals.

Computes z = (f/f_scale)^2 for robust loss function input.

create_calculate_cost()[source]

Create JIT-compiled cost calculation function.

Computes total cost as 0.5 * f_scale^2 * sum(rho0) with masking.

create_scale_rhos()[source]

Create JIT-compiled function to scale rho values by f_scale.

Applies proper scaling: rho0 *= f_scale**2, rho2 /= f_scale**2.

construct_single_loss_function(loss)[source]

construct_all_loss_functions()[source]

get_loss_function(loss)[source]