nlsq.loss_functions module¶
Robust loss functions for outlier handling.
Robust loss functions for outlier-resistant curve fitting.
This module provides JIT-compiled implementations of robust loss functions including Huber, Cauchy, soft_l1, and arctan for nonlinear least squares optimization. These functions reduce the influence of outliers compared to standard least squares.
Robust loss functions replace the squared residual z with a function ρ(z) that grows more slowly for large residuals, making the optimization less sensitive to outliers while preserving accuracy for well-behaved data.
- Available Loss Functions:
‘linear’ (default): Standard least squares, ρ(z) = z
‘huber’: Quadratic for small residuals, linear for large (recommended)
‘soft_l1’: Smooth approximation to L1 loss
‘cauchy’: Heavy-tailed, very robust to outliers
‘arctan’: Bounded loss function
Example
>>> from nlsq import curve_fit
>>> import jax.numpy as jnp
>>>
>>> def model(x, a, b): return a * jnp.exp(-b * x)
>>>
>>> # Fit with Huber loss to handle outliers
>>> popt, pcov = curve_fit(model, x, y, p0=[2.0, 0.5], loss='huber')
See also
nlsq.curve_fit : Main fitting function that uses these loss functions nlsq.least_squares : Lower-level interface with loss function control
- class nlsq.core.loss_functions.LossFunctionsJIT[source]
Bases:
objectJIT-compiled robust loss functions for nonlinear least squares optimization.
This class provides GPU/TPU-accelerated implementations of robust loss functions that reduce the influence of outliers in curve fitting. All loss functions are JIT-compiled for maximum performance and include analytical derivatives required for efficient optimization.
Robust Loss Function Theory¶
Standard least squares minimizes sum of squared residuals, making it sensitive to outliers. Robust loss functions ρ(z) replace z (squared residuals) with functions that grow more slowly for large residuals:
Standard LS: min Σ f_i² Robust LS: min Σ ρ(f_i²/σ²)
where σ is the scaling parameter (f_scale) and z = (f/σ)².
Available Loss Functions¶
linear: Standard least squares (ρ(z) = z) - No outlier protection - Fastest computation - Optimal for clean data without outliers
huber: Huber loss function - ρ(z) = z if z ≤ 1, else 2√z - 1 - Quadratic for small residuals, linear for large ones - Good balance between efficiency and robustness - Recommended for data with moderate outliers
soft_l1: Soft L1 loss function - ρ(z) = 2(√(1+z) - 1) - Smooth approximation to L1 norm - More robust than Huber for severe outliers - Preserves differentiability everywhere
cauchy: Cauchy (Lorentzian) loss function - ρ(z) = ln(1 + z) - Extremely robust to outliers - Can handle heavy-tailed error distributions - May converge slowly for well-behaved data
arctan: Arctangent loss function - ρ(z) = arctan(z) - Bounded loss function - Very robust to extreme outliers - Useful for data with unknown error characteristics
Mathematical Implementation¶
Each loss function computes three quantities: - ρ(z): Loss function value - ρ’(z): First derivative for gradient computation - ρ’’(z): Second derivative for Hessian approximation
The derivatives are used in the optimization algorithm: - Gradient: g = J^T (rho’(z) ⊙ f) - Hessian: H ≈ J^T diag(rho’(z)) J + J^T diag(rho’’(z) ⊙ f²) J
Performance Characteristics¶
JIT Compilation: All functions compiled for GPU/TPU acceleration
Vectorized Operations: Efficient batch processing of residuals
Memory Optimization: In-place operations where possible
Numerical Stability: Careful handling of edge cases and overflow
Usage Example¶
from nlsq.core.loss_functions import LossFunctionsJIT # Initialize loss function handler loss_jit = LossFunctionsJIT() # Get robust loss function huber_loss = loss_jit.get_loss_function('huber') # Apply to residuals residuals = jnp.array([0.1, 5.0, 0.2, 10.0]) # Contains outliers f_scale = 1.0 data_mask = jnp.ones_like(residuals, dtype=bool) # Compute loss with derivatives rho = huber_loss(residuals, f_scale, data_mask, cost_only=False) # rho[0] = loss values, rho[1] = first derivatives, rho[2] = second derivatives # Compute total cost only cost = huber_loss(residuals, f_scale, data_mask, cost_only=True)
Loss Function Selection Guidelines¶
Clean Data: Use ‘linear’ for maximum efficiency
Few Outliers: Use ‘huber’ for balanced robustness
Many Outliers: Use ‘soft_l1’ or ‘cauchy’
Unknown Data Quality: Start with ‘huber’, upgrade if needed
Extreme Outliers: Use ‘cauchy’ or ‘arctan’
Scale Parameter (f_scale)¶
The scale parameter σ (f_scale) determines the transition point between quadratic and robust behavior: - Too Small: All residuals treated as outliers - Too Large: No outlier protection - Optimal: ~median absolute residual or robust MAD estimate - Adaptive: Can be estimated during optimization
Technical Implementation Details¶
All functions handle both scalar and vector inputs
Derivatives computed analytically for accuracy
Special handling for z=0 to avoid numerical issues
Efficient masking for missing data points
Compatible with JAX transformations (grad, jit, vmap)
- __init__()[source]
- create_stack_rhos()[source]
Create JIT-compiled function to stack rho values into array.
Returns a function that stacks rho0, rho1, rho2 into a (3, n) array for efficient loss and derivative computation.
- get_empty_rhos(z)[source]
Return zero arrays for rho1 and rho2 when only cost is needed.
Used when cost_only=True to avoid computing unnecessary derivatives.
- create_huber_funcs()[source]
Create JIT-compiled Huber loss functions.
Creates huber1 (rho0 and mask) and huber2 (rho1, rho2 derivatives). Huber loss is quadratic for
|z| <= 1and linear for|z| > 1.
- huber(z, cost_only)[source]
Compute Huber loss rho values.
- create_soft_l1_funcs()[source]
Create JIT-compiled soft L1 loss functions.
Creates soft_l1_1 (rho0 and intermediate t) and soft_l1_2 (derivatives). Soft L1 is a smooth approximation to L1 loss: rho(z) = 2*(sqrt(1+z) - 1).
- soft_l1(z, cost_only)[source]
Compute soft L1 loss rho values.
- create_cauchy_funcs()[source]
Create JIT-compiled Cauchy (Lorentzian) loss functions.
Creates cauchy1 (rho0) and cauchy2 (derivatives). Cauchy loss: rho(z) = ln(1 + z). Very robust to outliers.
- cauchy(z, cost_only)[source]
Compute Cauchy loss rho values.
- create_arctan_funcs()[source]
Create JIT-compiled arctan loss functions.
Creates arctan1 (rho0) and arctan2 (derivatives). Arctan loss: rho(z) = arctan(z). Bounded loss for extreme outliers.
- arctan(z, cost_only)[source]
Compute arctan loss rho values.
- create_zscale()[source]
Create JIT-compiled function to compute scaled squared residuals.
Computes z = (f/f_scale)^2 for robust loss function input.
- create_calculate_cost()[source]
Create JIT-compiled cost calculation function.
Computes total cost as 0.5 * f_scale^2 * sum(rho0) with masking.
- create_scale_rhos()[source]
Create JIT-compiled function to scale rho values by f_scale.
Applies proper scaling:
rho0 *= f_scale**2,rho2 /= f_scale**2.
- construct_single_loss_function(loss)[source]
- construct_all_loss_functions()[source]
- get_loss_function(loss)[source]