nlsq.loss_functions module

Robust loss functions for outlier handling.

Robust loss functions for outlier-resistant curve fitting.

This module provides JIT-compiled implementations of robust loss functions including Huber, Cauchy, soft_l1, and arctan for nonlinear least squares optimization. These functions reduce the influence of outliers compared to standard least squares.

Robust loss functions replace the squared residual z with a function ρ(z) that grows more slowly for large residuals, making the optimization less sensitive to outliers while preserving accuracy for well-behaved data.

Available Loss Functions:
  • ‘linear’ (default): Standard least squares, ρ(z) = z

  • ‘huber’: Quadratic for small residuals, linear for large (recommended)

  • ‘soft_l1’: Smooth approximation to L1 loss

  • ‘cauchy’: Heavy-tailed, very robust to outliers

  • ‘arctan’: Bounded loss function

Example

>>> from nlsq import curve_fit
>>> import jax.numpy as jnp
>>>
>>> def model(x, a, b): return a * jnp.exp(-b * x)
>>>
>>> # Fit with Huber loss to handle outliers
>>> popt, pcov = curve_fit(model, x, y, p0=[2.0, 0.5], loss='huber')

See also

nlsq.curve_fit : Main fitting function that uses these loss functions nlsq.least_squares : Lower-level interface with loss function control

class nlsq.core.loss_functions.LossFunctionsJIT[source]

Bases: object

JIT-compiled robust loss functions for nonlinear least squares optimization.

This class provides GPU/TPU-accelerated implementations of robust loss functions that reduce the influence of outliers in curve fitting. All loss functions are JIT-compiled for maximum performance and include analytical derivatives required for efficient optimization.

Robust Loss Function Theory

Standard least squares minimizes sum of squared residuals, making it sensitive to outliers. Robust loss functions ρ(z) replace z (squared residuals) with functions that grow more slowly for large residuals:

Standard LS: min Σ f_i² Robust LS: min Σ ρ(f_i²/σ²)

where σ is the scaling parameter (f_scale) and z = (f/σ)².

Available Loss Functions

  1. linear: Standard least squares (ρ(z) = z) - No outlier protection - Fastest computation - Optimal for clean data without outliers

  2. huber: Huber loss function - ρ(z) = z if z ≤ 1, else 2√z - 1 - Quadratic for small residuals, linear for large ones - Good balance between efficiency and robustness - Recommended for data with moderate outliers

  3. soft_l1: Soft L1 loss function - ρ(z) = 2(√(1+z) - 1) - Smooth approximation to L1 norm - More robust than Huber for severe outliers - Preserves differentiability everywhere

  4. cauchy: Cauchy (Lorentzian) loss function - ρ(z) = ln(1 + z) - Extremely robust to outliers - Can handle heavy-tailed error distributions - May converge slowly for well-behaved data

  5. arctan: Arctangent loss function - ρ(z) = arctan(z) - Bounded loss function - Very robust to extreme outliers - Useful for data with unknown error characteristics

Mathematical Implementation

Each loss function computes three quantities: - ρ(z): Loss function value - ρ’(z): First derivative for gradient computation - ρ’’(z): Second derivative for Hessian approximation

The derivatives are used in the optimization algorithm: - Gradient: g = J^T (rho’(z) ⊙ f) - Hessian: H ≈ J^T diag(rho’(z)) J + J^T diag(rho’’(z) ⊙ f²) J

Performance Characteristics

  • JIT Compilation: All functions compiled for GPU/TPU acceleration

  • Vectorized Operations: Efficient batch processing of residuals

  • Memory Optimization: In-place operations where possible

  • Numerical Stability: Careful handling of edge cases and overflow

Usage Example

from nlsq.core.loss_functions import LossFunctionsJIT

# Initialize loss function handler
loss_jit = LossFunctionsJIT()

# Get robust loss function
huber_loss = loss_jit.get_loss_function('huber')

# Apply to residuals
residuals = jnp.array([0.1, 5.0, 0.2, 10.0])  # Contains outliers
f_scale = 1.0
data_mask = jnp.ones_like(residuals, dtype=bool)

# Compute loss with derivatives
rho = huber_loss(residuals, f_scale, data_mask, cost_only=False)
# rho[0] = loss values, rho[1] = first derivatives, rho[2] = second derivatives

# Compute total cost only
cost = huber_loss(residuals, f_scale, data_mask, cost_only=True)

Loss Function Selection Guidelines

  • Clean Data: Use ‘linear’ for maximum efficiency

  • Few Outliers: Use ‘huber’ for balanced robustness

  • Many Outliers: Use ‘soft_l1’ or ‘cauchy’

  • Unknown Data Quality: Start with ‘huber’, upgrade if needed

  • Extreme Outliers: Use ‘cauchy’ or ‘arctan’

Scale Parameter (f_scale)

The scale parameter σ (f_scale) determines the transition point between quadratic and robust behavior: - Too Small: All residuals treated as outliers - Too Large: No outlier protection - Optimal: ~median absolute residual or robust MAD estimate - Adaptive: Can be estimated during optimization

Technical Implementation Details

  • All functions handle both scalar and vector inputs

  • Derivatives computed analytically for accuracy

  • Special handling for z=0 to avoid numerical issues

  • Efficient masking for missing data points

  • Compatible with JAX transformations (grad, jit, vmap)

__init__()[source]
create_stack_rhos()[source]

Create JIT-compiled function to stack rho values into array.

Returns a function that stacks rho0, rho1, rho2 into a (3, n) array for efficient loss and derivative computation.

get_empty_rhos(z)[source]

Return zero arrays for rho1 and rho2 when only cost is needed.

Used when cost_only=True to avoid computing unnecessary derivatives.

create_huber_funcs()[source]

Create JIT-compiled Huber loss functions.

Creates huber1 (rho0 and mask) and huber2 (rho1, rho2 derivatives). Huber loss is quadratic for |z| <= 1 and linear for |z| > 1.

huber(z, cost_only)[source]

Compute Huber loss rho values.

create_soft_l1_funcs()[source]

Create JIT-compiled soft L1 loss functions.

Creates soft_l1_1 (rho0 and intermediate t) and soft_l1_2 (derivatives). Soft L1 is a smooth approximation to L1 loss: rho(z) = 2*(sqrt(1+z) - 1).

soft_l1(z, cost_only)[source]

Compute soft L1 loss rho values.

create_cauchy_funcs()[source]

Create JIT-compiled Cauchy (Lorentzian) loss functions.

Creates cauchy1 (rho0) and cauchy2 (derivatives). Cauchy loss: rho(z) = ln(1 + z). Very robust to outliers.

cauchy(z, cost_only)[source]

Compute Cauchy loss rho values.

create_arctan_funcs()[source]

Create JIT-compiled arctan loss functions.

Creates arctan1 (rho0) and arctan2 (derivatives). Arctan loss: rho(z) = arctan(z). Bounded loss for extreme outliers.

arctan(z, cost_only)[source]

Compute arctan loss rho values.

create_zscale()[source]

Create JIT-compiled function to compute scaled squared residuals.

Computes z = (f/f_scale)^2 for robust loss function input.

create_calculate_cost()[source]

Create JIT-compiled cost calculation function.

Computes total cost as 0.5 * f_scale^2 * sum(rho0) with masking.

create_scale_rhos()[source]

Create JIT-compiled function to scale rho values by f_scale.

Applies proper scaling: rho0 *= f_scale**2, rho2 /= f_scale**2.

construct_single_loss_function(loss)[source]
construct_all_loss_functions()[source]
get_loss_function(loss)[source]