nlsq package

Package Overview

NLSQ (Nonlinear Least Squares) is a JAX-powered library that provides GPU/TPU-accelerated curve fitting with automatic differentiation. It offers a drop-in replacement for SciPy’s curve_fit function with significant performance improvements on modern hardware (150-270x speedup on GPUs).

Key Features:

  • GPU/TPU Acceleration: JAX JIT compilation to XLA for massive speedups

  • Automatic Differentiation: No manual Jacobian calculations required

  • Large Dataset Support: Automatic chunking and memory management for 100M+ points

  • Production-Ready: 3389 tests, 100% pass rate

  • Drop-in Compatibility: Minimal code changes from scipy.optimize.curve_fit

Quick Start Example

Basic exponential fit:

import numpy as np
from nlsq import curve_fit
import jax.numpy as jnp


# Define model function
def exponential(x, a, b):
    return a * jnp.exp(-b * x)


# Generate data
x = np.linspace(0, 5, 1000)
y = 2.5 * np.exp(-1.3 * x) + 0.01 * np.random.randn(1000)

# Fit (GPU-accelerated!)
popt, pcov = curve_fit(exponential, x, y, p0=[2, 1])
print(f"Fitted parameters: a={popt[0]:.2f}, b={popt[1]:.2f}")

Large dataset with automatic chunking:

from nlsq import curve_fit_large

# 50 million points - automatically chunked
x_large = np.linspace(0, 10, 50_000_000)
y_large = 2.0 * np.exp(-0.5 * x_large) + 0.3

popt, pcov = curve_fit_large(
    exponential,
    x_large,
    y_large,
    p0=[2.5, 0.6],
    memory_limit_gb=4.0,
    show_progress=True
)

See Also

Submodules

See NLSQ API Reference for complete documentation of all submodules.

Primary Entry Points

nlsq.fit(f, xdata, ydata, p0=None, sigma=None, absolute_sigma=False, check_finite=True, bounds=(-inf, inf), method=None, workflow=None, preset=None, multistart=None, n_starts=None, sampler='lhs', center_on_p0=True, scale_factor=1.0, memory_limit_gb=None, size_threshold=1000000, show_progress=False, chunk_size=None, **kwargs)[source]

Unified curve fitting function with preset-based configuration.

This function provides a simplified API for curve fitting with sensible defaults based on preset configurations. It automatically selects the appropriate backend (curve_fit, curve_fit_large, or streaming) based on the preset and dataset characteristics.

Parameters:
  • f (callable) – Model function f(x, *params) -> y. Must use jax.numpy operations.

  • xdata (array_like) – Independent variable data.

  • ydata (array_like) – Dependent variable data.

  • p0 (array_like, optional) – Initial parameter guess.

  • sigma (array_like, optional) – Uncertainties in ydata for weighted fitting.

  • absolute_sigma (bool, optional) – Whether sigma represents absolute uncertainties.

  • check_finite (bool, optional) – Check for finite input values.

  • bounds (tuple, optional) – Parameter bounds as (lower, upper).

  • method (str, optional) – Optimization algorithm (‘trf’, ‘lm’, or None for auto).

  • preset ({'fast', 'robust', 'global', 'streaming', 'large'}, optional) –

    Preset configuration to use:

    • ’fast’: Single-start optimization for maximum speed (n_starts=0)

    • ’robust’: Multi-start with 5 starts for robustness

    • ’global’: Thorough global search with 20 starts

    • ’streaming’: Streaming optimization for large datasets with multi-start

    • ’large’: Auto-detect dataset size and use appropriate strategy

    If None, defaults to ‘fast’ for small datasets or ‘large’ for datasets exceeding size_threshold.

  • multistart (bool, optional) – Override preset’s multi-start setting.

  • n_starts (int, optional) – Override preset’s n_starts setting.

  • sampler ({'lhs', 'sobol', 'halton'}, optional) – Sampling strategy for multi-start. Default: ‘lhs’.

  • center_on_p0 (bool, optional) – Center multi-start samples around p0. Default: True.

  • scale_factor (float, optional) – Scale factor for exploration region. Default: 1.0.

  • memory_limit_gb (float, optional) – Maximum memory usage in GB for large datasets.

  • size_threshold (int, optional) – Point threshold for large dataset processing (default: 1M).

  • show_progress (bool, optional) – Display progress bar for long operations.

  • chunk_size (int, optional) – Override automatic chunk size calculation.

  • **kwargs (Any) – Additional optimization parameters (ftol, xtol, gtol, max_nfev, loss).

Returns:

result – Optimization result. Contains popt, pcov, and multistart_diagnostics. Supports tuple unpacking: popt, pcov = fit(…)

Return type:

CurveFitResult or tuple

Examples

Basic usage with default preset:

>>> popt, pcov = fit(model_func, xdata, ydata, p0=[1, 2, 3])

Using ‘robust’ preset for multi-start:

>>> result = fit(model_func, xdata, ydata, p0=[1, 2, 3],
...              bounds=([0, 0, 0], [10, 10, 10]), preset='robust')

Using ‘global’ preset for thorough search:

>>> result = fit(model_func, xdata, ydata, p0=[1, 2, 3],
...              bounds=([0, 0, 0], [10, 10, 10]), preset='global')

Large dataset with auto-detection:

>>> result = fit(model_func, big_xdata, big_ydata,
...              preset='large', show_progress=True)

See also

curve_fit

Lower-level API with full control

curve_fit_large

Specialized API for large datasets

MemoryBudgetSelector

Memory-based optimizer selection

nlsq.curve_fit(f, xdata, ydata, *args, auto_bounds=False, bounds_safety_factor=10.0, stability=False, rescale_data=True, max_jacobian_elements_for_svd=10000000, fallback=False, max_fallback_attempts=10, fallback_verbose=False, multistart=False, n_starts=10, global_search=False, sampler='lhs', center_on_p0=True, scale_factor=1.0, method=None, cmaes_config=None, compute_diagnostics=False, diagnostics_level=DiagnosticLevel.BASIC, diagnostics_config=None, **kwargs)[source]

Use nonlinear least squares to fit a function to data with GPU/TPU acceleration.

This is the main user-facing function that provides a drop-in replacement for scipy.optimize.curve_fit with GPU/TPU acceleration via JAX. The function automatically handles JAX JIT compilation, double precision configuration, and optimization algorithm selection.

Parameters:
  • f (callable) – The model function f(x, *popt) -> y. Must be JAX-compatible, meaning it should use jax.numpy instead of numpy for mathematical operations to enable GPU acceleration and automatic differentiation.

  • xdata (array_like) – The independent variable where the data is measured.

  • ydata (array_like) – The dependent data, nominally f(xdata, *popt).

  • auto_bounds (bool, optional) –

    Enable automatic parameter bounds inference from data characteristics. When True, reasonable bounds are inferred based on:

    • Data ranges (x and y)

    • Initial parameter guess (p0)

    • Parameter positivity constraints

    • Safety factors to avoid over-constraining

    The inferred bounds are merged with any user-provided bounds via the bounds parameter. User bounds take precedence where specified. Default: False.

  • bounds_safety_factor (float, optional) – Safety multiplier for automatic bounds (larger = more conservative). Only used when auto_bounds=True. Default: 10.0.

  • stability ({'auto', 'check', False}, optional) –

    Control numerical stability checks and automatic fixes:

    • ’auto’: Check for stability issues and automatically apply fixes (optionally rescale data, normalize parameters, handle NaN/Inf)

    • ’check’: Check for stability issues and warn, but don’t apply fixes

    • False: Skip stability checks entirely (default)

    When ‘auto’, detected issues are fixed before optimization:

    • Ill-conditioned data (condition number > 1e10) is rescaled to [0, 1] (only if rescale_data=True)

    • Large data ranges (> 1e4) are normalized (only if rescale_data=True)

    • NaN/Inf values are replaced with mean

    • Parameter scale mismatches (ratio > 1e6) are normalized

    Default: False.

  • rescale_data (bool, optional) – When stability=’auto’, controls whether data is automatically rescaled to [0, 1] for ill-conditioned or large-range data. Set to False for applications where data must maintain physical units (e.g., time in seconds, frequency in Hz). NaN/Inf handling and parameter normalization are still applied when stability=’auto’. Default: True.

  • max_jacobian_elements_for_svd (int, optional) –

    Maximum number of elements in the Jacobian matrix (m x n) for which SVD will be computed during stability checks. For larger Jacobians, SVD is skipped to avoid O(min(m,n)^2 x max(m,n)) computation overhead. NaN/Inf checking is still performed for large Jacobians.

    Examples of element counts: - 1M data points x 7 params = 7M elements - 100K data points x 100 params = 10M elements

    Set to a larger value if you need condition number checks for large problems, or a smaller value to skip SVD more aggressively. Default: 10,000,000 (10M elements).

  • fallback (bool, optional) –

    Enable automatic fallback strategies for difficult optimization problems. When True, the optimizer will automatically try alternative approaches if the initial optimization fails, including:

    • Alternative optimization methods

    • Perturbed initial guesses

    • Relaxed tolerances

    • Inferred parameter bounds

    • Robust loss functions

    • Problem rescaling

    Default: False. Enabling this improves success rate on difficult problems but adds overhead when optimizations fail.

  • max_fallback_attempts (int, optional) – Maximum number of fallback attempts to try before giving up. Only used when fallback=True. Default: 10.

  • fallback_verbose (bool, optional) – Print detailed information about fallback attempts. Only used when fallback=True. Default: False.

  • multistart (bool, optional) – Enable multi-start optimization for global search. When True, generates multiple starting points using Latin Hypercube Sampling (or other samplers) and evaluates each, selecting the best result. This helps find global optima in problems with multiple local minima. Default: False.

  • n_starts (int, optional) – Number of starting points for multi-start optimization. Only used when multistart=True or global_search=True. Default: 10.

  • global_search (bool, optional) – Shorthand for enabling multi-start with n_starts=20. Equivalent to multistart=True, n_starts=20. Useful for thorough global search. Default: False.

  • sampler ({'lhs', 'sobol', 'halton'}, optional) –

    Sampling strategy for generating starting points in multi-start:

    • ’lhs’: Latin Hypercube Sampling (stratified random, default)

    • ’sobol’: Sobol quasi-random sequence (deterministic, low-discrepancy)

    • ’halton’: Halton quasi-random sequence (deterministic, prime bases)

    Only used when multistart=True or global_search=True. Default: ‘lhs’.

  • center_on_p0 (bool, optional) – When True, center multi-start samples around the initial guess p0 rather than uniformly across the full parameter bounds. This provides more focused exploration around a data-informed starting region. Only used when multistart=True or global_search=True. Default: True.

  • scale_factor (float, optional) – Scale factor for the exploration region when center_on_p0=True. Multiplier for the exploration range around p0. Smaller values (0.5) mean tighter exploration, larger values (2.0) mean wider exploration. Only used when multistart=True or global_search=True. Default: 1.0.

  • method ({'auto', 'cmaes', 'multi-start', 'trf'} | None, optional) –

    Optimization method to use:

    • ’auto’: Automatically select based on parameter scale ratio. Uses CMA-ES for multi-scale problems (>1000x scale difference) when evosax is installed, otherwise multi-start.

    • ’cmaes’: Use CMA-ES (Covariance Matrix Adaptation Evolution Strategy) for gradient-free global optimization. Requires bounds. Best for multi-scale parameters spanning many orders of magnitude. Falls back to multi-start if evosax is not installed.

    • ’multi-start’: Use multi-start optimization with Latin Hypercube Sampling for initial points.

    • ’trf’: Use Trust Region Reflective (default behavior).

    • None: Default behavior (TRF with optional multi-start if enabled).

    Default: None.

  • cmaes_config (CMAESConfig | None, optional) – Configuration for CMA-ES optimization. If None, uses default config or preset specified by method parameter. See CMAESConfig for options including max_generations, restart_strategy, and popsize. Only used when method=’cmaes’ or method=’auto’ selects CMA-ES. Default: None.

  • *args (Any) – Additional arguments passed to CurveFit.curve_fit method.

  • **kwargs (Any) – Additional arguments passed to CurveFit.curve_fit method.

Returns:

  • popt (ndarray) – Optimal values for the parameters.

  • pcov (ndarray) – The estimated covariance of popt.

  • When fallback=True, the returned object also contains

  • - fallback_strategy_used (str or None) – Name of the fallback strategy that succeeded, or None if original succeeded

  • - fallback_attempts (int) – Number of optimization attempts before success

  • When multistart=True or global_search=True, the returned object contains

  • - multistart_diagnostics (dict) – Dictionary with multi-start exploration details including n_starts, best_loss, all_losses, exploration time, etc.

Return type:

tuple[np.ndarray, np.ndarray] | CurveFitResult

Notes

This function creates a CurveFit instance internally and calls its curve_fit method. For multiple fits with the same function signature, consider creating a CurveFit instance directly to benefit from JAX compilation caching.

When fallback=True, the optimizer tries increasingly aggressive recovery strategies if the initial optimization fails. This is particularly useful for:

  • Poor initial parameter guesses

  • Ill-conditioned problems

  • Problems with outliers

  • Numerically challenging models

When multistart=True or global_search=True, the optimizer explores multiple starting points to find the global optimum. This is particularly useful for:

  • Problems with multiple local minima

  • Complex multi-modal objective functions

  • Cases where the initial guess may not be close to the global optimum

See also

CurveFit.curve_fit

The underlying method with full parameter documentation

fit

Unified entry point with automatic workflow selection

curve_fit_large

For datasets with millions of points requiring special handling

FallbackOrchestrator

Direct access to fallback system for custom configurations

MultiStartOrchestrator

Direct access to multi-start system

Examples

Basic usage without fallback:

>>> import jax.numpy as jnp
>>> import numpy as np
>>>
>>> def exponential(x, a, b):
...     return a * jnp.exp(-b * x)
>>>
>>> x = np.linspace(0, 4, 50)
>>> y = 2.5 * np.exp(-1.3 * x) + 0.1 * np.random.normal(size=len(x))
>>> popt, _pcov = curve_fit(exponential, x, y, p0=[2, 1])

Using multi-start for global search:

>>> # Enable multi-start with default n_starts=10
>>> result = curve_fit(exponential, x, y, p0=[2, 1],
...                   bounds=([0, 0], [10, 5]), multistart=True)
>>>
>>> # Use global_search shorthand for thorough exploration (n_starts=20)
>>> result = curve_fit(exponential, x, y, p0=[2, 1],
...                   bounds=([0, 0], [10, 5]), global_search=True)
>>>
>>> # Customize multi-start with different sampler
>>> result = curve_fit(exponential, x, y, p0=[2, 1],
...                   bounds=([0, 0], [10, 5]),
...                   multistart=True, n_starts=15, sampler='sobol')

Using fallback for difficult problems:

>>> # Very poor initial guess - may fail without fallback
>>> result = curve_fit(exponential, x, y, p0=[100, 50], fallback=True)
>>>
>>> # Check which strategy was used
>>> if result.fallback_strategy_used:
...     print(f"Recovered using: {result.fallback_strategy_used}")

Combined multi-start + fallback for maximum robustness:

>>> result = curve_fit(
...     exponential, x, y, p0=[2, 1],
...     bounds=([0, 0], [10, 5]),
...     global_search=True,
...     fallback=True
... )
nlsq.curve_fit_large(f, xdata, ydata, p0=None, sigma=None, absolute_sigma=False, check_finite=True, bounds=(-inf, inf), method=None, stability=False, rescale_data=True, max_jacobian_elements_for_svd=10000000, memory_limit_gb=None, auto_size_detection=True, size_threshold=1000000, show_progress=False, chunk_size=None, multistart=False, n_starts=10, global_search=False, sampler='lhs', center_on_p0=True, scale_factor=1.0, **kwargs)[source]

Curve fitting with automatic memory management for large datasets.

Automatically selects processing strategy based on dataset size: - Small (< 1M points): Standard curve_fit - Medium (1M - 100M points): Chunked processing - Large (> 100M points): Streaming optimization

Parameters:
  • f (callable) – Model function f(x, *params) -> y. Must use jax.numpy operations.

  • xdata (array_like) – Independent variable data.

  • ydata (array_like) – Dependent variable data.

  • p0 (array_like, optional) – Initial parameter guess.

  • sigma (array_like, optional) – Uncertainties in ydata for weighted fitting.

  • absolute_sigma (bool, optional) – Whether sigma represents absolute uncertainties.

  • check_finite (bool, optional) – Check for finite input values.

  • bounds (tuple, optional) – Parameter bounds as (lower, upper).

  • method (str, optional) – Optimization algorithm (‘trf’, ‘lm’, or None for auto).

  • memory_limit_gb (float, optional) – Maximum memory usage in GB.

  • auto_size_detection (bool, optional) – Auto-detect dataset size for processing strategy.

  • size_threshold (int, optional) – Point threshold for large dataset processing (default: 1M).

  • show_progress (bool, optional) – Display progress bar for long operations.

  • chunk_size (int, optional) – Override automatic chunk size calculation.

  • multistart (bool, optional) – Enable multi-start optimization for global search. Default: False.

  • n_starts (int, optional) – Number of starting points for multi-start optimization. Default: 10.

  • global_search (bool, optional) – Shorthand for multistart=True, n_starts=20. Default: False.

  • sampler ({'lhs', 'sobol', 'halton'}, optional) – Sampling strategy for multi-start. Default: ‘lhs’.

  • center_on_p0 (bool, optional) – Center multi-start samples around p0. Default: True.

  • scale_factor (float, optional) – Scale factor for exploration region. Default: 1.0.

  • **kwargs (Any) – Additional optimization parameters (ftol, xtol, gtol, max_nfev, loss)

Returns:

  • popt (ndarray) – Fitted parameters.

  • pcov (ndarray) – Parameter covariance matrix.

Return type:

tuple[ndarray, ndarray] | OptimizeResult

Notes

All large datasets use streaming optimization for 100% data utilization.

Important: Model Function Requirements for Chunking

When auto_size_detection triggers chunked processing (>1M points), your model function MUST respect the size of xdata. Model output shape must match ydata shape.

INCORRECT - Fixed-size output (causes shape errors):

>>> def bad_model(xdata, a, b):
...     # WRONG: Returns fixed-size array regardless of xdata
...     t_full = jnp.arange(10_000_000)
...     return a * jnp.exp(-b * t_full)  # Shape mismatch!

CORRECT - Output matches xdata size:

>>> def good_model(xdata, a, b):
...     # CORRECT: Uses xdata as indices
...     indices = xdata.astype(jnp.int32)
...     return a * jnp.exp(-b * indices)
>>> def direct_model(xdata, a, b):
...     # CORRECT: Operates on xdata directly
...     return a * jnp.exp(-b * xdata)

Examples

Basic usage:

>>> popt, _pcov = curve_fit_large(model_func, xdata, ydata, p0=[1, 2, 3])

Large dataset with progress bar:

>>> popt, _pcov = curve_fit_large(model_func, big_xdata, big_ydata,
...                             show_progress=True, memory_limit_gb=8)

With multi-start optimization:

>>> popt, _pcov = curve_fit_large(model_func, xdata, ydata,
...                             p0=[1, 2, 3], bounds=([0, 0, 0], [10, 10, 10]),
...                             multistart=True, n_starts=10)

Using external logger for diagnostics:

>>> import logging
>>> my_logger = logging.getLogger("myapp")
>>> fitter = LargeDatasetFitter(memory_limit_gb=8, logger=my_logger)
>>> result = fitter.fit(model_func, xdata, ydata, p0=[1, 2])
>>> # Chunk failures now appear in myapp's logs

Other Module Members

NLSQ: JAX-accelerated nonlinear least squares curve fitting.

GPU/TPU-accelerated curve fitting with automatic differentiation. Provides drop-in SciPy compatibility with curve_fit function. Supports large datasets through automatic chunking and streaming optimization.

Key Features

  • Drop-in replacement for scipy.optimize.curve_fit

  • GPU/TPU acceleration via JAX

  • Automatic memory management for datasets up to 100M+ points

  • Streaming optimization for unlimited data

  • Smart algorithm selection and numerical stability

  • Unified fit() entry point with automatic workflow selection

Examples

>>> import jax.numpy as jnp
>>> from nlsq import curve_fit, fit
>>> def model(x, a, b): return a * jnp.exp(-b * x)
>>> popt, pcov = curve_fit(model, xdata, ydata)
>>> # Or use unified fit() with automatic workflow selection:
>>> popt, pcov = fit(model, xdata, ydata, workflow="auto", goal="quality")
class nlsq.AdaptiveHybridStreamingOptimizer(config=None)[source]

Bases: object

Adaptive hybrid streaming optimizer with four-phase optimization.

This optimizer combines parameter normalization, L-BFGS warmup, streaming Gauss-Newton, and exact covariance computation to provide:

  • Fast convergence for parameters with different scales

  • Accurate uncertainty estimates on large datasets

  • Memory-efficient streaming for unlimited dataset sizes

  • Production-ready fault tolerance

The optimization proceeds through four phases:

  • Phase 0: Setup parameter normalization and bounds transformation

  • Phase 1: L-BFGS warmup with adaptive switching to Phase 2

  • Phase 2: Streaming Gauss-Newton with exact J^T J accumulation

  • Phase 3: Denormalize parameters and transform covariance matrix

Parameters:

config (HybridStreamingConfig, optional) – Configuration for all phases of optimization. If None, uses default configuration. See HybridStreamingConfig for details.

config

Configuration object controlling all phases

Type:

HybridStreamingConfig

current_phase

Current optimization phase (0, 1, 2, or 3)

Type:

int

phase_history

History of phase transitions with timing information

Type:

list

phase_start_time

Start time of current phase (seconds since epoch)

Type:

float or None

normalized_params

Current parameters in normalized space

Type:

jax.Array or None

normalizer

Parameter normalizer instance (created in Phase 0)

Type:

ParameterNormalizer or None

normalized_model

Wrapped model function operating in normalized space

Type:

NormalizedModelWrapper or None

normalized_bounds

Bounds transformed to normalized space

Type:

tuple of jax.Array or None

normalization_jacobian

Denormalization Jacobian for covariance transform

Type:

jax.Array or None

Examples

Basic usage with default configuration:

>>> from nlsq import AdaptiveHybridStreamingOptimizer, HybridStreamingConfig
>>> import jax.numpy as jnp
>>> config = HybridStreamingConfig()
>>> optimizer = AdaptiveHybridStreamingOptimizer(config)

With bounds-based normalization:

>>> config = HybridStreamingConfig(
...     normalize=True,
...     normalization_strategy='bounds'
... )
>>> optimizer = AdaptiveHybridStreamingOptimizer(config)

With custom warmup settings:

>>> config = HybridStreamingConfig(
...     warmup_iterations=300,
...     lbfgs_initial_step_size=0.5,
...     gauss_newton_tol=1e-10
... )
>>> optimizer = AdaptiveHybridStreamingOptimizer(config)

See also

HybridStreamingConfig

Configuration for all phases

ParameterNormalizer

Parameter normalization implementation

curve_fit

High-level interface with method=’hybrid_streaming’

Notes

Based on Adaptive Hybrid Streaming Optimizer specification: agent-os/specs/2025-12-18-adaptive-hybrid-streaming-optimizer/spec.md

__init__(config=None)[source]

Initialize adaptive hybrid streaming optimizer.

Parameters:

config (HybridStreamingConfig, optional) – Configuration for all phases. If None, uses default configuration.

set_residual_weights(weights)[source]

Set residual weights for weighted least squares optimization.

This method allows updating weights during optimization, for example when weights need to be recomputed based on current parameter estimates.

Parameters:

weights (np.ndarray) – Per-group weights of shape (n_groups,). Higher weights give more importance to residuals in that group. The group index for each data point is determined by the first column of x_data.

Notes

Weights must be positive. The weighted MSE is computed as:

wMSE = sum(w[group_idx] * residuals^2) / sum(w[group_idx])

clear_cache()[source]

Release cached padded arrays to free memory.

Call this after optimization completes or when reusing the optimizer with different data. The cache is automatically invalidated when data identity changes, but explicit clearing frees memory sooner.

property phase_status: dict[str, Any]

Get current phase status and history.

Returns:

status – Phase status dictionary with keys: - ‘current_phase’: Current phase number - ‘phase_name’: Name of current phase - ‘phase_history’: List of completed phases with timing - ‘total_phases’: Total number of phases (4)

Return type:

dict

Examples

>>> config = HybridStreamingConfig()
>>> optimizer = AdaptiveHybridStreamingOptimizer(config)
>>> status = optimizer.phase_status
>>> print(status['current_phase'])
0
>>> print(status['phase_name'])
Phase 0: Normalization Setup
class nlsq.AlgorithmSelector[source]

Bases: object

Automatically select best algorithm based on problem characteristics.

This class analyzes the optimization problem and recommends the best algorithm, loss function, and parameters based on: - Problem size and dimensionality - Data characteristics (outliers, noise, conditioning) - Memory constraints - Convergence requirements

__init__()[source]

Initialize algorithm selector.

analyze_problem(f, xdata, ydata, p0=None, bounds=None, memory_limit_gb=None)[source]

Analyze problem characteristics.

Parameters:
  • f (callable) – Model function to fit

  • xdata (np.ndarray) – Independent variable data

  • ydata (np.ndarray) – Dependent variable data

  • p0 (np.ndarray, optional) – Initial parameter guess

  • bounds (tuple, optional) – Parameter bounds

  • memory_limit_gb (float, optional) – Memory constraint in GB

Returns:

analysis – Problem characteristics and statistics

Return type:

dict

select_algorithm(problem_analysis, user_preferences=None)[source]

Select best algorithm based on problem analysis.

This method orchestrates algorithm selection by calling focused helper methods for each decision.

Parameters:
  • problem_analysis (dict) – Results from analyze_problem

  • user_preferences (dict, optional) – User preferences (e.g., prioritize speed vs accuracy)

Returns:

recommendations – Recommended algorithm and parameters

Return type:

dict

get_algorithm_explanation(recommendations)[source]

Get human-readable explanation of algorithm choice.

Parameters:

recommendations (dict) – Algorithm recommendations

Returns:

explanation – Explanation of the choices

Return type:

str

class nlsq.BoundsInference(xdata, ydata, p0, safety_factor=10.0, enforce_positivity=True)[source]

Bases: object

Infer reasonable parameter bounds from data characteristics.

This class implements heuristics to estimate parameter bounds that help constrain optimization without being overly restrictive. The bounds are based on data ranges, parameter scales, and common patterns.

Parameters:
  • xdata (array_like) – Independent variable data

  • ydata (array_like) – Dependent variable data

  • p0 (array_like) – Initial parameter guess

  • safety_factor (float, optional) – Multiplier for bound ranges (larger = more conservative). Default: 10.0

  • enforce_positivity (bool, optional) – Force all bounds to be non-negative if p0 is positive. Default: True

x_min, x_max

Range of independent variable

Type:

float

y_min, y_max

Range of dependent variable

Type:

float

x_range, y_range

Span of data

Type:

float

Examples

>>> x = np.linspace(0, 10, 100)
>>> y = 2.5 * np.exp(-0.5 * x) + 1.0
>>> p0 = [2.0, 0.5, 1.0]
>>>
>>> inference = BoundsInference(x, y, p0)
>>> bounds = inference.infer()
>>> print(bounds)
([0.0, 0.0, 0.0], [25.0, 5.0, 10.0])
__init__(xdata, ydata, p0, safety_factor=10.0, enforce_positivity=True)[source]

Initialize bounds inference.

infer()[source]

Infer bounds for all parameters.

Returns:

  • lower_bounds (ndarray) – Lower bounds for each parameter

  • upper_bounds (ndarray) – Upper bounds for each parameter

Return type:

tuple[ndarray, ndarray]

class nlsq.CallbackBase[source]

Bases: object

Base class for optimization callbacks.

Subclass this to create custom callbacks. Override the __call__ method to define what happens at each iteration.

Examples

>>> class CustomCallback(CallbackBase):
...     def __call__(self, iteration, cost, params, info):
...         print(f"Iter {iteration}: cost={cost:.6f}")
__call__(iteration, cost, params, info)[source]

Called after each optimization iteration.

Parameters:
  • iteration (int) – Current iteration number (0-indexed)

  • cost (float) – Current cost/objective function value

  • params (np.ndarray) – Current parameter values

  • info (dict) – Additional information (gradient_norm, nfev, etc.)

close()[source]

Clean up resources.

Override this method if your callback uses resources that need explicit cleanup (files, network connections, etc.).

class nlsq.CallbackChain(*callbacks)[source]

Bases: CallbackBase

Chain multiple callbacks together.

Calls each callback in sequence. If any callback raises StopOptimization, propagates it to stop the optimization.

Parameters:

*callbacks (CallbackBase) – Callbacks to chain together

Examples

>>> from nlsq.callbacks import CallbackChain, ProgressBar, EarlyStopping
>>> callback = CallbackChain(
...     ProgressBar(max_nfev=100),
...     EarlyStopping(patience=5)
... )
>>> popt, pcov = curve_fit(f, x, y, callback=callback)
__init__(*callbacks)[source]
__call__(iteration, cost, params, info)[source]

Call all callbacks in sequence.

close()[source]

Close all callbacks that have a close method.

__del__()[source]

Ensure all callbacks are closed.

class nlsq.CompilationCache(enable_stats=True, max_cache_size=512)[source]

Bases: object

Cache for JIT-compiled functions with LRU eviction.

Caches compiled versions of functions based on their signature to avoid repeated JIT compilation overhead.

Phase 3 Optimizations (2.2a, 2.1a):

  • Uses OrderedDict for LRU tracking with move_to_end() on hits

  • Evicts oldest entry with popitem(last=False) when at capacity

  • Uses composite key (id(func), id(func.__code__)) to prevent cache poisoning when functions are redefined with same name

Thread Safety

All public methods are guarded by a per-instance threading.Lock. The lock is held only for dict operations and released during jax.jit() compilation to avoid blocking concurrent threads.

cache

OrderedDict mapping function signatures to compiled functions (enables LRU eviction)

Type:

OrderedDict

max_cache_size

Maximum number of compiled functions to cache (default 512)

Type:

int

stats

Compilation cache statistics

Type:

dict

_func_hash_cache

Memoization cache for function code hashes using composite key (id(func), id(func.__code__)) for correctness in notebooks

Type:

dict

__init__(enable_stats=True, max_cache_size=512)[source]

Initialize compilation cache.

Parameters:
  • enable_stats (bool) – Track cache statistics

  • max_cache_size (int) – Maximum number of compiled functions to cache (default 512). Increased from 256 to reduce recompilation frequency. Caps memory usage at approximately 4GB for 512 cached functions.

compile(func, static_argnums=(), donate_argnums=())[source]

Compile function with JIT and cache result.

Parameters:
  • func (callable) – Function to compile

  • static_argnums (tuple of int) – Indices of static arguments

  • donate_argnums (tuple of int) – Indices of arguments to donate

Returns:

compiled_func – JIT-compiled function (may be cached)

Return type:

callable

get_or_compile(func, *args, static_argnums=(), **kwargs)[source]

Get cached compiled function or compile if not cached.

Parameters:
  • func (callable) – Function to compile

  • args (tuple) – Arguments to function (for signature generation)

  • static_argnums (tuple of int) – Indices of static arguments

  • kwargs (dict) – Keyword arguments

Returns:

  • compiled_func (callable) – Compiled function

  • signature (str) – Function signature

Return type:

tuple[Callable, str]

clear()[source]

Clear compilation cache, function hash memoization, and reset stats.

This method clears all cached data and resets statistics counters to zero, allowing accurate hit/miss tracking after the clear operation.

get_stats()[source]

Get cache statistics.

Returns:

stats – Cache hit rate and other statistics

Return type:

dict

__enter__()[source]

Context manager entry.

__exit__(exc_type, exc_val, exc_tb)[source]

Context manager exit.

class nlsq.ConvergenceMonitor(window_size=10, sensitivity=1.0)[source]

Bases: object

Monitor convergence patterns and detect problems.

Detects: - Oscillation in parameter or cost values - Stagnation (no progress) - Divergence (increasing cost) - Slow convergence - Numerical instability

__init__(window_size=10, sensitivity=1.0)[source]

Initialize convergence monitor.

Parameters:
  • window_size (int) – Size of sliding window for pattern detection

  • sensitivity (float) – Sensitivity factor (1.0 = normal, <1 = less sensitive, >1 = more sensitive)

update(cost, params, gradient=None, step_size=None)[source]

Update monitor with new iteration data.

Parameters:
  • cost (float) – Current cost value

  • params (np.ndarray) – Current parameter values

  • gradient (np.ndarray, optional) – Current gradient

  • step_size (float, optional) – Step size taken

add_iteration(cost, params, gradient=None, step_size=None)[source]

Alias for update() method for backward compatibility.

Parameters:
  • cost (float) – Current cost value

  • params (np.ndarray) – Current parameter values

  • gradient (np.ndarray, optional) – Current gradient

  • step_size (float, optional) – Step size taken

detect_oscillation()[source]

Detect oscillation in optimization.

Returns:

  • is_oscillating (bool) – Whether oscillation is detected

  • oscillation_score (float) – Oscillation severity score (0-1)

Return type:

tuple[bool, float]

detect_stagnation()[source]

Detect stagnation in optimization.

Returns:

  • is_stagnant (bool) – Whether stagnation is detected

  • stagnation_score (float) – Stagnation severity score

Return type:

tuple[bool, float]

detect_divergence()[source]

Detect divergence in optimization.

Returns:

  • is_diverging (bool) – Whether divergence is detected

  • divergence_score (float) – Divergence severity score

Return type:

tuple[bool, float]

get_convergence_rate()[source]

Estimate convergence rate.

Returns:

rate – Convergence rate (negative = diverging, positive = converging)

Return type:

float or None

class nlsq.CurveFit(flength=None, use_dynamic_sizing=False, enable_stability=False, enable_recovery=False, enable_overflow_check=False, cache_config=None, max_jacobian_elements_for_svd=10000000)[source]

Bases: object

Main class for nonlinear least squares curve fitting with JAX acceleration.

This class provides the core curve fitting functionality with JAX JIT compilation, automatic differentiation for Jacobian computation, and multiple optimization algorithms. It handles data preprocessing, optimization algorithm selection, and covariance matrix computation.

The class maintains compiled versions of fitting functions to avoid recompilation overhead when fitting multiple datasets with the same function signature.

flength

Fixed data length for input padding to avoid JAX retracing.

Type:

float or None

use_dynamic_sizing

Whether to use dynamic sizing instead of fixed padding.

Type:

bool

logger

Internal logger for debugging and performance monitoring.

Type:

Logger

curve_fit : Main fitting method
create_sigma_transform_funcs : Internal method for sigma transformation setup
__init__(flength=None, use_dynamic_sizing=False, enable_stability=False, enable_recovery=False, enable_overflow_check=False, cache_config=None, max_jacobian_elements_for_svd=10000000)[source]

Initialize CurveFit instance.

Parameters:
  • flength (float, optional) – Fixed data length for JAX compilation. Input data is padded to this length to avoid recompilation when fitting datasets of different sizes. If None, no padding is applied and each dataset size triggers recompilation. Ignored when use_dynamic_sizing=True for large datasets.

  • use_dynamic_sizing (bool, default False) – Enable dynamic sizing to reduce memory usage. When True, padding is only applied when data size is smaller than flength. For large datasets, uses actual size to prevent excessive memory allocation. Default False maintains backward compatibility with fixed padding behavior.

  • enable_stability (bool, default False) – Enable numerical stability checks and fixes (validation, algorithm selection). Note: This does NOT include overflow checking which adds overhead.

  • enable_recovery (bool, default False) – Enable automatic recovery from optimization failures.

  • enable_overflow_check (bool, default False) – Enable overflow/underflow checking in function evaluations. This adds ~30% overhead so it’s separate from other stability features.

  • cache_config (dict, optional) – Configuration for the unified JIT compilation cache. Supported keys: 'maxsize' (int, default 128) — maximum number of compiled functions to cache; 'enable_stats' (bool, default True) — track cache statistics (hits, misses, compile_time_ms); 'disk_cache_enabled' (bool, default False) — enable disk caching tier (Phase 2 feature). If None, uses global cache with default settings.

  • max_jacobian_elements_for_svd (int, default 10_000_000) – Maximum Jacobian size (m x n elements) for SVD computation during stability checks. SVD is skipped for larger Jacobians to avoid O(min(m,n)^2 x max(m,n)) overhead. Only applies when enable_stability=True.

Notes

Fixed length compilation trades memory usage for compilation speed: - flength=None: Minimal memory, recompiles for each dataset size - flength=large_value: Higher memory, avoids recompilation - use_dynamic_sizing=True: Balanced approach for mixed dataset sizes

update_flength(flength)[source]

Set the fixed input data length.

Parameters:

flength (float) – The fixed input data length.

create_sigma_transform_funcs()[source]

Create JIT-compiled sigma transform functions.

This function creates two JIT-compiled functions: sigma_transform1d and sigma_transform2d, which are used to compute the sigma transform for 1D and 2D data, respectively. The functions are stored as attributes of the object on which the method is called.

create_covariance_svd()[source]

Create JIT-compiled SVD function for covariance computation.

pad_fit_data(xdata, ydata, xdims, len_diff)[source]

Pad fit data to match the fixed input data length.

This function pads the input data arrays with small values to match the fixed input data length to avoid JAX retracing the JITted functions. The padding is added along the second dimension of the xdata array if it’s multidimensional data otherwise along the first dimension. The small values are chosen to be EPS, a global constant defined as a very small positive value which avoids numerical issues.

Parameters:
  • xdata (np.ndarray) – The independent variables of the data.

  • ydata (np.ndarray) – The dependent variables of the data.

  • xdims (int) – The number of dimensions in the xdata array.

  • len_diff (int) – The difference in length between the data arrays and the fixed input data length.

Returns:

The padded xdata and ydata arrays.

Return type:

Tuple[np.ndarray, np.ndarray]

class nlsq.EarlyStopping(patience=10, min_delta=1e-06, verbose=True)[source]

Bases: CallbackBase

Stop optimization early if no improvement for patience iterations.

Parameters:
  • patience (int, optional) – Number of iterations with no improvement to wait before stopping. Default: 10

  • min_delta (float, optional) – Minimum change in cost to qualify as an improvement. Default: 1e-6

  • verbose (bool, optional) – Whether to print messages. Default: True

Examples

>>> from nlsq.callbacks import EarlyStopping
>>> callback = EarlyStopping(patience=5, min_delta=1e-4)
>>> popt, pcov = curve_fit(f, x, y, callback=callback)

Notes

Raises StopOptimization exception when patience is exceeded, which will be caught by the optimizer and treated as successful convergence.

__init__(patience=10, min_delta=1e-06, verbose=True)[source]
__call__(iteration, cost, params, info)[source]

Check for improvement and stop if patience exceeded.

class nlsq.FallbackOrchestrator(strategies=None, max_attempts=10, verbose=False)[source]

Bases: object

Orchestrates automatic fallback strategies for robust optimization.

The orchestrator tries multiple recovery strategies when optimization fails, including alternative methods, perturbed initial guesses, relaxed tolerances, and robust loss functions.

Parameters:
  • strategies (list of FallbackStrategy, optional) – List of strategies to try. If None, uses default strategies.

  • max_attempts (int, optional) – Maximum total attempts across all strategies. Default: 10

  • verbose (bool, optional) – Print progress messages. Default: False

strategies

Active fallback strategies, sorted by priority

Type:

list of FallbackStrategy

total_attempts

Total number of fit attempts made

Type:

int

successful_strategies

Count of successes per strategy

Type:

dict

Examples

>>> from nlsq.stability.fallback import FallbackOrchestrator
>>> import numpy as np
>>>
>>> def model(x, a, b):
...     return a * np.exp(-b * x)
>>>
>>> x = np.linspace(0, 10, 100)
>>> y = 2.5 * np.exp(-0.5 * x) + 0.1 * np.random.randn(100)
>>>
>>> orchestrator = FallbackOrchestrator(verbose=True)
>>> result = orchestrator.fit_with_fallback(
...     model, x, y, p0=[1, 0.1]  # Deliberately poor p0
... )
>>>
>>> if result.fallback_strategy_used:
...     print(f"Recovered using: {result.fallback_strategy_used}")
DEFAULT_STRATEGIES: ClassVar[list[StrategyFactory]] = [<class 'nlsq.stability.fallback.AlternativeMethodStrategy'>, <class 'nlsq.stability.fallback.PerturbInitialGuessStrategy'>, <class 'nlsq.stability.fallback.AdjustTolerancesStrategy'>, <class 'nlsq.stability.fallback.AddParameterBoundsStrategy'>, <class 'nlsq.stability.fallback.UseRobustLossStrategy'>, <class 'nlsq.stability.fallback.RescaleProblemStrategy'>]
__init__(strategies=None, max_attempts=10, verbose=False)[source]

Initialize fallback orchestrator.

successful_strategies: dict[str, int]
fit_with_fallback(f, xdata, ydata, **kwargs)[source]

Attempt curve fit with automatic fallback on failure.

Parameters:
  • f (callable) – Model function

  • xdata (array_like) – Independent variable data

  • ydata (array_like) – Dependent variable data

  • **kwargs – Additional arguments passed to curve_fit

Returns:

result – Optimization result with fallback metadata

Return type:

FallbackResult

Raises:

RuntimeError – If all fallback strategies fail

get_statistics()[source]

Get statistics on fallback strategy performance.

Returns:

stats – Dictionary with success rates and attempt counts

Return type:

dict

print_statistics()[source]

Print human-readable statistics.

class nlsq.FallbackResult(result, strategy_used=None, attempts=1)[source]

Bases: object

Enhanced optimization result with fallback information.

__init__(result, strategy_used=None, attempts=1)[source]

Initialize fallback result.

Parameters:
  • result (OptimizeResult) – Underlying optimization result

  • strategy_used (str, optional) – Name of fallback strategy that succeeded

  • attempts (int, optional) – Number of attempts before success

__getattr__(name)[source]

Delegate attribute access to underlying result.

class nlsq.FallbackStrategy(name, description, priority=0)[source]

Bases: object

Base class for fallback strategies.

__init__(name, description, priority=0)[source]

Initialize fallback strategy.

Parameters:
  • name (str) – Strategy name

  • description (str) – Human-readable description

  • priority (int, optional) – Execution priority (higher = earlier). Default: 0

apply(kwargs)[source]

Apply strategy by modifying fit parameters.

Parameters:

kwargs (dict) – Original curve_fit keyword arguments

Returns:

modified_kwargs – Modified keyword arguments

Return type:

dict

class nlsq.HybridStreamingConfig(normalize=True, normalization_strategy='auto', warmup_iterations=200, max_warmup_iterations=500, warmup_learning_rate=0.001, loss_plateau_threshold=0.0001, gradient_norm_threshold=0.001, active_switching_criteria=None, lbfgs_history_size=10, lbfgs_initial_step_size=0.1, lbfgs_line_search='wolfe', lbfgs_exploration_step_size=0.1, lbfgs_refinement_step_size=1.0, use_learning_rate_schedule=False, lr_schedule_warmup_steps=50, lr_schedule_decay_steps=450, lr_schedule_end_value=0.0001, gradient_clip_value=None, enable_warm_start_detection=True, warm_start_threshold=0.01, enable_adaptive_warmup_lr=True, warmup_lr_refinement=1e-06, warmup_lr_careful=1e-05, enable_cost_guard=True, cost_increase_tolerance=0.05, enable_step_clipping=True, max_warmup_step_size=0.1, gauss_newton_max_iterations=100, gauss_newton_tol=1e-08, trust_region_initial=1.0, regularization_factor=1e-10, cg_max_iterations=100, cg_relative_tolerance=0.0001, cg_absolute_tolerance=1e-10, cg_param_threshold=2000, enable_group_variance_regularization=False, group_variance_lambda=0.01, group_variance_indices=None, enable_residual_weighting=False, residual_weights=None, chunk_size=10000, gc_chunk_interval=10, loop_strategy='auto', enable_checkpoints=True, checkpoint_frequency=100, checkpoint_dir=None, resume_from_checkpoint=None, validate_numerics=True, enable_fault_tolerance=True, max_retries_per_batch=2, min_success_rate=0.5, enable_multi_device=False, callback_frequency=10, verbose=1, log_frequency=1, enable_multistart=False, n_starts=10, multistart_sampler='lhs', elimination_rounds=3, elimination_fraction=0.5, batches_per_round=50, center_on_p0=True, scale_factor=1.0)[source]

Bases: object

Configuration for adaptive hybrid streaming optimizer.

This configuration class controls all aspects of the four-phase hybrid optimizer: - Phase 0: Parameter normalization setup - Phase 1: L-BFGS warmup with adaptive switching - Phase 2: Streaming Gauss-Newton with exact J^T J accumulation - Phase 3: Denormalization and covariance transform

Parameters:
  • normalize (bool, default=True) – Enable parameter normalization. When True, parameters are normalized to similar scales to improve gradient signal quality and convergence speed.

  • normalization_strategy (str, default='auto') –

    Strategy for parameter normalization. Options:

    • ’auto’: Use bounds-based if bounds provided, else p0-based

    • ’bounds’: Normalize to [0, 1] using parameter bounds

    • ’p0’: Scale by initial parameter magnitudes

    • ’none’: Identity transform (no normalization)

  • warmup_iterations (int, default=200) – Number of L-BFGS warmup iterations before checking switch criteria. With L-BFGS, typical values are 20-50 (5-10x fewer than Adam). More iterations allow better initial convergence before switching to Gauss-Newton.

  • max_warmup_iterations (int, default=500) – Maximum L-BFGS warmup iterations before forced switch to Phase 2. Safety limit to prevent indefinite warmup when loss plateaus slowly.

  • warmup_learning_rate (float, default=0.001) – Legacy warmup step size retained for backward compatibility. L-BFGS warmup uses lbfgs_initial_step_size and adaptive step sizes.

  • loss_plateau_threshold (float, default=1e-4) – Relative loss improvement threshold for plateau detection. Switch to Phase 2 if: abs(loss - prev_loss) / (abs(prev_loss) + eps) < threshold. Smaller values = stricter plateau detection = later switching.

  • gradient_norm_threshold (float, default=1e-3) – Gradient norm threshold for early Phase 2 switch. Switch to Phase 2 if: ||gradient|| < threshold. Indicates optimization is close to optimum and Gauss-Newton will be effective.

  • active_switching_criteria (list, default=['plateau', 'gradient', 'max_iter']) –

    List of active switching criteria for Phase 1 -> Phase 2 transition. Available criteria:

    • ’plateau’: Loss plateau detection (loss_plateau_threshold)

    • ’gradient’: Gradient norm below threshold (gradient_norm_threshold)

    • ’max_iter’: Maximum iterations reached (max_warmup_iterations)

    Switch occurs when ANY active criterion is met.

  • lbfgs_history_size (int, default=10) – Number of previous gradients and updates to store for L-BFGS Hessian approximation. Standard default from SciPy, PyTorch, and Nocedal & Wright. Larger values give better Hessian approximation but use more memory.

  • lbfgs_initial_step_size (float, default=0.1) – Initial step size for L-BFGS during cold start (first m iterations while history buffer fills). Small value prevents overshooting when Hessian approximation is poor (identity matrix initially).

  • lbfgs_line_search (str, default='wolfe') –

    Line search method for L-BFGS step acceptance. Options:

    • ’wolfe’: Standard Wolfe conditions (default)

    • ’strong_wolfe’: Strong Wolfe conditions (stricter)

    • ’backtracking’: Simple backtracking line search

  • lbfgs_exploration_step_size (float, default=0.1) – L-BFGS initial step size for exploration mode (high relative loss). Small value prevents first “Hessian=Identity” step from overshooting.

  • lbfgs_refinement_step_size (float, default=1.0) – L-BFGS initial step size for refinement mode (low relative loss). Larger value leverages L-BFGS’s near-Newton convergence speed when close to optimum.

  • gauss_newton_max_iterations (int, default=100) – Maximum iterations for Phase 2 Gauss-Newton optimization. Typical values: 50-200.

  • gauss_newton_tol (float, default=1e-8) – Convergence tolerance for Phase 2 (gradient norm threshold). Optimization stops if: ||gradient|| < tol.

  • trust_region_initial (float, default=1.0) – Initial trust region radius for Gauss-Newton step control. Radius is adapted based on actual vs predicted reduction ratio.

  • regularization_factor (float, default=1e-10) – Regularization factor for rank-deficient J^T J matrices. Added to diagonal: J^T J + regularization_factor * I.

  • cg_max_iterations (int, default=100) – Maximum iterations for Conjugate Gradient solver in Phase 2. Used when parameter count exceeds cg_param_threshold. Higher values allow better convergence but more computation.

  • cg_relative_tolerance (float, default=1e-4) – Relative tolerance for CG solver convergence. Convergence check: ||r|| < cg_relative_tolerance * ||J^T r_0||. Implements Inexact Newton strategy for efficiency.

  • cg_absolute_tolerance (float, default=1e-10) – Absolute tolerance floor for CG solver convergence. Safety floor to prevent over-iteration on well-conditioned systems.

  • cg_param_threshold (int, default=2000) –

    Parameter count threshold for auto-selecting CG vs materialized solver.

    • p < threshold: Use materialized J^T J with SVD solve (faster for small p)

    • p >= threshold: Use CG with implicit matvec (O(p) memory vs O(p^2))

    Threshold balances memory savings vs additional data passes for CG.

  • enable_group_variance_regularization (bool, default=False) –

    Enable variance regularization for parameter groups. When enabled, adds a penalty term to the loss function that penalizes variance within specified parameter groups. This is essential for preventing per-group parameter absorption in multi-component fitting.

    The regularized loss becomes L = MSE + group_variance_lambda * sum(Var(group_i)) where each group_i is a slice of parameters defined by group_variance_indices.

  • group_variance_lambda (float, default=0.01) – Regularization strength for group variance penalty. Larger values more strongly penalize variance within parameter groups. Use 0.001-0.01 for light regularization (allows moderate group variation), 0.1-1.0 for moderate regularization (constrains groups to be similar), or 10-1000 for strong regularization (forces groups to be nearly uniform). For multi-component fits with per-group parameters, use lambda ~ 0.1 * n_data / (n_groups * sigma^2) where sigma is the expected experimental variation (~0.05 for 5%).

  • group_variance_indices (list of tuple, default=None) –

    List of (start, end) tuples defining parameter groups for variance regularization. Each tuple specifies a slice [start:end] of the parameter vector that should have low internal variance.

    Example for 23 independent groups: group_variance_indices = [(0, 23), (23, 46)] constrains contrast params [0:23] and offset params [23:46] to each have low variance, preventing them from absorbing group-dependent physical signals.

    If None when enable_group_variance_regularization=True, no groups are regularized (effectively disabling the feature).

  • chunk_size (int, default=10000) – Size of data chunks for streaming J^T J accumulation. Larger chunks = faster but more memory. Typical: 5000-50000.

  • gc_chunk_interval (int, default=10) – Chunks between gc.collect() calls (FR-007). Controls how often garbage collection runs during chunked processing. Higher values reduce GC overhead but may increase memory usage. Default of 10 balances memory reclamation with performance.

  • enable_checkpoints (bool, default=True) – Enable checkpoint save/resume for fault tolerance.

  • checkpoint_frequency (int, default=100) – Save checkpoint every N iterations (across all phases).

  • validate_numerics (bool, default=True) – Enable NaN/Inf validation at gradient, parameter, and loss computation points.

  • enable_multi_device (bool, default=False) – Enable multi-GPU/TPU parallelism for Jacobian computation. Uses JAX pmap for data-parallel computation across devices.

  • callback_frequency (int, default=10) – Call progress callback every N iterations (if callback provided).

  • enable_multistart (bool, default=False) – Enable multi-start optimization with tournament selection during Phase 1. When enabled, generates multiple starting points using LHS sampling and uses tournament elimination to select the best candidate for Phase 2.

  • n_starts (int, default=10) – Number of starting points for multi-start optimization. Only used when enable_multistart=True.

  • multistart_sampler (str, default='lhs') – Sampling method for generating starting points. Options: ‘lhs’ (Latin Hypercube), ‘sobol’, ‘halton’.

  • elimination_rounds (int, default=3) – Number of tournament elimination rounds. Each round eliminates elimination_fraction of candidates.

  • elimination_fraction (float, default=0.5) – Fraction of candidates to eliminate per round. Must be in (0, 1). Default 0.5 = eliminate half each round.

  • batches_per_round (int, default=50) – Number of data batches to use for evaluation in each tournament round. More batches = more reliable selection but slower.

Examples

Default configuration:

>>> from nlsq import HybridStreamingConfig
>>> config = HybridStreamingConfig()
>>> config.warmup_iterations
200

Aggressive profile (faster convergence with L-BFGS):

>>> config = HybridStreamingConfig.aggressive()
>>> config.warmup_iterations
50

Conservative profile (higher quality):

>>> config = HybridStreamingConfig.conservative()
>>> config.gauss_newton_tol < 1e-8
True

Memory-optimized profile:

>>> config = HybridStreamingConfig.memory_optimized()
>>> config.chunk_size < 10000
True

Custom configuration:

>>> config = HybridStreamingConfig(
...     warmup_iterations=50,
...     lbfgs_history_size=15,
...     chunk_size=5000,
... )

With multi-start tournament selection:

>>> config = HybridStreamingConfig(
...     enable_multistart=True,
...     n_starts=20,
...     elimination_rounds=3,
...     batches_per_round=50,
... )

See also

AdaptiveHybridStreamingOptimizer

Optimizer that uses this configuration

curve_fit

High-level interface with method=’hybrid_streaming’

TournamentSelector

Tournament selection for multi-start optimization

Notes

Based on Adaptive Hybrid Streaming Optimizer specification: agent-os/specs/2025-12-18-adaptive-hybrid-streaming-optimizer/spec.md

L-BFGS replaces Adam for warmup, providing 5-10x faster convergence to the basin of attraction through approximate Hessian information.

normalize: bool
normalization_strategy: str
warmup_iterations: int
max_warmup_iterations: int
warmup_learning_rate: float
loss_plateau_threshold: float
gradient_norm_threshold: float
active_switching_criteria: list[str] | None
lbfgs_history_size: int
lbfgs_initial_step_size: float
lbfgs_exploration_step_size: float
lbfgs_refinement_step_size: float
use_learning_rate_schedule: bool
lr_schedule_warmup_steps: int
lr_schedule_decay_steps: int
lr_schedule_end_value: float
gradient_clip_value: float | None
enable_warm_start_detection: bool
warm_start_threshold: float
enable_adaptive_warmup_lr: bool
warmup_lr_refinement: float
warmup_lr_careful: float
enable_cost_guard: bool
cost_increase_tolerance: float
enable_step_clipping: bool
max_warmup_step_size: float
gauss_newton_max_iterations: int
gauss_newton_tol: float
trust_region_initial: float
regularization_factor: float
cg_max_iterations: int
cg_relative_tolerance: float
cg_absolute_tolerance: float
cg_param_threshold: int
enable_group_variance_regularization: bool
group_variance_lambda: float
group_variance_indices: list[tuple[int, int]] | None
enable_residual_weighting: bool
residual_weights: list[float] | None
chunk_size: int
gc_chunk_interval: int
loop_strategy: Literal['auto', 'scan', 'loop']
enable_checkpoints: bool
checkpoint_frequency: int
checkpoint_dir: str | None
resume_from_checkpoint: str | None
validate_numerics: bool
enable_fault_tolerance: bool
max_retries_per_batch: int
min_success_rate: float
enable_multi_device: bool
callback_frequency: int
verbose: int
log_frequency: int
enable_multistart: bool
n_starts: int
multistart_sampler: Literal['lhs', 'sobol', 'halton']
elimination_rounds: int
elimination_fraction: float
batches_per_round: int
center_on_p0: bool
scale_factor: float
__post_init__()[source]

Validate configuration after initialization.

Delegates to specialized validator functions for each configuration group. ConfigValidationError from validators is re-raised as ValueError for backwards compatibility.

classmethod aggressive()[source]

Create aggressive profile: faster convergence with L-BFGS, looser tolerances.

This preset prioritizes speed over robustness: - L-BFGS warmup with reduced iterations (50 vs 300 with Adam) - Higher learning rate for faster progress - Looser tolerances for earlier Phase 2 switching - Larger chunks for better throughput

Returns:

Configuration with aggressive settings.

Return type:

HybridStreamingConfig

Examples

>>> config = HybridStreamingConfig.aggressive()
>>> config.warmup_learning_rate
0.003
>>> config.warmup_iterations
50
classmethod conservative()[source]

Create conservative profile: slower but robust, tighter tolerances.

This preset prioritizes solution quality over speed: - L-BFGS warmup with conservative iterations - Lower learning rate for stability - Tighter tolerances for higher quality - More Gauss-Newton iterations

Returns:

Configuration with conservative settings.

Return type:

HybridStreamingConfig

Examples

>>> config = HybridStreamingConfig.conservative()
>>> config.gauss_newton_tol
1e-10
>>> config.warmup_iterations
30
classmethod memory_optimized()[source]

Create memory-optimized profile: smaller chunks, efficient settings.

This preset minimizes memory footprint: - Smaller chunks to reduce memory usage - L-BFGS warmup with reduced iterations - Enable checkpoints for recovery (important when memory is tight) - Lower CG threshold for more aggressive CG usage (avoids O(p^2) J^T J)

Returns:

Configuration with memory-optimized settings.

Return type:

HybridStreamingConfig

Examples

>>> config = HybridStreamingConfig.memory_optimized()
>>> config.chunk_size
5000
>>> config.warmup_iterations
40
>>> config.cg_param_threshold
1000
classmethod with_multistart(n_starts=10, **kwargs)[source]

Create configuration with multi-start tournament selection enabled.

This preset enables multi-start optimization for finding global optima: - Tournament selection during Phase 1 warmup - LHS sampling for generating starting points - Progressive elimination to select best candidate

Parameters:
  • n_starts (int, default=10) – Number of starting points to generate.

  • **kwargs – Additional configuration parameters to override.

Returns:

Configuration with multi-start enabled.

Return type:

HybridStreamingConfig

Examples

>>> config = HybridStreamingConfig.with_multistart(n_starts=20)
>>> config.enable_multistart
True
>>> config.n_starts
20
classmethod defense_strict()[source]

Create strict defense layer profile for near-optimal scenarios.

This preset maximizes protection against divergence when initial parameters are expected to be close to optimal (warm starts, refinement): - Very low warm start threshold (triggers at 1% relative loss) - Ultra-conservative learning rates for refinement - Very tight cost guard tolerance (5% increase aborts) - Very small step clipping for stability

Use this when: - Continuing optimization from a previous fit - Refining parameters that are already close to optimal - Dealing with ill-conditioned problems - Prioritizing stability over speed

Returns:

Configuration with strict defense layer settings.

Return type:

HybridStreamingConfig

Examples

>>> config = HybridStreamingConfig.defense_strict()
>>> config.warm_start_threshold
0.01
>>> config.cost_increase_tolerance
0.05
>>> config.warmup_iterations
25
classmethod defense_relaxed()[source]

Create relaxed defense layer profile for exploration-heavy scenarios.

This preset reduces defense layer sensitivity for problems where significant parameter exploration is needed: - Higher warm start threshold (50% relative loss needed to skip) - More aggressive learning rates for exploration - Generous cost guard tolerance (50% increase allowed) - Larger step clipping for faster exploration

Use this when: - Starting from a rough initial guess - Exploring a wide parameter space - Problems with multiple local minima - Speed is more important than robustness

Returns:

Configuration with relaxed defense layer settings.

Return type:

HybridStreamingConfig

Examples

>>> config = HybridStreamingConfig.defense_relaxed()
>>> config.warm_start_threshold
0.5
>>> config.cost_increase_tolerance
0.5
>>> config.warmup_iterations
50
classmethod defense_disabled()[source]

Create profile with all defense layers disabled.

This preset completely disables the 4-layer defense strategy, reverting to pre-0.3.6 behavior. Use with caution as this removes protection against warmup divergence.

Use this when: - Debugging to isolate defense layer effects - Benchmarking without defense overhead - Backward compatibility with older code is required

Returns:

Configuration with all defense layers disabled.

Return type:

HybridStreamingConfig

Examples

>>> config = HybridStreamingConfig.defense_disabled()
>>> config.enable_warm_start_detection
False
classmethod scientific_default()[source]

Create profile optimized for scientific computing workflows.

This preset is tuned for scientific fitting scenarios like spectroscopy, decay curves, and other physics-based models: - Balanced defense layers that protect without being too aggressive - L-BFGS warmup with moderate iterations - Enabled checkpoints for long-running fits

Use this when: - Fitting physics-based models (spectroscopy, decay curves) - Numerical precision is important - Parameters may have multiple scales - Reproducibility is required

Returns:

Configuration optimized for scientific computing.

Return type:

HybridStreamingConfig

Examples

>>> config = HybridStreamingConfig.scientific_default()
>>> config.warmup_iterations
35
__init__(normalize=True, normalization_strategy='auto', warmup_iterations=200, max_warmup_iterations=500, warmup_learning_rate=0.001, loss_plateau_threshold=0.0001, gradient_norm_threshold=0.001, active_switching_criteria=None, lbfgs_history_size=10, lbfgs_initial_step_size=0.1, lbfgs_line_search='wolfe', lbfgs_exploration_step_size=0.1, lbfgs_refinement_step_size=1.0, use_learning_rate_schedule=False, lr_schedule_warmup_steps=50, lr_schedule_decay_steps=450, lr_schedule_end_value=0.0001, gradient_clip_value=None, enable_warm_start_detection=True, warm_start_threshold=0.01, enable_adaptive_warmup_lr=True, warmup_lr_refinement=1e-06, warmup_lr_careful=1e-05, enable_cost_guard=True, cost_increase_tolerance=0.05, enable_step_clipping=True, max_warmup_step_size=0.1, gauss_newton_max_iterations=100, gauss_newton_tol=1e-08, trust_region_initial=1.0, regularization_factor=1e-10, cg_max_iterations=100, cg_relative_tolerance=0.0001, cg_absolute_tolerance=1e-10, cg_param_threshold=2000, enable_group_variance_regularization=False, group_variance_lambda=0.01, group_variance_indices=None, enable_residual_weighting=False, residual_weights=None, chunk_size=10000, gc_chunk_interval=10, loop_strategy='auto', enable_checkpoints=True, checkpoint_frequency=100, checkpoint_dir=None, resume_from_checkpoint=None, validate_numerics=True, enable_fault_tolerance=True, max_retries_per_batch=2, min_success_rate=0.5, enable_multi_device=False, callback_frequency=10, verbose=1, log_frequency=1, enable_multistart=False, n_starts=10, multistart_sampler='lhs', elimination_rounds=3, elimination_fraction=0.5, batches_per_round=50, center_on_p0=True, scale_factor=1.0)
class nlsq.InputValidator(fast_mode=True)[source]

Bases: object

Comprehensive input validation for curve fitting functions.

__init__(fast_mode=True)[source]

Initialize the input validator.

Parameters:

fast_mode (bool, default True) – If True, skip expensive validation checks for better performance. If False, perform all validation checks.

validate_curve_fit_inputs(f, xdata, ydata, p0=None, bounds=None, sigma=None, absolute_sigma=True, check_finite=True)[source]

Validate inputs for curve_fit function.

This method orchestrates the validation pipeline by calling focused helper methods for each validation step.

Parameters:
  • f (callable) – Model function to fit

  • xdata (array_like) – Independent variable data

  • ydata (array_like) – Dependent variable data

  • p0 (array_like, optional) – Initial parameter guess

  • bounds (tuple, optional) – Parameter bounds

  • sigma (array_like, optional) – Uncertainties in ydata

  • absolute_sigma (bool) – Whether sigma is absolute or relative

  • check_finite (bool) – Whether to check for finite values

Returns:

  • errors (list) – List of error messages (empty if no errors)

  • warnings (list) – List of warning messages

  • xdata_clean (np.ndarray) – Cleaned and validated xdata

  • ydata_clean (np.ndarray) – Cleaned and validated ydata

Return type:

tuple[list[str], list[str], ndarray, ndarray]

validate_security_constraints(n_points, n_params, bounds=None, p0=None)[source]

Validate security constraints to prevent DoS and numerical issues.

This method combines all security-focused validation checks.

Parameters:
  • n_points (int) – Number of data points

  • n_params (int) – Number of parameters

  • bounds (tuple | None, optional) – Parameter bounds

  • p0 (array_like | None, optional) – Initial parameter guess

Returns:

  • errors (list) – List of critical error messages

  • warnings (list) – List of warning messages

Return type:

tuple[list[str], list[str]]

validate_least_squares_inputs(fun, x0, bounds=None, method='trf', ftol=1e-08, xtol=1e-08, gtol=1e-08, max_nfev=None)[source]

Validate inputs for least_squares function.

This method orchestrates the validation pipeline by calling focused helper methods for each validation step.

Parameters:
  • fun (callable) – Residual function

  • x0 (array_like) – Initial parameter guess

  • bounds (tuple, optional) – Parameter bounds

  • method (str) – Optimization method

  • ftol (float) – Function tolerance

  • xtol (float) – Parameter tolerance

  • gtol (float) – Gradient tolerance

  • max_nfev (int, optional) – Maximum function evaluations

Returns:

  • errors (list) – List of error messages

  • warnings (list) – List of warning messages

  • x0_clean (np.ndarray) – Cleaned initial guess

Return type:

tuple[list[str], list[str], ndarray]

class nlsq.IterationLogger(filename=None, mode='w', log_params=False, file=None)[source]

Bases: CallbackBase

Log optimization progress to file or stdout.

Parameters:
  • filename (str, optional) – File to log to. If None and file is None, logs to stdout.

  • mode (str, optional) – File open mode. Default: ‘w’ (overwrite)

  • log_params (bool, optional) – Whether to log parameter values. Default: False

  • file (file-like object, optional) – File-like object to write to. If provided, filename is ignored.

Examples

>>> from nlsq.callbacks import IterationLogger
>>> callback = IterationLogger("optimization.log")
>>> popt, pcov = curve_fit(f, x, y, callback=callback)
__init__(filename=None, mode='w', log_params=False, file=None)[source]
__call__(iteration, cost, params, info)[source]

Log iteration information.

close()[source]

Close log file.

__del__()[source]

Ensure file is closed on deletion.

class nlsq.LDMemoryConfig(memory_limit_gb=8.0, safety_factor=0.8, min_chunk_size=1000, max_chunk_size=1000000, use_streaming=True, streaming_batch_size=50000, streaming_max_epochs=10, min_success_rate=0.5, save_diagnostics=False, gc_chunk_interval=10)[source]

Bases: object

Configuration for memory management in large dataset fitting.

memory_limit_gb

Maximum memory to use in GB (default: 8.0)

Type:

float

safety_factor

Safety factor for memory calculations (default: 0.8)

Type:

float

min_chunk_size

Minimum chunk size in data points (default: 1000)

Type:

int

max_chunk_size

Maximum chunk size in data points (default: 1000000)

Type:

int

use_streaming

Use adaptive hybrid streaming optimization for unlimited data (default: True)

Type:

bool

streaming_batch_size

Chunk size for adaptive hybrid streaming (default: 50000)

Type:

int

streaming_max_epochs

Maximum Gauss-Newton iterations for adaptive hybrid streaming (default: 10)

Type:

int

min_success_rate

Minimum success rate for chunked fitting (default: 0.5) If success rate falls below this threshold, fitting is considered failed

Type:

float

save_diagnostics

Whether to compute and save detailed diagnostic statistics (default: False) When False, skips statistical computations for successful chunks (5-10% faster)

Type:

bool

gc_chunk_interval

Chunks between gc.collect() calls (default: 10, FR-007) Controls how often garbage collection runs during chunked processing. Higher values reduce GC overhead but may increase memory usage.

Type:

int

memory_limit_gb: float
safety_factor: float
min_chunk_size: int
max_chunk_size: int
use_streaming: bool
streaming_batch_size: int
streaming_max_epochs: int
min_success_rate: float
save_diagnostics: bool
gc_chunk_interval: int
__init__(memory_limit_gb=8.0, safety_factor=0.8, min_chunk_size=1000, max_chunk_size=1000000, use_streaming=True, streaming_batch_size=50000, streaming_max_epochs=10, min_success_rate=0.5, save_diagnostics=False, gc_chunk_interval=10)
class nlsq.LargeDatasetFitter(memory_limit_gb=8.0, config=None, curve_fit_class=None, logger=None, multistart=False, n_starts=10, sampler='lhs')[source]

Bases: object

Large dataset curve fitting with automatic memory management and chunking.

This class handles datasets with millions to billions of points that exceed available memory through automatic chunking, progressive parameter refinement, and streaming optimization. It maintains fitting accuracy while preventing memory overflow through dynamic memory monitoring and chunk size optimization.

Core Capabilities

  • Automatic memory estimation based on data size and parameter count

  • Dynamic chunk size calculation considering available system memory

  • Sequential parameter refinement across data chunks with convergence tracking

  • Streaming optimization for unlimited datasets (no accuracy loss)

  • Real-time progress monitoring with ETA for long-running fits

  • Full integration with NLSQ optimization algorithms and GPU acceleration

  • Multi-start optimization for global search (uses full data)

Memory Management Algorithm

  1. Estimates total memory requirements from dataset size and parameter count

  2. Calculates optimal chunk sizes considering available memory and safety margins

  3. Monitors actual memory usage during processing to prevent overflow

  4. Uses streaming optimization for extremely large datasets (processes all data)

Processing Strategies

  • Single Pass: For datasets fitting within memory limits

  • Sequential Chunking: Processes data in optimal-sized chunks with parameter propagation

  • Streaming Optimization: Mini-batch gradient descent for unlimited datasets (no subsampling)

Multi-Start Optimization

For medium-sized datasets (1M-100M points), multi-start optimization explores multiple starting points on full data, and the best starting point is then used for the full chunked optimization.

Performance Characteristics

  • Maintains <1% parameter error for well-conditioned problems using chunking

  • Achieves 5-50x speedup over naive approaches through memory optimization

  • Scales to datasets of unlimited size using streaming (processes all data)

  • Provides linear time complexity with respect to chunk count

Model Validation Caching (Task Group 7 - 5.1a)

Model functions are validated once per unique function identity using a cache keyed by (id(func), id(func.__code__)). This avoids redundant validation across chunks, providing 1-5% performance gain in chunked processing.

param memory_limit_gb:

Maximum memory usage in GB. System memory is auto-detected if None.

type memory_limit_gb:

float, default 8.0

param config:

Advanced configuration for fine-tuning memory management behavior.

type config:

LDMemoryConfig, optional

param curve_fit_class:

Custom CurveFit instance for specialized fitting requirements.

type curve_fit_class:

nlsq.minpack.CurveFit, optional

param multistart:

Enable multi-start optimization for global search.

type multistart:

bool, default False

param n_starts:

Number of starting points for multi-start optimization.

type n_starts:

int, default 10

param sampler:

Sampling strategy for multi-start: ‘lhs’, ‘sobol’, or ‘halton’.

type sampler:

str, default ‘lhs’

config

Active memory management configuration

Type:

LDMemoryConfig

curve_fitter

Internal curve fitting engine with JAX acceleration

Type:

nlsq.minpack.CurveFit

logger

Internal logging for performance monitoring and debugging

Type:

Logger

fit : Main fitting method with automatic memory management
fit_with_progress : Fitting with real-time progress reporting and ETA
get_memory_recommendations : Pre-fitting memory analysis and strategy recommendations
Important: Chunking-Compatible Model Functions
-----------------------------------------------
When using chunked processing (for datasets > memory limit), your model function
MUST respect the size of xdata. During chunking, xdata will be a subset of the
full dataset, and your model must return output matching that subset size.
\*\*INCORRECT - Model ignores xdata size (will cause shape mismatch errors):**
>>> def bad_model(xdata, a, b):
...     # WRONG: Always returns full array, ignoring xdata size
...     t_full = jnp.arange(10_000_000)  # Fixed size!
...     return a * jnp.exp(-b * t_full)  # Shape mismatch during chunking
\*\*CORRECT - Model respects xdata size:**
>>> def good_model(xdata, a, b):
...     # CORRECT: Uses xdata as indices to return only requested subset
...     indices = xdata.astype(jnp.int32)
...     return a * jnp.exp(-b * indices)  # Shape matches xdata
\*\*Alternative - Direct computation on xdata:**
>>> def direct_model(xdata, a, b):
...     # CORRECT: Operates directly on xdata
...     return a * jnp.exp(-b * xdata)  # Shape automatically matches

Examples

Basic usage with automatic configuration:

>>> import numpy as np
>>> import jax.numpy as jnp
>>>
>>> # 10 million data points
>>> x = np.linspace(0, 10, 10_000_000)
>>> y = 2.5 * jnp.exp(-1.3 * x) + 0.1 + np.random.normal(0, 0.05, len(x))
>>>
>>> fitter = LargeDatasetFitter(memory_limit_gb=4.0)
>>> result = fitter.fit(
...     lambda x, a, b, c: a * jnp.exp(-b * x) + c,
...     x, y, p0=[2, 1, 0]
... )
>>> print(f"Parameters: {result.popt}")
>>> print(f"Chunks used: {result.n_chunks}")

Multi-start optimization:

>>> fitter = LargeDatasetFitter(
...     memory_limit_gb=4.0,
...     multistart=True,
...     n_starts=10,
...     sampler='lhs',
... )
>>> result = fitter.fit(
...     lambda x, a, b, c: a * jnp.exp(-b * x) + c,
...     x, y, p0=[2, 1, 0],
...     bounds=([0, 0, 0], [10, 5, 10])
... )

Advanced configuration with progress monitoring:

>>> config = LDMemoryConfig(
...     memory_limit_gb=8.0,
...     min_chunk_size=10000,
...     max_chunk_size=1000000,
...     use_streaming=True,
...     streaming_batch_size=50000
... )
>>> fitter = LargeDatasetFitter(config=config)
>>>
>>> # Fit with progress bar for long-running operation
>>> result = fitter.fit_with_progress(
...     exponential_model, x_huge, y_huge, p0=[2, 1, 0]
... )

Memory analysis before processing:

>>> recommendations = fitter.get_memory_recommendations(len(x), n_params=3)
>>> print(f"Strategy: {recommendations['processing_strategy']}")
>>> print(f"Memory estimate: {recommendations['memory_estimate_gb']:.2f} GB")
>>> print(f"Recommended chunks: {recommendations['n_chunks']}")

See also

curve_fit_large

High-level function with automatic dataset size detection

LDMemoryConfig

Configuration class for memory management parameters

estimate_memory_requirements

Standalone function for memory estimation

Notes

The sequential chunking algorithm maintains parameter accuracy by using each chunk’s result as the initial guess for the next chunk. This approach typically maintains fitting accuracy within 0.1% of single-pass results for well-conditioned problems while enabling processing of arbitrarily large datasets.

For extremely large datasets, streaming optimization processes all data using mini-batch gradient descent with no subsampling, ensuring zero accuracy loss compared to subsampling approaches (removed in v0.2.0).

__init__(memory_limit_gb=8.0, config=None, curve_fit_class=None, logger=None, multistart=False, n_starts=10, sampler='lhs')[source]

Initialize LargeDatasetFitter.

Parameters:
  • memory_limit_gb (float, optional) – Memory limit in GB (default: 8.0)

  • config (LDMemoryConfig, optional) – Custom memory configuration

  • curve_fit_class (nlsq.minpack.CurveFit, optional) – Custom CurveFit instance to use

  • logger (logging.Logger, optional) – External logger instance for integration with application logging. If None, uses NLSQ’s internal logger. This allows chunk failure warnings to appear in your application’s logs.

  • multistart (bool, optional) – Enable multi-start optimization for global search (default: False). When enabled, explores multiple starting points on full data before running the full chunked optimization.

  • n_starts (int, optional) – Number of starting points for multi-start optimization (default: 10). Set to 0 to disable multi-start even when multistart=True.

  • sampler (str, optional) – Sampling strategy for generating starting points (default: ‘lhs’). Options: ‘lhs’ (Latin Hypercube), ‘sobol’, ‘halton’.

last_stats: DatasetStats | None
fit_history: list[dict]
estimate_requirements(n_points, n_params)[source]

Estimate memory requirements and processing strategy.

Parameters:
  • n_points (int) – Number of data points

  • n_params (int) – Number of parameters to fit

Returns:

Detailed statistics and recommendations

Return type:

DatasetStats

fit_with_progress(f, xdata, ydata, p0=None, bounds=(-inf, inf), method='trf', solver='auto', **kwargs)[source]

Fit curve with progress reporting for long-running fits.

Parameters:
  • f (callable) – The model function f(x, *params) -> y

  • xdata (np.ndarray) – Independent variable data

  • ydata (np.ndarray) – Dependent variable data

  • p0 (array-like, optional) – Initial parameter guess

  • bounds (tuple, optional) – Parameter bounds (lower, upper)

  • method (str, optional) – Optimization method (default: ‘trf’)

  • solver (str, optional) – Solver type (default: ‘auto’)

  • **kwargs – Additional arguments passed to curve_fit

Returns:

Optimization result with fitted parameters and statistics

Return type:

OptimizeResult

memory_monitor()[source]

Context manager for monitoring memory usage during fits.

get_memory_recommendations(n_points, n_params)[source]

Get memory usage recommendations for a dataset.

Parameters:
  • n_points (int) – Number of data points

  • n_params (int) – Number of parameters

Returns:

Recommendations and memory analysis

Return type:

dict

class nlsq.LeastSquares(enable_stability=False, enable_diagnostics=False, max_jacobian_elements_for_svd=10000000)[source]

Bases: object

Core least squares optimization engine with JAX acceleration.

This class implements the main optimization algorithms for nonlinear least squares problems, including Trust Region Reflective (TRF) and Levenberg-Marquardt (LM). It handles automatic differentiation, bound constraints, loss functions, and uncertainty propagation.

The class maintains separate automatic differentiation instances for different sigma configurations (no sigma, 1D sigma, 2D covariance matrix) to optimize compilation and execution performance.

trf

Trust Region Reflective algorithm implementation

Type:

TrustRegionReflective

ls

JIT-compiled loss function implementations

Type:

LossFunctionsJIT

logger

Internal logger for debugging and performance tracking

Type:

Logger

f

Current objective function being optimized

Type:

callable

jac

Current Jacobian function (None for automatic differentiation)

Type:

callable or None

adjn

Automatic differentiation instance for unweighted problems

Type:

AutoDiffJacobian

adj1d

Automatic differentiation instance for 1D sigma weighting

Type:

AutoDiffJacobian

adj2d

Automatic differentiation instance for 2D covariance matrix weighting

Type:

AutoDiffJacobian

least_squares : Main optimization method
__init__(enable_stability=False, enable_diagnostics=False, max_jacobian_elements_for_svd=10000000)[source]

Initialize LeastSquares with optimization algorithms and autodiff instances.

Sets up the Trust Region Reflective solver, loss functions, and separate automatic differentiation instances for different weighting schemes to maximize JAX compilation efficiency.

Parameters:
  • enable_stability (bool, default False) – Enable numerical stability checks and fixes

  • enable_diagnostics (bool, default False) – Enable optimization diagnostics collection

  • max_jacobian_elements_for_svd (int, default 10_000_000) – Maximum Jacobian size (m × n elements) for SVD computation during stability checks. SVD is skipped for larger Jacobians.

least_squares(fun, x0, jac=None, bounds=(-inf, inf), method='trf', ftol=1e-08, xtol=1e-08, gtol=1e-08, x_scale=1.0, loss='linear', f_scale=1.0, diff_step=None, tr_solver=None, tr_options=None, jac_sparsity=None, max_nfev=None, verbose=0, jacobian_mode=None, xdata=None, ydata=None, data_mask=None, transform=None, timeit=False, callback=None, args=(), kwargs=None, prepared_bounds=None, **timeout_kwargs)[source]

Solve nonlinear least squares problem using JAX-accelerated algorithms.

This method orchestrates the optimization process by calling focused helper methods for each major step: validation, function setup, initial evaluation, stability checks, and optimization execution.

Parameters:
  • fun (callable) – Residual function. Must use jax.numpy operations.

  • x0 (array_like) – Initial parameter guess.

  • jac (callable or None, optional) – Jacobian function. If None, uses JAX autodiff.

  • bounds (2-tuple, optional) – Parameter bounds as (lower, upper).

  • method (str, optional) – Optimization algorithm (‘trf’).

  • ftol (float, optional) – Convergence tolerances for function, parameters, and gradient.

  • xtol (float, optional) – Convergence tolerances for function, parameters, and gradient.

  • gtol (float, optional) – Convergence tolerances for function, parameters, and gradient.

  • x_scale (str or array_like, optional) – Parameter scaling (‘jac’ for automatic).

  • loss (str or callable, optional) – Robust loss function (‘linear’, ‘huber’, ‘soft_l1’, etc.).

  • f_scale (float, optional) – Scale parameter for robust loss functions.

  • max_nfev (int, optional) – Maximum function evaluations.

  • verbose (int, optional) – Verbosity level (0, 1, or 2).

  • jacobian_mode ({'auto', 'fwd', 'rev'}, optional) – Jacobian automatic differentiation mode. If None, uses configuration from environment variable, config file, or auto-default. Default is None. - ‘auto’: Automatically select based on problem dimensions - ‘fwd’: Force forward-mode AD (jacfwd) - ‘rev’: Force reverse-mode AD (jacrev)

  • xdata (array_like, optional) – Data for curve fitting applications.

  • ydata (array_like, optional) – Data for curve fitting applications.

  • data_mask (array_like, optional) – Boolean mask for data exclusion.

  • transform (array_like, optional) – Transformation matrix for weighted fitting.

  • timeit (bool, optional) – Enable detailed timing analysis.

  • callback (callable or None, optional) – Callback function called after each optimization iteration with signature callback(iteration, cost, params, info). Useful for monitoring optimization progress, logging, or implementing custom stopping criteria. If None (default), no callback is invoked.

  • args (tuple, optional) – Additional arguments for objective function.

  • kwargs (dict, optional) – Additional optimization parameters.

  • prepared_bounds (tuple[np.ndarray, np.ndarray] or None, optional) – Pre-prepared bounds as (lb, ub). If provided, skips prepare_bounds call. This optimization avoids redundant bounds preparation when caller has already called prepare_bounds. Default is None.

Returns:

result – Optimization result with solution, convergence info, and statistics.

Return type:

OptimizeResult

autdiff_jac(jac, mode='fwd')[source]

We do this for all three sigma transformed functions such that if sigma is changed from none to 1D to covariance sigma then no retracing is needed.

Parameters:
  • jac (None) – Passed in to maintain compatibility with the user defined Jacobian function.

  • mode (str, optional) – Jacobian mode (‘fwd’ or ‘rev’), by default ‘fwd’

update_function(func)[source]

Wraps the given fit function to be a residual function using the data. The wrapped function is in a JAX JIT compatible format which is purely functional. This requires that both the data mask and the uncertainty transform are passed to the function. Even for the case where the data mask is all True and the uncertainty transform is None we still need to pass these arguments to the function due JAX’s functional nature.

Parameters:

func (Callable) – The fit function to wrap.

Return type:

None

wrap_jac(jac)[source]

Wraps an user defined Jacobian function to allow for data masking and uncertainty transforms. The wrapped function is in a JAX JIT compatible format which is purely functional. This requires that both the data mask and the uncertainty transform are passed to the function.

Using an analytical Jacobian of the fit function is equivalent to the Jacobian of the residual function.

Also note that the analytical Jacobian doesn’t require the independent ydata, but we still need to pass it to the function to maintain compatibility with autdiff version which does require the ydata.

Parameters:

jac (Callable) – The Jacobian function to wrap.

Returns:

The masked Jacobian of the function evaluated at args with respect to the data.

Return type:

jnp.ndarray

class nlsq.MemoryBudget(available_gb, threshold_gb, data_gb, jacobian_gb, peak_gb)[source]

Bases: object

Computed memory budget for optimizer selection.

This immutable dataclass represents the computed memory requirements and available resources for automatic optimizer strategy selection. Use the compute() factory method to create instances.

available_gb

Available system memory in GB (CPU or GPU depending on target).

Type:

float

threshold_gb

Safe memory threshold = available_gb × safety_factor.

Type:

float

data_gb

Memory required for data arrays (x_data, y_data).

Type:

float

jacobian_gb

Memory required for full Jacobian matrix.

Type:

float

peak_gb

Estimated peak memory = data_gb + 1.3 × jacobian_gb + solver overhead.

Type:

float

Examples

>>> budget = MemoryBudget.compute(n_points=10_000_000, n_params=10)
>>> print(f"Available: {budget.available_gb:.1f} GB")
>>> print(f"Peak estimate: {budget.peak_gb:.2f} GB")
>>> print(f"Fits in memory: {budget.fits_in_memory}")
available_gb: float
threshold_gb: float
data_gb: float
jacobian_gb: float
peak_gb: float
property fits_in_memory: bool

Check if estimated peak memory fits within safe threshold.

Returns:

True if peak_gb <= threshold_gb.

Return type:

bool

property data_fits: bool

Check if data arrays alone fit within safe threshold.

Returns:

True if data_gb <= threshold_gb.

Return type:

bool

classmethod compute(n_points, n_params, n_features=1, dtype_bytes=8, safety_factor=0.75, memory_limit_gb=None, use_gpu=False)[source]

Compute memory budget for a given dataset size.

Parameters:
  • n_points (int) – Number of data points.

  • n_params (int) – Number of fit parameters.

  • n_features (int, default=1) – Number of features in x_data (dimensions).

  • dtype_bytes (int, default=8) – Bytes per element (8 for float64, 4 for float32).

  • safety_factor (float, default=0.75) – Memory safety factor (0.75 means use 75% of available).

  • memory_limit_gb (float | None, default=None) – Override memory limit in GB. If None, auto-detect.

  • use_gpu (bool, default=False) – If True, use GPU memory instead of CPU memory.

Returns:

Computed memory budget with all fields populated.

Return type:

MemoryBudget

Raises:

ValueError – If n_points <= 0, n_params <= 0, or safety_factor not in (0, 1].

Examples

>>> budget = MemoryBudget.compute(n_points=1_000_000, n_params=5)
>>> budget.fits_in_memory
True
__init__(available_gb, threshold_gb, data_gb, jacobian_gb, peak_gb)
class nlsq.MemoryBudgetSelector(safety_factor=0.75)[source]

Bases: object

Selects optimal optimizer strategy based on memory budget.

This class computes memory requirements and selects between STREAMING, CHUNKED, and STANDARD strategies based on three sequential memory comparisons.

Decision Tree:
  1. data_gb > threshold_gb → STREAMING (data doesn’t fit)

  2. peak_gb > threshold_gb → CHUNKED (Jacobian doesn’t fit)

  3. else → STANDARD (everything fits)

Parameters:

safety_factor (float, default=0.75) – Memory safety factor (0.75 means use 75% of available memory).

Examples

>>> selector = MemoryBudgetSelector(safety_factor=0.75)
>>> strategy, config = selector.select(n_points=5_000_000, n_params=10)
>>> if strategy == "streaming":
...     # Use HybridStreamingOptimizer with config
...     pass
>>> elif strategy == "chunked":
...     # Use LargeDatasetFitter with config
...     pass
>>> else:
...     # Use standard curve_fit()
...     pass
__init__(safety_factor=0.75)[source]

Initialize selector with safety factor.

Parameters:

safety_factor (float, default=0.75) – Memory safety factor (0.75 means use 75% of available memory).

select(n_points, n_params, n_features=1, memory_limit_gb=None, goal=None, use_gpu=False, verbose=False)[source]

Select optimal optimizer strategy based on memory budget.

Parameters:
  • n_points (int) – Number of data points.

  • n_params (int) – Number of fit parameters.

  • n_features (int, default=1) – Number of features in x_data.

  • memory_limit_gb (float | None, default=None) – Override memory limit in GB. If None, auto-detect.

  • goal (OptimizationGoal | None, default=None) – Optimization goal (affects tolerances, not strategy selection).

  • use_gpu (bool, default=False) – If True, use GPU memory instead of CPU memory.

  • verbose (bool, default=False) – If True, log memory budget details and strategy selection reason.

Returns:

  • strategy: “streaming”, “chunked”, or “standard”

  • config: HybridStreamingConfig, LDMemoryConfig, or None

Return type:

tuple[str, config]

Raises:

ValueError – If n_points <= 0 or n_params <= 0.

class nlsq.MemoryManager(gc_threshold=0.8, safety_factor=1.2, enable_adaptive_safety=False, disable_padding=False, memory_cache_ttl=1.0, adaptive_ttl=True)[source]

Bases: object

Intelligent memory management for optimization algorithms.

This class provides: - Memory usage monitoring and prediction - Array pooling to reduce allocations with LRU eviction - Automatic garbage collection triggers - Context managers for memory-safe operations

LRU Memory Pool (Task Group 7 - 1.2a)

The memory pool uses an OrderedDict to track access order, enabling true LRU (Least Recently Used) eviction when at capacity. This improves cache utilization for frequently accessed array shapes by 5-10%.

Telemetry Circular Buffer (Task Group 9 - 1.3a)

The safety telemetry uses a deque with maxlen=1000 to prevent memory leak in multi-day optimization runs. This maintains the last 1000 telemetry records for adaptive safety factor calculation.

memory_pool

Pool of reusable arrays indexed by (shape, dtype) with LRU tracking

Type:

OrderedDict

allocation_history

History of memory allocations

Type:

list

gc_threshold

Memory usage threshold (0-1) for triggering garbage collection

Type:

float

safety_factor

Safety factor for memory predictions

Type:

float

__init__(gc_threshold=0.8, safety_factor=1.2, enable_adaptive_safety=False, disable_padding=False, memory_cache_ttl=1.0, adaptive_ttl=True)[source]

Initialize memory manager.

Parameters:
  • gc_threshold (float) – Trigger GC when memory usage exceeds this fraction (0-1)

  • safety_factor (float) – Multiply memory requirements by this factor for safety

  • enable_adaptive_safety (bool) – Enable adaptive safety factor reduction (1.2 -> 1.05 after warmup)

  • disable_padding (bool) – Disable padding/bucketing for strict memory environments (Task 5.6). When True: uses exact shapes, sets safety_factor=1.0. Use case: cloud quotas, strict memory limits.

  • memory_cache_ttl (float) – TTL in seconds for cached memory info (default: 1.0). Reduces psutil system call overhead by 90%.

  • adaptive_ttl (bool) – Enable adaptive TTL based on call frequency (default: True). High-frequency callers (>100 calls/sec) get 15s effective TTL. Medium-frequency callers (>10 calls/sec) get 10s effective TTL. Low-frequency callers use the default TTL. Reduces psutil overhead in streaming optimization by 15-20%.

get_available_memory()[source]

Get available memory in bytes.

Returns:

available – Available memory in bytes

Return type:

float

Notes

Uses TTL-based caching to reduce psutil system call overhead by 90%. When adaptive_ttl is enabled, the effective TTL is adjusted based on call frequency to further reduce overhead for streaming optimization.

get_memory_usage_bytes()[source]

Get current memory usage in bytes.

Returns:

usage – Current memory usage in bytes

Return type:

float

Notes

Uses TTL-based caching to reduce psutil system call overhead by 90%.

get_memory_usage_fraction()[source]

Get current memory usage as fraction of total.

Returns:

fraction – Memory usage fraction (0-1)

Return type:

float

Notes

Uses TTL-based caching to reduce psutil system call overhead by 90%.

predict_memory_requirement(n_points, n_params, algorithm='trf', dtype=<class 'jax.numpy.float64'>)[source]

Predict memory requirement for optimization.

Parameters:
  • n_points (int) – Number of data points

  • n_params (int) – Number of parameters

  • algorithm (str) – Algorithm name (‘trf’, ‘lm’, ‘dogbox’)

  • dtype (jnp.dtype, optional) – Data type for computations (default: jnp.float64). Affects memory calculations: float32 uses 4 bytes, float64 uses 8 bytes.

Returns:

bytes_needed – Estimated memory requirement in bytes

Return type:

int

Notes

Memory requirements scale linearly with precision: - float32: 4 bytes per element (50% memory savings) - float64: 8 bytes per element (default, higher precision)

check_memory_availability(bytes_needed)[source]

Check if enough memory is available.

Parameters:

bytes_needed (int) – Memory required in bytes

Returns:

  • available (bool) – Whether enough memory is available

  • message (str) – Descriptive message

Return type:

tuple[bool, str]

memory_guard(bytes_needed)[source]

Context manager to ensure memory availability.

Parameters:

bytes_needed (int) – Required memory in bytes

Raises:

MemoryError – If insufficient memory is available

get_safety_telemetry()[source]

Get safety factor telemetry statistics.

Returns:

telemetry – Safety factor telemetry with: - current_safety_factor: Current safety factor - initial_safety_factor: Initial safety factor (1.2) - min_safety_factor: Target minimum (1.05) - telemetry_entries: Number of telemetry entries collected - p95_safety_needed: 95th percentile of safety factors needed (if data available) - safety_factor_history: List of safety factors over time

Return type:

dict

allocate_array(shape, dtype=<class 'numpy.float64'>, zero=True)[source]

Allocate array with memory pooling and LRU tracking.

Parameters:
  • shape (tuple) – Shape of array to allocate

  • dtype (type) – Data type of array

  • zero (bool) – Whether to zero-initialize the array

Returns:

array – Allocated array

Return type:

np.ndarray

Raises:

MemoryError – If allocation fails

Notes

Task Group 7 (1.2a): Uses LRU tracking via OrderedDict. When an array is reused from the pool, it is moved to the end (most recently used) to enable proper LRU eviction.

free_array(arr)[source]

Return array to pool for reuse.

Parameters:

arr (np.ndarray) – Array to free

Notes

Task Group 7 (1.2a): Uses LRU tracking via OrderedDict. The returned array is added/moved to the end of the pool, marking it as recently used.

clear_pool()[source]

Clear memory pool and run garbage collection.

get_memory_stats()[source]

Get memory usage statistics.

Returns:

stats – Memory statistics including current usage, peak, pool size

Return type:

dict

optimize_memory_pool(max_arrays=100)[source]

Optimize memory pool using LRU eviction.

Parameters:

max_arrays (int) – Maximum number of arrays to keep in pool

Notes

Task Group 7 (1.2a): Uses LRU eviction via popitem(last=False). Arrays are evicted in order of least recent use, keeping the most recently used arrays in the pool.

temporary_allocation(shape, dtype=<class 'numpy.float64'>)[source]

Context manager for temporary array allocation.

Parameters:
  • shape (tuple) – Shape of array

  • dtype (type) – Data type

Yields:

array (np.ndarray) – Temporary array that will be returned to pool on exit

estimate_chunking_strategy(n_points, n_params, algorithm='trf', memory_limit_gb=None)[source]

Estimate optimal chunking strategy for large datasets.

Parameters:
  • n_points (int) – Number of data points

  • n_params (int) – Number of parameters

  • algorithm (str) – Algorithm to use

  • memory_limit_gb (float, optional) – Memory limit in GB (uses available memory if None)

Returns:

strategy – Chunking strategy with chunk_size and n_chunks

Return type:

dict

class nlsq.MemoryPool(max_pool_size=10, enable_stats=False, enable_bucketing=True)[source]

Bases: object

Memory pool for reusable array buffers.

Pre-allocates buffers for common array shapes to avoid repeated allocations during optimization iterations.

pools

Dictionary mapping (shape, dtype) to list of available buffers

Type:

dict

allocated

Dictionary tracking allocated buffers

Type:

dict

max_pool_size

Maximum number of buffers per shape/dtype combination

Type:

int

stats

Statistics on pool usage

Type:

dict

__init__(max_pool_size=10, enable_stats=False, enable_bucketing=True)[source]

Initialize memory pool.

Parameters:
  • max_pool_size (int) – Maximum number of buffers to keep per shape/dtype

  • enable_stats (bool) – Track allocation statistics

  • enable_bucketing (bool) – Enable size-class bucketing for better reuse (Task 5.4)

allocate(shape, dtype=<class 'jax.numpy.float64'>)[source]

Allocate array from pool or create new one.

Parameters:
  • shape (tuple) – Shape of array to allocate

  • dtype (type) – Data type of array

Returns:

array – Allocated array (may be reused from pool)

Return type:

jnp.ndarray

Notes

When bucketing is enabled, arrays are pooled by size classes (1KB/10KB/100KB) for better reuse rates (Task 5.4).

release(arr)[source]

Return array to pool for reuse.

Parameters:

arr (jnp.ndarray) – Array to return to pool

Notes

When bucketing is enabled, arrays are stored in size-class buckets for better reuse (Task 5.4).

clear()[source]

Clear all pools and reset statistics.

get_stats()[source]

Get pool usage statistics.

Returns:

stats – Pool usage statistics including reuse_rate (Task 5.5)

Return type:

dict

Notes

reuse_rate = reused_allocations / total_allocations With bucketing enabled, expect 5x higher reuse rates.

__enter__()[source]

Context manager entry.

__exit__(exc_type, exc_val, exc_tb)[source]

Context manager exit - clear pool.

class nlsq.OptimizationDiagnostics(verbosity=1)[source]

Bases: object

Comprehensive optimization diagnostics and reporting.

Tracks: - Iteration data (parameters, cost, gradients) - Convergence metrics - Memory usage - Timing information - Numerical stability indicators

__init__(verbosity=1)[source]

Initialize diagnostics system.

Parameters:

verbosity (int) – Verbosity level: - 0: Minimal diagnostics (no condition number computation) - 1: Normal (cheap 1-norm condition estimate, O(nm)) - 2: Detailed (full SVD condition number, O(mn²))

start_optimization(x0=None, problem_name='optimization', *, n_params=None, n_data=None, method=None, loss=None)[source]

Initialize diagnostics for new optimization.

Parameters:
  • x0 (np.ndarray, optional) – Initial parameters (legacy API)

  • problem_name (str) – Name for this optimization problem

  • n_params (int, optional) – Number of parameters (new API from LeastSquares)

  • n_data (int, optional) – Number of data points (new API from LeastSquares)

  • method (str, optional) – Optimization method (new API from LeastSquares)

  • loss (str, optional) – Loss function name (new API from LeastSquares)

record_iteration(iteration, x, cost, gradient=None, jacobian=None, step_size=None, method_info=None)[source]

Record data for current iteration.

Parameters:
  • iteration (int) – Iteration number

  • x (np.ndarray) – Current parameters

  • cost (float) – Current cost value

  • gradient (np.ndarray, optional) – Current gradient

  • jacobian (np.ndarray, optional) – Current Jacobian matrix

  • step_size (float, optional) – Step size taken

  • method_info (dict, optional) – Algorithm-specific information

record_event(event_type, data=None)[source]

Record a special event during optimization.

Parameters:
  • event_type (str) – Type of event (e.g., ‘recovery_success’, ‘recovery_failed’)

  • data (dict, optional) – Additional event data

get_summary_statistics()[source]

Get summary statistics for the optimization.

Returns:

stats – Summary statistics

Return type:

dict

generate_report(verbose=True)[source]

Generate human-readable optimization report.

Parameters:

verbose (bool) – Whether to include detailed information

Returns:

report – Formatted report

Return type:

str

plot_convergence(save_path=None)[source]

Plot convergence history.

Parameters:

save_path (str, optional) – Path to save plot

class nlsq.OptimizationRecovery(max_retries=3, enable_diagnostics=True)[source]

Bases: object

Automatic recovery from optimization failures.

This class provides multiple recovery strategies for handling optimization failures including: - Parameter perturbation - Algorithm switching - Regularization adjustment - Problem reformulation - Multi-start optimization

max_retries

Maximum number of recovery attempts

Type:

int

strategies

List of recovery strategies to try

Type:

list

diagnostics

Diagnostics collector for monitoring

Type:

nlsq.diagnostics.OptimizationDiagnostics

stability_guard

Numerical stability checker

Type:

nlsq.stability.NumericalStabilityGuard

__init__(max_retries=3, enable_diagnostics=True)[source]

Initialize recovery system.

Parameters:
  • max_retries (int) – Maximum recovery attempts

  • enable_diagnostics (bool) – Enable diagnostic collection

recover_from_failure(failure_type, optimization_state, optimization_func, **kwargs)[source]

Attempt recovery from optimization failure.

Parameters:
  • failure_type (str) – Type of failure (‘convergence’, ‘numerical’, ‘memory’, etc.)

  • optimization_state (dict) – Current state of optimization

  • optimization_func (callable) – Function to retry optimization

  • **kwargs – Additional arguments for optimization function

Returns:

  • success (bool) – Whether recovery succeeded

  • result (dict) – Recovered optimization result or error info

Return type:

tuple[bool, dict]

class nlsq.OptimizeResult[source]

Bases: dict

Optimization result container for NLSQ curve fitting operations.

This class stores the complete results from nonlinear least squares optimization performed using JAX-accelerated algorithms. It extends dict to provide both dictionary-style and attribute-style access to optimization results.

Core Attributes

xjax.numpy.ndarray or numpy.ndarray

Optimized parameter vector containing the final fitted parameters. These represent the solution to the nonlinear least squares problem.

successbool

Indicates whether the optimization terminated successfully. True means convergence criteria were satisfied within tolerance limits.

statusint

Numerical termination status code indicating why optimization stopped:

  • 1: Gradient convergence (||g||_inf < gtol)

  • 2: Step size convergence (||dx||/||x|| < xtol)

  • 3: Function value convergence (delta_f/f < ftol)

  • 0: Maximum iterations reached

  • -1: Evaluation limit exceeded

  • -3: Inner loop iteration limit (algorithm-specific)

messagestr

Human-readable description of termination cause. Provides detailed information about convergence status or failure reasons.

Objective Function Results

funjax.numpy.ndarray

Final residual vector f(x) at the solution. For curve fitting, these are the differences between model predictions and data points.

costfloat

Final cost function value: 0.5 * ||f(x)||² for standard least squares, or 0.5 * sum(ρ(f_i²/σ²)) for robust loss functions.

jacjax.numpy.ndarray

Final Jacobian matrix J(x) with shape (m, n) where m is number of data points and n is number of parameters. Computed using JAX autodiff.

gradjax.numpy.ndarray

Final gradient vector g = J^T * f with shape (n,). Used for convergence checking and parameter uncertainty estimation.

Convergence Metrics

optimalityfloat

Final gradient norm ||g||_inf used for convergence assessment. Should be less than gtol for successful convergence.

active_masknumpy.ndarray

Boolean mask indicating which parameters hit bounds (for bounded optimization). Shape (n,) with True for parameters at constraints.

Iteration Statistics

nfevint

Total number of objective function evaluations during optimization. Each evaluation computes residuals f(x) for given parameters.

njevint

Total number of Jacobian evaluations. With JAX autodiff, this equals the number of combined function+gradient evaluations.

nitint

Number of optimization iterations completed. Not always available for all algorithms.

Algorithm-Specific Results

pcovjax.numpy.ndarray, optional

Parameter covariance matrix with shape (n, n). Provides parameter uncertainty estimates. Available when uncertainty estimation is requested. Computed as: pcov = inv(J^T * J) * residual_variance

active_masknumpy.ndarray

For bounded optimization, indicates which parameters are at bounds.

all_timesdict, optional

Detailed timing information for algorithm profiling. Contains timing data for different optimization phases (function evaluation, Jacobian computation, linear algebra operations, etc.).

Usage Examples

Basic result access:

import nlsq

# Perform curve fitting
result = nlsq.curve_fit(model_func, x_data, y_data, p0=initial_guess)

# Access optimized parameters
fitted_params = result.x

# Check convergence
if result.success:
    print(f"Optimization converged: {result.message}")
    print(f"Final cost: {result.cost}")
    print(f"Function evaluations: {result.nfev}")
else:
    print(f"Optimization failed: {result.message}")

# Parameter uncertainties (if covariance computed)
if hasattr(result, 'pcov'):
    param_errors = jnp.sqrt(jnp.diag(result.pcov))
    print(f"Parameter uncertainties: {param_errors}")

Advanced result inspection:

# Examine residuals and fit quality
final_residuals = result.fun
rms_error = jnp.sqrt(jnp.mean(final_residuals**2))

# Check gradient convergence
gradient_norm = result.optimality
print(f"Final gradient norm: {gradient_norm}")

# Analyze Jacobian condition
jacobian = result.jac
condition_number = jnp.linalg.cond(jacobian)
print(f"Jacobian condition number: {condition_number}")

# For bounded problems, check active constraints
if hasattr(result, 'active_mask'):
    constrained_params = jnp.where(result.active_mask)[0]
    print(f"Parameters at bounds: {constrained_params}")

Integration with SciPy

This class maintains compatibility with scipy.optimize.OptimizeResult while adding JAX-specific features and NLSQ-specific results. It can be used interchangeably with SciPy optimization results in most contexts.

Technical Notes

  • All JAX arrays are automatically converted to NumPy arrays for compatibility

  • Covariance matrices use double precision for numerical stability

  • Large dataset results may include memory management statistics

  • GPU timing results require explicit timing mode activation

  • Progress monitoring data is stored in algorithm-specific attributes

exception nlsq.OptimizeWarning[source]

Bases: UserWarning

Warning class for optimization-related issues.

This warning is raised when non-critical issues are encountered during optimization, such as unknown solver options, convergence concerns, or numerical stability warnings that don’t prevent the optimization from completing but should be brought to the user’s attention.

Common scenarios:
  • Unknown or deprecated solver options passed to optimizer

  • Convergence achieved but with warnings about numerical conditioning

  • Parameter bounds adjusted automatically

  • Automatic algorithm selection overrides

Example

>>> import warnings
>>> warnings.filterwarnings('error', category=OptimizeWarning)
>>> # Now OptimizeWarning will raise an exception instead of warning

See also

nlsq.error_messages.OptimizationError : Exception for critical failures

class nlsq.PerformanceProfiler[source]

Bases: object

Performance profiler for NLSQ optimization.

Tracks and analyzes performance metrics across optimization runs.

Examples

>>> profiler = PerformanceProfiler()
>>> with profiler.profile("my_optimization"):
...     result = curve_fit(model, x, y, p0=[1, 2])
>>>
>>> report = profiler.get_report()
>>> print(report)
__init__()[source]

Initialize performance profiler.

start_profile(name='default')[source]

Start profiling a new optimization run.

Parameters:

name (str) – Name for this profiling session

Returns:

metrics – Metrics object for this profile

Return type:

ProfileMetrics

end_profile(metrics=None)[source]

End current profiling session.

Parameters:

metrics (ProfileMetrics, optional) – Metrics to finalize. If None, uses current profile.

profile(name='default')[source]

Context manager for profiling.

Parameters:

name (str) – Name for this profiling session

Examples

>>> profiler = PerformanceProfiler()
>>> with profiler.profile("test_1"):
...     result = curve_fit(model, x, y)
record_timing(category, duration)[source]

Record timing for a specific category.

Parameters:
  • category (str) – Timing category (e.g., ‘jit_compile’, ‘optimization’)

  • duration (float) – Duration in seconds

update_current(**kwargs)[source]

Update current profile with arbitrary metrics.

Parameters:

**kwargs – Metrics to update

get_metrics(name='default')[source]

Get all metrics for a named profile.

Parameters:

name (str) – Profile name

Returns:

metrics – All metrics for this profile

Return type:

list of ProfileMetrics

get_summary(name='default')[source]

Get summary statistics for a named profile.

Parameters:

name (str) – Profile name

Returns:

summary – Summary statistics

Return type:

dict

get_report(name='default', detailed=False)[source]

Generate a formatted performance report.

Parameters:
  • name (str) – Profile name

  • detailed (bool) – Include detailed metrics

Returns:

report – Formatted report

Return type:

str

compare_profiles(name1, name2)[source]

Compare two profiling sessions.

Parameters:
  • name1 (str) – Names of profiles to compare

  • name2 (str) – Names of profiles to compare

Returns:

comparison – Comparison metrics

Return type:

dict

clear(name=None)[source]

Clear profiling data.

Parameters:

name (str, optional) – Name of profile to clear. If None, clears all.

export_to_dict()[source]

Export all profiling data to dictionary.

Returns:

data – All profiling data

Return type:

dict

class nlsq.ProfileMetrics(total_time=0.0, jit_compile_time=0.0, optimization_time=0.0, jacobian_time=0.0, n_iterations=0, n_function_evals=0, n_jacobian_evals=0, n_data_points=0, n_parameters=0, data_dimension=1, final_cost=0.0, initial_cost=0.0, cost_reduction=0.0, final_gradient_norm=0.0, success=False, method='', backend='cpu', metadata=<factory>)[source]

Bases: object

Container for performance metrics from a single optimization run.

total_time: float
jit_compile_time: float
optimization_time: float
jacobian_time: float
n_iterations: int
n_function_evals: int
n_jacobian_evals: int
n_data_points: int
n_parameters: int
data_dimension: int
final_cost: float
initial_cost: float
cost_reduction: float
final_gradient_norm: float
success: bool
method: str
backend: str
metadata: dict
speedup_vs_scipy()[source]

Estimate speedup vs SciPy (rough heuristic).

iterations_per_second()[source]

Calculate iterations per second.

function_evals_per_second()[source]

Calculate function evaluations per second.

to_dict()[source]

Convert to dictionary.

__init__(total_time=0.0, jit_compile_time=0.0, optimization_time=0.0, jacobian_time=0.0, n_iterations=0, n_function_evals=0, n_jacobian_evals=0, n_data_points=0, n_parameters=0, data_dimension=1, final_cost=0.0, initial_cost=0.0, cost_reduction=0.0, final_gradient_norm=0.0, success=False, method='', backend='cpu', metadata=<factory>)
class nlsq.ProfilerVisualization(profiler)[source]

Bases: object

Visualization tools for performance profiling data.

Examples

>>> from nlsq.utils.profiler import get_global_profiler
>>> from nlsq.utils.profiler_visualization import ProfilerVisualization
>>> profiler = get_global_profiler()
>>> viz = ProfilerVisualization(profiler)
>>> fig = viz.plot_timing_comparison(["test1", "test2"])
>>> viz.save_html_report("report.html")
__init__(profiler)[source]

Initialize visualization tools.

Parameters:

profiler (PerformanceProfiler) – Profiler instance to visualize

plot_timing_series(name='default', figsize=(10, 6))[source]

Plot timing metrics across runs.

Parameters:
  • name (str) – Profile name to visualize

  • figsize (tuple) – Figure size (width, height)

Returns:

fig – Matplotlib figure or None if matplotlib not available

Return type:

Figure or None

plot_timing_comparison(names, figsize=(10, 6))[source]

Compare timing metrics across multiple profiles.

Parameters:
  • names (list of str) – Profile names to compare

  • figsize (tuple) – Figure size (width, height)

Returns:

fig – Matplotlib figure or None if matplotlib not available

Return type:

Figure or None

plot_timing_distribution(name='default', figsize=(10, 6))[source]

Plot distribution of timing metrics.

Parameters:
  • name (str) – Profile name to visualize

  • figsize (tuple) – Figure size (width, height)

Returns:

fig – Matplotlib figure or None if matplotlib not available

Return type:

Figure or None

generate_html_report(output_path=None)[source]

Generate HTML report with all profiling data.

Parameters:

output_path (str or Path, optional) – Path to save HTML file. If None, returns HTML string.

Returns:

html – HTML report content

Return type:

str

export_json(output_path)[source]

Export profiling data to JSON.

Parameters:

output_path (str or Path) – Path to save JSON file

export_csv(name, output_path)[source]

Export profile data to CSV.

Parameters:
  • name (str) – Profile name to export

  • output_path (str or Path) – Path to save CSV file

class nlsq.ProfilingDashboard(profiler)[source]

Bases: object

Interactive dashboard for profiling data.

Examples

>>> from nlsq.utils.profiler import get_global_profiler
>>> from nlsq.utils.profiler_visualization import ProfilingDashboard
>>> profiler = get_global_profiler()
>>> dashboard = ProfilingDashboard(profiler)
>>> dashboard.add_profile("test1")
>>> dashboard.add_profile("test2")
>>> dashboard.generate_comparison_report()
__init__(profiler)[source]

Initialize dashboard.

Parameters:

profiler (PerformanceProfiler) – Profiler instance to visualize

add_profile(name)[source]

Add a profile to track in the dashboard.

Parameters:

name (str) – Profile name to track

remove_profile(name)[source]

Remove a profile from tracking.

Parameters:

name (str) – Profile name to remove

generate_comparison_report()[source]

Generate comparison report for tracked profiles.

Returns:

report – Formatted comparison report

Return type:

str

plot_all_comparisons(figsize=(15, 10))[source]

Generate comprehensive comparison plots.

Parameters:

figsize (tuple) – Figure size (width, height)

Returns:

fig – Matplotlib figure or None if matplotlib not available

Return type:

Figure or None

save_dashboard(output_dir)[source]

Save complete dashboard to directory.

Parameters:

output_dir (str or Path) – Directory to save dashboard files

class nlsq.ProgressBar(max_nfev=None, desc='Optimizing', **tqdm_kwargs)[source]

Bases: CallbackBase

Progress bar callback using tqdm.

Displays a progress bar showing optimization progress with current cost, gradient norm, and iteration statistics.

Parameters:
  • max_nfev (int, optional) – Maximum number of function evaluations. If provided, progress bar will be based on nfev. Otherwise, shows indefinite progress.

  • desc (str, optional) – Description to display in progress bar. Default: “Optimizing”

  • **tqdm_kwargs – Additional keyword arguments passed to tqdm

Examples

>>> from nlsq import curve_fit
>>> from nlsq.callbacks import ProgressBar
>>> callback = ProgressBar(max_nfev=100)
>>> popt, pcov = curve_fit(f, x, y, callback=callback)
__init__(max_nfev=None, desc='Optimizing', **tqdm_kwargs)[source]
__call__(iteration, cost, params, info)[source]

Update progress bar.

close()[source]

Close progress bar.

__del__()[source]

Ensure progress bar is closed on deletion.

class nlsq.RobustDecomposition(enable_logging=False)[source]

Bases: object

Multi-level fallback for matrix decompositions.

This class provides robust matrix decomposition methods that automatically fall back through multiple strategies if the primary method fails. The fallback chain goes from: 1. JAX on GPU (if available) 2. JAX on CPU 3. SciPy (if available) 4. NumPy 5. Safe mode with regularization

fallback_chain

Ordered list of (name, method) tuples for fallback strategies

Type:

list

logger

Logger for debugging decomposition issues

Type:

logging.Logger

__init__(enable_logging=False)[source]

Initialize robust decomposition handler.

Parameters:

enable_logging (bool) – Whether to enable detailed logging of fallback attempts

svd(matrix, full_matrices=False)[source]

Compute SVD with automatic fallback.

Parameters:
  • matrix (jnp.ndarray) – Matrix to decompose

  • full_matrices (bool) – Whether to compute full matrices

Returns:

  • U (jnp.ndarray) – Left singular vectors

  • s (jnp.ndarray) – Singular values

  • Vt (jnp.ndarray) – Right singular vectors (transposed)

Raises:

RuntimeError – If all decomposition methods fail

Return type:

tuple[Array, Array, Array]

qr(matrix, mode='reduced')[source]

Compute QR decomposition with fallback.

Parameters:
  • matrix (jnp.ndarray) – Matrix to decompose

  • mode (str) – QR mode (‘reduced’ or ‘complete’)

Returns:

  • Q (jnp.ndarray) – Orthogonal matrix

  • R (jnp.ndarray) – Upper triangular matrix

Raises:

RuntimeError – If all decomposition methods fail

Return type:

tuple[Array, Array]

cholesky(matrix, lower=True)[source]

Compute Cholesky decomposition with fallback and regularization.

Parameters:
  • matrix (jnp.ndarray) – Positive definite matrix to decompose

  • lower (bool) – Whether to return lower triangular matrix

Returns:

L – Cholesky factor (lower or upper triangular)

Return type:

jnp.ndarray

Raises:

RuntimeError – If all decomposition methods fail

solve_least_squares(A, b)[source]

Solve least squares problem with robust decomposition.

Parameters:
  • A (jnp.ndarray) – Coefficient matrix

  • b (jnp.ndarray) – Right-hand side

Returns:

x – Solution vector

Return type:

jnp.ndarray

class nlsq.SmartCache(cache_dir='.nlsq_cache', max_memory_items=1000, disk_cache_enabled=True, enable_stats=True)[source]

Bases: object

Intelligent caching system for optimization computations.

This class provides:

  • Memory and disk caching with LRU eviction

  • Automatic cache key generation from function arguments

  • Cache persistence across sessions

  • Cache invalidation and warming strategies

Phase 3 Optimizations (3.2a):

  • Array hash optimization: uses stride-based sampling only for arrays with >10000 elements when xxhash is unavailable

  • For smaller arrays, hashes full array directly without redundant sampling, providing 15-20% improvement in cache key generation

All dict operations are protected by a per-instance threading.Lock so that concurrent threads can safely call get/set/invalidate.

cache_dir

Directory for disk cache storage

Type:

str

memory_cache

In-memory cache storage

Type:

dict

disk_cache_enabled

Whether disk caching is enabled

Type:

bool

max_memory_items

Maximum items in memory cache

Type:

int

cache_stats

Cache hit/miss statistics

Type:

dict

__init__(cache_dir='.nlsq_cache', max_memory_items=1000, disk_cache_enabled=True, enable_stats=True)[source]

Initialize smart cache.

Parameters:
  • cache_dir (str) – Directory for disk cache

  • max_memory_items (int) – Maximum items in memory cache

  • disk_cache_enabled (bool) – Enable disk caching

  • enable_stats (bool) – Track cache statistics

cache_key(*args, **kwargs)[source]

Generate cache key from arguments.

Parameters:
  • *args (tuple) – Positional arguments

  • **kwargs (dict) – Keyword arguments

Returns:

key – Hash of arguments (xxhash if available, BLAKE2b fallback)

Return type:

str

Notes

Uses xxhash (xxh64) when available for ~10x faster hashing compared to SHA256/BLAKE2b. Falls back to BLAKE2b if xxhash is not installed. All cache keys are prefixed with CACHE_VERSION to ensure old cache entries are invalidated when the hash algorithm changes.

Task 9.3 (3.2a): Array hash optimization - Arrays <= 10000 elements: hash full array directly (no sampling) - Arrays > 10000 elements: use stride-based sampling for efficiency - Removes redundant sampling when computing full hash in fallback path

get(key)[source]

Get value from cache.

Parameters:

key (str) – Cache key

Returns:

value – Cached value or None if not found

Return type:

Any or None

set(key, value)[source]

Set value in cache.

Parameters:
  • key (str) – Cache key

  • value (Any) – Value to cache

invalidate(key=None)[source]

Invalidate cache entries.

Parameters:

key (str, optional) – Specific key to invalidate, or None to clear all

get_stats()[source]

Get cache statistics.

Returns:

stats – Cache statistics including hit rate

Return type:

dict

optimize_cache()[source]

Optimize cache by removing rarely accessed items.

Computes threshold from snapshot, then re-checks live counts under lock before invalidating to avoid evicting keys that became hot between the snapshot and eviction.

class nlsq.SparseJacobianComputer(sparsity_threshold=0.01)[source]

Bases: object

Compute and manage sparse Jacobians for large-scale problems.

For many curve fitting problems, the Jacobian has a sparse structure where each data point only depends on a subset of parameters. This class exploits that structure to reduce memory usage by 10-100x.

__init__(sparsity_threshold=0.01)[source]

Initialize sparse Jacobian computer.

Parameters:

sparsity_threshold (float) – Elements with absolute value below this threshold are considered zero. Default is 0.01 which works well for most problems.

detect_sparsity_pattern(func, x0, xdata_sample, n_samples=100)[source]

Detect sparsity pattern of Jacobian from sample evaluations.

Parameters:
  • func (Callable) – Function to evaluate

  • x0 (np.ndarray) – Initial parameter values

  • xdata_sample (np.ndarray or list) – Sample of x data points. Can be a single array for 1D problems, a list of arrays [X, Y] for 2D problems, or a 2D array with shape (k, N) for multi-dimensional coordinates.

  • n_samples (int) – Number of samples to use for pattern detection

Returns:

  • pattern (np.ndarray) – Boolean array indicating non-zero elements

  • sparsity (float) – Fraction of zero elements

Return type:

tuple[ndarray, float]

compute_sparse_jacobian(jac_func, x, xdata, ydata, data_mask=None, chunk_size=10000, func=None)[source]

Compute Jacobian in sparse format with chunking.

Parameters:
  • jac_func (Callable) – Jacobian function

  • x (np.ndarray) – Current parameter values

  • xdata (np.ndarray) – Independent variable data

  • ydata (np.ndarray) – Dependent variable data

  • data_mask (np.ndarray, optional) – Mask for valid data points

  • chunk_size (int) – Size of chunks for computation

  • func (Callable, optional) – Original function for finite difference fallback

Returns:

J_sparse – Sparse Jacobian matrix

Return type:

csr_matrix

sparse_matrix_vector_product(J_sparse, v)[source]

Efficient sparse matrix-vector product.

Parameters:
  • J_sparse (csr_matrix) – Sparse Jacobian matrix

  • v (np.ndarray) – Vector to multiply

Returns:

result – J @ v

Return type:

np.ndarray

sparse_normal_equations(J_sparse, f)[source]

Set up normal equations with sparse Jacobian.

Solves (J^T @ J) @ p = -J^T @ f without forming J^T @ J explicitly.

Parameters:
  • J_sparse (csr_matrix) – Sparse Jacobian matrix

  • f (np.ndarray) – Residual vector

Returns:

  • matvec (callable) – Function that computes (J^T @ J) @ v

  • rhs (np.ndarray) – Right-hand side -J^T @ f

Return type:

tuple[Callable, ndarray]

estimate_memory_usage(n_data, n_params, sparsity=0.99)[source]

Estimate memory usage for sparse vs dense Jacobian.

Parameters:
  • n_data (int) – Number of data points

  • n_params (int) – Number of parameters

  • sparsity (float) – Fraction of zero elements (0-1)

Returns:

memory_info – Memory usage estimates in GB

Return type:

dict

class nlsq.SparseOptimizer(sparsity_threshold=0.01, min_sparsity=0.9, auto_detect=True)[source]

Bases: object

Optimizer that uses sparse Jacobians for large-scale problems.

This optimizer automatically detects when sparse Jacobians would be beneficial and switches to sparse computations transparently.

__init__(sparsity_threshold=0.01, min_sparsity=0.9, auto_detect=True)[source]

Initialize sparse optimizer.

Parameters:
  • sparsity_threshold (float) – Threshold for considering elements as zero

  • min_sparsity (float) – Minimum sparsity level to use sparse methods

  • auto_detect (bool) – Automatically detect and use sparsity

should_use_sparse(n_data, n_params, force_check=False)[source]

Determine if sparse methods should be used.

Parameters:
  • n_data (int) – Number of data points

  • n_params (int) – Number of parameters

  • force_check (bool) – Force sparsity detection even if auto_detect is False

Returns:

use_sparse – Whether to use sparse methods

Return type:

bool

optimize_with_sparsity(func, x0, xdata, ydata, **kwargs)[source]

Optimize using sparse Jacobian methods.

Parameters:
  • func (Callable) – Objective function

  • x0 (np.ndarray) – Initial parameters

  • xdata (np.ndarray) – Independent variable data

  • ydata (np.ndarray) – Dependent variable data

  • **kwargs (Any) – Additional optimization parameters

Returns:

result – Optimization result

Return type:

dict

exception nlsq.StopOptimization[source]

Bases: Exception

Exception raised by callbacks to request early termination.

class nlsq.TRFMemoryPool(m, n, dtype=<class 'jax.numpy.float64'>)[source]

Bases: object

Specialized memory pool for Trust Region Reflective algorithm.

Pre-allocates buffers for common TRF operations.

Parameters:
  • m (int) – Number of residuals

  • n (int) – Number of parameters

  • dtype (type) – Data type for arrays

__init__(m, n, dtype=<class 'jax.numpy.float64'>)[source]

Initialize TRF memory pool.

Parameters:
  • m (int) – Number of residuals

  • n (int) – Number of parameters

  • dtype (type) – Data type

get_jacobian_buffer()[source]

Get Jacobian buffer (m×n).

get_residual_buffer()[source]

Get residual buffer (m).

get_gradient_buffer()[source]

Get gradient buffer (n).

get_step_buffer()[source]

Get step buffer (n).

get_x_buffer()[source]

Get parameter buffer (n).

reset()[source]

Reset all buffers to zero.

nlsq.apply_automatic_fixes(xdata, ydata, p0=None, stability_report=None, rescale_data=True)[source]

Automatically apply fixes for detected stability issues.

This function applies common fixes such as rescaling data and parameters based on the stability report.

Parameters:
  • xdata (array_like) – Independent variable data

  • ydata (array_like) – Dependent variable data

  • p0 (array_like, optional) – Initial parameter guess

  • stability_report (dict, optional) – Report from check_problem_stability(). If None, will be computed.

  • rescale_data (bool, optional) – If True (default), rescale xdata/ydata to [0, 1] when ill-conditioned or large range is detected. Set to False for applications requiring unit preservation where data must maintain physical units (e.g., time in seconds, frequency in Hz). NaN/Inf handling and parameter normalization are still applied when stability=’auto’. Default: True.

Returns:

  • xdata_fixed (ndarray) – Fixed xdata

  • ydata_fixed (ndarray) – Fixed ydata

  • p0_fixed (ndarray or None) – Fixed p0

  • fix_info (dict) – Information about applied fixes with keys: - ‘applied_fixes’: list of str - ‘x_scale’: float - ‘y_scale’: float - ‘x_offset’: float - ‘y_offset’: float

Return type:

tuple[ndarray, ndarray, ndarray | None, dict]

Examples

>>> x = np.linspace(0, 1e6, 100)
>>> y = 2.0 * x + 1.0
>>> x_fixed, y_fixed, p0_fixed, info = apply_automatic_fixes(x, y, [2.0, 1.0])
>>> print(f"Applied fixes: {info['applied_fixes']}")
>>> # For applications requiring unit preservation, disable rescaling
>>> x_fixed, y_fixed, p0_fixed, info = apply_automatic_fixes(
...     x, y, [2.0, 1.0], rescale_data=False
... )
nlsq.auto_select_algorithm(f, xdata, ydata, p0=None, bounds=None, memory_limit_gb=None, user_preferences=None)[source]

Automatically select best optimization algorithm.

Parameters:
  • f (callable) – Model function

  • xdata (np.ndarray) – Independent variable

  • ydata (np.ndarray) – Dependent variable

  • p0 (np.ndarray, optional) – Initial guess

  • bounds (tuple, optional) – Parameter bounds

  • memory_limit_gb (float, optional) – Memory limit

  • user_preferences (dict, optional) – User preferences

Returns:

recommendations – Algorithm recommendations

Return type:

dict

nlsq.cached_function(cache=None, ttl=None)[source]

Decorator for caching function results.

Parameters:
  • cache (SmartCache, optional) – Cache instance to use (creates new if None)

  • ttl (float, optional) – Time-to-live in seconds for cached values

Returns:

decorator – Decorator function

Return type:

function

nlsq.cached_jacobian(cache=None)[source]

Decorator specifically for caching Jacobian evaluations.

Parameters:

cache (SmartCache, optional) – Cache instance to use

Returns:

decorator – Decorator function

Return type:

function

nlsq.cached_jit(func=None, static_argnums=(), donate_argnums=())[source]

Decorator for caching JIT-compiled functions.

Parameters:
  • func (callable, optional) – Function to decorate

  • static_argnums (tuple of int) – Indices of static arguments

  • donate_argnums (tuple of int) – Indices of arguments to donate

Returns:

decorated – Decorated function with cached compilation

Return type:

callable

Examples

>>> @cached_jit
... def my_function(x):
...     return x ** 2
>>> @cached_jit(static_argnums=(1,))
... def my_function_with_static(x, n):
...     return x ** n
nlsq.check_problem_stability(xdata, ydata, p0=None, f=None)[source]

Comprehensive pre-flight stability check for optimization problem.

Identifies potential numerical issues before optimization begins, providing warnings and recommendations for fixes.

Parameters:
  • xdata (array_like) – Independent variable data

  • ydata (array_like) – Dependent variable data

  • p0 (array_like, optional) – Initial parameter guess

  • f (callable, optional) – Model function (currently unused, reserved for future checks)

Returns:

report – Stability report with keys: - ‘issues’: list of (issue_type, message, severity) tuples - ‘condition_number’: float - ‘parameter_scale_ratio’: float or None - ‘has_collinearity’: bool - ‘recommendations’: list of str - ‘severity’: str (‘ok’, ‘warning’, ‘critical’)

Return type:

dict

Examples

>>> x = np.linspace(0, 1e6, 100)
>>> y = 2.0 * x + 1.0
>>> p0 = [2.0, 1.0]
>>> report = check_problem_stability(x, y, p0)
>>> print(f"Severity: {report['severity']}")
>>> for issue_type, message, severity in report['issues']:
...     print(f"{severity}: {message}")
nlsq.clear_all_caches()[source]

Clear all global caches.

nlsq.clear_compilation_cache()[source]

Clear the global compilation cache, function hash cache, and reset stats.

This function is useful in interactive environments (Jupyter notebooks, IPython, REPL) where model functions may be redefined during development. When a function is redefined with the same name but different code, the compilation cache may return stale compiled versions. Calling this function clears the cache and allows fresh compilation of redefined functions.

The function also resets all statistics counters (hits, misses, compilations) to zero, enabling accurate cache performance tracking after the clear.

Examples

In a Jupyter notebook, after redefining a model function:

>>> import jax.numpy as jnp
>>> from nlsq.caching.compilation_cache import clear_compilation_cache
>>> from nlsq import curve_fit
>>>
>>> # Initial model definition
>>> def model(x, a, b):
...     return a * jnp.exp(-b * x)
>>>
>>> # ... some fitting work ...
>>>
>>> # Redefine the model with different implementation
>>> def model(x, a, b):
...     return a * jnp.exp(-b * x) + 0.1  # Added offset
>>>
>>> # Clear cache to ensure new model is compiled
>>> clear_compilation_cache()
>>>
>>> # Now curve_fit will use the updated model
>>> popt, pcov = curve_fit(model, xdata, ydata, p0=[1.0, 0.1])
nlsq.clear_global_pool()[source]

Clear the global memory pool.

Notes

For test isolation, this resets the global pool to None, forcing fresh initialization on next access.

nlsq.clear_memory_pool()[source]

Clear the global memory pool.

nlsq.clear_profiling_data()[source]

Clear all global profiling data.

nlsq.configure_for_large_datasets(memory_limit_gb=8.0, enable_chunking=True, progress_reporting=True)[source]

Configure NLSQ for optimal large dataset performance.

This function sets up memory management, chunking, streaming, and other settings for handling large datasets efficiently.

Parameters:
  • memory_limit_gb (float, optional) – Maximum memory to use in GB (default: 8.0)

  • enable_chunking (bool, optional) – Enable automatic data chunking (default: True)

  • progress_reporting (bool, optional) – Enable progress reporting for long operations (default: True)

Notes

All large datasets use streaming optimization for 100% data utilization.

Examples

>>> from nlsq.config import configure_for_large_datasets
>>> # Configure for large datasets with 16GB memory limit
>>> configure_for_large_datasets(
...     memory_limit_gb=16.0,
...     progress_reporting=True
... )
nlsq.detect_collinearity(xdata, threshold=0.95)[source]

Detect collinearity in multidimensional input data.

Collinearity occurs when predictors are highly correlated, leading to unstable parameter estimates.

Parameters:
  • xdata (array_like) – Independent variable data (multidimensional)

  • threshold (float, optional) – Correlation threshold for detecting collinearity. Default: 0.95

Returns:

  • has_collinearity (bool) – True if any pair of variables is highly correlated

  • collinear_pairs (list of tuple) – List of (i, j, correlation) for collinear variable pairs

Return type:

tuple[bool, list]

Examples

>>> x1 = np.linspace(0, 10, 100)
>>> x2 = 2 * x1 + 0.01 * np.random.randn(100)  # Nearly collinear
>>> xdata = np.column_stack([x1, x2])
>>> has_coll, pairs = detect_collinearity(xdata)
>>> print(f"Collinear: {has_coll}")
Collinear: True
nlsq.detect_jacobian_sparsity(func, x0, xdata_sample, threshold=0.01)[source]

Detect and analyze Jacobian sparsity for a given problem.

Parameters:
  • func (Callable) – Objective function

  • x0 (np.ndarray) – Initial parameters

  • xdata_sample (np.ndarray or list) – Sample of x data. Can be a single array for 1D problems, a list of arrays [X, Y] for 2D problems, or a 2D array with shape (k, N) for multi-dimensional coordinates.

  • threshold (float) – Threshold for zero elements

Returns:

  • sparsity (float) – Fraction of zero elements

  • info (dict) – Additional sparsity information

Return type:

tuple[float, dict]

nlsq.detect_parameter_scale_mismatch(p0, threshold=1000000.0)[source]

Detect if parameter scales differ by too many orders of magnitude.

Large scale differences can cause numerical issues and slow convergence.

Parameters:
  • p0 (array_like) – Initial parameter guess

  • threshold (float, optional) – Ratio threshold for detecting mismatch. Default: 1e6

Returns:

  • has_mismatch (bool) – True if parameter scales differ by more than threshold

  • scale_ratio (float) – Ratio of largest to smallest parameter magnitude

Return type:

tuple[bool, float]

Examples

>>> p0 = np.array([1e-3, 1e3, 1.0])
>>> has_mismatch, ratio = detect_parameter_scale_mismatch(p0)
>>> print(f"Mismatch: {has_mismatch}, Ratio: {ratio:.2e}")
Mismatch: True, Ratio: 1.00e+06
nlsq.estimate_condition_number(xdata)[source]

Estimate the condition number of the data matrix.

This checks if the independent variable data is well-conditioned for least squares fitting. High condition numbers indicate numerical instability.

Parameters:

xdata (array_like) – Independent variable data (can be 1D or 2D)

Returns:

condition_number – Estimated condition number. Values > 1e10 indicate potential problems.

Return type:

float

Notes

For 1D data, constructs a Vandermonde-like matrix with [1, x, x^2]. For multidimensional data, computes the condition number directly.

nlsq.estimate_memory_requirements(n_points, n_params)[source]

Estimate memory requirements for a dataset.

Parameters:
  • n_points (int) – Number of data points

  • n_params (int) – Number of parameters

Returns:

Memory requirements and processing recommendations

Return type:

DatasetStats

Examples

>>> from nlsq.streaming.large_dataset import estimate_memory_requirements
>>>
>>> # Estimate requirements for 50M points, 3 parameters
>>> stats = estimate_memory_requirements(50_000_000, 3)
>>> print(f"Estimated memory: {stats.total_memory_estimate_gb:.2f} GB")
>>> print(f"Recommended chunk size: {stats.recommended_chunk_size:,}")
>>> print(f"Number of chunks: {stats.n_chunks}")
nlsq.fit_large_dataset(f, xdata, ydata, p0=None, memory_limit_gb=8.0, show_progress=False, logger=None, multistart=False, n_starts=10, sampler='lhs', **kwargs)[source]

Convenience function for fitting large datasets.

Parameters:
  • f (callable) – The model function f(x, *params) -> y

  • xdata (np.ndarray) – Independent variable data

  • ydata (np.ndarray) – Dependent variable data

  • p0 (array-like, optional) – Initial parameter guess

  • memory_limit_gb (float, optional) – Memory limit in GB (default: 8.0)

  • show_progress (bool, optional) – Whether to show progress (default: False)

  • logger (logging.Logger, optional) – External logger for application integration (default: None)

  • multistart (bool, optional) – Enable multi-start optimization for global search (default: False). When enabled, explores multiple starting points on full data before running the full chunked optimization.

  • n_starts (int, optional) – Number of starting points for multi-start optimization (default: 10). Set to 0 to disable multi-start even when multistart=True.

  • sampler (str, optional) – Sampling strategy for generating starting points (default: ‘lhs’). Options: ‘lhs’ (Latin Hypercube), ‘sobol’, ‘halton’.

  • **kwargs – Additional arguments passed to curve_fit

Returns:

Optimization result

Return type:

OptimizeResult

Examples

>>> from nlsq.streaming.large_dataset import fit_large_dataset
>>> import numpy as np
>>> import jax.numpy as jnp
>>>
>>> # Generate large dataset
>>> x_large = np.linspace(0, 10, 5_000_000)
>>> y_large = 2.5 * np.exp(-1.3 * x_large) + np.random.normal(0, 0.1, len(x_large))
>>>
>>> # Fit with automatic memory management
>>> result = fit_large_dataset(
...     lambda x, a, b: a * jnp.exp(-b * x),
...     x_large, y_large,
...     p0=[2.0, 1.0],
...     memory_limit_gb=4.0,
...     show_progress=True
... )
>>> print(f"Fitted parameters: {result.popt}")
>>> print(f"Success rate: {result.success_rate:.1%}")
>>>
>>> # Fit with multi-start optimization
>>> result = fit_large_dataset(
...     lambda x, a, b: a * jnp.exp(-b * x),
...     x_large, y_large,
...     p0=[2.0, 1.0],
...     bounds=([0, 0], [10, 5]),
...     multistart=True,
...     n_starts=10,
...     sampler='lhs'
... )
>>>
>>> # Check failure diagnostics if some chunks failed
>>> if result.failure_summary['total_failures'] > 0:
...     print(f"Failed chunks: {result.failure_summary['failed_chunk_indices']}")
...     print(f"Common errors: {result.failure_summary['common_errors']}")
nlsq.get_global_cache()[source]

Get global cache instance.

Returns:

cache – Global cache instance

Return type:

SmartCache

nlsq.get_global_compilation_cache()[source]

Get or create global compilation cache (thread-safe).

Uses double-checked locking to avoid acquiring the lock on every call once the singleton has been initialized.

Returns:

cache – Global compilation cache instance

Return type:

CompilationCache

nlsq.get_global_pool(enable_stats=False)[source]

Get or create global memory pool.

Parameters:

enable_stats (bool) – Enable statistics tracking

Returns:

pool – Global memory pool instance

Return type:

MemoryPool

nlsq.get_global_profiler()[source]

Get the global profiler instance.

Returns:

profiler – Global profiler

Return type:

PerformanceProfiler

nlsq.get_jit_cache()[source]

Get JIT compilation cache.

Returns:

cache – JIT compilation cache

Return type:

JITCompilationCache

nlsq.get_large_dataset_config()[source]

Get the current large dataset configuration.

Returns:

Current large dataset configuration

Return type:

LargeDatasetConfig

nlsq.get_memory_config()[source]

Get the current memory configuration.

Returns:

Current memory configuration

Return type:

MemoryConfig

nlsq.get_memory_manager()[source]

Get or create global memory manager instance.

Returns:

manager – Global memory manager instance

Return type:

MemoryManager

nlsq.get_memory_stats()[source]

Get memory usage statistics.

Returns:

stats – Memory statistics

Return type:

dict

nlsq.infer_bounds(xdata, ydata, p0, safety_factor=10.0, enforce_positivity=True)[source]

Infer reasonable parameter bounds from data and initial guess.

This is a convenience function that creates a BoundsInference instance and returns the inferred bounds.

Parameters:
  • xdata (array_like) – Independent variable data

  • ydata (array_like) – Dependent variable data

  • p0 (array_like) – Initial parameter guess

  • safety_factor (float, optional) – Multiplier for bound ranges (larger = more conservative). Default: 10.0

  • enforce_positivity (bool, optional) – Force all bounds to be non-negative if p0 is positive. Default: True

Returns:

  • lower_bounds (ndarray) – Lower bounds for each parameter

  • upper_bounds (ndarray) – Upper bounds for each parameter

Return type:

tuple[ndarray, ndarray]

Examples

Basic usage:

>>> import numpy as np
>>> x = np.linspace(0, 10, 100)
>>> y = 2.5 * np.exp(-0.5 * x) + 1.0
>>> p0 = [2.0, 0.5, 1.0]
>>> bounds = infer_bounds(x, y, p0)
>>> print(f"Lower: {bounds[0]}")
>>> print(f"Upper: {bounds[1]}")

With custom safety factor:

>>> bounds = infer_bounds(x, y, p0, safety_factor=20.0)  # More conservative

Allow negative parameters:

>>> bounds = infer_bounds(x, y, p0, enforce_positivity=False)
nlsq.large_dataset_context(large_dataset_config)[source]

Context manager for temporarily changing large dataset configuration.

Parameters:

large_dataset_config (LargeDatasetConfig) – Temporary large dataset configuration

Examples

>>> from nlsq.config import large_dataset_context, LargeDatasetConfig
>>> temp_config = LargeDatasetConfig(enable_automatic_solver_selection=True)
>>> with large_dataset_context(temp_config):
...     # Code here uses automatic solver selection
...     result = fit_large_dataset(func, x, y)
nlsq.memory_context(memory_config)[source]

Context manager for temporarily changing memory configuration.

Parameters:

memory_config (MemoryConfig) – Temporary memory configuration

Examples

>>> from nlsq.config import memory_context, MemoryConfig
>>> temp_config = MemoryConfig(memory_limit_gb=16.0)
>>> with memory_context(temp_config):
...     # Code here runs with increased memory limit
...     result = fit_large_dataset(func, x, y)
>>> # Back to previous memory settings
nlsq.merge_bounds(inferred_bounds, user_bounds=None)[source]

Merge user-provided bounds with inferred bounds.

User-provided bounds take precedence. If user provides partial bounds (e.g., only lower or only for some parameters), the remaining bounds are filled from inferred bounds.

Parameters:
  • inferred_bounds (tuple of ndarray) – Inferred (lower, upper) bounds

  • user_bounds (tuple of array_like or None, optional) – User-provided (lower, upper) bounds. Can contain -np.inf or np.inf for unbounded parameters.

Returns:

  • lower_bounds (ndarray) – Merged lower bounds

  • upper_bounds (ndarray) – Merged upper bounds

Return type:

tuple[ndarray, ndarray]

Examples

>>> inferred = (np.array([0, 0, 0]), np.array([10, 5, 10]))
>>> user = (np.array([1, -np.inf, 0]), np.array([5, np.inf, 10]))
>>> merged = merge_bounds(inferred, user)
>>> print(merged)
(array([1., 0., 0.]), array([5., 5., 10.]))

Notes

  • If user provides -np.inf for lower bound, use inferred lower bound

  • If user provides np.inf for upper bound, use inferred upper bound

  • Scalar user bounds are broadcast to all parameters

nlsq.set_memory_limits(memory_limit_gb, gpu_memory_fraction=None, safety_factor=0.8)[source]

Set memory limits for NLSQ operations.

Parameters:
  • memory_limit_gb (float) – Maximum memory to use in GB

  • gpu_memory_fraction (float, optional) – Fraction of GPU memory to use (0.0-1.0)

  • safety_factor (float, optional) – Safety factor for memory calculations (default: 0.8)

Examples

>>> from nlsq.config import set_memory_limits
>>> # Set 16GB memory limit with 80% GPU memory usage
>>> set_memory_limits(16.0, gpu_memory_fraction=0.8)