nlsq package¶
Package Overview¶
NLSQ (Nonlinear Least Squares) is a JAX-powered library that provides GPU/TPU-accelerated curve fitting
with automatic differentiation. It offers a drop-in replacement for SciPy’s curve_fit function with
significant performance improvements on modern hardware (150-270x speedup on GPUs).
Key Features:
GPU/TPU Acceleration: JAX JIT compilation to XLA for massive speedups
Automatic Differentiation: No manual Jacobian calculations required
Large Dataset Support: Automatic chunking and memory management for 100M+ points
Production-Ready: 3389 tests, 100% pass rate
Drop-in Compatibility: Minimal code changes from
scipy.optimize.curve_fit
Quick Start Example¶
Basic exponential fit:
import numpy as np
from nlsq import curve_fit
import jax.numpy as jnp
# Define model function
def exponential(x, a, b):
return a * jnp.exp(-b * x)
# Generate data
x = np.linspace(0, 5, 1000)
y = 2.5 * np.exp(-1.3 * x) + 0.01 * np.random.randn(1000)
# Fit (GPU-accelerated!)
popt, pcov = curve_fit(exponential, x, y, p0=[2, 1])
print(f"Fitted parameters: a={popt[0]:.2f}, b={popt[1]:.2f}")
Large dataset with automatic chunking:
from nlsq import curve_fit_large
# 50 million points - automatically chunked
x_large = np.linspace(0, 10, 50_000_000)
y_large = 2.0 * np.exp(-0.5 * x_large) + 0.3
popt, pcov = curve_fit_large(
exponential,
x_large,
y_large,
p0=[2.5, 0.6],
memory_limit_gb=4.0,
show_progress=True
)
See Also¶
Your First Curve Fit - Getting started tutorial
Large Dataset Tutorial - Large dataset handling guide
Performance Optimization Guide - Performance optimization
Migration Guide - Migration Guide
Submodules¶
See NLSQ API Reference for complete documentation of all submodules.
Primary Entry Points¶
- nlsq.fit(f, xdata, ydata, p0=None, sigma=None, absolute_sigma=False, check_finite=True, bounds=(-inf, inf), method=None, workflow=None, preset=None, multistart=None, n_starts=None, sampler='lhs', center_on_p0=True, scale_factor=1.0, memory_limit_gb=None, size_threshold=1000000, show_progress=False, chunk_size=None, **kwargs)[source]¶
Unified curve fitting function with preset-based configuration.
This function provides a simplified API for curve fitting with sensible defaults based on preset configurations. It automatically selects the appropriate backend (curve_fit, curve_fit_large, or streaming) based on the preset and dataset characteristics.
- Parameters:
f (callable) – Model function f(x, *params) -> y. Must use jax.numpy operations.
xdata (array_like) – Independent variable data.
ydata (array_like) – Dependent variable data.
p0 (array_like, optional) – Initial parameter guess.
sigma (array_like, optional) – Uncertainties in ydata for weighted fitting.
absolute_sigma (bool, optional) – Whether sigma represents absolute uncertainties.
check_finite (bool, optional) – Check for finite input values.
bounds (tuple, optional) – Parameter bounds as (lower, upper).
method (str, optional) – Optimization algorithm (‘trf’, ‘lm’, or None for auto).
preset ({'fast', 'robust', 'global', 'streaming', 'large'}, optional) –
Preset configuration to use:
’fast’: Single-start optimization for maximum speed (n_starts=0)
’robust’: Multi-start with 5 starts for robustness
’global’: Thorough global search with 20 starts
’streaming’: Streaming optimization for large datasets with multi-start
’large’: Auto-detect dataset size and use appropriate strategy
If None, defaults to ‘fast’ for small datasets or ‘large’ for datasets exceeding size_threshold.
multistart (bool, optional) – Override preset’s multi-start setting.
n_starts (int, optional) – Override preset’s n_starts setting.
sampler ({'lhs', 'sobol', 'halton'}, optional) – Sampling strategy for multi-start. Default: ‘lhs’.
center_on_p0 (bool, optional) – Center multi-start samples around p0. Default: True.
scale_factor (float, optional) – Scale factor for exploration region. Default: 1.0.
memory_limit_gb (float, optional) – Maximum memory usage in GB for large datasets.
size_threshold (int, optional) – Point threshold for large dataset processing (default: 1M).
show_progress (bool, optional) – Display progress bar for long operations.
chunk_size (int, optional) – Override automatic chunk size calculation.
**kwargs (Any) – Additional optimization parameters (ftol, xtol, gtol, max_nfev, loss).
- Returns:
result – Optimization result. Contains popt, pcov, and multistart_diagnostics. Supports tuple unpacking: popt, pcov = fit(…)
- Return type:
CurveFitResult or tuple
Examples
Basic usage with default preset:
>>> popt, pcov = fit(model_func, xdata, ydata, p0=[1, 2, 3])
Using ‘robust’ preset for multi-start:
>>> result = fit(model_func, xdata, ydata, p0=[1, 2, 3], ... bounds=([0, 0, 0], [10, 10, 10]), preset='robust')
Using ‘global’ preset for thorough search:
>>> result = fit(model_func, xdata, ydata, p0=[1, 2, 3], ... bounds=([0, 0, 0], [10, 10, 10]), preset='global')
Large dataset with auto-detection:
>>> result = fit(model_func, big_xdata, big_ydata, ... preset='large', show_progress=True)
See also
curve_fitLower-level API with full control
curve_fit_largeSpecialized API for large datasets
MemoryBudgetSelectorMemory-based optimizer selection
- nlsq.curve_fit(f, xdata, ydata, *args, auto_bounds=False, bounds_safety_factor=10.0, stability=False, rescale_data=True, max_jacobian_elements_for_svd=10000000, fallback=False, max_fallback_attempts=10, fallback_verbose=False, multistart=False, n_starts=10, global_search=False, sampler='lhs', center_on_p0=True, scale_factor=1.0, method=None, cmaes_config=None, compute_diagnostics=False, diagnostics_level=DiagnosticLevel.BASIC, diagnostics_config=None, **kwargs)[source]¶
Use nonlinear least squares to fit a function to data with GPU/TPU acceleration.
This is the main user-facing function that provides a drop-in replacement for scipy.optimize.curve_fit with GPU/TPU acceleration via JAX. The function automatically handles JAX JIT compilation, double precision configuration, and optimization algorithm selection.
- Parameters:
f (callable) – The model function f(x, *popt) -> y. Must be JAX-compatible, meaning it should use jax.numpy instead of numpy for mathematical operations to enable GPU acceleration and automatic differentiation.
xdata (array_like) – The independent variable where the data is measured.
ydata (array_like) – The dependent data, nominally
f(xdata, *popt).auto_bounds (bool, optional) –
Enable automatic parameter bounds inference from data characteristics. When True, reasonable bounds are inferred based on:
Data ranges (x and y)
Initial parameter guess (p0)
Parameter positivity constraints
Safety factors to avoid over-constraining
The inferred bounds are merged with any user-provided bounds via the
boundsparameter. User bounds take precedence where specified. Default: False.bounds_safety_factor (float, optional) – Safety multiplier for automatic bounds (larger = more conservative). Only used when auto_bounds=True. Default: 10.0.
stability ({'auto', 'check', False}, optional) –
Control numerical stability checks and automatic fixes:
’auto’: Check for stability issues and automatically apply fixes (optionally rescale data, normalize parameters, handle NaN/Inf)
’check’: Check for stability issues and warn, but don’t apply fixes
False: Skip stability checks entirely (default)
When ‘auto’, detected issues are fixed before optimization:
Ill-conditioned data (condition number > 1e10) is rescaled to [0, 1] (only if rescale_data=True)
Large data ranges (> 1e4) are normalized (only if rescale_data=True)
NaN/Inf values are replaced with mean
Parameter scale mismatches (ratio > 1e6) are normalized
Default: False.
rescale_data (bool, optional) – When stability=’auto’, controls whether data is automatically rescaled to [0, 1] for ill-conditioned or large-range data. Set to False for applications where data must maintain physical units (e.g., time in seconds, frequency in Hz). NaN/Inf handling and parameter normalization are still applied when stability=’auto’. Default: True.
max_jacobian_elements_for_svd (int, optional) –
Maximum number of elements in the Jacobian matrix (m x n) for which SVD will be computed during stability checks. For larger Jacobians, SVD is skipped to avoid O(min(m,n)^2 x max(m,n)) computation overhead. NaN/Inf checking is still performed for large Jacobians.
Examples of element counts: - 1M data points x 7 params = 7M elements - 100K data points x 100 params = 10M elements
Set to a larger value if you need condition number checks for large problems, or a smaller value to skip SVD more aggressively. Default: 10,000,000 (10M elements).
fallback (bool, optional) –
Enable automatic fallback strategies for difficult optimization problems. When True, the optimizer will automatically try alternative approaches if the initial optimization fails, including:
Alternative optimization methods
Perturbed initial guesses
Relaxed tolerances
Inferred parameter bounds
Robust loss functions
Problem rescaling
Default: False. Enabling this improves success rate on difficult problems but adds overhead when optimizations fail.
max_fallback_attempts (int, optional) – Maximum number of fallback attempts to try before giving up. Only used when fallback=True. Default: 10.
fallback_verbose (bool, optional) – Print detailed information about fallback attempts. Only used when fallback=True. Default: False.
multistart (bool, optional) – Enable multi-start optimization for global search. When True, generates multiple starting points using Latin Hypercube Sampling (or other samplers) and evaluates each, selecting the best result. This helps find global optima in problems with multiple local minima. Default: False.
n_starts (int, optional) – Number of starting points for multi-start optimization. Only used when multistart=True or global_search=True. Default: 10.
global_search (bool, optional) – Shorthand for enabling multi-start with n_starts=20. Equivalent to multistart=True, n_starts=20. Useful for thorough global search. Default: False.
sampler ({'lhs', 'sobol', 'halton'}, optional) –
Sampling strategy for generating starting points in multi-start:
’lhs’: Latin Hypercube Sampling (stratified random, default)
’sobol’: Sobol quasi-random sequence (deterministic, low-discrepancy)
’halton’: Halton quasi-random sequence (deterministic, prime bases)
Only used when multistart=True or global_search=True. Default: ‘lhs’.
center_on_p0 (bool, optional) – When True, center multi-start samples around the initial guess p0 rather than uniformly across the full parameter bounds. This provides more focused exploration around a data-informed starting region. Only used when multistart=True or global_search=True. Default: True.
scale_factor (float, optional) – Scale factor for the exploration region when center_on_p0=True. Multiplier for the exploration range around p0. Smaller values (0.5) mean tighter exploration, larger values (2.0) mean wider exploration. Only used when multistart=True or global_search=True. Default: 1.0.
method ({'auto', 'cmaes', 'multi-start', 'trf'} | None, optional) –
Optimization method to use:
’auto’: Automatically select based on parameter scale ratio. Uses CMA-ES for multi-scale problems (>1000x scale difference) when evosax is installed, otherwise multi-start.
’cmaes’: Use CMA-ES (Covariance Matrix Adaptation Evolution Strategy) for gradient-free global optimization. Requires bounds. Best for multi-scale parameters spanning many orders of magnitude. Falls back to multi-start if evosax is not installed.
’multi-start’: Use multi-start optimization with Latin Hypercube Sampling for initial points.
’trf’: Use Trust Region Reflective (default behavior).
None: Default behavior (TRF with optional multi-start if enabled).
Default: None.
cmaes_config (CMAESConfig | None, optional) – Configuration for CMA-ES optimization. If None, uses default config or preset specified by method parameter. See CMAESConfig for options including max_generations, restart_strategy, and popsize. Only used when method=’cmaes’ or method=’auto’ selects CMA-ES. Default: None.
*args (Any) – Additional arguments passed to CurveFit.curve_fit method.
**kwargs (Any) – Additional arguments passed to CurveFit.curve_fit method.
- Returns:
popt (ndarray) – Optimal values for the parameters.
pcov (ndarray) – The estimated covariance of popt.
When fallback=True, the returned object also contains
- fallback_strategy_used (str or None) – Name of the fallback strategy that succeeded, or None if original succeeded
- fallback_attempts (int) – Number of optimization attempts before success
When multistart=True or global_search=True, the returned object contains
- multistart_diagnostics (dict) – Dictionary with multi-start exploration details including n_starts, best_loss, all_losses, exploration time, etc.
- Return type:
tuple[np.ndarray, np.ndarray] | CurveFitResult
Notes
This function creates a CurveFit instance internally and calls its curve_fit method. For multiple fits with the same function signature, consider creating a CurveFit instance directly to benefit from JAX compilation caching.
When fallback=True, the optimizer tries increasingly aggressive recovery strategies if the initial optimization fails. This is particularly useful for:
Poor initial parameter guesses
Ill-conditioned problems
Problems with outliers
Numerically challenging models
When multistart=True or global_search=True, the optimizer explores multiple starting points to find the global optimum. This is particularly useful for:
Problems with multiple local minima
Complex multi-modal objective functions
Cases where the initial guess may not be close to the global optimum
See also
CurveFit.curve_fitThe underlying method with full parameter documentation
fitUnified entry point with automatic workflow selection
curve_fit_largeFor datasets with millions of points requiring special handling
FallbackOrchestratorDirect access to fallback system for custom configurations
MultiStartOrchestratorDirect access to multi-start system
Examples
Basic usage without fallback:
>>> import jax.numpy as jnp >>> import numpy as np >>> >>> def exponential(x, a, b): ... return a * jnp.exp(-b * x) >>> >>> x = np.linspace(0, 4, 50) >>> y = 2.5 * np.exp(-1.3 * x) + 0.1 * np.random.normal(size=len(x)) >>> popt, _pcov = curve_fit(exponential, x, y, p0=[2, 1])
Using multi-start for global search:
>>> # Enable multi-start with default n_starts=10 >>> result = curve_fit(exponential, x, y, p0=[2, 1], ... bounds=([0, 0], [10, 5]), multistart=True) >>> >>> # Use global_search shorthand for thorough exploration (n_starts=20) >>> result = curve_fit(exponential, x, y, p0=[2, 1], ... bounds=([0, 0], [10, 5]), global_search=True) >>> >>> # Customize multi-start with different sampler >>> result = curve_fit(exponential, x, y, p0=[2, 1], ... bounds=([0, 0], [10, 5]), ... multistart=True, n_starts=15, sampler='sobol')
Using fallback for difficult problems:
>>> # Very poor initial guess - may fail without fallback >>> result = curve_fit(exponential, x, y, p0=[100, 50], fallback=True) >>> >>> # Check which strategy was used >>> if result.fallback_strategy_used: ... print(f"Recovered using: {result.fallback_strategy_used}")
Combined multi-start + fallback for maximum robustness:
>>> result = curve_fit( ... exponential, x, y, p0=[2, 1], ... bounds=([0, 0], [10, 5]), ... global_search=True, ... fallback=True ... )
- nlsq.curve_fit_large(f, xdata, ydata, p0=None, sigma=None, absolute_sigma=False, check_finite=True, bounds=(-inf, inf), method=None, stability=False, rescale_data=True, max_jacobian_elements_for_svd=10000000, memory_limit_gb=None, auto_size_detection=True, size_threshold=1000000, show_progress=False, chunk_size=None, multistart=False, n_starts=10, global_search=False, sampler='lhs', center_on_p0=True, scale_factor=1.0, **kwargs)[source]¶
Curve fitting with automatic memory management for large datasets.
Automatically selects processing strategy based on dataset size: - Small (< 1M points): Standard curve_fit - Medium (1M - 100M points): Chunked processing - Large (> 100M points): Streaming optimization
- Parameters:
f (callable) – Model function f(x, *params) -> y. Must use jax.numpy operations.
xdata (array_like) – Independent variable data.
ydata (array_like) – Dependent variable data.
p0 (array_like, optional) – Initial parameter guess.
sigma (array_like, optional) – Uncertainties in ydata for weighted fitting.
absolute_sigma (bool, optional) – Whether sigma represents absolute uncertainties.
check_finite (bool, optional) – Check for finite input values.
bounds (tuple, optional) – Parameter bounds as (lower, upper).
method (str, optional) – Optimization algorithm (‘trf’, ‘lm’, or None for auto).
memory_limit_gb (float, optional) – Maximum memory usage in GB.
auto_size_detection (bool, optional) – Auto-detect dataset size for processing strategy.
size_threshold (int, optional) – Point threshold for large dataset processing (default: 1M).
show_progress (bool, optional) – Display progress bar for long operations.
chunk_size (int, optional) – Override automatic chunk size calculation.
multistart (bool, optional) – Enable multi-start optimization for global search. Default: False.
n_starts (int, optional) – Number of starting points for multi-start optimization. Default: 10.
global_search (bool, optional) – Shorthand for multistart=True, n_starts=20. Default: False.
sampler ({'lhs', 'sobol', 'halton'}, optional) – Sampling strategy for multi-start. Default: ‘lhs’.
center_on_p0 (bool, optional) – Center multi-start samples around p0. Default: True.
scale_factor (float, optional) – Scale factor for exploration region. Default: 1.0.
**kwargs (Any) – Additional optimization parameters (ftol, xtol, gtol, max_nfev, loss)
- Returns:
popt (ndarray) – Fitted parameters.
pcov (ndarray) – Parameter covariance matrix.
- Return type:
Notes
All large datasets use streaming optimization for 100% data utilization.
Important: Model Function Requirements for Chunking
When auto_size_detection triggers chunked processing (>1M points), your model function MUST respect the size of xdata. Model output shape must match ydata shape.
INCORRECT - Fixed-size output (causes shape errors):
>>> def bad_model(xdata, a, b): ... # WRONG: Returns fixed-size array regardless of xdata ... t_full = jnp.arange(10_000_000) ... return a * jnp.exp(-b * t_full) # Shape mismatch!
CORRECT - Output matches xdata size:
>>> def good_model(xdata, a, b): ... # CORRECT: Uses xdata as indices ... indices = xdata.astype(jnp.int32) ... return a * jnp.exp(-b * indices)
>>> def direct_model(xdata, a, b): ... # CORRECT: Operates on xdata directly ... return a * jnp.exp(-b * xdata)
Examples
Basic usage:
>>> popt, _pcov = curve_fit_large(model_func, xdata, ydata, p0=[1, 2, 3])
Large dataset with progress bar:
>>> popt, _pcov = curve_fit_large(model_func, big_xdata, big_ydata, ... show_progress=True, memory_limit_gb=8)
With multi-start optimization:
>>> popt, _pcov = curve_fit_large(model_func, xdata, ydata, ... p0=[1, 2, 3], bounds=([0, 0, 0], [10, 10, 10]), ... multistart=True, n_starts=10)
Using external logger for diagnostics:
>>> import logging >>> my_logger = logging.getLogger("myapp") >>> fitter = LargeDatasetFitter(memory_limit_gb=8, logger=my_logger) >>> result = fitter.fit(model_func, xdata, ydata, p0=[1, 2]) >>> # Chunk failures now appear in myapp's logs
Other Module Members¶
NLSQ: JAX-accelerated nonlinear least squares curve fitting.
GPU/TPU-accelerated curve fitting with automatic differentiation. Provides drop-in SciPy compatibility with curve_fit function. Supports large datasets through automatic chunking and streaming optimization.
Key Features¶
Drop-in replacement for scipy.optimize.curve_fit
GPU/TPU acceleration via JAX
Automatic memory management for datasets up to 100M+ points
Streaming optimization for unlimited data
Smart algorithm selection and numerical stability
Unified fit() entry point with automatic workflow selection
Examples
>>> import jax.numpy as jnp
>>> from nlsq import curve_fit, fit
>>> def model(x, a, b): return a * jnp.exp(-b * x)
>>> popt, pcov = curve_fit(model, xdata, ydata)
>>> # Or use unified fit() with automatic workflow selection:
>>> popt, pcov = fit(model, xdata, ydata, workflow="auto", goal="quality")
- class nlsq.AdaptiveHybridStreamingOptimizer(config=None)[source]¶
Bases:
objectAdaptive hybrid streaming optimizer with four-phase optimization.
This optimizer combines parameter normalization, L-BFGS warmup, streaming Gauss-Newton, and exact covariance computation to provide:
Fast convergence for parameters with different scales
Accurate uncertainty estimates on large datasets
Memory-efficient streaming for unlimited dataset sizes
Production-ready fault tolerance
The optimization proceeds through four phases:
Phase 0: Setup parameter normalization and bounds transformation
Phase 1: L-BFGS warmup with adaptive switching to Phase 2
Phase 2: Streaming Gauss-Newton with exact J^T J accumulation
Phase 3: Denormalize parameters and transform covariance matrix
- Parameters:
config (HybridStreamingConfig, optional) – Configuration for all phases of optimization. If None, uses default configuration. See HybridStreamingConfig for details.
- config¶
Configuration object controlling all phases
- Type:
- normalizer¶
Parameter normalizer instance (created in Phase 0)
- Type:
ParameterNormalizer or None
- normalized_model¶
Wrapped model function operating in normalized space
- Type:
NormalizedModelWrapper or None
Examples
Basic usage with default configuration:
>>> from nlsq import AdaptiveHybridStreamingOptimizer, HybridStreamingConfig >>> import jax.numpy as jnp >>> config = HybridStreamingConfig() >>> optimizer = AdaptiveHybridStreamingOptimizer(config)
With bounds-based normalization:
>>> config = HybridStreamingConfig( ... normalize=True, ... normalization_strategy='bounds' ... ) >>> optimizer = AdaptiveHybridStreamingOptimizer(config)
With custom warmup settings:
>>> config = HybridStreamingConfig( ... warmup_iterations=300, ... lbfgs_initial_step_size=0.5, ... gauss_newton_tol=1e-10 ... ) >>> optimizer = AdaptiveHybridStreamingOptimizer(config)
See also
HybridStreamingConfigConfiguration for all phases
ParameterNormalizerParameter normalization implementation
curve_fitHigh-level interface with method=’hybrid_streaming’
Notes
Based on Adaptive Hybrid Streaming Optimizer specification:
agent-os/specs/2025-12-18-adaptive-hybrid-streaming-optimizer/spec.md- __init__(config=None)[source]¶
Initialize adaptive hybrid streaming optimizer.
- Parameters:
config (HybridStreamingConfig, optional) – Configuration for all phases. If None, uses default configuration.
- set_residual_weights(weights)[source]¶
Set residual weights for weighted least squares optimization.
This method allows updating weights during optimization, for example when weights need to be recomputed based on current parameter estimates.
- Parameters:
weights (np.ndarray) – Per-group weights of shape (n_groups,). Higher weights give more importance to residuals in that group. The group index for each data point is determined by the first column of x_data.
Notes
- Weights must be positive. The weighted MSE is computed as:
wMSE = sum(w[group_idx] * residuals^2) / sum(w[group_idx])
- clear_cache()[source]¶
Release cached padded arrays to free memory.
Call this after optimization completes or when reusing the optimizer with different data. The cache is automatically invalidated when data identity changes, but explicit clearing frees memory sooner.
- property phase_status: dict[str, Any]¶
Get current phase status and history.
- Returns:
status – Phase status dictionary with keys: - ‘current_phase’: Current phase number - ‘phase_name’: Name of current phase - ‘phase_history’: List of completed phases with timing - ‘total_phases’: Total number of phases (4)
- Return type:
Examples
>>> config = HybridStreamingConfig() >>> optimizer = AdaptiveHybridStreamingOptimizer(config) >>> status = optimizer.phase_status >>> print(status['current_phase']) 0 >>> print(status['phase_name']) Phase 0: Normalization Setup
- class nlsq.AlgorithmSelector[source]¶
Bases:
objectAutomatically select best algorithm based on problem characteristics.
This class analyzes the optimization problem and recommends the best algorithm, loss function, and parameters based on: - Problem size and dimensionality - Data characteristics (outliers, noise, conditioning) - Memory constraints - Convergence requirements
- analyze_problem(f, xdata, ydata, p0=None, bounds=None, memory_limit_gb=None)[source]¶
Analyze problem characteristics.
- Parameters:
- Returns:
analysis – Problem characteristics and statistics
- Return type:
- class nlsq.BoundsInference(xdata, ydata, p0, safety_factor=10.0, enforce_positivity=True)[source]¶
Bases:
objectInfer reasonable parameter bounds from data characteristics.
This class implements heuristics to estimate parameter bounds that help constrain optimization without being overly restrictive. The bounds are based on data ranges, parameter scales, and common patterns.
- Parameters:
xdata (array_like) – Independent variable data
ydata (array_like) – Dependent variable data
p0 (array_like) – Initial parameter guess
safety_factor (float, optional) – Multiplier for bound ranges (larger = more conservative). Default: 10.0
enforce_positivity (bool, optional) – Force all bounds to be non-negative if p0 is positive. Default: True
- x_min, x_max
Range of independent variable
- Type:
- y_min, y_max
Range of dependent variable
- Type:
- x_range, y_range
Span of data
- Type:
Examples
>>> x = np.linspace(0, 10, 100) >>> y = 2.5 * np.exp(-0.5 * x) + 1.0 >>> p0 = [2.0, 0.5, 1.0] >>> >>> inference = BoundsInference(x, y, p0) >>> bounds = inference.infer() >>> print(bounds) ([0.0, 0.0, 0.0], [25.0, 5.0, 10.0])
- class nlsq.CallbackBase[source]¶
Bases:
objectBase class for optimization callbacks.
Subclass this to create custom callbacks. Override the __call__ method to define what happens at each iteration.
Examples
>>> class CustomCallback(CallbackBase): ... def __call__(self, iteration, cost, params, info): ... print(f"Iter {iteration}: cost={cost:.6f}")
- class nlsq.CallbackChain(*callbacks)[source]¶
Bases:
CallbackBaseChain multiple callbacks together.
Calls each callback in sequence. If any callback raises StopOptimization, propagates it to stop the optimization.
- Parameters:
*callbacks (CallbackBase) – Callbacks to chain together
Examples
>>> from nlsq.callbacks import CallbackChain, ProgressBar, EarlyStopping >>> callback = CallbackChain( ... ProgressBar(max_nfev=100), ... EarlyStopping(patience=5) ... ) >>> popt, pcov = curve_fit(f, x, y, callback=callback)
- class nlsq.CompilationCache(enable_stats=True, max_cache_size=512)[source]¶
Bases:
objectCache for JIT-compiled functions with LRU eviction.
Caches compiled versions of functions based on their signature to avoid repeated JIT compilation overhead.
Phase 3 Optimizations (2.2a, 2.1a):
Uses OrderedDict for LRU tracking with move_to_end() on hits
Evicts oldest entry with popitem(last=False) when at capacity
Uses composite key (id(func), id(func.__code__)) to prevent cache poisoning when functions are redefined with same name
Thread Safety¶
All public methods are guarded by a per-instance
threading.Lock. The lock is held only for dict operations and released duringjax.jit()compilation to avoid blocking concurrent threads.- cache¶
OrderedDict mapping function signatures to compiled functions (enables LRU eviction)
- Type:
OrderedDict
- _func_hash_cache¶
Memoization cache for function code hashes using composite key (id(func), id(func.__code__)) for correctness in notebooks
- Type:
- compile(func, static_argnums=(), donate_argnums=())[source]¶
Compile function with JIT and cache result.
- get_or_compile(func, *args, static_argnums=(), **kwargs)[source]¶
Get cached compiled function or compile if not cached.
- Parameters:
- Returns:
compiled_func (callable) – Compiled function
signature (str) – Function signature
- Return type:
- clear()[source]¶
Clear compilation cache, function hash memoization, and reset stats.
This method clears all cached data and resets statistics counters to zero, allowing accurate hit/miss tracking after the clear operation.
- class nlsq.ConvergenceMonitor(window_size=10, sensitivity=1.0)[source]¶
Bases:
objectMonitor convergence patterns and detect problems.
Detects: - Oscillation in parameter or cost values - Stagnation (no progress) - Divergence (increasing cost) - Slow convergence - Numerical instability
- update(cost, params, gradient=None, step_size=None)[source]¶
Update monitor with new iteration data.
- add_iteration(cost, params, gradient=None, step_size=None)[source]¶
Alias for update() method for backward compatibility.
- class nlsq.CurveFit(flength=None, use_dynamic_sizing=False, enable_stability=False, enable_recovery=False, enable_overflow_check=False, cache_config=None, max_jacobian_elements_for_svd=10000000)[source]¶
Bases:
objectMain class for nonlinear least squares curve fitting with JAX acceleration.
This class provides the core curve fitting functionality with JAX JIT compilation, automatic differentiation for Jacobian computation, and multiple optimization algorithms. It handles data preprocessing, optimization algorithm selection, and covariance matrix computation.
The class maintains compiled versions of fitting functions to avoid recompilation overhead when fitting multiple datasets with the same function signature.
- logger¶
Internal logger for debugging and performance monitoring.
- Type:
Logger
- curve_fit : Main fitting method
- create_sigma_transform_funcs : Internal method for sigma transformation setup
- __init__(flength=None, use_dynamic_sizing=False, enable_stability=False, enable_recovery=False, enable_overflow_check=False, cache_config=None, max_jacobian_elements_for_svd=10000000)[source]¶
Initialize CurveFit instance.
- Parameters:
flength (float, optional) – Fixed data length for JAX compilation. Input data is padded to this length to avoid recompilation when fitting datasets of different sizes. If None, no padding is applied and each dataset size triggers recompilation. Ignored when use_dynamic_sizing=True for large datasets.
use_dynamic_sizing (bool, default False) – Enable dynamic sizing to reduce memory usage. When True, padding is only applied when data size is smaller than flength. For large datasets, uses actual size to prevent excessive memory allocation. Default False maintains backward compatibility with fixed padding behavior.
enable_stability (bool, default False) – Enable numerical stability checks and fixes (validation, algorithm selection). Note: This does NOT include overflow checking which adds overhead.
enable_recovery (bool, default False) – Enable automatic recovery from optimization failures.
enable_overflow_check (bool, default False) – Enable overflow/underflow checking in function evaluations. This adds ~30% overhead so it’s separate from other stability features.
cache_config (dict, optional) – Configuration for the unified JIT compilation cache. Supported keys:
'maxsize'(int, default 128) — maximum number of compiled functions to cache;'enable_stats'(bool, default True) — track cache statistics (hits, misses, compile_time_ms);'disk_cache_enabled'(bool, default False) — enable disk caching tier (Phase 2 feature). If None, uses global cache with default settings.max_jacobian_elements_for_svd (int, default 10_000_000) – Maximum Jacobian size (m x n elements) for SVD computation during stability checks. SVD is skipped for larger Jacobians to avoid O(min(m,n)^2 x max(m,n)) overhead. Only applies when enable_stability=True.
Notes
Fixed length compilation trades memory usage for compilation speed: - flength=None: Minimal memory, recompiles for each dataset size - flength=large_value: Higher memory, avoids recompilation - use_dynamic_sizing=True: Balanced approach for mixed dataset sizes
- update_flength(flength)[source]¶
Set the fixed input data length.
- Parameters:
flength (float) – The fixed input data length.
- create_sigma_transform_funcs()[source]¶
Create JIT-compiled sigma transform functions.
This function creates two JIT-compiled functions: sigma_transform1d and sigma_transform2d, which are used to compute the sigma transform for 1D and 2D data, respectively. The functions are stored as attributes of the object on which the method is called.
- pad_fit_data(xdata, ydata, xdims, len_diff)[source]¶
Pad fit data to match the fixed input data length.
This function pads the input data arrays with small values to match the fixed input data length to avoid JAX retracing the JITted functions. The padding is added along the second dimension of the xdata array if it’s multidimensional data otherwise along the first dimension. The small values are chosen to be EPS, a global constant defined as a very small positive value which avoids numerical issues.
- Parameters:
- Returns:
The padded xdata and ydata arrays.
- Return type:
Tuple[np.ndarray, np.ndarray]
- class nlsq.EarlyStopping(patience=10, min_delta=1e-06, verbose=True)[source]¶
Bases:
CallbackBaseStop optimization early if no improvement for patience iterations.
- Parameters:
Examples
>>> from nlsq.callbacks import EarlyStopping >>> callback = EarlyStopping(patience=5, min_delta=1e-4) >>> popt, pcov = curve_fit(f, x, y, callback=callback)
Notes
Raises StopOptimization exception when patience is exceeded, which will be caught by the optimizer and treated as successful convergence.
- class nlsq.FallbackOrchestrator(strategies=None, max_attempts=10, verbose=False)[source]¶
Bases:
objectOrchestrates automatic fallback strategies for robust optimization.
The orchestrator tries multiple recovery strategies when optimization fails, including alternative methods, perturbed initial guesses, relaxed tolerances, and robust loss functions.
- Parameters:
strategies (list of FallbackStrategy, optional) – List of strategies to try. If None, uses default strategies.
max_attempts (int, optional) – Maximum total attempts across all strategies. Default: 10
verbose (bool, optional) – Print progress messages. Default: False
- strategies¶
Active fallback strategies, sorted by priority
- Type:
Examples
>>> from nlsq.stability.fallback import FallbackOrchestrator >>> import numpy as np >>> >>> def model(x, a, b): ... return a * np.exp(-b * x) >>> >>> x = np.linspace(0, 10, 100) >>> y = 2.5 * np.exp(-0.5 * x) + 0.1 * np.random.randn(100) >>> >>> orchestrator = FallbackOrchestrator(verbose=True) >>> result = orchestrator.fit_with_fallback( ... model, x, y, p0=[1, 0.1] # Deliberately poor p0 ... ) >>> >>> if result.fallback_strategy_used: ... print(f"Recovered using: {result.fallback_strategy_used}")
- DEFAULT_STRATEGIES: ClassVar[list[StrategyFactory]] = [<class 'nlsq.stability.fallback.AlternativeMethodStrategy'>, <class 'nlsq.stability.fallback.PerturbInitialGuessStrategy'>, <class 'nlsq.stability.fallback.AdjustTolerancesStrategy'>, <class 'nlsq.stability.fallback.AddParameterBoundsStrategy'>, <class 'nlsq.stability.fallback.UseRobustLossStrategy'>, <class 'nlsq.stability.fallback.RescaleProblemStrategy'>]¶
- __init__(strategies=None, max_attempts=10, verbose=False)[source]¶
Initialize fallback orchestrator.
- fit_with_fallback(f, xdata, ydata, **kwargs)[source]¶
Attempt curve fit with automatic fallback on failure.
- Parameters:
f (callable) – Model function
xdata (array_like) – Independent variable data
ydata (array_like) – Dependent variable data
**kwargs – Additional arguments passed to curve_fit
- Returns:
result – Optimization result with fallback metadata
- Return type:
- Raises:
RuntimeError – If all fallback strategies fail
- class nlsq.FallbackResult(result, strategy_used=None, attempts=1)[source]¶
Bases:
objectEnhanced optimization result with fallback information.
- __init__(result, strategy_used=None, attempts=1)[source]¶
Initialize fallback result.
- Parameters:
result (OptimizeResult) – Underlying optimization result
strategy_used (str, optional) – Name of fallback strategy that succeeded
attempts (int, optional) – Number of attempts before success
- class nlsq.FallbackStrategy(name, description, priority=0)[source]¶
Bases:
objectBase class for fallback strategies.
- class nlsq.HybridStreamingConfig(normalize=True, normalization_strategy='auto', warmup_iterations=200, max_warmup_iterations=500, warmup_learning_rate=0.001, loss_plateau_threshold=0.0001, gradient_norm_threshold=0.001, active_switching_criteria=None, lbfgs_history_size=10, lbfgs_initial_step_size=0.1, lbfgs_line_search='wolfe', lbfgs_exploration_step_size=0.1, lbfgs_refinement_step_size=1.0, use_learning_rate_schedule=False, lr_schedule_warmup_steps=50, lr_schedule_decay_steps=450, lr_schedule_end_value=0.0001, gradient_clip_value=None, enable_warm_start_detection=True, warm_start_threshold=0.01, enable_adaptive_warmup_lr=True, warmup_lr_refinement=1e-06, warmup_lr_careful=1e-05, enable_cost_guard=True, cost_increase_tolerance=0.05, enable_step_clipping=True, max_warmup_step_size=0.1, gauss_newton_max_iterations=100, gauss_newton_tol=1e-08, trust_region_initial=1.0, regularization_factor=1e-10, cg_max_iterations=100, cg_relative_tolerance=0.0001, cg_absolute_tolerance=1e-10, cg_param_threshold=2000, enable_group_variance_regularization=False, group_variance_lambda=0.01, group_variance_indices=None, enable_residual_weighting=False, residual_weights=None, chunk_size=10000, gc_chunk_interval=10, loop_strategy='auto', enable_checkpoints=True, checkpoint_frequency=100, checkpoint_dir=None, resume_from_checkpoint=None, validate_numerics=True, enable_fault_tolerance=True, max_retries_per_batch=2, min_success_rate=0.5, enable_multi_device=False, callback_frequency=10, verbose=1, log_frequency=1, enable_multistart=False, n_starts=10, multistart_sampler='lhs', elimination_rounds=3, elimination_fraction=0.5, batches_per_round=50, center_on_p0=True, scale_factor=1.0)[source]¶
Bases:
objectConfiguration for adaptive hybrid streaming optimizer.
This configuration class controls all aspects of the four-phase hybrid optimizer: - Phase 0: Parameter normalization setup - Phase 1: L-BFGS warmup with adaptive switching - Phase 2: Streaming Gauss-Newton with exact J^T J accumulation - Phase 3: Denormalization and covariance transform
- Parameters:
normalize (bool, default=True) – Enable parameter normalization. When True, parameters are normalized to similar scales to improve gradient signal quality and convergence speed.
normalization_strategy (str, default='auto') –
Strategy for parameter normalization. Options:
’auto’: Use bounds-based if bounds provided, else p0-based
’bounds’: Normalize to [0, 1] using parameter bounds
’p0’: Scale by initial parameter magnitudes
’none’: Identity transform (no normalization)
warmup_iterations (int, default=200) – Number of L-BFGS warmup iterations before checking switch criteria. With L-BFGS, typical values are 20-50 (5-10x fewer than Adam). More iterations allow better initial convergence before switching to Gauss-Newton.
max_warmup_iterations (int, default=500) – Maximum L-BFGS warmup iterations before forced switch to Phase 2. Safety limit to prevent indefinite warmup when loss plateaus slowly.
warmup_learning_rate (float, default=0.001) – Legacy warmup step size retained for backward compatibility. L-BFGS warmup uses
lbfgs_initial_step_sizeand adaptive step sizes.loss_plateau_threshold (float, default=1e-4) – Relative loss improvement threshold for plateau detection. Switch to Phase 2 if: abs(loss - prev_loss) / (abs(prev_loss) + eps) < threshold. Smaller values = stricter plateau detection = later switching.
gradient_norm_threshold (float, default=1e-3) – Gradient norm threshold for early Phase 2 switch. Switch to Phase 2 if: ||gradient|| < threshold. Indicates optimization is close to optimum and Gauss-Newton will be effective.
active_switching_criteria (list, default=['plateau', 'gradient', 'max_iter']) –
List of active switching criteria for Phase 1 -> Phase 2 transition. Available criteria:
’plateau’: Loss plateau detection (loss_plateau_threshold)
’gradient’: Gradient norm below threshold (gradient_norm_threshold)
’max_iter’: Maximum iterations reached (max_warmup_iterations)
Switch occurs when ANY active criterion is met.
lbfgs_history_size (int, default=10) – Number of previous gradients and updates to store for L-BFGS Hessian approximation. Standard default from SciPy, PyTorch, and Nocedal & Wright. Larger values give better Hessian approximation but use more memory.
lbfgs_initial_step_size (float, default=0.1) – Initial step size for L-BFGS during cold start (first m iterations while history buffer fills). Small value prevents overshooting when Hessian approximation is poor (identity matrix initially).
lbfgs_line_search (str, default='wolfe') –
Line search method for L-BFGS step acceptance. Options:
’wolfe’: Standard Wolfe conditions (default)
’strong_wolfe’: Strong Wolfe conditions (stricter)
’backtracking’: Simple backtracking line search
lbfgs_exploration_step_size (float, default=0.1) – L-BFGS initial step size for exploration mode (high relative loss). Small value prevents first “Hessian=Identity” step from overshooting.
lbfgs_refinement_step_size (float, default=1.0) – L-BFGS initial step size for refinement mode (low relative loss). Larger value leverages L-BFGS’s near-Newton convergence speed when close to optimum.
gauss_newton_max_iterations (int, default=100) – Maximum iterations for Phase 2 Gauss-Newton optimization. Typical values: 50-200.
gauss_newton_tol (float, default=1e-8) – Convergence tolerance for Phase 2 (gradient norm threshold). Optimization stops if: ||gradient|| < tol.
trust_region_initial (float, default=1.0) – Initial trust region radius for Gauss-Newton step control. Radius is adapted based on actual vs predicted reduction ratio.
regularization_factor (float, default=1e-10) – Regularization factor for rank-deficient J^T J matrices. Added to diagonal: J^T J + regularization_factor * I.
cg_max_iterations (int, default=100) – Maximum iterations for Conjugate Gradient solver in Phase 2. Used when parameter count exceeds cg_param_threshold. Higher values allow better convergence but more computation.
cg_relative_tolerance (float, default=1e-4) – Relative tolerance for CG solver convergence. Convergence check: ||r|| < cg_relative_tolerance * ||J^T r_0||. Implements Inexact Newton strategy for efficiency.
cg_absolute_tolerance (float, default=1e-10) – Absolute tolerance floor for CG solver convergence. Safety floor to prevent over-iteration on well-conditioned systems.
cg_param_threshold (int, default=2000) –
Parameter count threshold for auto-selecting CG vs materialized solver.
p < threshold: Use materialized J^T J with SVD solve (faster for small p)
p >= threshold: Use CG with implicit matvec (O(p) memory vs O(p^2))
Threshold balances memory savings vs additional data passes for CG.
enable_group_variance_regularization (bool, default=False) –
Enable variance regularization for parameter groups. When enabled, adds a penalty term to the loss function that penalizes variance within specified parameter groups. This is essential for preventing per-group parameter absorption in multi-component fitting.
The regularized loss becomes
L = MSE + group_variance_lambda * sum(Var(group_i))where each group_i is a slice of parameters defined by group_variance_indices.group_variance_lambda (float, default=0.01) – Regularization strength for group variance penalty. Larger values more strongly penalize variance within parameter groups. Use 0.001-0.01 for light regularization (allows moderate group variation), 0.1-1.0 for moderate regularization (constrains groups to be similar), or 10-1000 for strong regularization (forces groups to be nearly uniform). For multi-component fits with per-group parameters, use
lambda ~ 0.1 * n_data / (n_groups * sigma^2)where sigma is the expected experimental variation (~0.05 for 5%).group_variance_indices (list of tuple, default=None) –
List of (start, end) tuples defining parameter groups for variance regularization. Each tuple specifies a slice [start:end] of the parameter vector that should have low internal variance.
Example for 23 independent groups:
group_variance_indices = [(0, 23), (23, 46)]constrains contrast params [0:23] and offset params [23:46] to each have low variance, preventing them from absorbing group-dependent physical signals.If None when enable_group_variance_regularization=True, no groups are regularized (effectively disabling the feature).
chunk_size (int, default=10000) – Size of data chunks for streaming J^T J accumulation. Larger chunks = faster but more memory. Typical: 5000-50000.
gc_chunk_interval (int, default=10) – Chunks between gc.collect() calls (FR-007). Controls how often garbage collection runs during chunked processing. Higher values reduce GC overhead but may increase memory usage. Default of 10 balances memory reclamation with performance.
enable_checkpoints (bool, default=True) – Enable checkpoint save/resume for fault tolerance.
checkpoint_frequency (int, default=100) – Save checkpoint every N iterations (across all phases).
validate_numerics (bool, default=True) – Enable NaN/Inf validation at gradient, parameter, and loss computation points.
enable_multi_device (bool, default=False) – Enable multi-GPU/TPU parallelism for Jacobian computation. Uses JAX pmap for data-parallel computation across devices.
callback_frequency (int, default=10) – Call progress callback every N iterations (if callback provided).
enable_multistart (bool, default=False) – Enable multi-start optimization with tournament selection during Phase 1. When enabled, generates multiple starting points using LHS sampling and uses tournament elimination to select the best candidate for Phase 2.
n_starts (int, default=10) – Number of starting points for multi-start optimization. Only used when enable_multistart=True.
multistart_sampler (str, default='lhs') – Sampling method for generating starting points. Options: ‘lhs’ (Latin Hypercube), ‘sobol’, ‘halton’.
elimination_rounds (int, default=3) – Number of tournament elimination rounds. Each round eliminates elimination_fraction of candidates.
elimination_fraction (float, default=0.5) – Fraction of candidates to eliminate per round. Must be in (0, 1). Default 0.5 = eliminate half each round.
batches_per_round (int, default=50) – Number of data batches to use for evaluation in each tournament round. More batches = more reliable selection but slower.
Examples
Default configuration:
>>> from nlsq import HybridStreamingConfig >>> config = HybridStreamingConfig() >>> config.warmup_iterations 200
Aggressive profile (faster convergence with L-BFGS):
>>> config = HybridStreamingConfig.aggressive() >>> config.warmup_iterations 50
Conservative profile (higher quality):
>>> config = HybridStreamingConfig.conservative() >>> config.gauss_newton_tol < 1e-8 True
Memory-optimized profile:
>>> config = HybridStreamingConfig.memory_optimized() >>> config.chunk_size < 10000 True
Custom configuration:
>>> config = HybridStreamingConfig( ... warmup_iterations=50, ... lbfgs_history_size=15, ... chunk_size=5000, ... )
With multi-start tournament selection:
>>> config = HybridStreamingConfig( ... enable_multistart=True, ... n_starts=20, ... elimination_rounds=3, ... batches_per_round=50, ... )
See also
AdaptiveHybridStreamingOptimizerOptimizer that uses this configuration
curve_fitHigh-level interface with method=’hybrid_streaming’
TournamentSelectorTournament selection for multi-start optimization
Notes
Based on Adaptive Hybrid Streaming Optimizer specification:
agent-os/specs/2025-12-18-adaptive-hybrid-streaming-optimizer/spec.mdL-BFGS replaces Adam for warmup, providing 5-10x faster convergence to the basin of attraction through approximate Hessian information.
- __post_init__()[source]¶
Validate configuration after initialization.
Delegates to specialized validator functions for each configuration group. ConfigValidationError from validators is re-raised as ValueError for backwards compatibility.
- classmethod aggressive()[source]¶
Create aggressive profile: faster convergence with L-BFGS, looser tolerances.
This preset prioritizes speed over robustness: - L-BFGS warmup with reduced iterations (50 vs 300 with Adam) - Higher learning rate for faster progress - Looser tolerances for earlier Phase 2 switching - Larger chunks for better throughput
- Returns:
Configuration with aggressive settings.
- Return type:
Examples
>>> config = HybridStreamingConfig.aggressive() >>> config.warmup_learning_rate 0.003 >>> config.warmup_iterations 50
- classmethod conservative()[source]¶
Create conservative profile: slower but robust, tighter tolerances.
This preset prioritizes solution quality over speed: - L-BFGS warmup with conservative iterations - Lower learning rate for stability - Tighter tolerances for higher quality - More Gauss-Newton iterations
- Returns:
Configuration with conservative settings.
- Return type:
Examples
>>> config = HybridStreamingConfig.conservative() >>> config.gauss_newton_tol 1e-10 >>> config.warmup_iterations 30
- classmethod memory_optimized()[source]¶
Create memory-optimized profile: smaller chunks, efficient settings.
This preset minimizes memory footprint: - Smaller chunks to reduce memory usage - L-BFGS warmup with reduced iterations - Enable checkpoints for recovery (important when memory is tight) - Lower CG threshold for more aggressive CG usage (avoids O(p^2) J^T J)
- Returns:
Configuration with memory-optimized settings.
- Return type:
Examples
>>> config = HybridStreamingConfig.memory_optimized() >>> config.chunk_size 5000 >>> config.warmup_iterations 40 >>> config.cg_param_threshold 1000
- classmethod with_multistart(n_starts=10, **kwargs)[source]¶
Create configuration with multi-start tournament selection enabled.
This preset enables multi-start optimization for finding global optima: - Tournament selection during Phase 1 warmup - LHS sampling for generating starting points - Progressive elimination to select best candidate
- Parameters:
n_starts (int, default=10) – Number of starting points to generate.
**kwargs – Additional configuration parameters to override.
- Returns:
Configuration with multi-start enabled.
- Return type:
Examples
>>> config = HybridStreamingConfig.with_multistart(n_starts=20) >>> config.enable_multistart True >>> config.n_starts 20
- classmethod defense_strict()[source]¶
Create strict defense layer profile for near-optimal scenarios.
This preset maximizes protection against divergence when initial parameters are expected to be close to optimal (warm starts, refinement): - Very low warm start threshold (triggers at 1% relative loss) - Ultra-conservative learning rates for refinement - Very tight cost guard tolerance (5% increase aborts) - Very small step clipping for stability
Use this when: - Continuing optimization from a previous fit - Refining parameters that are already close to optimal - Dealing with ill-conditioned problems - Prioritizing stability over speed
- Returns:
Configuration with strict defense layer settings.
- Return type:
Examples
>>> config = HybridStreamingConfig.defense_strict() >>> config.warm_start_threshold 0.01 >>> config.cost_increase_tolerance 0.05 >>> config.warmup_iterations 25
- classmethod defense_relaxed()[source]¶
Create relaxed defense layer profile for exploration-heavy scenarios.
This preset reduces defense layer sensitivity for problems where significant parameter exploration is needed: - Higher warm start threshold (50% relative loss needed to skip) - More aggressive learning rates for exploration - Generous cost guard tolerance (50% increase allowed) - Larger step clipping for faster exploration
Use this when: - Starting from a rough initial guess - Exploring a wide parameter space - Problems with multiple local minima - Speed is more important than robustness
- Returns:
Configuration with relaxed defense layer settings.
- Return type:
Examples
>>> config = HybridStreamingConfig.defense_relaxed() >>> config.warm_start_threshold 0.5 >>> config.cost_increase_tolerance 0.5 >>> config.warmup_iterations 50
- classmethod defense_disabled()[source]¶
Create profile with all defense layers disabled.
This preset completely disables the 4-layer defense strategy, reverting to pre-0.3.6 behavior. Use with caution as this removes protection against warmup divergence.
Use this when: - Debugging to isolate defense layer effects - Benchmarking without defense overhead - Backward compatibility with older code is required
- Returns:
Configuration with all defense layers disabled.
- Return type:
Examples
>>> config = HybridStreamingConfig.defense_disabled() >>> config.enable_warm_start_detection False
- classmethod scientific_default()[source]¶
Create profile optimized for scientific computing workflows.
This preset is tuned for scientific fitting scenarios like spectroscopy, decay curves, and other physics-based models: - Balanced defense layers that protect without being too aggressive - L-BFGS warmup with moderate iterations - Enabled checkpoints for long-running fits
Use this when: - Fitting physics-based models (spectroscopy, decay curves) - Numerical precision is important - Parameters may have multiple scales - Reproducibility is required
- Returns:
Configuration optimized for scientific computing.
- Return type:
Examples
>>> config = HybridStreamingConfig.scientific_default() >>> config.warmup_iterations 35
- __init__(normalize=True, normalization_strategy='auto', warmup_iterations=200, max_warmup_iterations=500, warmup_learning_rate=0.001, loss_plateau_threshold=0.0001, gradient_norm_threshold=0.001, active_switching_criteria=None, lbfgs_history_size=10, lbfgs_initial_step_size=0.1, lbfgs_line_search='wolfe', lbfgs_exploration_step_size=0.1, lbfgs_refinement_step_size=1.0, use_learning_rate_schedule=False, lr_schedule_warmup_steps=50, lr_schedule_decay_steps=450, lr_schedule_end_value=0.0001, gradient_clip_value=None, enable_warm_start_detection=True, warm_start_threshold=0.01, enable_adaptive_warmup_lr=True, warmup_lr_refinement=1e-06, warmup_lr_careful=1e-05, enable_cost_guard=True, cost_increase_tolerance=0.05, enable_step_clipping=True, max_warmup_step_size=0.1, gauss_newton_max_iterations=100, gauss_newton_tol=1e-08, trust_region_initial=1.0, regularization_factor=1e-10, cg_max_iterations=100, cg_relative_tolerance=0.0001, cg_absolute_tolerance=1e-10, cg_param_threshold=2000, enable_group_variance_regularization=False, group_variance_lambda=0.01, group_variance_indices=None, enable_residual_weighting=False, residual_weights=None, chunk_size=10000, gc_chunk_interval=10, loop_strategy='auto', enable_checkpoints=True, checkpoint_frequency=100, checkpoint_dir=None, resume_from_checkpoint=None, validate_numerics=True, enable_fault_tolerance=True, max_retries_per_batch=2, min_success_rate=0.5, enable_multi_device=False, callback_frequency=10, verbose=1, log_frequency=1, enable_multistart=False, n_starts=10, multistart_sampler='lhs', elimination_rounds=3, elimination_fraction=0.5, batches_per_round=50, center_on_p0=True, scale_factor=1.0)¶
- class nlsq.InputValidator(fast_mode=True)[source]¶
Bases:
objectComprehensive input validation for curve fitting functions.
- __init__(fast_mode=True)[source]¶
Initialize the input validator.
- Parameters:
fast_mode (bool, default True) – If True, skip expensive validation checks for better performance. If False, perform all validation checks.
- validate_curve_fit_inputs(f, xdata, ydata, p0=None, bounds=None, sigma=None, absolute_sigma=True, check_finite=True)[source]¶
Validate inputs for curve_fit function.
This method orchestrates the validation pipeline by calling focused helper methods for each validation step.
- Parameters:
f (callable) – Model function to fit
xdata (array_like) – Independent variable data
ydata (array_like) – Dependent variable data
p0 (array_like, optional) – Initial parameter guess
bounds (tuple, optional) – Parameter bounds
sigma (array_like, optional) – Uncertainties in ydata
absolute_sigma (bool) – Whether sigma is absolute or relative
check_finite (bool) – Whether to check for finite values
- Returns:
errors (list) – List of error messages (empty if no errors)
warnings (list) – List of warning messages
xdata_clean (np.ndarray) – Cleaned and validated xdata
ydata_clean (np.ndarray) – Cleaned and validated ydata
- Return type:
- validate_security_constraints(n_points, n_params, bounds=None, p0=None)[source]¶
Validate security constraints to prevent DoS and numerical issues.
This method combines all security-focused validation checks.
- Parameters:
- Returns:
errors (list) – List of critical error messages
warnings (list) – List of warning messages
- Return type:
- validate_least_squares_inputs(fun, x0, bounds=None, method='trf', ftol=1e-08, xtol=1e-08, gtol=1e-08, max_nfev=None)[source]¶
Validate inputs for least_squares function.
This method orchestrates the validation pipeline by calling focused helper methods for each validation step.
- Parameters:
fun (callable) – Residual function
x0 (array_like) – Initial parameter guess
bounds (tuple, optional) – Parameter bounds
method (str) – Optimization method
ftol (float) – Function tolerance
xtol (float) – Parameter tolerance
gtol (float) – Gradient tolerance
max_nfev (int, optional) – Maximum function evaluations
- Returns:
errors (list) – List of error messages
warnings (list) – List of warning messages
x0_clean (np.ndarray) – Cleaned initial guess
- Return type:
- class nlsq.IterationLogger(filename=None, mode='w', log_params=False, file=None)[source]¶
Bases:
CallbackBaseLog optimization progress to file or stdout.
- Parameters:
filename (str, optional) – File to log to. If None and file is None, logs to stdout.
mode (str, optional) – File open mode. Default: ‘w’ (overwrite)
log_params (bool, optional) – Whether to log parameter values. Default: False
file (file-like object, optional) – File-like object to write to. If provided, filename is ignored.
Examples
>>> from nlsq.callbacks import IterationLogger >>> callback = IterationLogger("optimization.log") >>> popt, pcov = curve_fit(f, x, y, callback=callback)
- class nlsq.LDMemoryConfig(memory_limit_gb=8.0, safety_factor=0.8, min_chunk_size=1000, max_chunk_size=1000000, use_streaming=True, streaming_batch_size=50000, streaming_max_epochs=10, min_success_rate=0.5, save_diagnostics=False, gc_chunk_interval=10)[source]¶
Bases:
objectConfiguration for memory management in large dataset fitting.
- use_streaming¶
Use adaptive hybrid streaming optimization for unlimited data (default: True)
- Type:
- streaming_max_epochs¶
Maximum Gauss-Newton iterations for adaptive hybrid streaming (default: 10)
- Type:
- min_success_rate¶
Minimum success rate for chunked fitting (default: 0.5) If success rate falls below this threshold, fitting is considered failed
- Type:
- save_diagnostics¶
Whether to compute and save detailed diagnostic statistics (default: False) When False, skips statistical computations for successful chunks (5-10% faster)
- Type:
- gc_chunk_interval¶
Chunks between gc.collect() calls (default: 10, FR-007) Controls how often garbage collection runs during chunked processing. Higher values reduce GC overhead but may increase memory usage.
- Type:
- __init__(memory_limit_gb=8.0, safety_factor=0.8, min_chunk_size=1000, max_chunk_size=1000000, use_streaming=True, streaming_batch_size=50000, streaming_max_epochs=10, min_success_rate=0.5, save_diagnostics=False, gc_chunk_interval=10)¶
- class nlsq.LargeDatasetFitter(memory_limit_gb=8.0, config=None, curve_fit_class=None, logger=None, multistart=False, n_starts=10, sampler='lhs')[source]¶
Bases:
objectLarge dataset curve fitting with automatic memory management and chunking.
This class handles datasets with millions to billions of points that exceed available memory through automatic chunking, progressive parameter refinement, and streaming optimization. It maintains fitting accuracy while preventing memory overflow through dynamic memory monitoring and chunk size optimization.
Core Capabilities¶
Automatic memory estimation based on data size and parameter count
Dynamic chunk size calculation considering available system memory
Sequential parameter refinement across data chunks with convergence tracking
Streaming optimization for unlimited datasets (no accuracy loss)
Real-time progress monitoring with ETA for long-running fits
Full integration with NLSQ optimization algorithms and GPU acceleration
Multi-start optimization for global search (uses full data)
Memory Management Algorithm¶
Estimates total memory requirements from dataset size and parameter count
Calculates optimal chunk sizes considering available memory and safety margins
Monitors actual memory usage during processing to prevent overflow
Uses streaming optimization for extremely large datasets (processes all data)
Processing Strategies¶
Single Pass: For datasets fitting within memory limits
Sequential Chunking: Processes data in optimal-sized chunks with parameter propagation
Streaming Optimization: Mini-batch gradient descent for unlimited datasets (no subsampling)
Multi-Start Optimization¶
For medium-sized datasets (1M-100M points), multi-start optimization explores multiple starting points on full data, and the best starting point is then used for the full chunked optimization.
Performance Characteristics¶
Maintains <1% parameter error for well-conditioned problems using chunking
Achieves 5-50x speedup over naive approaches through memory optimization
Scales to datasets of unlimited size using streaming (processes all data)
Provides linear time complexity with respect to chunk count
Model Validation Caching (Task Group 7 - 5.1a)¶
Model functions are validated once per unique function identity using a cache keyed by (id(func), id(func.__code__)). This avoids redundant validation across chunks, providing 1-5% performance gain in chunked processing.
- param memory_limit_gb:
Maximum memory usage in GB. System memory is auto-detected if None.
- type memory_limit_gb:
float, default 8.0
- param config:
Advanced configuration for fine-tuning memory management behavior.
- type config:
LDMemoryConfig, optional
- param curve_fit_class:
Custom CurveFit instance for specialized fitting requirements.
- type curve_fit_class:
nlsq.minpack.CurveFit, optional
- param multistart:
Enable multi-start optimization for global search.
- type multistart:
bool, default False
- param n_starts:
Number of starting points for multi-start optimization.
- type n_starts:
int, default 10
- param sampler:
Sampling strategy for multi-start: ‘lhs’, ‘sobol’, or ‘halton’.
- type sampler:
str, default ‘lhs’
- config¶
Active memory management configuration
- Type:
- curve_fitter¶
Internal curve fitting engine with JAX acceleration
- Type:
nlsq.minpack.CurveFit
- logger¶
Internal logging for performance monitoring and debugging
- Type:
Logger
- fit : Main fitting method with automatic memory management
- fit_with_progress : Fitting with real-time progress reporting and ETA
- get_memory_recommendations : Pre-fitting memory analysis and strategy recommendations
- Important: Chunking-Compatible Model Functions
- -----------------------------------------------
- When using chunked processing (for datasets > memory limit), your model function
- MUST respect the size of xdata. During chunking, xdata will be a subset of the
- full dataset, and your model must return output matching that subset size.
- \*\*INCORRECT - Model ignores xdata size (will cause shape mismatch errors):**
- >>> def bad_model(xdata, a, b):
- ... # WRONG: Always returns full array, ignoring xdata size
- ... t_full = jnp.arange(10_000_000) # Fixed size!
- ... return a * jnp.exp(-b * t_full) # Shape mismatch during chunking
- \*\*CORRECT - Model respects xdata size:**
- >>> def good_model(xdata, a, b):
- ... # CORRECT: Uses xdata as indices to return only requested subset
- ... indices = xdata.astype(jnp.int32)
- ... return a * jnp.exp(-b * indices) # Shape matches xdata
- \*\*Alternative - Direct computation on xdata:**
- >>> def direct_model(xdata, a, b):
- ... # CORRECT: Operates directly on xdata
- ... return a * jnp.exp(-b * xdata) # Shape automatically matches
Examples
Basic usage with automatic configuration:
>>> import numpy as np >>> import jax.numpy as jnp >>> >>> # 10 million data points >>> x = np.linspace(0, 10, 10_000_000) >>> y = 2.5 * jnp.exp(-1.3 * x) + 0.1 + np.random.normal(0, 0.05, len(x)) >>> >>> fitter = LargeDatasetFitter(memory_limit_gb=4.0) >>> result = fitter.fit( ... lambda x, a, b, c: a * jnp.exp(-b * x) + c, ... x, y, p0=[2, 1, 0] ... ) >>> print(f"Parameters: {result.popt}") >>> print(f"Chunks used: {result.n_chunks}")
Multi-start optimization:
>>> fitter = LargeDatasetFitter( ... memory_limit_gb=4.0, ... multistart=True, ... n_starts=10, ... sampler='lhs', ... ) >>> result = fitter.fit( ... lambda x, a, b, c: a * jnp.exp(-b * x) + c, ... x, y, p0=[2, 1, 0], ... bounds=([0, 0, 0], [10, 5, 10]) ... )
Advanced configuration with progress monitoring:
>>> config = LDMemoryConfig( ... memory_limit_gb=8.0, ... min_chunk_size=10000, ... max_chunk_size=1000000, ... use_streaming=True, ... streaming_batch_size=50000 ... ) >>> fitter = LargeDatasetFitter(config=config) >>> >>> # Fit with progress bar for long-running operation >>> result = fitter.fit_with_progress( ... exponential_model, x_huge, y_huge, p0=[2, 1, 0] ... )
Memory analysis before processing:
>>> recommendations = fitter.get_memory_recommendations(len(x), n_params=3) >>> print(f"Strategy: {recommendations['processing_strategy']}") >>> print(f"Memory estimate: {recommendations['memory_estimate_gb']:.2f} GB") >>> print(f"Recommended chunks: {recommendations['n_chunks']}")
See also
curve_fit_largeHigh-level function with automatic dataset size detection
LDMemoryConfigConfiguration class for memory management parameters
estimate_memory_requirementsStandalone function for memory estimation
Notes
The sequential chunking algorithm maintains parameter accuracy by using each chunk’s result as the initial guess for the next chunk. This approach typically maintains fitting accuracy within 0.1% of single-pass results for well-conditioned problems while enabling processing of arbitrarily large datasets.
For extremely large datasets, streaming optimization processes all data using mini-batch gradient descent with no subsampling, ensuring zero accuracy loss compared to subsampling approaches (removed in v0.2.0).
- __init__(memory_limit_gb=8.0, config=None, curve_fit_class=None, logger=None, multistart=False, n_starts=10, sampler='lhs')[source]¶
Initialize LargeDatasetFitter.
- Parameters:
memory_limit_gb (float, optional) – Memory limit in GB (default: 8.0)
config (LDMemoryConfig, optional) – Custom memory configuration
curve_fit_class (nlsq.minpack.CurveFit, optional) – Custom CurveFit instance to use
logger (logging.Logger, optional) – External logger instance for integration with application logging. If None, uses NLSQ’s internal logger. This allows chunk failure warnings to appear in your application’s logs.
multistart (bool, optional) – Enable multi-start optimization for global search (default: False). When enabled, explores multiple starting points on full data before running the full chunked optimization.
n_starts (int, optional) – Number of starting points for multi-start optimization (default: 10). Set to 0 to disable multi-start even when multistart=True.
sampler (str, optional) – Sampling strategy for generating starting points (default: ‘lhs’). Options: ‘lhs’ (Latin Hypercube), ‘sobol’, ‘halton’.
- estimate_requirements(n_points, n_params)[source]¶
Estimate memory requirements and processing strategy.
- fit_with_progress(f, xdata, ydata, p0=None, bounds=(-inf, inf), method='trf', solver='auto', **kwargs)[source]¶
Fit curve with progress reporting for long-running fits.
- Parameters:
f (callable) – The model function f(x, *params) -> y
xdata (np.ndarray) – Independent variable data
ydata (np.ndarray) – Dependent variable data
p0 (array-like, optional) – Initial parameter guess
bounds (tuple, optional) – Parameter bounds (lower, upper)
method (str, optional) – Optimization method (default: ‘trf’)
solver (str, optional) – Solver type (default: ‘auto’)
**kwargs – Additional arguments passed to curve_fit
- Returns:
Optimization result with fitted parameters and statistics
- Return type:
- class nlsq.LeastSquares(enable_stability=False, enable_diagnostics=False, max_jacobian_elements_for_svd=10000000)[source]¶
Bases:
objectCore least squares optimization engine with JAX acceleration.
This class implements the main optimization algorithms for nonlinear least squares problems, including Trust Region Reflective (TRF) and Levenberg-Marquardt (LM). It handles automatic differentiation, bound constraints, loss functions, and uncertainty propagation.
The class maintains separate automatic differentiation instances for different sigma configurations (no sigma, 1D sigma, 2D covariance matrix) to optimize compilation and execution performance.
- trf¶
Trust Region Reflective algorithm implementation
- Type:
TrustRegionReflective
- ls¶
JIT-compiled loss function implementations
- Type:
LossFunctionsJIT
- logger¶
Internal logger for debugging and performance tracking
- Type:
Logger
- f¶
Current objective function being optimized
- Type:
callable
- jac¶
Current Jacobian function (None for automatic differentiation)
- Type:
callable or None
- adjn¶
Automatic differentiation instance for unweighted problems
- Type:
AutoDiffJacobian
- adj1d¶
Automatic differentiation instance for 1D sigma weighting
- Type:
AutoDiffJacobian
- adj2d¶
Automatic differentiation instance for 2D covariance matrix weighting
- Type:
AutoDiffJacobian
- least_squares : Main optimization method
- __init__(enable_stability=False, enable_diagnostics=False, max_jacobian_elements_for_svd=10000000)[source]¶
Initialize LeastSquares with optimization algorithms and autodiff instances.
Sets up the Trust Region Reflective solver, loss functions, and separate automatic differentiation instances for different weighting schemes to maximize JAX compilation efficiency.
- Parameters:
enable_stability (bool, default False) – Enable numerical stability checks and fixes
enable_diagnostics (bool, default False) – Enable optimization diagnostics collection
max_jacobian_elements_for_svd (int, default 10_000_000) – Maximum Jacobian size (m × n elements) for SVD computation during stability checks. SVD is skipped for larger Jacobians.
- least_squares(fun, x0, jac=None, bounds=(-inf, inf), method='trf', ftol=1e-08, xtol=1e-08, gtol=1e-08, x_scale=1.0, loss='linear', f_scale=1.0, diff_step=None, tr_solver=None, tr_options=None, jac_sparsity=None, max_nfev=None, verbose=0, jacobian_mode=None, xdata=None, ydata=None, data_mask=None, transform=None, timeit=False, callback=None, args=(), kwargs=None, prepared_bounds=None, **timeout_kwargs)[source]¶
Solve nonlinear least squares problem using JAX-accelerated algorithms.
This method orchestrates the optimization process by calling focused helper methods for each major step: validation, function setup, initial evaluation, stability checks, and optimization execution.
- Parameters:
fun (callable) – Residual function. Must use jax.numpy operations.
x0 (array_like) – Initial parameter guess.
jac (callable or None, optional) – Jacobian function. If None, uses JAX autodiff.
bounds (2-tuple, optional) – Parameter bounds as (lower, upper).
method (str, optional) – Optimization algorithm (‘trf’).
ftol (float, optional) – Convergence tolerances for function, parameters, and gradient.
xtol (float, optional) – Convergence tolerances for function, parameters, and gradient.
gtol (float, optional) – Convergence tolerances for function, parameters, and gradient.
x_scale (str or array_like, optional) – Parameter scaling (‘jac’ for automatic).
loss (str or callable, optional) – Robust loss function (‘linear’, ‘huber’, ‘soft_l1’, etc.).
f_scale (float, optional) – Scale parameter for robust loss functions.
max_nfev (int, optional) – Maximum function evaluations.
verbose (int, optional) – Verbosity level (0, 1, or 2).
jacobian_mode ({'auto', 'fwd', 'rev'}, optional) – Jacobian automatic differentiation mode. If None, uses configuration from environment variable, config file, or auto-default. Default is None. - ‘auto’: Automatically select based on problem dimensions - ‘fwd’: Force forward-mode AD (jacfwd) - ‘rev’: Force reverse-mode AD (jacrev)
xdata (array_like, optional) – Data for curve fitting applications.
ydata (array_like, optional) – Data for curve fitting applications.
data_mask (array_like, optional) – Boolean mask for data exclusion.
transform (array_like, optional) – Transformation matrix for weighted fitting.
timeit (bool, optional) – Enable detailed timing analysis.
callback (callable or None, optional) – Callback function called after each optimization iteration with signature
callback(iteration, cost, params, info). Useful for monitoring optimization progress, logging, or implementing custom stopping criteria. If None (default), no callback is invoked.args (tuple, optional) – Additional arguments for objective function.
kwargs (dict, optional) – Additional optimization parameters.
prepared_bounds (tuple[np.ndarray, np.ndarray] or None, optional) – Pre-prepared bounds as (lb, ub). If provided, skips prepare_bounds call. This optimization avoids redundant bounds preparation when caller has already called prepare_bounds. Default is None.
- Returns:
result – Optimization result with solution, convergence info, and statistics.
- Return type:
- autdiff_jac(jac, mode='fwd')[source]¶
We do this for all three sigma transformed functions such that if sigma is changed from none to 1D to covariance sigma then no retracing is needed.
- Parameters:
jac (None) – Passed in to maintain compatibility with the user defined Jacobian function.
mode (str, optional) – Jacobian mode (‘fwd’ or ‘rev’), by default ‘fwd’
- update_function(func)[source]¶
Wraps the given fit function to be a residual function using the data. The wrapped function is in a JAX JIT compatible format which is purely functional. This requires that both the data mask and the uncertainty transform are passed to the function. Even for the case where the data mask is all True and the uncertainty transform is None we still need to pass these arguments to the function due JAX’s functional nature.
- Parameters:
func (Callable) – The fit function to wrap.
- Return type:
None
- wrap_jac(jac)[source]¶
Wraps an user defined Jacobian function to allow for data masking and uncertainty transforms. The wrapped function is in a JAX JIT compatible format which is purely functional. This requires that both the data mask and the uncertainty transform are passed to the function.
Using an analytical Jacobian of the fit function is equivalent to the Jacobian of the residual function.
Also note that the analytical Jacobian doesn’t require the independent ydata, but we still need to pass it to the function to maintain compatibility with autdiff version which does require the ydata.
- Parameters:
jac (Callable) – The Jacobian function to wrap.
- Returns:
The masked Jacobian of the function evaluated at args with respect to the data.
- Return type:
jnp.ndarray
- class nlsq.MemoryBudget(available_gb, threshold_gb, data_gb, jacobian_gb, peak_gb)[source]¶
Bases:
objectComputed memory budget for optimizer selection.
This immutable dataclass represents the computed memory requirements and available resources for automatic optimizer strategy selection. Use the compute() factory method to create instances.
Examples
>>> budget = MemoryBudget.compute(n_points=10_000_000, n_params=10) >>> print(f"Available: {budget.available_gb:.1f} GB") >>> print(f"Peak estimate: {budget.peak_gb:.2f} GB") >>> print(f"Fits in memory: {budget.fits_in_memory}")
- property fits_in_memory: bool¶
Check if estimated peak memory fits within safe threshold.
- Returns:
True if peak_gb <= threshold_gb.
- Return type:
- property data_fits: bool¶
Check if data arrays alone fit within safe threshold.
- Returns:
True if data_gb <= threshold_gb.
- Return type:
- classmethod compute(n_points, n_params, n_features=1, dtype_bytes=8, safety_factor=0.75, memory_limit_gb=None, use_gpu=False)[source]¶
Compute memory budget for a given dataset size.
- Parameters:
n_points (int) – Number of data points.
n_params (int) – Number of fit parameters.
n_features (int, default=1) – Number of features in x_data (dimensions).
dtype_bytes (int, default=8) – Bytes per element (8 for float64, 4 for float32).
safety_factor (float, default=0.75) – Memory safety factor (0.75 means use 75% of available).
memory_limit_gb (float | None, default=None) – Override memory limit in GB. If None, auto-detect.
use_gpu (bool, default=False) – If True, use GPU memory instead of CPU memory.
- Returns:
Computed memory budget with all fields populated.
- Return type:
- Raises:
ValueError – If n_points <= 0, n_params <= 0, or safety_factor not in (0, 1].
Examples
>>> budget = MemoryBudget.compute(n_points=1_000_000, n_params=5) >>> budget.fits_in_memory True
- __init__(available_gb, threshold_gb, data_gb, jacobian_gb, peak_gb)¶
- class nlsq.MemoryBudgetSelector(safety_factor=0.75)[source]¶
Bases:
objectSelects optimal optimizer strategy based on memory budget.
This class computes memory requirements and selects between STREAMING, CHUNKED, and STANDARD strategies based on three sequential memory comparisons.
- Decision Tree:
data_gb > threshold_gb → STREAMING (data doesn’t fit)
peak_gb > threshold_gb → CHUNKED (Jacobian doesn’t fit)
else → STANDARD (everything fits)
- Parameters:
safety_factor (float, default=0.75) – Memory safety factor (0.75 means use 75% of available memory).
Examples
>>> selector = MemoryBudgetSelector(safety_factor=0.75) >>> strategy, config = selector.select(n_points=5_000_000, n_params=10) >>> if strategy == "streaming": ... # Use HybridStreamingOptimizer with config ... pass >>> elif strategy == "chunked": ... # Use LargeDatasetFitter with config ... pass >>> else: ... # Use standard curve_fit() ... pass
- __init__(safety_factor=0.75)[source]¶
Initialize selector with safety factor.
- Parameters:
safety_factor (float, default=0.75) – Memory safety factor (0.75 means use 75% of available memory).
- select(n_points, n_params, n_features=1, memory_limit_gb=None, goal=None, use_gpu=False, verbose=False)[source]¶
Select optimal optimizer strategy based on memory budget.
- Parameters:
n_points (int) – Number of data points.
n_params (int) – Number of fit parameters.
n_features (int, default=1) – Number of features in x_data.
memory_limit_gb (float | None, default=None) – Override memory limit in GB. If None, auto-detect.
goal (OptimizationGoal | None, default=None) – Optimization goal (affects tolerances, not strategy selection).
use_gpu (bool, default=False) – If True, use GPU memory instead of CPU memory.
verbose (bool, default=False) – If True, log memory budget details and strategy selection reason.
- Returns:
strategy: “streaming”, “chunked”, or “standard”
config: HybridStreamingConfig, LDMemoryConfig, or None
- Return type:
- Raises:
ValueError – If n_points <= 0 or n_params <= 0.
- class nlsq.MemoryManager(gc_threshold=0.8, safety_factor=1.2, enable_adaptive_safety=False, disable_padding=False, memory_cache_ttl=1.0, adaptive_ttl=True)[source]¶
Bases:
objectIntelligent memory management for optimization algorithms.
This class provides: - Memory usage monitoring and prediction - Array pooling to reduce allocations with LRU eviction - Automatic garbage collection triggers - Context managers for memory-safe operations
LRU Memory Pool (Task Group 7 - 1.2a)¶
The memory pool uses an OrderedDict to track access order, enabling true LRU (Least Recently Used) eviction when at capacity. This improves cache utilization for frequently accessed array shapes by 5-10%.
Telemetry Circular Buffer (Task Group 9 - 1.3a)¶
The safety telemetry uses a deque with maxlen=1000 to prevent memory leak in multi-day optimization runs. This maintains the last 1000 telemetry records for adaptive safety factor calculation.
- memory_pool¶
Pool of reusable arrays indexed by (shape, dtype) with LRU tracking
- Type:
OrderedDict
- __init__(gc_threshold=0.8, safety_factor=1.2, enable_adaptive_safety=False, disable_padding=False, memory_cache_ttl=1.0, adaptive_ttl=True)[source]¶
Initialize memory manager.
- Parameters:
gc_threshold (float) – Trigger GC when memory usage exceeds this fraction (0-1)
safety_factor (float) – Multiply memory requirements by this factor for safety
enable_adaptive_safety (bool) – Enable adaptive safety factor reduction (1.2 -> 1.05 after warmup)
disable_padding (bool) – Disable padding/bucketing for strict memory environments (Task 5.6). When True: uses exact shapes, sets safety_factor=1.0. Use case: cloud quotas, strict memory limits.
memory_cache_ttl (float) – TTL in seconds for cached memory info (default: 1.0). Reduces psutil system call overhead by 90%.
adaptive_ttl (bool) – Enable adaptive TTL based on call frequency (default: True). High-frequency callers (>100 calls/sec) get 15s effective TTL. Medium-frequency callers (>10 calls/sec) get 10s effective TTL. Low-frequency callers use the default TTL. Reduces psutil overhead in streaming optimization by 15-20%.
- get_available_memory()[source]¶
Get available memory in bytes.
- Returns:
available – Available memory in bytes
- Return type:
Notes
Uses TTL-based caching to reduce psutil system call overhead by 90%. When adaptive_ttl is enabled, the effective TTL is adjusted based on call frequency to further reduce overhead for streaming optimization.
- get_memory_usage_bytes()[source]¶
Get current memory usage in bytes.
- Returns:
usage – Current memory usage in bytes
- Return type:
Notes
Uses TTL-based caching to reduce psutil system call overhead by 90%.
- get_memory_usage_fraction()[source]¶
Get current memory usage as fraction of total.
- Returns:
fraction – Memory usage fraction (0-1)
- Return type:
Notes
Uses TTL-based caching to reduce psutil system call overhead by 90%.
- predict_memory_requirement(n_points, n_params, algorithm='trf', dtype=<class 'jax.numpy.float64'>)[source]¶
Predict memory requirement for optimization.
- Parameters:
- Returns:
bytes_needed – Estimated memory requirement in bytes
- Return type:
Notes
Memory requirements scale linearly with precision: - float32: 4 bytes per element (50% memory savings) - float64: 8 bytes per element (default, higher precision)
- memory_guard(bytes_needed)[source]¶
Context manager to ensure memory availability.
- Parameters:
bytes_needed (int) – Required memory in bytes
- Raises:
MemoryError – If insufficient memory is available
- get_safety_telemetry()[source]¶
Get safety factor telemetry statistics.
- Returns:
telemetry – Safety factor telemetry with: - current_safety_factor: Current safety factor - initial_safety_factor: Initial safety factor (1.2) - min_safety_factor: Target minimum (1.05) - telemetry_entries: Number of telemetry entries collected - p95_safety_needed: 95th percentile of safety factors needed (if data available) - safety_factor_history: List of safety factors over time
- Return type:
- allocate_array(shape, dtype=<class 'numpy.float64'>, zero=True)[source]¶
Allocate array with memory pooling and LRU tracking.
- Parameters:
- Returns:
array – Allocated array
- Return type:
np.ndarray
- Raises:
MemoryError – If allocation fails
Notes
Task Group 7 (1.2a): Uses LRU tracking via OrderedDict. When an array is reused from the pool, it is moved to the end (most recently used) to enable proper LRU eviction.
- free_array(arr)[source]¶
Return array to pool for reuse.
- Parameters:
arr (np.ndarray) – Array to free
Notes
Task Group 7 (1.2a): Uses LRU tracking via OrderedDict. The returned array is added/moved to the end of the pool, marking it as recently used.
- get_memory_stats()[source]¶
Get memory usage statistics.
- Returns:
stats – Memory statistics including current usage, peak, pool size
- Return type:
- optimize_memory_pool(max_arrays=100)[source]¶
Optimize memory pool using LRU eviction.
- Parameters:
max_arrays (int) – Maximum number of arrays to keep in pool
Notes
Task Group 7 (1.2a): Uses LRU eviction via popitem(last=False). Arrays are evicted in order of least recent use, keeping the most recently used arrays in the pool.
- temporary_allocation(shape, dtype=<class 'numpy.float64'>)[source]¶
Context manager for temporary array allocation.
- class nlsq.MemoryPool(max_pool_size=10, enable_stats=False, enable_bucketing=True)[source]¶
Bases:
objectMemory pool for reusable array buffers.
Pre-allocates buffers for common array shapes to avoid repeated allocations during optimization iterations.
- __init__(max_pool_size=10, enable_stats=False, enable_bucketing=True)[source]¶
Initialize memory pool.
- allocate(shape, dtype=<class 'jax.numpy.float64'>)[source]¶
Allocate array from pool or create new one.
- Parameters:
- Returns:
array – Allocated array (may be reused from pool)
- Return type:
jnp.ndarray
Notes
When bucketing is enabled, arrays are pooled by size classes (1KB/10KB/100KB) for better reuse rates (Task 5.4).
- release(arr)[source]¶
Return array to pool for reuse.
- Parameters:
arr (jnp.ndarray) – Array to return to pool
Notes
When bucketing is enabled, arrays are stored in size-class buckets for better reuse (Task 5.4).
- class nlsq.OptimizationDiagnostics(verbosity=1)[source]¶
Bases:
objectComprehensive optimization diagnostics and reporting.
Tracks: - Iteration data (parameters, cost, gradients) - Convergence metrics - Memory usage - Timing information - Numerical stability indicators
- __init__(verbosity=1)[source]¶
Initialize diagnostics system.
- Parameters:
verbosity (int) – Verbosity level: - 0: Minimal diagnostics (no condition number computation) - 1: Normal (cheap 1-norm condition estimate, O(nm)) - 2: Detailed (full SVD condition number, O(mn²))
- start_optimization(x0=None, problem_name='optimization', *, n_params=None, n_data=None, method=None, loss=None)[source]¶
Initialize diagnostics for new optimization.
- Parameters:
x0 (np.ndarray, optional) – Initial parameters (legacy API)
problem_name (str) – Name for this optimization problem
n_params (int, optional) – Number of parameters (new API from LeastSquares)
n_data (int, optional) – Number of data points (new API from LeastSquares)
method (str, optional) – Optimization method (new API from LeastSquares)
loss (str, optional) – Loss function name (new API from LeastSquares)
- record_iteration(iteration, x, cost, gradient=None, jacobian=None, step_size=None, method_info=None)[source]¶
Record data for current iteration.
- Parameters:
iteration (int) – Iteration number
x (np.ndarray) – Current parameters
cost (float) – Current cost value
gradient (np.ndarray, optional) – Current gradient
jacobian (np.ndarray, optional) – Current Jacobian matrix
step_size (float, optional) – Step size taken
method_info (dict, optional) – Algorithm-specific information
- get_summary_statistics()[source]¶
Get summary statistics for the optimization.
- Returns:
stats – Summary statistics
- Return type:
- class nlsq.OptimizationRecovery(max_retries=3, enable_diagnostics=True)[source]¶
Bases:
objectAutomatic recovery from optimization failures.
This class provides multiple recovery strategies for handling optimization failures including: - Parameter perturbation - Algorithm switching - Regularization adjustment - Problem reformulation - Multi-start optimization
- diagnostics¶
Diagnostics collector for monitoring
- Type:
nlsq.diagnostics.OptimizationDiagnostics
- stability_guard¶
Numerical stability checker
- Type:
nlsq.stability.NumericalStabilityGuard
- class nlsq.OptimizeResult[source]¶
Bases:
dictOptimization result container for NLSQ curve fitting operations.
This class stores the complete results from nonlinear least squares optimization performed using JAX-accelerated algorithms. It extends dict to provide both dictionary-style and attribute-style access to optimization results.
Core Attributes¶
- xjax.numpy.ndarray or numpy.ndarray
Optimized parameter vector containing the final fitted parameters. These represent the solution to the nonlinear least squares problem.
- successbool
Indicates whether the optimization terminated successfully. True means convergence criteria were satisfied within tolerance limits.
- statusint
Numerical termination status code indicating why optimization stopped:
1: Gradient convergence (||g||_inf < gtol)
2: Step size convergence (||dx||/||x|| < xtol)
3: Function value convergence (delta_f/f < ftol)
0: Maximum iterations reached
-1: Evaluation limit exceeded
-3: Inner loop iteration limit (algorithm-specific)
- messagestr
Human-readable description of termination cause. Provides detailed information about convergence status or failure reasons.
Objective Function Results¶
- funjax.numpy.ndarray
Final residual vector f(x) at the solution. For curve fitting, these are the differences between model predictions and data points.
- costfloat
Final cost function value: 0.5 * ||f(x)||² for standard least squares, or 0.5 * sum(ρ(f_i²/σ²)) for robust loss functions.
- jacjax.numpy.ndarray
Final Jacobian matrix J(x) with shape (m, n) where m is number of data points and n is number of parameters. Computed using JAX autodiff.
- gradjax.numpy.ndarray
Final gradient vector g = J^T * f with shape (n,). Used for convergence checking and parameter uncertainty estimation.
Convergence Metrics¶
- optimalityfloat
Final gradient norm ||g||_inf used for convergence assessment. Should be less than gtol for successful convergence.
- active_masknumpy.ndarray
Boolean mask indicating which parameters hit bounds (for bounded optimization). Shape (n,) with True for parameters at constraints.
Iteration Statistics¶
- nfevint
Total number of objective function evaluations during optimization. Each evaluation computes residuals f(x) for given parameters.
- njevint
Total number of Jacobian evaluations. With JAX autodiff, this equals the number of combined function+gradient evaluations.
- nitint
Number of optimization iterations completed. Not always available for all algorithms.
Algorithm-Specific Results¶
- pcovjax.numpy.ndarray, optional
Parameter covariance matrix with shape (n, n). Provides parameter uncertainty estimates. Available when uncertainty estimation is requested. Computed as: pcov = inv(J^T * J) * residual_variance
- active_masknumpy.ndarray
For bounded optimization, indicates which parameters are at bounds.
- all_timesdict, optional
Detailed timing information for algorithm profiling. Contains timing data for different optimization phases (function evaluation, Jacobian computation, linear algebra operations, etc.).
Usage Examples¶
Basic result access:
import nlsq # Perform curve fitting result = nlsq.curve_fit(model_func, x_data, y_data, p0=initial_guess) # Access optimized parameters fitted_params = result.x # Check convergence if result.success: print(f"Optimization converged: {result.message}") print(f"Final cost: {result.cost}") print(f"Function evaluations: {result.nfev}") else: print(f"Optimization failed: {result.message}") # Parameter uncertainties (if covariance computed) if hasattr(result, 'pcov'): param_errors = jnp.sqrt(jnp.diag(result.pcov)) print(f"Parameter uncertainties: {param_errors}")
Advanced result inspection:
# Examine residuals and fit quality final_residuals = result.fun rms_error = jnp.sqrt(jnp.mean(final_residuals**2)) # Check gradient convergence gradient_norm = result.optimality print(f"Final gradient norm: {gradient_norm}") # Analyze Jacobian condition jacobian = result.jac condition_number = jnp.linalg.cond(jacobian) print(f"Jacobian condition number: {condition_number}") # For bounded problems, check active constraints if hasattr(result, 'active_mask'): constrained_params = jnp.where(result.active_mask)[0] print(f"Parameters at bounds: {constrained_params}")
Integration with SciPy¶
This class maintains compatibility with scipy.optimize.OptimizeResult while adding JAX-specific features and NLSQ-specific results. It can be used interchangeably with SciPy optimization results in most contexts.
Technical Notes¶
All JAX arrays are automatically converted to NumPy arrays for compatibility
Covariance matrices use double precision for numerical stability
Large dataset results may include memory management statistics
GPU timing results require explicit timing mode activation
Progress monitoring data is stored in algorithm-specific attributes
- exception nlsq.OptimizeWarning[source]¶
Bases:
UserWarningWarning class for optimization-related issues.
This warning is raised when non-critical issues are encountered during optimization, such as unknown solver options, convergence concerns, or numerical stability warnings that don’t prevent the optimization from completing but should be brought to the user’s attention.
- Common scenarios:
Unknown or deprecated solver options passed to optimizer
Convergence achieved but with warnings about numerical conditioning
Parameter bounds adjusted automatically
Automatic algorithm selection overrides
Example
>>> import warnings >>> warnings.filterwarnings('error', category=OptimizeWarning) >>> # Now OptimizeWarning will raise an exception instead of warning
See also
nlsq.error_messages.OptimizationError : Exception for critical failures
- class nlsq.PerformanceProfiler[source]¶
Bases:
objectPerformance profiler for NLSQ optimization.
Tracks and analyzes performance metrics across optimization runs.
Examples
>>> profiler = PerformanceProfiler() >>> with profiler.profile("my_optimization"): ... result = curve_fit(model, x, y, p0=[1, 2]) >>> >>> report = profiler.get_report() >>> print(report)
- start_profile(name='default')[source]¶
Start profiling a new optimization run.
- Parameters:
name (str) – Name for this profiling session
- Returns:
metrics – Metrics object for this profile
- Return type:
- end_profile(metrics=None)[source]¶
End current profiling session.
- Parameters:
metrics (ProfileMetrics, optional) – Metrics to finalize. If None, uses current profile.
- profile(name='default')[source]¶
Context manager for profiling.
- Parameters:
name (str) – Name for this profiling session
Examples
>>> profiler = PerformanceProfiler() >>> with profiler.profile("test_1"): ... result = curve_fit(model, x, y)
- update_current(**kwargs)[source]¶
Update current profile with arbitrary metrics.
- Parameters:
**kwargs – Metrics to update
- get_metrics(name='default')[source]¶
Get all metrics for a named profile.
- Parameters:
name (str) – Profile name
- Returns:
metrics – All metrics for this profile
- Return type:
- class nlsq.ProfileMetrics(total_time=0.0, jit_compile_time=0.0, optimization_time=0.0, jacobian_time=0.0, n_iterations=0, n_function_evals=0, n_jacobian_evals=0, n_data_points=0, n_parameters=0, data_dimension=1, final_cost=0.0, initial_cost=0.0, cost_reduction=0.0, final_gradient_norm=0.0, success=False, method='', backend='cpu', metadata=<factory>)[source]¶
Bases:
objectContainer for performance metrics from a single optimization run.
- __init__(total_time=0.0, jit_compile_time=0.0, optimization_time=0.0, jacobian_time=0.0, n_iterations=0, n_function_evals=0, n_jacobian_evals=0, n_data_points=0, n_parameters=0, data_dimension=1, final_cost=0.0, initial_cost=0.0, cost_reduction=0.0, final_gradient_norm=0.0, success=False, method='', backend='cpu', metadata=<factory>)¶
- class nlsq.ProfilerVisualization(profiler)[source]¶
Bases:
objectVisualization tools for performance profiling data.
Examples
>>> from nlsq.utils.profiler import get_global_profiler >>> from nlsq.utils.profiler_visualization import ProfilerVisualization >>> profiler = get_global_profiler() >>> viz = ProfilerVisualization(profiler) >>> fig = viz.plot_timing_comparison(["test1", "test2"]) >>> viz.save_html_report("report.html")
- __init__(profiler)[source]¶
Initialize visualization tools.
- Parameters:
profiler (PerformanceProfiler) – Profiler instance to visualize
- plot_timing_comparison(names, figsize=(10, 6))[source]¶
Compare timing metrics across multiple profiles.
- plot_timing_distribution(name='default', figsize=(10, 6))[source]¶
Plot distribution of timing metrics.
- class nlsq.ProfilingDashboard(profiler)[source]¶
Bases:
objectInteractive dashboard for profiling data.
Examples
>>> from nlsq.utils.profiler import get_global_profiler >>> from nlsq.utils.profiler_visualization import ProfilingDashboard >>> profiler = get_global_profiler() >>> dashboard = ProfilingDashboard(profiler) >>> dashboard.add_profile("test1") >>> dashboard.add_profile("test2") >>> dashboard.generate_comparison_report()
- __init__(profiler)[source]¶
Initialize dashboard.
- Parameters:
profiler (PerformanceProfiler) – Profiler instance to visualize
- add_profile(name)[source]¶
Add a profile to track in the dashboard.
- Parameters:
name (str) – Profile name to track
- remove_profile(name)[source]¶
Remove a profile from tracking.
- Parameters:
name (str) – Profile name to remove
- generate_comparison_report()[source]¶
Generate comparison report for tracked profiles.
- Returns:
report – Formatted comparison report
- Return type:
- class nlsq.ProgressBar(max_nfev=None, desc='Optimizing', **tqdm_kwargs)[source]¶
Bases:
CallbackBaseProgress bar callback using tqdm.
Displays a progress bar showing optimization progress with current cost, gradient norm, and iteration statistics.
- Parameters:
Examples
>>> from nlsq import curve_fit >>> from nlsq.callbacks import ProgressBar >>> callback = ProgressBar(max_nfev=100) >>> popt, pcov = curve_fit(f, x, y, callback=callback)
- class nlsq.RobustDecomposition(enable_logging=False)[source]¶
Bases:
objectMulti-level fallback for matrix decompositions.
This class provides robust matrix decomposition methods that automatically fall back through multiple strategies if the primary method fails. The fallback chain goes from: 1. JAX on GPU (if available) 2. JAX on CPU 3. SciPy (if available) 4. NumPy 5. Safe mode with regularization
- logger¶
Logger for debugging decomposition issues
- Type:
- __init__(enable_logging=False)[source]¶
Initialize robust decomposition handler.
- Parameters:
enable_logging (bool) – Whether to enable detailed logging of fallback attempts
- svd(matrix, full_matrices=False)[source]¶
Compute SVD with automatic fallback.
- Parameters:
matrix (jnp.ndarray) – Matrix to decompose
full_matrices (bool) – Whether to compute full matrices
- Returns:
U (jnp.ndarray) – Left singular vectors
s (jnp.ndarray) – Singular values
Vt (jnp.ndarray) – Right singular vectors (transposed)
- Raises:
RuntimeError – If all decomposition methods fail
- Return type:
- qr(matrix, mode='reduced')[source]¶
Compute QR decomposition with fallback.
- Parameters:
matrix (jnp.ndarray) – Matrix to decompose
mode (str) – QR mode (‘reduced’ or ‘complete’)
- Returns:
Q (jnp.ndarray) – Orthogonal matrix
R (jnp.ndarray) – Upper triangular matrix
- Raises:
RuntimeError – If all decomposition methods fail
- Return type:
- cholesky(matrix, lower=True)[source]¶
Compute Cholesky decomposition with fallback and regularization.
- Parameters:
matrix (jnp.ndarray) – Positive definite matrix to decompose
lower (bool) – Whether to return lower triangular matrix
- Returns:
L – Cholesky factor (lower or upper triangular)
- Return type:
jnp.ndarray
- Raises:
RuntimeError – If all decomposition methods fail
- class nlsq.SmartCache(cache_dir='.nlsq_cache', max_memory_items=1000, disk_cache_enabled=True, enable_stats=True)[source]¶
Bases:
objectIntelligent caching system for optimization computations.
This class provides:
Memory and disk caching with LRU eviction
Automatic cache key generation from function arguments
Cache persistence across sessions
Cache invalidation and warming strategies
Phase 3 Optimizations (3.2a):
Array hash optimization: uses stride-based sampling only for arrays with >10000 elements when xxhash is unavailable
For smaller arrays, hashes full array directly without redundant sampling, providing 15-20% improvement in cache key generation
All dict operations are protected by a per-instance
threading.Lockso that concurrent threads can safely callget/set/invalidate.- __init__(cache_dir='.nlsq_cache', max_memory_items=1000, disk_cache_enabled=True, enable_stats=True)[source]¶
Initialize smart cache.
- cache_key(*args, **kwargs)[source]¶
Generate cache key from arguments.
- Parameters:
- Returns:
key – Hash of arguments (xxhash if available, BLAKE2b fallback)
- Return type:
Notes
Uses xxhash (xxh64) when available for ~10x faster hashing compared to SHA256/BLAKE2b. Falls back to BLAKE2b if xxhash is not installed. All cache keys are prefixed with CACHE_VERSION to ensure old cache entries are invalidated when the hash algorithm changes.
Task 9.3 (3.2a): Array hash optimization - Arrays <= 10000 elements: hash full array directly (no sampling) - Arrays > 10000 elements: use stride-based sampling for efficiency - Removes redundant sampling when computing full hash in fallback path
- get(key)[source]¶
Get value from cache.
- Parameters:
key (str) – Cache key
- Returns:
value – Cached value or None if not found
- Return type:
Any or None
- set(key, value)[source]¶
Set value in cache.
- Parameters:
key (str) – Cache key
value (Any) – Value to cache
- invalidate(key=None)[source]¶
Invalidate cache entries.
- Parameters:
key (str, optional) – Specific key to invalidate, or None to clear all
- class nlsq.SparseJacobianComputer(sparsity_threshold=0.01)[source]¶
Bases:
objectCompute and manage sparse Jacobians for large-scale problems.
For many curve fitting problems, the Jacobian has a sparse structure where each data point only depends on a subset of parameters. This class exploits that structure to reduce memory usage by 10-100x.
- __init__(sparsity_threshold=0.01)[source]¶
Initialize sparse Jacobian computer.
- Parameters:
sparsity_threshold (float) – Elements with absolute value below this threshold are considered zero. Default is 0.01 which works well for most problems.
- detect_sparsity_pattern(func, x0, xdata_sample, n_samples=100)[source]¶
Detect sparsity pattern of Jacobian from sample evaluations.
- Parameters:
func (Callable) – Function to evaluate
x0 (np.ndarray) – Initial parameter values
xdata_sample (np.ndarray or list) – Sample of x data points. Can be a single array for 1D problems, a list of arrays [X, Y] for 2D problems, or a 2D array with shape (k, N) for multi-dimensional coordinates.
n_samples (int) – Number of samples to use for pattern detection
- Returns:
pattern (np.ndarray) – Boolean array indicating non-zero elements
sparsity (float) – Fraction of zero elements
- Return type:
- compute_sparse_jacobian(jac_func, x, xdata, ydata, data_mask=None, chunk_size=10000, func=None)[source]¶
Compute Jacobian in sparse format with chunking.
- Parameters:
jac_func (Callable) – Jacobian function
x (np.ndarray) – Current parameter values
xdata (np.ndarray) – Independent variable data
ydata (np.ndarray) – Dependent variable data
data_mask (np.ndarray, optional) – Mask for valid data points
chunk_size (int) – Size of chunks for computation
func (Callable, optional) – Original function for finite difference fallback
- Returns:
J_sparse – Sparse Jacobian matrix
- Return type:
csr_matrix
- sparse_matrix_vector_product(J_sparse, v)[source]¶
Efficient sparse matrix-vector product.
- Parameters:
J_sparse (csr_matrix) – Sparse Jacobian matrix
v (np.ndarray) – Vector to multiply
- Returns:
result – J @ v
- Return type:
np.ndarray
- sparse_normal_equations(J_sparse, f)[source]¶
Set up normal equations with sparse Jacobian.
Solves (J^T @ J) @ p = -J^T @ f without forming J^T @ J explicitly.
- class nlsq.SparseOptimizer(sparsity_threshold=0.01, min_sparsity=0.9, auto_detect=True)[source]¶
Bases:
objectOptimizer that uses sparse Jacobians for large-scale problems.
This optimizer automatically detects when sparse Jacobians would be beneficial and switches to sparse computations transparently.
- __init__(sparsity_threshold=0.01, min_sparsity=0.9, auto_detect=True)[source]¶
Initialize sparse optimizer.
- should_use_sparse(n_data, n_params, force_check=False)[source]¶
Determine if sparse methods should be used.
- optimize_with_sparsity(func, x0, xdata, ydata, **kwargs)[source]¶
Optimize using sparse Jacobian methods.
- Parameters:
func (Callable) – Objective function
x0 (np.ndarray) – Initial parameters
xdata (np.ndarray) – Independent variable data
ydata (np.ndarray) – Dependent variable data
**kwargs (Any) – Additional optimization parameters
- Returns:
result – Optimization result
- Return type:
- exception nlsq.StopOptimization[source]¶
Bases:
ExceptionException raised by callbacks to request early termination.
- class nlsq.TRFMemoryPool(m, n, dtype=<class 'jax.numpy.float64'>)[source]¶
Bases:
objectSpecialized memory pool for Trust Region Reflective algorithm.
Pre-allocates buffers for common TRF operations.
- Parameters:
- nlsq.apply_automatic_fixes(xdata, ydata, p0=None, stability_report=None, rescale_data=True)[source]¶
Automatically apply fixes for detected stability issues.
This function applies common fixes such as rescaling data and parameters based on the stability report.
- Parameters:
xdata (array_like) – Independent variable data
ydata (array_like) – Dependent variable data
p0 (array_like, optional) – Initial parameter guess
stability_report (dict, optional) – Report from check_problem_stability(). If None, will be computed.
rescale_data (bool, optional) – If True (default), rescale xdata/ydata to [0, 1] when ill-conditioned or large range is detected. Set to False for applications requiring unit preservation where data must maintain physical units (e.g., time in seconds, frequency in Hz). NaN/Inf handling and parameter normalization are still applied when stability=’auto’. Default: True.
- Returns:
xdata_fixed (ndarray) – Fixed xdata
ydata_fixed (ndarray) – Fixed ydata
p0_fixed (ndarray or None) – Fixed p0
fix_info (dict) – Information about applied fixes with keys: - ‘applied_fixes’: list of str - ‘x_scale’: float - ‘y_scale’: float - ‘x_offset’: float - ‘y_offset’: float
- Return type:
Examples
>>> x = np.linspace(0, 1e6, 100) >>> y = 2.0 * x + 1.0 >>> x_fixed, y_fixed, p0_fixed, info = apply_automatic_fixes(x, y, [2.0, 1.0]) >>> print(f"Applied fixes: {info['applied_fixes']}")
>>> # For applications requiring unit preservation, disable rescaling >>> x_fixed, y_fixed, p0_fixed, info = apply_automatic_fixes( ... x, y, [2.0, 1.0], rescale_data=False ... )
- nlsq.auto_select_algorithm(f, xdata, ydata, p0=None, bounds=None, memory_limit_gb=None, user_preferences=None)[source]¶
Automatically select best optimization algorithm.
- Parameters:
- Returns:
recommendations – Algorithm recommendations
- Return type:
- nlsq.cached_function(cache=None, ttl=None)[source]¶
Decorator for caching function results.
- Parameters:
cache (SmartCache, optional) – Cache instance to use (creates new if None)
ttl (float, optional) – Time-to-live in seconds for cached values
- Returns:
decorator – Decorator function
- Return type:
function
- nlsq.cached_jacobian(cache=None)[source]¶
Decorator specifically for caching Jacobian evaluations.
- Parameters:
cache (SmartCache, optional) – Cache instance to use
- Returns:
decorator – Decorator function
- Return type:
function
- nlsq.cached_jit(func=None, static_argnums=(), donate_argnums=())[source]¶
Decorator for caching JIT-compiled functions.
- Parameters:
- Returns:
decorated – Decorated function with cached compilation
- Return type:
callable
Examples
>>> @cached_jit ... def my_function(x): ... return x ** 2
>>> @cached_jit(static_argnums=(1,)) ... def my_function_with_static(x, n): ... return x ** n
- nlsq.check_problem_stability(xdata, ydata, p0=None, f=None)[source]¶
Comprehensive pre-flight stability check for optimization problem.
Identifies potential numerical issues before optimization begins, providing warnings and recommendations for fixes.
- Parameters:
xdata (array_like) – Independent variable data
ydata (array_like) – Dependent variable data
p0 (array_like, optional) – Initial parameter guess
f (callable, optional) – Model function (currently unused, reserved for future checks)
- Returns:
report – Stability report with keys: - ‘issues’: list of (issue_type, message, severity) tuples - ‘condition_number’: float - ‘parameter_scale_ratio’: float or None - ‘has_collinearity’: bool - ‘recommendations’: list of str - ‘severity’: str (‘ok’, ‘warning’, ‘critical’)
- Return type:
Examples
>>> x = np.linspace(0, 1e6, 100) >>> y = 2.0 * x + 1.0 >>> p0 = [2.0, 1.0] >>> report = check_problem_stability(x, y, p0) >>> print(f"Severity: {report['severity']}") >>> for issue_type, message, severity in report['issues']: ... print(f"{severity}: {message}")
- nlsq.clear_compilation_cache()[source]¶
Clear the global compilation cache, function hash cache, and reset stats.
This function is useful in interactive environments (Jupyter notebooks, IPython, REPL) where model functions may be redefined during development. When a function is redefined with the same name but different code, the compilation cache may return stale compiled versions. Calling this function clears the cache and allows fresh compilation of redefined functions.
The function also resets all statistics counters (hits, misses, compilations) to zero, enabling accurate cache performance tracking after the clear.
Examples
In a Jupyter notebook, after redefining a model function:
>>> import jax.numpy as jnp >>> from nlsq.caching.compilation_cache import clear_compilation_cache >>> from nlsq import curve_fit >>> >>> # Initial model definition >>> def model(x, a, b): ... return a * jnp.exp(-b * x) >>> >>> # ... some fitting work ... >>> >>> # Redefine the model with different implementation >>> def model(x, a, b): ... return a * jnp.exp(-b * x) + 0.1 # Added offset >>> >>> # Clear cache to ensure new model is compiled >>> clear_compilation_cache() >>> >>> # Now curve_fit will use the updated model >>> popt, pcov = curve_fit(model, xdata, ydata, p0=[1.0, 0.1])
- nlsq.clear_global_pool()[source]¶
Clear the global memory pool.
Notes
For test isolation, this resets the global pool to None, forcing fresh initialization on next access.
- nlsq.configure_for_large_datasets(memory_limit_gb=8.0, enable_chunking=True, progress_reporting=True)[source]¶
Configure NLSQ for optimal large dataset performance.
This function sets up memory management, chunking, streaming, and other settings for handling large datasets efficiently.
- Parameters:
Notes
All large datasets use streaming optimization for 100% data utilization.
Examples
>>> from nlsq.config import configure_for_large_datasets >>> # Configure for large datasets with 16GB memory limit >>> configure_for_large_datasets( ... memory_limit_gb=16.0, ... progress_reporting=True ... )
- nlsq.detect_collinearity(xdata, threshold=0.95)[source]¶
Detect collinearity in multidimensional input data.
Collinearity occurs when predictors are highly correlated, leading to unstable parameter estimates.
- Parameters:
xdata (array_like) – Independent variable data (multidimensional)
threshold (float, optional) – Correlation threshold for detecting collinearity. Default: 0.95
- Returns:
has_collinearity (bool) – True if any pair of variables is highly correlated
collinear_pairs (list of tuple) – List of (i, j, correlation) for collinear variable pairs
- Return type:
Examples
>>> x1 = np.linspace(0, 10, 100) >>> x2 = 2 * x1 + 0.01 * np.random.randn(100) # Nearly collinear >>> xdata = np.column_stack([x1, x2]) >>> has_coll, pairs = detect_collinearity(xdata) >>> print(f"Collinear: {has_coll}") Collinear: True
- nlsq.detect_jacobian_sparsity(func, x0, xdata_sample, threshold=0.01)[source]¶
Detect and analyze Jacobian sparsity for a given problem.
- Parameters:
func (Callable) – Objective function
x0 (np.ndarray) – Initial parameters
xdata_sample (np.ndarray or list) – Sample of x data. Can be a single array for 1D problems, a list of arrays [X, Y] for 2D problems, or a 2D array with shape (k, N) for multi-dimensional coordinates.
threshold (float) – Threshold for zero elements
- Returns:
sparsity (float) – Fraction of zero elements
info (dict) – Additional sparsity information
- Return type:
- nlsq.detect_parameter_scale_mismatch(p0, threshold=1000000.0)[source]¶
Detect if parameter scales differ by too many orders of magnitude.
Large scale differences can cause numerical issues and slow convergence.
- Parameters:
p0 (array_like) – Initial parameter guess
threshold (float, optional) – Ratio threshold for detecting mismatch. Default: 1e6
- Returns:
has_mismatch (bool) – True if parameter scales differ by more than threshold
scale_ratio (float) – Ratio of largest to smallest parameter magnitude
- Return type:
Examples
>>> p0 = np.array([1e-3, 1e3, 1.0]) >>> has_mismatch, ratio = detect_parameter_scale_mismatch(p0) >>> print(f"Mismatch: {has_mismatch}, Ratio: {ratio:.2e}") Mismatch: True, Ratio: 1.00e+06
- nlsq.estimate_condition_number(xdata)[source]¶
Estimate the condition number of the data matrix.
This checks if the independent variable data is well-conditioned for least squares fitting. High condition numbers indicate numerical instability.
- Parameters:
xdata (array_like) – Independent variable data (can be 1D or 2D)
- Returns:
condition_number – Estimated condition number. Values > 1e10 indicate potential problems.
- Return type:
Notes
For 1D data, constructs a Vandermonde-like matrix with [1, x, x^2]. For multidimensional data, computes the condition number directly.
- nlsq.estimate_memory_requirements(n_points, n_params)[source]¶
Estimate memory requirements for a dataset.
- Parameters:
- Returns:
Memory requirements and processing recommendations
- Return type:
DatasetStats
Examples
>>> from nlsq.streaming.large_dataset import estimate_memory_requirements >>> >>> # Estimate requirements for 50M points, 3 parameters >>> stats = estimate_memory_requirements(50_000_000, 3) >>> print(f"Estimated memory: {stats.total_memory_estimate_gb:.2f} GB") >>> print(f"Recommended chunk size: {stats.recommended_chunk_size:,}") >>> print(f"Number of chunks: {stats.n_chunks}")
- nlsq.fit_large_dataset(f, xdata, ydata, p0=None, memory_limit_gb=8.0, show_progress=False, logger=None, multistart=False, n_starts=10, sampler='lhs', **kwargs)[source]¶
Convenience function for fitting large datasets.
- Parameters:
f (callable) – The model function f(x, *params) -> y
xdata (np.ndarray) – Independent variable data
ydata (np.ndarray) – Dependent variable data
p0 (array-like, optional) – Initial parameter guess
memory_limit_gb (float, optional) – Memory limit in GB (default: 8.0)
show_progress (bool, optional) – Whether to show progress (default: False)
logger (logging.Logger, optional) – External logger for application integration (default: None)
multistart (bool, optional) – Enable multi-start optimization for global search (default: False). When enabled, explores multiple starting points on full data before running the full chunked optimization.
n_starts (int, optional) – Number of starting points for multi-start optimization (default: 10). Set to 0 to disable multi-start even when multistart=True.
sampler (str, optional) – Sampling strategy for generating starting points (default: ‘lhs’). Options: ‘lhs’ (Latin Hypercube), ‘sobol’, ‘halton’.
**kwargs – Additional arguments passed to curve_fit
- Returns:
Optimization result
- Return type:
Examples
>>> from nlsq.streaming.large_dataset import fit_large_dataset >>> import numpy as np >>> import jax.numpy as jnp >>> >>> # Generate large dataset >>> x_large = np.linspace(0, 10, 5_000_000) >>> y_large = 2.5 * np.exp(-1.3 * x_large) + np.random.normal(0, 0.1, len(x_large)) >>> >>> # Fit with automatic memory management >>> result = fit_large_dataset( ... lambda x, a, b: a * jnp.exp(-b * x), ... x_large, y_large, ... p0=[2.0, 1.0], ... memory_limit_gb=4.0, ... show_progress=True ... ) >>> print(f"Fitted parameters: {result.popt}") >>> print(f"Success rate: {result.success_rate:.1%}") >>> >>> # Fit with multi-start optimization >>> result = fit_large_dataset( ... lambda x, a, b: a * jnp.exp(-b * x), ... x_large, y_large, ... p0=[2.0, 1.0], ... bounds=([0, 0], [10, 5]), ... multistart=True, ... n_starts=10, ... sampler='lhs' ... ) >>> >>> # Check failure diagnostics if some chunks failed >>> if result.failure_summary['total_failures'] > 0: ... print(f"Failed chunks: {result.failure_summary['failed_chunk_indices']}") ... print(f"Common errors: {result.failure_summary['common_errors']}")
- nlsq.get_global_cache()[source]¶
Get global cache instance.
- Returns:
cache – Global cache instance
- Return type:
- nlsq.get_global_compilation_cache()[source]¶
Get or create global compilation cache (thread-safe).
Uses double-checked locking to avoid acquiring the lock on every call once the singleton has been initialized.
- Returns:
cache – Global compilation cache instance
- Return type:
- nlsq.get_global_pool(enable_stats=False)[source]¶
Get or create global memory pool.
- Parameters:
enable_stats (bool) – Enable statistics tracking
- Returns:
pool – Global memory pool instance
- Return type:
- nlsq.get_global_profiler()[source]¶
Get the global profiler instance.
- Returns:
profiler – Global profiler
- Return type:
- nlsq.get_jit_cache()[source]¶
Get JIT compilation cache.
- Returns:
cache – JIT compilation cache
- Return type:
JITCompilationCache
- nlsq.get_large_dataset_config()[source]¶
Get the current large dataset configuration.
- Returns:
Current large dataset configuration
- Return type:
LargeDatasetConfig
- nlsq.get_memory_config()[source]¶
Get the current memory configuration.
- Returns:
Current memory configuration
- Return type:
MemoryConfig
- nlsq.get_memory_manager()[source]¶
Get or create global memory manager instance.
- Returns:
manager – Global memory manager instance
- Return type:
- nlsq.get_memory_stats()[source]¶
Get memory usage statistics.
- Returns:
stats – Memory statistics
- Return type:
- nlsq.infer_bounds(xdata, ydata, p0, safety_factor=10.0, enforce_positivity=True)[source]¶
Infer reasonable parameter bounds from data and initial guess.
This is a convenience function that creates a BoundsInference instance and returns the inferred bounds.
- Parameters:
xdata (array_like) – Independent variable data
ydata (array_like) – Dependent variable data
p0 (array_like) – Initial parameter guess
safety_factor (float, optional) – Multiplier for bound ranges (larger = more conservative). Default: 10.0
enforce_positivity (bool, optional) – Force all bounds to be non-negative if p0 is positive. Default: True
- Returns:
lower_bounds (ndarray) – Lower bounds for each parameter
upper_bounds (ndarray) – Upper bounds for each parameter
- Return type:
Examples
Basic usage:
>>> import numpy as np >>> x = np.linspace(0, 10, 100) >>> y = 2.5 * np.exp(-0.5 * x) + 1.0 >>> p0 = [2.0, 0.5, 1.0] >>> bounds = infer_bounds(x, y, p0) >>> print(f"Lower: {bounds[0]}") >>> print(f"Upper: {bounds[1]}")
With custom safety factor:
>>> bounds = infer_bounds(x, y, p0, safety_factor=20.0) # More conservative
Allow negative parameters:
>>> bounds = infer_bounds(x, y, p0, enforce_positivity=False)
- nlsq.large_dataset_context(large_dataset_config)[source]¶
Context manager for temporarily changing large dataset configuration.
- Parameters:
large_dataset_config (LargeDatasetConfig) – Temporary large dataset configuration
Examples
>>> from nlsq.config import large_dataset_context, LargeDatasetConfig >>> temp_config = LargeDatasetConfig(enable_automatic_solver_selection=True) >>> with large_dataset_context(temp_config): ... # Code here uses automatic solver selection ... result = fit_large_dataset(func, x, y)
- nlsq.memory_context(memory_config)[source]¶
Context manager for temporarily changing memory configuration.
- Parameters:
memory_config (MemoryConfig) – Temporary memory configuration
Examples
>>> from nlsq.config import memory_context, MemoryConfig >>> temp_config = MemoryConfig(memory_limit_gb=16.0) >>> with memory_context(temp_config): ... # Code here runs with increased memory limit ... result = fit_large_dataset(func, x, y) >>> # Back to previous memory settings
- nlsq.merge_bounds(inferred_bounds, user_bounds=None)[source]¶
Merge user-provided bounds with inferred bounds.
User-provided bounds take precedence. If user provides partial bounds (e.g., only lower or only for some parameters), the remaining bounds are filled from inferred bounds.
- Parameters:
- Returns:
lower_bounds (ndarray) – Merged lower bounds
upper_bounds (ndarray) – Merged upper bounds
- Return type:
Examples
>>> inferred = (np.array([0, 0, 0]), np.array([10, 5, 10])) >>> user = (np.array([1, -np.inf, 0]), np.array([5, np.inf, 10])) >>> merged = merge_bounds(inferred, user) >>> print(merged) (array([1., 0., 0.]), array([5., 5., 10.]))
Notes
If user provides -np.inf for lower bound, use inferred lower bound
If user provides np.inf for upper bound, use inferred upper bound
Scalar user bounds are broadcast to all parameters
- nlsq.set_memory_limits(memory_limit_gb, gpu_memory_fraction=None, safety_factor=0.8)[source]¶
Set memory limits for NLSQ operations.
- Parameters:
Examples
>>> from nlsq.config import set_memory_limits >>> # Set 16GB memory limit with 80% GPU memory usage >>> set_memory_limits(16.0, gpu_memory_fraction=0.8)