nlsq.hybrid_streaming_config module

Configuration for adaptive hybrid streaming optimizer.

This module provides configuration options for the four-phase hybrid optimizer that combines parameter normalization, L-BFGS warmup, streaming Gauss-Newton, and exact covariance computation.

class nlsq.streaming.hybrid_config.HybridStreamingConfig(normalize=True, normalization_strategy='auto', warmup_iterations=200, max_warmup_iterations=500, warmup_learning_rate=0.001, loss_plateau_threshold=0.0001, gradient_norm_threshold=0.001, active_switching_criteria=None, lbfgs_history_size=10, lbfgs_initial_step_size=0.1, lbfgs_line_search='wolfe', lbfgs_exploration_step_size=0.1, lbfgs_refinement_step_size=1.0, use_learning_rate_schedule=False, lr_schedule_warmup_steps=50, lr_schedule_decay_steps=450, lr_schedule_end_value=0.0001, gradient_clip_value=None, enable_warm_start_detection=True, warm_start_threshold=0.01, enable_adaptive_warmup_lr=True, warmup_lr_refinement=1e-06, warmup_lr_careful=1e-05, enable_cost_guard=True, cost_increase_tolerance=0.05, enable_step_clipping=True, max_warmup_step_size=0.1, gauss_newton_max_iterations=100, gauss_newton_tol=1e-08, trust_region_initial=1.0, regularization_factor=1e-10, cg_max_iterations=100, cg_relative_tolerance=0.0001, cg_absolute_tolerance=1e-10, cg_param_threshold=2000, enable_group_variance_regularization=False, group_variance_lambda=0.01, group_variance_indices=None, enable_residual_weighting=False, residual_weights=None, chunk_size=10000, gc_chunk_interval=10, loop_strategy='auto', enable_checkpoints=True, checkpoint_frequency=100, checkpoint_dir=None, resume_from_checkpoint=None, validate_numerics=True, enable_fault_tolerance=True, max_retries_per_batch=2, min_success_rate=0.5, enable_multi_device=False, callback_frequency=10, verbose=1, log_frequency=1, enable_multistart=False, n_starts=10, multistart_sampler='lhs', elimination_rounds=3, elimination_fraction=0.5, batches_per_round=50, center_on_p0=True, scale_factor=1.0)

Bases: object

Configuration for adaptive hybrid streaming optimizer.

This configuration class controls all aspects of the four-phase hybrid optimizer:

  • Phase 0: Parameter normalization setup

  • Phase 1: L-BFGS warmup with adaptive switching

  • Phase 2: Streaming Gauss-Newton with exact J^T J accumulation

  • Phase 3: Denormalization and covariance transform
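The normalization in Phase 0 and the covariance transform in Phase 3 are both bookkeeping around an affine map. The sketch below assumes each normalization_strategy reduces to p = offset + scale * x, which is a plausible reading of the strategy descriptions under normalization_strategy rather than nlsq's code. The helper names are hypothetical, and the 'p0' rule (scale by abs(p0), falling back to 1.0 for zero entries) is an assumption. Because the map is affine, a covariance computed in normalized coordinates converts back exactly as Cov(p) = S Cov(x) S with S = diag(scale).

```python
def make_affine_normalizer(p0, bounds=None, strategy="auto"):
    """Return (offset, scale) such that p = offset + scale * x.

    Hypothetical helper; the zero-guard in the 'p0' branch is an assumption.
    """
    if strategy == "auto":
        strategy = "bounds" if bounds is not None else "p0"
    n = len(p0)
    if strategy == "bounds":
        lower, upper = bounds
        offset = [float(lo) for lo in lower]
        scale = [float(hi - lo) for lo, hi in zip(lower, upper)]  # maps [lo, hi] to [0, 1]
    elif strategy == "p0":
        offset = [0.0] * n
        scale = [abs(v) if v != 0 else 1.0 for v in p0]  # avoid dividing by zero
    elif strategy == "none":
        offset, scale = [0.0] * n, [1.0] * n
    else:
        raise ValueError(f"unknown normalization strategy: {strategy!r}")
    return offset, scale


def normalize(p, offset, scale):
    return [(v - o) / s for v, o, s in zip(p, offset, scale)]


def denormalize(x, offset, scale):
    return [o + s * v for v, o, s in zip(x, offset, scale)]


def denormalize_covariance(cov_x, scale):
    # p = offset + S x with S = diag(scale), hence Cov(p) = S Cov(x) S.
    n = len(scale)
    return [[scale[i] * cov_x[i][j] * scale[j] for j in range(n)] for i in range(n)]


offset, scale = make_affine_normalizer([2.0, 50.0], bounds=([0.0, 0.0], [10.0, 100.0]))
x = normalize([2.0, 50.0], offset, scale)  # approximately [0.2, 0.5]
cov_p = denormalize_covariance([[0.01, 0.0], [0.0, 0.0004]], scale)
print(x, denormalize(x, offset, scale), cov_p)
```

Only the diagonal scaling matters for the covariance; the offset cancels because it does not vary with the data.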

Parameters:
  • normalize (bool, default=True) – Enable parameter normalization. When True, parameters are normalized to similar scales to improve gradient signal quality and convergence speed.

  • normalization_strategy (str, default='auto') –

    Strategy for parameter normalization. Options:

    • 'auto': Use bounds-based normalization if bounds are provided, else p0-based

    • 'bounds': Normalize to [0, 1] using parameter bounds

    • 'p0': Scale by initial parameter magnitudes

    • 'none': Identity transform (no normalization)

  • warmup_iterations (int, default=200) – Number of L-BFGS warmup iterations before the switch criteria are checked. With L-BFGS, 20-50 iterations are typically sufficient (5-10x fewer than Adam). More iterations allow better initial convergence before switching to Gauss-Newton.

  • max_warmup_iterations (int, default=500) – Maximum L-BFGS warmup iterations before forced switch to Phase 2. Safety limit to prevent indefinite warmup when loss plateaus slowly.

  • warmup_learning_rate (float, default=0.001) – Legacy warmup step size retained for backward compatibility. L-BFGS warmup uses lbfgs_initial_step_size and adaptive step sizes.

  • loss_plateau_threshold (float, default=1e-4) – Relative loss improvement threshold for plateau detection. Switch to Phase 2 if abs(loss - prev_loss) / (abs(prev_loss) + eps) < threshold. Smaller values make plateau detection stricter and therefore delay the switch.

  • gradient_norm_threshold (float, default=1e-3) – Gradient norm threshold for an early Phase 2 switch. Switch to Phase 2 if ||gradient|| < threshold. A small gradient norm indicates that the optimizer is close to an optimum, where Gauss-Newton is effective.

  • active_switching_criteria (list, default=['plateau', 'gradient', 'max_iter']) –

    List of active switching criteria for the Phase 1 -> Phase 2 transition. The signature default is None, which selects the three criteria listed here. Available criteria:

    • 'plateau': Loss plateau detection (loss_plateau_threshold)

    • 'gradient': Gradient norm below threshold (gradient_norm_threshold)

    • 'max_iter': Maximum iterations reached (max_warmup_iterations)

    The switch occurs when ANY active criterion is met.

  • lbfgs_history_size (int, default=10) – Number of recent (parameter update, gradient change) pairs stored for the L-BFGS Hessian approximation. The default matches SciPy's L-BFGS-B (maxcor=10) and is consistent with the small values (roughly 3-20) recommended by Nocedal & Wright. Larger values give a better Hessian approximation but use more memory.

  • lbfgs_initial_step_size (float, default=0.1) – Initial step size for L-BFGS during the cold start (the first lbfgs_history_size iterations, while the history buffer fills). A small value prevents overshooting while the Hessian approximation is still poor (it starts as the identity matrix).

  • lbfgs_line_search (str, default='wolfe') –

    Line search method for L-BFGS step acceptance. Options:

    • 'wolfe': Standard Wolfe conditions (default)

    • 'strong_wolfe': Strong Wolfe conditions (stricter)

    • 'backtracking': Simple backtracking line search

  • lbfgs_exploration_step_size (float, default=0.1) – L-BFGS initial step size for exploration mode (high relative loss). Small value prevents first “Hessian=Identity” step from overshooting.

  • lbfgs_refinement_step_size (float, default=1.0) – L-BFGS initial step size for refinement mode (low relative loss). Larger value leverages L-BFGS’s near-Newton convergence speed when close to optimum.

  • gauss_newton_max_iterations (int, default=100) – Maximum iterations for Phase 2 Gauss-Newton optimization. Typical values: 50-200.

  • gauss_newton_tol (float, default=1e-8) – Convergence tolerance for Phase 2 (gradient norm threshold). Optimization stops if: ||gradient|| < tol.

  • trust_region_initial (float, default=1.0) – Initial trust region radius for Gauss-Newton step control. Radius is adapted based on actual vs predicted reduction ratio.

  • regularization_factor (float, default=1e-10) – Regularization factor for rank-deficient J^T J matrices. Added to diagonal: J^T J + regularization_factor * I.

  • cg_max_iterations (int, default=100) – Maximum iterations for the Conjugate Gradient (CG) solver in Phase 2. CG is used when the parameter count reaches or exceeds cg_param_threshold. Higher values allow better convergence at the cost of more computation.

  • cg_relative_tolerance (float, default=1e-4) – Relative tolerance for CG convergence. Convergence check: ||r|| < cg_relative_tolerance * ||J^T r_0||. This implements an inexact Newton strategy: each linear system is solved only approximately, which saves computation.

  • cg_absolute_tolerance (float, default=1e-10) – Absolute tolerance floor for CG solver convergence. Safety floor to prevent over-iteration on well-conditioned systems.

  • cg_param_threshold (int, default=2000) –

    Parameter count threshold for auto-selecting CG vs materialized solver.

    • p < threshold: Use materialized J^T J with SVD solve (faster for small p)

    • p >= threshold: Use CG with implicit matvec (O(p) memory vs O(p^2))

    Threshold balances memory savings vs additional data passes for CG.

  • enable_group_variance_regularization (bool, default=False) –

    Enable variance regularization for parameter groups. When enabled, adds a penalty term to the loss function that penalizes variance within specified parameter groups. This is essential for preventing per-group parameter absorption in multi-component fitting.

    The regularized loss becomes L = MSE + group_variance_lambda * sum(Var(group_i)) where each group_i is a slice of parameters defined by group_variance_indices.

  • group_variance_lambda (float, default=0.01) – Regularization strength for group variance penalty. Larger values more strongly penalize variance within parameter groups. Use 0.001-0.01 for light regularization (allows moderate group variation), 0.1-1.0 for moderate regularization (constrains groups to be similar), or 10-1000 for strong regularization (forces groups to be nearly uniform). For multi-component fits with per-group parameters, use lambda ~ 0.1 * n_data / (n_groups * sigma^2) where sigma is the expected experimental variation (~0.05 for 5%).

  • group_variance_indices (list of tuple, default=None) –

    List of (start, end) tuples defining parameter groups for variance regularization. Each tuple specifies a slice [start:end] of the parameter vector that should have low internal variance.

    Example for 23 independent groups: group_variance_indices = [(0, 23), (23, 46)] constrains contrast params [0:23] and offset params [23:46] to each have low variance, preventing them from absorbing group-dependent physical signals.

    If None when enable_group_variance_regularization=True, no groups are regularized (effectively disabling the feature).

  • chunk_size (int, default=10000) – Size of data chunks for streaming J^T J accumulation. Larger chunks are faster but use more memory. Typical values: 5000-50000.

  • gc_chunk_interval (int, default=10) – Chunks between gc.collect() calls (FR-007). Controls how often garbage collection runs during chunked processing. Higher values reduce GC overhead but may increase memory usage. Default of 10 balances memory reclamation with performance.

  • enable_checkpoints (bool, default=True) – Enable checkpoint save/resume for fault tolerance.

  • checkpoint_frequency (int, default=100) – Save checkpoint every N iterations (across all phases).

  • validate_numerics (bool, default=True) – Enable NaN/Inf validation at gradient, parameter, and loss computation points.

  • enable_multi_device (bool, default=False) – Enable multi-GPU/TPU parallelism for Jacobian computation. Uses JAX pmap for data-parallel computation across devices.

  • callback_frequency (int, default=10) – Call progress callback every N iterations (if callback provided).

  • enable_multistart (bool, default=False) – Enable multi-start optimization with tournament selection during Phase 1. When enabled, multiple starting points are generated with the sampler selected by multistart_sampler (LHS by default), and tournament elimination selects the best candidate for Phase 2.

  • n_starts (int, default=10) – Number of starting points for multi-start optimization. Only used when enable_multistart=True.

  • multistart_sampler (str, default='lhs') – Sampling method for generating starting points. Options: 'lhs' (Latin Hypercube), 'sobol', 'halton'.

  • elimination_rounds (int, default=3) – Number of tournament elimination rounds. Each round eliminates elimination_fraction of candidates.

  • elimination_fraction (float, default=0.5) – Fraction of candidates to eliminate per round. Must be in (0, 1). Default 0.5 = eliminate half each round.

  • batches_per_round (int, default=50) – Number of data batches to use for evaluation in each tournament round. More batches = more reliable selection but slower.
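The Phase 1 -> Phase 2 switching rules described under loss_plateau_threshold, gradient_norm_threshold, and active_switching_criteria can be sketched as a single predicate. This is an illustrative reimplementation, not nlsq's code: the function names, the value of eps, the order in which fired criteria are reported, and the use of iteration >= max_warmup_iterations are assumptions, and gating by warmup_iterations is omitted.

```python
import math

EPS = 1e-12  # assumed value for the "eps" in the documented plateau formula


def relative_improvement(loss, prev_loss, eps=EPS):
    return abs(loss - prev_loss) / (abs(prev_loss) + eps)


def should_switch(iteration, loss, prev_loss, grad,
                  loss_plateau_threshold=1e-4,
                  gradient_norm_threshold=1e-3,
                  max_warmup_iterations=500,
                  active_switching_criteria=None):
    """Return the first active criterion that fires, or None to keep warming up."""
    if active_switching_criteria is None:
        active_switching_criteria = ["plateau", "gradient", "max_iter"]
    grad_norm = math.sqrt(sum(g * g for g in grad))
    fired = {
        "plateau": prev_loss is not None
        and relative_improvement(loss, prev_loss) < loss_plateau_threshold,
        "gradient": grad_norm < gradient_norm_threshold,
        "max_iter": iteration >= max_warmup_iterations,
    }
    # ANY active criterion triggers the switch.
    for name in active_switching_criteria:
        if fired[name]:
            return name
    return None


print(should_switch(10, 1.0, 1.00001, [0.5]))  # plateau
print(should_switch(10, 0.5, 1.0, [1e-4]))     # gradient
print(should_switch(10, 0.5, 1.0, [0.5]))      # None
```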
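Streaming J^T J accumulation works because J^T J and J^T r are sums of per-row outer products, so splitting the data into chunks of chunk_size changes peak memory but not the result. The sketch below uses a toy straight-line model; all names are hypothetical. To stay dependency-free it solves the regularized system (J^T J + regularization_factor * I) delta = J^T r with Gaussian elimination instead of the SVD solve described under cg_param_threshold.

```python
def accumulate_normal_equations(xdata, ydata, params, model, jac_row, chunk_size):
    """Accumulate J^T J and J^T r one chunk at a time, with r = y - model(x, params)."""
    p = len(params)
    jtj = [[0.0] * p for _ in range(p)]
    jtr = [0.0] * p
    for start in range(0, len(xdata), chunk_size):
        # Only this chunk's rows of J are needed at any moment.
        for x, y in zip(xdata[start:start + chunk_size], ydata[start:start + chunk_size]):
            r = y - model(x, params)
            row = jac_row(x, params)
            for i in range(p):
                jtr[i] += row[i] * r
                for j in range(p):
                    jtj[i][j] += row[i] * row[j]
    return jtj, jtr


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting (stand-in for an SVD solve)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        sol[i] = (m[i][n] - sum(m[i][j] * sol[j] for j in range(i + 1, n))) / m[i][i]
    return sol


def gauss_newton_step(xdata, ydata, params, model, jac_row, chunk_size=10000,
                      regularization_factor=1e-10):
    jtj, jtr = accumulate_normal_equations(xdata, ydata, params, model, jac_row, chunk_size)
    for i in range(len(params)):
        jtj[i][i] += regularization_factor  # J^T J + regularization_factor * I
    delta = solve_linear(jtj, jtr)
    return [p + d for p, d in zip(params, delta)]


def line(x, params):
    return params[0] + params[1] * x


def line_jac_row(x, params):
    return [1.0, x]  # d(model)/d(params)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x for x in xs]
print(gauss_newton_step(xs, ys, [0.0, 0.0], line, line_jac_row, chunk_size=2))  # close to [1.0, 2.0]
```

For a linear model a single Gauss-Newton step reaches the least-squares solution, which makes the result easy to check; nonlinear models need repeated steps under trust-region control.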
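For p >= cg_param_threshold the documented alternative is CG with an implicit matrix-vector product. It needs only O(p) memory because (J^T J) v can be formed as J^T (J v), one Jacobian row at a time. The sketch below is a textbook CG. Combining the two tolerances as max(cg_relative_tolerance * ||b||, cg_absolute_tolerance), with b = J^T r_0 the right-hand side, is an assumption about how the floor is applied, and the function names are hypothetical. A real streaming solver would recompute the Jacobian rows chunk by chunk instead of holding them in a list.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def jtj_matvec(jacobian_rows, v, regularization_factor=0.0):
    """Return (J^T J + regularization_factor * I) v without forming J^T J."""
    out = [regularization_factor * vi for vi in v]
    for row in jacobian_rows:
        jv = dot(row, v)  # one entry of J v
        for i, a in enumerate(row):
            out[i] += a * jv  # accumulate J^T (J v)
    return out


def conjugate_gradient(matvec, b, rel_tol=1e-4, abs_tol=1e-10, max_iter=100):
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the starting guess x = 0
    d = list(r)
    rr = dot(r, r)
    threshold = max(rel_tol * math.sqrt(dot(b, b)), abs_tol)
    iterations = 0
    while iterations < max_iter and math.sqrt(rr) >= threshold:
        ad = matvec(d)
        alpha = rr / dot(d, ad)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, ad)]
        rr_new = dot(r, r)
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]
        rr = rr_new
        iterations += 1
    return x, iterations


rows = [[1.0, x] for x in (0.0, 1.0, 2.0, 3.0, 4.0)]  # Jacobian of a + b*x
b = [25.0, 70.0]  # J^T r_0 for y = 1 + 2x at starting params [0, 0]
solution, n_iter = conjugate_gradient(lambda v: jtj_matvec(rows, v), b)
print(solution, n_iter)
```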
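The regularized loss defined under enable_group_variance_regularization, L = MSE + group_variance_lambda * sum(Var(group_i)), can be written in a few lines. This sketch assumes Var is the population variance (ddof=0) and MSE is the mean of squared residuals; the function names are hypothetical.

```python
def group_variance_penalty(params, group_variance_indices, group_variance_lambda):
    """group_variance_lambda * sum of Var(params[start:end]) over all groups."""
    total = 0.0
    for start, end in group_variance_indices:
        group = params[start:end]
        mean = sum(group) / len(group)
        total += sum((v - mean) ** 2 for v in group) / len(group)
    return group_variance_lambda * total


def regularized_loss(residuals, params, group_variance_indices, group_variance_lambda):
    mse = sum(r * r for r in residuals) / len(residuals)
    return mse + group_variance_penalty(params, group_variance_indices, group_variance_lambda)


params = [1.0, 1.0, 1.0, 2.0, 4.0]
indices = [(0, 3), (3, 5)]  # two groups: params[0:3] and params[3:5]
print(regularized_loss([1.0, -1.0], params, indices, 0.5))  # 1.0 + 0.5 * (0.0 + 1.0) = 1.5
```

The first group is already uniform and contributes nothing; only the spread in the second group is penalized.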

Examples

Default configuration:

>>> from nlsq import HybridStreamingConfig
>>> config = HybridStreamingConfig()
>>> config.warmup_iterations
200

Aggressive profile (faster convergence with L-BFGS):

>>> config = HybridStreamingConfig.aggressive()
>>> config.warmup_iterations
50

Conservative profile (higher quality):

>>> config = HybridStreamingConfig.conservative()
>>> config.gauss_newton_tol < 1e-8
True

Memory-optimized profile:

>>> config = HybridStreamingConfig.memory_optimized()
>>> config.chunk_size < 10000
True

Custom configuration:

>>> config = HybridStreamingConfig(
...     warmup_iterations=50,
...     lbfgs_history_size=15,
...     chunk_size=5000,
... )

With multi-start tournament selection:

>>> config = HybridStreamingConfig(
...     enable_multistart=True,
...     n_starts=20,
...     elimination_rounds=3,
...     batches_per_round=50,
... )
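The multi-start options combine a space-filling sampler with progressive elimination. The sketch below draws Latin Hypercube points in the unit cube, scales them to box bounds, and runs a tournament in which every surviving candidate is scored on the same batches in each round. It is an illustration under stated assumptions, not TournamentSelector: the rounding rule for the number of survivors, the mean-loss score, cycling through the batch list, the toy objective, and all function names are hypothetical.

```python
import itertools
import random


def latin_hypercube(n_samples, n_dims, rng):
    """One point per stratum [k/n, (k+1)/n) in every dimension of the unit cube."""
    points = [[0.0] * n_dims for _ in range(n_samples)]
    for d in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)
        for i, k in enumerate(strata):
            points[i][d] = (k + rng.random()) / n_samples
    return points


def survivors_after(n_alive, elimination_fraction):
    # Assumed rounding rule: eliminate floor(n * fraction), always keep at least one.
    return max(1, n_alive - int(n_alive * elimination_fraction))


def run_tournament(candidates, loss_fn, batches, elimination_rounds=3,
                   elimination_fraction=0.5, batches_per_round=50):
    batch_stream = itertools.cycle(batches)
    alive = list(candidates)
    for _ in range(elimination_rounds):
        round_batches = [next(batch_stream) for _ in range(batches_per_round)]
        # Score every survivor on the same batches so the comparison is fair.
        scores = [sum(loss_fn(c, b) for b in round_batches) / batches_per_round for c in alive]
        ranked = [c for _, c in sorted(zip(scores, alive), key=lambda pair: pair[0])]
        alive = ranked[:survivors_after(len(alive), elimination_fraction)]
    return alive[0]


rng = random.Random(0)
lower, upper = [-5.0, -5.0], [5.0, 5.0]
starts = [[lo + u * (hi - lo) for u, lo, hi in zip(pt, lower, upper)]
          for pt in latin_hypercube(10, 2, rng)]


def loss(c, batch):
    # Toy objective: each batch shifts the target point (1, -2) by a small offset.
    return (c[0] - 1.0 - batch) ** 2 + (c[1] + 2.0 - batch) ** 2


batches = [rng.gauss(0.0, 0.1) for _ in range(20)]
best = run_tournament(starts, loss, batches, elimination_rounds=3,
                      elimination_fraction=0.5, batches_per_round=5)
print(best)
```

Under this rounding rule, 10 starts with elimination_fraction=0.5 shrink to 5, 3, and then 2 survivors over three rounds.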

See also

AdaptiveHybridStreamingOptimizer

Optimizer that uses this configuration

curve_fit

High-level interface with method='hybrid_streaming'

TournamentSelector

Tournament selection for multi-start optimization

Notes

Based on Adaptive Hybrid Streaming Optimizer specification: agent-os/specs/2025-12-18-adaptive-hybrid-streaming-optimizer/spec.md

L-BFGS replaces Adam for warmup, providing 5-10x faster convergence to the basin of attraction through approximate Hessian information.
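The approximate Hessian information that L-BFGS uses comes from the standard two-loop recursion over the most recent lbfgs_history_size curvature pairs, as described by Nocedal & Wright. The sketch below pairs that recursion with an Armijo backtracking line search, one of the documented lbfgs_line_search options. It is a textbook version, not nlsq's implementation. Using lbfgs_initial_step_size as the trial step until the history buffer is full, and 1.0 afterwards, is an interpretation of that parameter's description, and the function names are hypothetical.

```python
import math
from collections import deque


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def lbfgs_direction(grad, s_hist, y_hist):
    """Two-loop recursion: return -H grad, with H built from (s, y) pairs, oldest first."""
    q = list(grad)
    rhos = [1.0 / dot(y, s) for s, y in zip(s_hist, y_hist)]
    alphas = []
    for s, y, rho in reversed(list(zip(s_hist, y_hist, rhos))):  # newest to oldest
        alpha = rho * dot(s, q)
        alphas.append(alpha)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
    # Initial inverse Hessian H0 = gamma * I (identity when the history is empty).
    gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1]) if s_hist else 1.0
    r = [gamma * qi for qi in q]
    for (s, y, rho), alpha in zip(zip(s_hist, y_hist, rhos), reversed(alphas)):  # oldest to newest
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return [-ri for ri in r]


def lbfgs_minimize(f, grad_f, x0, history_size=10, initial_step_size=0.1,
                   max_iter=200, tol=1e-8):
    x, g = list(x0), grad_f(x0)
    s_hist, y_hist = deque(maxlen=history_size), deque(maxlen=history_size)
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            break
        d = lbfgs_direction(g, list(s_hist), list(y_hist))
        # Cold start: small trial step until the history buffer is full (assumption).
        t = initial_step_size if len(s_hist) < history_size else 1.0
        fx, slope = f(x), dot(g, d)
        while f([xi + t * di for xi, di in zip(x, d)]) > fx + 1e-4 * t * slope and t > 1e-12:
            t *= 0.5  # Armijo backtracking
        x_new = [xi + t * di for xi, di in zip(x, d)]
        g_new = grad_f(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        if dot(s, y) > 1e-12:  # keep only pairs that preserve positive definiteness
            s_hist.append(s)
            y_hist.append(y)
        x, g = x_new, g_new
    return x


def quadratic(x):
    return (x[0] - 3.0) ** 2 + 10.0 * (x[1] + 1.0) ** 2


def quadratic_grad(x):
    return [2.0 * (x[0] - 3.0), 20.0 * (x[1] + 1.0)]


print(lbfgs_minimize(quadratic, quadratic_grad, [0.0, 0.0]))  # close to [3.0, -1.0]
```

The memory cost is 2 * history_size vectors of length p, which is why the history size is the main L-BFGS memory knob.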

normalize: bool
normalization_strategy: str
warmup_iterations: int
max_warmup_iterations: int
warmup_learning_rate: float
loss_plateau_threshold: float
gradient_norm_threshold: float
active_switching_criteria: list[str] | None
lbfgs_history_size: int
lbfgs_initial_step_size: float
lbfgs_line_search: Literal['wolfe', 'strong_wolfe', 'backtracking']
lbfgs_exploration_step_size: float
lbfgs_refinement_step_size: float
use_learning_rate_schedule: bool
lr_schedule_warmup_steps: int
lr_schedule_decay_steps: int
lr_schedule_end_value: float
gradient_clip_value: float | None
enable_warm_start_detection: bool
warm_start_threshold: float
enable_adaptive_warmup_lr: bool
warmup_lr_refinement: float
warmup_lr_careful: float
enable_cost_guard: bool
cost_increase_tolerance: float
enable_step_clipping: bool
max_warmup_step_size: float
gauss_newton_max_iterations: int
gauss_newton_tol: float
trust_region_initial: float
regularization_factor: float
cg_max_iterations: int
cg_relative_tolerance: float
cg_absolute_tolerance: float
cg_param_threshold: int
enable_group_variance_regularization: bool
group_variance_lambda: float
group_variance_indices: list[tuple[int, int]] | None
enable_residual_weighting: bool
residual_weights: list[float] | None
chunk_size: int
gc_chunk_interval: int
loop_strategy: Literal['auto', 'scan', 'loop']
enable_checkpoints: bool
checkpoint_frequency: int
checkpoint_dir: str | None
resume_from_checkpoint: str | None
validate_numerics: bool
enable_fault_tolerance: bool
max_retries_per_batch: int
min_success_rate: float
enable_multi_device: bool
callback_frequency: int
verbose: int
log_frequency: int
enable_multistart: bool
n_starts: int
multistart_sampler: Literal['lhs', 'sobol', 'halton']
elimination_rounds: int
elimination_fraction: float
batches_per_round: int
center_on_p0: bool
scale_factor: float

__post_init__()

Validate configuration after initialization.

Delegates to specialized validator functions for each configuration group. ConfigValidationError from validators is re-raised as ValueError for backwards compatibility.
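The validation pattern described above (per-group validator functions, with ConfigValidationError re-raised as ValueError) can be sketched on a hypothetical two-field dataclass. ConfigValidationError is defined locally as a stand-in, and only constraints stated in this document are checked: the allowed normalization_strategy values and elimination_fraction in (0, 1). The real validators and their messages are not shown here.

```python
from dataclasses import dataclass


class ConfigValidationError(Exception):
    """Local stand-in for the validator exception named above."""


def validate_normalization(cfg):
    allowed = {"auto", "bounds", "p0", "none"}
    if cfg.normalization_strategy not in allowed:
        raise ConfigValidationError(
            f"normalization_strategy must be one of {sorted(allowed)}, "
            f"got {cfg.normalization_strategy!r}"
        )


def validate_multistart(cfg):
    if not 0.0 < cfg.elimination_fraction < 1.0:
        raise ConfigValidationError(
            f"elimination_fraction must be in (0, 1), got {cfg.elimination_fraction}"
        )


@dataclass
class MiniHybridConfig:
    normalization_strategy: str = "auto"
    elimination_fraction: float = 0.5

    def __post_init__(self):
        try:
            for validator in (validate_normalization, validate_multistart):
                validator(self)
        except ConfigValidationError as exc:
            # Re-raise as ValueError so callers that catch ValueError keep working.
            raise ValueError(str(exc)) from exc


try:
    MiniHybridConfig(elimination_fraction=1.0)
except ValueError as err:
    print(err)
```

Chaining with "from exc" keeps the original validator exception available as __cause__ for debugging.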

classmethod aggressive()

Create aggressive profile: faster convergence with L-BFGS, looser tolerances.

This preset prioritizes speed over robustness:

  • L-BFGS warmup with reduced iterations (50, versus 300 with Adam)

  • Higher learning rate for faster progress

  • Looser tolerances for earlier Phase 2 switching

  • Larger chunks for better throughput

Returns:

Configuration with aggressive settings.

Return type:

HybridStreamingConfig

Examples

>>> config = HybridStreamingConfig.aggressive()
>>> config.warmup_learning_rate
0.003
>>> config.warmup_iterations
50

classmethod conservative()

Create conservative profile: slower but robust, tighter tolerances.

This preset prioritizes solution quality over speed:

  • L-BFGS warmup with conservative iterations

  • Lower learning rate for stability

  • Tighter tolerances for higher quality

  • More Gauss-Newton iterations

Returns:

Configuration with conservative settings.

Return type:

HybridStreamingConfig

Examples

>>> config = HybridStreamingConfig.conservative()
>>> config.gauss_newton_tol
1e-10
>>> config.warmup_iterations
30

classmethod memory_optimized()

Create memory-optimized profile: smaller chunks, efficient settings.

This preset minimizes memory footprint:

  • Smaller chunks to reduce memory usage

  • L-BFGS warmup with reduced iterations

  • Enable checkpoints for recovery (important when memory is tight)

  • Lower CG threshold for more aggressive CG usage (avoids the O(p^2) J^T J)

Returns:

Configuration with memory-optimized settings.

Return type:

HybridStreamingConfig

Examples

>>> config = HybridStreamingConfig.memory_optimized()
>>> config.chunk_size
5000
>>> config.warmup_iterations
40
>>> config.cg_param_threshold
1000

classmethod with_multistart(n_starts=10, **kwargs)

Create configuration with multi-start tournament selection enabled.

This preset enables multi-start optimization to improve the chance of finding the global optimum:

  • Tournament selection during Phase 1 warmup

  • LHS sampling for generating starting points

  • Progressive elimination to select the best candidate

Parameters:
  • n_starts (int, default=10) – Number of starting points to generate.

  • **kwargs – Additional configuration parameters to override.

Returns:

Configuration with multi-start enabled.

Return type:

HybridStreamingConfig

Examples

>>> config = HybridStreamingConfig.with_multistart(n_starts=20)
>>> config.enable_multistart
True
>>> config.n_starts
20
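The preset classmethods return ordinary instances, so a preset can be combined with further overrides. The sketch below mirrors the pattern on a hypothetical toy dataclass with four fields. It assumes HybridStreamingConfig is a standard dataclass (the __post_init__ hook suggests so), in which case dataclasses.replace would work on it the same way and rerun validation. Only the value documented for aggressive() (warmup_iterations=50) is reproduced.

```python
from dataclasses import dataclass, replace


@dataclass
class MiniConfig:
    warmup_iterations: int = 200
    chunk_size: int = 10000
    enable_multistart: bool = False
    n_starts: int = 10

    @classmethod
    def aggressive(cls):
        # Only the documented value is reproduced; the real preset sets more fields.
        return cls(warmup_iterations=50)

    @classmethod
    def with_multistart(cls, n_starts=10, **kwargs):
        # Extra keyword arguments override any other field.
        return cls(enable_multistart=True, n_starts=n_starts, **kwargs)


tuned = replace(MiniConfig.aggressive(), chunk_size=20000)  # preset plus one override
multi = MiniConfig.with_multistart(n_starts=20, chunk_size=5000)
print(tuned, multi)
```

Passing enable_multistart through **kwargs in this toy would raise a TypeError for a duplicate keyword, since the classmethod already sets it.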

classmethod defense_strict()

Create strict defense layer profile for near-optimal scenarios.

This preset maximizes protection against divergence when initial parameters are expected to be close to optimal (warm starts, refinement):

  • Very low warm start threshold (triggers at 1% relative loss)

  • Ultra-conservative learning rates for refinement

  • Very tight cost guard tolerance (a 5% increase aborts)

  • Very small step clipping limit for stability

Use this when:

  • Continuing optimization from a previous fit

  • Refining parameters that are already close to optimal

  • Dealing with ill-conditioned problems

  • Prioritizing stability over speed

Returns:

Configuration with strict defense layer settings.

Return type:

HybridStreamingConfig

Examples

>>> config = HybridStreamingConfig.defense_strict()
>>> config.warm_start_threshold
0.01
>>> config.cost_increase_tolerance
0.05
>>> config.warmup_iterations
25
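Three of the defense layers reduce to simple checks. The sketch below is an interpretation of the parameter descriptions, not nlsq's code. It assumes that warm start detection compares a caller-supplied relative loss against warm_start_threshold (the preset text only says it "triggers at 1% relative loss"), that the cost guard compares the current loss with the loss at the start of warmup, and that step clipping rescales the L2 norm of the step. The function names are hypothetical.

```python
import math


def is_warm_start(relative_loss, warm_start_threshold=0.01):
    # Assumption: relative_loss is computed by the caller; how nlsq defines it is not shown here.
    return relative_loss < warm_start_threshold


def cost_guard_triggered(current_loss, initial_loss, cost_increase_tolerance=0.05):
    # Abort warmup if the loss has risen more than the tolerated fraction.
    return current_loss > initial_loss * (1.0 + cost_increase_tolerance)


def clip_step(step, max_warmup_step_size=0.1):
    # Rescale the step so its L2 norm does not exceed max_warmup_step_size.
    norm = math.sqrt(sum(s * s for s in step))
    if norm <= max_warmup_step_size:
        return list(step)
    factor = max_warmup_step_size / norm
    return [s * factor for s in step]


print(is_warm_start(0.005), cost_guard_triggered(1.06, 1.0))  # True True
print(clip_step([0.3, 0.4]))  # approximately [0.06, 0.08]
```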

classmethod defense_relaxed()

Create relaxed defense layer profile for exploration-heavy scenarios.

This preset reduces defense layer sensitivity for problems where significant parameter exploration is needed:

  • Higher warm start threshold (50% relative loss needed to skip)

  • More aggressive learning rates for exploration

  • Generous cost guard tolerance (50% increase allowed)

  • Larger step clipping limit for faster exploration

Use this when:

  • Starting from a rough initial guess

  • Exploring a wide parameter space

  • The problem has multiple local minima

  • Speed is more important than robustness

Returns:

Configuration with relaxed defense layer settings.

Return type:

HybridStreamingConfig

Examples

>>> config = HybridStreamingConfig.defense_relaxed()
>>> config.warm_start_threshold
0.5
>>> config.cost_increase_tolerance
0.5
>>> config.warmup_iterations
50

classmethod defense_disabled()

Create profile with all defense layers disabled.

This preset completely disables the 4-layer defense strategy, reverting to pre-0.3.6 behavior. Use with caution as this removes protection against warmup divergence.

Use this when:

  • Debugging to isolate defense layer effects

  • Benchmarking without defense overhead

  • Backward compatibility with older code is required

Returns:

Configuration with all defense layers disabled.

Return type:

HybridStreamingConfig

Examples

>>> config = HybridStreamingConfig.defense_disabled()
>>> config.enable_warm_start_detection
False

classmethod scientific_default()

Create profile optimized for scientific computing workflows.

This preset is tuned for scientific fitting scenarios like spectroscopy, decay curves, and other physics-based models:

  • Balanced defense layers that protect without being too aggressive

  • L-BFGS warmup with moderate iterations

  • Enabled checkpoints for long-running fits

Use this when:

  • Fitting physics-based models (spectroscopy, decay curves)

  • Numerical precision is important

  • Parameters may have multiple scales

  • Reproducibility is required

Returns:

Configuration optimized for scientific computing.

Return type:

HybridStreamingConfig

Examples

>>> config = HybridStreamingConfig.scientific_default()
>>> config.warmup_iterations
35

__init__(normalize=True, normalization_strategy='auto', warmup_iterations=200, max_warmup_iterations=500, warmup_learning_rate=0.001, loss_plateau_threshold=0.0001, gradient_norm_threshold=0.001, active_switching_criteria=None, lbfgs_history_size=10, lbfgs_initial_step_size=0.1, lbfgs_line_search='wolfe', lbfgs_exploration_step_size=0.1, lbfgs_refinement_step_size=1.0, use_learning_rate_schedule=False, lr_schedule_warmup_steps=50, lr_schedule_decay_steps=450, lr_schedule_end_value=0.0001, gradient_clip_value=None, enable_warm_start_detection=True, warm_start_threshold=0.01, enable_adaptive_warmup_lr=True, warmup_lr_refinement=1e-06, warmup_lr_careful=1e-05, enable_cost_guard=True, cost_increase_tolerance=0.05, enable_step_clipping=True, max_warmup_step_size=0.1, gauss_newton_max_iterations=100, gauss_newton_tol=1e-08, trust_region_initial=1.0, regularization_factor=1e-10, cg_max_iterations=100, cg_relative_tolerance=0.0001, cg_absolute_tolerance=1e-10, cg_param_threshold=2000, enable_group_variance_regularization=False, group_variance_lambda=0.01, group_variance_indices=None, enable_residual_weighting=False, residual_weights=None, chunk_size=10000, gc_chunk_interval=10, loop_strategy='auto', enable_checkpoints=True, checkpoint_frequency=100, checkpoint_dir=None, resume_from_checkpoint=None, validate_numerics=True, enable_fault_tolerance=True, max_retries_per_batch=2, min_success_rate=0.5, enable_multi_device=False, callback_frequency=10, verbose=1, log_frequency=1, enable_multistart=False, n_starts=10, multistart_sampler='lhs', elimination_rounds=3, elimination_fraction=0.5, batches_per_round=50, center_on_p0=True, scale_factor=1.0)

Overview

The nlsq.hybrid_streaming_config module provides configuration options for the four-phase adaptive hybrid streaming optimizer. This configuration controls all aspects of the optimization process including parameter normalization, L-BFGS warmup, streaming Gauss-Newton, and covariance computation.

New in version 0.3.0: Complete configuration for adaptive hybrid streaming.

Key Features

  • Phase 0: Parameter normalization configuration (bounds-based, p0-based, or none)

  • Phase 1: L-BFGS warmup with configurable step sizes and switching criteria

  • 4-Layer Defense Strategy (new in 0.3.6): Protection against warmup divergence

  • Phase 2: Streaming Gauss-Newton with trust region and regularization control

  • Phase 3: Denormalization and covariance transform settings

  • Fault tolerance: Checkpointing, validation, and retry configuration

  • Multi-device: GPU/TPU parallelism settings

  • Presets: Ready-to-use profiles for common use cases

New in version 0.3.6: 4-layer defense strategy parameters and sensitivity presets.

Classes

class nlsq.streaming.hybrid_config.HybridStreamingConfig(normalize=True, normalization_strategy='auto', warmup_iterations=200, max_warmup_iterations=500, warmup_learning_rate=0.001, loss_plateau_threshold=0.0001, gradient_norm_threshold=0.001, active_switching_criteria=None, lbfgs_history_size=10, lbfgs_initial_step_size=0.1, lbfgs_line_search='wolfe', lbfgs_exploration_step_size=0.1, lbfgs_refinement_step_size=1.0, use_learning_rate_schedule=False, lr_schedule_warmup_steps=50, lr_schedule_decay_steps=450, lr_schedule_end_value=0.0001, gradient_clip_value=None, enable_warm_start_detection=True, warm_start_threshold=0.01, enable_adaptive_warmup_lr=True, warmup_lr_refinement=1e-06, warmup_lr_careful=1e-05, enable_cost_guard=True, cost_increase_tolerance=0.05, enable_step_clipping=True, max_warmup_step_size=0.1, gauss_newton_max_iterations=100, gauss_newton_tol=1e-08, trust_region_initial=1.0, regularization_factor=1e-10, cg_max_iterations=100, cg_relative_tolerance=0.0001, cg_absolute_tolerance=1e-10, cg_param_threshold=2000, enable_group_variance_regularization=False, group_variance_lambda=0.01, group_variance_indices=None, enable_residual_weighting=False, residual_weights=None, chunk_size=10000, gc_chunk_interval=10, loop_strategy='auto', enable_checkpoints=True, checkpoint_frequency=100, checkpoint_dir=None, resume_from_checkpoint=None, validate_numerics=True, enable_fault_tolerance=True, max_retries_per_batch=2, min_success_rate=0.5, enable_multi_device=False, callback_frequency=10, verbose=1, log_frequency=1, enable_multistart=False, n_starts=10, multistart_sampler='lhs', elimination_rounds=3, elimination_fraction=0.5, batches_per_round=50, center_on_p0=True, scale_factor=1.0)

Bases: object

Configuration for adaptive hybrid streaming optimizer.

This configuration class controls all aspects of the four-phase hybrid optimizer:

  • Phase 0: Parameter normalization setup

  • Phase 1: L-BFGS warmup with adaptive switching

  • Phase 2: Streaming Gauss-Newton with exact J^T J accumulation

  • Phase 3: Denormalization and covariance transform

Parameters:
  • normalize (bool, default=True) – Enable parameter normalization. When True, parameters are normalized to similar scales to improve gradient signal quality and convergence speed.

  • normalization_strategy (str, default='auto') –

    Strategy for parameter normalization. Options:

    • 'auto': Use bounds-based normalization if bounds are provided, else p0-based

    • 'bounds': Normalize to [0, 1] using parameter bounds

    • 'p0': Scale by initial parameter magnitudes

    • 'none': Identity transform (no normalization)

  • warmup_iterations (int, default=200) – Number of L-BFGS warmup iterations before the switch criteria are checked. With L-BFGS, 20-50 iterations are typically sufficient (5-10x fewer than Adam). More iterations allow better initial convergence before switching to Gauss-Newton.

  • max_warmup_iterations (int, default=500) – Maximum L-BFGS warmup iterations before forced switch to Phase 2. Safety limit to prevent indefinite warmup when loss plateaus slowly.

  • warmup_learning_rate (float, default=0.001) – Legacy warmup step size retained for backward compatibility. L-BFGS warmup uses lbfgs_initial_step_size and adaptive step sizes.

  • loss_plateau_threshold (float, default=1e-4) – Relative loss improvement threshold for plateau detection. Switch to Phase 2 if abs(loss - prev_loss) / (abs(prev_loss) + eps) < threshold. Smaller values make plateau detection stricter and therefore delay the switch.

  • gradient_norm_threshold (float, default=1e-3) – Gradient norm threshold for an early Phase 2 switch. Switch to Phase 2 if ||gradient|| < threshold. A small gradient norm indicates that the optimizer is close to an optimum, where Gauss-Newton is effective.

  • active_switching_criteria (list, default=['plateau', 'gradient', 'max_iter']) –

    List of active switching criteria for the Phase 1 -> Phase 2 transition. The signature default is None, which selects the three criteria listed here. Available criteria:

    • 'plateau': Loss plateau detection (loss_plateau_threshold)

    • 'gradient': Gradient norm below threshold (gradient_norm_threshold)

    • 'max_iter': Maximum iterations reached (max_warmup_iterations)

    The switch occurs when ANY active criterion is met.

  • lbfgs_history_size (int, default=10) – Number of recent (parameter update, gradient change) pairs stored for the L-BFGS Hessian approximation. The default matches SciPy's L-BFGS-B (maxcor=10) and is consistent with the small values (roughly 3-20) recommended by Nocedal & Wright. Larger values give a better Hessian approximation but use more memory.

  • lbfgs_initial_step_size (float, default=0.1) – Initial step size for L-BFGS during the cold start (the first lbfgs_history_size iterations, while the history buffer fills). A small value prevents overshooting while the Hessian approximation is still poor (it starts as the identity matrix).

  • lbfgs_line_search (str, default='wolfe') –

    Line search method for L-BFGS step acceptance. Options:

    • 'wolfe': Standard Wolfe conditions (default)

    • 'strong_wolfe': Strong Wolfe conditions (stricter)

    • 'backtracking': Simple backtracking line search

  • lbfgs_exploration_step_size (float, default=0.1) – L-BFGS initial step size for exploration mode (high relative loss). Small value prevents first “Hessian=Identity” step from overshooting.

  • lbfgs_refinement_step_size (float, default=1.0) – L-BFGS initial step size for refinement mode (low relative loss). Larger value leverages L-BFGS’s near-Newton convergence speed when close to optimum.

  • gauss_newton_max_iterations (int, default=100) – Maximum iterations for Phase 2 Gauss-Newton optimization. Typical values: 50-200.

  • gauss_newton_tol (float, default=1e-8) – Convergence tolerance for Phase 2 (gradient norm threshold). Optimization stops if: ||gradient|| < tol.

  • trust_region_initial (float, default=1.0) – Initial trust region radius for Gauss-Newton step control. Radius is adapted based on actual vs predicted reduction ratio.

  • regularization_factor (float, default=1e-10) – Regularization factor for rank-deficient J^T J matrices. Added to diagonal: J^T J + regularization_factor * I.

  • cg_max_iterations (int, default=100) – Maximum iterations for the Conjugate Gradient (CG) solver in Phase 2. CG is used when the parameter count reaches or exceeds cg_param_threshold. Higher values allow better convergence at the cost of more computation.

  • cg_relative_tolerance (float, default=1e-4) – Relative tolerance for CG convergence. Convergence check: ||r|| < cg_relative_tolerance * ||J^T r_0||. This implements an inexact Newton strategy: each linear system is solved only approximately, which saves computation.

  • cg_absolute_tolerance (float, default=1e-10) – Absolute tolerance floor for CG solver convergence. Safety floor to prevent over-iteration on well-conditioned systems.

  • cg_param_threshold (int, default=2000) –

    Parameter count threshold for auto-selecting CG vs materialized solver.

    • p < threshold: Use materialized J^T J with SVD solve (faster for small p)

    • p >= threshold: Use CG with implicit matvec (O(p) memory vs O(p^2))

    Threshold balances memory savings vs additional data passes for CG.

  • enable_group_variance_regularization (bool, default=False) –

    Enable variance regularization for parameter groups. When enabled, adds a penalty term to the loss function that penalizes variance within specified parameter groups. This is essential for preventing per-group parameter absorption in multi-component fitting.

    The regularized loss becomes L = MSE + group_variance_lambda * sum(Var(group_i)) where each group_i is a slice of parameters defined by group_variance_indices.

  • group_variance_lambda (float, default=0.01) – Regularization strength for group variance penalty. Larger values more strongly penalize variance within parameter groups. Use 0.001-0.01 for light regularization (allows moderate group variation), 0.1-1.0 for moderate regularization (constrains groups to be similar), or 10-1000 for strong regularization (forces groups to be nearly uniform). For multi-component fits with per-group parameters, use lambda ~ 0.1 * n_data / (n_groups * sigma^2) where sigma is the expected experimental variation (~0.05 for 5%).

  • group_variance_indices (list of tuple, default=None) –

    List of (start, end) tuples defining parameter groups for variance regularization. Each tuple specifies a slice [start:end] of the parameter vector that should have low internal variance.

    Example for 23 independent groups: group_variance_indices = [(0, 23), (23, 46)] constrains contrast params [0:23] and offset params [23:46] to each have low variance, preventing them from absorbing group-dependent physical signals.

    If None when enable_group_variance_regularization=True, no groups are regularized (effectively disabling the feature).

  • chunk_size (int, default=10000) – Size of data chunks for streaming J^T J accumulation. Larger chunks are faster but use more memory. Typical values: 5000-50000.

  • gc_chunk_interval (int, default=10) – Chunks between gc.collect() calls (FR-007). Controls how often garbage collection runs during chunked processing. Higher values reduce GC overhead but may increase memory usage. Default of 10 balances memory reclamation with performance.

  • enable_checkpoints (bool, default=True) – Enable checkpoint save/resume for fault tolerance.

  • checkpoint_frequency (int, default=100) – Save checkpoint every N iterations (across all phases).

  • validate_numerics (bool, default=True) – Enable NaN/Inf validation at gradient, parameter, and loss computation points.

  • enable_multi_device (bool, default=False) – Enable multi-GPU/TPU parallelism for Jacobian computation. Uses JAX pmap for data-parallel computation across devices.

  • callback_frequency (int, default=10) – Call progress callback every N iterations (if callback provided).

  • enable_multistart (bool, default=False) – Enable multi-start optimization with tournament selection during Phase 1. When enabled, generates n_starts starting points with the sampler chosen by multistart_sampler (Latin Hypercube sampling by default) and uses tournament elimination to select the best candidate for Phase 2.

  • n_starts (int, default=10) – Number of starting points for multi-start optimization. Only used when enable_multistart=True.

  • multistart_sampler (str, default='lhs') – Sampling method for generating starting points. Options: ‘lhs’ (Latin Hypercube), ‘sobol’, ‘halton’.

  • elimination_rounds (int, default=3) – Number of tournament elimination rounds. Each round eliminates elimination_fraction of candidates.

  • elimination_fraction (float, default=0.5) – Fraction of candidates to eliminate per round. Must be in (0, 1). The default of 0.5 eliminates half of the candidates each round.

  • batches_per_round (int, default=50) – Number of data batches used to evaluate candidates in each tournament round. More batches give a more reliable selection but take longer.
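As a rough illustration of the regularized loss described for enable_group_variance_regularization, and of the lambda rule of thumb given under group_variance_lambda, here is a plain-Python sketch. It is not nlsq's implementation: it assumes the population variance (statistics.pvariance) for Var(group_i) and a plain mean-squared-error data term, and the helper names group_variance_penalty, regularized_loss, and suggested_lambda are hypothetical.

```python
from statistics import fmean, pvariance


def group_variance_penalty(params, group_variance_indices, group_variance_lambda):
    """lambda * sum of Var(params[start:end]) over all groups (hypothetical helper)."""
    if not group_variance_indices:
        # None or empty: no groups are regularized, so the penalty vanishes.
        return 0.0
    return group_variance_lambda * sum(
        pvariance(params[start:end]) for start, end in group_variance_indices
    )


def regularized_loss(residuals, params, group_variance_indices, group_variance_lambda):
    """L = MSE + lambda * sum(Var(group_i)), assuming population variance."""
    mse = fmean(r * r for r in residuals)
    return mse + group_variance_penalty(
        params, group_variance_indices, group_variance_lambda
    )


def suggested_lambda(n_data, n_groups, sigma):
    """Rule of thumb from the docs: 0.1 * n_data / (n_groups * sigma**2)."""
    return 0.1 * n_data / (n_groups * sigma**2)


# Two groups: the first three values are identical, the last two differ by 2.
params = [1.0, 1.0, 1.0, 2.0, 4.0]
print(group_variance_penalty(params, [(0, 3), (3, 5)], 0.1))  # 0.1 * (0 + 1)
```

With indices such as [(0, 23), (23, 46)], the penalty adds one variance term for the contrast block and one for the offset block, so a larger lambda pulls each block toward a common value.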

Examples

Default configuration:

>>> from nlsq import HybridStreamingConfig
>>> config = HybridStreamingConfig()
>>> config.warmup_iterations
200

Aggressive profile (faster convergence with L-BFGS):

>>> config = HybridStreamingConfig.aggressive()
>>> config.warmup_iterations
50

Conservative profile (higher quality):

>>> config = HybridStreamingConfig.conservative()
>>> config.gauss_newton_tol < 1e-8
True

Memory-optimized profile:

>>> config = HybridStreamingConfig.memory_optimized()
>>> config.chunk_size < 10000
True

Custom configuration:

>>> config = HybridStreamingConfig(
...     warmup_iterations=50,
...     lbfgs_history_size=15,
...     chunk_size=5000,
... )

With multi-start tournament selection:

>>> config = HybridStreamingConfig(
...     enable_multistart=True,
...     n_starts=20,
...     elimination_rounds=3,
...     batches_per_round=50,
... )
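How n_starts, elimination_rounds, elimination_fraction, and batches_per_round interact can be sketched as a small tournament. This is a hypothetical illustration rather than the TournamentSelector implementation: it assumes survivor counts are rounded down (never below one), that all survivors in a round are scored on the same randomly drawn batches, and that the winner is the lowest-loss survivor. The names survivors_after_round, tournament_select, and slope_loss are invented for this sketch.

```python
import math
import random


def survivors_after_round(n_alive, elimination_fraction):
    """Assumed rule: drop elimination_fraction of candidates, round down, keep >= 1."""
    return max(1, math.floor(n_alive * (1.0 - elimination_fraction)))


def tournament_select(candidates, batch_loss, batches, elimination_rounds=3,
                      elimination_fraction=0.5, batches_per_round=50, seed=0):
    """Progressively eliminate the worst candidates, scoring each on shared batches."""
    rng = random.Random(seed)
    alive = list(candidates)
    for _ in range(elimination_rounds):
        if len(alive) == 1:
            break
        # Every survivor sees the same batches, so the comparison is fair.
        round_batches = rng.sample(batches, min(batches_per_round, len(batches)))
        alive.sort(key=lambda p: sum(batch_loss(p, b) for b in round_batches))
        alive = alive[: survivors_after_round(len(alive), elimination_fraction)]
    # Choose among any remaining survivors using all batches.
    return min(alive, key=lambda p: sum(batch_loss(p, b) for b in batches))


# Toy problem: recover the slope of y = 3x; each batch is a list of (x, y) pairs.
data = [(x / 10, 3.0 * x / 10) for x in range(1, 201)]
batches = [data[i : i + 20] for i in range(0, len(data), 20)]


def slope_loss(slope, batch):
    return sum((y - slope * x) ** 2 for x, y in batch) / len(batch)


starts = [0.0, 1.0, 2.0, 2.9, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
best = tournament_select(starts, slope_loss, batches, batches_per_round=5)
print(best)  # 2.9 is the start closest to the true slope of 3.0
```

Under the round-down assumption, the default n_starts=10 with elimination_fraction=0.5 shrinks the field 10 → 5 → 2 → 1 over three rounds, so each additional round costs fewer candidate evaluations than the one before.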

See also

AdaptiveHybridStreamingOptimizer – Optimizer that uses this configuration.

curve_fit – High-level interface with method='hybrid_streaming'.

TournamentSelector – Tournament selection for multi-start optimization.

Notes

Based on Adaptive Hybrid Streaming Optimizer specification: agent-os/specs/2025-12-18-adaptive-hybrid-streaming-optimizer/spec.md

L-BFGS replaces Adam for warmup, providing 5-10x faster convergence to the basin of attraction through approximate Hessian information.

normalize: bool
normalization_strategy: str
warmup_iterations: int
max_warmup_iterations: int
warmup_learning_rate: float
loss_plateau_threshold: float
gradient_norm_threshold: float
active_switching_criteria: list[str] | None
lbfgs_history_size: int
lbfgs_initial_step_size: float
lbfgs_line_search: Literal['wolfe', 'strong_wolfe', 'backtracking']
lbfgs_exploration_step_size: float
lbfgs_refinement_step_size: float
use_learning_rate_schedule: bool
lr_schedule_warmup_steps: int
lr_schedule_decay_steps: int
lr_schedule_end_value: float
gradient_clip_value: float | None
enable_warm_start_detection: bool
warm_start_threshold: float
enable_adaptive_warmup_lr: bool
warmup_lr_refinement: float
warmup_lr_careful: float
enable_cost_guard: bool
cost_increase_tolerance: float
enable_step_clipping: bool
max_warmup_step_size: float
gauss_newton_max_iterations: int
gauss_newton_tol: float
trust_region_initial: float
regularization_factor: float
cg_max_iterations: int
cg_relative_tolerance: float
cg_absolute_tolerance: float
cg_param_threshold: int
enable_group_variance_regularization: bool
group_variance_lambda: float
group_variance_indices: list[tuple[int, int]] | None
enable_residual_weighting: bool
residual_weights: list[float] | None
chunk_size: int
gc_chunk_interval: int
loop_strategy: Literal['auto', 'scan', 'loop']
enable_checkpoints: bool
checkpoint_frequency: int
checkpoint_dir: str | None
resume_from_checkpoint: str | None
validate_numerics: bool
enable_fault_tolerance: bool
max_retries_per_batch: int
min_success_rate: float
enable_multi_device: bool
callback_frequency: int
verbose: int
log_frequency: int
enable_multistart: bool
n_starts: int
multistart_sampler: Literal['lhs', 'sobol', 'halton']
elimination_rounds: int
elimination_fraction: float
batches_per_round: int
center_on_p0: bool
scale_factor: float
__post_init__()

Validate configuration after initialization.

Delegates to specialized validator functions for each configuration group. ConfigValidationError from validators is re-raised as ValueError for backwards compatibility.
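The validate-then-re-raise pattern described above can be sketched with a plain dataclass. Everything here is hypothetical and deliberately small: MiniHybridConfig, ConfigValidationError, validate_streaming, and validate_multistart stand in for nlsq's real classes and are not its API. Only the (0, 1) range for elimination_fraction and the three sampler names come from the parameter descriptions; the chunk_size check is an assumption.

```python
from dataclasses import dataclass


class ConfigValidationError(Exception):
    """Hypothetical stand-in for the validators' exception type."""


def validate_streaming(cfg):
    # Assumed rule: a chunk must contain at least one data point.
    if cfg.chunk_size <= 0:
        raise ConfigValidationError(f"chunk_size must be positive, got {cfg.chunk_size}")


def validate_multistart(cfg):
    if not 0.0 < cfg.elimination_fraction < 1.0:
        raise ConfigValidationError("elimination_fraction must be in (0, 1)")
    if cfg.multistart_sampler not in ("lhs", "sobol", "halton"):
        raise ConfigValidationError(f"unknown sampler {cfg.multistart_sampler!r}")


@dataclass
class MiniHybridConfig:
    chunk_size: int = 10000
    multistart_sampler: str = "lhs"
    elimination_fraction: float = 0.5

    def __post_init__(self):
        # Delegate to one validator per configuration group ...
        try:
            for validator in (validate_streaming, validate_multistart):
                validator(self)
        except ConfigValidationError as exc:
            # ... and surface failures as ValueError, keeping the original as __cause__.
            raise ValueError(str(exc)) from exc


try:
    MiniHybridConfig(elimination_fraction=1.0)
except ValueError as err:
    print(f"rejected: {err}")
```

Chaining with `from exc` keeps the specific validator error available for debugging while callers that only catch ValueError keep working.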

classmethod aggressive()

Create aggressive profile: faster convergence with L-BFGS, looser tolerances.

This preset prioritizes speed over robustness:

  • L-BFGS warmup with reduced iterations (50 vs 300 with Adam)

  • Higher learning rate for faster progress

  • Looser tolerances for earlier Phase 2 switching

  • Larger chunks for better throughput

Returns:

Configuration with aggressive settings.

Return type:

HybridStreamingConfig

Examples

>>> config = HybridStreamingConfig.aggressive()
>>> config.warmup_learning_rate
0.003
>>> config.warmup_iterations
50
classmethod conservative()

Create conservative profile: slower but robust, tighter tolerances.

This preset prioritizes solution quality over speed:

  • L-BFGS warmup with conservative iterations

  • Lower learning rate for stability

  • Tighter tolerances for higher quality

  • More Gauss-Newton iterations

Returns:

Configuration with conservative settings.

Return type:

HybridStreamingConfig

Examples

>>> config = HybridStreamingConfig.conservative()
>>> config.gauss_newton_tol
1e-10
>>> config.warmup_iterations
30
classmethod memory_optimized()

Create memory-optimized profile: smaller chunks, efficient settings.

This preset minimizes memory footprint:

  • Smaller chunks to reduce memory usage

  • L-BFGS warmup with reduced iterations

  • Checkpoints enabled for recovery (important when memory is tight)

  • Lower CG threshold for more aggressive CG usage (avoids forming the O(p^2) J^T J matrix)

Returns:

Configuration with memory-optimized settings.

Return type:

HybridStreamingConfig

Examples

>>> config = HybridStreamingConfig.memory_optimized()
>>> config.chunk_size
5000
>>> config.warmup_iterations
40
>>> config.cg_param_threshold
1000
classmethod with_multistart(n_starts=10, **kwargs)

Create configuration with multi-start tournament selection enabled.

This preset enables multi-start optimization for finding global optima:

  • Tournament selection during Phase 1 warmup

  • LHS sampling for generating starting points

  • Progressive elimination to select the best candidate

Parameters:
  • n_starts (int, default=10) – Number of starting points to generate.

  • **kwargs – Additional configuration parameters to override.

Returns:

Configuration with multi-start enabled.

Return type:

HybridStreamingConfig

Examples

>>> config = HybridStreamingConfig.with_multistart(n_starts=20)
>>> config.enable_multistart
True
>>> config.n_starts
20
classmethod defense_strict()

Create strict defense layer profile for near-optimal scenarios.

This preset maximizes protection against divergence when initial parameters are expected to be close to optimal (warm starts, refinement):

  • Very low warm start threshold (triggers at 1% relative loss)

  • Ultra-conservative learning rates for refinement

  • Very tight cost guard tolerance (a 5% increase aborts)

  • Very small step clipping for stability

Use this when:

  • Continuing optimization from a previous fit

  • Refining parameters that are already close to optimal

  • Dealing with ill-conditioned problems

  • Prioritizing stability over speed

Returns:

Configuration with strict defense layer settings.

Return type:

HybridStreamingConfig

Examples

>>> config = HybridStreamingConfig.defense_strict()
>>> config.warm_start_threshold
0.01
>>> config.cost_increase_tolerance
0.05
>>> config.warmup_iterations
25
classmethod defense_relaxed()

Create relaxed defense layer profile for exploration-heavy scenarios.

This preset reduces defense layer sensitivity for problems where significant parameter exploration is needed:

  • Higher warm start threshold (warmup is skipped when the relative loss is below 50%)

  • More aggressive learning rates for exploration

  • Generous cost guard tolerance (a 50% increase is allowed)

  • Larger step clipping for faster exploration

Use this when:

  • Starting from a rough initial guess

  • Exploring a wide parameter space

  • Solving problems with multiple local minima

  • Speed is more important than robustness

Returns:

Configuration with relaxed defense layer settings.

Return type:

HybridStreamingConfig

Examples

>>> config = HybridStreamingConfig.defense_relaxed()
>>> config.warm_start_threshold
0.5
>>> config.cost_increase_tolerance
0.5
>>> config.warmup_iterations
50
classmethod defense_disabled()

Create profile with all defense layers disabled.

This preset completely disables the 4-layer defense strategy, reverting to pre-0.3.6 behavior. Use with caution as this removes protection against warmup divergence.

Use this when:

  • Debugging to isolate defense layer effects

  • Benchmarking without defense overhead

  • Backward compatibility with older code is required

Returns:

Configuration with all defense layers disabled.

Return type:

HybridStreamingConfig

Examples

>>> config = HybridStreamingConfig.defense_disabled()
>>> config.enable_warm_start_detection
False
classmethod scientific_default()

Create profile optimized for scientific computing workflows.

This preset is tuned for scientific fitting scenarios such as spectroscopy, decay curves, and other physics-based models:

  • Balanced defense layers that protect without being too aggressive

  • L-BFGS warmup with moderate iterations

  • Checkpoints enabled for long-running fits

Use this when:

  • Fitting physics-based models (spectroscopy, decay curves)

  • Numerical precision is important

  • Parameters may span multiple scales

  • Reproducibility is required

Returns:

Configuration optimized for scientific computing.

Return type:

HybridStreamingConfig

Examples

>>> config = HybridStreamingConfig.scientific_default()
>>> config.warmup_iterations
35
__init__(normalize=True, normalization_strategy='auto', warmup_iterations=200, max_warmup_iterations=500, warmup_learning_rate=0.001, loss_plateau_threshold=0.0001, gradient_norm_threshold=0.001, active_switching_criteria=None, lbfgs_history_size=10, lbfgs_initial_step_size=0.1, lbfgs_line_search='wolfe', lbfgs_exploration_step_size=0.1, lbfgs_refinement_step_size=1.0, use_learning_rate_schedule=False, lr_schedule_warmup_steps=50, lr_schedule_decay_steps=450, lr_schedule_end_value=0.0001, gradient_clip_value=None, enable_warm_start_detection=True, warm_start_threshold=0.01, enable_adaptive_warmup_lr=True, warmup_lr_refinement=1e-06, warmup_lr_careful=1e-05, enable_cost_guard=True, cost_increase_tolerance=0.05, enable_step_clipping=True, max_warmup_step_size=0.1, gauss_newton_max_iterations=100, gauss_newton_tol=1e-08, trust_region_initial=1.0, regularization_factor=1e-10, cg_max_iterations=100, cg_relative_tolerance=0.0001, cg_absolute_tolerance=1e-10, cg_param_threshold=2000, enable_group_variance_regularization=False, group_variance_lambda=0.01, group_variance_indices=None, enable_residual_weighting=False, residual_weights=None, chunk_size=10000, gc_chunk_interval=10, loop_strategy='auto', enable_checkpoints=True, checkpoint_frequency=100, checkpoint_dir=None, resume_from_checkpoint=None, validate_numerics=True, enable_fault_tolerance=True, max_retries_per_batch=2, min_success_rate=0.5, enable_multi_device=False, callback_frequency=10, verbose=1, log_frequency=1, enable_multistart=False, n_starts=10, multistart_sampler='lhs', elimination_rounds=3, elimination_fraction=0.5, batches_per_round=50, center_on_p0=True, scale_factor=1.0)

Configuration Presets

The HybridStreamingConfig class provides factory methods for common use cases.

Performance Profiles

Aggressive Profile

Fast convergence, fewer warmup iterations, looser tolerances:

from nlsq import HybridStreamingConfig

config = HybridStreamingConfig.aggressive()
# Short L-BFGS warmup: 50 iterations
# Higher warmup learning rate: 0.003
# Larger initial step size: 0.5
# Larger chunks: 20000
# Looser tolerances

Conservative Profile

Slower but robust, tighter tolerances:

config = HybridStreamingConfig.conservative()
# Short L-BFGS warmup: 30 iterations
# Smaller initial step size: 0.05
# Tighter tolerance: 1e-10
# Smaller trust region: 0.5

Memory-Optimized Profile

Minimizes memory footprint:

config = HybridStreamingConfig.memory_optimized()
# Smaller chunks: 5000
# Shorter L-BFGS warmup: 40 iterations
# Lower CG threshold (cg_param_threshold): 1000
# Frequent checkpoints: every 50 iterations
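The memory_optimized() docstring motivates a lower cg_param_threshold by noting that conjugate gradient avoids forming the O(p^2) J^T J matrix. A minimal sketch of that idea follows, assuming CG is applied to regularized normal equations (J^T J + regularization_factor * I) x = b using only Jacobian-vector products. The stopping rule max(cg_relative_tolerance * ||b||, cg_absolute_tolerance) and the helper names normal_matvec and conjugate_gradient are assumptions, not nlsq's code.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normal_matvec(jac_rows, regularization_factor=1e-10):
    """Return v -> (J^T J + reg * I) v without ever building the p x p matrix."""
    def matvec(v):
        jv = [dot(row, v) for row in jac_rows]  # J v, length n
        out = [regularization_factor * vi for vi in v]
        for row, s in zip(jac_rows, jv):  # accumulate J^T (J v), length p
            for i, rij in enumerate(row):
                out[i] += rij * s
        return out
    return matvec


def conjugate_gradient(matvec, b, cg_max_iterations=100,
                       cg_relative_tolerance=1e-4, cg_absolute_tolerance=1e-10):
    """Solve A x = b for symmetric positive-definite A given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for x = 0
    p = list(r)
    rs_old = dot(r, r)
    tol = max(cg_relative_tolerance * math.sqrt(dot(b, b)), cg_absolute_tolerance)
    for _ in range(cg_max_iterations):
        if math.sqrt(rs_old) <= tol:
            break
        ap = matvec(p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Least-squares fit of y = a + b*x: the Jacobian rows are [1, x].
xs = [float(i) for i in range(10)]
ys = [2.0 + 3.0 * x for x in xs]
rows = [[1.0, x] for x in xs]
rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(2)]  # J^T y
print(conjugate_gradient(normal_matvec(rows), rhs))  # close to [2.0, 3.0]
```

Each matvec costs O(n * p) work and O(n + p) extra memory, which is why a matrix-free solver becomes attractive once p is large enough that a dense p x p system is expensive.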

Defense Layer Sensitivity Presets

New in version 0.3.6

Defense Strict

Maximum protection for near-optimal scenarios (warm starts, refinement):

config = HybridStreamingConfig.defense_strict()
# Very low warm start threshold (1%)
# Ultra-conservative step sizes
# Tight cost guard tolerance (5%)
# Very small step clipping (0.05)

Use when:

  • Continuing from previous fit

  • Refining near-optimal parameters

  • Ill-conditioned problems

  • Prioritizing stability over speed

Defense Relaxed

Relaxed protection for exploration-heavy scenarios:

config = HybridStreamingConfig.defense_relaxed()
# High warm start threshold (50%)
# Aggressive step sizes
# Generous cost guard tolerance (50%)
# Larger step clipping (0.5)

Use when:

  • Starting from rough initial guess

  • Exploring wide parameter space

  • Problems with multiple local minima

  • Speed more important than robustness

Defense Disabled

Disable all defense layers (reverts to pre-0.3.6 behavior):

config = HybridStreamingConfig.defense_disabled()

Warning

Use with caution! Removes protection against warmup divergence.

Use when:

  • Debugging to isolate defense layer effects

  • Benchmarking without defense overhead

  • Backward compatibility required

Scientific Default

Optimized for scientific computing workflows (XPCS, scattering, spectroscopy):

config = HybridStreamingConfig.scientific_default()
# Balanced defense layers
# Float64 precision
# Tight Gauss-Newton tolerances (1e-10)
# Enabled checkpoints

Use when:

  • Fitting physics-based models

  • Numerical precision is critical

  • Parameters span multiple scales

  • Reproducibility required

Usage Examples

Default Configuration

Create an optimizer with default settings:

from nlsq import HybridStreamingConfig, AdaptiveHybridStreamingOptimizer

config = HybridStreamingConfig()
optimizer = AdaptiveHybridStreamingOptimizer(config)

Custom Configuration

Fine-tune specific parameters:

config = HybridStreamingConfig(
    # Normalization
    normalize=True,
    normalization_strategy="bounds",  # 'auto', 'bounds', 'p0', 'none'
    # Phase 1: L-BFGS warmup
    warmup_iterations=300,
    max_warmup_iterations=800,
    lbfgs_history_size=15,
    lbfgs_initial_step_size=0.5,
    lbfgs_line_search="backtracking",
    lbfgs_exploration_step_size=0.1,
    lbfgs_refinement_step_size=1.0,
    loss_plateau_threshold=5e-4,
    gradient_norm_threshold=5e-3,
    # Phase 2: Gauss-Newton
    gauss_newton_max_iterations=150,
    gauss_newton_tol=1e-9,
    trust_region_initial=0.5,
    regularization_factor=1e-8,
    # Streaming
    chunk_size=20000,
    # Fault tolerance
    enable_checkpoints=True,
    checkpoint_frequency=50,
    validate_numerics=True,
)

Normalization Strategies

Configure how parameters are normalized:

# Auto-detect: use bounds if provided, else p0-based
config = HybridStreamingConfig(normalization_strategy="auto")

# Normalize to [0, 1] using parameter bounds
config = HybridStreamingConfig(normalization_strategy="bounds")

# Scale by initial parameter magnitudes
config = HybridStreamingConfig(normalization_strategy="p0")

# No normalization (identity transform)
config = HybridStreamingConfig(normalization_strategy="none")

Switching Criteria

Control when Phase 1 switches to Phase 2:

config = HybridStreamingConfig(
    # Any of these criteria can trigger switch
    active_switching_criteria=["plateau", "gradient", "max_iter"],
    # Loss plateau detection threshold
    loss_plateau_threshold=1e-4,
    # Gradient norm threshold
    gradient_norm_threshold=1e-3,
    # Maximum warmup iterations
    max_warmup_iterations=500,
)
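The switching logic above can be sketched as a single check run once per warmup iteration. This is an illustrative assumption, not nlsq's code: it assumes nothing is checked before warmup_iterations have run (matching the table's "Initial warmup iterations before checking switch"), that None means all three criteria are active, that the plateau test uses the relative improvement between the last two losses, and that the first satisfied criterion wins. should_switch is a hypothetical name.

```python
def should_switch(iteration, loss_history, grad_norm,
                  active_switching_criteria=None,
                  warmup_iterations=200, max_warmup_iterations=500,
                  loss_plateau_threshold=1e-4, gradient_norm_threshold=1e-3):
    """Return the criterion that fires ("max_iter", "gradient", "plateau") or None."""
    if iteration < warmup_iterations:
        return None  # assumed: no switching checks during the initial warmup
    # Assumption: None means every criterion is active.
    active = set(active_switching_criteria or ("plateau", "gradient", "max_iter"))
    if "max_iter" in active and iteration >= max_warmup_iterations:
        return "max_iter"
    if "gradient" in active and grad_norm < gradient_norm_threshold:
        return "gradient"
    if "plateau" in active and len(loss_history) >= 2:
        prev, curr = loss_history[-2], loss_history[-1]
        relative_improvement = (prev - curr) / max(abs(prev), 1e-300)
        if relative_improvement < loss_plateau_threshold:
            return "plateau"
    return None


print(should_switch(250, [1.0, 0.99999], grad_norm=0.5))  # plateau
```

Returning the criterion name rather than a bare boolean makes it easy to log why Phase 1 ended.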

L-BFGS Options

Configure L-BFGS behavior and line search:

config = HybridStreamingConfig(
    lbfgs_history_size=10,
    lbfgs_initial_step_size=0.1,
    lbfgs_line_search="wolfe",  # "wolfe", "strong_wolfe", "backtracking"
    lbfgs_exploration_step_size=0.1,
    lbfgs_refinement_step_size=1.0,
    gradient_clip_value=1.0,
)
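For readers unfamiliar with these settings, here is the textbook L-BFGS update (the two-loop recursion) combined with an Armijo backtracking line search, the simplest of the three lbfgs_line_search choices. It is a generic pure-Python sketch, not nlsq's warmup optimizer: the history_size and initial_step_size arguments only mirror lbfgs_history_size and lbfgs_initial_step_size, and the Wolfe variants and gradient clipping are omitted.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def two_loop_direction(grad, s_hist, y_hist):
    """Approximate -H^{-1} grad from the stored (s, y) correction pairs."""
    q = list(grad)
    saved = []
    for s, y in zip(reversed(s_hist), reversed(y_hist)):  # newest pair first
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        saved.append((rho, alpha, s, y))
    # Scale the initial inverse Hessian by gamma = s^T y / y^T y (newest pair).
    gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1]) if s_hist else 1.0
    r = [gamma * qi for qi in q]
    for rho, alpha, s, y in reversed(saved):  # oldest pair first
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return [-ri for ri in r]


def backtracking(f, x, fx, grad, direction, step, shrink=0.5, c1=1e-4):
    """Shrink the step until the Armijo sufficient-decrease condition holds."""
    slope = dot(grad, direction)
    while step > 1e-12:
        trial = [xi + step * di for xi, di in zip(x, direction)]
        if f(trial) <= fx + c1 * step * slope:
            break
        step *= shrink
    return step


def lbfgs(f, grad_f, x0, history_size=10, initial_step_size=0.1, max_iter=200, tol=1e-8):
    x = list(x0)
    g = grad_f(x)
    s_hist, y_hist = [], []
    for _ in range(max_iter):
        if dot(g, g) ** 0.5 < tol:
            break
        d = two_loop_direction(g, s_hist, y_hist)
        # Small first step (no curvature information yet); unit steps afterwards.
        step = backtracking(f, x, f(x), g, d, initial_step_size if not s_hist else 1.0)
        x_new = [xi + step * di for xi, di in zip(x, d)]
        g_new = grad_f(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        if dot(s, y) > 1e-12:  # keep only pairs with positive curvature
            s_hist.append(s)
            y_hist.append(y)
            if len(s_hist) > history_size:  # forget the oldest pair
                s_hist.pop(0)
                y_hist.pop(0)
        x, g = x_new, g_new
    return x


# Badly scaled quadratic with its minimum at (1, -2).
def f(p):
    return (p[0] - 1.0) ** 2 + 10.0 * (p[1] + 2.0) ** 2


def grad_f(p):
    return [2.0 * (p[0] - 1.0), 20.0 * (p[1] + 2.0)]


print(lbfgs(f, grad_f, [0.0, 0.0]))  # approximately [1.0, -2.0]
```

The history size bounds memory at O(history_size * p), which is what lets L-BFGS use curvature information without storing a dense p x p matrix.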

Configuration Parameters

Phase 0: Normalization

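======================  =======  ========================================
Parameter               Default  Description
======================  =======  ========================================
normalize               True     Enable parameter normalization
normalization_strategy  'auto'   Strategy: 'auto', 'bounds', 'p0', 'none'
======================  =======  ========================================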

Phase 1: L-BFGS Warmup

===========================  =======  ==============================================================
Parameter                    Default  Description
===========================  =======  ==============================================================
warmup_iterations            200      Initial warmup iterations before checking switch
max_warmup_iterations        500      Maximum warmup before forced switch
lbfgs_history_size           10       L-BFGS history size
lbfgs_initial_step_size      0.1      Initial step size for L-BFGS line search
lbfgs_line_search            'wolfe'  Line search strategy ('wolfe', 'strong_wolfe', 'backtracking')
lbfgs_exploration_step_size  0.1      Step size for exploration mode
lbfgs_refinement_step_size   1.0      Step size for refinement mode
loss_plateau_threshold       1e-4     Relative loss improvement for plateau detection
gradient_norm_threshold      1e-3     Gradient norm for early switch
===========================  =======  ==============================================================

4-Layer Defense Strategy (New in 0.3.6)

===========================  =======  ===================================================
Parameter                    Default  Description
===========================  =======  ===================================================
Layer 1: Warm Start Detection
enable_warm_start_detection  True     Enable/disable warm start detection
warm_start_threshold         0.01     Relative loss threshold (skip if < threshold)
Layer 2: Adaptive Step Size
enable_adaptive_warmup_lr    True     Enable/disable adaptive step size selection
warmup_lr_refinement         1e-6     Step size for excellent fits (relative_loss < 0.1)
warmup_lr_careful            1e-5     Step size for good fits (0.1 ≤ relative_loss < 1.0)
Layer 3: Cost-Increase Guard
enable_cost_guard            True     Enable/disable cost-increase guard
cost_increase_tolerance      0.05     Max allowed loss increase (5%)
Layer 4: Step Clipping
enable_step_clipping         True     Enable/disable step clipping
max_warmup_step_size         0.1      Maximum L2 norm of parameter update
===========================  =======  ===================================================

See also

How Curve Fitting Works for complete optimization strategy documentation.
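The four layers in the table can be read as four small checks around each warmup step. The sketch below is a hypothetical illustration that reuses the table's field names and defaults; it is not nlsq's implementation. How relative_loss is computed is left to the caller, the fallback to warmup_learning_rate when relative_loss ≥ 1.0 is an assumption, and the cost guard is assumed to compare against the loss at the start of warmup. DefenseSettings and the four helper functions are invented names.

```python
from dataclasses import dataclass


@dataclass
class DefenseSettings:
    """Hypothetical container mirroring the defense-layer fields and defaults."""
    enable_warm_start_detection: bool = True
    warm_start_threshold: float = 0.01
    enable_adaptive_warmup_lr: bool = True
    warmup_learning_rate: float = 1e-3
    warmup_lr_refinement: float = 1e-6
    warmup_lr_careful: float = 1e-5
    enable_cost_guard: bool = True
    cost_increase_tolerance: float = 0.05
    enable_step_clipping: bool = True
    max_warmup_step_size: float = 0.1


def skip_warmup(relative_loss, cfg):
    """Layer 1: treat the start as warm and skip warmup if relative_loss < threshold."""
    return cfg.enable_warm_start_detection and relative_loss < cfg.warm_start_threshold


def warmup_step_size(relative_loss, cfg):
    """Layer 2: use smaller steps the better the starting fit already is."""
    if not cfg.enable_adaptive_warmup_lr:
        return cfg.warmup_learning_rate
    if relative_loss < 0.1:
        return cfg.warmup_lr_refinement
    if relative_loss < 1.0:
        return cfg.warmup_lr_careful
    return cfg.warmup_learning_rate  # assumed fallback for poor starting fits


def cost_guard_tripped(initial_loss, current_loss, cfg):
    """Layer 3: abort warmup if the loss rose more than cost_increase_tolerance."""
    limit = initial_loss * (1.0 + cfg.cost_increase_tolerance)
    return cfg.enable_cost_guard and current_loss > limit


def clip_step(step, cfg):
    """Layer 4: rescale the update so its L2 norm is at most max_warmup_step_size."""
    norm = sum(v * v for v in step) ** 0.5
    if not cfg.enable_step_clipping or norm <= cfg.max_warmup_step_size:
        return list(step)
    scale = cfg.max_warmup_step_size / norm
    return [v * scale for v in step]


cfg = DefenseSettings()
print(skip_warmup(0.005, cfg), warmup_step_size(0.5, cfg), clip_step([0.3, 0.4], cfg))
```

Clipping rescales the whole update rather than each component separately, so the step keeps its direction and only its length is limited.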

Phase 2: Gauss-Newton

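===========================  =======  ==========================================
Parameter                    Default  Description
===========================  =======  ==========================================
gauss_newton_max_iterations  100      Maximum Gauss-Newton iterations
gauss_newton_tol             1e-8     Convergence tolerance
trust_region_initial         1.0      Initial trust region radius
regularization_factor        1e-10    Regularization for rank-deficient matrices
===========================  =======  ==========================================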
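How chunk_size, gc_chunk_interval, and regularization_factor could fit together in one streaming Gauss-Newton step is sketched below in pure Python. J^T J and J^T r are accumulated chunk by chunk, gc.collect() runs every gc_chunk_interval chunks, and regularization_factor is added to the diagonal before solving. This is an illustrative assumption about how the parameters interact, not nlsq's JAX implementation; streaming_gauss_newton_step, solve, line, and line_jac are hypothetical names, and a real implementation would compute each chunk's Jacobian in a vectorized way rather than point by point.

```python
import gc


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def streaming_gauss_newton_step(params, xdata, ydata, model, jac_row,
                                chunk_size=10000, gc_chunk_interval=10,
                                regularization_factor=1e-10):
    p = len(params)
    jtj = [[0.0] * p for _ in range(p)]
    jtr = [0.0] * p
    for k, start in enumerate(range(0, len(xdata), chunk_size)):
        stop = start + chunk_size
        # Only one chunk of residuals and Jacobian rows is needed at a time.
        for x, y in zip(xdata[start:stop], ydata[start:stop]):
            r = model(x, params) - y
            j = jac_row(x, params)
            for i in range(p):
                jtr[i] += j[i] * r
                for c in range(p):
                    jtj[i][c] += j[i] * j[c]
        if (k + 1) % gc_chunk_interval == 0:
            gc.collect()
    for i in range(p):
        jtj[i][i] += regularization_factor  # assumed diagonal (Tikhonov) damping
    delta = solve(jtj, [-g for g in jtr])  # (J^T J + reg I) delta = -J^T r
    return [pi + di for pi, di in zip(params, delta)]


def line(x, params):
    return params[0] + params[1] * x


def line_jac(x, params):
    return [1.0, x]  # derivatives with respect to intercept and slope


xdata = [i / 100 for i in range(1000)]
ydata = [2.0 + 3.0 * x for x in xdata]
print(streaming_gauss_newton_step([0.0, 0.0], xdata, ydata, line, line_jac, chunk_size=128))
```

Because the model in the usage example is linear, a single step lands on the least-squares solution (approximately intercept 2 and slope 3); for nonlinear models the step would be repeated until gauss_newton_tol is met.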

See Also