nlsq.core.orchestration

Orchestration components for CurveFit decomposition.

This package contains components extracted from the CurveFit God class, each handling a single responsibility:

  • DataPreprocessor: Input validation, array conversion, data padding

  • OptimizationSelector: Method selection, bounds preparation, initial guess

  • CovarianceComputer: Covariance via SVD, sigma transformation

  • StreamingCoordinator: Memory analysis, streaming strategy selection

These components are used internally by CurveFit and are controlled via feature flags.

Submodules

nlsq.core.orchestration.data_preprocessor

DataPreprocessor component for CurveFit decomposition.

Handles input validation, array conversion, data masking, and padding for curve fitting operations. This component is extracted from the CurveFit class as part of the God class decomposition.

Reference: specs/017-curve-fit-decomposition/spec.md FR-001

class nlsq.core.orchestration.data_preprocessor.DataPreprocessor[source]

Bases: object

Preprocessor for curve fitting input data.

Handles:

  1. Input validation (type checking, finiteness)

  2. Array conversion (numpy/list to JAX)

  3. Length consistency checking

  4. Data masking for invalid points

  5. NaN/Inf handling via nan_policy

Example

>>> preprocessor = DataPreprocessor()
>>> data = preprocessor.preprocess(
...     f=my_model,
...     xdata=x_values,
...     ydata=y_values,
...     sigma=uncertainties,
...     check_finite=True,
... )
>>> print(f"Valid points: {data.n_points}")
preprocess(f, xdata, ydata, *, sigma=None, absolute_sigma=False, check_finite=True, nan_policy='raise', stability_check=False)[source]

Validate and preprocess input data for curve fitting.

Parameters:
  • f (Callable[..., ArrayLike]) – Model function to fit (used for parameter count detection)

  • xdata (ArrayLike) – Independent variable data

  • ydata (ArrayLike) – Dependent variable data (observations)

  • sigma (ArrayLike | None) – Uncertainty/weights for observations

  • absolute_sigma (bool) – If True, sigma is used in an absolute sense; if False, only the relative magnitudes of sigma matter

  • check_finite (bool) – If True, raise on NaN/Inf values

  • nan_policy (str) – How to handle NaN: ‘raise’, ‘omit’, or ‘propagate’

  • stability_check (bool) – If True, run additional stability checks

Returns:

PreprocessedData with validated, converted arrays

Raises:
  • ValueError – If inputs are invalid (wrong shape, non-finite, etc.)

  • TypeError – If inputs have wrong types

Return type:

PreprocessedData
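
The nan_policy options listed above can be illustrated with a minimal NumPy sketch. This is not the library's implementation: the helper name apply_nan_policy is hypothetical, it assumes 1-D inputs, and it treats Inf the same way as NaN, which is an assumption.

```python
import numpy as np


def apply_nan_policy(xdata, ydata, nan_policy="raise"):
    """Hypothetical helper: handle non-finite values in 1-D x/y data."""
    x = np.asarray(xdata, dtype=float)
    y = np.asarray(ydata, dtype=float)
    if x.shape[0] != y.shape[0]:
        raise ValueError("xdata and ydata must have the same length")

    # True where both coordinates of a point are finite.
    finite = np.isfinite(x) & np.isfinite(y)
    if finite.all() or nan_policy == "propagate":
        return x, y
    if nan_policy == "raise":
        raise ValueError("input contains NaN or Inf")
    if nan_policy == "omit":
        # Drop the invalid points from both arrays.
        return x[finite], y[finite]
    raise ValueError(f"unknown nan_policy: {nan_policy!r}")


x_clean, y_clean = apply_nan_policy(
    [0.0, 1.0, 2.0], [1.0, np.nan, 3.0], nan_policy="omit"
)
```

With 'omit', the point whose y value is NaN is removed from both arrays, so the two outputs stay aligned.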

validate_sigma(sigma, ydata_shape)[source]

Validate and convert sigma to appropriate format.

Public interface matching DataPreprocessorProtocol.

Parameters:
  • sigma (ArrayLike | None) – Input sigma (1D for diagonal, 2D for full covariance)

  • ydata_shape (tuple[int, ...]) – Shape of ydata for compatibility check

Returns:

Validated numpy array or None

Raises:

ValueError – If sigma shape is incompatible with ydata

Return type:

np.ndarray | None
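
As a rough illustration of the shape compatibility check described above, the sketch below accepts a 1-D sigma of length n or an (n, n) covariance matrix. The helper name check_sigma is hypothetical, and the actual method may apply further checks.

```python
import numpy as np


def check_sigma(sigma, ydata_shape):
    """Hypothetical helper: validate sigma against the shape of ydata."""
    if sigma is None:
        return None
    s = np.asarray(sigma, dtype=float)
    n = ydata_shape[0]
    # 1-D sigma: per-point errors; 2-D sigma: full covariance matrix.
    if s.shape not in ((n,), (n, n)):
        raise ValueError(
            f"sigma has shape {s.shape}; expected ({n},) or ({n}, {n})"
        )
    return s


diag_sigma = check_sigma([0.1, 0.2, 0.1], (3,))
full_sigma = check_sigma(np.eye(3), (3,))
```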

nlsq.core.orchestration.optimization_selector

OptimizationSelector component for CurveFit decomposition.

Handles parameter detection, method selection, bounds preparation, and solver configuration for curve fitting operations.

Reference: specs/017-curve-fit-decomposition/spec.md FR-002

nlsq.core.orchestration.optimization_selector.prepare_bounds(bounds, n)[source]

Prepare bounds for optimization.

Parameters:
  • bounds (tuple | None) – Tuple of (lower, upper) bounds or None for unbounded

  • n (int) – Number of parameters

Returns:

Tuple of (lower_bounds, upper_bounds) arrays

Return type:

tuple[ndarray, ndarray]
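
One plausible way to turn a bounds tuple into two arrays of length n is sketched below with NumPy. The helper name bounds_to_arrays is hypothetical, and the scalar broadcasting behaviour is an assumption modelled on common least-squares APIs, not a statement about this function.

```python
import numpy as np


def bounds_to_arrays(bounds, n):
    """Hypothetical helper: expand (lower, upper) into arrays of length n."""
    if bounds is None:
        # Unbounded problem: every parameter may range over (-inf, inf).
        return np.full(n, -np.inf), np.full(n, np.inf)
    lower, upper = bounds
    # Scalars are broadcast to all parameters; sequences must have length n.
    lb = np.broadcast_to(np.asarray(lower, dtype=float), (n,)).copy()
    ub = np.broadcast_to(np.asarray(upper, dtype=float), (n,)).copy()
    return lb, ub


lb, ub = bounds_to_arrays((0.0, [1.0, 2.0]), 2)
```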

class nlsq.core.orchestration.optimization_selector.OptimizationSelector[source]

Bases: object

Selector for optimization method and configuration.

Handles:

  1. Parameter count detection from function signature

  2. Method selection based on bounds and problem type

  3. Bounds validation and preparation

  4. Initial guess generation if not provided

  5. Solver configuration validation

Example

>>> selector = OptimizationSelector()
>>> config = selector.select(
...     f=my_model,
...     xdata=x_values,
...     ydata=y_values,
...     bounds=([0, 0], [10, 10]),
... )
>>> print(f"Method: {config.method}, Params: {config.n_params}")
select(f, xdata, ydata, *, p0=None, bounds=None, method=None, jac=None, tr_solver=None, x_scale=1.0, ftol=1e-08, xtol=1e-08, gtol=1e-08, max_nfev=None)[source]

Select optimization method and prepare configuration.

Parameters:
  • f (Callable[..., ArrayLike]) – Model function to fit

  • xdata (ArrayLike) – Independent variable data

  • ydata (ArrayLike) – Dependent variable data

  • p0 (ArrayLike | None) – Initial parameter guess (auto-detected if None)

  • bounds (tuple[ArrayLike, ArrayLike] | None) – Parameter bounds as (lower, upper)

  • method (str | None) – Optimization method (‘trf’, ‘lm’, ‘dogbox’, or None for auto)

  • jac (str | Callable | None) – Jacobian computation method

  • tr_solver (str | None) – Trust region solver (‘exact’, ‘lsmr’, or None for auto)

  • x_scale (ArrayLike | str | float) – Parameter scaling

  • ftol (float) – Function tolerance

  • xtol (float) – Parameter tolerance

  • gtol (float) – Gradient tolerance

  • max_nfev (int | None) – Maximum function evaluations (auto if None)

Returns:

OptimizationConfig with all settings resolved

Raises:

ValueError – If configuration is invalid

Return type:

OptimizationConfig

detect_parameter_count(f, xdata)[source]

Detect number of parameters from function signature.

Uses inspection of function signature to determine parameter count.

Parameters:
  • f (Callable[..., ArrayLike]) – Model function to analyze

  • xdata (ArrayLike) – Sample data (currently unused; reserved for future probing)

Returns:

Number of parameters (excluding x)

Raises:

ValueError – If parameter count cannot be determined

Return type:

int
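
Signature inspection of this kind can be sketched with the standard library's inspect module. The helper name count_fit_parameters is hypothetical, and the handling of *args/**kwargs shown here is an assumption about what "cannot be determined" means.

```python
import inspect


def count_fit_parameters(f):
    """Hypothetical helper: count model parameters, excluding x."""
    params = list(inspect.signature(f).parameters.values())
    variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    if any(p.kind in variadic for p in params):
        # A variadic signature does not reveal how many parameters exist.
        raise ValueError("cannot determine parameter count for *args/**kwargs")
    if len(params) < 2:
        raise ValueError("model must accept x plus at least one parameter")
    # The first argument is the independent variable.
    return len(params) - 1


def line(x, a, b):
    return a * x + b


n_params = count_fit_parameters(line)
```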

auto_initial_guess(n_params, bounds)[source]

Generate automatic initial parameter guess.

Uses bounds midpoint if available, otherwise ones.

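Parameters:
  • n_params – Number of parameters

  • bounds – Parameter bounds as (lower, upper), or None if unbounded

Returns: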

Initial guess array of shape (n_params,)

Return type:

jax.Array
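
The "bounds midpoint if available, otherwise ones" rule can be sketched as follows. The helper name midpoint_or_ones is hypothetical, NumPy is used instead of JAX for brevity, and falling back to 1.0 for parameters with an infinite bound is an assumption.

```python
import numpy as np


def midpoint_or_ones(n_params, bounds=None):
    """Hypothetical helper: midpoint of finite bounds, else 1.0."""
    p0 = np.ones(n_params)
    if bounds is None:
        return p0
    lb = np.broadcast_to(np.asarray(bounds[0], dtype=float), (n_params,))
    ub = np.broadcast_to(np.asarray(bounds[1], dtype=float), (n_params,))
    # A midpoint only exists where both bounds are finite (assumed rule).
    finite = np.isfinite(lb) & np.isfinite(ub)
    p0[finite] = 0.5 * (lb[finite] + ub[finite])
    return p0


p0 = midpoint_or_ones(2, ([0.0, 0.0], [10.0, np.inf]))
```

Here the first parameter starts at the midpoint of its bounds, while the second keeps the default because its upper bound is infinite.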

nlsq.core.orchestration.covariance_computer

CovarianceComputer component for CurveFit decomposition.

Handles covariance matrix computation via SVD, sigma transformation, and condition number estimation.

Reference: specs/017-curve-fit-decomposition/spec.md FR-003

class nlsq.core.orchestration.covariance_computer.CovarianceComputer[source]

Bases: object

Computer for parameter covariance from optimization results.

Handles:

  1. Jacobian-based covariance via SVD

  2. Sigma transformation (1D and 2D)

  3. Absolute vs relative sigma handling

  4. Singularity detection and handling

Example

>>> computer = CovarianceComputer()
>>> result = computer.compute(
...     result=optimize_result,
...     n_data=100,
...     sigma=uncertainties,
...     absolute_sigma=True,
... )
>>> print(f"Parameter errors: {result.perr}")
__init__()[source]

Initialize CovarianceComputer with JIT-compiled functions.

compute(result, n_data, *, sigma=None, absolute_sigma=False, full_output=False)[source]

Compute parameter covariance from optimization result.

Uses the Jacobian at the solution to compute covariance via: pcov = (J^T @ J)^(-1) * s_sq

where s_sq is the residual variance.

Parameters:
  • result (OptimizeResult) – OptimizeResult from LeastSquares

  • n_data (int) – Number of data points

  • sigma (jax.Array | None) – Observation uncertainties/weights

  • absolute_sigma (bool) – If True, sigma is absolute uncertainty

  • full_output (bool) – If True, include additional diagnostics

Returns:

CovarianceResult with covariance matrix and metadata

Raises:

ValueError – If Jacobian is unavailable or invalid

Return type:

CovarianceResult
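
The formula pcov = (J^T @ J)^(-1) * s_sq can be evaluated through the SVD of J without forming J^T @ J explicitly. The sketch below is a NumPy illustration, not the library's code: the helper name, the singular value threshold, and the choice to skip the s_sq scaling when absolute_sigma is True are assumptions that follow the convention used by SciPy's curve_fit.

```python
import numpy as np


def covariance_from_jacobian(jac, residuals, absolute_sigma=False):
    """Hypothetical helper: parameter covariance from J via SVD."""
    m, n = jac.shape
    # J = U @ diag(s) @ Vt, so (J^T J)^(-1) = V @ diag(1 / s**2) @ Vt.
    _, s, vt = np.linalg.svd(jac, full_matrices=False)
    # Discard negligible singular values (assumed threshold).
    threshold = np.finfo(float).eps * max(m, n) * s[0]
    keep = s > threshold
    s, vt = s[keep], vt[keep]
    pcov = (vt.T / s**2) @ vt
    if not absolute_sigma and m > n:
        # Residual variance: sum of squares over degrees of freedom.
        s_sq = np.sum(residuals**2) / (m - n)
        pcov = pcov * s_sq
    return pcov


J = np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]])
r = np.array([0.0, 0.0, 2.0])
pcov = covariance_from_jacobian(J, r)
perr = np.sqrt(np.diag(pcov))
```

The square roots of the diagonal of pcov give one-standard-deviation parameter errors.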

create_sigma_transform(sigma, n_data)[source]

Create sigma transformation function.

Handles both 1D (diagonal) and 2D (full covariance) sigma.

Parameters:
  • sigma (jax.Array) – Sigma array, shape (n,) or (n, n)

  • n_data (int) – Number of data points

Returns:

Tuple of (transform_func, is_2d):

  • transform_func: Function to apply sigma weighting

  • is_2d: True if sigma is full covariance matrix

Return type:

tuple[Callable, bool]
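
A common way to build such a transform is to divide by sigma in the 1D case and to whiten with a Cholesky factor in the 2D case. The sketch below assumes that approach; the helper name make_sigma_transform is hypothetical and NumPy stands in for JAX.

```python
import numpy as np


def make_sigma_transform(sigma):
    """Hypothetical helper: return (transform_func, is_2d) for sigma."""
    sigma = np.asarray(sigma, dtype=float)
    if sigma.ndim == 1:
        # Diagonal case: scale each residual by its own uncertainty.
        return (lambda r: r / sigma), False
    if sigma.ndim == 2:
        # Full covariance C = L @ L.T; whitening solves L @ z = r.
        # np.linalg.cholesky raises LinAlgError if C is not positive definite.
        chol = np.linalg.cholesky(sigma)
        return (lambda r: np.linalg.solve(chol, r)), True
    raise ValueError("sigma must be 1-D or 2-D")


transform_1d, is_2d_a = make_sigma_transform([2.0, 4.0])
transform_2d, is_2d_b = make_sigma_transform(np.diag([4.0, 9.0]))
```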

compute_condition_number(jacobian)[source]

Compute condition number of Jacobian.

Uses singular values: cond = max(s) / min(s)

Parameters:

jacobian (jax.Array) – Jacobian matrix at solution

Returns:

Condition number (inf if singular)

Return type:

float
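
The ratio cond = max(s) / min(s) can be sketched with NumPy as follows. The helper name condition_number is hypothetical, and returning inf only for an exactly zero singular value is a simplification.

```python
import numpy as np


def condition_number(jacobian):
    """Hypothetical helper: ratio of largest to smallest singular value."""
    # Singular values are returned in descending order.
    s = np.linalg.svd(np.asarray(jacobian, dtype=float), compute_uv=False)
    if s.size == 0 or s[-1] == 0.0:
        return float("inf")
    return float(s[0] / s[-1])


cond = condition_number(np.diag([1.0, 4.0]))
```

A large condition number indicates that some parameter combinations are poorly constrained by the data.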

setup_sigma_transform(sigma, ydata, data_mask, len_diff, m)[source]

Setup sigma transformation for weighted least squares.

This is the legacy interface matching CurveFit._setup_sigma_transform.

Parameters:
  • sigma (np.ndarray | None) – Uncertainty in ydata (1-D errors or 2-D covariance matrix)

  • ydata (np.ndarray) – Dependent data array

  • data_mask (np.ndarray) – Boolean mask for valid data points

  • len_diff (int) – Difference in length for padding

  • m (int) – Original number of data points

Returns:

Transformation array for sigma or None

Raises:

ValueError – If sigma has incorrect shape or is not positive definite

Return type:

jax.Array | None

nlsq.core.orchestration.streaming_coordinator

StreamingCoordinator component for CurveFit decomposition.

Handles memory analysis, streaming strategy selection, and configuration for large-scale curve fitting operations.

Reference: specs/017-curve-fit-decomposition/spec.md FR-004

class nlsq.core.orchestration.streaming_coordinator.StreamingCoordinator(safety_factor=0.75)[source]

Bases: object

Coordinator for streaming strategy selection.

Handles:

  1. Memory estimation for dataset + Jacobian

  2. Available memory detection

  3. Strategy selection based on memory pressure

  4. Configuration of chunked/hybrid strategies

Example

>>> coordinator = StreamingCoordinator()
>>> decision = coordinator.decide(
...     xdata=x_array,
...     ydata=y_array,
...     n_params=5,
... )
>>> if decision.strategy == "hybrid":
...     config = decision.hybrid_config
...     # Use hybrid streaming optimizer
__init__(safety_factor=0.75)[source]

Initialize StreamingCoordinator.

Parameters:

safety_factor (float) – Memory safety factor (0.75 means use at most 75% of available memory)

decide(xdata, ydata, n_params, *, workflow='auto', memory_limit_mb=None, force_streaming=False)[source]

Decide on streaming strategy for the dataset.

Analyzes memory requirements and available resources to select the optimal execution strategy.

Parameters:
  • xdata (jax.Array) – Independent variable data

  • ydata (jax.Array) – Dependent variable data

  • n_params (int) – Number of parameters

  • workflow (str) – Workflow hint (‘auto’, ‘streaming’, ‘hybrid’, ‘normal’)

  • memory_limit_mb (float | None) – Override for memory limit detection

  • force_streaming (bool) – If True, always use streaming

Returns:

StreamingDecision with strategy and configuration

Raises:

MemoryError – If dataset too large even for streaming

Return type:

StreamingDecision
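
The idea of choosing a strategy from memory pressure can be sketched as a comparison between the estimated requirement and a safety-scaled budget. Everything in this example is an assumption made for illustration: the helper name choose_strategy, the threshold of ten times the budget, and the mapping from thresholds to strategy names are not taken from the library.

```python
def choose_strategy(required_mb, available_mb, safety_factor=0.75,
                    force_streaming=False):
    """Hypothetical helper: pick a strategy from memory pressure."""
    # Only a fraction of the available memory is treated as usable.
    budget_mb = available_mb * safety_factor
    if force_streaming:
        return "streaming"
    if required_mb <= budget_mb:
        return "normal"
    # Assumed rule: moderately oversized problems use the hybrid path.
    if required_mb <= 10 * budget_mb:
        return "hybrid"
    return "streaming"


strategy = choose_strategy(required_mb=1000.0, available_mb=1000.0)
```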

estimate_memory(n_data, n_params, dtype_bytes=8)[source]

Estimate memory requirement in MB.

Accounts for:

  • Data arrays (x, y, residuals)

  • Jacobian matrix (n_data x n_params)

  • Working arrays for optimization

  • JAX compilation overhead

Parameters:
  • n_data (int) – Number of data points

  • n_params (int) – Number of parameters

  • dtype_bytes (int) – Bytes per element (8 for float64)

Returns:

Estimated memory in MB

Return type:

float
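
A back-of-the-envelope version of such an estimate is sketched below. The helper name estimate_memory_mb is hypothetical, and both the size assumed for working arrays (one Jacobian-sized buffer) and the overhead multiplier are illustrative guesses, not the library's constants.

```python
def estimate_memory_mb(n_data, n_params, dtype_bytes=8, overhead_factor=1.5):
    """Hypothetical helper: rough memory estimate in MB."""
    # Three data-sized arrays: x, y and residuals.
    data_bytes = 3 * n_data * dtype_bytes
    # Dense Jacobian of shape (n_data, n_params).
    jacobian_bytes = n_data * n_params * dtype_bytes
    # Assumed: working arrays cost about one more Jacobian.
    working_bytes = jacobian_bytes
    # Assumed multiplier for compilation and allocator overhead.
    total_bytes = (data_bytes + jacobian_bytes + working_bytes) * overhead_factor
    return total_bytes / (1024 ** 2)


required_mb = estimate_memory_mb(n_data=1_000_000, n_params=5)
```

The Jacobian term grows with n_data * n_params, so it dominates the estimate once a model has more than a few parameters.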

get_available_memory()[source]

Get available system memory in MB.

The value is cached for the lifetime of the coordinator (one streaming decision per fit).

Returns:

Available memory in MB

Return type:

float

configure_hybrid(n_data, n_params, available_memory_mb)[source]

Configure hybrid streaming for dataset.

Calculates optimal chunk size and strategy parameters.

Parameters:
  • n_data (int) – Number of data points

  • n_params (int) – Number of parameters

  • available_memory_mb (float) – Available memory

Returns:

HybridStreamingConfig for the dataset

Return type:

HybridStreamingConfig
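
One simple way to derive a chunk size from a memory budget is to divide the usable bytes by an estimated per-point cost. The sketch below is an illustration under stated assumptions: the helper name chunk_size_for_budget, the per-point cost model, and the safety factor default are hypothetical and do not describe the fields of HybridStreamingConfig.

```python
def chunk_size_for_budget(n_data, n_params, available_memory_mb,
                          dtype_bytes=8, safety_factor=0.75):
    """Hypothetical helper: largest chunk that fits the memory budget."""
    # Assumed per-point cost: x, y, residual, plus two Jacobian-sized rows.
    bytes_per_point = (3 + 2 * n_params) * dtype_bytes
    budget_bytes = available_memory_mb * 1024 ** 2 * safety_factor
    chunk = int(budget_bytes // bytes_per_point)
    # Never exceed the dataset size, and always process at least one point.
    return max(1, min(n_data, chunk))


chunk = chunk_size_for_budget(n_data=10_000_000, n_params=5,
                              available_memory_mb=100.0)
```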