Workflow System Overview
========================

.. versionchanged:: 0.6.3
   The workflow system was simplified from 9 presets to 3 smart workflows:
   ``auto``, ``auto_global``, and ``hpc``. The system now automatically selects
   the optimal strategy based on memory constraints and problem characteristics.

NLSQ chooses a fitting strategy automatically from the available memory and the size
of the dataset. Data that fits comfortably in memory is solved directly, while larger
problems are chunked or streamed so that the fit does not run out of memory.

The Three Workflows
-------------------

NLSQ 0.6.3 provides three workflows:

.. list-table::
   :header-rows: 1
   :widths: 15 35 25 25

   * - Workflow
     - Description
     - Bounds
     - Use Case
   * - ``auto``
     - Memory-aware local optimization
     - Optional
     - **Default**. Standard curve fitting.
   * - ``auto_global``
     - Memory-aware global optimization
     - Required
     - Multi-modal problems, unknown initial guess.
   * - ``hpc``
     - ``auto_global`` + checkpointing
     - Required
     - Long-running HPC jobs.

workflow="auto" (Default)
~~~~~~~~~~~~~~~~~~~~~~~~~

``auto`` is the default workflow for local optimization. It chooses a memory
strategy (standard, chunked, or streaming) from the data size and available memory:

.. code-block:: python

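   from nlsq import fit
   import jax.numpy as jnp
   import numpy as np


   def model(x, a, b, c):
       return a * jnp.exp(-b * x) + c


   # Synthetic data: exponential decay with Gaussian noise
   rng = np.random.default_rng(0)
   x = np.linspace(0, 10, 1000)
   y = 2.5 * np.exp(-0.8 * x) + 0.3 + 0.05 * rng.normal(size=x.size)
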
   # Default: workflow="auto"
   result = fit(model, x, y, p0=[1.0, 0.5, 0.1])

   # Explicit workflow selection
   result = fit(model, x, y, p0=[1.0, 0.5, 0.1], workflow="auto")

   # With optional bounds (constrains solution to valid range)
   result = fit(
       model, x, y, p0=[1.0, 0.5, 0.1], workflow="auto", bounds=([0, 0, -1], [10, 5, 1])
   )

workflow="auto_global"
~~~~~~~~~~~~~~~~~~~~~~

Use this workflow for problems with multiple local minima or when no good initial
guess is known. Bounds are required because they define the search space.

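The workflow selects the global method automatically:

- **CMA-ES**: when the parameter scale ratio exceeds 1000 (bound widths that differ by more than three orders of magnitude across parameters) and ``evosax`` is available
- **Multi-Start**: otherwise, with starting points drawn by Latin Hypercube Sampling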

.. code-block:: python

   from nlsq import fit

   # model, x, y as defined in the workflow="auto" example
   # Global optimization with automatic method selection
   result = fit(
       model,
       x,
       y,
       p0=[1.0, 0.5, 0.1],
       workflow="auto_global",
       bounds=([0, 0, 0], [10, 5, 1]),
       n_starts=10,  # For multi-start (default: 10)
   )
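
Multi-Start draws its starting points by Latin Hypercube Sampling. As a minimal sketch of that technique (the function name and signature are hypothetical, not NLSQ's API), the NumPy code below splits each parameter's range into ``n_starts`` equal strata, draws one point per stratum, and shuffles the strata independently for each parameter:

.. code-block:: python

   import numpy as np


   def latin_hypercube_starts(lower, upper, n_starts, seed=None):
       """Hypothetical helper: n_starts Latin Hypercube points inside [lower, upper]."""
       rng = np.random.default_rng(seed)
       lower = np.asarray(lower, dtype=float)
       upper = np.asarray(upper, dtype=float)
       n_params = lower.size

       # Row k holds one uniform draw from stratum [k/n, (k+1)/n) for every parameter.
       unit = (np.arange(n_starts)[:, None] + rng.random((n_starts, n_params))) / n_starts

       # Shuffle each column on its own so strata are paired randomly across parameters.
       for j in range(n_params):
           unit[:, j] = rng.permutation(unit[:, j])

       # Scale the unit-cube samples onto the bounds.
       return lower + unit * (upper - lower)


   starts = latin_hypercube_starts([0, 0, 0], [10, 5, 1], n_starts=10, seed=0)

Each row of ``starts`` is one initial guess. Unlike independent uniform sampling, every tenth of each parameter's range receives exactly one start.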

workflow="hpc"
~~~~~~~~~~~~~~

Use this workflow for long-running jobs on HPC clusters. It runs ``auto_global``
with automatic checkpointing for crash recovery.

.. code-block:: python

   from nlsq import fit

   # model, x, y as defined in the workflow="auto" example
   result = fit(
       model,
       x,
       y,
       p0=[1.0, 0.5, 0.1],
       workflow="hpc",
       bounds=([0, 0, 0], [10, 5, 1]),
       checkpoint_dir="/scratch/my_job/checkpoints",
       checkpoint_interval=10,  # Save every 10 generations/starts
   )
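
The checkpoint format used by ``hpc`` is not described on this page. As a hypothetical illustration of the idea behind ``checkpoint_interval``, the sketch below runs a multi-start loop that writes its progress to a JSON file every N starts and resumes from that file after a crash. The function names, the file layout, and the ``objective`` callable (a stand-in for one local fit per start) are assumptions, not NLSQ's implementation.

.. code-block:: python

   import json
   import os

   import numpy as np


   def _save_checkpoint(state, path):
       # Write to a temporary file, then rename, so a crash mid-write
       # never leaves a truncated checkpoint behind.
       tmp_path = path + ".tmp"
       with open(tmp_path, "w") as f:
           json.dump(state, f)
       os.replace(tmp_path, path)


   def resumable_multistart(objective, starts, checkpoint_path, checkpoint_interval=10):
       """Hypothetical: evaluate objective(start) for each start, resuming from a checkpoint."""
       state = {"next_index": 0, "best_value": None, "best_start": None}
       if os.path.exists(checkpoint_path):
           with open(checkpoint_path) as f:
               state = json.load(f)

       for i in range(state["next_index"], len(starts)):
           value = float(objective(starts[i]))
           if state["best_value"] is None or value < state["best_value"]:
               state["best_value"] = value
               state["best_start"] = [float(v) for v in np.ravel(starts[i])]
           state["next_index"] = i + 1
           # Periodic checkpoint every checkpoint_interval completed starts.
           if state["next_index"] % checkpoint_interval == 0:
               _save_checkpoint(state, checkpoint_path)

       _save_checkpoint(state, checkpoint_path)  # final state
       return state

Rerunning the same call after a crash skips the starts already recorded in the checkpoint; with ``checkpoint_interval=10``, at most nine completed starts are recomputed.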

Memory Strategy Selection
-------------------------

Both ``auto`` and ``auto_global`` workflows use the ``MemoryBudgetSelector``
to choose the optimal memory strategy. The selector uses 75% of available
RAM as the threshold.

.. code-block:: text

   ┌─────────────────────────────────────────────────────────────────┐
   │                    MEMORY BUDGET COMPUTATION                    │
   ├─────────────────────────────────────────────────────────────────┤
   │ available_gb = psutil.virtual_memory().available / 1e9          │
   │ threshold_gb = available_gb × 0.75  (safety factor)             │
   │                                                                 │
   │ # Memory estimates (float64 = 8 bytes)                          │
   │ data_gb     = n_points × 2 × 8 / 1e9  (x + y)                   │
   │ jacobian_gb = n_points × n_params × 8 / 1e9                     │
   │ peak_gb     = data_gb + 1.3 × jacobian_gb + solver_overhead     │
   └─────────────────────────────────────────────────────────────────┘
                               │
                               ▼
               ┌───────────────────────────────┐
               │     data_gb > threshold_gb ?  │
               └───────────────────────────────┘
                       │ YES              │ NO
                       ▼                  ▼
          ┌──────────────────┐    ┌───────────────────────────┐
          │ STREAMING        │    │ peak_gb > threshold_gb?   │
          │ HybridStreaming  │    └───────────────────────────┘
          │ with adaptive    │          │ YES           │ NO
          │ batch_size       │          ▼               ▼
          └──────────────────┘   ┌─────────────┐  ┌─────────────┐
                                 │ CHUNKED     │  │ STANDARD    │
                                 │ LDMemory    │  │ Direct TRF  │
                                 │ with auto   │  │ curve_fit() │
                                 │ chunk_size  │  └─────────────┘
                                 └─────────────┘
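
The decision tree above can be expressed as a short pure-Python function. This is a hypothetical re-implementation for illustration, not NLSQ's ``MemoryBudgetSelector``: it takes ``available_gb`` as an argument instead of querying ``psutil``, and because the diagram does not quantify ``solver_overhead``, the sketch takes it as an explicit parameter that defaults to 0.

.. code-block:: python

   def select_memory_strategy(
       n_points: int,
       n_params: int,
       available_gb: float,
       solver_overhead_gb: float = 0.0,
       safety_factor: float = 0.75,
   ) -> str:
       """Hypothetical sketch of the decision tree (not MemoryBudgetSelector)."""
       threshold_gb = available_gb * safety_factor

       # float64 storage: 8 bytes per value
       data_gb = n_points * 2 * 8 / 1e9  # x and y arrays
       jacobian_gb = n_points * n_params * 8 / 1e9
       peak_gb = data_gb + 1.3 * jacobian_gb + solver_overhead_gb

       if data_gb > threshold_gb:
           return "streaming"  # even x and y exceed the budget
       if peak_gb > threshold_gb:
           return "chunked"  # data fits, but the full Jacobian does not
       return "standard"  # everything fits: direct TRF


   for n_points, n_params in [(1_000_000, 3), (100_000_000, 20), (1_000_000_000, 3)]:
       print(n_points, n_params, select_memory_strategy(n_points, n_params, available_gb=16.0))

With 16 GB free (a 12 GB threshold), these three cases fall into the standard, chunked, and streaming branches respectively.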

Strategy × Method Matrix
~~~~~~~~~~~~~~~~~~~~~~~~

Combining the three memory strategies with the two global methods gives the
``auto_global`` workflow six execution paths:

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Memory Strategy
     - Multi-Start
     - CMA-ES
   * - **standard**
     - MultiStartOrchestrator + n_starts × TRF
     - CMAESOptimizer + BIPOP + TRF refine
   * - **chunked**
     - LargeDatasetFitter + multi-start
     - CMAESOptimizer + data_chunk_size
   * - **streaming**
     - AdaptiveHybridStreaming + multi-start
     - CMAESOptimizer + data streaming

Method Selection (CMA-ES vs Multi-Start)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``MethodSelector`` chooses between CMA-ES and Multi-Start based on the
parameter scale ratio:

.. code-block:: python

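   import numpy as np
   from nlsq.global_optimization.method_selector import MethodSelector

   lower_bounds = np.array([0.0, 0.0, 0.0])  # per-parameter bounds
   upper_bounds = np.array([10.0, 5.0, 1.0])

   selector = MethodSelector()
   method = selector.select("auto", lower_bounds, upper_bounds)
   # Returns "cmaes" or "multi-start"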

- **CMA-ES**: Selected when ``scale_ratio > 1000`` AND ``evosax`` is available
- **Multi-Start**: Selected otherwise

The scale ratio is computed as:

.. code-block:: python

   scale_ratio = max(upper - lower) / min(upper - lower)
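
The selection rule can be sketched as follows. The function names are hypothetical and are not the ``MethodSelector`` API. Treating "``evosax`` is available" as "the package can be imported", checked with ``importlib.util.find_spec``, is also an assumption.

.. code-block:: python

   import importlib.util

   import numpy as np


   def scale_ratio(lower, upper) -> float:
       """Ratio of the widest to the narrowest bound interval."""
       widths = np.asarray(upper, dtype=float) - np.asarray(lower, dtype=float)
       # Assumes every parameter has a strictly positive bound width.
       return float(widths.max() / widths.min())


   def select_global_method(lower, upper, threshold=1000.0, cmaes_available=None):
       """Hypothetical sketch of the CMA-ES vs Multi-Start rule."""
       if cmaes_available is None:
           # Assumption: availability means the evosax package is importable.
           cmaes_available = importlib.util.find_spec("evosax") is not None
       if scale_ratio(lower, upper) > threshold and cmaes_available:
           return "cmaes"
       return "multi-start"


   print(scale_ratio([0, 0, 0], [10, 5, 1]))  # widths 10, 5, 1 -> 10.0
   print(select_global_method([0, 0], [1e4, 1], cmaes_available=True))  # -> cmaes

The bounds used in the examples on this page give a ratio of 10, so under this rule they take the Multi-Start path whether or not ``evosax`` is installed.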

Memory Override
~~~~~~~~~~~~~~~

You can override automatic memory detection:

.. code-block:: python

   # Force smaller memory footprint
   result = fit(
       model,
       x,
       y,
       p0=[1.0, 0.5, 0.1],
       workflow="auto",
       memory_limit_gb=4.0,  # Treat only 4 GB as available
   )

Tolerance Configuration
-----------------------

Tolerances are passed directly as keyword arguments rather than through presets:

.. code-block:: python

   # Fast fitting with looser tolerances
   result = fit(model, x, y, p0=[1.0, 0.5, 0.1], gtol=1e-6, ftol=1e-6, xtol=1e-6)

   # High-precision fitting
   result = fit(model, x, y, p0=[1.0, 0.5, 0.1], gtol=1e-10, ftol=1e-10, xtol=1e-10)

Migration from Old Presets
--------------------------

.. versionchanged:: 0.6.3
   The following presets were removed. Using them will raise ``ValueError``
   with a migration hint.

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Old Preset
     - New Equivalent
   * - ``standard``
     - ``workflow="auto"``
   * - ``fast``
     - ``workflow="auto", gtol=1e-6, ftol=1e-6, xtol=1e-6``
   * - ``quality``
     - ``workflow="auto_global", n_starts=20``
   * - ``large_robust``
     - ``workflow="auto"`` (auto-detects large data)
   * - ``streaming``
     - ``workflow="auto"`` (auto-detects memory pressure)
   * - ``hpc_distributed``
     - ``workflow="hpc"``
   * - ``cmaes``
     - ``workflow="auto_global"`` (auto-selects CMA-ES)
   * - ``cmaes-global``
     - ``workflow="auto_global", cmaes_config=CMAESConfig(n_generations=200)``
   * - ``global_auto``
     - ``workflow="auto_global"``
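
Scripts that still use old preset names can be ported with a small lookup table. The sketch below mirrors the table above; ``LEGACY_PRESET_KWARGS`` and ``migrate_preset`` are hypothetical names, not part of NLSQ. ``cmaes-global`` is left out because its replacement needs NLSQ's ``CMAESConfig`` object.

.. code-block:: python

   # Hypothetical migration table mirroring the documented preset mapping.
   LEGACY_PRESET_KWARGS = {
       "standard": {"workflow": "auto"},
       "fast": {"workflow": "auto", "gtol": 1e-6, "ftol": 1e-6, "xtol": 1e-6},
       "quality": {"workflow": "auto_global", "n_starts": 20},
       "large_robust": {"workflow": "auto"},
       "streaming": {"workflow": "auto"},
       "hpc_distributed": {"workflow": "hpc"},
       "cmaes": {"workflow": "auto_global"},
       "global_auto": {"workflow": "auto_global"},
   }


   def migrate_preset(preset: str) -> dict:
       """Return fit() keyword arguments that replace a removed preset name."""
       try:
           # Copy so callers can modify the result without mutating the table.
           return dict(LEGACY_PRESET_KWARGS[preset])
       except KeyError:
           raise ValueError(f"No automatic migration for preset {preset!r}") from None

For example, ``fit(model, x, y, p0=[1.0, 0.5, 0.1], bounds=([0, 0, 0], [10, 5, 1]), **migrate_preset("quality"))``. Bounds must be supplied here because the ``auto_global`` replacement requires them.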

4-Layer Defense Strategy
------------------------

All workflows using ``hybrid_streaming`` or ``AdaptiveHybridStreamingOptimizer``
include a 4-layer defense against L-BFGS warmup divergence. This is particularly
important for **warm-start refinement** scenarios where initial parameters are
already near optimal.

The layers activate automatically:

1. **Warm Start Detection**: Skips warmup if the initial loss is below 1% of the data variance.
2. **Adaptive Step Size**: Scales the step size with fit quality, within the range 1e-6 to 1e-3.
3. **Cost-Increase Guard**: Aborts warmup if the loss increases by more than 5%.
4. **Step Clipping**: Caps the norm of each parameter update at 0.1.
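
The four layers can be illustrated with a small NumPy sketch. It is a hypothetical stand-in, not NLSQ's implementation: plain gradient descent replaces L-BFGS to keep it short, and while the thresholds are the documented ones, the linear scaling of the step size within [1e-6, 1e-3] and the use of the starting loss as the reference for the 5% guard are assumptions.

.. code-block:: python

   import numpy as np


   def guarded_warmup(loss_fn, grad_fn, params, data_variance, n_steps=100):
       """Hypothetical gradient-descent warmup with four safety layers.

       Assumes data_variance > 0.
       """
       params = np.asarray(params, dtype=float)
       initial_loss = float(loss_fn(params))

       # Layer 1: warm-start detection. Skip warmup when the loss is
       # already below 1% of the data variance.
       if initial_loss < 0.01 * data_variance:
           return params, "skipped"

       # Layer 2: adaptive step size in [1e-6, 1e-3]. A better fit (lower
       # relative loss) gets a smaller step (assumed linear scaling).
       relative_loss = initial_loss / data_variance
       step_size = float(np.clip(1e-3 * relative_loss, 1e-6, 1e-3))

       for _ in range(n_steps):
           update = -step_size * np.asarray(grad_fn(params), dtype=float)

           # Layer 4: step clipping. Cap the update norm at 0.1.
           norm = np.linalg.norm(update)
           if norm > 0.1:
               update *= 0.1 / norm

           candidate = params + update
           # Layer 3: cost-increase guard. Abort, keeping the last accepted
           # parameters, if the loss exceeds the starting loss by more than 5%.
           if float(loss_fn(candidate)) > 1.05 * initial_loss:
               return params, "aborted"
           params = candidate

       return params, "completed"


   target = np.array([1.0, 1.0])


   def loss(p):
       return float(np.sum((p - target) ** 2))


   def grad(p):
       return 2.0 * (p - target)


   params, status = guarded_warmup(loss, grad, [0.0, 0.0], data_variance=1.0)

Starting at the optimum returns immediately with status ``"skipped"``, while a gradient that points uphill trips the 5% guard and returns the last accepted parameters.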

Where to go next
----------------

- API reference: :doc:`../api/nlsq.workflow`
- Configuration options: :doc:`../reference/configuration`
- Common workflow patterns: :doc:`../howto/common_workflows`
- Large dataset handling: :doc:`../howto/handle_large_data`