nlsq.large\_dataset module
============================

.. currentmodule:: nlsq.streaming.large_dataset

.. automodule:: nlsq.streaming.large_dataset
   :noindex:

Overview
--------

The ``nlsq.large_dataset`` module provides tools for fitting curves to datasets that are too large
to fit in memory at once, or large enough that processing them in chunks is more efficient. It is
intended for datasets with millions to billions of points.

Key Features
------------

- **Automatic handling** of datasets with 100M+ points
- **Intelligent chunking** with <1% error for well-conditioned problems
- **Memory estimation** and automatic memory management
- **Streaming optimization** for datasets of arbitrary size
- **Progress reporting** for long-running fits
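
As a rough guide to where the memory goes, a dense ``float64`` Jacobian needs
``n_points * n_params * 8`` bytes, which usually dominates for large datasets. The
back-of-envelope sketch below uses a hypothetical ``dense_footprint_gb`` helper (not part of
``nlsq``) that counts only the Jacobian plus the ``x`` and ``y`` arrays. An estimator such as
``estimate_memory_requirements`` may include further terms, so treat this figure as a lower bound.

.. code-block:: python

    def dense_footprint_gb(n_points, n_params, bytes_per_value=8):
        """Hypothetical helper: bytes for a dense Jacobian plus x and y, in GB.

        Uses 1 GB = 1e9 bytes and ignores residuals, workspace, and copies.
        """
        jacobian_bytes = n_points * n_params * bytes_per_value
        data_bytes = 2 * n_points * bytes_per_value  # x and y arrays
        return (jacobian_bytes + data_bytes) / 1e9


    # 50 million points, 3 parameters: 1.2 GB Jacobian + 0.8 GB data
    print(f"{dense_footprint_gb(50_000_000, 3):.2f} GB")  # 2.00 GB

Because this grows linearly with ``n_points``, a fit that would exceed a memory limit in one pass
can instead be processed in chunks whose Jacobians each fit within it.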

Classes
-------

.. autosummary::
   :toctree: generated/
   :template: class.rst

   LargeDatasetFitter

.. autoclass:: LDMemoryConfig
   :members:
   :noindex:

Functions
---------

.. autosummary::
   :toctree: generated/

   fit_large_dataset
   estimate_memory_requirements

Examples
--------

Basic Usage with curve_fit_large
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from nlsq import curve_fit_large, estimate_memory_requirements
    import jax.numpy as jnp
    import numpy as np

    # Check memory requirements
    n_points = 50_000_000  # 50 million points
    n_params = 3
    stats = estimate_memory_requirements(n_points, n_params)
    print(f"Memory required: {stats.total_memory_estimate_gb:.2f} GB")

    # Generate a large synthetic dataset (seeded for reproducibility)
    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, n_points)
    y = 2.0 * np.exp(-0.5 * x) + 0.3 + rng.normal(0, 0.05, n_points)


    # Define fit function
    def exponential(x, a, b, c):
        return a * jnp.exp(-b * x) + c


    # Fit with automatic chunking
    popt, pcov = curve_fit_large(
        exponential,
        x,
        y,
        p0=[2.5, 0.6, 0.2],
        memory_limit_gb=4.0,
        show_progress=True,
    )
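
Assuming, as with SciPy's ``curve_fit``, that ``pcov`` is the estimated covariance matrix of
``popt``, the one-standard-deviation parameter uncertainties are the square roots of its
diagonal. The sketch below uses stand-in values so it runs on its own; substitute the ``popt``
and ``pcov`` returned above.

.. code-block:: python

    import numpy as np

    # Stand-in results for illustration; use the popt, pcov from curve_fit_large
    popt = np.array([2.0, 0.5, 0.3])
    pcov = np.diag([4e-6, 1e-6, 9e-6])

    # One-sigma uncertainty of each parameter
    perr = np.sqrt(np.diag(pcov))
    for name, value, err in zip(["a", "b", "c"], popt, perr):
        print(f"{name} = {value:.4f} +/- {err:.4f}")

Infinite or NaN entries in ``perr`` usually indicate a parameter that the data constrain poorly.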

Advanced Usage with LargeDatasetFitter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from nlsq import LargeDatasetFitter, LDMemoryConfig

    # Reuses exponential, x, and y from the basic example above

    # Configure memory management
    config = LDMemoryConfig(
        memory_limit_gb=4.0,
        min_chunk_size=10000,
        max_chunk_size=1000000,
        min_success_rate=0.8,  # Require 80% of chunks to succeed
    )

    # Create fitter
    fitter = LargeDatasetFitter(config=config)

    # Fit with progress tracking
    result = fitter.fit_with_progress(exponential, x, y, p0=[2.5, 0.6, 0.2])

    print(f"Fitted parameters: {result.popt}")
    # Note: success_rate and n_chunks only available for multi-chunk fits
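
A plausible reading of the settings above is that the memory limit caps how many points a chunk
can hold, the chunk-size bounds clamp that number, and ``min_success_rate`` decides whether
enough chunks fitted successfully. The sketch below encodes that reading with hypothetical
helpers ``pick_chunk_size`` and ``enough_chunks_succeeded``; it is an assumption for
illustration, not the rule ``LargeDatasetFitter`` actually applies.

.. code-block:: python

    def pick_chunk_size(
        memory_limit_gb, n_params, min_chunk_size, max_chunk_size, bytes_per_value=8
    ):
        """Hypothetical rule: the most points whose dense Jacobian rows fit in
        the memory limit, clamped to [min_chunk_size, max_chunk_size]."""
        budget_bytes = memory_limit_gb * 1e9
        bytes_per_point = n_params * bytes_per_value
        size = int(budget_bytes // bytes_per_point)
        return max(min_chunk_size, min(size, max_chunk_size))


    def enough_chunks_succeeded(n_succeeded, n_chunks, min_success_rate):
        """Hypothetical check mirroring the min_success_rate setting."""
        return n_succeeded / n_chunks >= min_success_rate


    # With the configuration above, 4 GB / 24 bytes per point exceeds the cap
    print(pick_chunk_size(4.0, 3, 10_000, 1_000_000))  # 1000000
    print(pick_chunk_size(0.01, 3, 10_000, 1_000_000))  # 416666
    print(enough_chunks_succeeded(45, 50, 0.8))  # True

Clamping from below keeps per-chunk overhead from dominating when the memory limit is very
small; clamping from above bounds the cost of any single chunk.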

Adaptive Hybrid Streaming
~~~~~~~~~~~~~~~~~~~~~~~~~

For datasets that don't fit in memory, use adaptive hybrid streaming:

.. code-block:: python

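    from nlsq import AdaptiveHybridStreamingOptimizer, HybridStreamingConfig

    # x, y: data arrays; func: model function (e.g. exponential above);
    # p0: initial parameter guess (e.g. [2.5, 0.6, 0.2])
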
    config = HybridStreamingConfig(chunk_size=10000, gauss_newton_max_iterations=100)
    optimizer = AdaptiveHybridStreamingOptimizer(config)

    result = optimizer.fit((x, y), func, p0=p0)
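
As its name suggests, ``gauss_newton_max_iterations`` bounds a Gauss-Newton phase. A common way
to run Gauss-Newton over chunked data is to accumulate the normal-equation terms ``J.T @ J`` and
``J.T @ r`` chunk by chunk, so only one chunk's Jacobian is in memory at a time and the solve
involves just an ``n_params x n_params`` system. The NumPy sketch below applies that technique to
the exponential model from the basic example, using an analytic Jacobian and hypothetical helper
names (``model``, ``jacobian``, ``streaming_gauss_newton``); it is not
``AdaptiveHybridStreamingOptimizer``'s actual algorithm.

.. code-block:: python

    import numpy as np


    def model(x, p):
        # NumPy version of exponential(x, a, b, c) = a * exp(-b * x) + c
        a, b, c = p
        return a * np.exp(-b * x) + c


    def jacobian(x, p):
        # Analytic partial derivatives w.r.t. (a, b, c); shape (len(x), 3)
        a, b, _ = p
        e = np.exp(-b * x)
        return np.column_stack([e, -a * x * e, np.ones_like(x)])


    def streaming_gauss_newton(x, y, p0, chunk_size=10_000, max_iterations=20):
        """Gauss-Newton where J.T @ J and J.T @ r are summed one chunk at a time."""
        p = np.asarray(p0, dtype=float)
        for _ in range(max_iterations):
            jtj = np.zeros((p.size, p.size))
            jtr = np.zeros(p.size)
            for start in range(0, len(x), chunk_size):
                xc = x[start : start + chunk_size]
                yc = y[start : start + chunk_size]
                J = jacobian(xc, p)  # only this chunk's rows are in memory
                r = yc - model(xc, p)  # residuals for this chunk
                jtj += J.T @ J
                jtr += J.T @ r
            # Solve the small normal equations (J^T J) step = J^T r
            step = np.linalg.solve(jtj, jtr)
            p = p + step
            if np.linalg.norm(step) < 1e-12 * (1.0 + np.linalg.norm(p)):
                break
        return p


    x = np.linspace(0.0, 10.0, 100_000)
    y = model(x, [2.0, 0.5, 0.3])  # noise-free data with known parameters
    p_fit = streaming_gauss_newton(x, y, p0=[2.5, 0.6, 0.2])
    print(p_fit)  # close to [2.0, 0.5, 0.3]

Each iteration makes one full pass over the data, and the step equals the one a single in-memory
pass would compute (up to rounding), because the per-chunk sums add up to the sums over all
points.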

Sparse Jacobian Optimization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For problems whose Jacobian is sparse, that is, where each residual depends on only a few of the
parameters:

.. code-block:: python

    from nlsq import SparseJacobianComputer

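    # func, p0, and x_sample are placeholders for your model function,
    # initial parameter guess, and a representative subset of x
    # Detect the Jacobian's sparsity pattern on the sample
    sparse_computer = SparseJacobianComputer(sparsity_threshold=0.01)
    pattern, sparsity = sparse_computer.detect_sparsity_pattern(func, p0, x_sample)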

    if sparsity > 0.1:  # If more than 10% sparse
        print(f"Jacobian is {sparsity:.1%} sparse")
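
To see what a sparsity fraction measures, the sketch below builds a forward-difference Jacobian
on a sample of points and counts the entries that are negligible relative to the largest one.
The helpers ``fd_jacobian`` and ``sparsity_fraction`` and the ``segment_offsets`` model are
hypothetical, and treating ``0.01`` as a relative threshold is an assumption;
``SparseJacobianComputer`` may define ``sparsity_threshold`` differently.

.. code-block:: python

    import numpy as np


    def fd_jacobian(func, x, p, h=1e-6):
        """Forward-difference Jacobian of func(x, p) with respect to p."""
        p = np.asarray(p, dtype=float)
        f0 = func(x, p)
        J = np.empty((f0.size, p.size))
        for j in range(p.size):
            dp = np.zeros_like(p)
            dp[j] = h
            J[:, j] = (func(x, p + dp) - f0) / h
        return J


    def sparsity_fraction(J, threshold=0.01):
        """Fraction of entries below threshold * max|J| (relative threshold)."""
        return np.mean(np.abs(J) < threshold * np.max(np.abs(J)))


    def segment_offsets(x, p):
        # Hypothetical model: p[k] is a constant offset on the k-th of len(p)
        # equal-width segments of [0, 10), so each point depends on one parameter.
        k = np.clip(np.floor(x * len(p) / 10.0).astype(int), 0, len(p) - 1)
        return p[k]


    x_sample = np.linspace(0.0, 10.0, 1000, endpoint=False)
    J = fd_jacobian(segment_offsets, x_sample, np.ones(10))
    print(f"Jacobian is {sparsity_fraction(J):.1%} sparse")  # Jacobian is 90.0% sparse

Each row has exactly one nonzero entry out of ten, hence 90%. For a dense model such as
``exponential`` above, every residual depends on every parameter and the fraction is close to
zero.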

See Also
--------

- :doc:`../howto/handle_large_data` : Large dataset guide
- :doc:`nlsq.memory_manager` : Memory management utilities
- :doc:`nlsq.adaptive_hybrid_streaming` : Adaptive hybrid streaming details