nlsq.validators module
=======================

.. currentmodule:: nlsq.utils.validators

.. automodule:: nlsq.utils.validators
   :noindex:

Overview
--------

The ``nlsq.utils.validators`` module validates inputs to the curve fitting
and optimization functions. It catches common errors early, reports them with descriptive
messages, and checks data quality before expensive optimization begins.

**New in version 0.1.1**: Complete validation system with fast mode and extensive checks.

**New in version 0.3.1**: Added security-focused validation (array size limits, bounds checking,
parameter validation).

Key Features
------------

- **Comprehensive input validation** for all curve fitting parameters
- **Security validation** (v0.3.1): array size limits, numeric range checks on bounds, parameter value checks
- **Early error detection** with clear, actionable error messages
- **Data quality checks** for outliers, duplicates, and degenerate cases
- **Fast mode** to skip expensive checks for performance-critical code
- **Function signature analysis** to detect parameter mismatches
- **Automatic type conversion** with warnings
- **Bounds validation** including initial guess checking
- **Decorator support** for automatic validation

Classes
-------

.. autosummary::
   :toctree: generated/
   :template: class.rst

   InputValidator

Functions
---------

.. autosummary::
   :toctree: generated/

   validate_inputs

Usage Examples
--------------

Basic Validation for curve_fit
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Validate inputs before curve fitting:

.. code-block:: python

    from nlsq.utils.validators import InputValidator
    import numpy as np

    # Create validator
    validator = InputValidator(fast_mode=False)


    # Define model and data
    def model(x, a, b):
        return a * np.exp(-b * x)


    x = np.linspace(0, 10, 100)
    y = 2.5 * np.exp(-0.5 * x) + np.random.normal(0, 0.1, 100)
    p0 = [2.0, 0.4]

    # Validate inputs
    errors, warnings, x_clean, y_clean = validator.validate_curve_fit_inputs(
        model, x, y, p0=p0, bounds=([0, 0], [10, 10])
    )

    if errors:
        print("Validation errors:")
        for error in errors:
            print(f"  - {error}")
    else:
        print("Validation passed!")

    if warnings:
        print("Warnings:")
        for warning in warnings:
            print(f"  - {warning}")

Fast Mode Validation
~~~~~~~~~~~~~~~~~~~~

Skip expensive checks for performance:

.. code-block:: python

    from nlsq.utils.validators import InputValidator

    # Reuses model, x, y, and p0 from the previous example.
    # Fast mode skips function callable tests and data quality checks
    fast_validator = InputValidator(fast_mode=True)

    errors, warnings, x_clean, y_clean = fast_validator.validate_curve_fit_inputs(
        model, x, y, p0=p0
    )

    # Much faster, suitable for production code with trusted inputs

Decorator-Based Validation
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Automatically validate function inputs:

.. code-block:: python

    from nlsq.utils.validators import validate_inputs
    import numpy as np


    @validate_inputs(validation_type="curve_fit")
    def my_curve_fit(f, xdata, ydata, p0=None, **kwargs):
        """Custom curve fit with automatic validation."""
        # Inputs are automatically validated before this code runs
        # Invalid inputs raise ValueError with detailed message
        # xdata and ydata are converted to numpy arrays

        # Your fitting logic here; ``optimize`` is a placeholder for it
        return optimize(f, xdata, ydata, p0, **kwargs)


    # Use it - validation happens automatically
    try:
        result = my_curve_fit(model, x, y, p0=[1.0, 0.5])
    except ValueError as e:
        print(f"Validation failed: {e}")

Validation for least_squares
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Validate least squares inputs:

.. code-block:: python

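    from nlsq.utils.validators import InputValidator
    import numpy as np

    validator = InputValidator()

    # Example data used by the residual function
    x_data = np.linspace(0, 10, 100)
    y_data = 2.5 * np.exp(-0.5 * x_data)


    # Define residual function
    def residual(params):
        a, b = params
        return y_data - (a * np.exp(-b * x_data))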


    x0 = np.array([2.0, 0.5])
    bounds = ([0, 0], [10, 10])

    # Validate
    errors, warnings, x0_clean = validator.validate_least_squares_inputs(
        residual,
        x0,
        bounds=bounds,
        method="trf",
        ftol=1e-8,
        xtol=1e-8,
        gtol=1e-8,
        max_nfev=1000,
    )

    if errors:
        raise ValueError(f"Validation failed: {'; '.join(errors)}")

Detailed Validation Checks
~~~~~~~~~~~~~~~~~~~~~~~~~~~

See what validation detects:

.. code-block:: python

    from nlsq.utils.validators import InputValidator
    import numpy as np

    validator = InputValidator(fast_mode=False)

    # Problematic data
    x = np.array([1.0, 2.0, np.nan, 4.0])  # Contains NaN
    y = np.array([1.0, 2.0, 3.0])  # Wrong length
    p0 = [1.0, 2.0, 3.0]  # Wrong number of params


    def model(x, a, b):
        return a * x + b


    errors, warnings, _, _ = validator.validate_curve_fit_inputs(model, x, y, p0=p0)

    # Errors will include:
    # - "xdata contains 1 NaN or Inf values"
    # - "xdata (4 points) and ydata (3 points) must have same length"
    # - "Initial guess p0 has 3 parameters, but function expects 2"

Data Quality Warnings
~~~~~~~~~~~~~~~~~~~~~

Detect potential data quality issues:

.. code-block:: python

    from nlsq.utils.validators import InputValidator
    import numpy as np

    validator = InputValidator(fast_mode=False)

    # Data with quality issues
    x = np.array([1, 2, 3, 3, 3, 4, 5, 100])  # Duplicates and outlier
    y = np.array([1, 2, 3, 3.1, 2.9, 4, 5, 200])  # Outlier


    def linear(x, a, b):
        return a * x + b


    errors, warnings, _, _ = validator.validate_curve_fit_inputs(
        linear, x, y, p0=[1.0, 0.0]
    )

    # Warnings will include:
    # - "xdata contains 3 duplicate values"
    # - "ydata may contain 1 outliers - consider using robust loss function"

Bounds Validation
~~~~~~~~~~~~~~~~~

Check parameter bounds:

.. code-block:: python

    from nlsq.utils.validators import InputValidator
    import numpy as np

    validator = InputValidator()

    x = np.linspace(0, 10, 50)
    y = 2 * x + 1 + np.random.randn(50) * 0.5


    def linear(x, a, b):
        return a * x + b


    # Invalid bounds
    bad_bounds = ([0, 0], [0, 10])  # Lower >= Upper for first param
    p0 = [2.0, 1.0]

    errors, warnings, _, _ = validator.validate_curve_fit_inputs(
        linear, x, y, p0=p0, bounds=bad_bounds
    )

    # Errors: "Lower bounds must be less than upper bounds"

    # Initial guess outside bounds
    bounds = ([0, 0], [10, 10])
    p0_out = [-1.0, 1.0]  # First param outside bounds

    errors, warnings, _, _ = validator.validate_curve_fit_inputs(
        linear, x, y, p0=p0_out, bounds=bounds
    )

    # Warnings: "Initial guess p0 is outside bounds"

Sigma Validation
~~~~~~~~~~~~~~~~

Validate uncertainty parameters:

.. code-block:: python

    from nlsq.utils.validators import InputValidator
    import numpy as np

    validator = InputValidator()

    x = np.linspace(0, 10, 50)
    y = 2 * x + 1 + np.random.randn(50) * 0.5

    # Invalid sigma - wrong shape
    sigma_bad = np.ones(40)  # Wrong length

    errors, warnings, _, _ = validator.validate_curve_fit_inputs(
        lambda x, a, b: a * x + b, x, y, p0=[1, 0], sigma=sigma_bad
    )

    # Error: "sigma must have same shape as ydata"

    # Invalid sigma - negative values
    sigma_neg = np.ones(50)
    sigma_neg[10] = -0.5

    errors, warnings, _, _ = validator.validate_curve_fit_inputs(
        lambda x, a, b: a * x + b, x, y, p0=[1, 0], sigma=sigma_neg
    )

    # Error: "sigma values must be positive"

Degenerate Data Detection
~~~~~~~~~~~~~~~~~~~~~~~~~~

Detect problematic data patterns:

.. code-block:: python

    from nlsq.utils.validators import InputValidator
    import numpy as np

    validator = InputValidator()

    # All x values identical
    x_const = np.ones(100)
    y = np.random.randn(100)

    errors, warnings, _, _ = validator.validate_curve_fit_inputs(
        lambda x, a, b: a * x + b, x_const, y, p0=[1, 0]
    )

    # Error: "All x values are identical - cannot fit"

    # Very small range
    x_small_range = np.linspace(1.0, 1.0 + 1e-9, 100)  # Range ~ 1e-9

    errors, warnings, _, _ = validator.validate_curve_fit_inputs(
        lambda x, a, b: a * x + b, x_small_range, y, p0=[1, 0]
    )

    # Warning: "x data range is very small (1.00e-09) - consider rescaling"

Tolerance Validation
~~~~~~~~~~~~~~~~~~~~

Validate convergence tolerances:

.. code-block:: python

    from nlsq.utils.validators import InputValidator

    validator = InputValidator()


    def residual(x):
        return x**2


    # Very small tolerances
    errors, warnings, x0_clean = validator.validate_least_squares_inputs(
        residual,
        x0=[1.0],
        ftol=1e-16,  # Too small
        xtol=1e-17,  # Too small
        gtol=1e-8,
    )

    # Warnings:
    # - "ftol=1e-16 is very small, may not converge"
    # - "xtol=1e-17 is very small, may not converge"

Custom Validation Pipeline
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Build custom validation logic:

.. code-block:: python

    from nlsq.utils.validators import InputValidator
    import numpy as np


    class CustomValidator(InputValidator):
        """Extended validator with custom checks."""

        def validate_my_data(self, x, y, **kwargs):
            """Custom validation pipeline."""
            # Use parent class methods
            errors, warnings, x, y = self.validate_curve_fit_inputs(
                kwargs["f"], x, y, p0=kwargs.get("p0"), bounds=kwargs.get("bounds")
            )

            # Add custom checks
            if np.std(y) < 0.01:
                warnings.append(
                    "y data has very low variance - may indicate measurement issue"
                )

            if len(x) < 10:
                errors.append("Need at least 10 data points for reliable fit")

            return errors, warnings, x, y


    # Use it (x_data, y_data, model_func, and initial_guess stand for your own objects)
    custom_validator = CustomValidator()
    errors, warnings, x, y = custom_validator.validate_my_data(
        x_data, y_data, f=model_func, p0=initial_guess
    )

Security Validation (v0.3.1)
-----------------------------

The validator includes security-focused checks to prevent resource exhaustion
and detect malformed inputs early:

.. code-block:: python

   from nlsq.utils.validators import InputValidator, validate_security_constraints

   validator = InputValidator()

   # Security checks run automatically and early in the validation pipeline
   errors, warnings, x, y = validator.validate_curve_fit_inputs(model, x, y, p0=p0)

   # Or call directly
   errors, warnings = validate_security_constraints(x, y, n_params=3)

**Array Size Limits**:

- Maximum 10 billion (10^10) data points
- Maximum 100 billion (10^11) Jacobian elements
- Detects integer overflow in size calculations
- Memory estimation warnings (>10GB, >100GB)

.. code-block:: python

   # Example: Excessive data size detection
   # (illustrative only: allocating 15 billion float64 values needs about 120 GB)
   x_huge = np.ones(15_000_000_000)  # 15 billion points
   # Error: "Data size exceeds maximum allowed (10B points)"
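
Limits like these can be checked from element counts alone, so nothing has to
be allocated first. The following is a minimal sketch of that idea, not the
library's implementation: ``check_size_limits`` is a hypothetical helper, the
8 bytes per element assume float64 data, and the thresholds are the ones listed
above.

.. code-block:: python

   MAX_DATA_POINTS = 10**10  # limit quoted above
   MAX_JACOBIAN_ELEMENTS = 10**11  # limit quoted above


   def check_size_limits(n_points, n_params, itemsize=8):
       """Return (errors, warnings) for a problem of the given size.

       Hypothetical helper: it works on counts only, so no array is created.
       """
       errors, warnings = [], []
       if n_points > MAX_DATA_POINTS:
           errors.append(f"Data size {n_points} exceeds {MAX_DATA_POINTS} points")

       n_jac = n_points * n_params  # Python ints cannot overflow here
       if n_jac > MAX_JACOBIAN_ELEMENTS:
           errors.append(f"Jacobian size {n_jac} exceeds {MAX_JACOBIAN_ELEMENTS}")

       jac_gb = n_jac * itemsize / 1e9  # assumes float64 Jacobian entries
       if jac_gb > 10:
           warnings.append(f"Jacobian needs about {jac_gb:.0f} GB")
       return errors, warnings


   errors, warnings = check_size_limits(15_000_000_000, n_params=3)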

**Bounds Numeric Range**:

- Warns for extreme bound values (absolute value > 1e100)
- Errors for NaN in bounds

.. code-block:: python

   # Example: Extreme bounds detection
   bounds = ([0, 0], [1e200, 10])  # Extremely large upper bound
   # Warning: "Bounds contain extreme values (>1e100)"

**Parameter Value Validation**:

- Warns for extreme initial parameters (abs(p0) > 1e50)
- Errors for NaN/Inf in initial parameters

.. code-block:: python

   # Example: Invalid parameter detection
   p0 = [1.0, np.nan, 0.5]
   # Error: "Initial parameters contain NaN/Inf values"

**Early Fail-Fast**:

Security validation runs before expensive operations to fail fast on
malformed input that could cause denial-of-service or numerical instability.

Validation Check Reference
---------------------------

The validator performs these checks:

**Array Validation**:

- Convert to numpy arrays
- Check dimensions (at least 1D)
- Validate array lengths match
- Check for tuple xdata (multi-dimensional fitting)

**Finite Values**:

- Detect NaN values
- Detect Inf values
- Report counts of bad values
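
Counting non-finite entries takes only a few NumPy calls. The sketch below
shows one way to do it; ``count_non_finite`` is an illustrative name and not
part of the nlsq API.

.. code-block:: python

    import numpy as np


    def count_non_finite(values):
        """Return (n_nan, n_inf) for an array-like input (illustrative helper)."""
        arr = np.asarray(values, dtype=float)
        n_nan = int(np.isnan(arr).sum())
        n_inf = int(np.isinf(arr).sum())  # counts both +inf and -inf
        return n_nan, n_inf


    n_nan, n_inf = count_non_finite([1.0, np.nan, np.inf, -np.inf, 4.0])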

**Data Shapes**:

- Minimum 2 data points
- More data points than parameters
- Matching xdata/ydata lengths

**Initial Guess (p0)**:

- Correct number of parameters
- All finite values
- Length matches function signature
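
A parameter-count mismatch can be detected from the model's signature before
the function is ever called. This sketch uses the standard ``inspect`` module
and assumes the ``f(x, *params)`` convention shown in the examples above;
``count_fit_parameters`` is a hypothetical helper, and the validator may do
this differently internally.

.. code-block:: python

    import inspect


    def count_fit_parameters(func):
        """Count fit parameters of ``func(x, ...)``; None if it takes *args."""
        kinds = [p.kind for p in inspect.signature(func).parameters.values()]
        if inspect.Parameter.VAR_POSITIONAL in kinds:
            return None  # variadic model: the count cannot be inferred
        positional = (
            inspect.Parameter.POSITIONAL_ONLY,
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
        )
        # Subtract one for the independent variable x
        return sum(kind in positional for kind in kinds) - 1


    def model(x, a, b):
        return a * x + b


    p0 = [1.0, 2.0, 3.0]
    n_expected = count_fit_parameters(model)
    mismatch = n_expected is not None and len(p0) != n_expected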

**Bounds**:

- 2-tuple format (lower, upper)
- Correct lengths
- Lower < Upper for all parameters
- Initial guess within bounds
- Method compatibility (LM doesn't support bounds)
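
The ordering and containment checks reduce to element-wise comparisons. A
minimal sketch, assuming array-like bounds of the same length as ``p0``
(``check_bounds`` is a hypothetical name, and the messages only mirror the
style of the examples earlier on this page):

.. code-block:: python

    import numpy as np


    def check_bounds(lower, upper, p0):
        """Return (errors, warnings) for a (lower, upper) pair and a guess p0."""
        errors, warnings = [], []
        lower, upper, p0 = (np.asarray(v, dtype=float) for v in (lower, upper, p0))

        if lower.shape != p0.shape or upper.shape != p0.shape:
            errors.append("bounds and p0 must have the same length")
            return errors, warnings

        if np.any(lower >= upper):
            errors.append("Lower bounds must be less than upper bounds")
        if np.any((p0 < lower) | (p0 > upper)):
            warnings.append("Initial guess p0 is outside bounds")
        return errors, warnings


    errors, warnings = check_bounds([0, 0], [10, 10], p0=[-1.0, 1.0])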

**Sigma**:

- Correct shape (matches ydata)
- All positive values
- All finite values

**Tolerances**:

- All positive (ftol, xtol, gtol)
- Not too small (<1e-15)
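
The 1e-15 threshold sits close to float64 machine epsilon (about 2.2e-16),
below which a relative change can no longer be resolved. The tolerance checks
can be sketched as follows; ``check_tolerances`` and its ``min_tol`` argument
are illustrative assumptions, not the validator's actual code.

.. code-block:: python

    def check_tolerances(ftol=1e-8, xtol=1e-8, gtol=1e-8, min_tol=1e-15):
        """Return (errors, warnings) for the three convergence tolerances."""
        errors, warnings = [], []
        for name, value in (("ftol", ftol), ("xtol", xtol), ("gtol", gtol)):
            if value <= 0:
                errors.append(f"{name} must be positive, got {value}")
            elif value < min_tol:
                warnings.append(f"{name}={value} is very small, may not converge")
        return errors, warnings


    errors, warnings = check_tolerances(ftol=1e-16, xtol=1e-17, gtol=1e-8)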

**Method**:

- Valid method name ('trf', 'lm', 'dogbox')

**Function Callable** (unless fast_mode=True):

- Function can be evaluated
- Output shape matches expected

**Data Quality** (unless fast_mode=True):

- Check for duplicate x values
- Detect potential outliers (3 IQR rule)
- Degenerate x values (all identical, tiny range)
- Degenerate y values (all identical, tiny range)
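
The outlier and duplicate checks can be approximated with a few lines of
NumPy. In this sketch a value counts as an outlier when it lies more than
``k`` interquartile ranges outside the quartiles, and duplicates are counted
as extra copies beyond the first. Both helper names and both counting
conventions are assumptions, so the numbers may differ from what the validator
reports.

.. code-block:: python

    import numpy as np


    def count_outliers_iqr(values, k=3.0):
        """Count values outside [Q1 - k*IQR, Q3 + k*IQR]."""
        values = np.asarray(values, dtype=float)
        q1, q3 = np.percentile(values, [25, 75])
        iqr = q3 - q1
        outside = (values < q1 - k * iqr) | (values > q3 + k * iqr)
        return int(outside.sum())


    def count_duplicates(values):
        """Count extra copies: total size minus number of distinct values."""
        values = np.asarray(values)
        return int(values.size - np.unique(values).size)


    x = np.array([1, 2, 3, 3, 3, 4, 5, 100])
    y = np.array([1, 2, 3, 3.1, 2.9, 4, 5, 200])
    n_outliers = count_outliers_iqr(y)
    n_duplicates = count_duplicates(x)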

Performance Impact
------------------

**Fast mode** (fast_mode=True):

- Skips: Function callable test, data quality checks
- Speedup: ~30-50% faster validation
- Use for: Production code with trusted inputs

**Full mode** (fast_mode=False):

- All checks enabled
- Recommended for: Interactive use, debugging, untrusted inputs

**Typical validation overhead**:

- Small datasets (<1000 points): <1ms
- Medium datasets (1000-100K points): 1-10ms
- Large datasets (>100K points): 10-100ms

**Recommendation**: Use fast mode in tight loops, full mode for user-facing APIs

Integration Examples
--------------------

With curve_fit
~~~~~~~~~~~~~~

.. code-block:: python

    import warnings

    from nlsq import curve_fit
    from nlsq.utils.validators import InputValidator


    def safe_curve_fit(f, xdata, ydata, **kwargs):
        """curve_fit with validation."""
        validator = InputValidator(fast_mode=kwargs.pop("fast_validation", False))

        # Named ``messages`` so the list does not shadow the ``warnings`` module
        errors, messages, x, y = validator.validate_curve_fit_inputs(
            f,
            xdata,
            ydata,
            p0=kwargs.get("p0"),
            bounds=kwargs.get("bounds"),
            sigma=kwargs.get("sigma"),
        )

        if errors:
            raise ValueError(f"Validation failed: {'; '.join(errors)}")

        # Show warnings
        for message in messages:
            warnings.warn(message)

        return curve_fit(f, x, y, **kwargs)

With least_squares
~~~~~~~~~~~~~~~~~~

.. code-block:: python

    import warnings

    from nlsq import LeastSquares
    from nlsq.utils.validators import InputValidator


    def safe_least_squares(fun, x0, **kwargs):
        """least_squares with validation."""
        validator = InputValidator()

        # Named ``messages`` so the list does not shadow the ``warnings`` module
        errors, messages, x0 = validator.validate_least_squares_inputs(
            fun,
            x0,
            bounds=kwargs.get("bounds"),
            method=kwargs.get("method", "trf"),
            ftol=kwargs.get("ftol", 1e-8),
            xtol=kwargs.get("xtol", 1e-8),
            gtol=kwargs.get("gtol", 1e-8),
            max_nfev=kwargs.get("max_nfev"),
        )

        if errors:
            raise ValueError(f"Validation failed: {'; '.join(errors)}")

        for message in messages:
            warnings.warn(message)

        ls = LeastSquares()
        return ls.least_squares(fun, x0, **kwargs)

See Also
--------

- :doc:`nlsq.minpack` : Main curve fitting API
- :doc:`nlsq.least_squares` : Least squares solver
- :doc:`../howto/troubleshooting` : Troubleshooting guide