How to Debug Bad Fits
=====================

When curve fitting fails or produces poor results, this guide helps you
diagnose and fix the problem.

Common Symptoms
---------------

1. **Convergence failure**: Fit doesn't complete
2. **Wrong parameters**: Results are obviously incorrect
3. **Large uncertainties**: Parameter errors are huge
4. **Poor R²**: Low coefficient of determination
5. **Patterned residuals**: Systematic errors in residual plot

Diagnosis Flowchart
-------------------

::

   Fit fails?
   ├── Yes → Check error message → See "Convergence Failures"
   └── No → Check results
            ├── Parameters at bounds? → Relax bounds
            ├── Large uncertainties? → See "Poor Parameter Estimates"
            ├── Low R²? → See "Poor Fit Quality"
            └── Patterned residuals? → See "Model Mismatch"

Convergence Failures
--------------------

Error: "Optimal parameters not found"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Cause**: Algorithm couldn't find a minimum.

**Solutions**:

1. Provide better initial guesses:

   .. code-block:: python

      # Estimate from data
      A_guess = np.max(y) - np.min(y)
      k_guess = 1.0 / (x[np.argmax(y)] - x[0])

      popt, pcov = curve_fit(model, x, y, p0=[A_guess, k_guess])

2. Use global optimization:

   .. code-block:: python

      from nlsq import fit

      popt, pcov = fit(model, x, y, preset="global")

3. Check data quality:

   .. code-block:: python

      # Check for NaN/Inf
      print(f"NaN in x: {np.any(np.isnan(x))}")
      print(f"NaN in y: {np.any(np.isnan(y))}")
      print(f"Inf in y: {np.any(np.isinf(y))}")

Error: "Maximum iterations reached"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Cause**: Fit needs more iterations.

**Solutions**:

.. code-block:: python

   # Increase max iterations
   popt, pcov = curve_fit(model, x, y, max_nfev=10000)

Error: "Jacobian is singular"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Cause**: Model is ill-conditioned or parameters are redundant.

**Solutions**:

1. Simplify the model
2. Fix some parameters
3. Rescale data

.. code-block:: python

   # Rescale data
   x_scale = np.max(np.abs(x))
   y_scale = np.max(np.abs(y))

   x_scaled = x / x_scale
   y_scaled = y / y_scale

   popt_scaled, pcov = curve_fit(model, x_scaled, y_scaled)

   # Unscale parameters as needed

Poor Parameter Estimates
------------------------

Parameters Have Large Uncertainties
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Cause**: Parameters are poorly constrained by data.

**Diagnosis**:

.. code-block:: python

   perr = np.sqrt(np.diag(pcov))
   for i, (p, e) in enumerate(zip(popt, perr)):
       relative_error = abs(e / p) if p != 0 else float("inf")
       print(f"p{i}: {p:.4f} ± {e:.4f} ({relative_error*100:.1f}%)")

**Solutions**:

1. Need more data, especially in sensitive regions
2. Fix some parameters if known
3. Simplify the model

Parameters at Bounds
~~~~~~~~~~~~~~~~~~~~

**Cause**: True value is outside allowed range, or bound is too restrictive.

**Diagnosis**:

.. code-block:: python

   lower, upper = bounds
   for i, p in enumerate(popt):
       if np.isclose(p, lower[i]) or np.isclose(p, upper[i]):
           print(f"Parameter {i} is at bound: {p}")

**Solutions**:

1. Relax bounds
2. Check if bounds are physically realistic
3. Reconsider model

Highly Correlated Parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Cause**: Parameters trade off against each other.

**Diagnosis**:

.. code-block:: python

   perr = np.sqrt(np.diag(pcov))
   correlation = pcov / np.outer(perr, perr)

   for i in range(len(popt)):
       for j in range(i + 1, len(popt)):
           if abs(correlation[i, j]) > 0.9:
               print(f"High correlation: p{i} and p{j}: {correlation[i,j]:.3f}")

**Solutions**:

1. Reparameterize the model
2. Fix one of the correlated parameters
3. Acquire more diverse data

Poor Fit Quality
----------------

Low R² Value
~~~~~~~~~~~~

**Cause**: Model doesn't explain the data well.

**Solutions**:

1. Check if model is appropriate for data:

   .. code-block:: python

      # Visualize data and model
      plt.scatter(x, y, alpha=0.5, label="Data")
      plt.plot(x, model(x, *popt), "r-", label="Fit")
      plt.legend()
      plt.show()

2. Consider different models (see :doc:`choose_model`)

3. Check for outliers:

   .. code-block:: python

      residuals = y - model(x, *popt)
      z_scores = (residuals - np.mean(residuals)) / np.std(residuals)
      outliers = np.abs(z_scores) > 3

      if np.any(outliers):
          print(f"Found {np.sum(outliers)} potential outliers")

High RMSE
~~~~~~~~~

**Cause**: Large prediction errors.

**Solutions**:

1. Check noise level in data
2. Use weighted fitting if noise varies:

   .. code-block:: python

      sigma = estimate_uncertainties(x, y)
      popt, pcov = curve_fit(model, x, y, sigma=sigma, absolute_sigma=True)

Model Mismatch
--------------

Systematic Patterns in Residuals
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Cause**: Model doesn't capture the true relationship.

**Diagnosis**:

.. code-block:: python

   residuals = y - model(x, *popt)

   plt.figure(figsize=(12, 4))

   plt.subplot(1, 3, 1)
   plt.scatter(x, residuals, alpha=0.5)
   plt.axhline(0, color="r", linestyle="--")
   plt.xlabel("x")
   plt.ylabel("Residuals")
   plt.title("Residuals vs x")

   plt.subplot(1, 3, 2)
   plt.scatter(model(x, *popt), residuals, alpha=0.5)
   plt.axhline(0, color="r", linestyle="--")
   plt.xlabel("Predicted y")
   plt.ylabel("Residuals")
   plt.title("Residuals vs Predicted")

   plt.subplot(1, 3, 3)
   plt.hist(residuals, bins=20)
   plt.xlabel("Residual value")
   plt.ylabel("Count")
   plt.title("Residual Distribution")

   plt.tight_layout()
   plt.show()

**Patterns and solutions**:

- **U-shape or curved**: Missing quadratic term
- **Oscillating**: Missing periodic component
- **Increasing spread**: Heteroscedastic data (use weighted fitting)
- **Asymmetric histogram**: Non-normal errors (use robust fitting)

Debugging Checklist
-------------------

.. code-block:: text

   □ Data quality
     □ No NaN or Inf values
     □ Reasonable value ranges
     □ Sufficient data points

   □ Model appropriateness
     □ Matches known physics
     □ Correct number of parameters
     □ All parameters identifiable

   □ Initial guesses
     □ Estimated from data
     □ Within physical bounds
     □ Order of magnitude correct

   □ Bounds
     □ Physically motivated
     □ Not too restrictive
     □ Initial guess within bounds

   □ Fit configuration
     □ Sufficient max iterations
     □ Appropriate tolerance
     □ Correct method (trf for bounds)

Complete Debugging Example
--------------------------

.. code-block:: python

   import numpy as np
   import jax.numpy as jnp
   from nlsq import curve_fit
   import matplotlib.pyplot as plt


   def debug_fit(model, x, y, p0, bounds=None):
       """Comprehensive fit debugging."""

       print("=" * 60)
       print("FIT DEBUGGING REPORT")
       print("=" * 60)

       # 1. Check data
       print("\n1. DATA CHECK")
       print(f"   x: {len(x)} points, range [{x.min():.3g}, {x.max():.3g}]")
       print(f"   y: {len(y)} points, range [{y.min():.3g}, {y.max():.3g}]")
       print(f"   NaN in x: {np.any(np.isnan(x))}")
       print(f"   NaN in y: {np.any(np.isnan(y))}")

       # 2. Try fit
       print("\n2. FITTING")
       try:
           if bounds:
               popt, pcov = curve_fit(model, x, y, p0=p0, bounds=bounds)
           else:
               popt, pcov = curve_fit(model, x, y, p0=p0)
           print("   Status: SUCCESS")
       except Exception as e:
           print(f"   Status: FAILED - {e}")
           return

       # 3. Parameter analysis
       print("\n3. PARAMETERS")
       perr = np.sqrt(np.diag(pcov))
       for i, (p, e) in enumerate(zip(popt, perr)):
           rel_err = abs(e / p) * 100 if p != 0 else float("inf")
           status = "OK" if rel_err < 50 else "HIGH UNCERTAINTY"
           print(f"   p{i}: {p:10.4g} ± {e:10.4g} ({rel_err:5.1f}%) - {status}")

       # 4. Correlation check
       print("\n4. CORRELATIONS")
       corr = pcov / np.outer(perr, perr)
       high_corr = []
       for i in range(len(popt)):
           for j in range(i + 1, len(popt)):
               if abs(corr[i, j]) > 0.9:
                   high_corr.append((i, j, corr[i, j]))
       if high_corr:
           for i, j, c in high_corr:
               print(f"   WARNING: p{i}-p{j} correlation = {c:.3f}")
       else:
           print("   All correlations < 0.9")

       # 5. Residuals
       print("\n5. FIT QUALITY")
       y_pred = model(x, *popt)
       residuals = y - y_pred
       ss_res = np.sum(residuals**2)
       ss_tot = np.sum((y - np.mean(y)) ** 2)
       r2 = 1 - ss_res / ss_tot
       rmse = np.sqrt(np.mean(residuals**2))

       print(f"   R² = {r2:.4f}")
       print(f"   RMSE = {rmse:.4g}")

       if r2 < 0.9:
           print("   WARNING: R² < 0.9 suggests poor fit")

       return popt, pcov


   # Example usage
   def model(x, a, b, c):
       return a * jnp.exp(-b * x) + c


   np.random.seed(42)
   x = np.linspace(0, 10, 100)
   y = 2.5 * np.exp(-0.5 * x) + 0.3 + 0.1 * np.random.randn(100)

   debug_fit(model, x, y, p0=[2, 0.5, 0.3])

See Also
--------

- :doc:`troubleshooting` - General troubleshooting guide
- :doc:`choose_model` - Model selection
- :doc:`/tutorials/routine/getting_started/understanding_results` - Interpreting results