nlsq.trf module¶
Trust Region Reflective algorithm implementation.
Trust Region Reflective algorithm for least-squares optimization. The algorithm is based on ideas from paper [STIR]. The main idea is to account for the presence of the bounds by appropriate scaling of the variables (or, equivalently, changing a trust-region shape). Let’s introduce a vector v:
| ub[i] - x[i], if g[i] < 0 and ub[i] < np.inf
v[i] = | x[i] - lb[i], if g[i] > 0 and lb[i] > -np.inf
| 1, otherwise
where g is the gradient of a cost function and lb, ub are the bounds. Its components are distances to the bounds at which the anti-gradient points (if this distance is finite). Define a scaling matrix D = diag(v**0.5). First-order optimality conditions can be stated as:
D^2 g(x) = 0.
Meaning that components of the gradient should be zero for strictly interior variables, and components must point inside the feasible region for variables on the bound. Now consider this system of equations as a new optimization problem. If the point x is strictly interior (not on the bound), then the left-hand side is differentiable and the Newton step for it satisfies:
(D^2 H + diag(g) Jv) p = -D^2 g
where H is the Hessian matrix (or its J^T J approximation in least squares), Jv is the Jacobian matrix of v with components -1, 1 or 0, such that all elements of matrix C = diag(g) Jv are non-negative. Introduce the change of the variables x = D x_h (_h would be “hat” in LaTeX). In the new variables, we have a Newton step satisfying:
B_h p_h = -g_h,
where B_h = D H D + C, g_h = D g. In least squares B_h = J_h^T J_h, where J_h = J D. Note that J_h and g_h are proper Jacobian and gradient with respect to “hat” variables. To guarantee global convergence we formulate a trust-region problem based on the Newton step in the new variables:
0.5 * p_h^T B_h p + g_h^T p_h -> min, ||p_h|| <= Delta
In the original space B = H + D^{-1} C D^{-1}, and the equivalent trust-region problem is:
0.5 * p^T B p + g^T p -> min, ||D^{-1} p|| <= Delta
Here, the meaning of the matrix D becomes more clear: it alters the shape
of a trust-region, such that large steps towards the bounds are not allowed.
In the implementation, the trust-region problem is solved in “hat” space,
but handling of the bounds is done in the original space (see below and read
the code).
The introduction of the matrix D doesn’t allow to ignore bounds, the algorithm
must keep iterates strictly feasible (to satisfy aforementioned
differentiability), the parameter theta controls step back from the boundary
(see the code for details).
The algorithm does another important trick. If the trust-region solution
doesn’t fit into the bounds, then a reflected (from a firstly encountered
bound) search direction is considered. For motivation and analysis refer to
[STIR] paper (and other papers of the authors). In practice, it doesn’t need
a lot of justifications, the algorithm simply chooses the best step among
three: a constrained trust-region step, a reflected step and a constrained
Cauchy step (a minimizer along -g_h in “hat” space, or -D^2 g in the original
space).
Another feature is that a trust-region radius control strategy is modified to
account for appearance of the diagonal C matrix (called diag_h in the code).
Note that all described peculiarities are completely gone as we consider
problems without bounds (the algorithm becomes a standard trust-region type
algorithm very similar to ones implemented in MINPACK).
The implementation supports two methods of solving the trust-region problem.
The first, called ‘exact’, applies SVD on Jacobian and then solves the problem
very accurately using the algorithm described in [JJMore]. It is not
applicable to large problem. The second, called ‘lsmr’, uses the 2-D subspace
approach (sometimes called “indefinite dogleg”), where the problem is solved
in a subspace spanned by the gradient and the approximate Gauss-Newton step
found by scipy.sparse.linalg.lsmr. A 2-D trust-region problem is
reformulated as a 4th order algebraic equation and solved very accurately by
numpy.roots. The subspace approach allows to solve very large problems
(up to couple of millions of residuals on a regular PC), provided the Jacobian
matrix is sufficiently sparse.
.. admonition:: References
[STIR] (1,2)Branch, M.A., T.F. Coleman, and Y. Li, “A Subspace, Interior, and Conjugate Gradient Method for Large-Scale Bound-Constrained Minimization Problems,” SIAM Journal on Scientific Computing, Vol. 21, Number 1, pp 1-23, 1999.
[JJMore]More, J. J., “The Levenberg-Marquardt Algorithm: Implementation and Theory,” Numerical Analysis, ed. G. A. Watson, Lecture
- class nlsq.core.trf.SVDCache(U, s, V, J_h, x_hash)[source]
Bases:
NamedTupleCache SVD decomposition across inner loop iterations when Jacobian unchanged.
This cache stores the SVD components (U, s, V) along with the scaled Jacobian J_h to avoid redundant SVD computations during inner loop iterations where the step is rejected and parameters remain unchanged.
- U
Left singular vectors (m x k), where m is residuals and k = min(m, n).
- Type:
jnp.ndarray
- s
Singular values (k,).
- Type:
jnp.ndarray
- V
Right singular vectors (n x k), where n is parameters.
- Type:
jnp.ndarray
- J_h
Scaled Jacobian in “hat” space (m x n).
- Type:
jnp.ndarray
- x_hash
Hash of parameter vector for cache validation. Cache is valid only when the current parameter hash matches this value.
- Type:
Notes
The cache is valid only when x_hash matches the current parameter vector’s hash. When a step is rejected (actual_reduction <= 0), the parameters don’t change, so the SVD can be reused. When a step is accepted, the cache must be invalidated.
The expected speedup from SVD caching is 20-40% on problems with frequent step rejections, as SVD computation is O(mn^2) and dominates iteration time.
- U: Array
Alias for field number 0
- s: Array
Alias for field number 1
- V: Array
Alias for field number 2
- J_h: Array
Alias for field number 3
- x_hash: int
Alias for field number 4
- class nlsq.core.trf.TRFConfig(ftol=1e-08, xtol=1e-08, gtol=1e-08, max_nfev=None, x_scale='jac', loss='linear', tr_solver='exact', verbose=0)[source]
Bases:
objectImmutable TRF algorithm configuration.
Groups algorithm configuration parameters passed to TRF optimizer functions. This is an internal implementation detail - the public API remains unchanged.
- ftol
Tolerance for termination by change of cost function.
- Type:
- xtol
Tolerance for termination by change of independent variables.
- Type:
- gtol
Tolerance for termination by norm of gradient.
- Type:
- max_nfev
Maximum number of function evaluations. None for unlimited.
- Type:
int or None
- x_scale
Characteristic scale of variables. ‘jac’ for automatic scaling.
- Type:
- loss
Loss function type (‘linear’, ‘soft_l1’, ‘huber’, ‘cauchy’, ‘arctan’).
- Type:
- tr_solver
Trust-region subproblem solver (‘exact’, ‘lsmr’, ‘cg’).
- Type:
- verbose
Verbosity level (0=silent, 1=termination, 2=iterations).
- Type:
- ftol: float
- xtol: float
- gtol: float
- x_scale: str
- loss: str
- tr_solver: str
- verbose: int
- __post_init__()[source]
Validate configuration values.
- __init__(ftol=1e-08, xtol=1e-08, gtol=1e-08, max_nfev=None, x_scale='jac', loss='linear', tr_solver='exact', verbose=0)
- class nlsq.core.trf.StepContext(x, f, J, cost, g, trust_radius, iteration, scale, scale_inv, alpha=0.0)[source]
Bases:
objectMutable state container for TRF step computation.
Groups the iteration state variables passed between TRF helper methods. This reduces parameter count and improves code clarity.
- x
Current parameter values.
- Type:
jnp.ndarray
- f
Residual vector at x.
- Type:
jnp.ndarray
- J
Jacobian matrix at x.
- Type:
jnp.ndarray
- cost
Current cost (0.5 * ||f||^2).
- Type:
- g
Gradient vector (J^T @ f).
- Type:
jnp.ndarray
- trust_radius
Current trust region radius (Delta).
- Type:
- iteration
Current iteration number.
- Type:
- scale
Variable scaling factors.
- Type:
jnp.ndarray
- scale_inv
Inverse scaling factors.
- Type:
jnp.ndarray
- alpha
Levenberg-Marquardt parameter.
- Type:
- x: Array
- f: Array
- J: Array
- cost: float
- g: Array
- trust_radius: float
- iteration: int
- scale: Array
- scale_inv: Array
- alpha: float
- __init__(x, f, J, cost, g, trust_radius, iteration, scale, scale_inv, alpha=0.0)
- class nlsq.core.trf.BoundsContext(lb, ub, x_scale, x_offset, lb_scaled, ub_scaled)[source]
Bases:
objectBound constraint data for TRF optimization.
Groups the bound-related arrays used in bounded optimization.
- lb
Lower bounds on parameters.
- Type:
jnp.ndarray
- ub
Upper bounds on parameters.
- Type:
jnp.ndarray
- x_scale
Scaling factors for bounded variables.
- Type:
jnp.ndarray
- x_offset
Offset for bounded variables (center of bounds).
- Type:
jnp.ndarray
- lb_scaled
Scaled lower bounds.
- Type:
jnp.ndarray
- ub_scaled
Scaled upper bounds.
- Type:
jnp.ndarray
- lb: Array
- ub: Array
- x_scale: Array
- x_offset: Array
- lb_scaled: Array
- ub_scaled: Array
- classmethod from_bounds(lb, ub, x_scale=None)[source]
Create BoundsContext from bounds arrays.
- Parameters:
lb (array_like) – Lower bounds.
ub (array_like) – Upper bounds.
x_scale (array_like, optional) – Scaling factors. If None, uses 1.0.
- Returns:
Initialized bounds context.
- Return type:
BoundsContext
- __init__(lb, ub, x_scale, x_offset, lb_scaled, ub_scaled)
- class nlsq.core.trf.TrustRegionReflective(enable_stability=False)[source]
Bases:
TrustRegionJITFunctions,TrustRegionOptimizerBaseTrust Region Reflective algorithm for bounded least squares optimization.
Implements the TRF algorithm with variable scaling to handle parameter bounds. Supports exact (SVD) and iterative (CG) solvers for trust region subproblems.
- __init__(enable_stability=False)[source]
Initialize the TrustRegionReflective optimizer.
Creates JIT-compiled functions and sets up logging infrastructure. All optimization functions are compiled during initialization for maximum performance during solve operations.
- Parameters:
enable_stability (bool, default False) – Enable numerical stability checks and fixes
- trf(fun, xdata, ydata, jac, data_mask, transform, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev, f_scale, x_scale, loss_function, tr_options, verbose, timeit=False, solver='exact', diagnostics=None, callback=None, **kwargs)[source]
Minimize a scalar function of one or more variables using the trust-region reflective algorithm. Although I think this is not good coding style, I maintained the original code format from SciPy such that the code is easier to compare with the original. See the note from the algorithms original author below.
For efficiency, it makes sense to run the simplified version of the algorithm when no bounds are imposed. We decided to write the two separate functions. It violates the DRY principle, but the individual functions are kept the most readable.
- Parameters:
fun (callable) – The residual function
xdata (array_like or tuple of array_like) – The independent variable where the data is measured. If xdata is a tuple, then the input arguments to fun are assumed to be
(xdata[0], xdata[1], ...).ydata (jnp.ndarray) – The dependent data
jac (callable) – The Jacobian of fun.
data_mask (jnp.ndarray) – The mask for the data.
transform (jnp.ndarray) – The uncertainty transform for the data.
x0 (jnp.ndarray) – Initial guess. Array of real elements of size (n,), where ‘n’ is the number of independent variables.
f0 (jnp.ndarray) – Initial residuals. Array of real elements of size (m,), where ‘m’ is the number of data points.
J0 (jnp.ndarray) – Initial Jacobian. Array of real elements of size (m, n), where ‘m’ is the number of data points and ‘n’ is the number of independent variables.
lb (jnp.ndarray) – Lower bounds on independent variables. Array of real elements of size (n,), where ‘n’ is the number of independent variables.
ub (jnp.ndarray) – Upper bounds on independent variables. Array of real elements of size (n,), where ‘n’ is the number of independent variables.
ftol (float) – Tolerance for termination by the change of the cost function.
xtol (float) – Tolerance for termination by the change of the independent variables.
gtol (float) – Tolerance for termination by the norm of the gradient.
max_nfev (int) – Maximum number of function evaluations.
f_scale (float) – Cost function scalar
x_scale (jnp.ndarray) – Scaling factors for independent variables.
loss_function (callable, optional) – Loss function. If None, the standard least-squares problem is solved.
tr_options (dict) – Options for the trust-region algorithm.
verbose (int) –
Level of algorithm’s verbosity:
0 (default) : work silently.
1 : display a termination report.
timeit (bool, optional) – If True, the time for each step is measured if the unbounded version is being ran. Default is False.
- trf_no_bounds(fun, xdata, ydata, jac, data_mask, transform, x0, f, J, lb, ub, ftol, xtol, gtol, max_nfev, f_scale, x_scale, loss_function, tr_options, verbose, solver='exact', callback=None, profiler=None, **kwargs)[source]
Unbounded version of the trust-region reflective algorithm.
- Parameters:
fun (callable) – The residual function
xdata (array_like or tuple of array_like) – The independent variable where the data is measured. If xdata is a tuple, then the input arguments to fun are assumed to be
(xdata[0], xdata[1], ...).ydata (jnp.ndarray) – The dependent data
jac (callable) – The Jacobian of fun.
data_mask (jnp.ndarray) – The mask for the data.
transform (jnp.ndarray) – The uncertainty transform for the data.
x0 (jnp.ndarray) – Initial guess. Array of real elements of size (n,), where ‘n’ is the number of independent variables.
f0 (jnp.ndarray) – Initial residuals. Array of real elements of size (m,), where ‘m’ is the number of data points.
J0 (jnp.ndarray) – Initial Jacobian. Array of real elements of size (m, n), where ‘m’ is the number of data points and ‘n’ is the number of independent variables.
lb (jnp.ndarray) – Lower bounds on independent variables. Array of real elements of size (n,), where ‘n’ is the number of independent variables.
ub (jnp.ndarray) – Upper bounds on independent variables. Array of real elements of size (n,), where ‘n’ is the number of independent variables.
ftol (float) – Tolerance for termination by the change of the cost function.
xtol (float) – Tolerance for termination by the change of the independent variables.
gtol (float) – Tolerance for termination by the norm of the gradient.
max_nfev (int) – Maximum number of function evaluations.
f_scale (float) – Cost function scalar
x_scale (jnp.ndarray) – Scaling factors for independent variables.
loss_function (callable, optional) – Loss function. If None, the standard least-squares problem is solved.
tr_options (dict) – Options for the trust-region algorithm.
verbose (int) –
Level of algorithm’s verbosity:
0 (default) : work silently.
1 : display a termination report.
- Returns:
result (OptimizeResult) – The optimization result represented as a
OptimizeResultobject. Important attributes are:xthe solution array,successa Boolean flag indicating if the optimizer exited successfully andmessagewhich describes the cause of the termination. See OptimizeResult for a description of other attributes.profiler (TRFProfiler, NullProfiler, or None, optional) – Profiler for timing algorithm operations. If None, uses NullProfiler (zero overhead). Use TRFProfiler() for detailed performance analysis. Default is None.
- Return type:
Notes
The algorithm is described in [13].
- trf_bounds(fun, xdata, ydata, jac, data_mask, transform, x0, f, J, lb, ub, ftol, xtol, gtol, max_nfev, f_scale, x_scale, loss_function, tr_options, verbose, solver='exact', callback=None, **kwargs)[source]
Bounded version of the trust-region reflective algorithm.
- Parameters:
fun (callable) – The residual function
xdata (array_like or tuple of array_like) – The independent variable where the data is measured. If xdata is a tuple, then the input arguments to fun are assumed to be
(xdata[0], xdata[1], ...).ydata (jnp.ndarray) – The dependent data
jac (callable) – The Jacobian of fun.
data_mask (jnp.ndarray) – The mask for the data.
transform (jnp.ndarray) – The uncertainty transform for the data.
x0 (jnp.ndarray) – Initial guess. Array of real elements of size (n,), where ‘n’ is the number of independent variables.
f0 (jnp.ndarray) – Initial residuals. Array of real elements of size (m,), where ‘m’ is the number of data points.
J0 (jnp.ndarray) – Initial Jacobian. Array of real elements of size (m, n), where ‘m’ is the number of data points and ‘n’ is the number of independent variables.
lb (jnp.ndarray) – Lower bounds on independent variables. Array of real elements of size (n,), where ‘n’ is the number of independent variables.
ub (jnp.ndarray) – Upper bounds on independent variables. Array of real elements of size (n,), where ‘n’ is the number of independent variables.
ftol (float) – Tolerance for termination by the change of the cost function.
xtol (float) – Tolerance for termination by the change of the independent variables.
gtol (float) – Tolerance for termination by the norm of the gradient.
max_nfev (int) – Maximum number of function evaluations.
f_scale (float) – Cost function scalar
x_scale (jnp.ndarray) – Scaling factors for independent variables.
loss_function (callable, optional) – Loss function. If None, the standard least-squares problem is solved.
tr_options (dict) – Options for the trust-region algorithm.
verbose (int) –
Level of algorithm’s verbosity:
0 (default) : work silently.
1 : display a termination report.
- Returns:
result – The optimization result represented as a
OptimizeResultobject. Important attributes are:xthe solution array,successa Boolean flag indicating if the optimizer exited successfully andmessagewhich describes the cause of the termination. See OptimizeResult for a description of other attributes.- Return type:
Notes
The algorithm is described in [13].
References
- select_step(x, J_h, diag_h, g_h, p, p_h, d, Delta, lb, ub, theta, lb_jnp=None, ub_jnp=None)[source]
Select the best step according to Trust Region Reflective algorithm.
- Parameters:
x (np.ndarray) – Current set parameter vector.
J_h (jnp.ndarray) – Jacobian matrix in the scaled ‘hat’ space.
diag_h (jnp.ndarray) – Diagonal of the scaled matrix C = diag(g * scale) Jv?
g_h (jnp.ndarray) – Gradient vector in the scaled ‘hat’ space.
p (np.ndarray) – Trust-region step in the original space.
p_h (np.ndarray) – Trust-region step in the scaled ‘hat’ space.
d (np.ndarray) – Scaling vector.
Delta (float) – Trust-region radius.
lb (np.ndarray) – Lower bounds on variables.
ub (np.ndarray) – Upper bounds on variables.
theta (float) – Controls step back step ratio from the bounds.
lb_jnp (jnp.ndarray, optional) – Pre-converted JAX lower bounds (avoids repeated conversion).
ub_jnp (jnp.ndarray, optional) – Pre-converted JAX upper bounds (avoids repeated conversion).
- Returns:
step (np.ndarray) – Step in the original space.
step_h (np.ndarray) – Step in the scaled ‘hat’ space.
predicted_reduction (float) – Predicted reduction in the cost function.
- Return type:
- optimize(fun, x0, jac=None, bounds=(-inf, inf), **kwargs)[source]
Perform optimization using trust region reflective algorithm.
This method provides a simplified interface to the TRF algorithm. For full control and curve fitting applications, use the trf method directly.
- Parameters:
fun (callable) – The objective function to minimize. Should return residuals.
x0 (np.ndarray) – Initial guess for parameters
jac (callable, optional) – Jacobian function. If None, uses automatic differentiation.
bounds (tuple of arrays) – Lower and upper bounds for parameters
**kwargs (Any) – Additional optimization parameters
- Returns:
The optimization result
- Return type:
- Raises:
NotImplementedError – This simplified interface is not yet implemented. Use the trf method for full curve fitting functionality.