Gauss-Newton Majorization

The least squares loss function was defined earlier. It has the form
$$\sigma(\theta)=\frac{1}{2}(y-f(\theta))'W(y-f(\theta)),$$
where $W$ is an $n\times n$ positive semi-definite matrix of weights, and we minimize over $\theta\in\Theta\subseteq\mathbb{R}^p$.

Now, writing $r(\theta)=y-f(\theta)$ for the residuals and $G(\theta)=\mathcal{D}f(\theta)$ for the $n\times p$ Jacobian of $f$,
$$\mathcal{D}\sigma(\theta)=-G(\theta)'Wr(\theta),$$
and
$$\mathcal{D}^2\sigma(\theta)=G(\theta)'WG(\theta)-\sum_{i=1}^n\{Wr(\theta)\}_i\,\mathcal{D}^2f_i(\theta).$$
The structure of the Hessian suggests that we define
$$A(\theta)=G(\theta)'WG(\theta),\qquad B(\theta)=\sum_{i=1}^n\{Wr(\theta)\}_i\,\mathcal{D}^2f_i(\theta).$$
We have $\mathcal{D}^2\sigma(\theta)=A(\theta)-B(\theta)$. Note that $A(\theta)$ is positive semi-definite for all $\theta$.
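The decomposition of the Hessian into $A(\theta)-B(\theta)$ can be checked numerically. The sketch below uses an illustrative model $f_i(\theta)=\theta_1 e^{\theta_2 x_i}$ with made-up data and diagonal weights (all assumptions, not from the text), and compares $A(\theta)-B(\theta)$ against a finite-difference Hessian of $\sigma$:

```python
import numpy as np

# Illustrative model f_i(theta) = theta[0] * exp(theta[1] * x_i); the data,
# weights, and evaluation point are arbitrary choices for this check.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([1.2, 1.9, 3.1, 4.8, 7.6])
w = np.array([1.0, 1.0, 2.0, 1.0, 0.5])          # diagonal W

def f(theta):
    return theta[0] * np.exp(theta[1] * x)

def sigma(theta):
    r = y - f(theta)
    return 0.5 * np.sum(w * r ** 2)

def jac(theta):
    # n x 2 Jacobian G(theta) of f
    e = np.exp(theta[1] * x)
    return np.column_stack([e, theta[0] * x * e])

def hess_fi(theta, i):
    # 2 x 2 Hessian of the single coordinate f_i
    e = np.exp(theta[1] * x[i])
    return np.array([[0.0, x[i] * e],
                     [x[i] * e, theta[0] * x[i] ** 2 * e]])

theta = np.array([1.3, 0.85])
r = y - f(theta)
G = jac(theta)
A = G.T @ (w[:, None] * G)                        # A(theta) = G'WG
B = sum(w[i] * r[i] * hess_fi(theta, i) for i in range(len(x)))

# Central finite differences recover D^2 sigma(theta)
h = 1e-4
H = np.zeros((2, 2))
for j in range(2):
    for k in range(2):
        def shifted(sj, sk):
            t = theta.copy()
            t[j] += sj * h
            t[k] += sk * h
            return sigma(t)
        H[j, k] = (shifted(1, 1) - shifted(1, -1)
                   - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)

print(np.allclose(H, A - B, atol=1e-3))           # Hessian equals A - B
print(np.min(np.linalg.eigvalsh(A)) >= 0)         # A is positive semi-definite
```

The two printed checks confirm, for this example, that $\mathcal{D}^2\sigma(\theta)=A(\theta)-B(\theta)$ and that $A(\theta)$ is positive semi-definite.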

In the classical Gauss-Newton method we make the approximation $\mathcal{D}^2\sigma(\theta)\approx A(\theta)$, and the corresponding iterative algorithm is
$$\theta^{(k+1)}=\theta^{(k)}-A(\theta^{(k)})^{-1}\mathcal{D}\sigma(\theta^{(k)}).$$
If the algorithm converges to $\theta^\star$, and the residuals $r_i(\theta^\star)$ are small, then the least squares loss function will be small, and $B(\theta^\star)$ will be small as well. Since the iteration matrix at $\theta^\star$ is
$$\mathcal{M}(\theta^\star)=I-A(\theta^\star)^{-1}\mathcal{D}^2\sigma(\theta^\star)=A(\theta^\star)^{-1}B(\theta^\star),$$
we can expect rapid convergence when $B(\theta^\star)$ is small. But convergence is not guaranteed, and consequently we need safeguards. Majorization provides one such safeguard.
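A minimal sketch of the Gauss-Newton iteration, on the same illustrative exponential model $f_i(\theta)=\theta_1 e^{\theta_2 x_i}$ (an assumption, not from the text). The data are generated without noise, so the residuals vanish at the solution, $B(\theta^\star)=0$, and convergence is fast:

```python
import numpy as np

# Noiseless data from theta = (1.5, 0.9): a zero-residual toy problem
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = 1.5 * np.exp(0.9 * x)
w = np.ones_like(x)                      # unit weights, W = I

def f(theta):
    return theta[0] * np.exp(theta[1] * x)

def jac(theta):
    e = np.exp(theta[1] * x)
    return np.column_stack([e, theta[0] * x * e])

theta = np.array([1.4, 0.8])             # starting point near the solution
for _ in range(100):
    r = y - f(theta)
    G = jac(theta)
    A = G.T @ (w[:, None] * G)           # A(theta), the Gauss-Newton matrix
    g = -G.T @ (w * r)                   # gradient D-sigma(theta)
    step = np.linalg.solve(A, g)
    theta = theta - step                 # theta_{k+1} = theta_k - A^{-1} g
    if np.linalg.norm(step) < 1e-12:
        break

# theta converges to (1.5, 0.9) on this zero-residual problem
```

Note that no safeguard is used here; from a poor starting point this same loop can diverge, which is the motivation for the majorization safeguard below.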

If we can find $K$ such that $K\succeq\mathcal{D}^2\sigma(\theta)$ for all $\theta$, then
$$\sigma(\theta)\leq\sigma(\xi)+(\theta-\xi)'\mathcal{D}\sigma(\xi)+\frac{1}{2}(\theta-\xi)'K(\theta-\xi)$$
for all $\theta$ and $\xi$. The quadratic on the right majorizes $\sigma$ at $\xi$, and minimizing it gives the update $\theta^{(k+1)}=\theta^{(k)}-K^{-1}\mathcal{D}\sigma(\theta^{(k)})$, which guarantees that the loss never increases.
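A sketch of the majorized update on the same toy problem, assuming a suitable bound $K$ is available. Here $K=A(\theta^{(0)})+\kappa I$ with a generous $\kappa$ is used as a crude empirical bound on the region the iterates traverse; it is not a proven global majorizer, merely enough to illustrate the monotone behavior:

```python
import numpy as np

# Same illustrative exponential model with noiseless data
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = 1.5 * np.exp(0.9 * x)
w = np.ones_like(x)

def f(theta):
    return theta[0] * np.exp(theta[1] * x)

def sigma(theta):
    r = y - f(theta)
    return 0.5 * np.sum(w * r ** 2)

def grad(theta):
    e = np.exp(theta[1] * x)
    G = np.column_stack([e, theta[0] * x * e])
    return -G.T @ (w * (y - f(theta)))

theta = np.array([1.4, 0.8])
e = np.exp(theta[1] * x)
G0 = np.column_stack([e, theta[0] * x * e])
# K = A(theta_0) + kappa * I, held fixed over all iterations;
# kappa = 200 is an ad hoc choice for this example
K = G0.T @ (w[:, None] * G0) + 200.0 * np.eye(2)

losses = [sigma(theta)]
for _ in range(2000):
    theta = theta - np.linalg.solve(K, grad(theta))  # minimize the majorizer
    losses.append(sigma(theta))

# The loss sequence is nonincreasing -- the hallmark of majorization --
# at the price of slower (linear) convergence than unsafeguarded Gauss-Newton
print(all(l1 <= l0 + 1e-12 for l0, l1 in zip(losses, losses[1:])))
```

The contrast with the previous sketch is the point: the fixed matrix $K$ buys guaranteed descent, while Gauss-Newton's varying $A(\theta)$ buys speed without a guarantee.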

Note: Use Nesterov's Gauss-Newton paper 03/13/15