Majorization Algorithms | Block Relaxation Algorithms in Statistics -- Part II

II.1.2.3: Majorization Algorithm

The basic idea of a majorization algorithm is simple: it is the augmentation algorithm applied to the majorization function.

Suppose our current best approximation to the minimum of $f$ is $x^{(k)}$ , and we have a $g$ that majorizes $f$ on $\mathcal{X}$ in $x^{(k)}$ . If $x^{(k)}$ already minimizes $g$ we stop, otherwise we update $x^{(k)}$ to $x^{(k+1)}$ by minimizing $g$ over $\mathcal{X}$ .

If we do not stop, we have the sandwich inequality $f(x^{(k+1)})\leq g(x^{(k+1)})<g(x^{(k)})=f(x^{(k)}),$ and in the case of strict majorization even $f(x^{(k+1)})<g(x^{(k+1)})<g(x^{(k)})=f(x^{(k)}).$

We then select a new function $g$ that majorizes $f$ on $\mathcal{X}$ at $x^{(k+1)}$ . Repeating these steps produces a decreasing sequence of function values, and the usual compactness and continuity conditions guarantee convergence of both sequences $x^{(k)}$ and $f(x^{(k)})$ .

Here is an artificial example, chosen because of its simplicity. Consider $f(x)=x^4-10x^2.$ Because $x^2\geq y^2+2y(x-y)=2yx-y^2$ we see that $g(x,y)=x^4-20yx+10y^2$ is a suitable majorization function. The majorization algorithm is $x^{(k+1)}=\sqrt[3]{5x^{(k)}}.$

The first iterations of the algorithm are illustrated in Figure 1. We start with $x^{(0)}=3,$ where $f$ is $-9$ . Then $g(x,3)$ is the blue function. It is minimized at $x^{(1)}\approx 2.4662,$ where $g(x^{(1)},3)\approx -20.9795,$ and $f(x^{(1)})\approx -23.8288.$ We then majorize by using the green function $g(x,x^{(1)})$ , which has its minimum at about $2.3103,$ equal to about $-24.6430.$ The corresponding value of $f$ at this point is about $-24.8861.$ Thus we are rapidly getting close to the local minimum at $\sqrt{5}\approx 2.2361,$ with value $-25.$ The linear convergence rate at the stationary point is $\frac{1}{3}.$

plot of chunk bookfig5

Figure 1: Toy example, first iterations

Table 1 show the iterations to convergence, with an estimate of the iteration radius in the last column.

## Iteration:    1 fold:   -9.00000000 fnew:  -23.82883884 xold:    3.00000000 xnew:    2.46621207 ratio:    0.00000000 
## Iteration:    2 fold:  -23.82883884 fnew:  -24.88612919 xold:    2.46621207 xnew:    2.31029165 ratio:    0.32250955 
## Iteration:    3 fold:  -24.88612919 fnew:  -24.98789057 xold:    2.31029165 xnew:    2.26054039 ratio:    0.32971167 
## Iteration:    4 fold:  -24.98789057 fnew:  -24.99867394 xold:    2.26054039 xnew:    2.24419587 ratio:    0.33212463 
## Iteration:    5 fold:  -24.99867394 fnew:  -24.99985337 xold:    2.24419587 xnew:    2.23877400 ratio:    0.33293027 
## Iteration:    6 fold:  -24.99985337 fnew:  -24.99998373 xold:    2.23877400 xnew:    2.23696962 ratio:    0.33319896 
## Iteration:    7 fold:  -24.99998373 fnew:  -24.99999819 xold:    2.23696962 xnew:    2.23636848 ratio:    0.33328854 
## Iteration:    8 fold:  -24.99999819 fnew:  -24.99999980 xold:    2.23636848 xnew:    2.23616814 ratio:    0.33331840 
## Iteration:    9 fold:  -24.99999980 fnew:  -24.99999998 xold:    2.23616814 xnew:    2.23610137 ratio:    0.33332836 
## Iteration:   10 fold:  -24.99999998 fnew:  -25.00000000 xold:    2.23610137 xnew:    2.23607911 ratio:    0.33333167 
## Iteration:   11 fold:  -25.00000000 fnew:  -25.00000000 xold:    2.23607911 xnew:    2.23607169 ratio:    0.33333278 
## Iteration:   12 fold:  -25.00000000 fnew:  -25.00000000 xold:    2.23607169 xnew:    2.23606921 ratio:    0.33333315 
## Iteration:   13 fold:  -25.00000000 fnew:  -25.00000000 xold:    2.23606921 xnew:    2.23606839 ratio:    0.33333327 
## Iteration:   14 fold:  -25.00000000 fnew:  -25.00000000 xold:    2.23606839 xnew:    2.23606811 ratio:    0.33333331 
## Iteration:   15 fold:  -25.00000000 fnew:  -25.00000000 xold:    2.23606811 xnew:    2.23606802 ratio:    0.33333333

Table 1: Toy example, iterations to convergence

We also show the cobwebplot (see section 14.11.2) for the iterations, which illustrates the decrease of the difference between subsequent iterates.

plot of chunk cobexample

Figure 2