Convergence | Block Relaxation Algorithms in Statistics -- Part II

II.5.5.3: Convergence

Suppose $a$ is such that
$$g(x,y)=f(y)+f'(y)(x-y)+\frac{1}{2}a(x-y)^2\tag{1}$$
majorizes $f(x)$ for all $y$. Minimizing $g(x,x^{(k)})$ over $x$ shows that the majorization algorithm is simply
$$x^{(k+1)}=x^{(k)}-\frac{1}{a}f'(x^{(k)}),$$
i.e. it is a gradient algorithm with constant step size $1/a$. From Ostrowski's Theorem the linear convergence rate is the derivative of the update map $x\mapsto x-f'(x)/a$ at the fixed point $x_\infty$, which is
$$\kappa(x_\infty)=1-\frac{f''(x_\infty)}{a}.\tag{2}$$
Note that if $g$ in $\text{(1)}$ majorizes $f$, then any $g$ of the same form with a larger $a$ also majorizes $f$. But $\text{(2)}$ shows that a smaller $a$ will generally lead to faster convergence, so we want the smallest $a$ for which $\text{(1)}$ is still a majorization.
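The rate in $\text{(2)}$ can be checked numerically. The sketch below is only an illustration under assumptions that are not part of the text: it uses the hypothetical test function $f(x)=\log(1+e^x)-px$ with $p=0.3$, for which $f''(x)\leq\frac14$ everywhere, so any $a\geq\frac14$ gives a majorization of the form $\text{(1)}$. The names `step`, `predicted_rate` and `observed_rate` are made up for this example.

```python
import math

P = 0.3                          # assumed parameter of the test function
X_INF = math.log(P / (1.0 - P))  # minimizer: sigmoid(X_INF) == P


def fprime(x):
    """Derivative of f(x) = log(1 + exp(x)) - P * x."""
    return 1.0 / (1.0 + math.exp(-x)) - P


def fsecond(x):
    """Second derivative of f; never exceeds 1/4."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def step(x, a):
    """One majorization step: gradient step with constant step size 1/a."""
    return x - fprime(x) / a


def predicted_rate(a):
    """Linear rate from equation (2)."""
    return 1.0 - fsecond(X_INF) / a


def observed_rate(a, x0=2.0, tol=1e-6, max_iter=10_000):
    """Ratio of successive errors once the iterate is within tol of X_INF."""
    x = x0
    for _ in range(max_iter):
        if abs(x - X_INF) < tol:
            break
        x = step(x, a)
    x_next = step(x, a)
    return abs(x_next - X_INF) / abs(x - X_INF)


for a in (0.25, 0.5, 1.0):
    print(a, predicted_rate(a), observed_rate(a))
```

In this example the observed error ratios should agree closely with $1-f''(x_\infty)/a$, and the smallest admissible value $a=\frac14$ should give the fastest convergence of the three.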

For all $k$ we have
$$f(x^{(k+1)})\leq g(x^{(k+1)},x^{(k)})=f(x^{(k)})-\frac12\frac{(f'(x^{(k)}))^2}{a}.$$
Adding these inequalities gives
$$f(x^{(k+1)})-f(x^{(0)})=\sum_{i=0}^k\left(f(x^{(i+1)})-f(x^{(i)})\right)\leq-\frac{1}{2a}\sum_{i=0}^k(f'(x^{(i)}))^2,$$
and thus, with $f_\star=\min_x f(x)$ and $f(x^{(k+1)})\geq f_\star$,
$$\frac{1}{2a}\sum_{i=0}^k(f'(x^{(i)}))^2\leq f(x^{(0)})-f_\star.\tag{3}$$
The left hand side of $\text{(3)}$ is a nondecreasing sequence in $k$ which is bounded above, and consequently converges. The terms of a convergent series tend to zero, so this implies $\lim_{k\rightarrow\infty} f'(x^{(k)})=0$.
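The descent inequality and the bound $\text{(3)}$ can be verified on a concrete case. The sketch below again assumes the hypothetical test function $f(x)=\log(1+e^x)-px$ with $p=0.3$ and the constant $a=\frac14$, neither of which comes from the text. The helper `run` and the small floating-point slack `1e-12` are choices made only for this illustration.

```python
import math

P = 0.3    # assumed parameter of the test function
A = 0.25   # assumed majorization constant; f''(x) <= 1/4 for this f


def f(x):
    """Test function f(x) = log(1 + exp(x)) - P * x."""
    return math.log1p(math.exp(x)) - P * x


def fprime(x):
    """Derivative of f."""
    return 1.0 / (1.0 + math.exp(-x)) - P


F_STAR = f(math.log(P / (1.0 - P)))  # minimum value of f


def run(x0=2.0, iters=50):
    """Run the majorization algorithm and record the partial sums in (3)."""
    x = x0
    partial = 0.0
    sums, grads = [], []
    for _ in range(iters):
        g = fprime(x)
        x_new = x - g / A
        # Per-step decrease: f(x_new) <= f(x) - f'(x)^2 / (2a), up to rounding.
        assert f(x_new) <= f(x) - g * g / (2.0 * A) + 1e-12
        partial += g * g / (2.0 * A)
        sums.append(partial)
        grads.append(abs(g))
        x = x_new
    return sums, grads


sums, grads = run()
print("left side of (3):", sums[-1])
print("right side of (3):", f(2.0) - F_STAR)
print("last |f'(x)|:", grads[-1])
```

For this example the partial sums stay below $f(x^{(0)})-f_\star$ and the recorded derivatives shrink toward zero, as the argument predicts.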

Generalize to more variables, generalize to constraints.