Block Order | Block Relaxation Algorithms in Statistics -- Part I

I.3.5: Block Order

If there are more than two blocks, we can move through them in different ways. In analogy with linear methods such as Gauss-Seidel and Gauss-Jacobi, we distinguish cyclic and free-steering methods. We could select the block, for instance, that seems most in need of improvement. This is the greedy choice. We can pivot through the blocks in order, and start again when all blocks have been visited. Or we could go back in the reverse order after arriving at the last block. We can even choose blocks in random order, or use some other chaotic strategy.
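The visiting strategies above can be sketched as a small generator of block indices. This is only an illustration: the function names, the strategy labels, and the reading of "back in the reverse order" as alternating forward and backward sweeps are our assumptions, and the greedy rule takes whatever "need of improvement" scores the user supplies (for instance block gradient norms).

```python
import random

def block_order(strategy, p, n_cycles, seed=0):
    """Yield block indices 0..p-1 for n_cycles sweeps (illustrative helper)."""
    rng = random.Random(seed)
    for cycle in range(n_cycles):
        if strategy == "cyclic":
            # Always sweep 0, 1, ..., p-1.
            order = list(range(p))
        elif strategy == "back_and_forth":
            # Forward sweep on even cycles, reverse sweep on odd cycles.
            order = list(range(p))
            if cycle % 2 == 1:
                order.reverse()
        elif strategy == "random":
            # A fresh random permutation of the blocks in every cycle.
            order = rng.sample(range(p), p)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        yield from order

def greedy_block(scores):
    """Greedy choice: index of the block with the largest score.

    `scores` is any user-supplied measure of how much each block
    needs improvement; ties go to the lowest index.
    """
    return max(range(len(scores)), key=lambda s: scores[s])

cyclic = list(block_order("cyclic", 3, 2))
zigzag = list(block_order("back_and_forth", 3, 2))
```

With three blocks and two cycles, `cyclic` visits the blocks in the same order twice, while `zigzag` sweeps forward and then backward.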

We emphasize, however, that the methods we consider are all of the Gauss-Seidel type, i.e. as soon as we update a block we use the new values in subsequent computations. We do not consider Gauss-Jacobi type strategies, in which all blocks are updated independently, and then all blocks are replaced simultaneously. The latter strategy leads to fewer computations per cycle, but it will generally violate the monotonicity requirement for the loss function values.
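The loss of monotonicity under simultaneous replacement can be seen in a tiny example. The sketch below uses a quadratic loss of our own choosing (not one from the text), $\psi(x)=\sum_i x_i^2+2c\sum_{i<j}x_ix_j$ with three single-coordinate blocks and $c=0.9$, for which each coordinate minimizer is available in closed form. All function names are illustrative.

```python
def psi(x, c=0.9):
    """Quadratic loss sum x_i^2 + 2c * sum_{i<j} x_i x_j.

    For three coordinates it is positive definite when -1/2 < c < 1.
    """
    n = len(x)
    cross = sum(x[i] * x[j] for i in range(n) for j in range(i + 1, n))
    return sum(v * v for v in x) + 2 * c * cross

def coordinate_min(x, i, c=0.9):
    """Exact minimizer of psi over coordinate i, the others held fixed.

    Setting the partial derivative 2 x_i + 2c * sum_{j != i} x_j to zero.
    """
    return -c * (sum(x) - x[i])

def sequential_cycle(x, c=0.9):
    """Gauss-Seidel type cycle: each new value is used immediately."""
    x = list(x)
    for i in range(len(x)):
        x[i] = coordinate_min(x, i, c)
    return x

def simultaneous_cycle(x, c=0.9):
    """Jacobi-style cycle: all updates computed from the old x, then swapped in."""
    return [coordinate_min(x, i, c) for i in range(len(x))]

x0 = [1.0, 1.0, 1.0]
loss_start = psi(x0)
loss_sequential = psi(sequential_cycle(x0))
loss_simultaneous = psi(simultaneous_cycle(x0))
```

Starting from `x0`, the sequential cycle lowers the loss, as it must, because every step is an exact block minimization. The simultaneous cycle moves to $-2c\,x_0$ and multiplies the loss by $4c^2=3.24$, so the loss increases even though the function is strictly convex.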

We now give a formalization of these generalizations, due to Fiorot and Huard \citep{fiohua}. Suppose $\Delta_s$ are $p$ point-to-set mappings of $\Omega$ into $\mathcal{P}(\Omega),$ the set of all subsets of $\Omega.$ We suppose that $\omega\in\Delta_s(\omega)$ for all $s=1,\cdots,p.$ Also define $\Gamma_s(\omega)\mathop{=}\limits^{\Delta}\hbox{argmin} \{\psi(\overline\omega)\mid\overline\omega\in\Delta_s(\omega)\}.$ There are now two versions of the generalized block-relaxation method which are interesting.

In the free-steering version we set $\omega^{(k+1)}\in\cup_{s=1}^p\Gamma_s(\omega^{(k)}).$ This means that we select, from the $p$ subsets defining the possible updates, one single update before we go to the next cycle of updates.

In the cyclic method we set $\omega^{(k+1)}\in\otimes_{s=1}^p\Gamma_s(\omega^{(k)}).$ In a little bit more detail this means $\begin{align*} \omega^{(k,0)}&=\omega^{(k)},\\ \omega^{(k,1)}&\in\Gamma_1(\omega^{(k,0)}),\\ \cdots&\in\cdots,\\ \omega^{(k,p)}&\in\Gamma_p(\omega^{(k,p-1)}),\\ \omega^{(k+1)}&=\omega^{(k,p)}. \end{align*}$ Since $\omega\in\Delta_s(\omega),$ we see that if $\xi\in\Gamma_s(\omega)$ for some $s$ then $\psi(\xi)\leq\psi(\omega).$ Consequently, for both methods, $\psi(\omega^{(k+1)})\leq\psi(\omega^{(k)}).$ This implies that Theorem \ref{T:triv} continues to apply to this generalized block-relaxation method.
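The two versions can be compared on a small problem. The sketch below is an illustration under our own assumptions: the loss $\psi(x,y)=x^2+y^2+xy-3x$ is chosen only because its block minimizers are explicit and unique (so each $\Gamma_s$ is single-valued), the free-steering rule picks the block with the largest absolute partial derivative, and all names are hypothetical.

```python
def psi(w):
    """Illustrative strictly convex loss on R^2 with minimizer (2, -1)."""
    x, y = w
    return x * x + y * y + x * y - 3 * x

def gamma_1(w):
    """Gamma_1: minimize psi over x with y held fixed (2x + y - 3 = 0)."""
    _, y = w
    return ((3 - y) / 2, y)

def gamma_2(w):
    """Gamma_2: minimize psi over y with x held fixed (x + 2y = 0)."""
    x, _ = w
    return (x, -x / 2)

GAMMAS = [gamma_1, gamma_2]

def cyclic_step(w, gammas=GAMMAS):
    """Cyclic version: apply Gamma_1, ..., Gamma_p in order."""
    for gamma in gammas:
        w = gamma(w)
    return w

def greedy_index(w):
    """Pick the block whose partial derivative is largest in absolute value."""
    x, y = w
    partials = (2 * x + y - 3, x + 2 * y)
    return max(range(2), key=lambda s: abs(partials[s]))

def free_steering_step(w, gammas=GAMMAS, choose=greedy_index):
    """Free-steering version: apply one single Gamma_s, selected by `choose`."""
    return gammas[choose(w)](w)

def iterate(step, w, n):
    """Run n iterations of `step`, recording the loss at every iterate."""
    losses = [psi(w)]
    for _ in range(n):
        w = step(w)
        losses.append(psi(w))
    return w, losses

w_cyclic, losses_cyclic = iterate(cyclic_step, (0.0, 0.0), 40)
w_free, losses_free = iterate(free_steering_step, (0.0, 0.0), 80)
```

Both runs produce non-increasing loss sequences and approach the minimizer $(2,-1)$. In this two-block example the greedy rule simply alternates between the blocks, so free steering reproduces the cyclic iterates at half the step size per iteration.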

A simple example of the $\Delta_s$ is the following. Suppose the $G_s$ are arbitrary mappings defined on $\Omega.$ They need not even be real-valued. Then we can set $\Delta_s(\omega)\mathop{=}\limits^{\Delta}\{\xi\in\Omega\mid G_s(\xi)=G_s(\omega)\}.$ Obviously $\omega\in\Delta_s(\omega)$ for this choice of $\Delta_s.$

There are some interesting special cases. If $G_s$ projects on a subspace of $\Omega,$ then $\Delta_s(\omega)$ is the set of all $\xi$ which project onto the same point as $\omega.$ By defining the subspaces using blocks of coordinates, we recover the usual block-relaxation method discussed in the previous section. In a statistical context, in combination with the EM algorithm, functional constraints of the form $G_s(\overline\omega)=G_s(\omega)$ were used by Meng and Rubin \citep{menru}. They call the resulting algorithm the ECM algorithm.
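
To make the coordinate-block case concrete, here is a short worked illustration. The product structure $\Omega=\Omega_1\otimes\cdots\otimes\Omega_p$ and the partition $\omega=(\omega_1,\cdots,\omega_p)$ are assumptions made for this example only. Let $G_s$ drop block $s,$ i.e. $$G_s(\omega)=(\omega_1,\cdots,\omega_{s-1},\omega_{s+1},\cdots,\omega_p).$$ Then $G_s(\xi)=G_s(\omega)$ holds if and only if $\xi$ and $\omega$ agree in every block except possibly block $s,$ so that $$\Delta_s(\omega)=\{\xi\in\Omega\mid\xi_t=\omega_t\text{ for all }t\not= s\},$$ and $$\Gamma_s(\omega)=\hbox{argmin}\{\psi(\omega_1,\cdots,\omega_{s-1},\xi_s,\omega_{s+1},\cdots,\omega_p)\mid\xi_s\in\Omega_s\}.$$ Thus $\Gamma_s$ minimizes over block $s$ with all other blocks fixed at their current values, which is exactly the block update of ordinary block relaxation.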