I.6.6.2: Multidimensional Unfolding

Now a data analysis example. In least-squares squared metric unfolding (LSSMU) we must minimize
\begin{equation}\label{E:sstress}
\sigma(X,Y)=\sum_{i=1}^{n}\sum_{j=1}^{m}w_{ij}\left(\delta_{ij}^{2}-d_{ij}^{2}(X,Y)\right)^{2}
\end{equation}
over the $n\times p$ and $m\times p$ configuration matrices $X$ and $Y$. Here the $w_{ij}$ are known non-negative weights, the $\delta_{ij}$ are known dissimilarities, and $d_{ij}^{2}(X,Y)=(x_{i}-y_{j})'(x_{i}-y_{j})$ is the squared Euclidean distance between row $x_{i}$ of $X$ and row $y_{j}$ of $Y$. This problem has typically been handled by block decomposition. The unknowns are partitioned into a number of subsets. Block relaxation algorithms then cycle through the subsets, minimizing over the parameters in a subset while keeping all other parameters fixed at their current values. One cycle through the subsets is one iteration of the algorithm.
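Throughout this section we illustrate the computations with small Python/NumPy sketches. The function and variable names are ours, chosen for exposition; none of this code comes from any published implementation. A direct transcription of the loss~\eqref{E:sstress}:

```python
import numpy as np

def sstress(X, Y, W, Delta):
    """LSSMU loss: sum_ij w_ij * (delta_ij^2 - ||x_i - y_j||^2)^2.

    X : (n, p) configuration, Y : (m, p) configuration,
    W : (n, m) non-negative weights, Delta : (n, m) dissimilarities.
    """
    # Squared Euclidean distances between all rows of X and all rows of Y.
    D2 = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    return np.sum(W * (Delta ** 2 - D2) ** 2)
```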

In ALSCAL (Takane et al. [1977]) coordinate descent is used, which means that the blocks consist of a single coordinate. There are $p(n+m)$ blocks. Solving for the optimal coordinate, with all others fixed, means minimizing a quartic, which in turn means finding the roots of a cubic. The algorithm converges to a stationary point which is a global minimum with respect to each coordinate separately. An alternative algorithm, proposed by Browne [1987], uses the $n+m$ points as blocks. Each substep is again a comparatively easy low-dimensional minimization. This algorithm converges to a stationary point which is a global minimum with respect to each point. Generally it is considered desirable to have fewer blocks, both to increase the speed of convergence and to restrict the class of local minima to which we can converge.
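To make the coordinate descent substep concrete, here is a sketch of one ALSCAL-style update. It is not the published code of Takane et al.: we simply exploit the fact that the loss, as a function of the single free coordinate, is a quartic, recover its five coefficients by interpolation, and minimize over the real roots of the cubic derivative.

```python
def update_coordinate(X, Y, W, Delta, i, s):
    """One coordinate-descent substep (sketch): globally minimize
    sstress over the single coordinate X[i, s], in place.

    The restricted loss is a quartic in X[i, s]; five samples pin down
    its coefficients, and the minimizer is a real root of the cubic
    derivative."""
    ts = np.array([-1.0, 0.0, 1.0, 2.0, 3.0])
    vals = []
    for t in ts:
        X[i, s] = t
        vals.append(sstress(X, Y, W, Delta))
    coef = np.polyfit(ts, vals, 4)          # exact quartic interpolation
    crit = np.roots(np.polyder(coef))       # stationary points: cubic roots
    crit = crit[np.abs(crit.imag) < 1e-9].real
    X[i, s] = crit[np.argmin(np.polyval(coef, crit))]
    return X
```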

Let us use our basic theorem to construct a four-block algorithm for LSSMU. Minimizing~\eqref{E:sstress} is the same as minimizing
\begin{equation}\label{E:expand}
\sigma(\alpha,\beta,X,Y)=\sum_{i=1}^{n}\sum_{j=1}^{m}w_{ij}\left(\delta_{ij}^{2}-\alpha_{i}^{2}-\beta_{j}^{2}+2\alpha_{i}\beta_{j}\,x_{i}'y_{j}\right)^{2}
\end{equation}
over $\alpha$ and $\beta$ and over $X$ and $Y$, where the configuration matrices $X$ and $Y$ are constrained by $x_{i}'x_{i}=1$ and $y_{j}'y_{j}=1$. Here $x_{i}$ and $y_{j}$ now denote unit-length directions, so that the configurations of~\eqref{E:sstress} are recovered as $\mathop{\rm diag}(\alpha)X$ and $\mathop{\rm diag}(\beta)Y$, with $\alpha_{i}$ and $\beta_{j}$ the distances of the points to the origin.
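A quick numerical sanity check of this reparametrization (again a sketch; `sigma` and the random test data are ours, and `sstress` is the helper from above): with unit-length rows, $\sigma(\alpha,\beta,X,Y)$ must coincide with~\eqref{E:sstress} evaluated at $\mathop{\rm diag}(\alpha)X$ and $\mathop{\rm diag}(\beta)Y$.

```python
def sigma(alpha, beta, X, Y, W, Delta):
    """Four-block LSSMU loss; rows of X and Y are assumed unit-length."""
    B = (Delta ** 2 - alpha[:, None] ** 2 - beta[None, :] ** 2
         + 2 * np.outer(alpha, beta) * (X @ Y.T))
    return np.sum(W * B ** 2)

rng = np.random.default_rng(0)
n, m, p = 5, 4, 2
X = rng.normal(size=(n, p)); X /= np.linalg.norm(X, axis=1, keepdims=True)
Y = rng.normal(size=(m, p)); Y /= np.linalg.norm(Y, axis=1, keepdims=True)
alpha, beta = rng.uniform(0.5, 2.0, n), rng.uniform(0.5, 2.0, m)
W, Delta = rng.uniform(size=(n, m)), rng.uniform(size=(n, m))
assert np.isclose(sigma(alpha, beta, X, Y, W, Delta),
                  sstress(alpha[:, None] * X, beta[:, None] * Y, W, Delta))
```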

The algorithm starts with values satisfying the constraints. Suppose we have arrived at $(\alpha^{(k)},\beta^{(k)},X^{(k)},Y^{(k)})$. We then update
\begin{equation}\label{E:alg}
\begin{split}
\alpha^{(k+1)}&\in\mathop{\rm argmin}_{\alpha}\ \sigma(\alpha,\beta^{(k)},X^{(k)},Y^{(k)}),\\
\beta^{(k+1)}&\in\mathop{\rm argmin}_{\beta}\ \sigma(\alpha^{(k+1)},\beta,X^{(k)},Y^{(k)}),\\
X^{(k+1)}&\in\mathop{\rm argmin}_{X}\ \sigma(\alpha^{(k+1)},\beta^{(k+1)},X,Y^{(k)}),\\
Y^{(k+1)}&\in\mathop{\rm argmin}_{Y}\ \sigma(\alpha^{(k+1)},\beta^{(k+1)},X^{(k+1)},Y),
\end{split}
\end{equation}
where the minima over $X$ and $Y$ are taken over the constraint sets $x_{i}'x_{i}=1$ and $y_{j}'y_{j}=1$. This gives $(\alpha^{(k+1)},\beta^{(k+1)},X^{(k+1)},Y^{(k+1)})$. It is understood that in each of the four substeps of~\eqref{E:alg} we compute the global minimum, and if the global minimum happens to be nonunique we select any one of them. We also remark that, as with any block relaxation method having more than two blocks, there are many variations on this basic scheme. We can travel through the substeps in a different order, change the order in each cycle, pass through the substeps in random order, or cycle through the first two substeps a number of times before going to the third and fourth, and so on. Each of these strategies has its own overall convergence rate, and further research would be needed to determine which is best.
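In code, one cycle of~\eqref{E:alg} is just four successive substeps. The sketch below assumes substep solvers `update_alpha`, `update_beta`, `update_X`, and `update_Y` (hypothetical names; `update_alpha` is worked out after the quartic expansion below, and `update_X` after the discussion of the secular equation).

```python
def lssmu(X, Y, W, Delta, n_iter=100):
    """Four-block relaxation for LSSMU (sketch). Each update_* helper
    returns a global minimizer of sigma over its own block."""
    alpha = np.linalg.norm(X, axis=1)             # start feasible: split
    beta = np.linalg.norm(Y, axis=1)              # each point into a length
    X, Y = X / alpha[:, None], Y / beta[:, None]  # times a unit direction
    for _ in range(n_iter):
        alpha = update_alpha(alpha, beta, X, Y, W, Delta)
        beta = update_beta(alpha, beta, X, Y, W, Delta)
        X = update_X(alpha, beta, X, Y, W, Delta)
        Y = update_Y(alpha, beta, X, Y, W, Delta)
    return alpha[:, None] * X, beta[:, None] * Y  # unscaled configurations
```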

Let us look at the subproblems in a bit more detail to see how they can best be solved. Expanding~\eqref{E:expand} and organizing terms by powers of $\alpha_{i}$ gives
\begin{equation}
\sigma=\sum_{i=1}^{n}\pi_{i}(\alpha_{i}),\qquad
\pi_{i}(\alpha)=\sum_{j=1}^{m}w_{ij}\left\{u_{ij}^{2}+4u_{ij}v_{ij}\,\alpha+(4v_{ij}^{2}-2u_{ij})\,\alpha^{2}-4v_{ij}\,\alpha^{3}+\alpha^{4}\right\},
\end{equation}
where $u_{ij}=\delta_{ij}^{2}-\beta_{j}^{2}$ and $v_{ij}=\beta_{j}\,x_{i}'y_{j}$. This is a sum of univariate quartic polynomials, which can be minimized separately to give the global minimum over $\alpha$. Obviously the same applies to minimization over $\beta$.
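The $\alpha$-substep is then a loop over $n$ independent quartics. A sketch, with the coefficients computed exactly as displayed above:

```python
def update_alpha(alpha, beta, X, Y, W, Delta):
    """Globally minimize sigma over alpha: one univariate quartic pi_i
    per row i, minimized via the real roots of its cubic derivative."""
    U = Delta ** 2 - beta[None, :] ** 2     # u_ij
    V = beta[None, :] * (X @ Y.T)           # v_ij
    new = np.empty_like(alpha)
    for i in range(len(alpha)):
        w, u, v = W[i], U[i], V[i]
        # pi_i(a) = sum_j w (u + 2 v a - a^2)^2, coefficients a^4 ... a^0.
        coef = [w.sum(), -4 * (w * v).sum(),
                (w * (4 * v ** 2 - 2 * u)).sum(),
                4 * (w * u * v).sum(), (w * u ** 2).sum()]
        crit = np.roots(np.polyder(coef))
        crit = crit[np.abs(crit.imag) < 1e-9].real
        new[i] = crit[np.argmin(np.polyval(coef, crit))]
    return new
```

The $\beta$-substep (`update_beta`) is identical with the roles of rows and columns interchanged.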

For minimization over $X$ we define
\begin{equation}
e_{ij}=\delta_{ij}^{2}-\alpha_{i}^{2}-\beta_{j}^{2},\qquad g_{ij}=2\alpha_{i}\beta_{j}\,y_{j}.
\end{equation}
Then
\begin{equation}
\sigma=\sum_{i=1}^{n}\sum_{j=1}^{m}w_{ij}\left(e_{ij}+x_{i}'g_{ij}\right)^{2}.
\end{equation}
Expanding and collecting terms gives
\begin{equation}
\sigma=\sum_{i=1}^{n}\rho_{i}(x_{i})+\text{const},\qquad \rho_{i}(x)=x'A_{i}x+2x'h_{i},
\end{equation}
with $A_{i}=\sum_{j=1}^{m}w_{ij}\,g_{ij}g_{ij}'$ and $h_{i}=\sum_{j=1}^{m}w_{ij}\,e_{ij}\,g_{ij}$. Again this is the sum of separate functions $\rho_{i}$, quadratics in this case, which can be minimized separately for each $x_{i}$. By symmetry, we have the same strategy to minimize over $Y$.
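A sketch of the bookkeeping for the $X$-substep, computing the matrices $A_{i}$ and vectors $h_{i}$ just defined:

```python
def quadratic_pieces(alpha, beta, X, Y, W, Delta):
    """Coefficients of the separable quadratics rho_i(x) = x'A_i x + 2x'h_i
    for the X-update, following the expansion above."""
    E = Delta ** 2 - alpha[:, None] ** 2 - beta[None, :] ** 2   # e_ij
    A, h = [], []
    for i in range(len(alpha)):
        G = 2 * alpha[i] * beta[:, None] * Y        # rows are g_ij
        A.append(G.T @ (W[i][:, None] * G))         # sum_j w_ij g g'
        h.append(G.T @ (W[i] * E[i]))               # sum_j w_ij e g
    return A, h
```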

Minimizing $\rho_{i}$ over $x_{i}$, under the constraint $x_{i}'x_{i}=1$, leads to the secular equation problem discussed in the Appendix. Since $p$ is typically two or at most three, these subproblems are very small indeed and can be solved efficiently.
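Since the Appendix is not reproduced here, we sketch one standard way to minimize $\rho_{i}$ on the unit sphere: with the spectral decomposition $A_{i}=Q\Lambda Q'$, the stationarity condition $(A_{i}+\mu I)x=-h_{i}$ turns the constraint into the secular equation $\sum_{k}(q_{k}'h_{i})^{2}/(\lambda_{k}+\mu)^{2}=1$, which has a unique root $\mu>-\lambda_{\min}$ in the regular case, i.e. assuming $h_{i}$ is not orthogonal to the bottom eigenvector. The code below handles only that regular case.

```python
def min_on_sphere(A, h):
    """Minimize x'Ax + 2x'h over x'x = 1 by bisection on the secular
    equation (regular case only)."""
    lam, Q = np.linalg.eigh(A)                  # A = Q diag(lam) Q'
    g = Q.T @ h
    phi = lambda mu: np.sum((g / (lam + mu)) ** 2) - 1.0
    lo, hi = -lam[0] + 1e-10, -lam[0] + 1.0     # phi(lo) > 0 in regular case
    while phi(hi) > 0:                          # expand bracket until phi < 0
        hi = -lam[0] + 2 * (hi + lam[0])
    for _ in range(200):                        # bisection on phi(mu) = 0
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if phi(mid) > 0 else (lo, mid)
    mu = 0.5 * (lo + hi)
    return -Q @ (g / (lam + mu))                # x = -(A + mu I)^{-1} h

def update_X(alpha, beta, X, Y, W, Delta):
    """X-substep: solve one tiny sphere-constrained quadratic per row."""
    A, h = quadratic_pieces(alpha, beta, X, Y, W, Delta)
    return np.vstack([min_on_sphere(Ai, hi) for Ai, hi in zip(A, h)])
```

The substep `update_Y` follows by symmetry, which completes the four-block sketch; with $p$ equal to two or three, each eigendecomposition and bisection costs essentially nothing.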