I.5.4.5: Scaling and Splitting
Early on in the development of ALS algorithms some interesting complications were discovered. Let us consider canonical correlation analysis with optimal scaling. There we want to minimize
$$\sigma(X,Y,A,B)=\text{tr}\,(XA-YB)'(XA-YB),$$
where the columns of $X$ and the columns of $Y$ are optimally scaled or transformed variables. This problem is analyzed in detail in Van der Burg and De Leeuw \cite{canals}. It seems like a perfectly straightforward ALS problem. It can be formulated as a problem with the two blocks $(X,Y)$ and $(A,B)$, or as a problem with the four blocks $X$, $Y$, $A$, $B$. But no matter how one formulates it, a normalization must be chosen to prevent trivial solutions. In the spirit of canonical analysis it makes sense to require $A'X'XA=I$ or $B'Y'YB=I$. Both sets of conditions basically lead to the same solution, but in the intermediate iterations the normalization condition creates a problem, because it involves elements from two different blocks. Also, although $A'X'XA=I$ is a simple constraint on $A$ for given $X$, it is not such a simple constraint on $X$ for given $A$.
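The claim that both normalizations basically lead to the same solution can be checked numerically. The following is a minimal sketch, not the procedure of the cited paper. It assumes the loss is $\text{tr}\,(XA-YB)'(XA-YB)$, keeps the data matrices fixed (so there is no optimal scaling step), and assumes both matrices have full column rank. The names `loss` and `solve_one_sided` are hypothetical. The problem is solved once with $XA$ orthonormal and $B$ free, and once with the roles reversed. Under these assumptions both minima equal $p-\sum_i\rho_i^2$, with $\rho_i$ the $p$ largest canonical correlations.

```python
import numpy as np

def loss(X, Y, A, B):
    """Sum of squared differences between the variates XA and YB."""
    R = X @ A - Y @ B
    return float(np.sum(R * R))

def solve_one_sided(N, F, p):
    """Minimize ||N C - F D||^2 subject to (N C)'(N C) = I, with D free.

    N is the normalized set, F the free set (both assumed full column rank).
    Returns (C, D, rho), where rho holds the p largest canonical correlations.
    """
    Qn, Rn = np.linalg.qr(N)            # orthonormal basis for col(N)
    Qf, _ = np.linalg.qr(F)             # orthonormal basis for col(F)
    U, rho, _ = np.linalg.svd(Qn.T @ Qf)
    C = np.linalg.solve(Rn, U[:, :p])   # N C = Qn U_p is orthonormal
    D = np.linalg.lstsq(F, N @ C, rcond=None)[0]  # unconstrained LS update
    return C, D, rho[:p]

# Illustrative data only: two correlated sets of variables.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
Y = X @ rng.standard_normal((4, 3)) + rng.standard_normal((50, 3))
p = 2

A1, B1, rho = solve_one_sided(X, Y, p)   # A'X'XA = I, B free
B2, A2, _ = solve_one_sided(Y, X, p)     # B'Y'YB = I, A free

print(loss(X, Y, A1, B1), loss(X, Y, A2, B2), p - np.sum(rho ** 2))
```

The three printed numbers should agree up to rounding error. The two solutions differ only in which set of variates carries the unit scale, which is why the choice of normalization matters for the intermediate iterations rather than for the final result.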
The solution to this dilemma, basically due to Takane, is to constrain either $XA$ or $YB$, always update the unconstrained block, and switch normalizations after each update. Global convergence (at least of loss function values) is guaranteed by the following analysis.
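The switching scheme can be sketched in a few lines. This is an illustration under simplifying assumptions, not the algorithm as published: the loss is taken to be $\text{tr}\,(XA-YB)'(XA-YB)$, the data matrices are held fixed (the optimal scaling updates of $X$ and $Y$ are left out), and the normalized variates are assumed to stay of full rank so that the inverse square root exists. The function names are hypothetical.

```python
import numpy as np

def loss(X, Y, A, B):
    """Sum of squared differences between the variates XA and YB."""
    R = X @ A - Y @ B
    return float(np.sum(R * R))

def normalize(Z, C):
    """Rescale C so that (Z C)'(Z C) = I, using the symmetric inverse root."""
    M = C.T @ Z.T @ Z @ C
    w, V = np.linalg.eigh(M)             # M assumed positive definite
    return C @ (V @ np.diag(1.0 / np.sqrt(w)) @ V.T)

def switching_als(X, Y, p, n_iter=25, seed=0):
    """Alternate normalizations: constrain one block, update the other freely."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((X.shape[1], p))
    losses = []
    for _ in range(n_iter):
        A = normalize(X, A)                           # now A'X'XA = I
        B = np.linalg.lstsq(Y, X @ A, rcond=None)[0]  # update free block B
        losses.append(loss(X, Y, A, B))
        B = normalize(Y, B)                           # switch: B'Y'YB = I
        A = np.linalg.lstsq(X, Y @ B, rcond=None)[0]  # update free block A
        losses.append(loss(X, Y, A, B))
    return A, B, losses

# Illustrative data only.
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 4))
Y = X @ rng.standard_normal((4, 3)) + rng.standard_normal((60, 3))
A, B, losses = switching_als(X, Y, p=2)
print(losses[0], losses[-1])
```

Each recorded loss value is taken right after an unconstrained least squares update, with the other block normalized. In this simplified setting the recorded values are non-increasing and bounded below by zero, so they converge, which is the kind of convergence of loss function values referred to above.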
Theorem:
Proof: and minimizing the right-hand side over clearly proves the first part of the Theorem. The second part goes the same. QED