Partially Observed Linear Systems

De Leeuw [2004] discusses the problem of finding an approximate solution to the homogeneous linear system $QA = 0$ when there are cone and orthonormality restrictions on the columns of $Q$ and when some elements of $A$ are restricted to known values, most commonly to zero. Think of the columns of $Q$ as variables or sets of variables, and think of $A$ as regression coefficients or weights.

The loss function used by De Leeuw [2004] is $\sigma(Q,A)=\mathrm{tr}\,A'RA$, with $R=Q'Q$ and with $\mathcal{A}$ coding the constraints on $A$. Note that the computation of the optimal $A\in\mathcal{A}$ is a least squares problem, and even with linear inequality constraints on $A$ it is still a straightforward quadratic programming problem.
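For the common case in which the constraints simply fix some entries of $A$ at known values, setting the gradient of $\mathrm{tr}\,A'RA$ with respect to the free entries to zero gives a linear system per column. The sketch below illustrates this; the function name and interface are our own, not De Leeuw's.

```python
import numpy as np

def optimal_a(R, fixed_mask, fixed_vals):
    """Minimize tr(A'RA) column by column, holding the entries of A
    flagged in fixed_mask at the values in fixed_vals.
    (Hypothetical helper for illustration only.)"""
    m, p = fixed_mask.shape
    A = np.zeros((m, p))
    for k in range(p):
        f = fixed_mask[:, k]   # fixed entries of this column
        u = ~f                 # free entries
        a = np.zeros(m)
        a[f] = fixed_vals[f, k]
        # stationarity for the free part: R_uu a_u = -R_uf a_f
        if u.any():
            a[u] = np.linalg.solve(R[np.ix_(u, u)], -R[np.ix_(u, f)] @ a[f])
        A[:, k] = a
    return A
```

Note that without any fixed entries the minimizer is trivially zero, so identification requires fixing at least one coefficient per column (typically to one).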

The function $f(R)=\min_{A\in\mathcal{A}}\mathrm{tr}\,A'RA$ is the pointwise minimum of linear functions in $R$, and thus it is a concave function of $R$. This means we are in the "aspects of correlation matrices" framework discussed in the previous section.

In particular, if we define $\mathcal{A}(R)=\mathop{\mathrm{argmin}}_{A\in\mathcal{A}}\mathrm{tr}\,A'RA$, then the subgradient of $f$ at $R$ contains $AA'$ for each $A\in\mathcal{A}(R)$. The subgradient inequality now says that for all correlation matrices $R$ and $S$ we have $f(S)\leq f(R)+\mathrm{tr}\,(S-R)AA'$ for all $A\in\mathcal{A}(R)$.
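The subgradient inequality is easy to verify numerically. The sketch below uses the simplest constraint set, coefficient vectors with first element fixed at one (an illustrative choice of ours), and checks that the linear function built from a minimizer at $R$ lies above the concave aspect everywhere and touches it at $R$.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_corr(m, n=50):
    # random correlation matrix as a normalized Gram matrix
    Q = rng.standard_normal((n, m))
    Q /= np.linalg.norm(Q, axis=0)
    return Q.T @ Q

def aspect(R):
    # f(R) = min of a'Ra over vectors a with a[0] = 1
    # (illustrative constraint set), plus the minimizer itself
    a = np.ones(R.shape[0])
    a[1:] = np.linalg.solve(R[1:, 1:], -R[1:, 0])
    return a @ R @ a, a

R, S = random_corr(4), random_corr(4)
fR, aR = aspect(R)
fS, _ = aspect(S)
# subgradient inequality: f(S) <= f(R) + tr((S - R) aR aR')
gap = (fR + np.sum((S - R) * np.outer(aR, aR))) - fS
```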

The constraints on $Q$ discussed in De Leeuw [2004] make it possible to fit a wide variety of multivariate analysis techniques. Columns of $Q$, or variables, are partitioned into blocks. Some blocks contain only a single variable; such variables are called single. Some blocks are constrained to be orthoblocks, which means that the variables in the block are required to be orthonormal. Single variables may be cone-constrained, which means the corresponding column of $Q$ is constrained to be in a cone in $\mathbb{R}^n$. And orthoblocks may be subspace-constrained, which means all columns must be in the same subspace.
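A standard example of a cone constraint for a single ordinal variable is the monotone cone, onto which projection is isotonic regression. A minimal pool-adjacent-violators sketch (our own helper, not code from De Leeuw [2004]):

```python
import numpy as np

def monotone_cone_projection(y, w=None):
    """Project y onto the cone of nondecreasing vectors (weighted
    isotonic regression) with the pool-adjacent-violators algorithm."""
    n = len(y)
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    vals, wts, cnts = [], [], []   # active blocks: value, weight, size
    for yi, wi in zip(y, w):
        vals.append(yi); wts.append(wi); cnts.append(1)
        # pool adjacent blocks while they violate monotonicity
        while len(vals) > 1 and vals[-2] > vals[-1]:
            v2, w2, c2 = vals.pop(), wts.pop(), cnts.pop()
            v1, w1, c1 = vals.pop(), wts.pop(), cnts.pop()
            vals.append((w1 * v1 + w2 * v2) / (w1 + w2))
            wts.append(w1 + w2); cnts.append(c1 + c2)
    return np.repeat(vals, cnts)
```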

We mention some illustrative special cases here. Common factor analysis of a data matrix $Y$ means finding an approximate solution to the system $Y=FA'+UD$, with $F'F=I$, $U'U=I$, $F'U=0$, and $D$ diagonal. The common factor scores are in $F$, the unique factor scores in $U$, the factor loadings in $A$, and the uniquenesses in $D$. This example can be generalized to cover structural equation models.
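Under these constraints an alternating least squares scheme is concrete: writing $Z=[F\ U]$, which is columnwise orthonormal, the loadings are an unconstrained least squares update, the uniquenesses a diagonal least squares update, and the update of $Z$ an orthonormal Procrustes problem solved by a singular value decomposition. A minimal sketch under these assumptions (our own implementation, not De Leeuw's code):

```python
import numpy as np

def factor_als(Y, p, n_iter=100, seed=0):
    """ALS sketch for Y ~ F A' + U D with Z = [F U] columnwise
    orthonormal and D diagonal. Names and interface are ours."""
    n, m = Y.shape
    rng = np.random.default_rng(seed)
    Z, _ = np.linalg.qr(rng.standard_normal((n, p + m)))   # random start [F | U]
    for _ in range(n_iter):
        F, U = Z[:, :p], Z[:, p:]
        A = Y.T @ F                          # loadings: free least squares
        d = np.einsum('ij,ij->j', U, Y)      # uniquenesses: diagonal least squares
        B = np.vstack([A.T, np.diag(d)])     # stack so that Y ~ Z B
        P, _, Qt = np.linalg.svd(Y @ B.T, full_matrices=False)
        Z = P @ Qt                           # orthonormal Procrustes update of Z
    F, U = Z[:, :p], Z[:, p:]
    A = Y.T @ F
    D = np.diag(np.einsum('ij,ij->j', U, Y))
    loss = np.linalg.norm(Y - F @ A.T - U @ D) ** 2
    return F, U, A, D, loss
```

Each step solves its subproblem exactly, so the loss is monotonically nonincreasing over iterations.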

Homogeneity analysis [Gifi 1990] is the linear system $X=Q_jA_j$ for $j=1,\dots,m$, where $X$ is an orthoblock of object scores, while the $Q_j$ are orthoblocks in the subspaces defined by the indicator matrices (or B-spline bases) $G_j$ of variable $j$. For single variables $Q_j$ has only a single column, which can be cone-constrained. For multiple correspondence analysis $X$ and all $Q_j$ have the same number of columns. For nonlinear principal component analysis all variables are single and the $A_j$ are $1\times p$.
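The alternating least squares idea can be sketched for the multiple correspondence analysis case by folding each product $Q_jA_j$ into a single matrix $G_jY_j$ of category quantifications. The sketch below is a simplification of ours: it keeps the orthonormality constraint on the object scores but skips the centering normally used to remove the trivial constant solution.

```python
import numpy as np

def mca_als(indicators, p=2, n_iter=100, seed=0):
    """ALS sketch for homogeneity analysis / MCA: minimize
    sum_j ||X - G_j Y_j||^2 over orthonormal object scores X and
    free category quantifications Y_j. Names are ours."""
    n = indicators[0].shape[0]
    rng = np.random.default_rng(seed)
    X, _ = np.linalg.qr(rng.standard_normal((n, p)))
    for _ in range(n_iter):
        # optimal quantifications given X: least squares per variable
        Ys = [np.linalg.lstsq(G, X, rcond=None)[0] for G in indicators]
        # optimal X given the Y_j: Procrustes fit to the summed scores
        M = sum(G @ Y for G, Y in zip(indicators, Ys))
        P, _, Qt = np.linalg.svd(M, full_matrices=False)
        X = P @ Qt
    Ys = [np.linalg.lstsq(G, X, rcond=None)[0] for G in indicators]
    loss = sum(np.linalg.norm(X - G @ Y) ** 2 for G, Y in zip(indicators, Ys))
    return X, Ys, loss
```

For indicator matrices the least squares update of $Y_j$ reduces to category means of the object scores, which is the classical Gifi "reciprocal averaging" step.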

In both examples the majorization algorithm is actually an alternating least squares algorithm. In the factor analysis example the loss function is $\sigma(F,U,A,D)=\|Y-FA'-UD\|^2$, and in homogeneity analysis it is $\sigma(X,Q_1,\dots,Q_m,A_1,\dots,A_m)=\sum_{j=1}^m\|X-Q_jA_j\|^2$.