Eigenvalues and Eigenvectors of Symmetric Matrices

In this section we give a fairly complete introduction to eigenvalue problems and generalized eigenvalue problems. We use a constructive variational approach, based on the Rayleigh quotient and deflation. This works best for positive semi-definite matrices, but after dealing with those we discuss several generalizations.

Suppose $A$ is a positive semi-definite matrix of order $n$. Consider the problem of maximizing the quadratic form $x'Ax$ on the sphere $\{x \mid x'x = 1\}$. At the maximum, which is always attained because the sphere is compact, we have $Ax = \lambda x$, with $\lambda$ a Lagrange multiplier, as well as $x'x = 1$. It follows that $\lambda = x'Ax$. Note that the maximum is not necessarily attained at a unique $x$. Also the maximum is zero if and only if $A$ is zero.
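
As a quick numerical illustration (not part of the argument), the NumPy snippet below checks that the maximizer of the quadratic form on the unit sphere is an eigenvector whose eigenvalue equals the maximum value; the example matrix, seed, and tolerance are arbitrary choices.

```python
import numpy as np

# Arbitrary positive semi-definite example: A = M M'.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T

# Largest eigenvalue and a corresponding unit eigenvector from a standard solver.
evals, evecs = np.linalg.eigh(A)          # eigenvalues in ascending order
lam, x = evals[-1], evecs[:, -1]

# The maximizer satisfies Ax = lam x, x'x = 1, and lam = x'Ax.
assert np.allclose(A @ x, lam * x)
assert np.isclose(x @ x, 1.0)
assert np.isclose(x @ A @ x, lam)

# No unit vector gives a larger value of the quadratic form.
z = rng.standard_normal((5, 1000))
z /= np.linalg.norm(z, axis=0)
assert (np.einsum("ij,ik,kj->j", z, A, z) <= lam + 1e-10).all()
```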

Any pair $(x, \lambda)$ such that $Ax = \lambda x$ and $x'x = 1$ is called an eigen-pair of $A$. The members of the pair are the eigenvector $x$ and the corresponding eigenvalue $\lambda$.

Result 1: Suppose $(x_1, \lambda_1)$ and $(x_2, \lambda_2)$ are two eigen-pairs, with $\lambda_1 \neq \lambda_2$. Then premultiplying both sides of $Ax_1 = \lambda_1 x_1$ by $x_2'$ gives $x_2'Ax_1 = \lambda_2 x_2'x_1 = \lambda_1 x_2'x_1$, and thus $x_2'x_1 = 0$. This shows that $A$ cannot have more than $n$ distinct eigenvalues. If there were $n + 1$ distinct eigenvalues, then the matrix which has the corresponding $n + 1$ eigenvectors as columns would have column-rank $n + 1$ and row-rank at most $n$, which is impossible. In words: one cannot have more than $n$ orthonormal vectors in $n$-dimensional space. Suppose the distinct values are $\mu_1 > \mu_2 > \cdots > \mu_p$, with $p \leq n$. Thus each eigenvalue of $A$ is equal to one of the $\mu_s$.
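
Result 1 is easy to check numerically. The following sketch, with an arbitrary example matrix, verifies that eigenvectors belonging to distinct eigenvalues are mutually orthogonal, so that both $X'X$ and $X'AX$ come out diagonal.

```python
import numpy as np

# Arbitrary PSD example with (generically) distinct eigenvalues.
rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
A = M @ M.T

evals, X = np.linalg.eigh(A)        # columns of X are unit-length eigenvectors

# Eigenvectors belonging to distinct eigenvalues are orthogonal (result 1),
# so X'X is the identity and X'AX is the diagonal matrix of eigenvalues.
assert np.allclose(X.T @ X, np.eye(6))
assert np.allclose(X.T @ A @ X, np.diag(evals))
```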

Result 2: If $(x_1, \lambda)$ and $(x_2, \lambda)$ are two eigen-pairs with the same eigenvalue $\lambda$, then any linear combination $\alpha x_1 + \beta x_2$, suitably normalized, is also an eigenvector with eigenvalue $\lambda$. Thus the eigenvectors corresponding with an eigenvalue $\lambda$ form a linear subspace of $\mathbb{R}^n$, with dimension, say, $m$. This subspace can be given an orthonormal basis, collected in an $n \times m$ matrix. The number $m$ is the multiplicity of the subspace, and by implication of the eigenvalue $\lambda$.

Of course these results are only useful if eigen-pairs exist. We have shown that at least one eigen-pair exists, the one corresponding to the maximum of $x'Ax$ on the sphere. We now give a procedure to compute additional eigen-pairs.

Consider the following algorithm for generating a sequence of matrices $A_1, A_2, \ldots$. We start with $A_1 = A$ and $s = 1$.

  1. Test: If $A_s = 0$ stop.
  2. Maximize: Compute the maximum of $x'A_sx$ over $x'x = 1$. Suppose this is attained at an eigen-pair $(x_s, \lambda_s)$. If the maximizer is not unique, select an arbitrary one.
  3. Orthogonalize: Replace $x_s$ by $(I - \sum_{t < s} x_tx_t')\,x_s$, normalized to unit length.
  4. Deflate: Set $A_{s+1} = A_s - \lambda_s x_sx_s'$.
  5. Update: Go back to step 1 with $s$ replaced by $s + 1$.

If $s = 1$ then in step (2) we compute the largest eigenvalue of $A$ and a corresponding eigenvector. In that case there is no step (3). Step (4) constructs $A_2$ by deflation, which basically removes the contribution of the largest eigenvalue and the corresponding eigenvector. If $x$ is an eigenvector of $A_1$ with eigenvalue $\lambda \neq \lambda_1$, then $x_1'x = 0$ by result (1) above. Also, of course, $A_1x = \lambda x$, so $A_2x = A_1x - \lambda_1 x_1x_1'x = \lambda x$, and $x$ is an eigenvector of $A_2$ with eigenvalue $\lambda$. If $x$ is an eigenvector of $A_1$ with eigenvalue $\lambda_1$, then by result (2) we can choose $x$ such that $x_1'x = 0$, and thus $A_2x = \lambda_1 x$. We see that $A_2$ has the same eigenvectors as $A_1$, with the same eigenvalues and multiplicities, except for $\lambda_1$, which now has its old multiplicity minus one, and zero, which now has its old multiplicity plus one. Now if $x_2$ is the eigenvector corresponding with $\lambda_2$, the largest eigenvalue of $A_2$, then by result (1) $x_2$ is automatically orthogonal to $x_1$, which is an eigenvector of $A_2$ with eigenvalue zero (note that $\lambda_2 > 0$, because otherwise $A_2 = 0$ and the algorithm would have stopped). Thus step (3) is never strictly necessary, although it does lead to more precise numerical computation.
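
A minimal NumPy sketch of the deflation loop may help to fix ideas. It is illustrative only: the names `deflation_eigen` and `tol` are ours, and step (2) is delegated to the dense solver `numpy.linalg.eigh` as a stand-in for any method that returns the largest eigen-pair of the current $A_s$.

```python
import numpy as np

def deflation_eigen(A, tol=1e-10):
    """Extract eigen-pairs of a symmetric positive semi-definite A by deflation."""
    A_s = np.array(A, dtype=float)
    eigenvalues, eigenvectors = [], []
    while np.linalg.norm(A_s) > tol:            # step (1): stop when A_s is (numerically) zero
        evals, evecs = np.linalg.eigh(A_s)      # step (2): stand-in for maximizing x'A_s x on the sphere
        lam, x = evals[-1], evecs[:, -1]
        for v in eigenvectors:                  # step (3): re-orthogonalize against earlier eigenvectors
            x = x - (v @ x) * v
        x = x / np.linalg.norm(x)
        A_s = A_s - lam * np.outer(x, x)        # step (4): deflate
        eigenvalues.append(lam)                 # step (5): continue with the deflated matrix
        eigenvectors.append(x)
    return np.array(eigenvalues), np.column_stack(eigenvectors)

# Example: the recovered eigen-pairs reproduce a rank-3 PSD matrix of order 5.
rng = np.random.default_rng(2)
M = rng.standard_normal((5, 3))
A = M @ M.T
lams, X = deflation_eigen(A)
assert np.allclose(A, X @ np.diag(lams) @ X.T)
assert np.allclose(X.T @ X, np.eye(X.shape[1]))
```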

Following the steps of the algorithm we see that it defines orthonormal matrices $X_s = \begin{bmatrix} x_1 & \cdots & x_s \end{bmatrix}$, which moreover satisfy $Ax_t = \lambda_t x_t$ for $t = 1, \ldots, s$, and with $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_s > 0$. Also $A = X_s\Lambda_sX_s' + A_{s+1} = \sum_{t=1}^{s} \lambda_t x_tx_t' + A_{s+1}$, where $\Lambda_s$ is the diagonal matrix of the $\lambda_t$ and $x_tx_t'$ is the projector on the one-dimensional subspace spanned by $x_t$. When the algorithm stops the remainder $A_{s+1}$ vanishes, and $A = X_s\Lambda_sX_s'$. This is the eigen decomposition or the spectral decomposition of a positive semi-definite $A$.

Our algorithm stops when $A_{r+1} = 0$, which happens after exactly $r = \operatorname{rank}(A)$ steps. If $r < n$ then the minimum eigenvalue of $A$ is zero, and has multiplicity $n - r$. The matrix $I - X_rX_r'$ is the orthogonal projector on the null-space of $A$, with rank $n - r$. Using the square orthonormal $X = \begin{bmatrix} X_r & X_\perp \end{bmatrix}$, where the columns of $X_\perp$ form an orthonormal basis of this null-space, we can write the eigen decomposition in the form $A = X\Lambda X'$, where the last $n - r$ diagonal elements of $\Lambda$ are zero. This equation can also be written as $X'AX = \Lambda$, which says that the eigenvectors diagonalize $A$, and that $A$ is orthonormally similar to the diagonal matrix of eigenvalues $\Lambda$.
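
The decomposition and the diagonalization form can be verified directly on a small rank-deficient example; the snippet below is only a numerical check, with an arbitrary matrix of order $5$ and rank $2$.

```python
import numpy as np

# Rank-deficient PSD example: order 5, rank 2, so eigenvalue 0 has multiplicity 3.
rng = np.random.default_rng(3)
M = rng.standard_normal((5, 2))
A = M @ M.T

evals, X = np.linalg.eigh(A)         # square orthonormal X, eigenvalues ascending
Lam = np.diag(evals)

assert np.allclose(X.T @ X, np.eye(5))      # X is square orthonormal
assert np.allclose(A, X @ Lam @ X.T)        # spectral decomposition A = X Lam X'
assert np.allclose(X.T @ A @ X, Lam)        # equivalently X'AX = Lam: X diagonalizes A
assert np.isclose(evals[:3], 0).all()       # zero eigenvalue with multiplicity n - r = 3
```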

We have shown that the largest eigenvalue and a corresponding eigenvector exist, but we have not indicated, at least in this section, how to compute them. Conceptually the power method is the most obvious way. It is a tangential minorization method, using the inequality $x'Ax \geq 2x'Ay - y'Ay$, valid for all $x$ and $y$ because $A$ is positive semi-definite, which means that the iteration function is $x^{(k+1)} = Ax^{(k)}/\|Ax^{(k)}\|$. See the Rayleigh Quotient section for further details.
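
A sketch of the resulting power iteration is given below. It is a bare-bones illustration: the starting vector, the fixed iteration count, and the final comparison against a dense solver are our choices, and no convergence safeguards are included.

```python
import numpy as np

def power_method(A, n_iter=1000, seed=0):
    """Approximate the largest eigen-pair of a symmetric PSD matrix A by power iteration."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x = x / np.linalg.norm(x)
    for _ in range(n_iter):
        # Maximizing the tangential minorizer 2 x'Ay - y'Ay over the unit sphere
        # gives the update x <- Ay / ||Ay||.
        y = A @ x
        x = y / np.linalg.norm(y)
    return x @ A @ x, x          # Rayleigh quotient and (approximate) eigenvector

# The result agrees with a standard dense solver on an arbitrary PSD example.
rng = np.random.default_rng(4)
M = rng.standard_normal((6, 6))
A = M @ M.T
lam, x = power_method(A)
assert np.isclose(lam, np.linalg.eigh(A)[0][-1])
assert np.allclose(A @ x, lam * x)
```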

We now discuss a first easy generalization. If $A$ is real and symmetric, but not necessarily positive semi-definite, then we can apply our previous results to the matrix $A + \kappa I$, with $\kappa \geq 0$ large enough to make $A + \kappa I$ positive semi-definite; the eigenvectors are unchanged and the eigenvalues are shifted by $\kappa$. Or we can apply them to $A^2$, which is always positive semi-definite. Or we can modify the algorithm if we run into an $A_s$ with maximum eigenvalue equal to zero. If this happens we switch to finding the smallest eigenvalues, which will be negative. No matter how we modify the constructive procedure, we will still find an eigen decomposition of the same form $A = X\Lambda X'$ and $X'AX = \Lambda$ as in the positive semi-definite case.
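
The shift can be illustrated as follows; the indefinite example matrix and the particular choice of shift (minus the smallest eigenvalue) are arbitrary, and any larger shift would work as well.

```python
import numpy as np

# An indefinite symmetric example matrix (arbitrary choice).
rng = np.random.default_rng(5)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2

# Shift so that A + kappa*I is positive semi-definite.
kappa = -np.linalg.eigh(A)[0][0]
evals_shifted, X = np.linalg.eigh(A + kappa * np.eye(5))

# Undoing the shift recovers the eigenvalues of A; the eigenvectors are unchanged.
assert np.allclose(evals_shifted - kappa, np.linalg.eigh(A)[0])
assert np.allclose(A, X @ np.diag(evals_shifted - kappa) @ X.T)
```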

The second generalization, also easy, is to the generalized eigenvalues of a pair $(A, B)$ of real symmetric matrices. We now maximize $x'Ax$ over $x$ satisfying $x'Bx = 1$. In data analysis, and the optimization problems associated with it, we almost invariably assume that $B$ is positive definite. In fact we might as well make the weaker assumption that $B$ is positive semi-definite, and that $Ax = 0$ for all $x$ such that $Bx = 0$. Suppose $B = K\Lambda K'$ is an eigen decomposition of $B$, with $\Lambda$ the diagonal matrix of positive eigenvalues and $K$ the matrix of corresponding orthonormal eigenvectors. Change variables by writing $x$ as $x = K\Lambda^{-1/2}y$. Then $x'Bx = y'y$ and $x'Ax = y'\Lambda^{-1/2}K'AK\Lambda^{-1/2}y$. We can find the generalized eigenvalues and eigenvectors from the ordinary eigen decomposition of $\Lambda^{-1/2}K'AK\Lambda^{-1/2}$. This defines the component of $x$ in the column space of $B$, and the choice of the component in the null space of $B$ is completely arbitrary.

Now suppose $L$ is the square orthonormal matrix of eigenvectors diagonalizing $\Lambda^{-1/2}K'AK\Lambda^{-1/2}$, with $\Phi$ the diagonal matrix of corresponding eigenvalues, and set $X = K\Lambda^{-1/2}L$. Then $X'AX = \Phi$ and $X'BX = I$. Thus $X$ diagonalizes both $A$ and $B$. For the more general case, in which we do not assume that $Ax = 0$ for all $x$ with $Bx = 0$, we refer to De Leeuw [1982].
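
For the case of a positive definite $B$ the whole construction can be written out in a few lines; the sketch below builds $\Lambda^{-1/2}K'AK\Lambda^{-1/2}$, takes its ordinary eigen decomposition, and checks that the resulting $X$ diagonalizes both matrices. The example matrices are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
MA = rng.standard_normal((4, 4))
MB = rng.standard_normal((4, 4))
A = MA @ MA.T                              # positive semi-definite
B = MB @ MB.T + np.eye(4)                  # positive definite

# Eigen decomposition B = K Lam K' and the change of variables x = K Lam^{-1/2} y.
lam_B, K = np.linalg.eigh(B)
A_tilde = np.diag(lam_B ** -0.5) @ K.T @ A @ K @ np.diag(lam_B ** -0.5)

# Ordinary eigen decomposition of A_tilde gives the generalized eigen-pairs.
phi, L = np.linalg.eigh(A_tilde)
X = K @ np.diag(lam_B ** -0.5) @ L

assert np.allclose(X.T @ A @ X, np.diag(phi))    # X'AX = Phi
assert np.allclose(X.T @ B @ X, np.eye(4))       # X'BX = I: X diagonalizes both A and B
assert np.allclose(A @ X, B @ X @ np.diag(phi))  # equivalently A x = phi B x column-wise
```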