Eigenvalues and Eigenvectors of Symmetric Matrices
In this section we give a fairly complete introduction to eigenvalue problems and generalized eigenvalue problems. We use a constructive variational approach, basically using the Rayleigh quotient and deflation. This works best for positive semi-definite matrices, but after dealing with those we discuss several generalizations.
Suppose $A$ is a positive semi-definite matrix of order $n$. Consider the problem of maximizing the quadratic form $x'Ax$ on the sphere $\{x \mid x'x = 1\}$. At the maximum, which is always attained, we have $Ax = \lambda x$, with $\lambda$ a Lagrange multiplier, as well as $x'x = 1$. It follows that $\lambda = x'Ax$. Note that the maximum is not necessarily attained at a unique $x$. Also the maximum is zero if and only if $A$ is zero.
Any pair $(x, \lambda)$ such that $Ax = \lambda x$ and $x'x = 1$ is called an eigen-pair of $A$. The members of the pair are the eigenvector $x$ and the corresponding eigenvalue $\lambda$.
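As a quick numerical illustration (not part of the argument), the following sketch uses numpy to check these stationarity conditions on a randomly generated positive semi-definite matrix; the seed and matrix size are arbitrary choices.

```python
# Check that the maximizer of x'Ax on the unit sphere is an eigen-pair:
# Ax = lambda x and lambda = x'Ax.  Test matrix is illustrative only.
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((5, 5))
A = G @ G.T                        # a random positive semi-definite matrix

evals, evecs = np.linalg.eigh(A)   # eigenvalues in ascending order
lam, x = evals[-1], evecs[:, -1]   # largest eigenvalue and a unit eigenvector

print(np.allclose(A @ x, lam * x))   # Ax = lambda x
print(np.isclose(x @ A @ x, lam))    # lambda = x'Ax, the maximum on the sphere
```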
Result 1: Suppose $(x, \lambda)$ and $(y, \mu)$ are two eigen-pairs, with $\lambda \neq \mu$. Then premultiplying both sides of $Ax = \lambda x$ by $y'$ gives $\mu y'x = y'Ax = \lambda y'x$, and thus $y'x = 0$. This shows that $A$ cannot have more than $n$ distinct eigenvalues. If there were $n + 1$ distinct eigenvalues, then the matrix which has the corresponding eigenvectors as columns would have column-rank $n + 1$ and row-rank at most $n$, which is impossible. In words: one cannot have more than $n$ orthonormal vectors in $n$-dimensional space. Suppose the distinct values are $\lambda_1 > \lambda_2 > \cdots > \lambda_p$, with $p \leq n$. Thus each of the eigenvalues is equal to one of the $\lambda_s$.
Result 2: If $(x, \lambda)$ and $(y, \lambda)$ are two eigen-pairs with the same eigenvalue $\lambda$, then any linear combination $\alpha x + \beta y$, suitably normalized, is also an eigenvector with eigenvalue $\lambda$. Thus the eigenvectors corresponding with an eigenvalue $\lambda_s$ form a linear subspace of $\mathbb{R}^n$, with dimension, say, $n_s$. This subspace can be given an orthonormal basis, collected in an $n \times n_s$ matrix $X_s$. The number $n_s$ is the dimension of the subspace, and by implication the multiplicity of the eigenvalue equal to $\lambda_s$.
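The following small sketch, again assuming only numpy, illustrates Results 1 and 2 on a matrix constructed to have an eigenvalue of multiplicity two; the particular eigenvalues are arbitrary.

```python
# Results 1 and 2: eigenvectors for distinct eigenvalues are orthogonal, and
# linear combinations of eigenvectors sharing an eigenvalue are eigenvectors.
import numpy as np

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # random orthonormal basis
A = Q @ np.diag([3.0, 3.0, 1.0, 0.0]) @ Q.T        # eigenvalue 3 has multiplicity 2

x, y = Q[:, 0], Q[:, 1]        # two eigenvectors with eigenvalue 3
u = Q[:, 2]                    # an eigenvector with eigenvalue 1

print(np.isclose(x @ u, 0.0))  # Result 1: orthogonality across distinct eigenvalues

z = 0.6 * x + 0.8 * y          # a unit-length linear combination (0.36 + 0.64 = 1)
print(np.allclose(A @ z, 3.0 * z))   # Result 2: still an eigenvector with eigenvalue 3
```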
Of course these results are only useful if eigen-pairs exist. We have shown that at least one eigen-pair exists, the one corresponding to the maximum of $x'Ax$ on the sphere. We now give a procedure to compute additional eigen-pairs.
Consider the following algorithm for generating a sequence of matrices. We start with $A_1 = A$ and $k = 1$.

1. Test: If $A_k = 0$, stop.
2. Maximize: Compute the maximum of $x'A_kx$ over $x'x = 1$. Suppose this is attained at an eigen-pair $(x_k, \lambda_k)$. If the maximizer is not unique, select an arbitrary one.
3. Orthogonalize: Replace $x_k$ by $(I - \sum_{j<k} x_jx_j')\,x_k$, normalized to unit length.
4. Deflate: Set $A_{k+1} = A_k - \lambda_kx_kx_k'$.
5. Update: Go back to step 1 with $k$ replaced by $k + 1$.
If $k = 1$ then in step (2) we compute the largest eigenvalue of $A$ and a corresponding eigenvector. In that case there is no step (3). Step (4) constructs $A_2$ by deflation, which basically removes the contribution of the largest eigenvalue and corresponding eigenvector. If $y$ is an eigenvector of $A_1$ with eigenvalue $\mu \neq \lambda_1$, then $x_1'y = 0$ by result (1) above. Also, of course, $A_2y = A_1y - \lambda_1x_1x_1'y = \mu y$, so $y$ is an eigenvector of $A_2$ with eigenvalue $\mu$. If $y$ is an eigenvector of $A_1$ with eigenvalue $\lambda_1$, then by result (2) we can choose $y$ such that $x_1'y = 0$, and thus $A_2y = \lambda_1 y$. We see that $A_2$ has the same eigenvectors as $A_1$, with the same multiplicities, except for $\lambda_1$, which now has its old multiplicity minus one, and zero, which now has its old multiplicity plus one. Now if $x_2$ is the eigenvector corresponding with $\lambda_2$, the largest eigenvalue of $A_2$, then by result (1) $x_2$ is automatically orthogonal to $x_1$, which is an eigenvector of $A_2$ with eigenvalue zero, because $A_2x_1 = A_1x_1 - \lambda_1x_1 = 0$. Thus step (3) is never actually necessary, although carrying it out will lead to more precise numerical computation.
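A minimal sketch of the deflation algorithm is given below, assuming numpy and using `numpy.linalg.eigh` as a stand-in for the Maximize step; the function name, tolerance, and test matrix are illustrative choices, not part of the text.

```python
# Deflation algorithm: repeatedly extract the largest eigen-pair and subtract
# its contribution, until the deflated matrix is (numerically) zero.
import numpy as np

def deflation_eigen(A, tol=1e-8):
    """Eigenvalues and orthonormal eigenvectors of a psd matrix A by deflation."""
    A_k = A.copy()
    lams, xs = [], []
    while np.linalg.norm(A_k) > tol:            # Test: stop when A_k is numerically zero
        evals, evecs = np.linalg.eigh(A_k)      # Maximize: largest eigen-pair of A_k
        lam, x = evals[-1], evecs[:, -1]
        for x_prev in xs:                       # Orthogonalize against earlier eigenvectors
            x = x - (x_prev @ x) * x_prev
        x = x / np.linalg.norm(x)
        A_k = A_k - lam * np.outer(x, x)        # Deflate: remove the found component
        lams.append(lam)                        # Update and repeat
        xs.append(x)
    return np.array(lams), np.column_stack(xs)

rng = np.random.default_rng(2)
G = rng.standard_normal((6, 4))
A = G @ G.T                                     # psd of order 6 with rank 4
lams, X = deflation_eigen(A)
print(np.allclose(A, X @ np.diag(lams) @ X.T))  # A = X Lambda X'
```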
Following the steps of the algorithm we see that it defines orthonormal matrices $X_k$, with columns $x_1, \ldots, x_k$, which moreover satisfy $Ax_j = \lambda_jx_j$ for $j = 1, \ldots, k$, and with $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_k$. Also $A = X_k\Lambda_kX_k' + A_{k+1}$, where $\Lambda_k$ is the diagonal matrix of the $\lambda_j$, $A_{k+1} = (I - P_k)A(I - P_k)$, and $P_k$ is the projector $X_kX_k'$. If the remainder $A_{k+1}$ is zero, this is the eigen decomposition or the spectral decomposition of a positive semi-definite $A$.
Our algorithm stops when $A_{r+1} = 0$, which is the same as $A = X_r\Lambda_rX_r'$. If $r < n$ then the minimum eigenvalue is zero, and has multiplicity $n - r$. $I - X_rX_r'$ is the orthogonal projector on the null-space of $A$, with rank $n - r$. Using the square orthonormal $X$, obtained by appending to $X_r$ an orthonormal basis of the null-space, we can write the eigen decomposition in the form $A = X\Lambda X'$, where the last $n - r$ diagonal elements of $\Lambda$ are zero. This can also be written as $X'AX = \Lambda$, which says that the eigenvectors diagonalize $A$, and that $A$ is orthonormally similar to the diagonal matrix of eigenvalues $\Lambda$.
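The sketch below illustrates the decomposition $A = X\Lambda X'$ and $X'AX = \Lambda$ for a rank-deficient positive semi-definite matrix, with the zero eigenvalues placed last; it assumes numpy and an arbitrary test matrix.

```python
# Square orthonormal X diagonalizes A; the last n - r diagonal entries of
# Lambda are zero when rank(A) = r < n.
import numpy as np

rng = np.random.default_rng(3)
G = rng.standard_normal((5, 3))
A = G @ G.T                                   # n = 5, rank r = 3

evals, X = np.linalg.eigh(A)                  # square orthonormal X, ascending eigenvalues
evals, X = evals[::-1], X[:, ::-1]            # reorder so the zero eigenvalues come last
Lam = np.diag(np.where(evals > 1e-10, evals, 0.0))

print(np.allclose(A, X @ Lam @ X.T))          # A = X Lambda X'
print(np.allclose(X.T @ A @ X, Lam))          # X'AX = Lambda: X diagonalizes A
```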
We have shown that the largest eigenvalue and corresponding eigenvector exist, but we have not indicated, at least in this section, how to compute them. Conceptually the power method is the most obvious way. It is a tangential minorization method, using the inequality $x'Ax \geq 2x'Ay - y'Ay$; maximizing the right-hand side over the unit sphere shows that the iteration function is $y \rightarrow Ay/\|Ay\|$. See the Rayleigh Quotient section for further details.
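A bare-bones power method sketch follows; the starting vector, iteration limit, and tolerance are illustrative assumptions.

```python
# Power method for the largest eigen-pair of a psd matrix, using the
# iteration y -> Ay / ||Ay|| derived from the tangential minorization.
import numpy as np

def power_method(A, max_iter=1000, tol=1e-12):
    rng = np.random.default_rng(4)
    y = rng.standard_normal(A.shape[0])
    y = y / np.linalg.norm(y)
    for _ in range(max_iter):
        z = A @ y
        y_new = z / np.linalg.norm(z)          # iteration function y -> Ay / ||Ay||
        if np.linalg.norm(y_new - y) < tol:
            y = y_new
            break
        y = y_new
    return y @ A @ y, y                        # Rayleigh quotient and eigenvector

G = np.random.default_rng(5).standard_normal((6, 6))
A = G @ G.T
lam, x = power_method(A)
print(np.isclose(lam, np.linalg.eigvalsh(A)[-1]))   # matches the largest eigenvalue
```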
We now discuss a first easy generalization. If $A$ is real and symmetric, but not necessarily positive semi-definite, then we can apply our previous results to the matrix $A + \kappa I$, with $\kappa$ chosen large enough to make $A + \kappa I$ positive semi-definite. Or we can apply them to $A^2$, which is positive semi-definite and has the same eigenvectors as $A$. Or we can modify the algorithm if we run into an $A_k$ with maximum eigenvalue equal to zero. If this happens we switch to finding the smallest eigenvalues, which will be negative. No matter how we modify the constructive procedure, we will still find an eigen decomposition of the same form $A = X\Lambda X'$ and $X'AX = \Lambda$ as in the positive semi-definite case.
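The sketch below illustrates the shift $A + \kappa I$ for an indefinite symmetric matrix, with one sufficient (not unique) choice of $\kappa$; it assumes numpy only.

```python
# Shift an indefinite symmetric matrix to make it psd; the eigenvectors are
# unchanged and the eigenvalues are shifted by kappa.
import numpy as np

rng = np.random.default_rng(6)
G = rng.standard_normal((5, 5))
A = (G + G.T) / 2                                  # symmetric, generally indefinite

kappa = max(0.0, -np.linalg.eigvalsh(A)[0])        # minus the smallest eigenvalue suffices
A_shift = A + kappa * np.eye(5)                    # positive semi-definite

print(np.all(np.linalg.eigvalsh(A_shift) >= -1e-10))
evals, X = np.linalg.eigh(A_shift)
print(np.allclose(X.T @ A @ X, np.diag(evals - kappa)))   # same X diagonalizes A
```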
The second generalization, also easy, concerns the generalized eigenvalues of a pair of real symmetric matrices $(A, B)$. We now maximize $x'Ax$ over $x$ satisfying $x'Bx = 1$. In data analysis, and the optimization problems associated with it, we almost invariably assume that $B$ is positive definite. In fact we might as well make the weaker assumption that $B$ is positive semi-definite, and $Ax = 0$ for all $x$ such that $Bx = 0$. Suppose $B = K\Omega^2K'$ is an eigen decomposition of $B$, with $\Omega^2$ the diagonal matrix of positive eigenvalues and $K$ the corresponding orthonormal eigenvectors. Change variables by writing $x$ as $x = K\Omega^{-1}y + K_\perp z$, where $K_\perp$ is an orthonormal basis for the null-space of $B$. Then $x'Bx = y'y$ and $x'Ax = y'\Omega^{-1}K'AK\Omega^{-1}y$. We can find the generalized eigenvalues and eigenvectors from the ordinary eigen decomposition of $\Omega^{-1}K'AK\Omega^{-1}$. This defines the $y$ in $x = K\Omega^{-1}y + K_\perp z$, and the choice of $z$ is completely arbitrary.
Now suppose $L$ is the square orthonormal matrix of eigenvectors diagonalizing $\Omega^{-1}K'AK\Omega^{-1}$, with $\Lambda$ the corresponding eigenvalues, and $X = K\Omega^{-1}L$. Then $X'AX = \Lambda$ and $X'BX = I$. Thus $X$ diagonalizes both $A$ and $B$. For the more general case, in which we do not assume that $Ax = 0$ for all $x$ with $Bx = 0$, we refer to De Leeuw [1982].
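For concreteness, the following sketch carries out the change of variables in the simplest case, where $B$ is positive definite; the test matrices and all names are illustrative assumptions.

```python
# Generalized eigenproblem A x = lambda B x via the change of variables
# x = K Omega^{-1} y, reducing it to an ordinary symmetric eigenproblem.
import numpy as np

rng = np.random.default_rng(7)
G, H = rng.standard_normal((5, 5)), rng.standard_normal((5, 5))
A = G @ G.T                                   # symmetric positive semi-definite
B = H @ H.T + np.eye(5)                       # symmetric positive definite

omega2, K = np.linalg.eigh(B)                 # B = K Omega^2 K'
Omega_inv = np.diag(1.0 / np.sqrt(omega2))
M = Omega_inv @ K.T @ A @ K @ Omega_inv       # ordinary eigenproblem after the substitution

lams, L = np.linalg.eigh(M)
X = K @ Omega_inv @ L                         # generalized eigenvectors

print(np.allclose(X.T @ A @ X, np.diag(lams)))    # X'AX = Lambda
print(np.allclose(X.T @ B @ X, np.eye(5)))        # X'BX = I: X diagonalizes both A and B
print(np.allclose(A @ X, B @ X @ np.diag(lams)))  # A x = lambda B x for each column
```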