Derivatives of Eigenvalues and Eigenvectors
This appendix summarizes some of the results in De Leeuw [2007], De Leeuw [2008], and De Leeuw and Sorenson [2012]. We refer to those reports for more extensive calculations and applications.
Suppose $A = A(x)$ and $B = B(x)$ are two real symmetric matrices depending smoothly on a real parameter $x$. The notation below suppresses the dependence on $x$ of the various quantities we talk about, but it is important to remember that all eigenvalues and eigenvectors we talk about are functions of $x$.
The generalized eigenvalue $\lambda$ and the corresponding generalized eigenvector $z$ are defined implicitly by $(A - \lambda B)z = 0$. Moreover the eigenvector is identified by $z'Bz = 1$. We suppose that in a neighborhood of $x$ the eigenvalue $\lambda$ is unique and $B$ is positive definite. A precise discussion of the required assumptions is given, for example, in Wilkinson [1965] or Kato [1976].
Differentiating $(A - \lambda B)z = 0$ gives the equation
$$(\dot A - \dot\lambda B - \lambda\dot B)z + (A - \lambda B)\dot z = 0,$$
while differentiating $z'Bz = 1$ gives
$$z'\dot B z + 2z'B\dot z = 0.$$
Premultiplying the first equation by $z'$ gives
$$\dot\lambda = z'(\dot A - \lambda\dot B)z.$$
Now suppose $AZ = BZ\Lambda$ with $Z'BZ = I$. Then from the first equation, for $t \neq s$, premultiplying by $z_t'$ gives
$$(\lambda_t - \lambda_s)\,z_t'B\dot z_s = -z_t'(\dot A - \lambda_s\dot B)z_s.$$
If we define $g$ by
$$g_t = \frac{z_t'(\dot A - \lambda_s\dot B)z_s}{\lambda_s - \lambda_t}\quad(t \neq s),\qquad g_s = -\tfrac12 z_s'\dot B z_s,$$
then $g = Z'B\dot z_s$ and thus $\dot z_s = Zg$.
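As a quick numerical illustration of the formula $\dot\lambda = z'(\dot A - \lambda\dot B)z$, the following sketch compares it with a central finite difference. The curves $A(x)$ and $B(x)$ are made-up test matrices, not taken from the reports cited above; scipy.linalg.eigh is used because it returns eigenvectors normalized so that $Z'BZ = I$.

```python
# Finite-difference check of lambda_dot = z'(A_dot - lambda * B_dot) z.
# A(x) and B(x) are made-up smooth matrix curves; B(x) is positive definite
# for the value of x used below.
import numpy as np
from scipy.linalg import eigh

def A(x):
    return np.array([[2.0, x, 0.5], [x, 1.0, x**2], [0.5, x**2, 3.0]])

def A_dot(x):
    return np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 2*x], [0.0, 2*x, 0.0]])

def B(x):
    return np.eye(3) + 0.1 * x * np.array([[0.0, 1, 0], [1, 0, 1], [0, 1, 0]])

def B_dot(x):
    return 0.1 * np.array([[0.0, 1, 0], [1, 0, 1], [0, 1, 0]])

x, s, h = 0.3, 1, 1e-6
lam, Z = eigh(A(x), B(x))          # columns of Z satisfy Z' B Z = I
z = Z[:, s]
analytic = z @ (A_dot(x) - lam[s] * B_dot(x)) @ z
numeric = (eigh(A(x + h), B(x + h), eigvals_only=True)[s]
           - eigh(A(x - h), B(x - h), eigvals_only=True)[s]) / (2 * h)
print(analytic, numeric)           # the two values should agree to about 1e-8
```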
A first important special case is the ordinary eigenvalue problem, in which $B = I$, which obviously does not depend on $x$, and consequently has $\dot B = 0$. Then
$$\dot\lambda = z'\dot A z,$$
while
$$g_t = \frac{z_t'\dot A z_s}{\lambda_s - \lambda_t}\quad(t \neq s),\qquad g_s = 0.$$
If we use the Moore-Penrose inverse the derivative of the eigenvector can be written as
$$\dot z_s = -(A - \lambda_s I)^+\dot A z_s.$$
Written in a different way this expression is $\dot z_s = ZG_sZ'\dot A z_s$, with $G_s$ the diagonal matrix that has $(\lambda_s - \lambda_t)^{-1}$ in position $t$ for $t \neq s$ and zero in position $s$, so that $-(A - \lambda_s I)^+ = ZG_sZ'$.
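A similar sketch, again with a made-up $A(x)$, checks the eigenvector formula $\dot z_s = -(A - \lambda_s I)^+\dot A z_s$ against finite differences; the sign alignment is needed because numerically computed eigenvectors are only determined up to sign.

```python
import numpy as np

def A(x):      # made-up symmetric matrix, smooth in x
    return np.array([[2.0, x, 0.5], [x, 1.0, x**2], [0.5, x**2, 3.0]])

def A_dot(x):  # its elementwise derivative
    return np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 2*x], [0.0, 2*x, 0.0]])

x, s, h = 0.3, 1, 1e-6
lam, Z = np.linalg.eigh(A(x))
z = Z[:, s]
zdot = -np.linalg.pinv(A(x) - lam[s] * np.eye(3)) @ A_dot(x) @ z
# finite differences, with signs aligned to the eigenvector at x
Zp = np.linalg.eigh(A(x + h))[1]
Zm = np.linalg.eigh(A(x - h))[1]
zp = Zp[:, s] * np.sign(Zp[:, s] @ z)
zm = Zm[:, s] * np.sign(Zm[:, s] @ z)
print(zdot, (zp - zm) / (2 * h))   # should agree to about 1e-8
```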
The next important special case is the singular value problem. The singular values and singular vectors of an $n\times m$ rectangular matrix $X$, with $n\geq m$, solve the equations $Xv = \sigma u$ and $X'u = \sigma v$. It follows that $X'Xv = \sigma^2 v$, i.e. the right singular vectors $v$ are the eigenvectors of $X'X$ and the singular values are the square roots of the eigenvalues of $X'X$.
Now we can apply our previous results on eigenvalues and eigenvectors. If $A = X'X$ then $\dot A = \dot X'X + X'\dot X$. We have, at an isolated singular value $\sigma_s$,
$$\dot\lambda_s = v_s'(\dot X'X + X'\dot X)v_s = 2\sigma_s u_s'\dot X v_s,$$
and thus
$$\dot\sigma_s = u_s'\dot X v_s.$$
For the singular vectors our previous results on eigenvectors give
$$\dot v_s = -(X'X - \sigma_s^2 I)^+(\dot X'X + X'\dot X)v_s,$$
and in the same way
$$\dot u_s = -(XX' - \sigma_s^2 I)^+(\dot X X' + X\dot X')u_s.$$
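To illustrate $\dot\sigma_s = u_s'\dot X v_s$, here is a small sketch on a made-up matrix curve $X(x) = X_0 + xD$, so that $\dot X = D$; the matrices and indices are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
X0, D = rng.standard_normal((5, 3)), rng.standard_normal((5, 3))
X = lambda x: X0 + x * D           # made-up smooth curve of 5 x 3 matrices, X_dot = D

x, s, h = 0.2, 0, 1e-6
U, sig, Vt = np.linalg.svd(X(x), full_matrices=False)
analytic = U[:, s] @ D @ Vt[s, :]
numeric = (np.linalg.svd(X(x + h), compute_uv=False)[s]
           - np.linalg.svd(X(x - h), compute_uv=False)[s]) / (2 * h)
print(analytic, numeric)           # should agree to about 1e-8
```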
Now let $X = U\Sigma V'$, with $U$ and $V$ square orthonormal of orders $n$ and $m$ respectively, and with $\Sigma$ an $n\times m$ diagonal matrix (with positive diagonal entries in non-increasing order along the diagonal).
Also define $\Omega = U'\dot X V$. Then $\dot\sigma_s = \omega_{ss}$, and $\dot U = UM$ and $\dot V = VN$, where $M$ and $N$ are antisymmetric with zero diagonal and, for $t \neq s$,
$$m_{ts} = \frac{\sigma_s\omega_{ts} + \sigma_t\omega_{st}}{\sigma_s^2 - \sigma_t^2},\qquad n_{ts} = \frac{\sigma_t\omega_{ts} + \sigma_s\omega_{st}}{\sigma_s^2 - \sigma_t^2}.$$
Note that if $X$ is symmetric we have $U = V$ and $\Omega$ is symmetric, so that $m_{ts} = n_{ts} = \omega_{ts}/(\sigma_s - \sigma_t)$ and we recover our previous result for eigenvectors. Also note that if the parameter is actually element $x_{kl}$ of $X$, i.e. if we are computing partial derivatives, then $\dot X = e_ke_l'$ and thus $\omega_{ts} = u_{kt}v_{ls}$.
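The construction of $M$ and $N$ from $\Omega$ can be checked numerically. The sketch below uses a square made-up curve $X(x) = X_0 + xD$ (so $U$, $\Sigma$, $V$ are all square and the formulas apply directly), reassembles $\dot X$ by the product rule, and compares $\dot U = UM$ with a finite difference.

```python
import numpy as np

rng = np.random.default_rng(1)
X0, D = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
x, h = 0.1, 1e-6
U, sig, Vt = np.linalg.svd(X0 + x * D)       # square case: U, Sigma, V all 4 x 4
V = Vt.T
Omega = U.T @ D @ V                          # D plays the role of X_dot

n = len(sig)
M, N = np.zeros((n, n)), np.zeros((n, n))
for t in range(n):
    for s in range(n):
        if t != s:
            M[t, s] = (sig[s] * Omega[t, s] + sig[t] * Omega[s, t]) / (sig[s]**2 - sig[t]**2)
            N[t, s] = (sig[t] * Omega[t, s] + sig[s] * Omega[s, t]) / (sig[s]**2 - sig[t]**2)
U_dot, V_dot, sig_dot = U @ M, V @ N, np.diag(Omega)

# the product rule should reassemble X_dot = D
X_dot = U_dot @ np.diag(sig) @ V.T + U @ np.diag(sig_dot) @ V.T + U @ np.diag(sig) @ V_dot.T
print(np.max(np.abs(X_dot - D)))             # machine precision

# finite-difference check of U_dot; singular vectors are only determined up to sign
Up = np.linalg.svd(X0 + (x + h) * D)[0]
Um = np.linalg.svd(X0 + (x - h) * D)[0]
Up = Up * np.sign(np.diag(Up.T @ U))
Um = Um * np.sign(np.diag(Um.T @ U))
print(np.max(np.abs(U_dot - (Up - Um) / (2 * h))))   # about 1e-6 or better
```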
The results on the eigen and singular value decompositions can be applied in many different ways, mostly by simply using the product rule for derivatives. For a square symmetric $A$ of order $n$, for example, we have
$$A^{-1} = \sum_{s=1}^n \lambda_s^{-1}z_sz_s'$$
and thus
$$\dot{(A^{-1})} = \sum_{s=1}^n\left(-\lambda_s^{-2}\dot\lambda_s z_sz_s' + \lambda_s^{-1}(\dot z_sz_s' + z_s\dot z_s')\right).$$
The generalized inverse of a rectangular $X$ is
$$X^+ = \sum_s \sigma_s^{-1}v_su_s',$$
where the $u_s$ and $v_s$ are the left and right singular vectors. Summation is over the positive singular values, and for differentiability we must assume that the rank of $X$ is constant in a neighborhood of $x$.
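As a sanity check on the expansion $X^+ = \sum_s \sigma_s^{-1}v_su_s'$, the following lines (with an arbitrary test matrix) compare the sum with numpy's built-in pseudoinverse.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 3))
U, sig, Vt = np.linalg.svd(X, full_matrices=False)
# sum of sigma^{-1} v u' over the positive singular values
Xplus = sum(np.outer(Vt[s], U[:, s]) / sig[s] for s in range(len(sig)) if sig[s] > 1e-12)
print(np.max(np.abs(Xplus - np.linalg.pinv(X))))   # machine precision
```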
The Procrustes transformation of a rectangular $X$, which is the projection of $X$ on the Stiefel manifold of orthonormal matrices, is
$$\sum_s u_sv_s' = X(X'X)^{-1/2},$$
where we assume for differentiability that $X$ is of full column rank.
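A brief sketch, with an arbitrary full-column-rank test matrix, confirming that $\sum_s u_sv_s'$ coincides with $X(X'X)^{-1/2}$ and is indeed column-orthonormal.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((6, 3))                # full column rank with probability one
U, sig, Vt = np.linalg.svd(X, full_matrices=False)
P1 = U @ Vt                                    # sum over s of u_s v_s'
lam, K = np.linalg.eigh(X.T @ X)
P2 = X @ K @ np.diag(lam ** -0.5) @ K.T        # X (X'X)^{-1/2}
print(np.max(np.abs(P1 - P2)), np.max(np.abs(P1.T @ P1 - np.eye(3))))
```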
The projection of $X$ on the set of all matrices of rank less than or equal to $r$, which is of key importance in PCA and MDS, is
$$\sum_{s=1}^r \sigma_s u_sv_s',$$
where summation is over the $r$ largest singular values.
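Finally, a sketch of the rank-$r$ projection via the truncated singular value decomposition; the test matrix and the choice $r = 2$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((6, 4))
r = 2
U, sig, Vt = np.linalg.svd(X, full_matrices=False)
Xr = U[:, :r] @ np.diag(sig[:r]) @ Vt[:r, :]   # keep the r largest singular values
print(np.linalg.matrix_rank(Xr), np.linalg.norm(X - Xr, 'fro'))
```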