I.4.3.3: Loglinear Models

Let $\mathcal{L}$ be a Poisson log-likelihood with
$$\mathcal{L}(\theta)=\sum_i\left\{n_i\log\lambda_i(\theta)-\lambda_i(\theta)\right\},$$
where $\lambda_i(\theta)=\exp\left(\sum_j x_{ij}\theta_j\right)$ for a known matrix $X=\{x_{ij}\}$. We see that
$$\mathcal{D}_j\mathcal{L}(\theta)=\sum_i x_{ij}\left\{n_i-\lambda_i(\theta)\right\}$$
and
$$\mathcal{D}_{jk}\mathcal{L}(\theta)=-\sum_i x_{ij}x_{ik}\lambda_i(\theta).$$
Thus the log-likelihood is concave. Normally we would apply a safeguarded version of Newton's method, but here we want to illustrate CCA.
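To fix the notation in code, here is a minimal R sketch (the function name poissonLogLik is mine, not part of the program linked below) that evaluates the log-likelihood, its gradient, and its Hessian for given theta, design matrix x, and counts n:

```r
# Minimal sketch (not the author's code): Poisson log-likelihood with
# lambda = exp(X theta), its gradient, and its Hessian.
poissonLogLik <- function(theta, x, n) {
  lambda <- exp(drop(x %*% theta))
  list(
    f = sum(n * log(lambda) - lambda),          # log-likelihood, constants dropped
    gradient = drop(crossprod(x, n - lambda)),  # sum_i x_ij (n_i - lambda_i)
    hessian = -crossprod(x, x * lambda)         # -X' diag(lambda) X, negative semi-definite
  )
}
```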

Now suppose $X$ is a design-type matrix, with elements equal to $0$ or $1$. Let $g_j=\sum_i x_{ij}n_i$ and $h_j(\theta)=\sum_i x_{ij}\lambda_i(\theta)$. Then the likelihood equations are
$$g_j=h_j(\theta)\qquad\text{for all }j.$$
Solving each of these in turn is CCA (since we are maximizing), which is also known in this context as the iterative proportional fitting or IPF algorithm.

We have, using $e_j$ for the coordinate directions,
$$\lambda_i(\theta+\epsilon e_j)=\lambda_i(\theta)\exp(\epsilon x_{ij})=\lambda_i(\theta)\,\xi^{x_{ij}},$$
with $\xi=\exp(\epsilon)$. This explains the name of the algorithm, because the $\lambda_i(\theta)$ in $h_j(\theta)$ are adjusted with the same proportionality factor $\xi$.

Thus the optimal $\xi$ is simply
$$\xi=\frac{g_j}{h_j(\theta)}.$$
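A full CCA/IPF cycle for a 0/1 design matrix is then only a few lines. The sketch below is illustrative (the name ipfCycle is mine, and it assumes every column of x contains at least one 1); it updates $\theta_j$ by $\log\xi$, since $\xi=\exp(\epsilon)$, and rescales the corresponding $\lambda_i$:

```r
# Sketch of one CCA/IPF cycle for a 0/1 design matrix (assumes every column
# of x has at least one non-zero element, so the denominators are positive).
ipfCycle <- function(theta, x, n) {
  lambda <- exp(drop(x %*% theta))
  for (j in seq_len(ncol(x))) {
    xi <- sum(x[, j] * n) / sum(x[, j] * lambda)  # optimal factor g_j / h_j(theta)
    theta[j] <- theta[j] + log(xi)                # epsilon = log(xi)
    lambda <- lambda * xi^x[, j]                  # multiply lambda_i by xi where x_ij = 1
  }
  theta
}
```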

This example can be extended to the case in which the elements of the design matrix are $-1$, $0$, and $+1$. We define
$$h_j^+(\theta)=\sum_{i:\,x_{ij}=+1}\lambda_i(\theta)\qquad\text{and}\qquad h_j^-(\theta)=\sum_{i:\,x_{ij}=-1}\lambda_i(\theta).$$
We now have to solve the quadratic equation
$$h_j^+(\theta)\,\xi^2-g_j\,\xi-h_j^-(\theta)=0,$$
with $\xi=\exp(\epsilon)$, for the proportionality factor. If $h_j^-(\theta)=0$ then $\xi=g_j/h_j^+(\theta)$, as before. If $h_j^+(\theta)=0$ then $\xi=-h_j^-(\theta)/g_j$. If the positive and negative index sets are both nonempty, then the quadratic always has one positive and one negative root, and we select the positive one.
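A sketch of this coordinate update (illustrative only; updatePlusMinus is not part of the author's program) picks the positive root of the quadratic:

```r
# Sketch: coordinate update for a design matrix with elements -1, 0, +1.
# Solves h+ xi^2 - g xi - h- = 0 and takes the positive root.
updatePlusMinus <- function(j, theta, x, n, lambda) {
  g  <- sum(x[, j] * n)
  hp <- sum(lambda[x[, j] == 1])
  hm <- sum(lambda[x[, j] == -1])
  if (hm == 0) {
    xi <- g / hp                                    # as in the 0/1 case
  } else if (hp == 0) {
    xi <- -hm / g                                   # column has only -1 and 0 entries
  } else {
    xi <- (g + sqrt(g^2 + 4 * hp * hm)) / (2 * hp)  # positive root of the quadratic
  }
  theta[j] <- theta[j] + log(xi)
  list(theta = theta, lambda = lambda * xi^x[, j])
}
```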

Basically the same CCA/IPF technique can also be applied if the elements of $X$ are arbitrary integers. To avoid trivialities we assume each column of $X$ has at least one non-zero element. In that case solving for $\xi$ amounts to solving a higher degree polynomial equation. Suppose the non-zero elements of column $j$ of $X$ are from a set $\{v_1,\dots,v_p\}$ of integers. Elements of this set can be positive or negative. Define
$$h_{j\ell}(\theta)=\sum_{i:\,x_{ij}=v_\ell}\lambda_i(\theta).$$
Also define
$$f_j(\xi)=\sum_{\ell=1}^p v_\ell\,h_{j\ell}(\theta)\,\xi^{v_\ell}.$$
To find the optimal $\xi$ for coordinate $j$ we must solve $f_j(\xi)=g_j$, where $g_j=\sum_i x_{ij}n_i$ as before. Note that
$$f_j'(\xi)=\sum_{\ell=1}^p v_\ell^2\,h_{j\ell}(\theta)\,\xi^{v_\ell-1}>0$$
for all $\xi>0$, i.e. $f_j$ is strictly increasing on the positive reals. Let $v^+$ be the maximum of the $v_\ell$ and let $v^-$ be the minimum. We can distinguish three different behaviors of $f_j$ on the positive reals.

  1. If $v^->0$ then $f_j$ increases from $0$ to $+\infty$.
  2. If $v^+<0$ then $f_j$ increases from $-\infty$ to $0$.
  3. If $v^-<0$ and $v^+>0$ then $f_j$ increases from $-\infty$ to $+\infty$.

In all three cases there is a unique positive root of the equation $f_j(\xi)=g_j$ (for cases 1 and 2, note that $g_j$ has the same sign as the non-zero elements of column $j$, because the counts $n_i$ are non-negative and at least one relevant count is positive). To solve the equation we note that if $v^->0$ we need to find the unique positive real root of a polynomial of degree $v^+$. If $v^+<0$ we solve the equation for $\xi^{-1}$, again finding the unique positive real root of a polynomial of degree $-v^-$. In case 3, in which column $j$ has both negative and positive elements, we multiply both sides of the equation by $\xi^{-v^-}$ to get
$$\sum_{\ell=1}^p v_\ell\,h_{j\ell}(\theta)\,\xi^{v_\ell-v^-}-g_j\,\xi^{-v^-}=0,$$
which is a polynomial equation of degree $v^+-v^-$, again with a single positive real root. I have written an R program for this general case. The function polyLogLinF does the computations, the function polyLogLin is the driver for the iterations.


[Insert polyLoglin.R Here](../code/polyLoglin.R)
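The linked file contains the author's implementation. As a standalone illustration of the same computation (updateInteger below is my own sketch, not polyLogLinF), the coordinate update for an integer-valued column can be written with base R's polyroot, which returns all complex roots of a polynomial given its coefficients in increasing order; we keep the unique positive real one:

```r
# Illustrative sketch of the general integer case (not the author's polyLogLinF).
# Solves sum_i x_ij lambda_i xi^{x_ij} = g_j for the unique positive root, after
# multiplying through by xi^m with m = max(0, -min(x_ij)) to clear negative powers.
updateInteger <- function(j, theta, x, n, lambda) {
  v <- x[, j]
  g <- sum(v * n)
  m <- max(0, -min(v))
  coefs <- numeric(max(max(v), 0) + m + 1)       # coefficients of xi^0, xi^1, ...
  for (i in which(v != 0)) {
    coefs[v[i] + m + 1] <- coefs[v[i] + m + 1] + v[i] * lambda[i]
  }
  coefs[m + 1] <- coefs[m + 1] - g               # the -g_j * xi^m term
  roots <- polyroot(coefs)
  xi <- Re(roots[abs(Im(roots)) < 1e-8 & Re(roots) > 0])[1]
  theta[j] <- theta[j] + log(xi)
  list(theta = theta, lambda = lambda * xi^v)
}
```

A driver would cycle over the columns, repeating such updates until the change in the log-likelihood is negligible, which is what polyLogLin does.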


Consider the example with

> x
      [,1] [,2] [,3] [,4]
 [1,]    1    1   -1    0
 [2,]    1    1    1    0
 [3,]    1    1   -1    0
 [4,]    1    2    1    0
 [5,]    1    2   -1    0
 [6,]    1    2    1   -1
 [7,]    1    3   -1   -1
 [8,]    1    3    1   -1
 [9,]    1    3   -1   -1
[10,]    1    0    1   -1

and n equal to 1:10. We find, for the final iterations,

Iteration:   34 fold:    4.83068380 fnew:    4.83068066
Iteration:   35 fold:    4.83068066 fnew:    4.83067878
Iteration:   36 fold:    4.83067878 fnew:    4.83067765
Iteration:   37 fold:    4.83067765 fnew:    4.83067697
$lbd
 [1] 3.049407 2.988786 3.049407 2.925556 2.984896 7.968232 7.957861 7.799660
 [9] 7.957861 8.316384

$f
[1] 4.830677

$theta
[1]  1.12628967 -0.02138247 -0.01004005 -1.00197798
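These values are presumably produced by calling the driver directly on the data, i.e. by something of the form polyLogLin(n, x), with the same argument order as in the call below.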

Note that if we say

polyLogLin(n,x+3)

then we find the same solution, in the sense of the same fitted $\lambda$ (the $\theta$ changes because the design matrix has changed, although its column space has not), but convergence is much slower,

Iteration:  428 fold:    4.83072092 fnew:    4.83071986
Iteration:  429 fold:    4.83071986 fnew:    4.83071882
Iteration:  430 fold:    4.83071882 fnew:    4.83071780
Iteration:  431 fold:    4.83071780 fnew:    4.83071681
$lbd
 [1] 3.049790 2.993221 3.049790 2.932093 2.987506 7.967771 7.952558 7.805050
 [9] 7.952558 8.303460

$f
[1] 4.830717

$theta
[1]  1.053848982 -0.020633792 -0.009361352 -0.999688396