I.6.5.2: Optimal Scaling with ORDINALS

In LINEALS (section x.x.x) we try to find quantifications of the variables that linearize all bivariate regressions. De Leeuw [1988] has suggested finding standardized quantifications in such a way that the loss function

$$f(y)=\sum_{j\neq\ell}\{y_j'C_{j\ell}D_\ell^{-1}C_{\ell j}y_j-y_j'C_{j\ell}y_\ell\,y_\ell'C_{\ell j}y_j\}\tag{1}$$

is minimized.

A more general loss function is

$$g(y,z)=\sum_{j\neq\ell}(z_{j\ell}-D_j^{-1}C_{j\ell}y_\ell)'D_j(z_{j\ell}-D_j^{-1}C_{j\ell}y_\ell),$$

which must be minimized over both $y$ and $z$. The $z_{j\ell}$ are $m(m-1)$ vectors, called regression targets, and target $z_{j\ell}$ has $k_j$ elements.

To see that this loss function generalizes (1), suppose we constrain $z$ by requiring that $z_{j\ell}$ is proportional to $y_j$, i.e. $z_{j\ell}=r_{j\ell}y_j$. Then, using $y_j'D_jy_j=1$,

$$g(y,R)=\sum_{j\neq\ell}\{r_{j\ell}^2-2r_{j\ell}\,y_j'C_{j\ell}y_\ell+y_\ell'C_{\ell j}D_j^{-1}C_{j\ell}y_\ell\}.$$

This is minimized over $R$ by $r_{j\ell}=y_j'C_{j\ell}y_\ell$, and the minimum is precisely the loss function (1), with the roles of $j$ and $\ell$ interchanged in the sum. Thus $f(y)=\min_R g(y,R)$, and $g$ is an augmentation of $f$. Block relaxation for $g$ alternates minimization over $R$ for fixed $y$, which we have shown to be easy, and minimization over $y$ for fixed $R$, which is a modified eigenvalue problem of the kind discussed in BRAS3, section x.x.x. This is not necessarily simpler than the direct minimum eigenvalue problem for minimizing $f$ in section x.x.x.
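The identity between the augmented loss at its optimal proportionality constants and the original loss can be checked numerically. The following is a minimal R sketch, not taken from ordinals.R: the simulated data, the scaling of the cross tables by $1/n$, and the helper names `C`, `D`, `standardize`, `f`, and `g` are all assumptions made for illustration.

```r
set.seed(1)
n <- 60
# Hypothetical data: three categorical variables with 3, 4, and 2 categories.
x <- data.frame(a = sample(1:3, n, replace = TRUE),
                b = sample(1:4, n, replace = TRUE),
                c = sample(1:2, n, replace = TRUE))
m <- ncol(x)

# Indicator matrices G_j (assumed coding); C_{jl} = G_j'G_l / n, D_j = diag(C_{jj}).
G <- lapply(x, function(v) outer(v, sort(unique(v)), "==") + 0)
C <- function(j, l) crossprod(G[[j]], G[[l]]) / n
D <- function(j) diag(C(j, j))            # marginal proportions, as a vector

# Standardized quantifications: weighted mean zero and y_j'D_j y_j = 1.
standardize <- function(y, d) {
  y <- y - sum(d * y)
  y / sqrt(sum(d * y^2))
}
y <- lapply(seq_len(m), function(j) standardize(rnorm(ncol(G[[j]])), D(j)))

# Loss (1): sum over j != l of y_j'C_{jl}D_l^{-1}C_{lj}y_j - (y_j'C_{jl}y_l)^2.
f <- function(y) {
  total <- 0
  for (j in seq_len(m)) for (l in seq_len(m)) if (j != l) {
    b <- drop(C(l, j) %*% y[[j]])          # C_{lj} y_j
    r <- sum(y[[l]] * b)                   # y_l'C_{lj}y_j
    total <- total + sum(b^2 / D(l)) - r^2
  }
  total
}

# Augmented loss with targets restricted to z_{jl} = r_{jl} y_j.
g <- function(y, R) {
  total <- 0
  for (j in seq_len(m)) for (l in seq_len(m)) if (j != l) {
    reg <- drop(C(j, l) %*% y[[l]]) / D(j) # D_j^{-1} C_{jl} y_l
    res <- R[j, l] * y[[j]] - reg
    total <- total + sum(D(j) * res^2)
  }
  total
}

# Optimal proportionality constants r_{jl} = y_j'C_{jl}y_l (diagonal unused).
Ropt <- matrix(0, m, m)
for (j in seq_len(m)) for (l in seq_len(m)) if (j != l) {
  Ropt[j, l] <- drop(crossprod(y[[j]], C(j, l) %*% y[[l]]))
}

print(c(f = f(y), g_at_Ropt = g(y, Ropt), g_perturbed = g(y, Ropt + 0.1)))
```

The first two printed values should agree up to rounding, and moving `R` away from `Ropt` should only increase `g`.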

The major advantage of augmenting $f$ is that it now becomes simple to incorporate quite general restrictions on the $z_{j\ell}$. For example, they can be required to be monotone with the original data, or to be a spline transformation of the data, or a monotone spline, or a mixture of these options. Thus we can constrain each individual regression function $D_j^{-1}C_{j\ell}y_\ell$ to have one of a predetermined number of shapes.

In ordinals.R we implement the three standard options of the Gifi system. A vector $y_j$ is treated as nominal, ordinal, or numerical. If it is nominal then it is unconstrained, except for the normalization. In that case the $z_{j\ell}$ are also unconstrained for all $\ell$. If $y_j$ is treated as ordinal it must be monotone with the data, and so must all $z_{j\ell}$. And a numerical $y_j$ must be linear with the data, together with its targets $z_{j\ell}$. Of course if all variables are numerical there is nothing to optimize, and we just compute correlations. If all variables are nominal there is nothing to optimize either, because we immediately get zero loss from any starting point.
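As an illustration of the three options, the update of a single target $z_{j\ell}$ for fixed $y$ can be sketched as a weighted projection of the regression $b=D_j^{-1}C_{j\ell}y_\ell$, with weights given by the diagonal of $D_j$. This is a sketch under assumptions and does not reproduce ordinals.R: the function names `pava` and `target`, the default category values `v`, and the use of an intercept in the numerical case are all hypothetical choices.

```r
# Weighted monotone (isotonic) regression by pooling adjacent violators.
pava <- function(b, w) {
  val <- b; wt <- w; size <- rep(1L, length(b))   # one block per category
  i <- 1
  while (i < length(val)) {
    if (val[i] > val[i + 1]) {
      # Pool two adjacent violating blocks into their weighted mean.
      val[i] <- (wt[i] * val[i] + wt[i + 1] * val[i + 1]) / (wt[i] + wt[i + 1])
      wt[i] <- wt[i] + wt[i + 1]
      size[i] <- size[i] + size[i + 1]
      val <- val[-(i + 1)]; wt <- wt[-(i + 1)]; size <- size[-(i + 1)]
      if (i > 1) i <- i - 1                        # re-check the previous block
    } else {
      i <- i + 1
    }
  }
  rep(val, times = size)
}

# Project the regression b onto the set allowed by the measurement level.
# d: diagonal of D_j; v: category values of variable j (assumed 1, 2, ...).
target <- function(b, d, level = c("nominal", "ordinal", "numerical"),
                   v = seq_along(b)) {
  level <- match.arg(level)
  switch(level,
    nominal = b,                                   # unconstrained: zero residual
    ordinal = pava(b, d),                          # monotone with the data
    numerical = {                                  # weighted linear fit on v
      vc <- v - sum(d * v) / sum(d)
      bm <- sum(d * b) / sum(d)
      bm + vc * sum(d * vc * (b - bm)) / sum(d * vc^2)
    })
}

# Hypothetical regression and marginals for a variable with four categories.
b <- c(-0.4, 0.3, 0.1, 0.6)
d <- c(0.2, 0.3, 0.3, 0.2)
z <- lapply(c(nominal = "nominal", ordinal = "ordinal", numerical = "numerical"),
            function(level) target(b, d, level))
loss <- sapply(z, function(zz) sum(d * (zz - b)^2))
print(z)
print(loss)
```

In this example the nominal target reproduces `b` exactly, which is the zero-loss situation described above, while the ordinal target pools the two middle categories because they violate the order. Any normalization of the targets is left out of the sketch.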


[Insert ordinals.R Here](../code/ordinals.R)