All in a day’s work

A friend of mine is teaching engineering-type stuff at another university, and he relayed a question to me, which I think he said came from his students.  He asked me if I could prove why a certain transformation works.  After a false start in interpretation, wherein I proved that it works (something he was already convinced of and needed no further corroboration for), I think I understand the spirit of his question well enough to provide a (hopefully) satisfactory answer.  And since I will be writing the answer up anyway, I may as well blog it up here.

The problem:  You have two coordinate frames, each Cartesian, with one frame rotated and displaced relative to the other.  Let $R$ be the matrix that describes the rotation, and let $v$ be the vector that gives the displacement of the origin.  A point, $p$, with coordinates given in the second frame (as a column vector) can be expressed in the first frame by the transformation $p_1 = Rp + v$.  If you picked the wrong direction to rotate or translate, replace $R$ with $R^{-1}$ or $v$ with $-v$.  Regardless, this transformation is annoyingly affine.

The solution:  Augment $R$ with $v$ as a new column, and then with a new row filled with zeros in all entries except the last, which will be 1.  Let’s call this augmented matrix $T$.  Append a 1 to the bottom of $p$, too.  Let’s call that $p'$.  Now $Tp'$ will also have a 1 in the bottom entry, and $p_1$ can be read off by ignoring that extra 1.  The question of whether this will work is left as an exercise to the reader; it is not difficult to convince oneself it will always work.
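For anyone who would rather see the exercise than do it, here is a sketch in block form.  The symbol $0^{\top}$ for the row of zeros is notation introduced here for illustration; it is not part of the original problem statement.

$$
Tp' = \begin{pmatrix} R & v \\ 0^{\top} & 1 \end{pmatrix} \begin{pmatrix} p \\ 1 \end{pmatrix} = \begin{pmatrix} Rp + v \cdot 1 \\ 0^{\top} p + 1 \cdot 1 \end{pmatrix} = \begin{pmatrix} Rp + v \\ 1 \end{pmatrix} = \begin{pmatrix} p_1 \\ 1 \end{pmatrix}
$$

The top block is exactly the affine transformation, and the bottom entry is the 1 that gets ignored.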

The puzzle:  Why does adding a dimension like this convert our transformation from one that is affine to one that is linear?

An aside, as to the engineering significance of the problem:  Suppose you have several rods connected by rotating joints.  If you want to know the position of the end of the last rod relative to the base of the first rod, this kind of transformation, possibly composed several times, would be a way to determine that position.  The potential for composing the transformation several times is a very good reason why it is so nice that the affine transformation can be converted to a linear one: composing linear transformations is just multiplying their matrices.
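The composition idea can be sketched in a few lines of plain Python.  This is only an illustration under assumptions of my choosing: the arm is planar (so $R$ is a 2×2 rotation and $T$ is 3×3), there are two joints, and the helper names `make_transform`, `matmul`, and `apply_transform` are hypothetical, not taken from any library.

```python
import math

def make_transform(theta, v):
    """Build the 3x3 augmented matrix T from a planar rotation by theta
    (radians) and a displacement v = (vx, vy). Planar case is an assumption."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, v[0]],
            [s,  c, v[1]],
            [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def apply_transform(t, p):
    """Append a 1 to p, multiply by t, and drop the trailing 1."""
    p_aug = [[p[0]], [p[1]], [1.0]]
    out = matmul(t, p_aug)
    return (out[0][0], out[1][0])

# Hypothetical two-joint arm: each frame is rotated by 90 degrees and
# displaced along the x-axis of the frame before it.
T1 = make_transform(math.pi / 2, (1.0, 0.0))  # frame 1 -> base frame
T2 = make_transform(math.pi / 2, (2.0, 0.0))  # frame 2 -> frame 1

# Composing the two affine maps is a single matrix product.
T_total = matmul(T1, T2)

p = (1.0, 0.0)  # a point given in frame 2 coordinates
tip_composed = apply_transform(T_total, p)
tip_stepwise = apply_transform(T1, apply_transform(T2, p))
```

Both `tip_composed` and `tip_stepwise` should agree up to floating-point rounding, which is the whole point: once the transformations are matrices, a chain of joints collapses into one product that can be computed once and reused.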

The justification:  Generalize the problem to the case where the direction of the displacement is fixed, but the magnitude is not.  That is, $p_1 = Rp +\lambda v$, where $\lambda \in \mathbb{R}$.  The reason for this generalization is that the affine transformation we are interested in becomes a special case of it, with $\lambda=1$, and just as importantly, the generalized transformation is a sum of two linear maps, $p \mapsto Rp$ and $\lambda \mapsto \lambda v$, and so is linear in the pair $(p, \lambda)$!  The value of $\lambda$ is an input independent of $p$, which means we have the side effect of increasing the dimension of the domain by 1, as seen in the solution.

A calculation analogous to the exercise above shows that, for this generalization, the matrix representing it is $R$ augmented by $v$ as a new column.  This is not a square matrix, but square matrices are rather convenient, since the vector space in the codomain is then the same as in the domain.  We can make it square by adding a row.  This row will be filled with zeros except the last column, which will be 1; that choice passes $\lambda$ through unchanged and, for a proper rotation, keeps the determinant at $\det R = 1$.  This is exactly $T$.  Now let’s take a look at why $p'$ is what it is.  In order to translate by $\lambda v$, $\lambda$ is appended to $p$ instead of 1.  But as we have already seen, the transformation we are actually interested in is the one with $\lambda=1$.  Therefore the linearization ought to have the form prescribed by the solution.
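Written out in block form, the two steps of this argument look like the following sketch, where $0^{\top}$ again stands for the row of zeros (notation introduced here for illustration):

$$
\begin{pmatrix} R & v \end{pmatrix} \begin{pmatrix} p \\ \lambda \end{pmatrix} = Rp + \lambda v, \qquad \begin{pmatrix} R & v \\ 0^{\top} & 1 \end{pmatrix} \begin{pmatrix} p \\ \lambda \end{pmatrix} = \begin{pmatrix} Rp + \lambda v \\ \lambda \end{pmatrix}
$$

The first product is the non-square version; the second is the square one, which carries $\lambda$ along in the last entry.  Setting $\lambda = 1$ in the second recovers $Tp'$ with $p_1 = Rp + v$ on top.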