$\newcommand{\v}{\vec}$

Intuitive understanding of rank of design matrix

We have seen that the rank of the design matrix plays an important role in determining uniqueness of the least squares solution. The solution is unique if and only if $X$ has full column rank, i.e., its rank equals its number of columns. In practice, however, this condition may not be met. We can of course apply some linear algebra algorithm (like Gaussian elimination) to find the rank and/or find a subset of columns that span $\col(X).$ However, it is often possible to avoid these numerical algorithms and resolve the problem intuitively. That is what we are going to learn now.

Guessing if rank $<$ number of columns

Since this is equivalent to nonuniqueness of the least squares solution, we try to tweak one least squares solution into another. If we succeed, then the rank must be less than the number of columns.

EXAMPLE: Consider the 1-way ANOVA model: $y_{ij} = \mu + \alpha_i + \epsilon_{ij}.$

Suppose that I give you some least squares solution $\h \mu$ and $\h \alpha_i$'s. Now the intuitive thinking goes like this:

Since $y_{ij}\approx \mu + \alpha_i,$ we may think of the $\mu + \alpha_i$'s as being "watched" by the $y_{ij}$'s. If any of the $\mu + \alpha_i$'s changes, then an alarm bell rings.

But if we can tweak $\mu$ and the $\alpha_i$'s so that the $\mu + \alpha_i$'s never change, then that gives us a new least squares solution. For instance, add 5 to $\mu,$ and compensate by subtracting 5 from all the $\alpha_i$'s.

This shows that $X$ is not full column rank.
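The tweak can be checked numerically. The sketch below uses made-up values for $\h \mu$ and the $\h \alpha_i$'s (they are assumptions for illustration, not estimates from any data set) and confirms that the watched quantities $\mu + \alpha_i$ stay the same after the tweak.

```python
# Hypothetical least squares solution for a 1-way ANOVA with three groups.
mu_hat = 10.0
alpha_hat = [1.0, -2.0, 4.0]

def watched(mu, alphas):
    """Return the quantities mu + alpha_i that the data can 'see'."""
    return [mu + a for a in alphas]

# Tweak: add 5 to mu and subtract 5 from every alpha_i.
shift = 5.0
mu_new = mu_hat + shift
alpha_new = [a - shift for a in alpha_hat]

before = watched(mu_hat, alpha_hat)
after = watched(mu_new, alpha_new)
print(before)
print(after)
```

The two printed lists agree although the parameter values differ, so both parameter vectors give the same fitted values and hence the same residual sum of squares.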

Here is another example.

EXAMPLE: Again we consider a 1-way ANOVA model: $y_{ij} = \mu_i + \epsilon_{ij}.$ Here the $\mu_i$'s themselves are "watched". So we cannot do any tweaking without getting detected. Hence the design matrix is full column rank here.

Guessing $r(X)$

The same intuitive way of thinking can often allow us to guess $r(X).$

EXAMPLE: Again consider the model: $y_{ij} = \mu + \alpha_i + \epsilon_{ij}$ for $i=1,...,p,$ say. The range of $j$ does not really matter for finding $r(X)$. (Why?)

There are $p+1$ columns in $X.$ We have already seen that $X$ is not full column rank. Hence $r(X) < p+1.$ To guess $r(X)$ we shall again play the "tweak parameters without setting off the alarm" game. But this time we shall impose an extra constraint: pick any parameter (just any!), say $\mu,$ and never tweak it. Now you'll see that no tweaking is possible. Since you can tweak neither $\mu+\alpha_i$ nor $\mu,$ you cannot tweak $\alpha_i$ either. Thus, just one constraint is enough to prevent tweaking. The conclusion is: $r(X)$ is exactly one less than the number of columns, i.e., $r(X) = p.$
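As an illustration, take $p=3$ with a single observation per group (an assumption made only to keep the matrix small). With the columns ordered as $\mu, \alpha_1, \alpha_2, \alpha_3,$ the design matrix is $$ X = \begin{bmatrix} 1 & 1 & 0 & 0\\ 1 & 0 & 1 & 0\\ 1 & 0 & 0 & 1 \end{bmatrix}. $$ The first column is the sum of the other three, which is the linear algebra behind the "add to $\mu,$ subtract from all the $\alpha_i$'s" tweak. The last three columns are clearly linearly independent, so $r(X) = 3 = p,$ one less than the number of columns.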

Here is a more complicated example.

EXAMPLE: The 2-way ANOVA model without interaction: $y_{ij} = \mu+\alpha_i+\beta_j+\epsilon_{ij}.$

Here the "watched" quantities are $\mu+\alpha_i+\beta_j.$ Clearly, we can add something to $\mu$ and compensate by subtracting that amount from all the $\alpha_i$'s (or all the $\beta_j$'s). So $X$ is not full column rank.

To guess the exact rank, let's impose an additional constraint: "Thou shalt not tweak $\mu$."

Even so, we can manage to tweak the $\alpha_i$'s and $\beta_j$'s without setting off the alarm bell. For instance, add 5 to all the $\alpha_i$'s and subtract the same amount from all the $\beta_j$'s.

OK, pick any other parameter that is not already fixed by earlier constraints (say $\alpha_1$) and impose a new constraint: "Thou shalt not tweak $\alpha_1$ either."

Now, $\mu$ and $\alpha_1$ both being fixed, and the $\mu+\alpha_1+\beta_j$'s being watched, we cannot tweak any of the $\beta_j$'s. With $\mu$ and all the $\beta_j$'s fixed and the $\mu+\alpha_i+\beta_j$'s watched, none of the other $\alpha_i$'s can be tweaked either. Hence no tweaking at all! And we needed just two constraints.

Conclusion: $r(X)$ is two less than the number of columns.
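This guess can be cross-checked with the "sledge hammer". The sketch below builds the design matrix for a layout with 3 levels of the first factor and 4 of the second (illustrative sizes, with every cell $(i,j)$ observed once) and computes its rank by Gaussian elimination in exact arithmetic. The helper names `matrix_rank` and `two_way_design` are hypothetical, written only for this illustration.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Look for a pivot in this column at or below the current pivot row.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate the entries below the pivot.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank

def two_way_design(a, b):
    """One row (1, indicators of i, indicators of j) for every cell (i, j)."""
    rows = []
    for i in range(a):
        for j in range(b):
            row = [1]
            row += [int(i == k) for k in range(a)]
            row += [int(j == k) for k in range(b)]
            rows.append(row)
    return rows

a, b = 3, 4  # illustrative numbers of levels
X = two_way_design(a, b)
n_cols = 1 + a + b
r = matrix_rank(X)
print(n_cols, r)
```

For these sizes the rank comes out two less than the number of columns, in agreement with the two constraints needed in the game.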

Reducing $X$ to a full column rank matrix

"Reducing $X$ to a full column rank matrix" means, linear algebraically, picking a subset of columns of $X$ that constitute a basis for $\col(X).$ Finding a column-echelon form for $X$ is one possible sledge hammer to break this peanut. However, our "tweak without letting off the alarm" game again comes to help. Indeed, it is preferable to the sledge hammer method, because the particular least squares solution that we shall get by the intuitive method also has better interpretability.

EXAMPLE: Consider the 1-way ANOVA model once again: $y_{ij} = \mu+\alpha_i+\epsilon_{ij}.$

Here is one possible scenario where it could be used. We have three different fertilisers: None, Compost and NPK. We want to see their effect on the yield of paddy. Here the constraint $\alpha_1 = 0$ is a suitable one, since None is like a reference case. With this constraint the remaining parameters have the following interpretation:

$\mu$ is the effect of no fertiliser,
$\alpha_2$ is the extra effect due to Compost,
$\alpha_3$ is the extra effect due to NPK.

However, if the three fertilisers were Mg, Compost and NPK, then a more natural constraint would be $\sum \alpha_i = 0,$ leading to the following interpretation:

$\mu$ is the overall mean effect.
$\alpha_i$ is the extra effect due to the $i$-th fertiliser.

Another natural constraint would be: $\mu = 0.$ Here the interpretation is even simpler:

$\alpha_i$ is the effect of the $i$-th fertiliser.

However, most people will prefer the constraint $\sum \alpha_i = 0$ to the constraint $\mu = 0,$ because under the former the signs of the $\h \alpha_i$'s immediately give a clue as to which fertiliser is good and which is bad.

Each such constraint effectively chooses a basis of $\col(X),$ leading to a unique least squares solution. Each software package has its favourite constraint, which may not be the natural one for a given context. But it is easy to convert one least squares solution to another that satisfies a natural set of constraints. The next example illustrates this.

EXAMPLE: Consider the 1-way ANOVA model $y_{ij} = \mu + \alpha_i + \epsilon_{ij}$ for $i=1,2,3$ and $j=1,...,10.$

R uses the constraint $\alpha_1 = 0.$

However, we want the constraint $\sum \alpha_i = 0.$

If the estimates produced by R are $$ \h \mu = 23.4, \quad \h \alpha_1 = 0,\quad \h \alpha_2 = 45.9,\quad \h \alpha_3 = -3.4, $$ then find the estimates that satisfy our constraint.

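SOLUTION: Just average the $\h \alpha_i$'s and subtract this average from all the $\h \alpha_i$'s. Compensate by adding the same quantity to $\h \mu.$ The new $\h \alpha_i$'s then sum to zero, while every $\h \mu + \h \alpha_i$ is unchanged.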

Notice that you really do not need to know what constraint(s) R uses internally in order to impose your set of constraints.
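The recipe in the solution can be sketched in a few lines. The function name `to_sum_zero` is hypothetical; the numbers are the estimates quoted in the example. The function never asks which constraint produced its input, which matches the remark above.

```python
def to_sum_zero(mu, alphas):
    """Re-express (mu, alpha_i) so that the alphas sum to zero,
    keeping every mu + alpha_i unchanged."""
    shift = sum(alphas) / len(alphas)  # average of the alphas
    return mu + shift, [a - shift for a in alphas]

# Estimates quoted in the example (obtained under alpha_1 = 0).
mu_hat = 23.4
alpha_hat = [0.0, 45.9, -3.4]

mu_new, alpha_new = to_sum_zero(mu_hat, alpha_hat)
print(round(mu_new, 4), [round(a, 4) for a in alpha_new])
```

Up to rounding, this gives $\h \mu \approx 37.57$ and $\h \alpha_i \approx -14.17,\ 31.73,\ -17.57.$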

Exercises

For the model $y_{ij} = \mu + \alpha_i + \epsilon_{ij},$ a software produces the estimates $$ \h \mu = 23.4, \quad \h \alpha_1 = 2.0,\quad \h \alpha_2 = 45.9,\quad \h \alpha_3 = -3.4. $$ Find the estimates under the constraint $\mu = 0.$
Consider the model: $y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk},$ for $i=1,2,3$ and $j=1,2$ and $k=1,...,5.$ Is the design matrix full column rank? Find its rank. Find two possible sets of constraints to make the solution unique.
This exercise will give you an idea why the "tweak without letting off the alarm" game always detects whether $X$ is not full column rank. Consider a linear model $\v y = X \v \beta + \v \epsilon,$ where $\v \beta = (\beta_1,...,\beta_4)'.$ It is seen that if we tweak by adding 5 to $\beta_1$ and subtracting 3 from $\beta_2$ and adding 1 to $\beta_3$ (leaving $\beta_4$ unaltered), then the alarm does not go off. In other words, we added the vector $(5,-3,1,0)'$ to $\v \beta$ without firing the alarm. We shall call such a vector a tweak vector (not a standard term).
1. Show that the set of all tweak vectors is a subspace.
2. How is this subspace related to $X?$
3. Prove that the existence of a nonnull tweak vector implies $X$ is not full column rank.
This exercise is a continuation of the last. Here you'll see why the "tweak without letting off the alarm" game always guesses the rank correctly. Again consider the same linear model as in the last problem. Suppose that the only tweak vector of the form $(0,0,a,b)'$ is the zero vector. What can you conclude about $r(X)?$
Consider the linear model $y_{i_1\cdots i_t} = \mu + \alpha_{1i_1}+\cdots + \alpha_{ti_t}+\epsilon_{i_1\cdots i_t}.$ What is the rank of the design matrix here?
