[Home]

Table of contents


$ \newcommand{\v}{\vec} \newcommand{\hv}[1]{\hat{\vec#1}} \newcommand{\col}{\mathcal{C}} \newcommand{\argmin}{\mathrm{argmin}} $

What's a linear model?

We learn to solve a linear system of equations during our school days. Here is a familiar example: $3a+4b=10,$ $4a+b=9$ and $2a+3b=7.$

The solution turns out to be unique: $a=2$ and $b=1.$

Simple! But why would we ever want to solve such a system in real life? Here is one situation. We are finding the mass of two different atoms, $A$ and $B.$ We cannot weigh them separately. However, we can weigh different compounds like $A_3B_4$ and $A_4B$ and $A_2B_3.$ Hence the three equations.

I admit it is a contrived example. But even this example will show a difficulty that is present all the time in real life. When we measure anything we make random error. Repeatedly measuring the same quantity leads to slightly differing results. So we may get measurements like 9.8, 9.1 and 7.0 in place of the exact values 10, 9 and 7. So the system is now $$ 3a+4b\approx 9.8\\ 4a+b\approx 9.1\\ 2a+3b\approx 7.0 $$ Our first impulse may be just to pretend that the $\approx$'s are $=$'s, solve the system as before, then report the solution as approximate values for $x$ and $y.$ Just try this out, and you'll see that it's impossible. The system is inconsistent!

It might seem strange, but a real life measurement scenario mostly produces inconsistent systems. In mathematics an inconsistent system is just absurd. But from the view point of real life applications, we should better call them almost consistent. It should be possible to get values for the unknowns such that the LHS is pretty close to the RHS. Possibly, the simplest such example is when we measure the same length, $\ell,$ twice and get two values 10.50 and 10.49. This is actually an inconsistent system: $$ \ell = 10.50\\ \ell = 10.49 $$ Mathematically, this is impossible! However, when we write this more correctly as $$ \ell \approx 10.50\\ \ell \approx 10.49 $$ there is nothing to make us feel uncomfortable. In fact, we may even feel happy about the precision of the measurements.

Such an approximate system of linear equations is called a linear model. The traditional notation is $$ \v y = X \v \beta + \v \epsilon. $$ Here $\v y$ is the measurement vector, $X$ is the fixed known matrix, $\v \beta $ is the vector of unknowns and $\v \epsilon $ is the random error vector.

We call $\v y$ the response, $X$ the design matrix, $\v \beta $ the parameter vector.

For example, in our weighing example, we have $$ \v y = \left[\begin{array}{ccccccccccc}9.8\\9.1\\7.0 \end{array}\right],\quad X = \left[\begin{array}{ccccccccccc}3 & 4\\4 & 1\\2 & 3 \end{array}\right]\text{ and } \beta = \left[\begin{array}{ccccccccccc}a\\b \end{array}\right]. $$

Estimating $\h \beta $

This may be done by choosing that value of $\v \beta$ for which $\v y$ is as close as possible to $X\v \beta.$ In other words, $$ \hv \beta = \argmin_{\v \beta} \|\v y-X\v \beta\|. $$ Here $\|\cdots\|$ denotes the Euclidean distance. So we are using our familiar least squares method. Pictorially, this means projecting $\v y$ on $\col(X)$ and expressing the projection ($\hv y,$ the foot of the perpendicular) in terms of the columns of $X.$

For example, in the weighing case, we are looking for $a,b$ such that $X \beta $ is as close as possible to $y.$ Now $$ X \beta = \left[\begin{array}{ccccccccccc}3 & 4\\4 & 1\\2 & 3 \end{array}\right]\left[\begin{array}{ccccccccccc}a\\b \end{array}\right] = a \left[\begin{array}{ccccccccccc}3\\4\\2 \end{array}\right] + b \left[\begin{array}{ccccccccccc}4\\1\\3 \end{array}\right], $$ a linear combination of the two columns of $X.$ So our interest lies in finding the linear combination of the columns of X that is closest to $\v y.$
Slideshow
Pictures here.
Messages here.

Mathematically, this amounts to solving the normal equations $$ (X'X)\hv \beta = X'\v y. $$ (Why?) This system is always consistent. (Why?)

Unique?

Do the normal equations always produce unique solution? Not necessarily. For example, if the linear model is $$ 2.9 = \beta_1 + \beta_2 + \epsilon_1\\ 3.0 = \beta_1 + \beta_2 + \epsilon_2\\ 2.5 = \beta_3 + \epsilon_3, $$ then you can never hope to get $\h\beta_1$ and $\h\beta_2$ uniquely. But $\h\beta_3$ may be found uniquely. Also, $\h\beta_1+\h\beta_2$ is unique, i.e., whatever least squares solutions $\h\beta_1$ and $\h\beta_2$ you take, their sum will always be the same.

The foot of the perpendicular (from $\v y$ to $\col(X)$) is $\hv y$, and is unique. Since this is in $\col(X),$ so it can be expressed as a linear combination of the columns of $X.$ However, there may be many ways to do so. It will be unique if and only if the columns of $X$ are all independent.
Theorem Least square solution of $\v y = X \v\beta + \v \epsilon$ is unique if and only if $X$ is full column rank. In this case, $X'X$ is invertible, and the unique solution is given by $$ \hv \beta = (X'X) ^{-1} X'\v y. $$
Thus, the projection of $\v y$ onto $\col(X)$ is $$\hv y = X\hv \beta = \underbrace{X(X'X)^{-1} X'}_{P_X} \v y.$$ So $P_X = X(X'X)^{-1} X' $ is the (orthogonal) projection matrix for $\col(X).$

Recall from linear algebra that a real matrix is an orthogonal projection matrix if and only if it is symmetric and idempotent.

EXERCISE: Quickly check that $P_X$ is indeed symmetric and orthogonal.

Using R

We may use R to perform least squares estimation.

EXAMPLE:  Solve the weighing problem using R.

SOLUTION: First construct the design matrix and the response vector:
X = matrix(c(3,4,2,4,1,3)column-wise entries
,3nrow
,2ncol
) 
y = c(9.8,9.1,7.0)
Now invoke the lm function (lm is abbreviation of linear model) as follows:
lm(y ~ X-1)
The -1 may look wierd. Actually, R has the habit of adding a column of 1's before the design matrix. The -1 prevents that.

Exercises

  1. Solve the following approximate system using R: $$\begin{eqnarray*} 3a + 4b + c & \approx & 3.4\\ 3a + 4b + c & \approx & 3.5\\ 4a + 3b + 2c & \approx & 10.1\\ 4a + 3b + 2c & \approx & 9.8\\ 6a + 5b + 2c & \approx & 5.6 \end{eqnarray*}$$
  2. Let's see how R tackles a linear model where the design matrix that is not full column rank: $$ X = \left[\begin{array}{ccccccccccc} 1 & 1 & 0\\ 1 & 1 & 0\\ 1 & 0 & 1\\ 1 & 0 & 1\\ \end{array}\right],\quad \v y = \left[\begin{array}{ccccccccccc}3.4\\3.5\\10.5\\10.3 \end{array}\right]. $$ Here the first column of $X$ is a column of $1$'s. So you may just type the last two columns in R, and omit the -1.
  3. In the problem above R produced one least squares solution. But we know that there are infinitely many. Write down two more solutions. Can you write a general form for all least squares solutions here?
  4. R automatically stores various qunatities computed by lm. We shall explore some of them here. Let's work with the linear model from the last exercise. Create the full design matrix (including its first column) and type:
    myfit = lm(y~X-1)
    
    The variable myfit now contains lots of the information about the fit. You may extract the computed least squares solution $\hv \beta $ as
    myfit$coef
    
    This may be used in future computations. Compute $\h y = X\hv \beta.$ Remember that %*% is the R notation for matrix multiplication. This $\h y$ is the foot of the perpendicular dropped from $\v y$ to $\col(X).$ Usually $\hv y$ is called the fitted vector. R already computes them:
    myfit$fitted
    
    The vector $\v y - \hv y$ is called the residual vector:
    myfit$resid
    
    There are many other pieces of information packed in myfit:
    names(myfit)
    
  5. The closer $\v y$ is to $\col(X),$ the better is the fit of the linear model. Here are various ways to measure this "closeness":
    1. How close $\|\v y-\hv y\|^2$ is to 0.
    2. How close $\frac{\|\hv y\|^2 }{ \|\v y\|^2 }$ is to 1.
    Comment why these are not "good" measures of fit. Come up with pairs of examples where the fit is equally good (according to commonsense) yet these measures say otherwise.
  6. Consider a linear model $\v y = X \beta +\epsilon,$ where $X$ is not full col rank. Pick any basis of $\col(X).$ Stack these vectors side by side a columns to get a matrix $B.$ Let $\v w = B(B'B) ^{-1} B' \v y.$ Show that $\v w = \hv y$ irrespective of the choice of $B.$
  7. Consider a linear model with design matrix $$ X = \left[\begin{array}{ccccccccccc} 1 & 1 & 0\\ 1 & 1 & 0\\ 1 & 0 & 1\\ 1 & 0 & 1\\ \end{array}\right]. $$ If $\v \beta = (\beta_1, \beta_2, \beta_3)',$ then show that whatever least squares solution $\hv \beta $ you take, $\h \beta_2-\h \beta_3$ is always the same. Characterise all vectors $\v \ell\in{\mathbb R}^3$ such that $\v \ell' \hv \beta$ does not depend on the choice of the least squares solution.
  8. Generalise the characterisation from the last problem to arbitrary design matrix.
  9. Redo the above problem with the extra condition: $\beta_0-\alpha_0 = (\beta_1-a_1) x_0.$

Comments

To post an anonymous comment, click on the "Name" field. This will bring up an option saying "I'd rather post as a guest."