Warmup
Recall from last time that we can set up a multiple linear regression model in matrix form:
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$$
Give the name and dimensions of each term.
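A small numerical sketch of this setup in Python/numpy (the data, `n`, and `p` here are simulated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3                                    # hypothetical: 50 observations, 3 predictors

X = np.column_stack([np.ones(n),                # column of 1s for the intercept
                     rng.normal(size=(n, p))])  # one column per predictor
beta = np.array([1.0, 2.0, -0.5, 0.3])          # (p + 1)-vector of coefficients
eps = rng.normal(scale=0.5, size=n)             # n-vector of errors
y = X @ beta + eps                              # n-vector of responses

print(X.shape, beta.shape, y.shape)             # (50, 4) (4,) (50,)
```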
Today’s goal
Derive the form of the estimates for the parameter vector $\boldsymbol{\beta}$.
Least Squares
Just like in simple linear regression, we'll estimate $\boldsymbol{\beta}$ by least squares. In simple linear regression this involved finding the values $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimise the sum of squared residuals:
$$\sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2$$
Your turn: What procedure do you use to minimise a function? E.g. if $f(x)$ is a function of a single real value $x$, how do you find the $x$ that minimises $f(x)$?
(2 min discussion)
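As one concrete illustration (a quadratic chosen just for this sketch), differentiate, set the derivative to zero, and solve:
$$f(x) = (x - 3)^2, \qquad f'(x) = 2(x - 3) = 0 \;\Rightarrow\; x = 3,$$
and $f''(x) = 2 > 0$ confirms this is a minimum.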
For multiple linear regression the least squares estimate of $\boldsymbol{\beta}$ is the vector $\hat{\boldsymbol{\beta}}$ that minimises the sum of squared residuals:
$$\text{RSS}(\boldsymbol{\beta}) = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$$
Your turn: Expand the matrix product on the right into four terms. Be careful with the order of matrix multiplication and recall $(\mathbf{A}\mathbf{B})^T = \mathbf{B}^T\mathbf{A}^T$.
Consider the two cross terms:
$$-\mathbf{y}^T\mathbf{X}\boldsymbol{\beta} \quad \text{and} \quad -\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{y}$$
Argue that these can be combined into the single term $-2\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{y}$.
(Hint: consider the dimensions of these terms.)
Finding the minimum
Now our objective is to find the $\hat{\boldsymbol{\beta}}$ that minimises:
$$\text{RSS}(\boldsymbol{\beta}) = \mathbf{y}^T\mathbf{y} - 2\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{y} + \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}$$
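A quick numerical sketch of this objective (simulated data and an arbitrary candidate $\boldsymbol{\beta}$, assuming numpy); it also checks that the expanded form above matches the original quadratic form:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design matrix with intercept
y = X @ np.array([1.0, 2.0, -0.5, 0.3]) + rng.normal(scale=0.5, size=n)

def rss(beta, X, y):
    """Sum of squared residuals (y - X beta)^T (y - X beta)."""
    resid = y - X @ beta
    return resid @ resid

beta = np.array([0.5, 1.5, 0.0, 0.0])            # arbitrary candidate coefficients
expanded = y @ y - 2 * beta @ X.T @ y + beta @ X.T @ X @ beta
print(np.isclose(rss(beta, X, y), expanded))     # True
```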
The usual procedure would be to take the derivative with respect to $\boldsymbol{\beta}$, set it to zero, and solve for $\boldsymbol{\beta}$. Except $\boldsymbol{\beta}$ is a vector! We need to use vector calculus.
Vector calculus
You should be familiar with the usual differentiation rules for a scalar variable $x$ and constant $a$:
$$\frac{d}{dx}(a) = 0, \qquad \frac{d}{dx}(ax) = a, \qquad \frac{d}{dx}(ax^2) = 2ax$$
There are analogues when we want to take a derivative with respect to a vector $\mathbf{b}$:
- $\frac{\partial a}{\partial \mathbf{b}} = \mathbf{0}$, where $a$ is a scalar that does not depend on $\mathbf{b}$
- $\frac{\partial \mathbf{a}^T\mathbf{b}}{\partial \mathbf{b}} = \mathbf{a}$, where $\mathbf{a}$ is a vector
- $\frac{\partial \mathbf{b}^T\mathbf{A}\mathbf{b}}{\partial \mathbf{b}} = 2\mathbf{A}\mathbf{b}$, where $\mathbf{A}$ is a symmetric matrix
Use the rules above to take the derivative of the sum of squared residuals with respect to the vector $\boldsymbol{\beta}$.
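If you want to check your derivative numerically, here is a sketch comparing the standard analytic gradient, $-2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}$, against finite differences (simulated data and an arbitrary evaluation point):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -0.5, 0.3]) + rng.normal(scale=0.5, size=n)

def rss(beta):
    resid = y - X @ beta
    return resid @ resid

def grad_rss(beta):
    """Analytic gradient of the RSS: -2 X^T y + 2 X^T X beta."""
    return -2 * X.T @ y + 2 * X.T @ X @ beta

beta0 = rng.normal(size=p + 1)                   # arbitrary point to check at
h = 1e-6
numeric = np.array([(rss(beta0 + h * e) - rss(beta0 - h * e)) / (2 * h)
                    for e in np.eye(p + 1)])     # central differences, one per coordinate
print(np.allclose(numeric, grad_rss(beta0)))     # True (up to numerical error)
```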
Normal Equations
Setting the above derivative to zero leads to the Normal Equations. The least squares estimates $\hat{\boldsymbol{\beta}}$ satisfy:
$$\mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}^T\mathbf{y}$$
If $\mathbf{X}^T\mathbf{X}$ is invertible, the least squares estimates are (fill me in):
If $\mathbf{X}$ has full column rank, then $\mathbf{X}^T\mathbf{X}$ will be invertible.
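A sketch of solving the normal equations numerically (same simulated setup as the earlier sketches; `np.linalg.solve` on $\mathbf{X}^T\mathbf{X}$ is used here for illustration, while routines like `np.linalg.lstsq` avoid forming $\mathbf{X}^T\mathbf{X}$ at all):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -0.5, 0.3]) + rng.normal(scale=0.5, size=n)

# Solve (X^T X) beta_hat = X^T y directly, rather than forming an explicit inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against numpy's built-in least squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))         # True
print(beta_hat)                                  # close to the coefficients used to simulate y
```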
Fitted Values and Residuals
Plug in the least squares estimate for $\boldsymbol{\beta}$ to find the fitted values $\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}}$ and residuals $\hat{\boldsymbol{\epsilon}} = \mathbf{y} - \hat{\mathbf{y}}$.
Hat matrix
The hat matrix is:
$$\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$$
named because it puts "hats" on the response, i.e. multiplying the response by the hat matrix gives the fitted values:
$$\hat{\mathbf{y}} = \mathbf{H}\mathbf{y}$$
Your Turn: Show $\mathbf{H}\mathbf{y} = \hat{\mathbf{y}}$.
Other properties of $\mathbf{H}$
- $\mathbf{H}$ is symmetric, so is $\mathbf{I} - \mathbf{H}$
- $\mathbf{H}$ is idempotent ($\mathbf{H}\mathbf{H} = \mathbf{H}$), and so is $\mathbf{I} - \mathbf{H}$
- $\mathbf{X}$ is invariant under $\mathbf{H}$ (i.e. $\mathbf{H}\mathbf{X} = \mathbf{X}$)
You can use these results to argue that the residuals are orthogonal to the columns of $\mathbf{X}$, i.e. show
$$\mathbf{X}^T\hat{\boldsymbol{\epsilon}} = \mathbf{0}$$
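A numerical sketch of these properties (same simulated setup as before; forming $\mathbf{H}$ explicitly is fine for a small example, though it is an $n \times n$ matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -0.5, 0.3]) + rng.normal(scale=0.5, size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T             # hat matrix, n x n
y_hat = H @ y                                    # fitted values
resid = y - y_hat                                # residuals, equivalently (I - H) y

print(np.allclose(H, H.T))                       # H is symmetric
print(np.allclose(H @ H, H))                     # H is idempotent
print(np.allclose(H @ X, X))                     # HX = X
print(np.allclose(X.T @ resid, 0))               # residuals orthogonal to the columns of X
```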
Next time
What are the properties of the least squares estimates?