Least squares estimates of the regression parameters
Jan 16 2019

Warmup

Recall from last time that we can set up a multiple linear regression model in matrix form:
$$y = X\beta + \epsilon$$

Give the name and dimensions of each term
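One way to fill this in, assuming $n$ observations and $p$ regression parameters (so $X$ has $p$ columns, matching the rank condition used later):
$$\underbrace{y}_{n \times 1} = \underbrace{X}_{n \times p}\,\underbrace{\beta}_{p \times 1} + \underbrace{\epsilon}_{n \times 1}$$
where $y$ is the response vector, $X$ is the design matrix, $\beta$ is the vector of regression parameters, and $\epsilon$ is the vector of errors.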

Today’s goal

Derive the form of the estimates for the parameter vector $\beta$.

Least Squares

Just like in simple linear regression, we'll estimate $\beta$ by least squares. In simple linear regression this involved finding $\hat{\beta}_0$ and $\hat{\beta}_1$ to minimise the sum of squared residuals:
$$\text{sum of squared residuals}_{\text{SLR}} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) \right)^2$$

Your turn: What procedure do you use to minimise a function? E.g. if $f(x)$ is a function of a single real value $x$, how do you find the $x$ that minimises $f(x)$?

(2 min discussion)

For multiple linear regression, the least squares estimate of $\beta$ is the vector $\hat{\beta}$ that minimizes the sum of squared residuals:
$$\text{sum of squared residuals}_{\text{MLR}} = \sum_{i=1}^{n} e_i^2 = \|e\|^2 = (y - X\hat{\beta})^T (y - X\hat{\beta})$$

Your turn: Expand the matrix product on the right into four terms. Be careful with the order of matrix multiplication and recall $(uX)^T = X^T u^T$.

$$\sum_{i=1}^{n} e_i^2 = \|e\|^2 = (y - X\hat{\beta})^T (y - X\hat{\beta})$$
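Carrying out the expansion, a sketch of what the four terms should look like:
$$(y - X\hat{\beta})^T (y - X\hat{\beta}) = y^T y - y^T X\hat{\beta} - \hat{\beta}^T X^T y + \hat{\beta}^T X^T X \hat{\beta}$$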

Consider the terms: $\hat{\beta}^T X^T y$ and $y^T X \hat{\beta}$.

Argue that these can be combined into the single term $2\hat{\beta}^T X^T y$.

(Hint: consider the dimensions of these terms)
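A sketch of the argument: each term is $1 \times 1$ (a scalar), and one is the transpose of the other, so
$$y^T X\hat{\beta} = (y^T X\hat{\beta})^T = \hat{\beta}^T X^T y,$$
which means the two cross terms can be collected into $2\hat{\beta}^T X^T y$.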

Finding the minimum

Now our objective is to find $\hat{\beta}$ that minimises:
$$y^T y - 2\hat{\beta}^T X^T y + \hat{\beta}^T X^T X \hat{\beta}$$

The usual procedure would be to take the derivative with respect to $\hat{\beta}$, set it to zero, and solve for $\hat{\beta}$. Except $\hat{\beta}$ is a vector! We need to use vector calculus.

Vector calculus

You should be familiar with the usual differentiation rules for scalars $a$ and $x$:
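Presumably the scalar rules in mind are:
$$\frac{d}{dx}(ax) = a, \qquad \frac{d}{dx}(ax^2) = 2ax$$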

There are analogs when we want to take the derivative with respect to a vector $x$:
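Presumably the vector analogs needed below are, for a constant vector $a$ and a matrix $A$:
$$\frac{\partial}{\partial x}\left(a^T x\right) = a, \qquad \frac{\partial}{\partial x}\left(x^T A x\right) = (A + A^T)x = 2Ax \text{ for symmetric } A$$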

Use the rules above to take the derivative of the sum of squared residuals with respect to the vector $\hat{\beta}$:

$$\frac{\partial}{\partial \hat{\beta}} \left( y^T y - 2\hat{\beta}^T X^T y + \hat{\beta}^T X^T X \hat{\beta} \right) = $$
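One way to fill this in, applying the vector rules above (and using the fact that $X^T X$ is symmetric):
$$\frac{\partial}{\partial \hat{\beta}} \left( y^T y - 2\hat{\beta}^T X^T y + \hat{\beta}^T X^T X \hat{\beta} \right) = -2X^T y + 2X^T X \hat{\beta}$$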

Normal Equations

Setting the above derivative to zero leads to the Normal Equations. The least squares estimates satisfy:
$$X^T y = X^T X \hat{\beta}$$

If $X^T X$ is invertible, the least squares estimates are (fill me in):
$$\hat{\beta} = (X^T X)^{-1} X^T y$$

If $X$ has full column rank $p$, then $X^T X$ will be invertible.
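As a quick numerical sanity check, a minimal sketch in NumPy with simulated data: the closed-form estimate $(X^T X)^{-1} X^T y$ should agree with a standard least squares solver.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: n observations, p = 3 columns (intercept + 2 predictors)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=0.3, size=n)

# Closed-form least squares estimate: beta_hat = (X^T X)^{-1} X^T y
# (solve the normal equations rather than explicitly inverting X^T X)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Compare against NumPy's built-in least squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_hat, beta_lstsq))  # expect True
```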

Fitted Values and Residuals

Plug in the least squares estimate for $\hat{\beta}$ to find the fitted values and residuals:
$$\hat{y} = X\hat{\beta} = $$
$$\hat{\epsilon} = e = y - X\hat{\beta} = $$
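One way to fill these in, substituting $\hat{\beta} = (X^T X)^{-1} X^T y$:
$$\hat{y} = X\hat{\beta} = X(X^T X)^{-1} X^T y, \qquad e = y - X\hat{\beta} = \left(I - X(X^T X)^{-1} X^T\right) y$$
Both involve the matrix $X(X^T X)^{-1} X^T$, which gets a name next.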

Hat matrix

The hat matrix is:
$$H = X(X^T X)^{-1} X^T$$
named because it puts "hats" on the response, i.e. multiplying the response by the hat matrix gives the fitted values: $Hy = \hat{y}$.

Your turn: Show $(I - H)X = 0$.
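A sketch of the argument, expanding $H$ and cancelling $(X^T X)^{-1} X^T X$:
$$(I - H)X = X - X(X^T X)^{-1} X^T X = X - X = 0$$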

Other properties of H
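Presumably the properties referred to here are that $H$ is symmetric and idempotent (and hence so is $I - H$):
$$H^T = H, \qquad HH = H$$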

You can use these results to argue that the residuals are orthogonal to the columns of $X$, i.e. show $e^T X = 0$:
$$\begin{aligned}
e^T X &= \left((I - H)y\right)^T X && \text{plug in form for residuals} \\
&= y^T (I - H)^T X && \text{distribute transpose} \\
&= y^T (I - H) X && \text{symmetry} \\
&= y^T 0 && \text{from above} \\
&= 0
\end{aligned}$$
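These facts can also be checked numerically; a minimal self-contained sketch in NumPy with simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated design matrix (intercept + 2 predictors) and response
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = rng.normal(size=n)

# Hat matrix H = X (X^T X)^{-1} X^T
H = X @ np.linalg.solve(X.T @ X, X.T)

e = y - H @ y  # residuals e = (I - H) y

print(np.allclose(H, H.T))                   # H is symmetric
print(np.allclose(H @ H, H))                 # H is idempotent
print(np.allclose((np.eye(n) - H) @ X, 0))   # (I - H) X = 0
print(np.allclose(e @ X, 0))                 # residuals orthogonal to columns of X
```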

Next time

What are the properties of the least squares estimates?