Least squares estimates of the regression parameters

Warmup

Recall from last time we can set up a multiple linear regression model in the matrix form: $y = X β + ϵ$

Give the name and dimensions of each term

Today’s goal

Derive the form of the estimates for the parameter vector $β$ .

Least Squares

Just like in simple linear regression, we’ll estimate $β$ by least squares. In simple linear regression this involved finding ${\hat{β}}_{0}$ and ${\hat{β}}_{1}$ to minimise the sum of squared residuals: $\text{sum of squared residuals (SLR)} = \sum_{i = 1}^{n} e_{i}^{2} = \sum_{i = 1}^{n} {(y_{i} - ({\hat{β}}_{0} + {\hat{β}}_{1} x_{i}))}^{2}$

Your turn: What procedure do you use to minimise a function? E.g. if $f (x)$ is a function of a single real value $x$ , how do you find the $x$ that minimises $f (x)$ ?

(2 min discussion)

For multiple linear regression the least squares estimate of $β$ is the vector $\hat{β}$ that minimises the sum of squared residuals: $\text{sum of squared residuals (MLR)} = \sum_{i = 1}^{n} e_{i}^{2} = \| e \|^{2} = {(y - X \hat{β})}^{T} (y - X \hat{β})$
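
As a quick numerical illustration, the two ways of writing the sum of squared residuals (the elementwise sum and the matrix product) can be compared directly. This is only a sketch: the toy data `X`, `y` and the candidate vector `beta` are made up for illustration and are not from the course.

```python
import numpy as np

# Hypothetical toy data: n = 5 observations, an intercept column and one predictor.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

# An arbitrary candidate coefficient vector (not necessarily the minimiser).
beta = np.array([0.5, 1.0])

e = y - X @ beta              # residual vector e = y - X beta
ssr_sum = np.sum(e ** 2)      # sum of e_i^2
ssr_matrix = e @ e            # e^T e = (y - X beta)^T (y - X beta)

print(ssr_sum, ssr_matrix)
```

Both expressions give the same number for any choice of `beta`; least squares picks the `beta` that makes it smallest.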

Your turn: Expand the matrix product on the right into four terms. Be careful with the order of matrix multiplication and recall ${(u X)}^{T} = X^{T} u^{T}$ .

$\sum_{i = 1}^{n} e_{i}^{2} = \| e \|^{2} = {(y - X \hat{β})}^{T} (y - X \hat{β})$

Consider the terms: $- {\hat{β}}^{T} X^{T} y$ and $- y^{T} X \hat{β}$

Argue that these can be combined into the single term $- 2 {\hat{β}}^{T} X^{T} y$

(Hint: consider the dimensions of these terms)

Finding the minimum

Now our objective is to find $\hat{β}$ that minimises: $y^{T} y - 2 {\hat{β}}^{T} X^{T} y + {\hat{β}}^{T} X^{T} X \hat{β}$
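
The expanded objective can be sanity-checked numerically. The sketch below uses randomly generated `X`, `y` and a candidate vector `b` (all hypothetical) to compare the direct form ${(y - X b)}^{T} (y - X b)$ with the three-term expansion, and to confirm that the two cross terms are the same scalar.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design matrix (intercept plus two predictors) and response.
X = np.column_stack([np.ones(6), rng.normal(size=(6, 2))])
y = rng.normal(size=6)
b = rng.normal(size=3)        # arbitrary candidate for beta-hat

# Direct form of the sum of squared residuals.
direct = (y - X @ b) @ (y - X @ b)

# The two cross terms: each is a scalar, so each equals its own transpose.
cross_1 = b @ X.T @ y         # b^T X^T y
cross_2 = y @ X @ b           # y^T X b

# Expanded form: y^T y - 2 b^T X^T y + b^T X^T X b
expanded = y @ y - 2 * cross_1 + b @ X.T @ X @ b

print(np.isclose(cross_1, cross_2), np.isclose(direct, expanded))
```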

The usual procedure would be to take the derivative with respect to $\hat{β}$ , set it to zero and solve for $\hat{β}$ . Except $\hat{β}$ is a vector! We need to use vector calculus.

Vector calculus

You should be familiar with the usual differentiation rules for scalar $a$ and $x$ :

$\frac{\partial}{\partial x} a = 0$
$\frac{\partial}{\partial x} a x = a$
$\frac{\partial}{\partial x} a x^{2} = 2 a x$

There are analogs when we want to take the derivative with respect to a vector $x$ :

$\frac{\partial}{\partial x} a = 0$ , where $a$ is a scalar
$\frac{\partial}{\partial x} x^{T} u = u$ , where $u$ is a vector
$\frac{\partial}{\partial x} x^{T} A x = (A + A^{T}) x$ , where $A$ is a matrix
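
One way to build trust in these vector rules is to compare them with finite-difference gradients. The sketch below is an illustration only: `numerical_gradient` is a helper written for this example (not a library function), and `x`, `u`, `A` are random test values.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation to the gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        grad[j] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

rng = np.random.default_rng(1)
x = rng.normal(size=4)
u = rng.normal(size=4)
A = rng.normal(size=(4, 4))   # deliberately not symmetric

# Rule: d/dx (x^T u) = u
grad_linear = numerical_gradient(lambda v: v @ u, x)

# Rule: d/dx (x^T A x) = (A + A^T) x
grad_quadratic = numerical_gradient(lambda v: v @ A @ v, x)

print(np.allclose(grad_linear, u, atol=1e-5))
print(np.allclose(grad_quadratic, (A + A.T) @ x, atol=1e-5))
```

Note that when $A$ is symmetric, as $X^{T} X$ is, the second rule simplifies to $2 A x$, mirroring the scalar rule for $a x^{2}$.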

Use the rules above to take the derivative of the sum of squared residuals with respect to the vector $\hat{β}$ :

$\begin{aligned} \frac{\partial}{\partial \hat{β}} (y^{T} y - 2 {\hat{β}}^{T} X^{T} y + {\hat{β}}^{T} X^{T} X \hat{β}) & = \end{aligned}$

Normal Equations

Setting the above derivative to zero leads to the Normal Equations. The least squares estimates satisfy: $X^{T} y = X^{T} X \hat{β}$

If $X^{T} X$ is invertible, the least squares estimates are (fill me in): $\hat{β} = {(\underline{\qquad})}^{- 1} \underline{\qquad}^{T} y$

If $X$ has rank $p$ (full column rank, taking $p$ to be the number of columns of $X$ ), then $X^{T} X$ will be invertible.
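
In practice the Normal Equations can be solved as a linear system rather than by forming an inverse explicitly. The following is a minimal sketch on simulated data: `beta_true`, the noise level and the sample size are made-up values, and $p$ is assumed to count the columns of $X$. It solves $X^{T} X \hat{β} = X^{T} y$ with `np.linalg.solve` and compares the answer with NumPy's own least squares routine, `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(2)

n, p = 20, 3                                  # hypothetical sizes; p = number of columns of X
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])        # made-up coefficients for the simulation
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# X^T X is invertible when X has full column rank.
rank_X = np.linalg.matrix_rank(X)

# Solve the Normal Equations X^T X beta_hat = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Reference answer from NumPy's least squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(rank_X == p)
print(np.allclose(beta_hat, beta_lstsq))
```

Solving the system directly is used here because it matches the derivation; numerical libraries typically rely on more stable factorisations internally.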

Fitted Values and Residuals

Plug in the least squares estimate for $\hat{β}$ to find the fitted values and residuals: $\begin{aligned} \hat{y} & = X \hat{β} = \\ \hat{ϵ} = e & = y - X \hat{β} = \end{aligned}$

Hat matrix

The hat matrix is $H = X {(X^{T} X)}^{- 1} X^{T}$ , so named because it puts “hats” on the response: multiplying the response by the hat matrix gives the fitted values, $H y = \hat{y}$
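
A small numerical sketch of this, using simulated `X` and `y` (hypothetical values, not course data): the fitted values computed as $X \hat{β}$ agree with those obtained by multiplying $y$ by the hat matrix.

```python
import numpy as np

rng = np.random.default_rng(3)

n = 10
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # hypothetical design matrix
y = rng.normal(size=n)                                        # hypothetical response

# Least squares estimate from the Normal Equations.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Hat matrix H = X (X^T X)^{-1} X^T, formed via a linear solve instead of an explicit inverse.
H = X @ np.linalg.solve(X.T @ X, X.T)

y_hat_from_beta = X @ beta_hat    # fitted values via the estimate
y_hat_from_H = H @ y              # fitted values via the hat matrix

print(H.shape)
print(np.allclose(y_hat_from_beta, y_hat_from_H))
```

Note that $H$ is $n \times n$, so forming it explicitly is only sensible for small $n$; it is mainly a tool for derivations.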

Your Turn: Show $(I - H) X = 0$

Other properties of $H$

$H$ is symmetric, so is $(I - H)$
$H$ is idempotent ( $H^{2} = H$ ), and so is $I - H$
$X$ is invariant under $H$ (i.e. $H X = X$ )
$(I - H) H = H (I - H) = 0$
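
These properties can be checked numerically for any full column rank design matrix. The sketch below uses a randomly generated `X` (hypothetical) and collects each property as a boolean; it is a check on one example, not a proof.

```python
import numpy as np

rng = np.random.default_rng(4)

n = 8
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # hypothetical design matrix

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
I = np.eye(n)
M = I - H                              # the complementary matrix I - H

checks = {
    "H symmetric": np.allclose(H, H.T),
    "I - H symmetric": np.allclose(M, M.T),
    "H idempotent": np.allclose(H @ H, H),
    "I - H idempotent": np.allclose(M @ M, M),
    "H X = X": np.allclose(H @ X, X),
    "(I - H) H = 0": np.allclose(M @ H, 0),
    "H (I - H) = 0": np.allclose(H @ M, 0),
}

for name, ok in checks.items():
    print(f"{name}: {ok}")
```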

You can use these results to argue that the residuals are orthogonal to the columns of $X$ , i.e. show $e^{T} X = 0$ : $\begin{aligned} e^{T} X & = {((I - H) y)}^{T} X && \text{plug in form for residuals} \\ & = y^{T} {(I - H)}^{T} X && \text{distribute transpose} \\ & = y^{T} (I - H) X && \text{symmetry} \\ & = y^{T} 0 && \text{from above} \\ & = 0 \end{aligned}$
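
The orthogonality of the residuals to the columns of $X$ also shows up numerically. In this sketch `X` and `y` are simulated (hypothetical values); the residuals are formed as $(I - H) y$ and their inner product with each column of $X$ is zero up to floating point error.

```python
import numpy as np

rng = np.random.default_rng(5)

n = 12
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # hypothetical design matrix
y = rng.normal(size=n)                                        # hypothetical response

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
e = (np.eye(n) - H) @ y                 # residuals e = (I - H) y

# e^T X: one inner product per column of X, each numerically zero.
inner_products = e @ X

print(np.allclose(inner_products, 0, atol=1e-10))
```

Because the first column of this `X` is a column of ones, the first inner product is the sum of the residuals, so the residuals sum to zero whenever the model includes an intercept.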

Next time

What are the properties of the least squares estimates?

Least squares estimates of the regression parameters Jan 16 2019