Summary of last week
Consider the linear regression model \[ Y = X\beta + \epsilon \] where \(\E{\epsilon} = 0_{n\times 1}\), \(\Var{\epsilon} = \sigma^2 I_n\), and the matrix \(X_{n \times p}\) is fixed with rank \(p\).
The least squares estimates are \[ \hat{\beta} = (X^TX)^{-1}X^TY \]
Furthermore, the least squares estimates are the best linear unbiased estimates (BLUE), and \[ \E{\hat{\beta}} = \beta, \qquad \Var{\hat{\beta}} = \sigma^2 (X^TX)^{-1} \]
We have not used any Normality assumptions to show these properties.
Today
Verify:
\[ \E{\hat{\sigma}^2} = \E{\tfrac{1}{n-p}\sum_{i=1}^n{e_i^2}} = \sigma^2 \]
Add the Normal assumption to get inference on the regression coefficients.
Go over the estimation of \(\sigma^2\)
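Strategy: Write the residuals \(e\) as linear combinations of the uncorrelated errors \(\epsilon_i\): \(e = (I-H)Y = (I-H)\epsilon\), where \(H = X(X^TX)^{-1}X^T\), so that \(||e||^2 = \epsilon^T(I-H)\epsilon\).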
Find expected value of \(||e||^2\) in terms of \(\text{trace}(I-H)\)
Show \(\E{\epsilon^T(I-H)\epsilon} = \sigma^2 \text{trace}(I-H)\)
Hint \[ x^TAx = \sum_{i = 1}^n\sum_{j = 1}^n x_i x_j A_{ij} \] where \[ x = \left(x_1, x_2, \ldots, x_n \right)^T, \quad A = \begin{pmatrix} A_{11}& A_{12}& \ldots \\ A_{21}& A_{22}& \ldots \\ \vdots & & \end{pmatrix}_{n\times n} \]
\[ \E{\epsilon^T(I-H)\epsilon} = \phantom{\hspace{3in}} \]
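The identity \(\E{\epsilon^TA\epsilon} = \sigma^2\,\text{trace}(A)\) can be checked exactly on a small example before proving it. The following is a minimal sketch, not part of the original notes: it assumes each \(\epsilon_i\) is independently \(+\sigma\) or \(-\sigma\) with probability 1/2 (mean 0, variance \(\sigma^2\), uncorrelated, and deliberately not Normal), and it uses an arbitrary symmetric matrix `A` as a stand-in for \(I-H\). The names `quadratic_form` and `expected_quadratic_form` are hypothetical helpers.

```python
from fractions import Fraction
from itertools import product


def quadratic_form(x, A):
    """Return x^T A x using the double sum from the hint."""
    n = len(x)
    return sum(x[i] * x[j] * A[i][j] for i in range(n) for j in range(n))


def expected_quadratic_form(A, sigma):
    """Exact E[eps^T A eps] when each eps_i is +sigma or -sigma with probability 1/2."""
    n = len(A)
    # All 2^n sign patterns are equally likely, so the expectation is an average.
    outcomes = list(product([sigma, -sigma], repeat=n))
    total = sum(quadratic_form(eps, A) for eps in outcomes)
    return total / len(outcomes)


def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))


# An arbitrary (hypothetical) symmetric matrix standing in for I - H.
A = [[Fraction(2), Fraction(-1), Fraction(0)],
     [Fraction(-1), Fraction(3), Fraction(1, 2)],
     [Fraction(0), Fraction(1, 2), Fraction(1)]]
sigma = Fraction(3, 2)

lhs = expected_quadratic_form(A, sigma)   # E[eps^T A eps]
rhs = sigma**2 * trace(A)                 # sigma^2 * trace(A)
print(lhs, rhs)
```

Both printed values agree because the cross terms \(\epsilon_i\epsilon_j A_{ij}\), \(i \ne j\), average to zero, which is the step the proof needs.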
Find \(\text{trace}(I-H)\)
Show \[\text{trace}(I-H)=n-p\]
Hint: \[ \begin{aligned} \text{trace}(A + B) &= \text{trace}(A) + \text{trace}(B) \\ \text{trace}(AB) &= \text{trace}(BA) \end{aligned} \]
\[ \text{trace}(I-H) = \phantom{\hspace{3in}} \]
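As a sanity check on the claim \(\text{trace}(I-H) = n-p\), the hat matrix can be built for a small design and its trace computed exactly. This is a sketch under stated assumptions: the design (intercept, \(x\), \(x^2\) at five points) is hypothetical, \(H = X(X^TX)^{-1}X^T\) is the usual hat matrix, and the helpers `transpose`, `matmul`, `inverse`, and `hat_matrix` are illustrative names written in plain Python with exact fractions so no rounding is involved.

```python
from fractions import Fraction


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def inverse(M):
    """Invert a square matrix by Gauss-Jordan elimination (exact with Fractions)."""
    n = len(M)
    # Augment M with the identity matrix.
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        # Find a row with a nonzero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot equals 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate the pivot column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def hat_matrix(X):
    """H = X (X^T X)^{-1} X^T for a full-rank design matrix X."""
    Xt = transpose(X)
    return matmul(matmul(X, inverse(matmul(Xt, X))), Xt)


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


# Hypothetical design: intercept, x, x^2 at five points (n = 5, p = 3).
xs = [0, 1, 2, 3, 4]
X = [[Fraction(1), Fraction(x), Fraction(x * x)] for x in xs]
n, p = len(X), len(X[0])

H = hat_matrix(X)
I_minus_H = [[Fraction(int(i == j)) - H[i][j] for j in range(n)] for i in range(n)]

print(trace(H), trace(I_minus_H))  # compare with p and n - p
```

The numerical check does not replace the proof; the two trace properties in the hint are what make the result hold for every full-rank \(X\).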
Put it all together
\[ \E{\hat{\sigma}^2} = \phantom{\hspace{3in}} \]
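The final result, \(\E{\hat{\sigma}^2} = \sigma^2\), can also be verified exactly without any Normality. The sketch below assumes a simple linear regression (so \(p = 2\)) with a hypothetical fixed design, hypothetical true coefficients, and errors that are independently \(+\sigma\) or \(-\sigma\) with probability 1/2. It enumerates every possible error vector, computes \(\hat{\sigma}^2 = \tfrac{1}{n-p}\sum e_i^2\) for each, and averages. The function name `sigma2_hat` is illustrative.

```python
from fractions import Fraction
from itertools import product


def sigma2_hat(xs, ys):
    """RSS / (n - p) for the simple linear regression of ys on xs (p = 2)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx                 # least squares slope
    b0 = ybar - b1 * xbar          # least squares intercept
    rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    return rss / (n - 2)


# Hypothetical setup: fixed design, true coefficients, and error scale.
xs = [Fraction(x) for x in (0, 1, 2, 4, 7)]
beta0, beta1 = Fraction(1), Fraction(-2)
sigma = Fraction(3)

# Each error is +sigma or -sigma with probability 1/2, so all 2^n error
# vectors are equally likely and the expectation is a plain average.
estimates = []
for eps in product([sigma, -sigma], repeat=len(xs)):
    ys = [beta0 + beta1 * x + e for x, e in zip(xs, eps)]
    estimates.append(sigma2_hat(xs, ys))

expected_sigma2_hat = sum(estimates) / len(estimates)
print(expected_sigma2_hat, sigma**2)
```

Individual estimates vary from one error vector to the next; only their average matches \(\sigma^2\), which is what unbiasedness means.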
Inference on the regression coefficients
Normality assumption
Assume \(\epsilon \sim N(0, \sigma^2 I)\).
Important reminders:
Leads to: \[ Y \sim N(\qquad, \qquad) \]
\[ \hat{\beta} \sim N(\qquad, \quad \qquad) \]
Inference on individual parameters
With the addition of the Normal assumption, it can be shown that
\[ \frac{\hat{\beta}_j - \beta_j}{SE(\hat{\beta}_j)} \sim t_{n-p}, \qquad SE(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2 \left[(X^TX)^{-1}\right]_{jj}}, \]
which leads to the usual construction of tests and confidence intervals for single parameters.
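To make the construction concrete, here is a minimal sketch of a t statistic and a 95% confidence interval for the slope in a simple linear regression. Everything specific is an assumption for illustration: the data are made up, the function `slope_inference` is a hypothetical helper, and the critical value 3.182 is the 0.975 quantile of \(t_3\) read from a t table (with \(n = 5\) and \(p = 2\) there are \(n - p = 3\) degrees of freedom). The standard error uses \(\sqrt{\hat{\sigma}^2 [(X^TX)^{-1}]_{jj}}\), whose slope entry reduces to \(1/S_{xx}\) in this model.

```python
import math


def slope_inference(xs, ys, t_crit, beta1_null=0.0):
    """t statistic and confidence interval for the slope in simple linear regression."""
    n, p = len(xs), 2
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    sigma2_hat = rss / (n - p)
    # For this model the slope's diagonal entry of (X^T X)^{-1} is 1 / sxx.
    se_b1 = math.sqrt(sigma2_hat / sxx)
    t_stat = (b1 - beta1_null) / se_b1
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return b1, se_b1, t_stat, ci


# Hypothetical data with n = 5, so the reference distribution is t with 3 df.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
T_CRIT_975_DF3 = 3.182  # 0.975 quantile of t_3, taken from a t table

b1, se_b1, t_stat, ci = slope_inference(xs, ys, T_CRIT_975_DF3)
print(f"slope = {b1:.3f}, SE = {se_b1:.3f}, t = {t_stat:.2f}")
print(f"95% CI for the slope: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The test of \(H_0: \beta_1 = 0\) compares `t_stat` with the same \(t_{n-p}\) critical value, rejecting at the 5% level when its absolute value exceeds it.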
Exercises
See handout.