Today
- Another view of F-tests
- Confidence intervals for single parameters
- Confidence intervals for linear combinations of parameters
- Confidence intervals for parameters jointly
Last time…
Certain hypotheses of interest can be set up as a comparison between two models: a full model and a simpler model nested within it. A.K.A. testing models.
Identify the models of interest. Fit both. Check the fit of the full model. Find the F-statistic, and answer the questions of interest.
Another way to set up F-tests, A.K.A. testing linear parametric functions
Assume the regression model: \[ Y = X\beta + \epsilon, \quad \epsilon \sim N(0, \sigma^2 I) \] Consider the hypotheses: \[ \begin{aligned} H_0: K^T \beta = m \\ H_1: K^T \beta \ne m \end{aligned} \] where \(K\) is a \(p \times k\) matrix with rank(K) = k, and \(m\) is a \(k \times 1\) vector.
Then, under the null hypothesis, \[ F = \frac{(K^T\hat{\beta} - m)^T \left(K^T(X^TX)^{-1}K \right)^{-1} (K^T\hat{\beta} - m)/k}{\text{RSS}/(n-p)} \sim F_{k, n-p} \] where \(\hat{\beta}\) and RSS come from the fitted full model.
(Don’t memorise for ST552, maybe for comps)
You get the same answer
This alternative is equivalent to the model testing setup we considered. Every null hypothesis of the form \(K^T \beta = m\) is comparing a full and reduced model and vice versa.
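The equivalence can be checked numerically. The sketch below is only an illustration under assumptions that are not part of the lecture: it uses the built-in `mtcars` data, an arbitrary choice of full and reduced models, and the names `F_K` and `F_mod` are made up here. It computes the F-statistic once from the \(K^T\beta = m\) formula and once from the nested-model comparison.

```r
# Illustrative models on built-in data (not the lecture's example)
full    <- lm(mpg ~ wt + hp + qsec, data = mtcars)
reduced <- lm(mpg ~ wt, data = mtcars)

X        <- model.matrix(full)
beta_hat <- coef(full)
n <- nrow(X)
p <- ncol(X)

# H0: beta_hp = 0 and beta_qsec = 0, written as K^T beta = m
K <- cbind(c(0, 0, 1, 0),
           c(0, 0, 0, 1))   # p x k, here 4 x 2
m <- c(0, 0)
k <- ncol(K)

d    <- t(K) %*% beta_hat - m        # K^T beta_hat - m
XtXi <- solve(crossprod(X))          # (X^T X)^{-1}
rss  <- sum(residuals(full)^2)       # RSS of the full model

num <- drop(t(d) %*% solve(t(K) %*% XtXi %*% K) %*% d) / k
F_K <- num / (rss / (n - p))

# Same hypothesis as a full vs. reduced model comparison
F_mod <- anova(reduced, full)$F[2]

all.equal(F_K, F_mod)   # the two should agree up to numerical tolerance
```

Changing `K` and `m` changes which reduced model is implied, so the `reduced` fit has to be changed to match.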
For example, consider \[ K = \left(\begin{matrix} 0 \\ \vdots \\ 0 \\ 1\\ 0 \\ \vdots \\ 0 \end{matrix}\right), \quad m = 0 \] where the 1 in \(K\) occurs in the \(i\)th row.
What is the null hypothesis being tested?
Your turn
What are \(K\) and \(m\) for exercises 1 and 5 from the handout from last time? HW#4
Confidence intervals for individual \(\beta_j\)
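The t-test for an individual parameter can be inverted to give a \(100(1 - \alpha)\%\) confidence interval of the form
\[ \hat{\beta}_j \pm t^{(1-\alpha/2)}_{n-p} \, \text{SE}(\hat{\beta}_j) \] where \(t^{(1-\alpha/2)}_{n-p}\) is the \(1-\alpha/2\) quantile of the \(t_{n-p}\) distribution.
(Remember \(\text{SE}(\hat{\beta}_j)\) is the square root of the \(j\)th diagonal entry of the estimated variance-covariance matrix, \(\hat{\sigma}^2 (X^TX)^{-1}\).)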
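As a rough check on where the standard errors come from, the sketch below rebuilds them from \(\hat{\sigma}^2 (X^TX)^{-1}\) and compares them with what `summary()` reports. It assumes the built-in `mtcars` data and an arbitrary model, so that it runs without extra packages; `fit_demo` and the other names are illustrative only.

```r
# Illustrative fit on built-in data (not the lecture's example)
fit_demo <- lm(mpg ~ wt + hp, data = mtcars)
X <- model.matrix(fit_demo)

# sigma^2 hat = RSS / (n - p)
sigma2_hat <- sum(residuals(fit_demo)^2) / df.residual(fit_demo)

# Estimated variance-covariance matrix of beta hat
V <- sigma2_hat * solve(crossprod(X))

# Standard errors are the square roots of the diagonal entries
se_by_hand      <- sqrt(diag(V))
se_from_summary <- coef(summary(fit_demo))[, "Std. Error"]

all.equal(se_by_hand, se_from_summary)
all.equal(V, vcov(fit_demo))
```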
Coagulation times
The dataset comes from a study of blood coagulation times. 24 animals were randomly assigned to four different diets, and the samples were taken in a random order.
Consider the model: \[ \begin{aligned} \text{Coagulation time (s)}_i &= \beta_0 + \beta_1 1\{\text{Diet B}\}_i \\ &+ \beta_2 1\{\text{Diet C}\}_i + \beta_3 1\{\text{Diet D}\}_i + \epsilon_i \end{aligned} \]
Coagulation
library(ggplot2)
data(coagulation, package = "faraway")
# One dot per animal, stacked within each diet
ggplot(coagulation, aes(diet, coag)) +
  geom_dotplot(binaxis = "y", binwidth = 1)
Your turn: cont.
library(magrittr)  # provides %>%
fit <- lm(coag ~ diet, data = coagulation)
broom::tidy(fit) %>%
  knitr::kable(digits = 2)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 61 | 1.18 | 51.55 | 0 |
| dietB | 5 | 1.53 | 3.27 | 0 |
| dietC | 7 | 1.53 | 4.58 | 0 |
| dietD | 0 | 1.45 | 0.00 | 1 |
Find a 95% CI for \(\beta_0\).
\(t_{n-p}^{(0.975)}= t_{20}^{(0.975)} = 2.09\)
Your turn: cont.
In R:
broom::tidy(fit, conf.int = TRUE)
# OR
(cis <- confint(fit))
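The same interval can be sketched by hand from the coefficient table. The block below assumes only the rounded values printed there (estimate 61, standard error 1.18) and \(n - p = 24 - 4 = 20\); the variable names are illustrative.

```r
# Values read off the coefficient table (rounded to 2 d.p. there)
est <- 61      # estimate for (Intercept)
se  <- 1.18    # its standard error
df  <- 20      # n - p = 24 - 4

t_crit <- qt(0.975, df)                  # 0.975 quantile of t_20, about 2.09
ci     <- est + c(-1, 1) * t_crit * se   # estimate -/+ margin of error
round(ci, 2)
```

Because the standard error is rounded in the table, the result may differ from `confint(fit)` in the second decimal place.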
Confidence intervals for linear combinations of the \(\beta_j\)
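Similarly, a confidence interval for a linear combination of the parameters, \(c^T\beta\), where \(c\) is a \(p \times 1\) vector, can be formed with \[ c^T\hat{\beta} \pm t^{(1-\alpha/2)}_{n-p} \sqrt{\hat{\sigma}^2 c^T(X^TX)^{-1}c} \] where \(t^{(1-\alpha/2)}_{n-p}\) is the \(1-\alpha/2\) quantile of the \(t_{n-p}\) distribution.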
Your turn
With the coagulation example: \[ \begin{aligned} \text{Coagulation time (s)}_i &= \beta_0 + \beta_1 1\{\text{Diet B}\}_i \\ &+ \beta_2 1\{\text{Diet C}\}_i + \beta_3 1\{\text{Diet D}\}_i + \epsilon_i \end{aligned} \]
What is \(c\) for the linear combination \(\beta_0 - \beta_1\)?
Find \(c^T(X^TX)^{-1}c\).
X <- model.matrix(fit)
round(solve(t(X) %*% X), 2)
## (Intercept) dietB dietC dietD
## (Intercept) 0.25 -0.25 -0.25 -0.25
## dietB -0.25 0.42 0.25 0.25
## dietC -0.25 0.25 0.42 0.25
## dietD -0.25 0.25 0.25 0.37
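The interval formula for \(c^T\beta\) can be wrapped in a small helper. This is a minimal sketch, not a function from any package: `ci_lincomb` is a hypothetical name, and the demonstration uses the built-in `mtcars` data with a one-factor model (chosen because it has the same structure as the diet model) rather than the exercise itself.

```r
# Hypothetical helper: CI for c^T beta from a fitted lm
ci_lincomb <- function(fit, c_vec, level = 0.95) {
  X          <- model.matrix(fit)
  est        <- sum(c_vec * coef(fit))                # c^T beta_hat
  sigma2_hat <- summary(fit)$sigma^2                  # RSS / (n - p)
  q          <- drop(t(c_vec) %*% solve(crossprod(X)) %*% c_vec)
  se         <- sqrt(sigma2_hat * q)                  # SE of c^T beta_hat
  t_crit     <- qt(1 - (1 - level) / 2, df.residual(fit))
  est + c(-1, 1) * t_crit * se
}

# Illustrative one-factor model; coefficients are
# (Intercept), factor(cyl)6, factor(cyl)8
fit_demo <- lm(mpg ~ factor(cyl), data = mtcars)

# Difference between the 6- and 8-cylinder coefficients
ci_lincomb(fit_demo, c(0, 1, -1))

# With c picking out a single coefficient, this reduces to the usual CI
ci_lincomb(fit_demo, c(0, 1, 0))
confint(fit_demo)[2, ]
```

The same call pattern should apply to the coagulation `fit` once `c` has been worked out.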
Joint confidence regions
A joint \(100(1-\alpha)\%\) confidence region for the vector \(\beta\) can be formed using \[ (\hat{\beta} - \beta)^TX^TX(\hat{\beta} - \beta) \le p \hat{\sigma}^2 F^{(1-\alpha)}_{p, n-p} \] where \(F^{(1-\alpha)}_{p, n-p}\) is the \(1-\alpha\) quantile of the \(F_{p, n-p}\) distribution. The region is a \(p\)-dimensional ellipsoid (very hard to visualise, but essential for communicating joint uncertainty when the parameters are correlated).
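Although the ellipsoid is hard to draw, it is easy to check whether a particular candidate \(\beta\) lies inside it. The sketch below does that directly from the inequality; `in_joint_region` is a hypothetical helper name, and the demonstration assumes the built-in `mtcars` data with an arbitrary model.

```r
# Hypothetical helper: is `beta` inside the joint confidence region?
in_joint_region <- function(fit, beta, level = 0.95) {
  X   <- model.matrix(fit)
  p   <- ncol(X)
  d   <- coef(fit) - beta                             # beta_hat - beta
  lhs <- drop(t(d) %*% crossprod(X) %*% d)            # quadratic form in X^T X
  rhs <- p * summary(fit)$sigma^2 * qf(level, p, df.residual(fit))
  lhs <= rhs
}

# Illustrative fit on built-in data (not the lecture's example)
fit_demo <- lm(mpg ~ wt + hp, data = mtcars)

# The estimate itself is the centre of the ellipsoid, so it is always inside
in_joint_region(fit_demo, coef(fit_demo))

# A candidate far from the estimate, e.g. all coefficients zero
in_joint_region(fit_demo, c(0, 0, 0))
```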