Stat 552

# Homework 4

##### Due 3pm Dec 5th in class

Reading: Faraway Chapters 3 (excluding 3.3 and 3.6) and 4

Due Feb 5th, 3pm: hard copy in class, .Rmd on Canvas. (Use #1 to practice your math typesetting in markdown.)

1. Your turn from Monday: Consider hypotheses of the form $K^T\beta = m$. What are $K$ and $m$ for exercises 1 and 5 from the F-test exercises handout?
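
As a purely illustrative case (hypothetical, not one of the handout exercises), suppose $\beta = (\beta_0, \beta_1, \beta_2, \beta_3)^T$ and the null hypothesis is $\beta_1 = \beta_2$ together with $\beta_3 = 0$. One way to write this in the form $K^T\beta = m$ is:

$$
K^T = \begin{pmatrix} 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad m = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
$$

Each row of $K^T$ encodes one linear restriction, so $K$ has one column per restriction. The choice is not unique: multiplying a row of $K^T$ and the matching entry of $m$ by the same nonzero constant describes the same hypothesis.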

2. Consider the cheddar cheese example and full model fitted,

Examine the output from

Each line corresponds to an F-test. What models are being compared? Verify your answers by fitting the models explicitly and doing three separate `anova` calls (you might need to use the `scale` argument to `anova`). (Hint: `?anova.lm`)
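
For reference when checking the comparisons by hand, the usual F statistic for a smaller model $\omega$ nested in a larger model $\Omega$ is sketched below. The symbols $\omega$, $\Omega$, $RSS$ and $df$ (residual degrees of freedom) are notation assumed here, not taken from the assignment:

$$
F = \frac{(RSS_\omega - RSS_\Omega)/(df_\omega - df_\Omega)}{RSS_\Omega / df_\Omega}
$$

The denominator is the estimate of $\sigma^2$ from the larger model in the comparison. If a two-model `anova` call uses a different variance estimate from the line you are trying to reproduce, the F values will not match, which is presumably where the `scale` argument becomes useful.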

3. Faraway 3.7 + (i) Produce a plot that displays the predicted distances as a function of left leg strength for a range of right leg strengths and average (according to this data) left and right leg flexibilities, using the model you fit in part (a). (j) Produce a plot that displays the predicted distances for your preferred model.

4. Generate a single fixed design matrix $X = (1 \quad X_1 \quad X_2 \quad X_3)$ with 30 rows, where $X_1$, $X_2$ and $X_3$ are columns each generated by independently drawing $30$ observations from a Uniform(-1, 1) distribution. (We generate the $X$’s randomly, but once they have been generated we treat them as fixed.)

Simulate a response according to the model: $y = 2 + 3 X_1 + 4 X_2 + \epsilon$ where $\epsilon \sim N(0, 2I)$

Fit the regression model $y = X\beta + \epsilon$ (using `lm`) and retain the coefficient estimates $\hat{\beta_0}, \hat{\beta_1}, \hat{\beta_2}, \hat{\beta_3}$ and the error variance estimate $\hat{\sigma^2}$.

Repeat 5000 times and produce:

- histograms (or density curves) of the parameter estimates (including $\hat{\sigma^2}$) with curves of their theoretical distributions overlaid
- a histogram of $(\hat{\beta_1} - \beta_1)/\text{SE}(\hat{\beta_1})$ with a curve of its theoretical distribution overlaid
- a scatter plot of $\hat{\beta_1}$ against $\hat{\beta_0}$

This gives us a starting point for examining how violations of our assumptions might affect our distributional results.
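
The theoretical curves for the overlays come from the standard normal-theory results for the linear model. As a sketch, reading $N(0, 2I)$ as having covariance matrix $2I$ (so $\sigma^2 = 2$), and writing $n = 30$ and $p = 4$ for the number of rows and columns of $X$ (symbols introduced here, not in the assignment):

$$
\hat{\beta} \sim N\left(\beta,\ \sigma^2 (X^TX)^{-1}\right), \qquad
\frac{(n-p)\,\hat{\sigma^2}}{\sigma^2} \sim \chi^2_{n-p}, \qquad
\frac{\hat{\beta_1} - \beta_1}{\text{SE}(\hat{\beta_1})} \sim t_{n-p}
$$

Here the true coefficient vector is $\beta = (2, 3, 4, 0)^T$, and $\text{SE}(\hat{\beta_1})$ is the square root of $\hat{\sigma^2}$ times the diagonal entry of $(X^TX)^{-1}$ corresponding to $\beta_1$. The off-diagonal entry of $\sigma^2 (X^TX)^{-1}$ linking $\beta_0$ and $\beta_1$ is the theoretical covariance of $\hat{\beta_0}$ and $\hat{\beta_1}$, which is worth comparing against the shape of the scatter plot.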