Stat 552

Homework 4

Due 3pm Dec 5th in class

Reading: Faraway Chapters 3 (excluding 3.3 and 3.6) and 4

Due Feb 5th, 3pm: hardcopy in class, .Rmd on Canvas. (Use #1 to practice typesetting math in markdown.)

1. Your turn from Monday: Consider hypotheses of the form . What are and for exercises 1 and 5 from the F-test exercises handout?

2. Consider the cheddar cheese example and the full model, fitted with:

data(cheddar, package = "faraway")
fit <- lm(taste ~ . , data = cheddar)

Examine the output from

anova(fit)
## Analysis of Variance Table
## 
## Response: taste
##           Df  Sum Sq Mean Sq F value    Pr(>F)    
## Acetic     1 2314.14 2314.14 22.5481 6.528e-05 ***
## H2S        1 2147.02 2147.02 20.9197 0.0001035 ***
## Lactic     1  533.32  533.32  5.1964 0.0310795 *  
## Residuals 26 2668.41  102.63                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
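One way to read the table is that each row's F value divides that row's mean square by the full model's residual mean square. Write $\mathrm{RSS}_k$ for the residual sum of squares of the model that contains the first $k$ terms in formula order, with $\mathrm{RSS}_0$ for the intercept-only model. Each term uses 1 df and the full model leaves 26 residual df, so the row-$k$ statistic is:

$$
F_k = \frac{(\mathrm{RSS}_{k-1} - \mathrm{RSS}_k)/1}{\mathrm{RSS}_3/26}, \qquad k = 1, 2, 3.
$$

As an arithmetic check on the Acetic row, $\mathrm{RSS}_3/26 = 2668.41/26 \approx 102.63$ and $2314.14/102.63 \approx 22.55$. This agrees with the printed 22.5481.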

Each line corresponds to an F-test. What models are being compared? Verify your answers by fitting the models explicitly and making three separate anova calls. You might need to use the scale argument to anova (hint: ?anova.lm).
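Below is a sketch of the explicit check. It assumes the faraway package is installed, and the model names `fit0`, `fit1`, `fit2` are chosen only for illustration. The p-values are computed by hand with `pf()` on the full model's 26 residual df. As far as I can tell from the `anova.lm` source, `anova()` with two models takes the denominator df from the larger of the two models passed in. Its p-values may therefore differ slightly from the sequential table, even though the F values agree.

```r
# Reproduce each row of the sequential anova(fit) table with an explicit
# comparison of nested models (requires the faraway package for the data).
data(cheddar, package = "faraway")
fit <- lm(taste ~ ., data = cheddar)   # terms in order: Acetic, H2S, Lactic

# Nested sequence implied by the order of the terms (illustrative names)
fit0 <- lm(taste ~ 1, data = cheddar)
fit1 <- lm(taste ~ Acetic, data = cheddar)
fit2 <- lm(taste ~ Acetic + H2S, data = cheddar)

# Sequential anova() estimates sigma^2 from the FULL model in every row,
# so supply that estimate through `scale`
s2 <- deviance(fit) / df.residual(fit)

a1 <- anova(fit0, fit1, scale = s2)   # row "Acetic"
a2 <- anova(fit1, fit2, scale = s2)   # row "H2S"
a3 <- anova(fit2, fit,  scale = s2)   # row "Lactic"

# F statistics from the explicit comparisons vs. the sequential table
F_explicit <- c(a1$F[2], a2$F[2], a3$F[2])
F_seq <- anova(fit)[["F value"]][1:3]
print(cbind(F_explicit, F_seq))

# p-values using the full model's residual df as the denominator df
p_explicit <- pf(F_explicit, df1 = 1, df2 = df.residual(fit), lower.tail = FALSE)
print(cbind(p_explicit, p_seq = anova(fit)[["Pr(>F)"]][1:3]))
```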

3. Faraway 3.7, plus two additional parts:

(i) Using the model you fit in part (a), produce a plot that displays the predicted distances as a function of left leg strength for a range of right leg strengths. Hold the left and right leg flexibilities at their averages in this data.

(j) Produce a plot that displays the predicted distances for your preferred model.
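A minimal sketch of the plot in (i) using base R graphics. It makes several assumptions that you should check with `names(punting)`. It assumes the exercise uses faraway's `punting` data with columns `Distance`, `RStr`, `LStr`, `RFlex` and `LFlex`. It also assumes that the part (a) model regresses `Distance` on the four strength and flexibility variables. The three right-leg strengths (10th, 50th and 90th percentiles) are an arbitrary illustrative choice.

```r
# Sketch for 3.7(i). ASSUMES faraway::punting with columns Distance,
# RStr, LStr, RFlex, LFlex, and the part (a) model below.
data(punting, package = "faraway")
fit_a <- lm(Distance ~ RStr + LStr + RFlex + LFlex, data = punting)

# Grid: left strength varies along the x-axis, right strength takes a few
# values (illustrative percentiles), flexibilities fixed at their sample means
grid <- expand.grid(
  LStr  = seq(min(punting$LStr), max(punting$LStr), length.out = 50),
  RStr  = quantile(punting$RStr, c(0.1, 0.5, 0.9)),
  RFlex = mean(punting$RFlex),
  LFlex = mean(punting$LFlex)
)
grid$pred <- predict(fit_a, newdata = grid)

# Empty frame sized to the predictions, then one line per right-leg strength
rstr_levels <- unique(grid$RStr)
plot(pred ~ LStr, data = grid, type = "n",
     xlab = "Left leg strength", ylab = "Predicted distance")
for (i in seq_along(rstr_levels)) {
  sub <- grid[grid$RStr == rstr_levels[i], ]
  lines(sub$LStr, sub$pred, lty = i)
}
legend("topleft", legend = paste("RStr =", round(rstr_levels)),
       lty = seq_along(rstr_levels))
```

The same grid-plus-`predict()` pattern carries over to (j). Swap in your preferred model and keep only the predictors it uses.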

4. Generate a single fixed design matrix with 30 rows where , and are columns, each generated by independently drawing observations from a Uniform(-1, 1) distribution. (We generate the ’s randomly, but once they have been generated, we treat them as fixed.)

Simulate a response according to the model: where

Fit the regression model (using lm) and retain the coefficient estimates .

Repeat 5000 times and produce:

  • histograms (or density curves) of the parameter estimates (including ) with curves of their theoretical distributions overlaid.
  • a histogram of with a curve of its theoretical distribution overlaid
  • a scatter plot of and
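The overlays need the standard sampling distributions. These are sketched below under the Gaussian linear model $y = X\beta + \varepsilon$, where $\varepsilon \sim N(0, \sigma^2 I_n)$ and $X$ is a fixed $n \times p$ design matrix of full rank:

$$
\hat\beta = (X^\top X)^{-1} X^\top y \sim N_p\!\left(\beta,\; \sigma^2 (X^\top X)^{-1}\right),
\qquad
\hat\beta_j \sim N\!\left(\beta_j,\; \sigma^2 \left[(X^\top X)^{-1}\right]_{jj}\right),
$$

$$
\hat\sigma^2 = \frac{\mathrm{RSS}}{n-p}, \qquad \frac{(n-p)\,\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-p}, \qquad \hat\sigma^2 \text{ independent of } \hat\beta.
$$

The density to overlay on a histogram of $\hat\sigma^2$ is therefore $f(s) = \frac{n-p}{\sigma^2}\, f_{\chi^2_{n-p}}\!\left(\frac{(n-p)\,s}{\sigma^2}\right)$. Here $n = 30$. If the design is an intercept plus three columns (an assumption), then $p = 4$ and the chi-square has 26 degrees of freedom.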

This gives us a starting point for examining how violations of our assumptions might affect our distributional results.
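A minimal sketch of the simulation loop, under clearly labeled assumptions. The code assumes the design is an intercept plus three Uniform(-1, 1) columns `x1`, `x2`, `x3`. It also uses hypothetical values for `beta` and `sigma`, so replace these with the model specified in the assignment. Only one coefficient overlay is drawn, together with a histogram of $\hat\sigma^2$ as an example. The remaining panels follow the same pattern.

```r
# Sketch of the #4 simulation. ASSUMPTIONS (not from the assignment text):
# design = intercept + three Uniform(-1, 1) columns; beta and sigma are
# hypothetical placeholders.
set.seed(552)
n <- 30
dat <- data.frame(x1 = runif(n, -1, 1),
                  x2 = runif(n, -1, 1),
                  x3 = runif(n, -1, 1))   # generated once, then held fixed
X <- model.matrix(~ x1 + x2 + x3, data = dat)
p <- ncol(X)

beta  <- c(1, 2, -1, 0.5)   # hypothetical true coefficients
sigma <- 1                  # hypothetical error SD

n_sims <- 5000
est <- matrix(NA_real_, nrow = n_sims, ncol = p + 1,
              dimnames = list(NULL, c(colnames(X), "sigma2_hat")))
for (s in seq_len(n_sims)) {
  # Only the errors are redrawn; the design matrix stays fixed
  dat$y <- drop(X %*% beta) + rnorm(n, mean = 0, sd = sigma)
  fit <- lm(y ~ x1 + x2 + x3, data = dat)
  est[s, ] <- c(coef(fit), summary(fit)$sigma^2)
}

# Theoretical covariance of beta_hat: sigma^2 (X'X)^{-1}
V <- sigma^2 * solve(crossprod(X))

# Example overlay for one coefficient (x1); repeat for the others
hist(est[, "x1"], breaks = 50, freq = FALSE,
     main = "Sampling distribution of the x1 estimate", xlab = "estimate")
curve(dnorm(x, mean = beta[2], sd = sqrt(V[2, 2])), add = TRUE, lwd = 2)

# sigma2_hat: (n - p) * sigma2_hat / sigma^2 ~ chi-square(n - p)
hist(est[, "sigma2_hat"], breaks = 50, freq = FALSE,
     main = "Sampling distribution of sigma2_hat", xlab = "estimate")
curve((n - p) / sigma^2 * dchisq((n - p) * x / sigma^2, df = n - p),
      add = TRUE, lwd = 2)
```

Keeping `X` fixed outside the loop and redrawing only the errors matches the "treat them as fixed" instruction. Each histogram is then an estimate of the sampling distribution conditional on that single design.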