Stat 552

# Homework 2

## Part One

Hardcopy handed in (may be handwritten or typeset)

1. Consider the simple linear regression model:

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, \ldots, n$$

where the $\epsilon_i$ are independent Normal$(0, \sigma^2)$ random errors.

a) Using the matrix form for multiple linear regression, write out the form of $y$, $X$ and $\epsilon$.

b) Calculate $X^TX$, $X^Ty$ and $(X^TX)^{-1}$.

c) Find the least squares estimates, $\hat{\beta} = (X^TX)^{-1}X^Ty$.

d) (Extra credit) Show that the least squares estimates above are equivalent to the usual form for the estimates in simple linear regression:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
2. Using the matrix form of the least squares estimates, derive the form of the estimate of slope in a simple linear regression without an intercept:

$y_i = \beta_1 x_i + \epsilon_i, \quad i = 1, \ldots, n$ where the $\epsilon_i$ are independent Normal$(0, \sigma^2)$ random errors.
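For reference, both problems use the matrix form of the regression model from class, in which the $n$ observations are stacked into vectors and a design matrix:

$$y = X\beta + \epsilon, \qquad y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_p \end{pmatrix}, \quad \epsilon = \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{pmatrix},$$

where each row of $X$ holds a 1 (for the intercept, when the model has one) followed by that observation's predictor values.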

## Part Two

Hand in a .Rmd file on Canvas, along with a hardcopy of the compiled .Rmd file (i.e., a .pdf or .docx).

1. The dataset teengamb in the package faraway contains survey data from a study to investigate teen gambling in the U.K.

Consider the regression model: $\text{gamble}_i = \beta_0 + \beta_1 \text{sex}_i + \beta_2 \text{status}_i + \beta_3 \text{income}_i+ \beta_4 \text{verbal}_i + \epsilon_i \quad i = 1, \ldots, 47$

a) Construct the design matrix $X$ and response vector $y$ in R.

b) Find the least squares estimates using matrix algebra in R. Verify your answers by fitting the regression model using lm.

c) Find the fitted values and residuals using matrix algebra in R and present a plot of residuals against fitted values.
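A minimal sketch of the kind of R workflow parts a)–c) call for, assuming the faraway package is installed (variable names are illustrative):

```r
# Sketch only; assumes the faraway package is installed.
library(faraway)
data(teengamb)

# (a) Design matrix (intercept column plus the four predictors) and response
X <- cbind(1, as.matrix(teengamb[, c("sex", "status", "income", "verbal")]))
y <- teengamb$gamble

# (b) Least squares estimates via matrix algebra, then verify with lm
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
fit <- lm(gamble ~ sex + status + income + verbal, data = teengamb)
cbind(beta_hat, coef(fit))  # the two columns should agree

# (c) Fitted values, residuals, and residual plot
y_hat <- X %*% beta_hat
resids <- y - y_hat
plot(y_hat, resids, xlab = "Fitted values", ylab = "Residuals")
```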

2. Consider the following regression model:

Simulate a realization of the model in R by setting up the $X$ matrix, $\beta$ vector, and the error vector and using matrix algebra. Include a plot of $y$ against $i$.
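The general recipe can be sketched in R as follows. The specific model below (its coefficients and error variance) is purely an illustrative assumption, since the model statement for this problem does not appear above:

```r
# Illustrative sketch only: the model here is an assumption standing in
# for the (missing) model statement in the problem.
set.seed(1)
n <- 20
X <- cbind(1, 1:n)      # intercept column and i as the predictor
beta <- c(2, 0.5)       # assumed coefficient vector
eps <- rnorm(n, mean = 0, sd = 1)  # assumed Normal(0, 1) errors
y <- X %*% beta + eps   # one realization via matrix algebra
plot(1:n, y, xlab = "i", ylab = "y")
```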

3. (Faraway 2.3) Generate some artificial data by:

Fit a polynomial in x for predicting y. Compute $\hat{\beta}$ in two ways: using lm, and directly using matrix algebra. At what degree of polynomial does the direct calculation fail?

A regression model that uses a polynomial in $x$ of degree $k$ to predict $y$ is $y_i = \beta_0 + \beta_1 x_i + \ldots + \beta_k x_i^k + \epsilon_i$

(Takeaway: while the matrix formulas we use in class are analytically correct, they don't necessarily describe a good way to get answers numerically.)
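The comparison can be sketched in R as below. The data-generation step in the problem did not come through above, so the `x` and `y` here are an assumption used only to illustrate the two computations:

```r
# Sketch; the data below are an assumption standing in for the
# generation step that is missing from the problem statement.
set.seed(2)
x <- 1:20
y <- x + rnorm(20)

k <- 7  # try increasing the degree until the direct calculation fails
fit <- lm(y ~ poly(x, k, raw = TRUE))  # lm uses a QR decomposition
X <- outer(x, 0:k, `^`)                # columns 1, x, x^2, ..., x^k
beta_direct <- solve(t(X) %*% X) %*% t(X) %*% y  # textbook formula

# For large enough k, t(X) %*% X is numerically singular and solve()
# errors out, even though lm still returns estimates.
```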