---
title: Predicting Body Fat
author: Charlotte Wickham
date: "2019-02-01"
---

Based on code from http://www.maths.bath.ac.uk/~jjf23/LMR/scripts2/prediction.R, also found in Section 4.2 of Linear Models with R

```{r setup}
library(faraway)
library(tidyverse)
data(fat, package = "faraway")
```

## Background

**Q1**: Take a quick read through of the documentation on this dataset, `?fat`.  We will be using the `brozek` variable, a hard to obtain but accurate measure of body fat percentage, as our response.

In context of the data (discuss with your neighbours):

  * What would a confidence interval on the mean response tell us? When might it be useful?
  * What would a prediction interval on a response tell us? When might it be useful?


Faraway says: 

> "Normally, we would start with an exploratory analysis of the data and a detailed consideration of what model to use but let's be rash and just fit a model and start predicting."

```{r}
lmod <- lm(brozek ~ age + weight + height + neck + chest + 
    abdom + hip + thigh + knee + ankle + biceps + forearm + wrist, data = fat)

X <- model.matrix(lmod)
```

## Prediction for a "typical" man

This is Faraway's definition of an average man:
```{r}
(x0 <- apply(X, 2, median))  # 
```

We can then predict bodyfat for a man with these physical characteristics:
```{r}
(y0 <- sum(x0 * coef(lmod))) # x_0^T \hat{\beta}
```

Or let R do the work:
```{r}
predict(lmod, new = data.frame(t(x0)), se = TRUE)
# se.fit is the standard error in the predicted *mean* response.
# residual.scale is estimate of sigma
```

**Q2**: Find the `se.fit` value by using matrix algebra i.e. $\hat{\sigma}\sqrt{x_0^T(X^TX)^{-1}x_0}$ and `residual.scale` instead.

```{r}

```


`predict()` will construct a prediction interval if you ask for it:
```{r}
predict(lmod, new = data.frame(t(x0)), interval = "prediction")
```

**Q3** Using the values from Q2 replicate the calculation of this interval.

```{r}

```

If you ask for a `"confidence"` interval instead you'll get the interval for the **mean** response:
```{r}
predict(lmod, new = data.frame(t(x0)), interval = "confidence")
```

## Width of intervals

Intervals are wider the further we are from the average explanatory values. Let's look at a prediction for someone who is at the (sample) 95th percentile on all variables:
```{r}
(x1 <- apply(X, 2, quantile, probs = 0.95))
```

Compare the width of these intervals to those above:
```{r}
predict(lmod, new = data.frame(t(x1)), interval = "prediction")
predict(lmod, new = data.frame(t(x1)), interval = "confidence")
```

## Aside on other packages

The `add_predictions()` function we saw in lab doesn't calculate standard errors so it isn't very useful in this scenario.  However `data_grid()` may be.  If you don't provide  variables it will pick "typical" values for all those needed in `.model`:
```{r}
library(modelr)
data_grid(fat, .model = lmod)
```

It returns a tibble so it works nicely with `predict()` without having to do any `t()` or `data.frame()` calls:
```{r}
typical_man <- data_grid(fat, .model = lmod)
predict(lmod, newdata = typical_man, interval = "prediction")
```

The broom package has an `augment()` function that will add predictions and their standard errors and return then in tibble form:
```{r}
library(broom)
augment(lmod, newdata = typical_man)
```
Notice the new columns `.fitted` and `.se.fit` which you could use to construct intervals.