---
title: Predicting Body Fat
author: Charlotte Wickham
date: "2019-02-01"
---
Based on code from http://www.maths.bath.ac.uk/~jjf23/LMR/scripts2/prediction.R, also found in Section 4.2 of Linear Models with R
```{r setup}
library(faraway)
library(tidyverse)
data(fat, package = "faraway")
```
## Background
**Q1**: Take a quick read through of the documentation on this dataset, `?fat`. We will be using the `brozek` variable, a hard to obtain but accurate measure of body fat percentage, as our response.
In context of the data (discuss with your neighbours):
`y`: body fat percentage (based on Brozek's eqn.)
`x_0`: a specific value for age, weight, height and the 10 body circumference.
* What would a confidence interval on the mean response tell us? When might it be useful?
For men with characteristic given by `x_0`, the 95% confidence interval for _average_ body fat percentage. Public health scenario...
* What would a prediction interval on a response tell us? When might it be useful?
For a man with characteristics given by `x_0`, 95% prediction interval for body percentage. E.g. Nick wants to know his body fat. Doctor who wants to estimate body fat for a specific male patient in front of them.
Faraway says:
> "Normally, we would start with an exploratory analysis of the data and a detailed consideration of what model to use but let's be rash and just fit a model and start predicting."
```{r}
lmod <- lm(brozek ~ age + weight + height + neck + chest +
abdom + hip + thigh + knee + ankle + biceps + forearm + wrist, data = fat)
X <- model.matrix(lmod)
```
## Prediction for a "typical" man
This is Faraway's definition of an average man:
```{r}
(x0 <- apply(X, 2, median)) #
```
He decided he wanted a prediction for an "average man". We needed an `x0` to play with.
We can then predict bodyfat for a man with these physical characteristics:
```{r}
(y0 <- sum(x0 * coef(lmod))) # x_0^T \hat{\beta}
```
Or let R do the work:
```{r}
(pred <- predict(lmod, new = data.frame(t(x0)), se = TRUE))
# se.fit is the standard error in the predicted *mean* response.
# residual.scale is estimate of sigma
```
**Q2**: Find the `se.fit` value by using matrix algebra i.e. $\hat{\sigma}\sqrt{x_0^T(X^TX)^{-1}x_0}$ and `residual.scale` instead.
```{r}
pred$residual.scale * sqrt( t(x0) %*% solve(t(X) %*% X) %*% x0 )
```
`predict()` will construct a prediction interval if you ask for it:
```{r}
predict(lmod, new = data.frame(t(x0)), interval = "prediction")
```
**Q3** Using the values from Q2 replicate the calculation of this interval.
```{r}
se_mean <- pred$residual.scale *
sqrt(t(x0) %*% solve(t(X) %*% X) %*% x0)
se_pred <- pred$residual.scale *
sqrt(1 + t(x0) %*% solve(t(X) %*% X) %*% x0)
y0 + c(-1, 1) * qt( 0.975, df = 238) * as.numeric(se_pred)
```
If you ask for a `"confidence"` interval instead you'll get the interval for the **mean** response:
```{r}
predict(lmod, new = data.frame(t(x0)), interval = "confidence")
```
## Width of intervals
Intervals are wider the further we are from the average explanatory values. Let's look at a prediction for someone who is at the (sample) 95th percentile on all variables:
```{r}
(x1 <- apply(X, 2, quantile, probs = 0.95))
```
Compare the width of these intervals to those above:
```{r}
predict(lmod, new = data.frame(t(x1)), interval = "prediction")
predict(lmod, new = data.frame(t(x1)), interval = "confidence")
```
## Aside on other packages
The `add_predictions()` function we saw in lab doesn't calculate standard errors so it isn't very useful in this scenario. However `data_grid()` may be. If you don't provide variables it will pick "typical" values for all those needed in `.model`:
```{r}
library(modelr)
data_grid(fat, .model = lmod)
```
It returns a tibble so it works nicely with `predict()` without having to do any `t()` or `data.frame()` calls:
```{r}
typical_man <- data_grid(fat, .model = lmod)
predict(lmod, newdata = typical_man, interval = "prediction")
```
The broom package has an `augment()` function that will add predictions and their standard errors and return then in tibble form:
```{r}
library(broom)
augment(lmod, newdata = typical_man)
```
Notice the new columns `.fitted` and `.se.fit` which you could use to construct intervals.