Problems with the error | ST 552 Statistical Methods

Today

Problems with the errors

Generalized Least Squares
Lack of fit F-tests
Robust regression

Generalized Least Squares

$Y = X β + ϵ$

We have assumed $Var (ϵ) = σ^{2} I$, but what if instead $Var (ϵ) = σ^{2} Σ$, where $σ^{2}$ is unknown but $Σ$ is known? For example, we might know the form of the correlation and/or the non-constant variance in the response.
The usual least squares estimates ${\hat{β}}_{L S}$ are still unbiased, but they are no longer BLUE.

Let $S$ be the matrix square root of $Σ$ , i.e. $Σ = S S^{T}$ .

Define a new regression equation by multiplying both sides by $S^{- 1}$ : $\begin{aligned} S^{- 1} Y & = S^{- 1} X β + S^{- 1} ϵ \\ Y^{'} & = X^{'} β + ϵ^{'} \end{aligned}$

Your Turn

Show $Var (ϵ^{'}) = Var (S^{- 1} ϵ) = σ^{2} I$ .

Show the least squares estimates for the new regression equation reduce to: $\hat{β} = (X^{T} Σ^{- 1} X)^{- 1} X^{T} Σ^{- 1} Y$
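One way to work through both parts, as a sketch that assumes $S$ is invertible (true when $Σ$ is positive definite) and writes $S^{- T}$ for ${(S^{- 1})}^{T}$, so that $Σ^{- 1} = {(S S^{T})}^{- 1} = S^{- T} S^{- 1}$:

$\begin{aligned} Var (ϵ^{'}) & = S^{- 1} Var (ϵ) S^{- T} = σ^{2} S^{- 1} S S^{T} S^{- T} = σ^{2} I \\ \hat{β} & = ({X^{'}}^{T} X^{'})^{- 1} {X^{'}}^{T} Y^{'} = (X^{T} S^{- T} S^{- 1} X)^{- 1} X^{T} S^{- T} S^{- 1} Y = (X^{T} Σ^{- 1} X)^{- 1} X^{T} Σ^{- 1} Y \end{aligned}$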

Can also show $Var (\hat{β}) = (X^{T} Σ^{- 1} X)^{- 1} σ^{2}$.
The estimates $\hat{β} = (X^{T} Σ^{- 1} X)^{- 1} X^{T} Σ^{- 1} Y$ are known as generalized least squares (GLS) estimates.
In practice, $Σ$ might only be known up to a few parameters that also need to be estimated.
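As a rough numerical check of the transformation argument, the sketch below compares ordinary least squares on the transformed data with the closed-form expression. The simulated data, the AR(1)-style $Σ$, and all object names are illustrative assumptions, not part of the lecture.

```r
# Simulated data: sizes, coefficients, and Sigma are illustrative assumptions
set.seed(1)
n <- 50
X <- cbind(1, rnorm(n))                 # design matrix with an intercept column
Sigma <- 0.6^abs(outer(1:n, 1:n, "-"))  # a "known" correlation structure
S <- t(chol(Sigma))                     # lower-triangular S with Sigma = S S^T
beta <- c(2, -1)
y <- drop(X %*% beta + S %*% rnorm(n))  # errors with Var = Sigma (sigma^2 = 1)

# Route 1: multiply through by S^{-1}, then ordinary least squares
y_star <- forwardsolve(S, y)
X_star <- forwardsolve(S, X)
beta_transformed <- drop(qr.solve(X_star, y_star))

# Route 2: the closed-form expression (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} y
Sigma_inv <- solve(Sigma)
beta_gls <- drop(solve(t(X) %*% Sigma_inv %*% X, t(X) %*% Sigma_inv %*% y))

# The two routes should agree up to numerical error
all.equal(beta_transformed, beta_gls)
```

Here the Cholesky factor plays the role of $S$; any other $S$ with $Σ = S S^{T}$ would give the same estimates.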

Common cases of GLS

$Σ$ defines a temporal or spatial correlation structure
$Σ$ defines a grouping structure
$Σ$ is diagonal and defines a weighting structure (weighted least squares)

In R

?lm # use weights argument
library(nlme)
?gls # has weights and/or correlation argument
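For the diagonal case, a small sketch of how the weights argument of lm relates to $Σ$: the weights are the reciprocals of the diagonal entries of $Σ$. The simulated data and the assumed variance pattern are hypothetical.

```r
# Simulated data where the error sd grows with x (an illustrative assumption)
set.seed(2)
n <- 40
x <- runif(n, 1, 10)
y <- 1 + 0.5 * x + rnorm(n, sd = x)   # Var(eps_i) proportional to x_i^2

# Sigma = diag(x^2), so the weights are 1 / x^2
w <- 1 / x^2
fit_wls <- lm(y ~ x, weights = w)

# Same estimates from the GLS formula with a diagonal Sigma
X <- cbind(1, x)
Sigma_inv <- diag(w)
beta_formula <- drop(solve(t(X) %*% Sigma_inv %*% X, t(X) %*% Sigma_inv %*% y))

# The two columns should match
cbind(lm = coef(fit_wls), formula = beta_formula)
```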

Oat yields

Data from an experiment to compare 8 varieties of oats. The growing area was heterogeneous and so was grouped into 5 blocks. Each variety was sown once within each block and the yield in grams per 16ft row was recorded.

${yield}_{i} = β_{0} + β_{1} {variety}_{i} + ϵ_{i}, \quad i = 1, \dots, 40$
$Var (ϵ_{i}) = σ^{2}, \quad Cor (ϵ_{i}, ϵ_{j}) = \begin{cases} ρ, & \text{if } {block}_{i} = {block}_{j} \\ 0, & \text{otherwise} \end{cases} \quad \text{for } i \neq j$

library(nlme)
data(oatvar, package = "faraway")  # oat yield data, assumed to come from the faraway package
fit_gls <- gls(yield ~ variety, data = oatvar,
  correlation = corCompSymm(form = ~ 1 | block))
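To see what this correlation structure looks like as a matrix, here is a small sketch for a hypothetical layout of 2 blocks with 3 plots each and an assumed $ρ = 0.4$ (these numbers are for illustration only, not the oat data):

```r
# Hypothetical layout: 2 blocks of 3 plots each, with an assumed rho of 0.4
block <- rep(1:2, each = 3)
rho <- 0.4

# rho for pairs in the same block, 0 for pairs in different blocks
Sigma <- outer(block, block, "==") * rho
diag(Sigma) <- 1   # each error has correlation 1 with itself
Sigma
```

The result is block diagonal: observations in the same block share correlation $ρ$, and observations in different blocks are uncorrelated.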

Oat yields in R

intervals(fit_gls)

## Approximate 95% confidence intervals
## 
##  Coefficients:
##                  lower  est.       upper
## (Intercept) 291.542999 334.4 377.2570009
## variety2     -4.903898  42.2  89.3038984
## variety3    -18.903898  28.2  75.3038984
## variety4    -94.703898 -47.6  -0.4961016
## variety5     57.896102 105.0 152.1038984
## variety6    -50.903898  -3.8  43.3038984
## variety7    -63.103898 -16.0  31.1038984
## variety8      2.696102  49.8  96.9038984
## attr(,"label")
## [1] "Coefficients:"
## 
##  Correlation structure:
##          lower      est.     upper
## Rho 0.06596382 0.3959955 0.7493731
## attr(,"label")
## [1] "Correlation structure:"
## 
##  Residual standard error:
##    lower     est.    upper 
## 33.39319 47.04679 66.28298

Lack of fit F-tests

$\hat{σ}^{2}$ should be (if our model is specified correctly) an unbiased estimate of $σ^{2}$.
A “model-free” estimate of $σ^{2}$ is available if there are replicates (multiple observations at the same combination of explanatory values).
If the $\hat{σ}^{2}$ from our model is much bigger than the “model-free” estimate, we have evidence of lack of fit.

In practice

Fit a saturated model. Compare the saturated model to the proposed model with an F-test.
Saturated: every combination of explanatory variables is allowed its own mean (i.e. every group of replicates is allowed its own mean). That is, a model that includes every explanatory variable as categorical, along with every possible interaction between them.
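The F-test can also be assembled by hand from the “model-free” estimate, which may help show where it comes from. The sketch below uses simulated data with replicates; the curved true relationship is an assumption chosen so that a straight line should fit poorly.

```r
# Simulated data with 4 replicates at each x value (illustrative assumption)
set.seed(3)
x <- rep(1:6, each = 4)
y <- 2 + 0.3 * x^2 + rnorm(length(x))

fit_line <- lm(y ~ x)          # proposed model
fit_sat  <- lm(y ~ factor(x))  # saturated model: one mean per x value

# "Model-free" (pure error) sum of squares: variation around each group mean
ss_pe <- sum((y - ave(y, x))^2)
df_pe <- length(y) - length(unique(x))

# Lack-of-fit F statistic from the two residual sums of squares
rss_line <- deviance(fit_line)
df_line  <- df.residual(fit_line)
f_stat <- ((rss_line - ss_pe) / (df_line - df_pe)) / (ss_pe / df_pe)

# Should match the F statistic reported by anova()
c(by_hand = f_stat, anova = anova(fit_line, fit_sat)$F[2])
```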

Example

data(corrosion, package = "faraway")
lm_cor <- lm(loss ~ Fe, data = corrosion)
lm_sat <- lm(loss ~ factor(Fe), data = corrosion)
anova(lm_cor, lm_sat)

## Analysis of Variance Table
## 
## Model 1: loss ~ Fe
## Model 2: loss ~ factor(Fe)
##   Res.Df     RSS Df Sum of Sq      F   Pr(>F)   
## 1     11 102.850                                
## 2      6  11.782  5    91.069 9.2756 0.008623 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# significant lack of fit

Robust regression

Recall that, to define our least squares estimates, we looked for the $β$ that minimises $\sum_{i = 1}^{n} {(y_{i} - x_{i}^{T} β)}^{2}$

Because we are squaring residuals, observations with large residuals carry a lot of weight. For robust regression, we want to downweight the observations with large residuals.

The idea of M-estimators is to extend this to the general situation where we want to find $β$ to minimise $\sum_{i = 1}^{n} ρ (y_{i} - x_{i}^{T} β)$ where $ρ ()$ is some function we specify.

$\sum_{i = 1}^{n} ρ (y_{i} - x_{i}^{T} β)$

Least squares: $ρ (e_{i}) = e_{i}^{2}$
Least absolute deviation, $L_{1}$ regression: $ρ (e_{i}) = | e_{i} |$
Huber’s method: $ρ (e_{i}) = \begin{cases} e_{i}^{2} / 2 & \text{if } | e_{i} | \leq c \\ c | e_{i} | - c^{2} / 2 & \text{otherwise} \end{cases}$
Tukey’s bisquare: $ρ (e_{i}) = \begin{cases} \frac{1}{6} (c^{6} - (c^{2} - e_{i}^{2})^{3}) & \text{if } | e_{i} | \leq c \\ c^{6} / 6 & \text{otherwise} \end{cases}$
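To compare how the first three choices treat small and large residuals, they can be written as short R functions. The function names are made up for this sketch, and the default tuning constant c = 1.345 is a commonly used value that is assumed here rather than taken from the lecture.

```r
# rho functions for three of the methods above (names are illustrative)
rho_ls  <- function(e) e^2
rho_lad <- function(e) abs(e)
rho_huber <- function(e, c = 1.345) {
  # quadratic near zero, linear in the tails
  ifelse(abs(e) <= c, e^2 / 2, c * abs(e) - c^2 / 2)
}

# A small, a moderate, and a large residual
e <- c(0.5, 2, 10)
rbind(ls = rho_ls(e), lad = rho_lad(e), huber = rho_huber(e))
```

For the large residual, the least squares penalty grows quadratically while the other two grow only linearly, which is the sense in which they downweight outlying observations.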

These models are usually fit iteratively, for example by iteratively reweighted least squares (IRLS).
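One common iterative scheme is iteratively reweighted least squares. The sketch below applies it to Huber's method on simulated data. The function name huber_irls, the tuning constant, and the rescaling of residuals by a robust scale estimate are all assumptions made for illustration; this is not a replacement for an established implementation.

```r
# Simulated data with a few gross outliers (illustrative assumption)
set.seed(4)
n <- 60
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n)
y[1:5] <- y[1:5] + 30

huber_irls <- function(X, y, c = 1.345, max_iter = 50, tol = 1e-8) {
  beta <- qr.solve(X, y)                      # start from least squares
  for (iter in seq_len(max_iter)) {
    e <- drop(y - X %*% beta)
    s <- median(abs(e)) / 0.6745              # robust scale estimate (assumed choice)
    u <- abs(e / s)
    w <- ifelse(u <= c, 1, c / u)             # Huber weights: downweight large residuals
    beta_new <- qr.solve(sqrt(w) * X, sqrt(w) * y)  # weighted least squares step
    if (max(abs(beta_new - beta)) < tol) break
    beta <- beta_new
  }
  beta_new
}

X <- cbind(1, x)
rbind(ls = qr.solve(X, y), huber = huber_irls(X, y))
```

Each pass solves a weighted least squares problem in which observations with large scaled residuals receive weights below 1, so the outliers pull the fit less than they do under ordinary least squares.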

Least trimmed squares

Minimise the sum of the $q$ smallest squared residuals, $\sum_{i = 1}^{q} e_{(i)}^{2}$, where $q$ is some number smaller than $n$ and $e_{(i)}^{2}$ is the $i$th smallest squared residual.

One choice, $q = ⌊ n / 2 ⌋ + ⌊ (p + 1) / 2 ⌋$
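As a sketch of the criterion itself (not of the search for its minimiser), the function below evaluates the trimmed sum of squares for a given $β$. The function name, the simulated data, and the reading of $p$ as the number of columns of the design matrix are assumptions.

```r
# Trimmed sum of squares for a candidate beta (illustrative helper)
lts_criterion <- function(beta, X, y, q) {
  e2 <- sort(drop(y - X %*% beta)^2)  # ordered squared residuals
  sum(e2[seq_len(q)])                 # keep only the q smallest
}

# Simulated data with a few gross outliers (illustrative assumption)
set.seed(5)
n <- 30
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n)
y[1:4] <- y[1:4] + 40
X <- cbind(1, x)

p <- ncol(X)                          # assumed meaning of p
q <- floor(n / 2) + floor((p + 1) / 2)

# Criterion at the least squares fit versus at the coefficients used to simulate
c(at_ls    = lts_criterion(qr.solve(X, y), X, y, q),
  at_truth = lts_criterion(c(1, 2), X, y, q))
```

Because the criterion ignores the largest residuals, a line through the bulk of the data scores much better than the least squares line that the outliers have pulled away. Finding the minimiser is a harder search problem, for which an existing implementation such as ltsreg in the MASS package would normally be used.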

Annual numbers of telephone calls in Belgium

Problems with the error Feb 22 2019