# Model Selection: Criterion-Based Methods

Mar 01 2019

## Stepwise methods

Stepwise methods consider a single path through the models (and only ever one model of each size).

An alternative is to consider all possible models (a.k.a. best subsets), but then we need a way to compare models (F-tests don’t apply to non-nested models).

The general strategy of best subsets: calculate the metric for every possible model, then choose the model with the “best” value of the metric.
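The strategy can be sketched in a few lines: enumerate every subset of predictors, fit each by least squares, and keep the subset with the smallest criterion value (AIC here). The simulated data and names are illustrative, not from the notes.

```python
# Best-subsets sketch: try every subset of predictors and keep the
# one with the smallest AIC. Simulated data; only x0 and x1 matter.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 4
X = rng.normal(size=(n, k))                      # candidate predictors
y = 2 * X[:, 0] - X[:, 1] + rng.normal(size=n)   # true model uses x0, x1

def aic(rss, n, p):
    # AIC for Gaussian errors, up to an additive constant
    return n * np.log(rss / n) + 2 * p

best = None
for size in range(k + 1):
    for subset in itertools.combinations(range(k), size):
        # design matrix: intercept plus the chosen columns
        Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = np.sum((y - Xs @ beta) ** 2)
        p = Xs.shape[1]                          # parameters, incl. intercept
        score = aic(rss, n, p)
        if best is None or score < best[0]:
            best = (score, subset)

print(best[1])   # indices of the predictors in the chosen subset
```

Note the cost: with $$k$$ candidate predictors there are $$2^k$$ models to fit, which is why stepwise methods only walk a single path through them.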

Some common metrics, in the regression setting:

\begin{aligned}
\text{Akaike Information Criterion} = \text{AIC} &= n\log{(RSS / n)} + 2p \\
\text{Bayesian Information Criterion} = \text{BIC} &= n\log{(RSS / n)} + p\log{n} \\
\text{Mallow's Cp} &= \frac{RSS}{\hat{\sigma}^2} + 2p - n \\
& \quad \text{where } \hat{\sigma}^2 \text{ is from the full model} \\
\text{Adjusted } R^2 &= 1 - \frac{n-1}{n-p}(1 - R^2)
\end{aligned}

Small is good for AIC, BIC and Mallow’s Cp.

Large is good for adjusted $$R^2$$.

Which is best? It depends…
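The four formulas above only need the model's RSS plus a few bookkeeping quantities, so they are easy to compute by hand. A minimal sketch, with illustrative numbers (here $$p$$ counts all fitted parameters including the intercept, and `sigma2_full` is the error variance estimate from the full model, as Mallow's Cp requires):

```python
# Compute AIC, BIC, Mallow's Cp and adjusted R^2 from a model's RSS,
# following the regression-setting formulas above.
import numpy as np

def criteria(rss, n, p, sigma2_full, tss):
    aic = n * np.log(rss / n) + 2 * p
    bic = n * np.log(rss / n) + p * np.log(n)
    cp = rss / sigma2_full + 2 * p - n
    r2 = 1 - rss / tss                       # tss = total sum of squares
    adj_r2 = 1 - (n - 1) / (n - p) * (1 - r2)
    return {"AIC": aic, "BIC": bic, "Cp": cp, "adjR2": adj_r2}

# e.g. a 3-parameter model with RSS = 12.0 on n = 50 observations
print(criteria(rss=12.0, n=50, p=3, sigma2_full=0.25, tss=80.0))
```

For a model with no bias, Cp should be close to $$p$$; values much larger than $$p$$ suggest the model is missing something.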

## AIC

Arises from considering how to estimate the distance of a candidate model from the true model.

In particular, it uses the Kullback–Leibler information to measure the distance between the true density $$f$$ and a candidate model $$g$$:

$I(f, g) = \int f(x) \log{\left( \frac{f(x)}{g(x \mid \theta)} \right)} \, dx$

In general, the expected value can be estimated, at the maximum likelihood estimate of $$\theta$$, by

$-\log{L(\hat{\theta})} + p + \text{constant}$

Akaike multiplied this by 2. We can ignore any constants that are the same for a given dataset and assumed error distribution for regression variable selection. Care should be taken in other contexts.
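For regression with Gaussian errors, this recovers the RSS form given earlier. A sketch (constants that are the same for every candidate model are absorbed into "constant"): plugging the MLE $$\hat{\sigma}^2 = RSS/n$$ into the Gaussian log-likelihood gives

\begin{aligned}
-2\log{L(\hat{\theta})} &= n\log{(2\pi)} + n\log{\hat{\sigma}^2} + n \\
&= n\log{(RSS / n)} + \text{constant}
\end{aligned}

so $$\text{AIC} = -2\log{L(\hat{\theta})} + 2p = n\log{(RSS / n)} + 2p + \text{constant}$$.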

## BIC

Arises from trying to find the model with the highest posterior probability.

## Mallow’s Cp

Tries to estimate the mean square prediction error.

## Adjusted $$R^2$$

We want the model with the highest $$R^2$$, but $$R^2$$ always increases when a variable is added. Adjusted $$R^2$$ penalises models with more variables.

## Limitations of best subsets methods (or criterion based methods as Faraway calls them)

1. p-values will generally overstate the importance of the remaining predictors.

2. Inclusion in the model doesn’t mean a predictor is important, and exclusion doesn’t mean it is unimportant.