Transforming the response

Motivation: generally we are hunting for a transformation that makes the relationship simpler.

We might believe the relationship with the explanatories is linear only after a transformation of the response, $g(y) = X\beta + \epsilon$. We might also hope that on this transformed scale $\epsilon \sim N(0, \sigma^2 I)$. What is a good $g$?

Transforming the predictor

Motivation: we acknowledge that straight lines might not be appropriate and want to estimate something more flexible.

For example, we believe the model is something like $y = f(X) + \epsilon$, or even $y = f_1(X_1) + f_2(X_2) + \cdots + f_p(X_p) + \epsilon$.

We are generally interested in estimating $f$.

Transforming the response

In general, transformations make interpretation harder.

We usually want to make statements about the response, not the transformed response.

Predicted values are easily back-transformed, as well as the endpoints of confidence intervals.

Parameters often do not have nice interpretations on the backtransformed scale.
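
Back-transforming predictions and interval endpoints can be sketched as follows. This is a minimal illustration, assuming a log transformation and R's built-in `cars` data (neither comes from the examples in these slides); the object names are illustrative.

```r
# Fit on the log scale; `cars` (speed, dist) ships with base R.
fit <- lm(log(dist) ~ speed, data = cars)

# Point prediction and 95% prediction interval on the log scale.
new_obs <- data.frame(speed = c(10, 20))
pred_log <- predict(fit, newdata = new_obs, interval = "prediction")

# exp() is monotone, so applying it to the fit and to both interval
# endpoints gives a valid interval on the original scale.
pred_dist <- exp(pred_log)
print(pred_dist)
```

The slope of this fit, by contrast, does not become an additive change in `dist` after back-transforming; it only has a multiplicative reading.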

Special case: Log transformed response

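Our fitted model on the transformed scale predicts:
$$\widehat{\log y} = \hat\beta_0 + \hat\beta_1 x_1 + \cdots + \hat\beta_p x_p$$
If we back-transform, by taking the exponential of both sides,
$$\hat{y} = \exp(\hat\beta_0) \exp(\hat\beta_1 x_1) \cdots \exp(\hat\beta_p x_p)$$

So, an increase in $x_1$ of one unit, with the other explanatories held fixed, will result in the predicted response being multiplied by $\exp(\hat\beta_1)$.

If we are willing to assume that on the transformed scale the distribution of the response is symmetric, then
$$\text{Median}(\log y) = E(\log y) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$$
and back-transforming gives
$$\text{Median}(y) = \exp(\beta_0) \exp(\beta_1 x_1) \cdots \exp(\beta_p x_p)$$

So, an increase in $x_1$ of one unit will result in the median response being multiplied by $\exp(\beta_1)$.

(For monotone functions $f$, $\text{Median}(f(Y)) = f(\text{Median}(Y))$, but in general $E(f(Y)) \neq f(E(Y))$.)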
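
The difference between medians and means under a monotone transformation can be checked numerically. The sketch below uses simulated data (an assumption, not data from these slides) with an odd sample size, so the sample median is a single observation.

```r
set.seed(1)

# z plays the role of the log response: symmetric around its mean.
z <- rnorm(1001, mean = 2, sd = 1)
y <- exp(z)  # response on the original scale

# Medians commute with the monotone function exp().
median_matches <- isTRUE(all.equal(median(y), exp(median(z))))

# Means do not: mean(exp(z)) exceeds exp(mean(z)) (Jensen's inequality).
mean_gap <- mean(y) - exp(mean(z))

print(median_matches)
print(mean_gap > 0)
```

This is why the back-transformed fit is read as a statement about the median response rather than the mean response.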

Example

library(faraway)
data(case0301, package = "Sleuth3")
head(case0301, 2)

##   Rainfall Treatment
## 1   1202.6  Unseeded
## 2    830.1  Unseeded

sumary(lm(log(Rainfall) ~ Treatment, data = case0301))

##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)        5.13419    0.31787 16.1519  < 2e-16
## TreatmentUnseeded -1.14378    0.44953 -2.5444  0.01408
## 
## n = 52, p = 2, Residual SE = 1.62082, R-Squared = 0.11

It is estimated that the median rainfall for unseeded clouds is 0.32 times the median rainfall for seeded clouds (assuming log rainfall is symmetric around its mean).
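
The factor 0.32 is the exponential of the `TreatmentUnseeded` estimate. A short sketch of the arithmetic, using only the estimate, standard error and sample size printed in the output above; adding a back-transformed 95% confidence interval is an illustration, not something reported in the slides.

```r
# Values copied from the sumary() output above.
est <- -1.14378   # TreatmentUnseeded, on the log scale
se  <- 0.44953
df  <- 52 - 2     # n - p

# Multiplicative effect on the median: exp of the log-scale difference.
ratio <- exp(est)

# Interval on the log scale, then back-transform both endpoints.
ci_log   <- est + c(-1, 1) * qt(0.975, df) * se
ci_ratio <- exp(ci_log)

print(round(c(estimate = ratio, lower = ci_ratio[1], upper = ci_ratio[2]), 2))
```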

Box-Cox transformations

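Assume the response is positive, that $g(y) = X\beta + \epsilon$ with $\epsilon \sim N(0, \sigma^2 I)$, and that $g$ is of the form
$$g_\lambda(y) = \begin{cases} \dfrac{y^\lambda - 1}{\lambda}, & \lambda \neq 0 \\ \log y, & \lambda = 0 \end{cases}$$

Estimate $\lambda$ with maximum likelihood.

For prediction, pick $\lambda$ as the MLE.

For explanation, pick a “nice” $\lambda$ within the 95% CI.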

Example:

library(MASS)
data(savings, package = "faraway")
lmod <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = savings)
boxcox(lmod, plotit = TRUE)
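
The MLE and the 95% interval shown on the `boxcox()` plot can also be read off the object it returns. The sketch below uses simulated data whose true transformation is the log (an assumption made so the example needs no extra packages); the names `sim`, `lambda_hat` and `lambda_ci` are illustrative.

```r
library(MASS)

set.seed(42)
n <- 200
x <- runif(n, 0, 2)
y <- exp(1 + 0.5 * x + rnorm(n, sd = 0.3))  # linear on the log scale
sim <- data.frame(x = x, y = y)

# y = TRUE stores the response in the fit, which boxcox() needs.
fit <- lm(y ~ x, data = sim, y = TRUE)

# Profile log-likelihood over a grid of lambda values, without plotting.
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# MLE: the grid value that maximises the profile log-likelihood.
lambda_hat <- bc$x[which.max(bc$y)]

# 95% CI: lambdas whose log-likelihood is within qchisq(0.95, 1) / 2 of the maximum.
in_ci <- bc$y > max(bc$y) - qchisq(0.95, df = 1) / 2
lambda_ci <- range(bc$x[in_ci])

print(lambda_hat)
print(lambda_ci)
```

If a "nice" value such as 0 (log), 0.5 (square root) or 1 (no transformation) falls inside `lambda_ci`, it is the natural choice for explanation.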

Your turn:

data(gala, package = "faraway")
lmod <- lm(Species ~ Area + Elevation + Nearest + Scruz + Adjacent, 
  data = gala)
boxcox(lmod, plotit = TRUE)

Transforming the predictors

A very general approach is to let the function for each explanatory be represented by a finite set of basis functions. For example, for a single explanatory, $X$,
$$f(X) = \sum_{i=1}^{k} \beta_i b_i(X)$$
where $b_i$ are the known basis functions, and $\beta_i$ the unknown basis coefficients.

Then $y = f(X) + \epsilon = B\beta + \epsilon$, where the columns of the design matrix $B$ are $b_1(X), \ldots, b_k(X)$, and we can find the $\beta_i$ with the usual least squares approach.
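
A small sketch of this construction: evaluate each basis function at the observed values to form the design matrix, then solve by least squares. The data are simulated and the basis (1, x, x^2) is an arbitrary illustrative choice, not one prescribed here.

```r
set.seed(1)
x <- runif(50)
y <- sin(2 * pi * x) + rnorm(50, sd = 0.2)

# Known basis functions (an illustrative choice).
basis <- list(
  function(x) rep(1, length(x)),
  function(x) x,
  function(x) x^2
)

# Design matrix: one column per basis function, evaluated at the data.
B <- sapply(basis, function(b) b(x))

# Least squares via the normal equations ...
beta_hat <- solve(crossprod(B), crossprod(B, y))

# ... which agrees with lm() on the same columns (no extra intercept).
fit <- lm(y ~ B - 1)
print(cbind(normal_eq = as.numeric(beta_hat), lm = as.numeric(coef(fit))))
```

Changing the fitted function then only means changing the list of basis functions; the estimation step stays the same.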

Your turn:

What are the columns in the design matrix for the model:

where ?

What do the functions look like?

Example: subset regression

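# Fit a separate straight line on each side of pop15 = 35.
cut <- 35
X <- with(savings, cbind(
  as.numeric(pop15 < cut),   # intercept for pop15 < 35
  as.numeric(pop15 >= cut),  # intercept for pop15 >= 35
  pop15 * (pop15 < cut),     # slope for pop15 < 35
  pop15 * (pop15 >= cut)))   # slope for pop15 >= 35

# "- 1" drops lm's own intercept; X already has one per subset.
lmod <- lm(sr ~ X - 1, 
  data = savings)
summary(lmod)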

Broken stick regression

Broken stick:

Example: broken stick regression
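
A hedged sketch of a broken stick fit, assuming the "hockey stick" basis used in Faraway's text: one basis function that is linear to the left of a known breakpoint and zero to the right, and one that is zero to the left and linear to the right. The data are simulated, and the helper names `lhs`, `rhs` and the breakpoint 35 are illustrative.

```r
# Hockey-stick basis functions around a known breakpoint `knot`.
lhs <- function(x, knot) ifelse(x < knot, knot - x, 0)
rhs <- function(x, knot) ifelse(x < knot, 0, x - knot)

set.seed(1)
knot <- 35
d <- data.frame(x = runif(100, 20, 50))
d$y <- 10 - 0.1 * lhs(d$x, knot) - 0.4 * rhs(d$x, knot) +
  rnorm(100, sd = 0.5)

# Three coefficients: value at the knot, and one slope on each side.
fit <- lm(y ~ lhs(x, knot) + rhs(x, knot), data = d)

# Unlike subset regression, the fitted line is continuous at the knot.
near_knot <- predict(fit, newdata = data.frame(x = knot + c(-1e-6, 1e-6)))
print(coef(fit))
print(near_knot)
```

Both basis functions are zero at the breakpoint, so the two line segments are forced to meet there.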

Polynomials

Polynomials: $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_d x^d + \epsilon$
Orthogonal polynomials
Response surface, of degree
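
Raw and orthogonal polynomials give the same fitted curve; they differ only in how the columns of the design matrix are parameterized. A minimal sketch using R's built-in `cars` data (an assumption, not a dataset from these slides):

```r
# Quadratic fit with raw powers of speed.
raw <- lm(dist ~ speed + I(speed^2), data = cars)

# The same quadratic, using orthogonal polynomial columns from poly().
orth <- lm(dist ~ poly(speed, 2), data = cars)

# Identical fitted values, different coefficients.
same_fit <- isTRUE(all.equal(fitted(raw), fitted(orth)))

# The poly() columns are orthonormal: their cross-product is the identity.
P <- poly(cars$speed, 2)
gram <- crossprod(P)

print(same_fit)
print(round(gram, 10))
```

Because the columns are orthogonal, adding a higher degree term does not change the estimates of the lower degree terms, which makes choosing the degree more stable numerically.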

Cubic Splines

Knots: 0, 0, 0, 0, 0.2, 0.4, 0.6, 0.8, 1, 1, 1 and 1

Linear splines

Knots: 0, 0, 0.2, 0.4, 0.6, 0.8, 1 and 1
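
The two knot sequences above can be turned into B-spline basis functions with `splineDesign()` from the `splines` package. This is a sketch under the assumption that these bases are B-splines of order 4 (cubic) and order 2 (linear) on [0, 1]; the evaluation grid stops just short of 1.

```r
library(splines)

x <- seq(0, 0.99, by = 0.01)

# Knot sequences as listed above; boundary knots are repeated `ord` times.
cubic_knots  <- c(0, 0, 0, 0, 0.2, 0.4, 0.6, 0.8, 1, 1, 1, 1)
linear_knots <- c(0, 0, 0.2, 0.4, 0.6, 0.8, 1, 1)

# One column per basis function: length(knots) - ord columns in total.
B_cubic  <- splineDesign(cubic_knots, x, ord = 4)   # cubic: order 4
B_linear <- splineDesign(linear_knots, x, ord = 2)  # linear: order 2

print(dim(B_cubic))
print(dim(B_linear))
```

Calling `matplot(x, B_cubic, type = "l")` draws the basis functions; either matrix can be used directly as the design matrix in `lm(y ~ B_cubic - 1)`.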

In practice splines provide a flexible fit

Smoothing splines: have a large set of basis functions, but penalize against wiggliness
Generalized Additive Models: simultaneously estimate the functions $f_1, \ldots, f_p$ in $y = f_1(X_1) + \cdots + f_p(X_p) + \epsilon$
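
The effect of the wiggliness penalty can be seen with `smooth.spline()` from base R's `stats` package. The sketch below uses simulated data (an assumption), with the amount of smoothing expressed through the effective degrees of freedom.

```r
set.seed(1)
x <- seq(0, 1, length.out = 100)
y <- sin(2 * pi * x) + rnorm(100, sd = 0.3)

# Default: the penalty is chosen by generalized cross-validation.
fit_gcv <- smooth.spline(x, y)

# Heavy penalty (few effective df) versus light penalty (many df).
fit_stiff  <- smooth.spline(x, y, df = 2.5)
fit_wiggly <- smooth.spline(x, y, df = 30)

# A lighter penalty follows the data more closely, so its RSS is smaller.
rss <- function(fit) sum((y - predict(fit, x)$y)^2)
print(c(stiff = rss(fit_stiff), gcv = rss(fit_gcv), wiggly = rss(fit_wiggly)))
print(fit_gcv$df)
```

A smaller residual sum of squares is not a better fit in itself; the penalty exists to stop the curve from chasing noise.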

Transforming predictors with basis functions

The parameters in these regressions no longer have nice interpretations. The best way to present the results is a plot of the estimated function for each X (or surfaces if variables interact).
The significance of a variable can still be assessed with an Extra Sum of Squares F-test, comparing to a model without any of the terms relating to a particular variable.
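
An Extra Sum of Squares F-test for a variable entered through basis functions can be sketched with `anova()`. The data are simulated so that the response depends on `x1` only, and the cubic B-spline basis `bs(..., df = 4)` is an illustrative choice; neither comes from the slides.

```r
library(splines)

set.seed(1)
n <- 100
d <- data.frame(x1 = runif(n), x2 = runif(n))
d$y <- sin(2 * pi * d$x1) + rnorm(n, sd = 0.3)  # depends on x1 only

# Full model: a spline basis for each explanatory.
full <- lm(y ~ bs(x1, df = 4) + bs(x2, df = 4), data = d)

# Reduced model: drop every term relating to x1.
no_x1 <- lm(y ~ bs(x2, df = 4), data = d)

# F-test on the 4 basis coefficients for x1, all at once.
test_x1 <- anova(no_x1, full)
p_x1 <- test_x1[["Pr(>F)"]][2]
print(test_x1)
```

The test has as many numerator degrees of freedom as there are basis functions for the variable being dropped.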

Transformations Feb 25 2019