Transformations Feb 25 2019

Transforming the response

Motivation: generally we are hunting for a transformation that makes the relationship simpler.

We might believe the relationship with the explanatories is linear only after a transformation of the response, $E(g(Y)) = X\beta$. We might also hope that on this transformed scale $\text{Var}(g(Y)) = \sigma^2 I$. What is a good $g$?

Transforming the predictor

Motivation: we acknowledge that straight lines might not be appropriate and want to estimate something more flexible.

For example, we believe the model is something like $Y = f_1(X_1) + f_2(X_2) + \ldots + f_p(X_p) + \epsilon$ or even $Y = f(X_1, X_2, \ldots, X_p) + \epsilon$.

We are generally interested in estimating f.

Transforming the response

In general, transformations make interpretation harder.

We usually want to make statements about the response, not the transformed response.

Predicted values are easily back-transformed, as well as the endpoints of confidence intervals.

Parameters often do not have nice interpretations on the backtransformed scale.

Special case: Log transformed response

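Our fitted model on the transformed scale predicts $\widehat{\log y_i} = \hat\beta_0 + \hat\beta_1 x_{i1} + \ldots + \hat\beta_p x_{ip}$. If we back-transform by taking the exponential of both sides, $\hat{y}_i = \exp(\hat\beta_0) \exp(\hat\beta_1 x_{i1}) \cdots \exp(\hat\beta_p x_{ip})$.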

So an increase in $x_1$ of one unit, holding the other explanatories fixed, will result in the predicted response being multiplied by $\exp(\hat\beta_1)$.

If we are willing to assume that on the transformed scale the distribution of the response is symmetric, $\text{Median}(\log(Y)) = E(\log(Y)) = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p$, and back-transforming gives $\exp(\text{Median}(\log(Y))) = \text{Median}(Y) = \exp(\beta_0) \exp(\beta_1 x_1) \cdots \exp(\beta_p x_p)$.

So an increase in $x_1$ of one unit, holding the other explanatories fixed, will result in the median response being multiplied by $\exp(\beta_1)$.

(For monotone functions $\text{Median}(f(Y)) = f(\text{Median}(Y))$, but $E(f(Y)) \neq f(E(Y))$ in general.)
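
The difference between how medians and means behave under a monotone transformation can be checked numerically. The following is a minimal sketch on simulated data; the sample size, seed and log-normal distribution are assumptions chosen for illustration (an odd sample size is used so the sample median is an observed value and the equality is exact).

```r
# Simulated positive response (illustrative assumption: log-normal data)
set.seed(2019)
y <- rlnorm(1001, meanlog = 1, sdlog = 0.8)

# The median commutes with the monotone function log()
med_of_log <- median(log(y))
log_of_med <- log(median(y))

# The mean does not: by Jensen's inequality, E(log Y) < log E(Y)
mean_of_log <- mean(log(y))
log_of_mean <- log(mean(y))

c(med_of_log = med_of_log, log_of_med = log_of_med,
  mean_of_log = mean_of_log, log_of_mean = log_of_mean)
```

The first two entries agree, while the last two differ, which is why the back-transformed fit is interpreted as a median rather than a mean.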

Example

library(faraway)
data(case0301, package = "Sleuth3")
head(case0301, 2)
##   Rainfall Treatment
## 1   1202.6  Unseeded
## 2    830.1  Unseeded
sumary(lm(log(Rainfall) ~ Treatment, data = case0301))
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)        5.13419    0.31787 16.1519  < 2e-16
## TreatmentUnseeded -1.14378    0.44953 -2.5444  0.01408
## 
## n = 52, p = 2, Residual SE = 1.62082, R-Squared = 0.11

It is estimated that the median rainfall for unseeded clouds is exp(-1.144) ≈ 0.32 times the median rainfall for seeded clouds.

(Assuming log rainfall is symmetric around its mean)
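
As a sketch of the back-transformation, the multiplicative effect and an interval for it can be computed directly from the estimate and standard error printed above. The numbers are copied from that output; the 95% level is an assumption, since the slide does not report an interval.

```r
# Values copied from the printed regression output above
est <- -1.14378        # TreatmentUnseeded estimate (log scale)
se <- 0.44953          # its standard error
resid_df <- 52 - 2     # n - p

# Back-transform the point estimate: ratio of medians (unseeded / seeded)
ratio <- exp(est)

# Back-transform the endpoints of a 95% t-interval (level is an assumption)
ci <- exp(est + c(-1, 1) * qt(0.975, resid_df) * se)

round(ratio, 2)
round(ci, 2)
```

With the fitted model object available, `exp(coef(fit))` and `exp(confint(fit))` would give the same quantities, where `fit` is a hypothetical name for the stored `lm` result.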

Box-Cox transformations

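Assume the response is positive, and $g(Y) = X\beta + \epsilon$, $\epsilon \sim N(0, \sigma^2 I)$, and that $g$ is of the form

$$g_\lambda(y) = \begin{cases} \dfrac{y^\lambda - 1}{\lambda} & \lambda \neq 0 \\ \log(y) & \lambda = 0 \end{cases}$$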
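
The family of transformations is short to write down in code. This is a minimal sketch; `box_cox` is a hypothetical helper name, not a function from MASS or faraway.

```r
# Box-Cox transformation g_lambda(y) for a positive response
# (box_cox is an illustrative helper, not a library function)
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

y <- c(1, 2, 4)
box_cox(y, 1)      # lambda = 1: just a shift of the response, y - 1
box_cox(y, 0.5)    # lambda = 0.5: a rescaled square root
box_cox(y, 0)      # lambda = 0: the log
box_cox(y, 1e-6)   # small lambda is close to the log, so the family is continuous
```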

Estimate λ with maximum likelihood.

For prediction, pick λ as the MLE.

For explanation, pick a “nice” λ (such as 0, 1/2 or 1) within the 95% confidence interval for λ.

Example:

library(MASS)
data(savings, package = "faraway")
lmod <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = savings)
boxcox(lmod, plotit = TRUE)
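
The plot shows the profile log-likelihood for λ, and the same information can be pulled out of the value returned by `boxcox()`. The sketch below uses simulated data (an assumption, so that it runs without the faraway package) and reads off the MLE and an approximate 95% interval on the λ grid.

```r
library(MASS)

# Simulated data where the log scale is the appropriate one (an assumption)
set.seed(1)
n <- 200
x <- runif(n)
y <- exp(1 + 2 * x + rnorm(n, sd = 0.3))
fit <- lm(y ~ x)

# Profile log-likelihood on a grid of lambda values, without plotting
bc <- boxcox(fit, lambda = seq(-1, 1, by = 0.01), plotit = FALSE)

# MLE: the grid value with the largest log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% CI: lambdas within qchisq(0.95, 1) / 2 of the maximum
cutoff <- max(bc$y) - qchisq(0.95, df = 1) / 2
ci <- range(bc$x[bc$y > cutoff])

lambda_hat
ci
```

For prediction one would use `lambda_hat`; for explanation one would look for a convenient value inside `ci`.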

Your turn:

data(gala, package = "faraway")
lmod <- lm(Species ~ Area + Elevation + Nearest + Scruz + Adjacent, 
  data = gala)
boxcox(lmod, plotit = TRUE)

Transforming the predictors

A very general approach is to let the function for each explanatory be represented by a finite set of basis functions. For example, for a single explanatory, $X$, $$f(X) = \sum_{k=1}^{K} \beta_k f_k(X)$$ where $f_k$ are the known basis functions, and $\beta_k$ the unknown basis coefficients.

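Then
$$y_i = f(X_i) + \epsilon_i$$
$$y_i = \beta_1 f_1(X_i) + \ldots + \beta_K f_K(X_i) + \epsilon_i$$
$$Y = X\beta + \epsilon$$
where the columns of $X$ are $f_1(X), f_2(X), \ldots, f_K(X)$ and we can find the $\beta$ with the usual least squares approach.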
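
To make the construction concrete, here is a minimal sketch that builds a design matrix column by column from a list of basis functions and fits it by least squares. The data are simulated and the three basis functions (constant, linear, quadratic) are an illustrative choice, not one prescribed above.

```r
# Simulated data (assumption, for illustration only)
set.seed(42)
n <- 100
x <- runif(n, -1, 1)
y <- 1 - 2 * x^2 + rnorm(n, sd = 0.2)

# Known basis functions f_1, ..., f_K (illustrative choice, K = 3)
basis <- list(
  function(x) rep(1, length(x)),
  function(x) x,
  function(x) x^2
)

# Design matrix: column k is f_k evaluated at every observation
X <- sapply(basis, function(f) f(x))

# Usual least squares solution of the normal equations
beta_hat <- solve(crossprod(X), crossprod(X, y))

# The same fit through lm(), dropping lm's own intercept
fit <- lm(y ~ X - 1)
cbind(normal_equations = as.numeric(beta_hat), lm = unname(coef(fit)))
```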

Your turn:

What are the columns in the design matrix for the model:

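$$y_i = \beta_1 1\{X_i < 5\} + \beta_2 1\{X_i \geq 5\} + \beta_3 X_i 1\{X_i < 5\} + \beta_4 X_i 1\{X_i \geq 5\} + \epsilon_i$$

where $X_i = i$, $i = 1, \ldots, 10$?

What do the functions $f_k(\cdot)$, $k = 1, \ldots, 4$ look like?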

Example: subset regression

cut <- 35
X <- with(savings, cbind(
  as.numeric(pop15 < cut), 
  as.numeric(pop15 >= cut),
  pop15 * (pop15 < cut),
  pop15 * (pop15 >= cut)))

lmod <- lm(sr ~ X - 1, 
  data = savings)
summary(lmod)

Broken stick regression

Broken stick:

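$$f_1(x) = \begin{cases} c - x & \text{if } x < c \\ 0 & \text{otherwise} \end{cases} \qquad f_2(x) = \begin{cases} 0 & \text{if } x < c \\ x - c & \text{otherwise} \end{cases}$$

$$y_i = \beta_0 + \beta_1 f_1(X_i) + \beta_2 f_2(X_i) + \epsilon_i$$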

Example: broken stick regression
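
A minimal sketch of a broken stick fit, using simulated data rather than the data set behind the original slide. The breakpoint `brk` is assumed known, and `lhs` and `rhs` are illustrative helper names for $f_1$ and $f_2$.

```r
# Simulated data with a change of slope at a known breakpoint (assumption)
set.seed(7)
n <- 120
brk <- 35
x <- runif(n, 20, 50)
y <- ifelse(x < brk, 10 + 0.3 * (brk - x), 10 - 0.1 * (x - brk)) +
  rnorm(n, sd = 1)

# The two basis functions: each is zero on one side of the breakpoint
lhs <- function(x) ifelse(x < brk, brk - x, 0)
rhs <- function(x) ifelse(x < brk, 0, x - brk)

fit <- lm(y ~ lhs(x) + rhs(x))
coef(fit)

# Both basis functions vanish at the breakpoint, so the two line
# segments meet there and the fitted value equals the intercept
grid <- data.frame(x = c(brk - 1e-8, brk, brk + 1e-8))
pred <- predict(fit, newdata = grid)
pred
```

Unlike the subset regression, which fits two unrelated lines, this parameterization forces the fitted function to be continuous at the breakpoint.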

Polynomials

  • Polynomials: $f_k(x) = x^k$, $k = 1, \ldots, K$

  • Orthogonal polynomials

  • Response surface, of degree $d$: $f_{kl}(x, z) = x^k z^l$, $k, l \geq 0$ s.t. $k + l = d$
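
Raw and orthogonal polynomials span the same space, so they give the same fitted values; only the coefficients (and their numerical stability) differ. A minimal sketch on simulated data, with the cubic degree chosen for illustration:

```r
# Simulated data (assumption, for illustration only)
set.seed(3)
x <- runif(80, 0, 10)
y <- 2 + 0.5 * x - 0.05 * x^2 + rnorm(80, sd = 0.5)

# Raw polynomial basis: columns x, x^2, x^3
raw <- lm(y ~ poly(x, 3, raw = TRUE))

# Orthogonal polynomial basis of the same degree
orth <- lm(y ~ poly(x, 3))

# Same fitted values, different coefficients
all.equal(fitted(raw), fitted(orth))

# The orthogonal basis columns are orthonormal: crossproduct is the identity
Z <- poly(x, 3)
round(crossprod(Z), 10)
```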

Cubic Splines

Knots: 0, 0, 0, 0, 0.2, 0.4, 0.6, 0.8, 1, 1, 1 and 1

Linear splines

Knots: 0, 0, 0.2, 0.4, 0.6, 0.8, 1 and 1
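
The B-spline bases for the two knot sequences listed above can be generated with `splineDesign()` from the splines package. This is a sketch of one way to reproduce such bases; the evaluation grid is an assumption, and the plotting calls are left commented out.

```r
library(splines)

# Evaluation grid strictly inside [0, 1] (illustrative choice)
x <- seq(0.005, 0.995, length.out = 200)

# Cubic B-splines: order 4, boundary knots repeated four times
cubic_knots <- c(0, 0, 0, 0, 0.2, 0.4, 0.6, 0.8, 1, 1, 1, 1)
B_cubic <- splineDesign(knots = cubic_knots, x = x, ord = 4)

# Linear B-splines: order 2, boundary knots repeated twice
linear_knots <- c(0, 0, 0.2, 0.4, 0.6, 0.8, 1, 1)
B_linear <- splineDesign(knots = linear_knots, x = x, ord = 2)

# Number of basis functions = number of knots - order
dim(B_cubic)
dim(B_linear)

# Each column is one basis function f_k; uncomment to draw them
# matplot(x, B_cubic, type = "l")
# matplot(x, B_linear, type = "l")
```

Either matrix can be used as the design matrix in `lm(y ~ B - 1)`, exactly as with any other set of basis functions.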

In practice splines provide a flexible fit

  • Smoothing splines: have a large set of basis functions, but penalize against wiggliness

  • Generalized Additive Models: simultaneously estimate

$$y_i = f(x_{i1}) + g(x_{i2}) + \ldots + \epsilon_i$$
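
The effect of the wiggliness penalty can be seen with `smooth.spline()` from base R. This is a minimal sketch on simulated data; the two `spar` values are arbitrary choices to contrast a light and a heavy penalty. Fitting a generalized additive model usually needs an add-on package such as mgcv, which is not shown here.

```r
# Simulated data (assumption, for illustration only)
set.seed(5)
n <- 100
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

# Light penalty: follows the data closely (many effective parameters)
wiggly <- smooth.spline(x, y, spar = 0.2)

# Heavy penalty: close to a straight line (few effective parameters)
stiff <- smooth.spline(x, y, spar = 1.0)

# Default: penalty chosen by generalized cross-validation
auto <- smooth.spline(x, y)

# Effective degrees of freedom fall as the penalty grows
c(wiggly = wiggly$df, stiff = stiff$df, auto = auto$df)

# Fitted curve at the observed x values
fitted_auto <- predict(auto, x)$y
```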

Transforming predictors with basis functions

  • The parameters in these regressions no longer have nice interpretations. The best way to present the results is a plot of the estimated function for each X (or surfaces if variables interact).

  • The significance of a variable can still be assessed with an Extra Sum of Squares F-test, comparing to a model without any of the terms relating to a particular variable.
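
The F-test in the last point can be sketched with `anova()` on two nested fits. The data are simulated, and the cubic polynomial for `x2` is an illustrative choice of basis; the same comparison works for splines or any other set of basis terms.

```r
# Simulated data where x2 has a nonlinear effect (assumption)
set.seed(11)
n <- 100
x1 <- runif(n)
x2 <- runif(n)
y <- 1 + x1 + 3 * sin(2 * pi * x2) + rnorm(n, sd = 0.5)

# Full model: x2 enters through three polynomial basis terms
full <- lm(y ~ x1 + poly(x2, 3))

# Reduced model: every term relating to x2 is dropped
reduced <- lm(y ~ x1)

# Extra sum of squares F-test for x2 as a whole
tab <- anova(reduced, full)
tab
```

The test has as many numerator degrees of freedom as there are basis terms for the variable being dropped, here 3.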