Multiple Linear Regression Jan 14 2019

Today

Matrix warmup

See handout

Simple linear regression

Recall that in simple linear regression:

We have n observations of a response, $y_i$, and a single explanatory variable, $x_i$.

The response is related to the explanatory variable by:

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad i = 1, \ldots, n$$

where the $\epsilon_i$ are independent and identically distributed with expected value 0 and variance $\sigma^2$.

Multiple linear regression

Now we have more than one explanatory variable.

We have n observations of a response, $y_i$, and a set of $p - 1$ explanatory variables, $(x_{i1}, x_{i2}, \ldots, x_{i(p-1)})$.

The response is related to the explanatory variables by:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_{p-1} x_{i(p-1)} + \epsilon_i, \qquad i = 1, \ldots, n$$

where the $\epsilon_i$ are independent and identically distributed with expected value 0 and variance $\sigma^2$.
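A minimal Python sketch of the model above. It computes $E[y_i] = \beta_0 + \sum_j \beta_j x_{ij}$ for one observation and adds an error term. The function names `mean_response` and `simulate_responses` and all numbers are hypothetical, and drawing the errors from a Normal distribution is an extra assumption made only so that responses can be simulated (the notes require only mean 0 and variance $\sigma^2$). Setting $p = 2$ recovers simple linear regression.

```python
import random


def mean_response(beta, x_row):
    """Return E[y_i] = beta_0 + beta_1*x_i1 + ... + beta_{p-1}*x_i(p-1).

    beta  : list of p coefficients, beta_0 first
    x_row : list of the p - 1 explanatory values for observation i
    """
    if len(beta) != len(x_row) + 1:
        raise ValueError("beta needs exactly one more entry than x_row")
    return beta[0] + sum(b * x for b, x in zip(beta[1:], x_row))


def simulate_responses(beta, x_rows, sigma, seed=None):
    """Draw y_i = E[y_i] + eps_i for each row of explanatory values.

    ASSUMPTION: eps_i ~ Normal(0, sigma^2), iid. Normality is chosen here
    for illustration only; the model itself only fixes mean and variance.
    """
    rng = random.Random(seed)
    return [mean_response(beta, row) + rng.gauss(0.0, sigma) for row in x_rows]


# Hypothetical example: p = 3 (intercept plus two explanatory variables)
beta = [1.0, 2.0, -0.5]
x_rows = [[1.0, 4.0], [0.0, 2.0]]
print([mean_response(beta, row) for row in x_rows])  # [1.0, 0.0]
print(simulate_responses(beta, x_rows, sigma=1.0, seed=2019))
```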

Example: Galápagos Islands

Faraway 2.6

Measurements are made on 30 Galápagos Islands.

First 5 islands:

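| Island | Species | Area | Elevation | Nearest | Scruz | Adjacent |
|---|---|---|---|---|---|---|
| Baltra | 58 | 25.09 | 346 | 0.6 | 0.6 | 1.84 |
| Bartolome | 31 | 1.24 | 109 | 0.6 | 26.3 | 572.3 |
| Caldwell | 3 | 0.21 | 114 | 2.8 | 58.7 | 0.78 |
| Champion | 25 | 0.1 | 46 | 1.9 | 47.4 | 0.18 |
| Coamano | 2 | 0.05 | 77 | 1.9 | 1.9 | 903.8 |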

Variable Descriptions

From the R help page (?gala): Species diversity on the Galapagos Islands.

The dataset contains the following variables:

Species: the number of plant species found on the island

Endemics: the number of endemic species

Area: the area of the island (km²)

Elevation: the highest elevation of the island (m)

Nearest: the distance from the nearest island (km)

Scruz: the distance from Santa Cruz island (km)

Adjacent: the area of the adjacent island (km²)

A possible model

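$$\text{Species}_i = \beta_0 + \beta_1 \text{Area}_i + \beta_2 \text{Elevation}_i + \beta_3 \text{Nearest}_i + \beta_4 \text{Scruz}_i + \beta_5 \text{Adjacent}_i + \epsilon_i, \qquad i = 1, \ldots, n$$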

For example, for i = 1 (Baltra):

$$58 = \beta_0 + \beta_1 (25.09) + \beta_2 (346) + \beta_3 (0.6) + \beta_4 (0.6) + \beta_5 (1.84) + \epsilon_1$$
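The same substitution works for every island. A small sketch that writes out the model equation for each of the first five islands, using the values from the table of the first five islands (Adjacent rounded as shown there). The helper name `model_equation` and the plain-text coefficient names `beta0`, ..., `eps1` are hypothetical choices for display.

```python
# First five islands, values copied from the table above (Adjacent rounded as shown there).
# Each tuple is (Species, Area, Elevation, Nearest, Scruz, Adjacent).
ISLANDS = [
    ("Baltra", (58, 25.09, 346, 0.6, 0.6, 1.84)),
    ("Bartolome", (31, 1.24, 109, 0.6, 26.3, 572.3)),
    ("Caldwell", (3, 0.21, 114, 2.8, 58.7, 0.78)),
    ("Champion", (25, 0.1, 46, 1.9, 47.4, 0.18)),
    ("Coamano", (2, 0.05, 77, 1.9, 1.9, 903.8)),
]


def model_equation(i, row):
    """Hypothetical helper: substitute island i's data into the model, as a string."""
    species, *x = row
    terms = ["beta0"] + [f"beta{j}*{value}" for j, value in enumerate(x, start=1)]
    return f"{species} = " + " + ".join(terms) + f" + eps{i}"


for i, (name, row) in enumerate(ISLANDS, start=1):
    print(f"{name}: {model_equation(i, row)}")
```

The first line printed is the Baltra equation: `Baltra: 58 = beta0 + beta1*25.09 + beta2*346 + beta3*0.6 + beta4*0.6 + beta5*1.84 + eps1`.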

Your turn:

General matrix form

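$$
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
=
\begin{pmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1(p-1)} \\
1 & x_{21} & x_{22} & \cdots & x_{2(p-1)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{n(p-1)}
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{pmatrix}
+
\begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}
$$

that is,

$$y = X\beta + \epsilon$$

where

$$y_{n \times 1} = (y_1, y_2, \ldots, y_n)^T, \qquad \epsilon_{n \times 1} = (\epsilon_1, \epsilon_2, \ldots, \epsilon_n)^T, \qquad \beta_{p \times 1} = (\beta_0, \beta_1, \ldots, \beta_{p-1})^T$$

and $X_{n \times p}$ is the matrix displayed above: a column of 1s for the intercept followed by one column for each of the $p - 1$ explanatory variables.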
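To see that $y = X\beta + \epsilon$ is just the n scalar model equations stacked, here is a plain-Python sketch (no linear-algebra library assumed) that prepends the column of 1s and multiplies row by row. The helper names `design_matrix`, `mat_vec` and `linear_model` and the numbers are hypothetical.

```python
def design_matrix(x_rows):
    """Build X (n x p) by prepending a 1 (the intercept column) to each row of predictors."""
    return [[1.0] + list(row) for row in x_rows]


def mat_vec(matrix, vector):
    """Matrix-vector product: entry i is sum_j matrix[i][j] * vector[j]."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("each row of the matrix must have len(vector) entries")
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def linear_model(x_rows, beta, eps):
    """y = X beta + eps, computed entry by entry."""
    X = design_matrix(x_rows)
    return [mu + e for mu, e in zip(mat_vec(X, beta), eps)]


# Hypothetical numbers: n = 2 observations, p - 1 = 2 explanatory variables
x_rows = [[2.0, 3.0], [4.0, 5.0]]
beta = [1.0, 1.0, 1.0]
eps = [0.5, -0.5]
print(design_matrix(x_rows))            # [[1.0, 2.0, 3.0], [1.0, 4.0, 5.0]]
print(linear_model(x_rows, beta, eps))  # [6.5, 9.5]
```

Entry i of the product is exactly $\beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i(p-1)}$, which is why the leading column of 1s is needed.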

Galápagos: Matrix form

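$$
y_{30 \times 1} = \begin{pmatrix} 58 \\ 31 \\ 3 \\ 25 \\ 2 \\ \vdots \end{pmatrix}, \qquad
X_{30 \times 6} = \begin{pmatrix}
1 & 25.09 & 346 & 0.6 & 0.6 & 1.84 \\
1 & 1.24 & 109 & 0.6 & 26.3 & 572.33 \\
1 & 0.21 & 114 & 2.8 & 58.7 & 0.78 \\
1 & 0.1 & 46 & 1.9 & 47.4 & 0.18 \\
1 & 0.05 & 77 & 1.9 & 1.9 & 903.82 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots
\end{pmatrix}
$$

$$
\beta_{6 \times 1} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \\ \beta_5 \end{pmatrix}, \qquad
\epsilon_{30 \times 1} = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \\ \epsilon_5 \\ \vdots \end{pmatrix}
$$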
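The displayed rows of $X$ can be assembled directly from the data. A sketch using only the first five islands, so it builds the top 5 × 6 block of the full 30 × 6 matrix; the list names `species`, `area`, `elevation`, `nearest`, `scruz` and `adjacent` are hypothetical containers for the values shown above.

```python
# First five islands, in the order Baltra, Bartolome, Caldwell, Champion, Coamano
species = [58, 31, 3, 25, 2]
area = [25.09, 1.24, 0.21, 0.1, 0.05]
elevation = [346, 109, 114, 46, 77]
nearest = [0.6, 0.6, 2.8, 1.9, 1.9]
scruz = [0.6, 26.3, 58.7, 47.4, 1.9]
adjacent = [1.84, 572.33, 0.78, 0.18, 903.82]

# One row per island: 1 for the intercept, then the five explanatory variables
X = [
    [1, a, el, ne, sc, ad]
    for a, el, ne, sc, ad in zip(area, elevation, nearest, scruz, adjacent)
]
y = species

n, p = len(X), len(X[0])
print(n, p)  # 5 6  (with all 30 islands, n would be 30)
print(X[0])  # [1, 25.09, 346, 0.6, 0.6, 1.84]
```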

Your Turn

Write out the design matrix, X, for the following models, using the data for the first five islands:

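$$\text{Species}_i = \beta_0 + \beta_1 \text{Area}_i + \beta_2 \text{Nearest}_i + \epsilon_i$$

$$\text{Species}_i = \beta_1 \text{Area}_i + \beta_2 \text{Area}_i^2 + \epsilon_i$$

$$\text{Species}_i = \beta_0 + \beta_1 \mathbb{1}\{\text{Area}_i > 1\} + \epsilon_i$$

where $\mathbb{1}\{\cdot\}$ is an indicator function that takes the value 1 when the condition in its argument is true and 0 otherwise.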
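One way to check hand-written answers: a sketch that builds the three design matrices from the first five islands' Area and Nearest values (taken from the table of the first five islands). The helper name `indicator` is hypothetical. Note that the second model has no intercept, so its $X$ has no column of 1s.

```python
area = [25.09, 1.24, 0.21, 0.1, 0.05]
nearest = [0.6, 0.6, 2.8, 1.9, 1.9]


def indicator(condition):
    """1{condition}: 1 if the condition is true, 0 otherwise."""
    return 1 if condition else 0


# Model 1: intercept, Area, Nearest  ->  5 x 3
X1 = [[1, a, d] for a, d in zip(area, nearest)]

# Model 2: no intercept; Area and Area^2  ->  5 x 2
X2 = [[a, a ** 2] for a in area]

# Model 3: intercept and the indicator 1{Area > 1}  ->  5 x 2
X3 = [[1, indicator(a > 1)] for a in area]

for name, X in [("Model 1", X1), ("Model 2", X2), ("Model 3", X3)]:
    print(name)
    for row in X:
        # rounding only tidies the floating-point display of Area^2
        print("  ", [round(v, 4) for v in row])
```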

Fitted values and residuals

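If we had an estimate of the β vector, $\hat\beta = (\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_{p-1})^T$,

then we could define the fitted-value and residual vectors:

$$\hat y = (\hat y_1, \ldots, \hat y_n)^T = X\hat\beta$$

$$e = \hat\epsilon = (e_1, \ldots, e_n)^T = y - X\hat\beta$$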
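Given any candidate $\hat\beta$, the two definitions amount to one matrix-vector product and one subtraction. A sketch with made-up numbers; `fitted_values` and `residuals` are hypothetical helper names, and `beta_hat` is an arbitrary illustrative value, not a least-squares estimate.

```python
def fitted_values(X, beta_hat):
    """y_hat = X beta_hat: one fitted value per row of X."""
    return [sum(x * b for x, b in zip(row, beta_hat)) for row in X]


def residuals(y, X, beta_hat):
    """e = y - X beta_hat, entry by entry."""
    return [yi - fi for yi, fi in zip(y, fitted_values(X, beta_hat))]


# Made-up data: n = 3 observations, p = 2 (intercept plus one explanatory variable)
X = [[1, 1], [1, 2], [1, 3]]
y = [2.0, 2.5, 4.5]
beta_hat = [0.5, 1.0]  # hypothetical value, not a least-squares estimate

y_hat = fitted_values(X, beta_hat)
e = residuals(y, X, beta_hat)
print(y_hat)  # [1.5, 2.5, 3.5]
print(e)      # [0.5, 0.0, 1.0]
```

By construction $y = \hat y + e$ for any choice of $\hat\beta$; different choices only change how the response is split between the two parts.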

Questions to answer this week: