ST552 Lab 3 Jan 20th 2016

Goals:

Reduce repetition in code by:

- writing functions
- using for loops
- using the apply family

DRY is the principle in programming that every idea in your code should be captured in one place only. For us, in practice, this means it’s easy to change parameters in a simulation, or make an adjustment in a model without having to make the change in many places in our code.

Take HW #2 part II.3. **How did you do it? Share with your neighbour and TA. Can you identify repetition in your code?**

(Matt go through an example)

A pseudo-code version of what we are trying to achieve:

- Degree, k = 1
- Fit a polynomial model of degree k using the direct method.
- If the fit worked, increase k and repeat 2., otherwise stop.

In this section, we will write a function that captures the idea of: “fit a kth degree polynomial regression model, using the direct method”

If I already set up (we’ll do this in a better way later)

```
n <- 20
x <- 1:n
y <- x + rnorm(n)
X <- cbind(1, x, x^2, x^3, x^4, x^5, x^6, x^7, x^8, x^9, x^10)
```

**How would we fit (using the direct method) the model \(y_i = \beta_0 + \beta_1 x_i + \epsilon_i\)?**

**How would we fit (using the direct method) the model \(y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \epsilon_i\)?**

In general, how could we fit \[ y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \ldots + \beta_k x_i^k + \epsilon_i \] for \(k \le 10\).

We really want to capture this idea in a function. Fill in the body of this function so it finds the polynomial regression coefficients using the direct method

```
find_poly <- function(k){
}
```

(Matt: remind people how functions work.)

**Is there any repetition? Can you remove it?**

It’s good practice to not let functions depend on anything outside the function. So, we should supply the design matrix and response as arguments to our function.

Ok, now to get the task done we just do:

```
k <- 1
find_poly(k, X, y)
k <- 2
find_poly(k, X, y)
k <- 3
find_poly(k, X, y)
# and so on until it fails
```

There’s still a lot of repetition!

Loops are a general programming structure that perform repetition. A `for`

loop repeatedly evaluates it’s contents as it cycles through it’s argument. For example,

```
for (i in 1:10){
print(i)
}
```

```
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
```

the argument `i`

cycles through the values 1 to 10, each time evaluating `print(i)`

. **How can we write our task as a for loop?**

Try it. Now!

`for`

loops don’t return anything and if we want to keep our results from each fit we need to store them somewhere explicitly. For example,

```
coefs <- vector("list", 10) # an empty list
for (k in 1:10){
print(k)
coefs[[k]] <- find_poly(k, X, y)
}
```

Ok, our task is now (mostly) complete without repetition.

`lm`

approach?The key is to realize `lm`

can take a matrix on the RHS of a formula and it will include a term for every column:

`lm(y ~ X[, 1:3] - 1) # fitting the quadratic for example`

```
##
## Call:
## lm(formula = y ~ X[, 1:3] - 1)
##
## Coefficients:
## X[, 1:3] X[, 1:3]x X[, 1:3]
## 0.2229180 1.0295678 -0.0006584
```

`lm(y ~ X - 1) # fitting the degree 9 polynomial`

```
##
## Call:
## lm(formula = y ~ X - 1)
##
## Coefficients:
## X Xx X X X X
## 2.513e+01 -6.038e+01 5.341e+01 -2.281e+01 5.546e+00 -8.274e-01
## X X X X X
## 7.842e-02 -4.737e-03 1.767e-04 -3.711e-06 3.356e-08
```

You could either expand the `find_poly`

function to also do the `lm`

method, or give it it’s own function and put it in the same loop as the direct method.

The `apply`

functions in R, perform the same type of computation as loops, but with much less bookkeeping on your part. For example, our `for`

loop using `lapply`

looks like:

```
ks <- 1:6
coefs_2 <- lapply(ks, find_poly, X, y)
```

It’s also easy to create our design matrix with `sapply`

:

`X <- sapply(0:10, function(k, x) x^k, x)`

If you don’t want your loop to stop at the first error, take a look at the function `try`

.