ST552 Lab 2 Jan 13th 2016

Goals

Reproducible reports: Rmarkdown

Last week we talked about reproducible code; today we’ll talk about reproducible reports. A reproducible report is a document that records a complete analysis so that it can be reproduced exactly (and automatically) at any point in the future.

For homeworks, we will be able to record everything we need in a single document. In reality, for more complicated projects, a reproducible analysis will probably be a directory of data, reproducible code files and report-generating files, ideally kept under version control.

We’ll use Rmarkdown to generate our reports. Rmarkdown files combine markdown (a plain-text markup language) with chunks of R code. When the file is compiled ("knitted"), the R code is evaluated and the results are woven into the markdown. Markdown is flexible enough that it can then be turned into a PDF (via LaTeX), a Word document, or an HTML file (for hosting on the web, like this lab!).
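To see the "evaluate, then weave" step on its own, here is a minimal sketch that runs it from the R console instead of the Knit button. It assumes the knitr package is installed (rmarkdown depends on it); the tiny document and the temporary file names are made up for illustration. knitr::knit() only produces markdown; turning that markdown into Word, PDF or HTML is a separate conversion step.

```r
library(knitr)

# A tiny, made-up R Markdown document: one code chunk and one inline expression
rmd_lines <- c(
  "# A tiny report",
  "",
  "```{r}",
  "mean(cars$speed)",
  "```",
  "",
  "The average speed is `r mean(cars$speed)`."
)
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_file)

# knit() evaluates the R code and weaves the code, its output and the inline
# result into an ordinary markdown file, returning that file's path
md_file <- knit(rmd_file, output = tempfile(fileext = ".md"), quiet = TRUE)
md_lines <- readLines(md_file)
cat(md_lines, sep = "\n")
```

In RStudio, the Knit button calls rmarkdown::render(), which runs this knitting step and then converts the resulting markdown to the requested output format.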

Let’s learn by example. Matt will walk you through this part:

The file that opens has a template to help you figure out how Rmarkdown works. Hit the ‘Knit Word’ button and watch what happens. Compare the Word document that opens to the contents of the .Rmd file. In particular, notice that the R code chunks in the .Rmd file, which look like

```{r}
# R CODE HERE

```

are run and “echoed”, and their results (output or plots) are included in the document.

Try adding the following line to the document:

The average speed is `r mean(cars$speed) `. 

Then try adding this displayed equation:

$$
\bar{x} = \frac{1}{n}\sum_{i = 1}^n x_i
$$
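The displayed formula is the definition of the sample mean that R's built-in mean() computes. As a quick numerical check, using the cars speeds from the inline example above (an illustrative choice), the sum divided by n by hand gives the same value:

```r
x <- cars$speed               # speeds from the built-in cars data set
n <- length(x)                # n = 50 observations

x_bar_formula <- sum(x) / n   # (1/n) * sum of the x_i, exactly as in the formula
x_bar_builtin <- mean(x)      # R's built-in sample mean

c(formula = x_bar_formula, builtin = x_bar_builtin)
```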

I’d actually recommend knitting to PDF, but that requires a LaTeX installation on the computer you use.

Matrix algebra in R

Head to http://www.statmethods.net/advstats/matrix.html to see a list of all the matrix functions you’ll need to complete your homework this week.

To practice, create the following matrices with as little typing as possible:

\[ I_{10 \times 10} \]

\[ D = \left[ \begin{matrix} 1 & 0 & 0 & \ldots & 0\\ 0 & 2 & 0 & \ldots & 0\\ 0 & 0 & 3 & \ldots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & 0 & \ldots & 10 \end{matrix}\right] \]

\[ O = \pmb{1}_{10 \times 10} \quad (\text{a } 10 \times 10 \text{ matrix full of ones}) \]

\[ X = \left[ \begin{matrix} 1 & 1\\ 1 & 2 \\ 1 & 3 \\ \vdots & \vdots \\ 1 & 10 \end{matrix}\right] \]

Then calculate: \[ X^T, \quad D^{-1}, \quad \text{and} \quad X^TX \]
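For checking your work, here is one possible low-typing way to build the four matrices and do the calculations in base R. Other constructions work just as well; the variable names simply mirror the notation above.

```r
# Building the matrices
I <- diag(10)                          # 10 x 10 identity matrix
D <- diag(1:10)                        # diagonal matrix with 1, 2, ..., 10 on the diagonal
O <- matrix(1, nrow = 10, ncol = 10)   # 10 x 10 matrix of ones
X <- cbind(1, 1:10)                    # column of ones next to the column 1, ..., 10

# The requested calculations
Xt   <- t(X)         # transpose, a 2 x 10 matrix
Dinv <- solve(D)     # inverse; for this diagonal D it equals diag(1 / (1:10))
XtX  <- t(X) %*% X   # 2 x 2 matrix; crossprod(X) computes the same thing
XtX
```

Here X^T X has entries 10 (the number of rows), 55 (the sum of 1 to 10) and 385 (the sum of their squares), which is a handy hand check.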

Simulation of Normal random variables in R

You probably already know this, but for completeness, to simulate a realization of \(n\) independent Normal random variables with mean 0 and standard deviation 1 in R:

```r
n <- 10 # for example
rnorm(n)
##  [1]  1.178538998  0.472493981 -0.522909774  0.206437481  0.003171414
##  [6] -0.493388456 -1.067536904 -1.343552370 -0.915417901 -0.821686809
```
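The numbers above will differ every time the code is run, which works against the reproducible-report goal from the first half of the lab. Setting the random number generator's seed with set.seed() before simulating makes the draws repeatable; the seed value 552 below is an arbitrary choice.

```r
set.seed(552)                       # arbitrary seed; any fixed integer works
first_draw <- rnorm(10)

set.seed(552)                       # resetting the same seed ...
second_draw <- rnorm(10)

identical(first_draw, second_draw)  # ... reproduces exactly the same numbers: TRUE
```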

rnorm has arguments mean and sd if you need a different mean and standard deviation. Want dependence? Start with uncorrelated observations and transform them (check the first answer), or use the function rmvnorm in the mvtnorm package.
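A short sketch of both suggestions. For the dependent case it uses one common transformation: multiplying independent N(0, 1) draws by the Cholesky factor of a target covariance matrix. This is not necessarily the approach in the answer referred to above, and the covariance values, seed and sample sizes are made up for illustration.

```r
set.seed(552)  # arbitrary seed, for reproducibility

# A different mean and standard deviation: draws from N(5, 2^2)
y <- rnorm(1e5, mean = 5, sd = 2)

# Correlated pairs via a Cholesky factor
Sigma <- matrix(c(1.0, 0.7,
                  0.7, 2.0), nrow = 2)  # target covariance matrix (made-up values)
U <- chol(Sigma)                        # upper triangular, with Sigma = t(U) %*% U

n <- 1e5
Z <- matrix(rnorm(n * 2), ncol = 2)     # n rows of independent N(0, 1) pairs
X <- Z %*% U                            # each row now has covariance t(U) %*% U = Sigma

round(cov(X), 2)                        # sample covariance, close to Sigma
```

If mvtnorm is installed, mvtnorm::rmvnorm(n, mean = c(0, 0), sigma = Sigma) draws from the same bivariate Normal distribution directly.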