Stat 552

Homework 1

Due 3pm Jan 15 in class

General homework guidelines

You may discuss homework with others, but the final write-up and final code should always be your own work. Identical code or reports will be considered an act of academic dishonesty.

Pay attention to the required form and place of submission. Some parts should be submitted online through Canvas, other parts as a hard copy in lecture, and sometimes both.

Readings specified on homework are required, and should be done before attempting the questions below.

Tips for reading:

  • Take notes as you read. You are much more likely to retain the material.
  • Try the exercises as you go. This is a great way to test if you understood what you just read.
  • If the reading involves R code, you should have R open and try out the code. If you can’t understand a line of complicated R code, try running pieces of it.
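As a small illustration of the last tip, here is one way to take a dense line apart. The example is only a sketch: it uses the built-in mtcars data rather than anything from the readings, and the intermediate names (by_cyl, sorted, result) are made up for illustration.

```r
# A dense one-liner: mean mpg for each cylinder count, sorted and rounded
round(sort(tapply(mtcars$mpg, mtcars$cyl, mean), decreasing = TRUE), 1)

# Run it from the inside out, looking at each piece as you go
mtcars$mpg                                        # the values being averaged
mtcars$cyl                                        # the grouping variable
by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)    # one mean per group
by_cyl
sorted <- sort(by_cyl, decreasing = TRUE)         # largest mean first
sorted
result <- round(sorted, 1)                        # same as the one-liner
result
```

Checking each piece with str() or class() is often just as informative as printing it.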

Reading

Part 1: Data structures in R

Your task is to write a single R script that completes the tasks below. Answers to questions can be given in code comments. Submit your R script to Canvas before lecture on Friday. Your R script will be graded on whether it successfully completes the tasks below, its reproducibility, and its adherence to the style guide.

1. Create a list that describes you, like this one that describes me:

## $name
## [1] "Charlotte"
## 
## $number_of_siblings
## [1] 1
## 
## $female
## [1] TRUE

2. Create a named vector that contains the same information as the list you made in (1). What is the downside of using a vector in this case?

3. Create a named integer vector that has values that correspond to your immediate family members’ ages, and names corresponding to your family members’ names. This can be your actual family or a fictitious one if you prefer.

4. Multiply the vector in (3) by 5. Has anything about the structure of the vector changed?

5. Create a data frame about anything you want. It must have at least three rows and at least one numeric column, one character column and one factor column.

6. Convert the data frame from above to a matrix. Describe what happened.

7. Convert the column that is a factor to a double. Describe what happened.

8. From class, the following code fits a linear regression model to Galton’s height data:

data(GaltonFamilies, package = "HistData")
slr <- lm(childHeight ~ midparentHeight, 
  data = GaltonFamilies)

What kind of object is slr? Write code to extract the residual degrees of freedom.

9. What kind of object is summary(slr)? Write code to extract the estimate of \(\sigma\).

Part 2: Simple linear regression review

This part should be handed in as a hard copy. In general, for this style of question I expect you to interleave your calculations and answers, so that the TA can follow your working and check your answer without flipping back and forth through your assignment. (We’ll see an easy way to do this next week; for now you’ll probably do a lot of copying and pasting.) Some pointers:

  • Code should look like code (use a mono-spaced font, e.g. Courier).
  • You may include R output (that should also look like code), but be sure to answer the question in sentence form as well.

Consider again, the simple linear regression of Galton’s height data,

data(GaltonFamilies, package = "HistData")
slr <- lm(childHeight ~ midparentHeight, 
  data = GaltonFamilies)
summary(slr)

Construct:

  • a 95% confidence interval for the slope parameter
  • a 95% confidence interval for the mean child height when the midparentHeight is 72 in, and
  • a 95% prediction interval for the child height when the midparentHeight is 72 in.

For each interval, write a one-sentence, non-technical interpretation in the context of the study.

Extra Credit: midparentHeight is defined as the average of the father’s height and 1.08 times the mother’s height, that is, (father + 1.08 × mother) / 2. Why? Can you figure out where the 1.08 came from?
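To make the arithmetic in that definition concrete, here is a minimal sketch. The heights are hypothetical values in inches chosen for illustration, not rows from GaltonFamilies, and the variable names are made up.

```r
# Hypothetical heights in inches (not taken from the Galton data)
father <- 70
mother <- 65

# The mother's height is scaled by 1.08 before averaging with the father's
midparent <- (father + 1.08 * mother) / 2
midparent
```

The same expression applied to the father and mother columns of GaltonFamilies should reproduce the midparentHeight column, which is one way to check your reading of the definition.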