Reading: Faraway 10, 11.3 & 11.4
10.7
Consider the dataset diamonds in the ggplot2 package. I am providing you with a (random) 50% subset of the data.
Build a regression model using diamonds_sub to predict a diamond's price from the other available variables. You can look at ?diamonds to learn about the variables, but otherwise you should not examine the full data set; use only the subset provided.
Beware! Building a good predictive model can swallow a lot of time. Your answer needs to include at least:
Any extra work should only be done if it doesn’t impact your ability to meet your other commitments (inside or outside of school).
Your writeup for this question should follow the general guidelines for the report in HW #6. However, prediction is the goal here, so your methods and results sections will focus more on models you considered and their predictive performance, rather than assumptions and inference.
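One common way to compare the predictive performance of candidate models is to hold out part of the subset and compute the root mean squared error (RMSE) on the held-out rows. The following is a minimal sketch of that idea, not a required method. The 20% validation split, the two candidate formulas, and the names rmse, fit_linear and fit_loglog are all illustrative assumptions, and the simulated data frame is only a stand-in so the code runs on its own; replace it with the real diamonds_sub.

```r
# Toy stand-in so the sketch runs on its own (simulated, NOT the real data).
# Replace this block with the diamonds_sub provided for the homework.
set.seed(1)
n <- 500
diamonds_sub <- data.frame(carat = runif(n, 0.2, 2.5))
diamonds_sub$price <- round(
  exp(8.4 + 1.7 * log(diamonds_sub$carat) + rnorm(n, sd = 0.25))
)

# Hold out 20% of the subset for validation (assumed split size).
val_idx <- sample(nrow(diamonds_sub), size = floor(0.2 * nrow(diamonds_sub)))
train <- diamonds_sub[-val_idx, ]
valid <- diamonds_sub[val_idx, ]

# Root mean squared error on the original price scale.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Two hypothetical candidate models, fitted on the training rows only.
fit_linear <- lm(price ~ carat, data = train)
fit_loglog <- lm(log(price) ~ log(carat), data = train)

# Predict on the held-out rows. Log-scale predictions are back-transformed
# with exp() so both models are compared on the price scale.
pred_linear <- predict(fit_linear, newdata = valid)
pred_loglog <- exp(predict(fit_loglog, newdata = valid))

results <- c(
  linear = rmse(valid$price, pred_linear),
  loglog = rmse(valid$price, pred_loglog)
)
print(results)
```

Note that exp() of a log-scale prediction estimates the median price rather than the mean, so a bias correction may be worth considering. A single split is also noisy; repeating it or using cross-validation gives a steadier comparison.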
There is a prize for the person who has the best predictions on a 20% sample that is disjoint from the records in diamonds_sub. To be eligible for the “best predictions” prize, you must also submit an .rda (R binary) file containing a function that takes as input a data frame with the same columns as diamonds and returns a vector of predicted prices. An easy way to create this from a fitted model is provided in hw8_make_preds.R.
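To make the shape of the required function concrete, here is a rough sketch of wrapping a fitted model in a prediction function and saving it to an .rda file. It is not the contents of hw8_make_preds.R, which remains the reference for the actual submission. The names make_predictor, predict_price and my_preds.rda, and the log-log model, are assumptions for illustration, and the simulated data frame is a stand-in for the real diamonds_sub.

```r
# Toy stand-in so the sketch runs on its own (simulated, NOT the real data).
set.seed(2)
n <- 300
diamonds_sub <- data.frame(carat = runif(n, 0.2, 2.5))
diamonds_sub$price <- round(
  exp(8.4 + 1.7 * log(diamonds_sub$carat) + rnorm(n, sd = 0.25))
)

# Hypothetical final model, fitted on the log scale.
final_fit <- lm(log(price) ~ log(carat), data = diamonds_sub)

# Factory that captures the fitted model inside the returned function's
# environment, so the saved function carries the model along with it.
make_predictor <- function(fit) {
  force(fit)
  function(newdata) {
    # Back-transform to the price scale and return a plain numeric vector.
    unname(exp(predict(fit, newdata = newdata)))
  }
}

predict_price <- make_predictor(final_fit)

# Save the function to an .rda file (written to a temp directory here).
rda_path <- file.path(tempdir(), "my_preds.rda")
save(predict_price, file = rda_path)

# Sanity check: load into a fresh environment and predict on a few rows.
check_env <- new.env()
load(rda_path, envir = check_env)
preds <- check_env$predict_price(diamonds_sub[1:5, ])
print(preds)
```

Building the function through a factory is a design choice: the model lives in the function's enclosing environment, which save() serializes together with the function, so the loaded function does not depend on any object in the grader's workspace. If the model relies on a package, that package still has to be installed where the function is run.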