We’ll use RStudio as our interface to R.
Projects are a useful way to organize your work in RStudio. When you open a project:
To create a new project for this class:
Create the project once. The next time you work on ST552, open the project in RStudio (File -> Open Project) or use the project dropdown in the top right of RStudio.
(Accessing your SCIENCE drive from other computers: http://my.science.oregonstate.edu/mount_network_drives)
(Accessing your ONID drive from other computers: http://oregonstate.edu/helpdocs/accounts/onid-osu-network-id/using-your-onid/your-home-directory)
Reproducibility is one of the huge advantages of using a programming language for data analysis. Our code becomes a complete recipe for going from a possibly messy dataset to the numbers and figures in a statistical report. We can repeat our analyses in the future and get exactly the same result. However, writing truly reproducible code takes discipline.
By default, R (and RStudio) saves a copy of your workspace (packages you have loaded and objects you have created) when you exit R, and loads it again when you come back. This may seem convenient, but it encourages habits that hurt reproducibility. It’s too easy to rely on packages that happen to be loaded, or on objects you created outside of your script. The first thing we will do is change this default behavior.
Go to Tools -> Project Options. In the General tab, set the following:
Now, when we start a fresh R session (Session -> Restart R) we know there is nothing from a previous session hanging around.
This does it for your ST552 project, but I’d encourage you to use these options for all your work (you can set them globally in Tools -> Global Options).
When I am writing R code, I occasionally check for reproducibility by restarting R and sourcing my code (Code -> Source File, or the Source button in the Editor). Sourcing a file runs all the code in the file from top to bottom, but it stops if an error occurs. If an error does occur, fix it, restart R, and try sourcing again. At the very least, do this before closing R and before handing in code. Your first homework requires submitting an R code file that will be checked for reproducibility.
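Restarting R and sourcing by hand is all the homework requires. If you would like to script the same check, one option is to run the file with Rscript in a separate R process, which starts without restoring a saved workspace, so nothing from your interactive session can leak in. The helper below is a minimal sketch: check_reproducible is a hypothetical name, not part of R, and it relies only on base R's system2(), R.home() and shQuote().

```r
# Hypothetical helper (not part of base R): run a script in a brand-new
# R process via Rscript, so objects and packages from the current session
# cannot be relied on. Returns TRUE if the script ran to the end without error.
check_reproducible <- function(path) {
  rscript <- file.path(R.home("bin"), "Rscript")
  # With the default stdout/stderr, system2() returns the exit status;
  # Rscript exits with a non-zero status when the script stops with an error.
  status <- system2(rscript, args = shQuote(path))
  identical(as.integer(status), 0L)
}

# Usage (the file name here is only an example):
# check_reproducible("homework1.R")
```

A script that uses an object you only ever created in the console will fail this check, which is exactly the kind of problem sourcing after a restart is meant to catch.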
Reproducible code is the first step towards reproducible reports, which we’ll cover next week.
Code is a form of communication. You should write it in a way that is easy for others (including your future self) to read and understand. Consistency is key: pick a style and stick with it.
We’ll follow the style guide at http://adv-r.had.co.nz/Style.html. Your first homework involves submitting code that will be checked against this guide, and all code submitted for later homeworks in this class must conform to this style.
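As a small illustration of a few of the guide’s rules, here is a function written to follow them, with a poorly styled version of the same idea shown as a comment. The function mean_square is a made-up example for this purpose, not something taken from the guide or from code.r.

```r
# Poorly styled version, for comparison:
# meanSquare=function(x){sum(x*x)/length(x)}

# Styled version (a sketch illustrating a few rules from the guide):
#   - names are lowercase, with underscores separating words
#   - `<-` is used for assignment, not `=`
#   - spaces surround infix operators and follow (never precede) commas
#   - a space goes before the parenthesis after `if`, but not in function calls
#   - the opening brace ends a line, and the body is indented two spaces
mean_square <- function(x, na_rm = FALSE) {
  if (na_rm) {
    x <- x[!is.na(x)]
  }
  sum(x * x) / length(x)
}
```

For example, mean_square(c(1, 2, 3)) is 14 / 3. Both versions compute the same thing; the styled one is simply easier to scan and to modify.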
Download code.r, a poorly styled file, and open it in RStudio. Find the problems with its style and fix them according to the style guidelines.
attach leads to confusion.
It’s better to be explicit about where variables are coming from. Use the data argument if a function has it, or use the