Reproducible Research Project Using R & R Markdown
Nick Hepler, University at Albany, College of Engineering and Applied Sciences
This project examines summary statistics concerning Equine Death and Breakdown data obtained from the New York State Gaming Commission. The data contains information on every horse that has broken down, died, sustained a serious injury, or been involved in an incident at a track in New York State since 2009.
The author's objective was to use the R language and environment for statistical computing and graphics to create a reproducible research project. The project employed Hadley Wickham's tidyverse collection of R packages and the principles outlined in the R for Data Science book. The project performs the following steps with the data:
- Import
- Tidy
- Transform
- Visualize
The final report was written using R Markdown from RStudio.
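The import, tidy, transform, and visualize steps can be sketched with the tidyverse as follows. This is only an illustration, not the project's actual code: the column names (`Year`, `Track`, `Incident Type`) and the four rows of data are hypothetical stand-ins written to a temporary file, so the example runs without downloading anything.

```r
library(tidyverse)

# Import: write a tiny, made-up CSV to a temporary file and read it back.
# The column names and values are hypothetical, not the real data set.
raw_file <- tempfile(fileext = ".csv")
writeLines(
  c(
    "Year,Track,Incident Type",
    "2015,Track A,death",
    "2015,Track B,injury",
    "2016,Track A,death",
    "2016,Track A,death"
  ),
  raw_file
)
incidents_raw <- read_csv(raw_file, col_types = "icc")

# Tidy: give every column a syntactic, lower-case name.
incidents <- incidents_raw %>%
  rename(year = Year, track = Track, incident_type = `Incident Type`)

# Transform: count incidents per year and incident type.
incident_counts <- incidents %>%
  count(year, incident_type)

# Visualize: stacked bar chart of incident counts by year.
incident_plot <- ggplot(
  incident_counts,
  aes(x = factor(year), y = n, fill = incident_type)
) +
  geom_col() +
  labs(x = "Year", y = "Number of incidents", fill = "Incident type")

print(incident_plot)
```

With the real data, the import step would read the file fetched by download_data.R, and the tidy step would depend on the columns described in the data dictionary.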
This research project is intended to be reproducible. Reproducible research is the idea that data analyses, and more generally scientific claims, are published with their data and software code so that others may verify and build upon the findings.
Source the download_data.R file in one of the following ways:
- From the R command line, type: source("download_data.R")
- From the Linux/Mac terminal, in your R working directory, type: R CMD BATCH download_data.R
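Both approaches assume that download_data.R sits in the current working directory. A minimal sketch of a guard around the `source()` call is shown below; the helper name `source_if_present` is hypothetical and not part of the project.

```r
# Hypothetical helper: source a script only if it exists in the working
# directory, and report where R looked otherwise.
source_if_present <- function(path) {
  if (!file.exists(path)) {
    message(
      "Cannot find ", path, " in ", getwd(),
      "; use setwd() to move to the project directory first."
    )
    return(invisible(FALSE))
  }
  source(path)  # evaluated in the global environment by default
  invisible(TRUE)
}

source_if_present("download_data.R")
```

The function returns TRUE invisibly when the script was found and run, and FALSE when it was not.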
The following resources concerning the raw data are provided through the New York State Open Data website:
- New York State Gaming Commission Equine Death and Breakdown Overview
- New York State Gaming Commission Equine Death and Breakdown Data Dictionary
This project utilized a modified version of the ProjectTemplate package architecture available in R.
The following version of R was used along with the following packages. These are required to complete the analysis. The version information for these packages is included as of the time of final review.
- R version 3.3.2 (2016-10-31) "Sincere Pumpkin Patch"
- tidyverse: Easily Install and Load 'Tidyverse' Packages
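To compare a local setup against the versions listed here, the installed R and package versions can be printed from the R command line. The snippet below is a small sketch using base R only; the variable names are illustrative.

```r
# Report the running R version as a single string.
r_version <- R.version.string
cat(r_version, "\n")

# Report the installed tidyverse version, if the package is available.
tidyverse_installed <- requireNamespace("tidyverse", quietly = TRUE)
if (tidyverse_installed) {
  cat("tidyverse", as.character(packageVersion("tidyverse")), "\n")
} else {
  cat("tidyverse is not installed; run install.packages(\"tidyverse\")\n")
}

# Full session details, including attached packages and platform.
print(sessionInfo())
```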
Google's R Style Guide provides the foundation for the coding standards used in the R source files.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.