The script does the following, in order.
Steps 5 to 13 run inside a function expression, and step 14 calls that function. It is done this way so that the intermediate variables are not kept in memory once the combined data frame has been made.
1. Check whether the folder containing the data sets exists in the working directory.
2. If it does not, check whether the zip file exists. If the zip file is also missing, download it and unzip it. The script checks your system first to choose the correct download method.
3. If the zip file exists, just unzip it.
4. Load dplyr.
5. Create a function which makes the combined data frame.
6. Load activity_labels.txt.
7. Convert the labels to a factor variable and assign the desired levels.
8. Load features.txt.
9. Load X_train.txt and X_test.txt and row bind them. Select only the mean and SD variables, excluding meanFrequency.
10. Load y_train.txt and y_test.txt and row bind them.
11. Load subject_train.txt and subject_test.txt and row bind them.
12. Add activity labels according to the numbers in y$V1.
13. Column bind all data frames to make the combined data frame.
14. Call the function, which actually makes the combined data frame. Name the result p.
15. Give the data frame variable names, with words separated by dots.
16. Group by Subject Number and Activity.
17. Create the average of each variable for each subject/activity group.
18. Gather the features into a single column.
19. Export the tidy data frame to a .txt file.
20. Delete the downloaded files.
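
The core of the steps above (building the combined data frame inside a function, then averaging and gathering) can be sketched roughly as follows. This is only an illustration on made-up toy data, not the actual script: the function name `make_combined`, the column names, the toy values, and the regular expression are all assumptions, and `gather()` is assumed to come from tidyr. The download and unzip steps are left out.

```r
library(dplyr)
library(tidyr)  # assumption: gather() is taken from tidyr

# Toy stand-ins for activity_labels.txt and features.txt (made-up values).
activity_labels <- data.frame(V1 = 1:2, V2 = c("WALKING", "SITTING"))
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
              "fBodyAcc-meanFreq()-X", "tBodyAcc-max()-X")

# Hypothetical function: every intermediate below lives only in the
# function's environment and can be garbage collected once it returns.
make_combined <- function() {
  # Toy stand-ins for X_train.txt / X_test.txt, y_*.txt and subject_*.txt.
  x_train <- data.frame(V1 = c(1, 3), V2 = c(2, 4), V3 = c(9, 9), V4 = c(7, 7))
  x_test  <- data.frame(V1 = c(5, 7), V2 = c(6, 8), V3 = c(9, 9), V4 = c(7, 7))
  y       <- data.frame(V1 = c(1, 1, 2, 2))
  subject <- data.frame(V1 = c(1, 1, 1, 1))

  # Row bind, then keep only mean() and std() columns.
  # "meanFreq()" does not match "mean()" and is therefore excluded.
  x <- rbind(x_train, x_test)
  keep <- grepl("mean\\(\\)|std\\(\\)", features)
  x <- x[, keep, drop = FALSE]

  # Turn the activity numbers into a labelled factor.
  activity <- factor(y$V1, levels = activity_labels$V1,
                     labels = activity_labels$V2)

  # Column bind everything into the combined data frame.
  cbind(Subject.Number = subject$V1, Activity = activity, x)
}

p <- make_combined()

# Variable names with words separated by dots (names are illustrative).
names(p)[3:4] <- c("Body.Acc.Mean.X", "Body.Acc.SD.X")

# Average every variable per subject/activity group, then gather the
# feature columns into a single key/value pair of columns.
tidy <- p %>%
  group_by(Subject.Number, Activity) %>%
  summarise_all(mean) %>%
  ungroup() %>%
  gather(key = "Feature", value = "Average", -Subject.Number, -Activity)

# Export to a .txt file (a temporary file here, to keep the sketch tidy).
out_file <- tempfile(fileext = ".txt")
write.table(tidy, out_file, row.names = FALSE)
```

After this runs, only `p`, `tidy` and the lookup tables remain in the global environment; `x_train`, `x_test`, `y` and `subject` are gone, which is the point of wrapping those steps in a function.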