Merge pull request #271 from fhdsl/resources-added

Add resources from JHU Course
fhdsl · Jan 24, 2025 · fddccc7
2 parents 27bc33d + d31c9bd
commit fddccc7

Showing 2 changed files with 125 additions and 1 deletion.
diff --git a/resources.Rmd b/resources.Rmd
@@ -14,6 +14,7 @@ Here are additional resources to help you on your R journey - either before, dur
 
 <details open><summary> <span style = "color: #5383bb;"> **Help Getting Started**</span></summary><br>
 
+- [Guide to using Slack](https://slack.com/help/articles/218080037-Getting-started-for-new-Slack-users)
 - [R reference card](http://cran.r-project.org/doc/contrib/Short-refcard.pdf)
 - [R introductory guide](https://cran.r-project.org/doc/manuals/r-release/R-intro.html)
 - [R jargon](https://link.springer.com/content/pdf/bbm%3A978-1-4419-1318-0%2F1.pdf)
@@ -71,6 +72,11 @@ Here are additional resources to help you on your R journey - either before, dur
 - [Video](https://www.youtube.com/watch?v=Ao9e0cDzMrE) for Mac users who want to see how to move files around (especially from downloads)
 - [Extra information about file paths](https://docs.google.com/presentation/d/18u1Vhd3Uq-QprC0btpxS_-Ka-LKVUvncyoqdbGdb-g4/edit?usp=sharing)
 
+**Need extra guidance on wrangling?**
+
+- [Guide on `janitor`](https://hutchdatascience.org/data_snacks/r_snacks/janitor.html)
+- [Guide on cleaning complicated names](https://daseh.org/resources/cleaning_names.html)
+
 **Need help with joins?**
 
 - [`full-join()` animation](https://github.com/gadenbuie/tidyexplain/blob/master/images/full-join.gif)
@@ -93,7 +99,13 @@ Here are additional resources to help you on your R journey - either before, dur
 - [Modeling 101](https://jhudatascience.org/tidyversecourse/model.html#linear-modeling)
 - [Common statistical tests are linear models](https://lindeloev.github.io/tests-as-linear/) (why understanding linear models will get you far!)
 - [Interpreting GLM output (e.g., deviance)](https://www.statology.org/null-residual-deviance/)
+- [Guide on why `set.seed` can be useful](https://rsample.tidymodels.org/reference/bootstraps.html)
+
+**Want help creating tables?**
 
+- [Guide on making nice tables from stats tests in R](https://www.danieldsjoberg.com/gtsummary/articles/tbl_summary.html)
+- [Guide on making custom styled tables in R with the `kableExtra` package](https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html)
+- [Guide on using the `DT` package to make interactive tables](https://rstudio.github.io/DT/)
 
 </details>
 
@@ -132,6 +144,7 @@ Here are additional resources to help you on your R journey - either before, dur
 (See page 505)
 - [R <-> SAS Cheatsheet](https://raw.githubusercontent.com/rstudio/cheatsheets/main/sas-r.pdf)
 - [SAS to R Converter](https://www.codeconvert.ai/sas-to-r-converter)  
+- [Guide to learning R as a SAS user](https://hutchdatascience.org/data_snacks/r_snacks/sas2r.html)  
 - You might also find large language models like ChatGPT useful for code conversion. Be sure to check the output because AI makes mistakes!  
 
 </details>
@@ -140,7 +153,7 @@ Here are additional resources to help you on your R journey - either before, dur
 
 <details><summary> <span style = "color: #5383bb;"> **Comparison of Python and R**</span></summary><br>
 
-- A helpful [blog post](https://www.ibm.com/cloud/blog/python-vs-r) about the difference between these two languages.
+- A helpful [article about the difference between these two languages](https://www.ibm.com/think/topics/python-vs-r).
 
 </details>
 

diff --git a/resources/cleaning_names.Rmd b/resources/cleaning_names.Rmd
@@ -0,0 +1,111 @@
+---
+title: "Cleaning complicated column names"
+output: 
+  html_document:
+    css: ../docs/web_styles.css
+    toc: true
+---
+
+
+## Cleaning a common pattern from names
+
+Let's say that our column names are already technically clean: they don't have spaces or punctuation, and they don't start with a number. However, there is a redundant word ("percent") that we want to remove from multiple columns (the same approach also works for adding a word).
+
+
+First let's load the packages we will need. We will show some functions from `janitor` and the `tidyverse`:
+
+```{r, echo = FALSE}
+install.packages("janitor", repos='http://cran.us.r-project.org')
+```
+
+```{r}
+#install.packages("janitor")
+library(tidyverse)
+library(janitor)
+```
+
+First let's make some data:
+
+```{r}
+data_to_clean <- tibble(State = c("Texas", "Utah", "Maryland", "Ohio"),
+                        tax_percent = c(10, 20, 60, 40),
+                        literacy_percent = c(70, 80, 80, 75),
+                        above_poverty_percent = c(60, 70, 50, 60))
+data_to_clean
+
+```
+
+
+We can use the `rename_with` function from `dplyr` and `str_remove` from `stringr` to remove the pattern "_percent" from each of the column names.
+
+Here the `~` starts a short function and the `.` stands for the column names that `rename_with` passes to it. `str_remove` removes the pattern from every name that contains it and leaves the other names (such as `State`) unchanged.
+
+
+```{r}
+data_to_clean %>% rename_with(~str_remove(., '_percent'))
+
+```
+
+Nice! That simplified our names very easily!
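+
+The same approach also works in the other direction, when a word needs to be added to several columns. Here is a minimal sketch, assuming a small made-up tibble (`no_suffix` is a hypothetical name, not part of the data above) and using the `.cols` argument of `rename_with` to leave `State` alone:
+
+```{r}
+library(tidyverse)
+
+# Hypothetical example data: similar columns, but without the "_percent" suffix
+no_suffix <- tibble(State = c("Texas", "Utah"),
+                    tax = c(10, 20),
+                    literacy = c(70, 80))
+
+# Add "_percent" to the end of every column name except State
+with_suffix <- no_suffix %>%
+  rename_with(~str_c(., "_percent"), .cols = -State)
+
+names(with_suffix)
+```
+
+Without `.cols`, `rename_with` applies the function to every column, so `State` would be renamed too.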
+
+## Cleaning names with numbers and punctuation
+
+
+We can use regular expression (regex) patterns to remove unwanted characters - see this [regex cheatsheet](https://evoldyn.gitlab.io/evomics-2018/ref-sheets/R_strings.pdf) for help! We adapted some code from this [source](https://stackoverflow.com/questions/71151470/remove-characters-from-column-names).
+
+First we will make some very messy data:
+
+```{r}
+
+d <- tibble("Year" = 1:5,
+       "Info" = 1:5,
+       "1. Products" = 1:5,
+       "2. Rate" = 1:5,
+       "3. Price" = 1:5,
+       "29. Other" = 1:5)
+d
+```
+
+Now we can remove the numbers and punctuation in a similar way as we did before using `rename_with` and `str_remove`, but this time we specify a few things (all based on the [regex cheatsheet](https://evoldyn.gitlab.io/evomics-2018/ref-sheets/R_strings.pdf)):
+
+- that we want to remove digits with `[:digit:]`
+
+- that we want to remove one or more digits in a row with the `+`
+
+- that we want to remove a literal period, which needs two backslashes (`\\.`) because a plain `.` matches any character, followed by a space
+
+Here we go:
+
+```{r}
+d %>% 
+  rename_with(~str_remove(., "[:digit:]+\\. "))
+
+```
+
+Nice, that is better!
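+
+When building a pattern like this, it can help to try it on a plain character vector first. Here is a small sketch, assuming a made-up vector `messy_names` (hypothetical, not from the data above). It also adds `^`, which is our own addition, so the pattern only matches at the start of a name:
+
+```{r}
+library(stringr)
+
+# Hypothetical vector of names to test the pattern on
+messy_names <- c("Year", "Info", "1. Products", "29. Other")
+
+# "^" anchors the match to the start of each name;
+# names without a leading number are returned unchanged
+cleaned <- str_remove(messy_names, "^[:digit:]+\\. ")
+cleaned
+```
+
+Note that `str_remove` only removes the first match in each string; `str_remove_all` removes every match.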
+
+## Using values of a specific row for column names
+
+First let's make some messy data that has only missing values in the first row and potentially better column names in the second row. We adapted code from this [source](https://cran.r-project.org/web/packages/janitor/vignettes/janitor.html#remove_constant-columns).
+
+This can often happen when we read in data.
+
+```{r}
+
+dirt <- data.frame(X_1 = c(NA, "ID", 1:3),
+           X_2 = c(NA, "Value", 4:6))
+
+dirt
+```
+
+
+The function `row_to_names` from the `janitor` package (not part of the `tidyverse` - so make sure you install and load it!) can be really helpful for this. 
+
+We can use the `row_number` argument of `row_to_names` to specify that the column names can be found in the second row. By default, that row and any rows above it are then removed from the data.
+
+```{r}
+row_to_names(dirt, row_number = 2) # our column names can be found in row 2!
+
+```
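+
+As a possible follow-up step, here is a sketch that combines `row_to_names` with `clean_names` (also from `janitor`) and base R's `type.convert`. The name `tidy` and the choice of these extra steps are our own assumptions about what you might want next, not a required part of the workflow:
+
+```{r}
+library(janitor)
+
+# Same messy data as above
+dirt <- data.frame(X_1 = c(NA, "ID", 1:3),
+                   X_2 = c(NA, "Value", 4:6))
+
+# Promote row 2 to column names, then standardize the names
+tidy <- clean_names(row_to_names(dirt, row_number = 2))
+
+# The values are still stored as text because the original columns
+# mixed words and numbers, so convert them to a better type
+tidy <- type.convert(tidy, as.is = TRUE)
+
+names(tidy)
+```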
+
+