diff --git a/modules/Functions/Functions.Rmd b/modules/Functions/Functions.Rmd index 5397de0e..635a2bbd 100644 --- a/modules/Functions/Functions.Rmd +++ b/modules/Functions/Functions.Rmd @@ -11,7 +11,6 @@ library(dplyr) library(knitr) library(stringr) library(tidyr) -library(emo) library(readr) opts_chunk$set(comment = "") ``` @@ -192,7 +191,7 @@ loud(word = "hooray!") -## Functions for tibbles - curly braces{.codesmall} +## Functions for tibbles - curly braces ```{r} # get means and missing for a specific column @@ -203,13 +202,22 @@ get_summary <- function(dataset, col_name) { } ``` -Examples: +## Functions for tibbles - example{.codesmall} -```{r} +```{r message = FALSE} er <- read_csv(file = "https://daseh.org/data/CO_ER_heat_visits.csv") +``` + +```{r} get_summary(er, visits) +``` + +```{r message = FALSE} +yearly_co2 <- + read_csv(file = "https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv") +``` -yearly_co2 <- read_csv(file = "https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv") +```{r} get_summary(yearly_co2, `2014`) ``` @@ -217,9 +225,9 @@ get_summary(yearly_co2, `2014`) - Simple functions take the form: - `NEW_FUNCTION <- function(x, y){x + y}` - - Can specify defaults like `function(x = 1, y = 2){x + y}` - -`return` will provide a value as output - - `print` will simply print the value on the screen but not save it + - Can specify defaults like `function(x = 1, y = 2){x + y}` + - `return` will provide a value as output +- Specify a column (from a tibble) inside a function using `{{double curly braces}}` ## Lab Part 1 @@ -245,7 +253,7 @@ sapply(, some_function) Let's apply a function to look at the CO heat-related ER visits dataset. -`r emo::ji("rotating_light")` There are no parentheses on the functions! `r emo::ji("rotating_light")` +🚨There are no parentheses on the functions!🚨 You can also pipe into your function. 
@@ -357,7 +365,6 @@ er %>% )) ``` - ## Applying functions with `across` from `dplyr` Using different `tidyselect()` options (e.g., `starts_with()`, `ends_with()`, `contains()`) @@ -368,20 +375,6 @@ er %>% summarize(across(contains("cl"), mean, na.rm=T)) ``` - - - - - - - - - - - - - - ## Applying functions with `across` from `dplyr` {.smaller} Combining with `mutate()` - the `replace_na` function @@ -401,29 +394,15 @@ yearly_co2 %>% )) ``` +## GUT CHECK! - +Why use `across()`? - +A. Efficiency - faster and less repetitive - - - - - - - - - - - - - - - - - +B. Calculate the cross product +C. Connect across datasets ## `purrr` package @@ -433,22 +412,29 @@ While we won't get into `purrr` too much in this class, its a handy package for # Multiple Data Frames -## Multiple data frames {.smaller} +## Multiple data frames -Lists help us work with multiple data frames +Lists help us work with multiple tibbles / data frames ```{r} -AQ_list <- list(AQ1 = airquality, AQ2 = airquality, AQ3 = airquality) -str(AQ_list) +df_list <- list(AQ = airquality, er = er, yearly_co2 = yearly_co2) ``` +
+ +`select()` from each tibble the numeric columns: + +```{r} +df_list <- + df_list %>% + sapply(function(x) select(x, where(is.numeric))) +``` -## Multiple data frames: `sapply` +## Multiple data frames: `sapply` {.smaller} ```{r} -AQ_list %>% sapply(class) -AQ_list %>% sapply(nrow) -AQ_list %>% sapply(colMeans, na.rm = TRUE) +df_list %>% sapply(nrow) +df_list %>% sapply(colMeans, na.rm = TRUE) ``` @@ -457,7 +443,7 @@ AQ_list %>% sapply(colMeans, na.rm = TRUE) - Apply your functions with `sapply(
, some_function)` - Use `across()` to apply functions across multiple columns of data - Need to use `across` within `summarize()` or `mutate()` -- Can use `sapply` or `purrr` to work with multiple data frames within lists simultaneously +- Can use `sapply` (or `purrr` package) to work with multiple data frames within lists simultaneously ## Lab Part 2 @@ -466,7 +452,20 @@ AQ_list %>% sapply(colMeans, na.rm = TRUE) 💻 [Lab](https://daseh.org/modules/Functions/lab/Functions_Lab.Rmd) -```{r, fig.alt="The End", out.width = "50%", echo = FALSE, fig.align='center'} +📃 [Day 9 Cheatsheet](https://daseh.org/modules/cheatsheets/Day-9.pdf) + +📃 [Posit's purrr Cheatsheet](https://rstudio.github.io/cheatsheets/purrr.pdf) + +## Research Survey + +
+ +https://forms.gle/jVue79CjgoMmbVbg9 + +
+
+ +```{r, fig.alt="The End", out.width = "30%", echo = FALSE, fig.align='center'} knitr::include_graphics(here::here("images/the-end-g23b994289_1280.jpg")) ``` diff --git a/modules/Functions/lab/Functions_Lab_Key.Rmd b/modules/Functions/lab/Functions_Lab_Key.Rmd index 980b1af8..edefe55d 100644 --- a/modules/Functions/lab/Functions_Lab_Key.Rmd +++ b/modules/Functions/lab/Functions_Lab_Key.Rmd @@ -11,7 +11,7 @@ knitr::opts_chunk$set(echo = TRUE) # Part 1 -Load all the libraries we will use in this lab. +Load the `tidyverse` package. ```{r message=FALSE} library(tidyverse) @@ -19,21 +19,13 @@ library(tidyverse) ### 1.1 -Create a function that takes one argument, a vector, and returns the sum of the vector and then squares the result. Call it "sum_squared". Test your function on the vector `c(2,7,21,30,90)` - you should get the answer 22500. +Create a function that: -``` -# General format -NEW_FUNCTION <- function(x, y) x + y -``` -or - -``` -# General format -NEW_FUNCTION <- function(x, y){ -result <- x + y -return(result) -} -``` +* Takes one argument, a vector. +* Returns the sum of the vector and then squares the result. +* Call it "sum_squared". +* Test your function on the vector `c(2,7,21,30,90)` - you should get the answer 22500. +* Format is `NEW_FUNCTION <- function(x, y) x + y` ```{r 1.1response} nums <- c(2, 7, 21, 30, 90) @@ -50,7 +42,12 @@ sum_squared(x = nums) ### 1.2 -Create a function that takes two arguments, (1) a vector and (2) a numeric value. This function tests whether the number (2) is contained within the vector (1). **Hint**: use `%in%`. Call it `has_n`. Test your function on the vector `c(2,7,21,30,90)` and number `21` - you should get the answer TRUE. +Create a function that: + +* takes two arguments, (1) a vector and (2) a numeric value. +* This function tests whether the number (2) is contained within the vector (1). **Hint**: use `%in%`. +* Call it `has_n`. 
+* Test your function on the vector `c(2,7,21,30,90)` and number `21` - you should get the answer TRUE. ```{r 1.2response} nums <- c(2, 7, 21, 30, 90) @@ -74,11 +71,24 @@ has_n(x = nums) ### P.1 -Create a new number `b_num` that is not contained with `nums`. Use your updated `has_n` function with the default value and add `b_num` as the `n` argument when calling the function. What is the outcome? +Create a function for the CalEnviroScreen Data. + +* Read in (https://daseh.org/data/CalEnviroScreen_data.csv) +* The function takes an argument for a column name. (use `{{col_name}}`) +* The function creates a ggplot with `{{col_name}}` on the x-axis and `Poverty` on the y-axis. +* Use `geom_point()` +* Test the function using the `Lead` column and `HousingBurden` columns, or other columns of your choice. ```{r P.1response} -b_num <- 11 -has_n(x = nums, n = b_num) +ces <- read_csv("https://daseh.org/data/CalEnviroScreen_data.csv") + +plot_ces <- function(col_name){ + ggplot(data = ces, aes(x = {{col_name}}, y = Poverty)) + + geom_point() +} + +plot_ces(Lead) +plot_ces(HousingBurden) ``` @@ -96,7 +106,12 @@ ces <- read_csv("https://daseh.org/data/CalEnviroScreen_data.csv") ### 2.2 -We want to get some summary statistics on water contamination. Use `across` inside `summarize` to get the sum total variable containing the string "water" AND ending with "Pctl". **Hint**: use `contains()` AND `ends_with()` to select the right columns inside `across`. Remember that `NA` values can influence calculations. +We want to get some summary statistics on water contamination. + +* Use `across` inside `summarize`. +* Choose columns about "water". **Hint**: use `contains("water")` inside `across`. +* Use `mean` as the function inside of `across`. +* Remember that `NA` values can influence calculations. 
``` # General format @@ -110,19 +125,26 @@ data %>% ```{r 2.2response} ces %>% summarize(across( - contains("Water") & ends_with("Pctl"), - sum + contains("water"), + mean )) + +# Accounting for NA ces %>% summarize(across( - contains("Water") & ends_with("Pctl"), - function(x) sum(x, na.rm = T) + contains("water"), + function(x) mean(x, na.rm = T) )) ``` ### 2.3 -Use `across` and `mutate` to convert all columns containing the word "water" into proportions (i.e., divide that value by 100). **Hint**: use `contains()` to select the right columns within `across()`. Use an anonymous function ("function on the fly") to divide by 100 (`function(x) x / 100`). It will also be easier to check your work if you `select()` columns that match "Pctl". +Convert all columns that are percentiles into proportions. + +* Use `across` and `mutate` +* Choose columns that contain "Pctl" in the name. **Hint**: use `contains("Pctl")` inside `across`. +* Use an anonymous function ("function on the fly") to divide by 100 (`function(x) x / 100`). +* Check your work - It will also be easier if you `select(contains("Pctl"))`. ``` # General format @@ -136,7 +158,7 @@ data %>% ```{r 2.3response} ces %>% mutate(across( - contains("water"), + contains("Pctl"), function(x) x / 100 )) %>% select(contains("Pctl")) @@ -149,25 +171,26 @@ ces %>% Use `across` and `mutate` to convert all columns starting with the string "PM" into a binary variable: TRUE if the value is greater than 10 and FALSE if less than or equal to 10. -- **Hint**: use `starts_with()` to select the columns that start with "PM". -- Use an anonymous function ("function on the fly") to do a logical test if the value is greater than 10. -- A logical test with `mutate` will automatically fill a column with TRUE/FALSE. +* **Hint**: use `starts_with()` to select the columns that start with "PM". +* Use an anonymous function ("function on the fly") to do a logical test if the value is greater than 10. 
+* A logical test with `mutate` (x > 10) will automatically fill a column with TRUE/FALSE. ```{r P.2response} ces %>% mutate(across( starts_with("PM"), function(x) x > 10 - )) + )) %>% + glimpse() # add glimpse to view the changes ``` ### P.3 Take your code from previous question and assign it to the variable `ces_dat`. -- Use `filter()` to drop any rows where "Oakland" appears in `ApproxLocation`. Make sure to reassign this to `ces_dat`. -- Create a ggplot boxplot (`geom_boxplot()`) where (1) the x-axis is `Asthma` and (2) the y-axis is `PM2.5`. -- You change the `labs()` layer so that the x-axis is "ER Visits for Asthma: PM2.5 greater than 10" +- Create a ggplot where the x-axis is `Asthma` and the y-axis is `PM2.5`. +- Add a boxplot (`geom_boxplot()`) +- Change the `labs()` layer so that the x-axis is "ER Visits for Asthma: PM2.5 greater than 10" ```{r P.3response} ces_dat <- @@ -175,16 +198,23 @@ ces_dat <- mutate(across( starts_with("PM"), function(x) x > 10 - )) %>% - filter(ApproxLocation != "Oakland") - -ces_boxplot <- function(df) { - ggplot(df) + - geom_boxplot(aes( - x = `Asthma`, - y = `PM2.5` - )) + + )) + +ggplot(data = ces_dat, aes(x = `Asthma`, y = `PM2.5`)) + + geom_boxplot() + + labs(x = "ER Visits for Asthma: PM2.5 greater than 10") + +# Make everything a function if you like! +ces_boxplot <- function() { + ces %>% + mutate(across( + starts_with("PM"), + function(x) x > 10 + )) %>% + ggplot(aes(x = `Asthma`, y = `PM2.5`)) + + geom_boxplot() + labs(x = "ER Visits for Asthma: PM2.5 greater than 10") } -ces_boxplot(ces_dat) + +ces_boxplot() ```