diff --git a/modules/Functions/Functions.Rmd b/modules/Functions/Functions.Rmd
index 5397de0e..635a2bbd 100644
--- a/modules/Functions/Functions.Rmd
+++ b/modules/Functions/Functions.Rmd
@@ -11,7 +11,6 @@ library(dplyr)
library(knitr)
library(stringr)
library(tidyr)
-library(emo)
library(readr)
opts_chunk$set(comment = "")
```
@@ -192,7 +191,7 @@ loud(word = "hooray!")
-## Functions for tibbles - curly braces{.codesmall}
+## Functions for tibbles - curly braces
```{r}
# get means and missing for a specific column
@@ -203,13 +202,22 @@ get_summary <- function(dataset, col_name) {
}
```
-Examples:
+## Functions for tibbles - example{.codesmall}
-```{r}
+```{r message = FALSE}
er <- read_csv(file = "https://daseh.org/data/CO_ER_heat_visits.csv")
+```
+
+```{r}
get_summary(er, visits)
+```
+
+```{r message = FALSE}
+yearly_co2 <-
+ read_csv(file = "https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv")
+```
-yearly_co2 <- read_csv(file = "https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv")
+```{r}
get_summary(yearly_co2, `2014`)
```
@@ -217,9 +225,9 @@ get_summary(yearly_co2, `2014`)
- Simple functions take the form:
- `NEW_FUNCTION <- function(x, y){x + y}`
- - Can specify defaults like `function(x = 1, y = 2){x + y}`
- -`return` will provide a value as output
- - `print` will simply print the value on the screen but not save it
+ - Can specify defaults like `function(x = 1, y = 2){x + y}`
+ - `return` will provide a value as output
+- Specify a column (from a tibble) inside a function using `{{double curly braces}}`
## Lab Part 1
@@ -245,7 +253,7 @@ sapply(, some_function)
Let's apply a function to look at the CO heat-related ER visits dataset.
-`r emo::ji("rotating_light")` There are no parentheses on the functions! `r emo::ji("rotating_light")`
+🚨 There are no parentheses on the functions! 🚨
You can also pipe into your function.
@@ -357,7 +365,6 @@ er %>%
))
```
-
## Applying functions with `across` from `dplyr`
Using different `tidyselect()` options (e.g., `starts_with()`, `ends_with()`, `contains()`)
@@ -368,20 +375,6 @@ er %>%
summarize(across(contains("cl"), mean, na.rm=T))
```
-
-
-
-
-
-
-
-
-
-
-
-
-
-
## Applying functions with `across` from `dplyr` {.smaller}
Combining with `mutate()` - the `replace_na` function
@@ -401,29 +394,15 @@ yearly_co2 %>%
))
```
+## GUT CHECK!
-
+Why use `across()`?
-
+A. Efficiency - faster and less repetitive
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+B. Calculate the cross product
+C. Connect across datasets
## `purrr` package
@@ -433,22 +412,29 @@ While we won't get into `purrr` too much in this class, its a handy package for
# Multiple Data Frames
-## Multiple data frames {.smaller}
+## Multiple data frames
-Lists help us work with multiple data frames
+Lists help us work with multiple tibbles / data frames
```{r}
-AQ_list <- list(AQ1 = airquality, AQ2 = airquality, AQ3 = airquality)
-str(AQ_list)
+df_list <- list(AQ = airquality, er = er, yearly_co2 = yearly_co2)
```
+
+
+`select()` the numeric columns from each tibble:
+
+```{r}
+df_list <-
+ df_list %>%
+ sapply(function(x) select(x, where(is.numeric)))
+```
-## Multiple data frames: `sapply`
+## Multiple data frames: `sapply` {.smaller}
```{r}
-AQ_list %>% sapply(class)
-AQ_list %>% sapply(nrow)
-AQ_list %>% sapply(colMeans, na.rm = TRUE)
+df_list %>% sapply(nrow)
+df_list %>% sapply(colMeans, na.rm = TRUE)
```
@@ -457,7 +443,7 @@ AQ_list %>% sapply(colMeans, na.rm = TRUE)
- Apply your functions with `sapply(, some_function)`
- Use `across()` to apply functions across multiple columns of data
- Need to use `across` within `summarize()` or `mutate()`
-- Can use `sapply` or `purrr` to work with multiple data frames within lists simultaneously
+- Can use `sapply` (or the `purrr` package) to work with multiple data frames within lists simultaneously
## Lab Part 2
@@ -466,7 +452,20 @@ AQ_list %>% sapply(colMeans, na.rm = TRUE)
💻 [Lab](https://daseh.org/modules/Functions/lab/Functions_Lab.Rmd)
-```{r, fig.alt="The End", out.width = "50%", echo = FALSE, fig.align='center'}
+📃 [Day 9 Cheatsheet](https://daseh.org/modules/cheatsheets/Day-9.pdf)
+
+📃 [Posit's purrr Cheatsheet](https://rstudio.github.io/cheatsheets/purrr.pdf)
+
+## Research Survey
+
+
+
+https://forms.gle/jVue79CjgoMmbVbg9
+
+
+
+
+```{r, fig.alt="The End", out.width = "30%", echo = FALSE, fig.align='center'}
knitr::include_graphics(here::here("images/the-end-g23b994289_1280.jpg"))
```
diff --git a/modules/Functions/lab/Functions_Lab_Key.Rmd b/modules/Functions/lab/Functions_Lab_Key.Rmd
index 980b1af8..edefe55d 100644
--- a/modules/Functions/lab/Functions_Lab_Key.Rmd
+++ b/modules/Functions/lab/Functions_Lab_Key.Rmd
@@ -11,7 +11,7 @@ knitr::opts_chunk$set(echo = TRUE)
# Part 1
-Load all the libraries we will use in this lab.
+Load the `tidyverse` package.
```{r message=FALSE}
library(tidyverse)
@@ -19,21 +19,13 @@ library(tidyverse)
### 1.1
-Create a function that takes one argument, a vector, and returns the sum of the vector and then squares the result. Call it "sum_squared". Test your function on the vector `c(2,7,21,30,90)` - you should get the answer 22500.
+Create a function that:
-```
-# General format
-NEW_FUNCTION <- function(x, y) x + y
-```
-or
-
-```
-# General format
-NEW_FUNCTION <- function(x, y){
-result <- x + y
-return(result)
-}
-```
+* Takes one argument, a vector.
+* Returns the sum of the vector and then squares the result.
+* Call it "sum_squared".
+* Test your function on the vector `c(2,7,21,30,90)` - you should get the answer 22500.
+* The general format is `NEW_FUNCTION <- function(x, y) x + y`.
```{r 1.1response}
nums <- c(2, 7, 21, 30, 90)
@@ -50,7 +42,12 @@ sum_squared(x = nums)
### 1.2
-Create a function that takes two arguments, (1) a vector and (2) a numeric value. This function tests whether the number (2) is contained within the vector (1). **Hint**: use `%in%`. Call it `has_n`. Test your function on the vector `c(2,7,21,30,90)` and number `21` - you should get the answer TRUE.
+Create a function that:
+
+* Takes two arguments: (1) a vector and (2) a numeric value.
+* Tests whether the number (2) is contained within the vector (1). **Hint**: use `%in%`.
+* Call it `has_n`.
+* Test your function on the vector `c(2,7,21,30,90)` and number `21` - you should get the answer TRUE.
```{r 1.2response}
nums <- c(2, 7, 21, 30, 90)
@@ -74,11 +71,24 @@ has_n(x = nums)
### P.1
-Create a new number `b_num` that is not contained with `nums`. Use your updated `has_n` function with the default value and add `b_num` as the `n` argument when calling the function. What is the outcome?
+Create a function for the CalEnviroScreen Data.
+
+* Read in the data from https://daseh.org/data/CalEnviroScreen_data.csv
+* The function takes an argument for a column name (use `{{col_name}}`).
+* The function creates a ggplot with `{{col_name}}` on the x-axis and `Poverty` on the y-axis.
+* Use `geom_point()`
+* Test the function using the `Lead` and `HousingBurden` columns, or other columns of your choice.
```{r P.1response}
-b_num <- 11
-has_n(x = nums, n = b_num)
+ces <- read_csv("https://daseh.org/data/CalEnviroScreen_data.csv")
+
+plot_ces <- function(col_name){
+ ggplot(data = ces, aes(x = {{col_name}}, y = Poverty)) +
+ geom_point()
+}
+
+plot_ces(Lead)
+plot_ces(HousingBurden)
```
@@ -96,7 +106,12 @@ ces <- read_csv("https://daseh.org/data/CalEnviroScreen_data.csv")
### 2.2
-We want to get some summary statistics on water contamination. Use `across` inside `summarize` to get the sum total variable containing the string "water" AND ending with "Pctl". **Hint**: use `contains()` AND `ends_with()` to select the right columns inside `across`. Remember that `NA` values can influence calculations.
+We want to get some summary statistics on water contamination.
+
+* Use `across` inside `summarize`.
+* Choose columns about "water". **Hint**: use `contains("water")` inside `across`.
+* Use `mean` as the function inside of `across`.
+* Remember that `NA` values can influence calculations.
```
# General format
@@ -110,19 +125,26 @@ data %>%
```{r 2.2response}
ces %>%
summarize(across(
- contains("Water") & ends_with("Pctl"),
- sum
+ contains("water"),
+ mean
))
+
+# Accounting for NA
ces %>%
summarize(across(
- contains("Water") & ends_with("Pctl"),
- function(x) sum(x, na.rm = T)
+ contains("water"),
+ function(x) mean(x, na.rm = TRUE)
))
```
### 2.3
-Use `across` and `mutate` to convert all columns containing the word "water" into proportions (i.e., divide that value by 100). **Hint**: use `contains()` to select the right columns within `across()`. Use an anonymous function ("function on the fly") to divide by 100 (`function(x) x / 100`). It will also be easier to check your work if you `select()` columns that match "Pctl".
+Convert all columns that are percentiles into proportions.
+
+* Use `across` and `mutate`
+* Choose columns that contain "Pctl" in the name. **Hint**: use `contains("Pctl")` inside `across`.
+* Use an anonymous function ("function on the fly") to divide by 100 (`function(x) x / 100`).
+* Check your work. This will be easier if you `select(contains("Pctl"))`.
```
# General format
@@ -136,7 +158,7 @@ data %>%
```{r 2.3response}
ces %>%
mutate(across(
- contains("water"),
+ contains("Pctl"),
function(x) x / 100
)) %>%
select(contains("Pctl"))
@@ -149,25 +171,26 @@ ces %>%
Use `across` and `mutate` to convert all columns starting with the string "PM" into a binary variable: TRUE if the value is greater than 10 and FALSE if less than or equal to 10.
-- **Hint**: use `starts_with()` to select the columns that start with "PM".
-- Use an anonymous function ("function on the fly") to do a logical test if the value is greater than 10.
-- A logical test with `mutate` will automatically fill a column with TRUE/FALSE.
+* **Hint**: use `starts_with()` to select the columns that start with "PM".
+* Use an anonymous function ("function on the fly") to do a logical test if the value is greater than 10.
+* A logical test (`x > 10`) inside `mutate` will automatically fill a column with TRUE/FALSE.
```{r P.2response}
ces %>%
mutate(across(
starts_with("PM"),
function(x) x > 10
- ))
+ )) %>%
+ glimpse() # add glimpse to view the changes
```
### P.3
Take your code from previous question and assign it to the variable `ces_dat`.
-- Use `filter()` to drop any rows where "Oakland" appears in `ApproxLocation`. Make sure to reassign this to `ces_dat`.
-- Create a ggplot boxplot (`geom_boxplot()`) where (1) the x-axis is `Asthma` and (2) the y-axis is `PM2.5`.
-- You change the `labs()` layer so that the x-axis is "ER Visits for Asthma: PM2.5 greater than 10"
+- Create a ggplot where the x-axis is `Asthma` and the y-axis is `PM2.5`.
+- Add a boxplot (`geom_boxplot()`)
+- Change the `labs()` layer so that the x-axis is "ER Visits for Asthma: PM2.5 greater than 10"
```{r P.3response}
ces_dat <-
@@ -175,16 +198,23 @@ ces_dat <-
mutate(across(
starts_with("PM"),
function(x) x > 10
- )) %>%
- filter(ApproxLocation != "Oakland")
-
-ces_boxplot <- function(df) {
- ggplot(df) +
- geom_boxplot(aes(
- x = `Asthma`,
- y = `PM2.5`
- )) +
+ ))
+
+ggplot(data = ces_dat, aes(x = `Asthma`, y = `PM2.5`)) +
+ geom_boxplot() +
+ labs(x = "ER Visits for Asthma: PM2.5 greater than 10")
+
+# Make everything a function if you like!
+ces_boxplot <- function() {
+ ces %>%
+ mutate(across(
+ starts_with("PM"),
+ function(x) x > 10
+ )) %>%
+ ggplot(aes(x = `Asthma`, y = `PM2.5`)) +
+ geom_boxplot() +
labs(x = "ER Visits for Asthma: PM2.5 greater than 10")
}
-ces_boxplot(ces_dat)
+
+ces_boxplot()
```