This workflow requires two operations: a grouping operation using the group_by function and a summary operation using the summarise/summarize function. Here, we’ll compute two summary statistics: minimum delay time and maximum delay time.
library(dplyr)
df %>%
 group_by(Weekday) %>%
- summarize(min_delay = min(Delay), max_delay = max(Delay))
# A tibble: 5 × 3
Weekday min_delay max_delay
@@ -546,10 +546,10 @@
This time, the data table has four variables. We want to summarize by Quarter and Week, which leaves one variable, Direction, that needs to be collapsed.
-
+
df2 %>%
 group_by(Quarter, Week) %>%
- summarize(min_delay = min(Delay), max_delay = max(Delay))
+ summarise(min_delay = min(Delay), max_delay = max(Delay))
# A tibble: 8 × 4
# Groups: Quarter [4]
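
Reviewer note: the `# Groups: Quarter [4]` line in the rendered output appears because `summarise` peels off only the last grouping variable (`Week`) by default, so the result is still grouped by `Quarter`. The sketch below rebuilds `df2` from the chapter and compares the default with `.groups = "drop"`. The object names `out_default` and `out_flat` are illustrative and not part of the original page.

```r
library(dplyr)

# df2 as defined earlier in the chapter
df2 <- data.frame(
  Quarter   = paste0("Q", rep(1:4, each = 4)),
  Week      = rep(c("Weekday", "Weekend"), each = 2, times = 4),
  Direction = rep(c("Inbound", "Outbound"), times = 8),
  Delay     = c(10.8, 9.7, 15.5, 10.4, 11.8, 8.9, 5.5,
                3.3, 10.6, 8.8, 6.6, 5.2, 9.1, 7.3, 5.3, 4.4))

# Default behaviour: the last grouping level (Week) is dropped,
# so the summary table remains grouped by Quarter.
out_default <- df2 %>%
  group_by(Quarter, Week) %>%
  summarise(min_delay = min(Delay), max_delay = max(Delay))

# Explicitly drop all grouping so the result is an ungrouped tibble.
out_flat <- df2 %>%
  group_by(Quarter, Week) %>%
  summarise(min_delay = min(Delay), max_delay = max(Delay),
            .groups = "drop")

group_vars(out_default)  # grouping variables still attached
group_vars(out_flat)     # none
```

Whether the page should mention `.groups` is a judgment call. It matters mostly when the summary table is piped into further grouped operations.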
diff --git a/docs/search.json b/docs/search.json
index fa1d6c52..5c97cd79 100755
--- a/docs/search.json
+++ b/docs/search.json
@@ -326,7 +326,7 @@
"href": "group_by.html#summarizing-data-by-group",
"title": "10 Grouping and summarizing",
"section": "10.1 Summarizing data by group",
- "text": "10.1 Summarizing data by group\nLet’s first create a dataframe listing the average delay time in minutes, by day of the week and by quarter, for Logan airport’s 2014 outbound flights.\n\ndf <- data.frame(\n Weekday = factor(rep(c(\"Mon\", \"Tues\", \"Wed\", \"Thurs\", \"Fri\"), each = 4), \n levels = c(\"Mon\", \"Tues\", \"Wed\", \"Thurs\", \"Fri\")),\n Quarter = paste0(\"Q\", rep(1:4, each = 5)), \n Delay = c(9.9, 5.4, 8.8, 6.9, 4.9, 9.7, 7.9, 5, 8.8, 11.1, 10.2, 9.3, 12.2,\n 10.2, 9.2, 9.7, 12.2, 8.1, 7.9, 5.6))\n\nThe goal will be to summarize the table by Weekday as shown in the following graphic.\n\nThe data table has three variables: Weekday, Quarter and Delay. Delay is the value we will summarize which leaves us with one variable to collapse: Quarter. In doing so, we will compute the Delay statistics for all quarters associated with a unique Weekday value.\nThis workflow requires two operations: a grouping operation using the group_by function and a summary operation using the summarise function. Here, we’ll compute two summary statistics: minimum delay time and maximum delay time.\n\nlibrary(dplyr)\n\ndf %>% \n group_by(Weekday) %>% \n summarize(min_delay = min(Delay), max_delay = max(Delay))\n\n# A tibble: 5 × 3\n Weekday min_delay max_delay\n <fct> <dbl> <dbl>\n1 Mon 5.4 9.9\n2 Tues 4.9 9.7\n3 Wed 8.8 11.1\n4 Thurs 9.2 12.2\n5 Fri 5.6 12.2\n\n\nNote that the weekday follows the chronological order as defined in the Weekday factor.\nYou’ll also note that the output is a tibble. This data class is discussed at the end of this page.\n\n10.1.1 Grouping by multiple variables\nYou can group by more than one variable. 
For example, let’s build another dataframe listing the average delay time in minutes, by quarter, by weekend/weekday and by inbound/outbound status for Logan airport’s 2014 outbound flights.\n\ndf2 <- data.frame(\n Quarter = paste0(\"Q\", rep(1:4, each = 4)), \n Week = rep(c(\"Weekday\", \"Weekend\"), each=2, times=4),\n Direction = rep(c(\"Inbound\", \"Outbound\"), times=8),\n Delay = c(10.8, 9.7, 15.5, 10.4, 11.8, 8.9, 5.5, \n 3.3, 10.6, 8.8, 6.6, 5.2, 9.1, 7.3, 5.3, 4.4))\n\nThe goal will be to summarize the delay time by Quarter and by Week type as shown in the following graphic.\n\nThis time, the data table has four variables. We are wanting to summarize by Quater and Week which leaves one variable, Direction, that needs to be collapsed.\n\ndf2 %>% \n group_by(Quarter, Week) %>% \n summarize(min_delay = min(Delay), max_delay = max(Delay))\n\n# A tibble: 8 × 4\n# Groups: Quarter [4]\n Quarter Week min_delay max_delay\n <chr> <chr> <dbl> <dbl>\n1 Q1 Weekday 9.7 10.8\n2 Q1 Weekend 10.4 15.5\n3 Q2 Weekday 8.9 11.8\n4 Q2 Weekend 3.3 5.5\n5 Q3 Weekday 8.8 10.6\n6 Q3 Weekend 5.2 6.6\n7 Q4 Weekday 7.3 9.1\n8 Q4 Weekend 4.4 5.3\n\n\nThe following section demonstrates other grouping/summarizing operations on a larger dataset."
+ "text": "10.1 Summarizing data by group\nLet’s first create a dataframe listing the average delay time in minutes, by day of the week and by quarter, for Logan airport’s 2014 outbound flights.\n\ndf <- data.frame(\n Weekday = factor(rep(c(\"Mon\", \"Tues\", \"Wed\", \"Thurs\", \"Fri\"), each = 4), \n levels = c(\"Mon\", \"Tues\", \"Wed\", \"Thurs\", \"Fri\")),\n Quarter = paste0(\"Q\", rep(1:4, each = 5)), \n Delay = c(9.9, 5.4, 8.8, 6.9, 4.9, 9.7, 7.9, 5, 8.8, 11.1, 10.2, 9.3, 12.2,\n 10.2, 9.2, 9.7, 12.2, 8.1, 7.9, 5.6))\n\nThe goal will be to summarize the table by Weekday as shown in the following graphic.\n\nThe data table has three variables: Weekday, Quarter and Delay. Delay is the value we will summarize which leaves us with one variable to collapse: Quarter. In doing so, we will compute the Delay statistics for all quarters associated with a unique Weekday value.\nThis workflow requires two operations: a grouping operation using the group_by function and a summary operation using the summarise/summarize function. Here, we’ll compute two summary statistics: minimum delay time and maximum delay time.\n\nlibrary(dplyr)\n\ndf %>% \n group_by(Weekday) %>% \n summarise(min_delay = min(Delay), max_delay = max(Delay))\n\n# A tibble: 5 × 3\n Weekday min_delay max_delay\n <fct> <dbl> <dbl>\n1 Mon 5.4 9.9\n2 Tues 4.9 9.7\n3 Wed 8.8 11.1\n4 Thurs 9.2 12.2\n5 Fri 5.6 12.2\n\n\nNote that the weekday follows the chronological order as defined in the Weekday factor.\nYou’ll also note that the output is a tibble. This data class is discussed at the end of this page.\n\n10.1.1 Grouping by multiple variables\nYou can group by more than one variable. 
For example, let’s build another dataframe listing the average delay time in minutes, by quarter, by weekend/weekday and by inbound/outbound status for Logan airport’s 2014 outbound flights.\n\ndf2 <- data.frame(\n Quarter = paste0(\"Q\", rep(1:4, each = 4)), \n Week = rep(c(\"Weekday\", \"Weekend\"), each=2, times=4),\n Direction = rep(c(\"Inbound\", \"Outbound\"), times=8),\n Delay = c(10.8, 9.7, 15.5, 10.4, 11.8, 8.9, 5.5, \n 3.3, 10.6, 8.8, 6.6, 5.2, 9.1, 7.3, 5.3, 4.4))\n\nThe goal will be to summarize the delay time by Quarter and by Week type as shown in the following graphic.\n\nThis time, the data table has four variables. We are wanting to summarize by Quater and Week which leaves one variable, Direction, that needs to be collapsed.\n\ndf2 %>% \n group_by(Quarter, Week) %>% \n summarise(min_delay = min(Delay), max_delay = max(Delay))\n\n# A tibble: 8 × 4\n# Groups: Quarter [4]\n Quarter Week min_delay max_delay\n <chr> <chr> <dbl> <dbl>\n1 Q1 Weekday 9.7 10.8\n2 Q1 Weekend 10.4 15.5\n3 Q2 Weekday 8.9 11.8\n4 Q2 Weekend 3.3 5.5\n5 Q3 Weekday 8.8 10.6\n6 Q3 Weekend 5.2 6.6\n7 Q4 Weekday 7.3 9.1\n8 Q4 Weekend 4.4 5.3\n\n\nThe following section demonstrates other grouping/summarizing operations on a larger dataset."
},
{
"objectID": "group_by.html#a-working-example",
diff --git a/group_by.qmd b/group_by.qmd
index 1ad6a9cf..9926e941 100644
--- a/group_by.qmd
+++ b/group_by.qmd
@@ -30,14 +30,14 @@ The goal will be to summarize the table by `Weekday` as shown in the following g
The data table has three variables: `Weekday`, `Quarter` and `Delay`. `Delay` is the value we will summarize which leaves us with one variable to *collapse*: `Quarter`. In doing so, we will compute the `Delay` statistics for all quarters associated with a unique `Weekday` value.
-This workflow requires two operations: a grouping operation using the `group_by` function and a summary operation using the `summarise` function. Here, we'll compute two summary statistics: minimum delay time and maximum delay time.
+This workflow requires two operations: a grouping operation using the `group_by` function and a summary operation using the `summarise`/`summarize` function. Here, we'll compute two summary statistics: minimum delay time and maximum delay time.
```{r}
library(dplyr)
df %>%
group_by(Weekday) %>%
- summarize(min_delay = min(Delay), max_delay = max(Delay))
+ summarise(min_delay = min(Delay), max_delay = max(Delay))
```
Note that the weekday follows the chronological order as defined in the `Weekday` factor.
@@ -66,7 +66,7 @@ This time, the data table has four variables. We are wanting to summarize by `Qu
```{r}
df2 %>%
group_by(Quarter, Week) %>%
- summarize(min_delay = min(Delay), max_delay = max(Delay))
+ summarise(min_delay = min(Delay), max_delay = max(Delay))
```
The following section demonstrates other grouping/summarizing operations on a larger dataset.
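
Reviewer note: the spelling change in this diff (`summarize` to `summarise`) should not alter the rendered output, since dplyr exports both names for the same function. A minimal check, assuming `df` is built as in the chapter. The names `by_summarise` and `by_summarize` are illustrative only.

```r
library(dplyr)

# df as defined earlier in the chapter
df <- data.frame(
  Weekday = factor(rep(c("Mon", "Tues", "Wed", "Thurs", "Fri"), each = 4),
                   levels = c("Mon", "Tues", "Wed", "Thurs", "Fri")),
  Quarter = paste0("Q", rep(1:4, each = 5)),
  Delay   = c(9.9, 5.4, 8.8, 6.9, 4.9, 9.7, 7.9, 5, 8.8, 11.1, 10.2, 9.3, 12.2,
              10.2, 9.2, 9.7, 12.2, 8.1, 7.9, 5.6))

# British spelling, as used after this change
by_summarise <- df %>%
  group_by(Weekday) %>%
  summarise(min_delay = min(Delay), max_delay = max(Delay))

# American spelling, as used before this change
by_summarize <- df %>%
  group_by(Weekday) %>%
  summarize(min_delay = min(Delay), max_delay = max(Delay))

# Compare the two results
all.equal(by_summarise, by_summarize)
```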
diff --git a/group_by_cache/html/unnamed-chunk-1_9fb98a5adc80a23c31eedbe88b93a06a.RData b/group_by_cache/html/unnamed-chunk-1_9fb98a5adc80a23c31eedbe88b93a06a.RData
deleted file mode 100644
index cf64d756..00000000
Binary files a/group_by_cache/html/unnamed-chunk-1_9fb98a5adc80a23c31eedbe88b93a06a.RData and /dev/null differ
diff --git a/group_by_cache/html/unnamed-chunk-1_9fb98a5adc80a23c31eedbe88b93a06a.rdb b/group_by_cache/html/unnamed-chunk-1_9fb98a5adc80a23c31eedbe88b93a06a.rdb
deleted file mode 100644
index 9fd2f85d..00000000
Binary files a/group_by_cache/html/unnamed-chunk-1_9fb98a5adc80a23c31eedbe88b93a06a.rdb and /dev/null differ
diff --git a/group_by_cache/html/unnamed-chunk-1_9fb98a5adc80a23c31eedbe88b93a06a.rdx b/group_by_cache/html/unnamed-chunk-1_9fb98a5adc80a23c31eedbe88b93a06a.rdx
deleted file mode 100644
index 184258b4..00000000
Binary files a/group_by_cache/html/unnamed-chunk-1_9fb98a5adc80a23c31eedbe88b93a06a.rdx and /dev/null differ
diff --git a/group_by_cache/html/unnamed-chunk-2_cd2338c52672295a1db060b6e3202081.RData b/group_by_cache/html/unnamed-chunk-2_cd2338c52672295a1db060b6e3202081.RData
index c124d33a..356703ed 100644
Binary files a/group_by_cache/html/unnamed-chunk-2_cd2338c52672295a1db060b6e3202081.RData and b/group_by_cache/html/unnamed-chunk-2_cd2338c52672295a1db060b6e3202081.RData differ
diff --git a/group_by_cache/html/unnamed-chunk-2_cd2338c52672295a1db060b6e3202081.rdb b/group_by_cache/html/unnamed-chunk-2_cd2338c52672295a1db060b6e3202081.rdb
index 2bdfa73a..ad115d6c 100644
Binary files a/group_by_cache/html/unnamed-chunk-2_cd2338c52672295a1db060b6e3202081.rdb and b/group_by_cache/html/unnamed-chunk-2_cd2338c52672295a1db060b6e3202081.rdb differ
diff --git a/group_by_cache/html/unnamed-chunk-2_cd2338c52672295a1db060b6e3202081.rdx b/group_by_cache/html/unnamed-chunk-2_cd2338c52672295a1db060b6e3202081.rdx
index 803e59c1..17a55d0a 100644
Binary files a/group_by_cache/html/unnamed-chunk-2_cd2338c52672295a1db060b6e3202081.rdx and b/group_by_cache/html/unnamed-chunk-2_cd2338c52672295a1db060b6e3202081.rdx differ
diff --git a/group_by_cache/html/unnamed-chunk-4_d81779338be38082a9993debd3a81a92.RData b/group_by_cache/html/unnamed-chunk-4_d81779338be38082a9993debd3a81a92.RData
deleted file mode 100644
index c06aec45..00000000
Binary files a/group_by_cache/html/unnamed-chunk-4_d81779338be38082a9993debd3a81a92.RData and /dev/null differ
diff --git a/group_by_cache/html/unnamed-chunk-4_db15eeb225ba52d8d0d6c4ebaf271c32.RData b/group_by_cache/html/unnamed-chunk-4_db15eeb225ba52d8d0d6c4ebaf271c32.RData
new file mode 100644
index 00000000..c82e414d
Binary files /dev/null and b/group_by_cache/html/unnamed-chunk-4_db15eeb225ba52d8d0d6c4ebaf271c32.RData differ
diff --git a/group_by_cache/html/unnamed-chunk-4_d81779338be38082a9993debd3a81a92.rdb b/group_by_cache/html/unnamed-chunk-4_db15eeb225ba52d8d0d6c4ebaf271c32.rdb
similarity index 100%
rename from group_by_cache/html/unnamed-chunk-4_d81779338be38082a9993debd3a81a92.rdb
rename to group_by_cache/html/unnamed-chunk-4_db15eeb225ba52d8d0d6c4ebaf271c32.rdb
diff --git a/group_by_cache/html/unnamed-chunk-4_d81779338be38082a9993debd3a81a92.rdx b/group_by_cache/html/unnamed-chunk-4_db15eeb225ba52d8d0d6c4ebaf271c32.rdx
similarity index 100%
rename from group_by_cache/html/unnamed-chunk-4_d81779338be38082a9993debd3a81a92.rdx
rename to group_by_cache/html/unnamed-chunk-4_db15eeb225ba52d8d0d6c4ebaf271c32.rdx
diff --git a/group_by_cache/html/unnamed-chunk-6_6de698b8961425ae8b983a1493c2e376.RData b/group_by_cache/html/unnamed-chunk-6_6de698b8961425ae8b983a1493c2e376.RData
new file mode 100644
index 00000000..aa3e22ca
Binary files /dev/null and b/group_by_cache/html/unnamed-chunk-6_6de698b8961425ae8b983a1493c2e376.RData differ
diff --git a/group_by_cache/html/unnamed-chunk-6_f31d24f5b42ab4b84a1c2090a1293f36.rdb b/group_by_cache/html/unnamed-chunk-6_6de698b8961425ae8b983a1493c2e376.rdb
similarity index 100%
rename from group_by_cache/html/unnamed-chunk-6_f31d24f5b42ab4b84a1c2090a1293f36.rdb
rename to group_by_cache/html/unnamed-chunk-6_6de698b8961425ae8b983a1493c2e376.rdb
diff --git a/group_by_cache/html/unnamed-chunk-6_f31d24f5b42ab4b84a1c2090a1293f36.rdx b/group_by_cache/html/unnamed-chunk-6_6de698b8961425ae8b983a1493c2e376.rdx
similarity index 100%
rename from group_by_cache/html/unnamed-chunk-6_f31d24f5b42ab4b84a1c2090a1293f36.rdx
rename to group_by_cache/html/unnamed-chunk-6_6de698b8961425ae8b983a1493c2e376.rdx
diff --git a/group_by_cache/html/unnamed-chunk-6_f31d24f5b42ab4b84a1c2090a1293f36.RData b/group_by_cache/html/unnamed-chunk-6_f31d24f5b42ab4b84a1c2090a1293f36.RData
deleted file mode 100644
index de730495..00000000
Binary files a/group_by_cache/html/unnamed-chunk-6_f31d24f5b42ab4b84a1c2090a1293f36.RData and /dev/null differ