---
title: 'tidyverse: using forcats to improve your ggplots'
author: "Andy Catlin"
date: "8/30/2023"
output:
  pdf:
    toc: true
    number-sections: true
    colorlinks: true
  html_document: default
---
Note that forcats is part of the core tidyverse (since tidyverse 1.2.0), so `library(tidyverse)` loads it automatically. The explicit `library(forcats)` calls in this document are redundant but harmless.

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(forcats)
library(gt)
```
# Handy forcats functions for ggplot2
## Comparing followers of world religions
Source: <https://en.wikipedia.org/wiki/List_of_religious_populations>

I was looking for a simple dataset with count data for many items to demonstrate some basic forcats functions that are useful when creating plots.

```{r, warning=FALSE}
religions = read_csv("https://raw.githubusercontent.com/acatlin/data/master/religions.csv",
                     show_col_types = FALSE, col_names = FALSE) %>%
  rename(religion = X1, followers = X2) %>%
  mutate(millions_of_followers = followers/1000000.0) %>%
  select(religion, millions_of_followers)
religions
```
## 1A: basic ggplot
Q: What are the most followed religions?

A: Use ggplot2 to compare religious populations in a column chart.

```{r}
religions %>%
  ggplot(aes(x = religion, y = millions_of_followers)) +
  geom_col(fill = "lightblue") +
  labs(x = "religion", y = "millions of followers",
       caption = "https://en.wikipedia.org/wiki/List_of_religious_populations")
```
## 1B: How do I flip coordinates?
```{r}
religions %>%
  ggplot(aes(x = religion, y = millions_of_followers)) +
  geom_col(fill = "lightblue") +
  labs(x = "religion", y = "millions of followers",
       caption = "https://en.wikipedia.org/wiki/List_of_religious_populations") +
  coord_flip()
```
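
As an alternative to `coord_flip()`, a minimal sketch assuming ggplot2 3.3.0 or later: map the category to `y` and the measure to `x`, and `geom_col()` draws horizontal bars directly. The `toy` data frame and its values are hypothetical placeholders, not taken from `religions.csv`.

```r
library(ggplot2)

# Hypothetical toy data, used only to illustrate the mapping
toy <- data.frame(
  group = c("a", "b", "c"),
  value = c(30, 5, 80)
)

# Category on y, measure on x: bars are horizontal without coord_flip()
p <- ggplot(toy, aes(x = value, y = group)) +
  geom_col(fill = "lightblue") +
  labs(x = "value", y = "group")
p
```

With this form the axis labels in `labs()` refer to the axes as drawn, which can be easier to read than labels that are swapped by `coord_flip()`.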
## 2A: How do I change sort order?
Revised by: Andy Catlin

Q: How do we change the chart to show the most followed religions first?

A: Use `forcats::fct_reorder()`.

```{r}
library(forcats)
ggplot(religions, aes(x = fct_reorder(religion, millions_of_followers),
                      y = millions_of_followers)) +
  geom_col(fill = "lightblue") +
  labs(x = "religion", y = "millions of followers",
       caption = "https://en.wikipedia.org/wiki/List_of_religious_populations") +
  coord_flip()
```
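
To see what `fct_reorder()` does to the factor itself, here is a small sketch on hypothetical toy data (the `toy` data frame and its values are placeholders, not taken from `religions.csv`). It compares the default alphabetical level order with the reordered one.

```r
library(forcats)

# Hypothetical toy data, used only to illustrate level ordering
toy <- data.frame(
  group = c("a", "b", "c", "d"),
  value = c(30, 5, 80, 12)
)

# Default factor levels are alphabetical
default_levels <- levels(factor(toy$group))

# fct_reorder() sorts levels by ascending value
ascending_levels <- levels(fct_reorder(toy$group, toy$value))

# .desc = TRUE sorts levels by descending value
descending_levels <- levels(fct_reorder(toy$group, toy$value, .desc = TRUE))

default_levels     # "a" "b" "c" "d"
ascending_levels   # "b" "d" "a" "c"
descending_levels  # "c" "a" "d" "b"
```

On a flipped chart the first level is drawn at the bottom, so the default ascending order puts the largest bar at the top.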
## 2B: How do I combine less frequently used categories?
Q: How do we combine the less-followed religions into a single group?

A: Use `forcats::fct_other()`.

```{r}
# Names of the five most-followed religions
top5 = religions %>%
  arrange(desc(millions_of_followers)) %>%
  head(5) %>%
  pull(religion)
religions %>%
  mutate(religion = fct_other(religion, keep = top5, other_level = "Other religions")) %>%
  # .fun = sum orders each bar by its stacked total; the default (median)
  # can misplace "Other religions", which now spans several rows
  ggplot(aes(x = fct_reorder(religion, millions_of_followers, .fun = sum),
             y = millions_of_followers)) +
  geom_col(fill = "lightblue") +
  labs(x = "religion", y = "millions of followers",
       caption = "https://en.wikipedia.org/wiki/List_of_religious_populations") +
  coord_flip()
```
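
One thing to watch when combining `fct_other()` with `fct_reorder()`: after lumping, several rows share the "Other" level, and `fct_reorder()` summarises each level with the median by default, while `geom_col()` stacks those rows into a bar whose height is their sum. The sketch below uses hypothetical toy data (placeholder values, not taken from `religions.csv`) to show how `.fun = sum` changes the resulting order.

```r
library(forcats)

# Hypothetical toy data, used only to illustrate the ordering
toy <- data.frame(
  group = c("a", "b", "c", "d", "e"),
  value = c(50, 40, 30, 20, 10)
)

# Keep the two largest groups; everything else becomes "Other"
lumped <- fct_other(toy$group, keep = c("a", "b"), other_level = "Other")
levels(lumped)  # "a" "b" "Other"

# Default summary is the median: "Other" is ranked by median(30, 20, 10) = 20
by_median <- levels(fct_reorder(lumped, toy$value))

# .fun = sum ranks "Other" by its total, 30 + 20 + 10 = 60
by_sum <- levels(fct_reorder(lumped, toy$value, .fun = sum))

by_median  # "Other" "b" "a"
by_sum     # "b" "a" "Other"
```

Ordering by the sum keeps the level order consistent with the bar lengths that the chart actually displays.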
## 2C: Adding a title
Reference: https://www.geeksforgeeks.org/ggplot2-title-and-subtitle-with-different-size-and-color-in-r/
```{r}
religions %>%
  mutate(religion = fct_other(religion, keep = top5, other_level = "Other religions")) %>%
  # .fun = sum orders each bar by its stacked total rather than the median
  ggplot(aes(x = fct_reorder(religion, millions_of_followers, .fun = sum),
             y = millions_of_followers)) +
  geom_col(fill = "lightblue") +
  labs(x = "religion", y = "millions of followers",
       title = "Most Popular Religions",
       subtitle = "[2021]",
       caption = "https://en.wikipedia.org/wiki/List_of_religious_populations") +
  theme(plot.title = element_text(size = 18, color = "blue"),
        plot.subtitle = element_text(size = 14, color = "gold")) +
  coord_flip()
```
# Tabular Data
```{r}
religions |>
  gt(rowname_col = "religion") |>
  tab_header(
    title = "Most popular religions",
    subtitle = md("**2021**")) |>
  tab_source_note(
    source_note = md("https://en.wikipedia.org/wiki/List_of_religious_populations")) |>
  opt_table_font(font = google_font("Montserrat"), weight = 500)
```
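
A possible refinement for the table above, sketched on hypothetical toy data (the `toy` data frame, its column names, and its values are placeholders): `gt::fmt_number()` rounds a numeric column for display without changing the underlying data.

```r
library(gt)

# Hypothetical toy data, used only to illustrate number formatting
toy <- data.frame(
  group = c("a", "b", "c"),
  millions = c(1234.5678, 87.25, 3.14159)
)

# Show one decimal place in the rendered table
tbl <- toy |>
  gt(rowname_col = "group") |>
  fmt_number(columns = millions, decimals = 1)
tbl
```

The same `fmt_number()` call could be added to the `religions` pipeline with `columns = millions_of_followers`, assuming one decimal place is the precision you want to report.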
# Findings and Recommendations
To use the terminology of descriptive analytics (vs. predictive analytics), this dataset has a single measure (millions of followers) across a single level of a single dimension (religion). Suppose we were able to find religion counts for every 10 years over the past 200 years, by continent.

Two useful patterns of analysis in descriptive analytics are *relative contribution* and *changes over time*.

Relative contribution: What percent of the total does each religion represent (overall? by continent?)

Changes over time: How did the counts (and percentages) of different religions change over time (overall? by continent?)

What other measures might be interesting (e.g. by age group)?

How would you represent the information in a table or a chart?

Would you be able to forecast religion counts (by continent) into the future?
etc.
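
The relative-contribution question above can be sketched with dplyr. The `toy` tibble below is hypothetical: the `region` column stands in for continent, and all names and values are placeholders rather than real counts.

```r
library(dplyr)

# Hypothetical toy data: two groups counted in two regions
toy <- tibble(
  region = c("x", "x", "y", "y"),
  group  = c("a", "b", "a", "b"),
  value  = c(60, 40, 10, 30)
)

shares <- toy %>%
  # Share of the grand total across all regions
  mutate(pct_overall = 100 * value / sum(value)) %>%
  # Share of the total within each region
  group_by(region) %>%
  mutate(pct_within_region = 100 * value / sum(value)) %>%
  ungroup()

shares
```

The grouped `mutate()` is what answers the "by continent?" variant: `sum(value)` is evaluated per region, so each region's percentages add up to 100.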