-
Notifications
You must be signed in to change notification settings - Fork 8
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
tiny slide update in ggplot, small adjustments to ggplot presentation
- Loading branch information
1 parent
6b47dc7
commit fd721ee
Showing
39 changed files
with
1,726 additions
and
258 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,268 @@ | ||
--- | ||
title: "Presentation 3: ggplot2" | ||
format: html | ||
editor: visual | ||
project: | ||
type: website | ||
output-dir: ../docs | ||
--- | ||
|
||
## Importing libraries and data | ||
|
||
```{r, warning=FALSE,message=FALSE} | ||
library(readxl) | ||
library(writexl) | ||
library(tidyverse) | ||
``` | ||
|
||
## Load data | ||
|
||
The iris dataset is a widely-used dataset in data science, containing 150 observations of iris flowers with features like sepal length, sepal width, petal length, and petal width. It includes three species: Setosa, Versicolor, and Virginica, making it ideal for classification tasks and data visualization. | ||
|
||
```{r} | ||
data('iris') | ||
head(iris) | ||
``` | ||
|
||
## Data wrangling | ||
|
||
Making the data more fun to plot. | ||
|
||
```{r} | ||
# Define 5 colors and their respective ratios | ||
colors <- c(rep("Red", 40), | ||
rep("Blue", 20), | ||
rep("Yellow",30), | ||
rep("Green", 20), | ||
rep("Purple",40)) | ||
set.seed(123) # For reproducibility | ||
# Shuffle the colors to mix them randomly | ||
colors <- sample(colors, replace = TRUE) | ||
# Add the 'Flower.Color' column to the iris dataset | ||
iris$Flower.Color <- colors | ||
``` | ||
|
||
### Have a look at the dataset | ||
|
||
```{r} | ||
head(iris) | ||
colnames(iris) | ||
``` | ||
|
||
## ggplot2: The basic concepts | ||
|
||
This is the starting point of a ggplot. The dataframe and the columns we wish to plot are defined. We have not specified what type of plot we want, hence an empty plot is produced. | ||
|
||
```{r} | ||
ggplot(iris, # dataframe | ||
aes(x = Sepal.Length, # x-value | ||
y = Petal.Length)) # y-value | ||
# missing type of plot | ||
``` | ||
|
||
## Scatter plot with `geom_point` | ||
|
||
A scatter plot is made with the `geom_point` function and is used to get an overview over the relationship between two numeric variables. Here we see the relationship between sepal length and width. | ||
|
||
```{r} | ||
ggplot(iris, | ||
aes(x = Sepal.Length, | ||
y = Petal.Length)) | ||
# scatter plot | ||
# color all points | ||
# color by Species | ||
``` | ||
|
||
Change color of entire plot by setting it [outside]{.underline} `aes()`. | ||
|
||
## Scatter plot with `geom_point` with color stratification | ||
|
||
To change colors based on a feature, you need to set it [inside]{.underline} `aes()`. Here we see the relationship between sepal length and width colored by species. | ||
|
||
```{r} | ||
ggplot(iris, | ||
aes(x = Sepal.Length, | ||
y = Petal.Length, | ||
color = Species)) + | ||
geom_point() | ||
``` | ||
|
||
## Boxplot with `geom_histogram` | ||
|
||
Boxplots are great to get an overview of continues variables and spot outliers. Can be shown on either axis (x and y). | ||
|
||
```{r} | ||
ggplot(iris, | ||
aes(y = Sepal.Length)) + | ||
geom_boxplot() | ||
# boxplot | ||
# split up by Species (x) | ||
# color/fill by Species | ||
``` | ||
|
||
## Violin plot with `geom_violin` | ||
|
||
A violin plot shows the distribution of a continuous variable across different categories, combining the features of a box plot and a density plot. Also, the labels can be edited. | ||
|
||
```{r} | ||
ggplot(iris, | ||
aes(y = Sepal.Length, | ||
x = Species)) | ||
# violin plot | ||
# add labs | ||
``` | ||
|
||
## Histogram with `geom_histogram` | ||
|
||
Histogram shows the distribution of a continuous variable. You will sometimes get a message that suggests to select another `binwidth`. Do what is says and you will often get nicer plot (something nothing changes). | ||
|
||
```{r} | ||
ggplot(iris, | ||
aes(x = Sepal.Length)) | ||
# histrogram | ||
# binwidth = 0.5 | ||
``` | ||
|
||
## Bar chart with `geom_bar` | ||
|
||
A bar chart is made with the `geom_bar` function and is used to get an overview over the distribution of a single categorical variable, e.g. Flower.Color in this instance. The function treats the x-axis as categorical and **calculates the bar heights based on the number of occurrences** in each category. Here we see the number of flowers of each Flower.Color. Notice that the Flower.Colors are sorted alphabetically. | ||
|
||
```{r} | ||
# Save plot in p | ||
p <- ggplot(iris, | ||
aes(x = Flower.Color)) | ||
# barplot | ||
# Show p | ||
# Show p with new labels | ||
# Show p again | ||
# Save p with new lables in p (overwrite / reassign) | ||
# Show p | ||
``` | ||
|
||
Color by species. The bars are stacked by default. | ||
|
||
```{r} | ||
# copy plot from above | ||
# add fill Species | ||
# position 'dogde', 'stacked' | ||
# theme_bw() | ||
# theme_classic() | ||
# theme_minimal() | ||
# theme_dark() | ||
# facet_wrap(vars(Species)) | ||
``` | ||
|
||
## Ordering columns | ||
|
||
We can order the columns such that the count goes from lowest to highest. This is actually not that easy in R. | ||
|
||
First, we see that the class of the Flower.Color is character. Characters are always sorted alphabetically like we saw above. | ||
|
||
```{r} | ||
class(iris$Flower.Color) | ||
``` | ||
|
||
Extract the number of flowers for each Flower.Color. | ||
|
||
```{r} | ||
dl_Flower.Color <- iris %>% | ||
group_by(Flower.Color) %>% | ||
summarise(n = n()) %>% | ||
arrange(desc(n)) | ||
dl_Flower.Color | ||
dl_Flower.Color$Flower.Color | ||
``` | ||
|
||
Change the class of the Flower.Color feature to factor and add levels according to the number of flowers with each color. | ||
|
||
```{r} | ||
iris$Flower.Color <- factor(iris$Flower.Color, | ||
levels = dl_Flower.Color$Flower.Color) | ||
``` | ||
|
||
Check class now | ||
|
||
```{r} | ||
class(iris$Flower.Color) | ||
``` | ||
|
||
Now we do the same plot as before and we see that the order has changed to range from largest to smallest Flower.Colors group. The plot is saved in the variable p. | ||
|
||
```{r} | ||
p <- ggplot(iris, # dataframe | ||
aes(x = Flower.Color)) + # x-value | ||
geom_bar() # type of plot | ||
p | ||
``` | ||
|
||
We can also flip the chart. We update the plot, p, be reassignment. | ||
|
||
```{r} | ||
p <- p + coord_flip() | ||
``` | ||
|
||
Since we are working with colors, we can change the colors of the bars to match the groups. | ||
|
||
R color chart [here](http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf) | ||
|
||
```{r} | ||
# Define color palette | ||
color_palette <- c("Red" = "red3", | ||
"Blue" = "cornflowerblue", | ||
"Yellow" = "lightgoldenrod1", | ||
"Green" = "darkolivegreen2", | ||
"Purple" = "darkorchid3") | ||
p <- p + | ||
aes(fill = Flower.Color) + # add the fill ascetics | ||
scale_fill_manual(values = color_palette) # set the fill color according to the color palette | ||
print(p) | ||
``` | ||
|
||
## Bar chart with `geom_col` | ||
|
||
Another way to make a bar chart is by using `geom_col`. Unlike `geom_bar`, which only requires an x-value and automatically counts occurrences, `geom_col` requires both x- and y-values. This makes `geom_col` ideal for cases where you already have pre-calculated values that you want to use as the bar heights. | ||
|
||
The mean of the sepal length within each color is calcualted using the `summarize` function. | ||
|
||
```{r} | ||
mean_sepal_length_pr_color <- iris %>% | ||
group_by(Flower.Color) %>% | ||
summarize(mean_Sepal.Length = mean(Sepal.Length)) | ||
head(mean_sepal_length_pr_color) | ||
``` | ||
|
||
```{r} | ||
ggplot(mean_sepal_length_pr_color, | ||
aes(x = Flower.Color, | ||
y = mean_Sepal.Length)) + | ||
geom_col() | ||
``` |
Binary file not shown.
Binary file not shown.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Binary file modified
BIN
+733 Bytes
(100%)
Teachers/Slideshow/material_for_slides/ggplot2_additive_structure_plot1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified
BIN
+664 Bytes
(100%)
Teachers/Slideshow/material_for_slides/ggplot2_additive_structure_plot2.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified
BIN
+1.04 KB
(100%)
Teachers/Slideshow/material_for_slides/ggplot2_additive_structure_plot3.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified
BIN
-2.26 KB
(96%)
Teachers/Slideshow/material_for_slides/ggplot2_additive_structure_plot4.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified
BIN
-2.25 KB
(96%)
Teachers/Slideshow/material_for_slides/ggplot2_additive_structure_plot5.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified
BIN
-2.27 KB
(96%)
Teachers/Slideshow/material_for_slides/ggplot2_additive_structure_plot6.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Oops, something went wrong.