tiny slide update in ggplot, small adjustments to ggplot presentation
helenewegener committed Nov 5, 2024
1 parent 6b47dc7 commit fd721ee
Showing 39 changed files with 1,726 additions and 258 deletions.
8 changes: 4 additions & 4 deletions Teachers/Presentations/presentation3.qmd
@@ -150,7 +150,7 @@ ggplot(iris,

## Bar chart with `geom_bar`

A bar chart is made with the `geom_bar` function and is used to get an overview over the distribution of a single categorical variable, e.g. Flower.Color in this instance. Here we see the number of flowers of each Flower.Color. Notice that the Flower.Colors are sorted alphabetically.
A bar chart is made with the `geom_bar` function and is used to get an overview of the distribution of a single categorical variable, here Flower.Color. The function treats the x-axis as categorical and **calculates the bar heights based on the number of occurrences** in each category. Here we see the number of flowers of each Flower.Color. Notice that the Flower.Colors are sorted alphabetically.

```{r}
# Save plot in p
@@ -287,7 +287,9 @@ print(p)

## Bar chart with `geom_col`

Summary of data:
Another way to make a bar chart is by using `geom_col`. Unlike `geom_bar`, which only requires an x-value and automatically counts occurrences, `geom_col` requires both x- and y-values. This makes `geom_col` ideal for cases where you already have pre-calculated values that you want to use as the bar heights.

The mean sepal length within each color is calculated using the `summarize` function.

```{r}
mean_sepal_length_pr_color <- iris %>%
@@ -297,8 +299,6 @@ mean_sepal_length_pr_color <- iris %>%
head(mean_sepal_length_pr_color)
```

A bar chart with `geom_bar` is used to get an overview over the distribution of a categorical variable relative to a continues variable. Here we see the average Sepal.Length of the flowers for each Flower.Color.

```{r}
ggplot(mean_sepal_length_pr_color, # dataframe
aes(x = Flower.Color, # x-value
268 changes: 268 additions & 0 deletions Teachers/Presentations/presentation3_livecoding.qmd
@@ -0,0 +1,268 @@
---
title: "Presentation 3: ggplot2"
format: html
editor: visual
project:
  type: website
  output-dir: ../docs
---

## Importing libraries and data

```{r, warning=FALSE,message=FALSE}
library(readxl)
library(writexl)
library(tidyverse)
```

## Load data

The iris dataset is a widely-used dataset in data science, containing 150 observations of iris flowers with features like sepal length, sepal width, petal length, and petal width. It includes three species: Setosa, Versicolor, and Virginica, making it ideal for classification tasks and data visualization.

```{r}
data('iris')
head(iris)
```

## Data wrangling

Making the data more fun to plot.

```{r}
# Define 5 colors and their respective ratios
colors <- c(rep("Red", 40),
            rep("Blue", 20),
            rep("Yellow", 30),
            rep("Green", 20),
            rep("Purple", 40))
set.seed(123) # For reproducibility
# Draw colors at random with replacement (the final counts only approximate the ratios above)
colors <- sample(colors, replace = TRUE)
# Add the 'Flower.Color' column to the iris dataset
iris$Flower.Color <- colors
```

### Have a look at the dataset

```{r}
head(iris)
colnames(iris)
```

## ggplot2: The basic concepts

This is the starting point of a ggplot. The dataframe and the columns we wish to plot are defined. We have not specified what type of plot we want, hence an empty plot is produced.

```{r}
ggplot(iris,                   # dataframe
       aes(x = Sepal.Length,   # x-value
           y = Petal.Length))  # y-value
# missing type of plot
```

## Scatter plot with `geom_point`

A scatter plot is made with the `geom_point` function and is used to get an overview of the relationship between two numeric variables. Here we see the relationship between sepal length and petal length.

```{r}
ggplot(iris,
       aes(x = Sepal.Length,
           y = Petal.Length))
# scatter plot
# color all points
# color by Species
```

Change the color of all points at once by setting `color` [outside]{.underline} `aes()`.
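As a minimal sketch of setting a color outside `aes()` (not necessarily what is typed during the live session), a constant color can be passed straight to `geom_point()`. The object name `p_fixed` and the color `"steelblue"` are arbitrary choices made for this illustration.

```r
library(ggplot2)

# A constant color goes outside aes(): every point gets the same color
p_fixed <- ggplot(iris,
                  aes(x = Sepal.Length,
                      y = Petal.Length)) +
  geom_point(color = "steelblue")  # "steelblue" is an arbitrary example color

p_fixed
```

Because the color is not mapped to a column, no legend is drawn.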

## Scatter plot with `geom_point` with color stratification

To change colors based on a feature, you need to set it [inside]{.underline} `aes()`. Here we see the relationship between sepal length and petal length, colored by species.

```{r}
ggplot(iris,
       aes(x = Sepal.Length,
           y = Petal.Length,
           color = Species)) +
  geom_point()
```

## Boxplot with `geom_boxplot`

Boxplots are great for getting an overview of continuous variables and spotting outliers. They can be drawn along either axis (x or y).

```{r}
ggplot(iris,
       aes(y = Sepal.Length)) +
  geom_boxplot()
# boxplot
# split up by Species (x)
# color/fill by Species
```
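The remaining steps listed in the comments above (split by Species, fill by Species) could be sketched as follows. This is one possible completion, and the name `p_box` is only used for this example.

```r
library(ggplot2)

# One box per species; fill is mapped inside aes() so it varies by group
p_box <- ggplot(iris,
                aes(x = Species,
                    y = Sepal.Length,
                    fill = Species)) +
  geom_boxplot()

p_box
```

Using `color = Species` instead of `fill = Species` would color the outlines of the boxes rather than their interiors.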

## Violin plot with `geom_violin`

A violin plot shows the distribution of a continuous variable across different categories, combining the features of a box plot and a density plot. The axis labels and title can be edited with `labs()`.

```{r}
ggplot(iris,
       aes(y = Sepal.Length,
           x = Species))
# violin plot
# add labs
```
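A possible completion of the violin plot with edited labels is sketched below. The label texts and the name `p_violin` are illustrative assumptions; the unit (cm) follows the documentation of the built-in `iris` dataset.

```r
library(ggplot2)

# Violin plot of sepal length per species, with custom labels
p_violin <- ggplot(iris,
                   aes(x = Species,
                       y = Sepal.Length)) +
  geom_violin() +
  labs(x = "Species",
       y = "Sepal length (cm)",            # iris measurements are in centimeters
       title = "Sepal length by species")  # example title, not from the slides

p_violin
```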

## Histogram with `geom_histogram`

A histogram shows the distribution of a continuous variable. You will sometimes get a message suggesting that you pick a better `binwidth`. Do what it says and you will often get a nicer plot (sometimes nothing changes).

```{r}
ggplot(iris,
       aes(x = Sepal.Length))
# histogram
# binwidth = 0.5
```
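Filled in, the histogram with the bin width from the comment above might look like this. The name `p_hist` is only used for this sketch.

```r
library(ggplot2)

# Each bar covers an interval of 0.5 cm of sepal length
p_hist <- ggplot(iris,
                 aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.5)

p_hist
```

Setting `binwidth` explicitly also silences the message about choosing a bin width.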

## Bar chart with `geom_bar`

A bar chart is made with the `geom_bar` function and is used to get an overview of the distribution of a single categorical variable, here Flower.Color. The function treats the x-axis as categorical and **calculates the bar heights based on the number of occurrences** in each category. Here we see the number of flowers of each Flower.Color. Notice that the Flower.Colors are sorted alphabetically.

```{r}
# Save plot in p
p <- ggplot(iris,
            aes(x = Flower.Color))
# barplot
# Show p
# Show p with new labels
# Show p again
# Save p with new labels in p (overwrite / reassign)
# Show p
```
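The sequence in the comments above (show, relabel, show again, reassign) could be played out as in the sketch below. To keep the block runnable on its own, the Flower.Color column is recreated on a copy of the data; the name `iris_col` and the label texts are assumptions made for this example.

```r
library(ggplot2)

# Recreate the Flower.Color column on a copy so the block runs on its own
set.seed(123)
iris_col <- iris
iris_col$Flower.Color <- sample(rep(c("Red", "Blue", "Yellow", "Green", "Purple"),
                                    times = c(40, 20, 30, 20, 40)),
                                replace = TRUE)

# Save the plot in p
p <- ggplot(iris_col,
            aes(x = Flower.Color)) +
  geom_bar()
p  # show p

# Adding labels without assignment only changes what is displayed
p + labs(x = "Flower color", y = "Number of flowers")
p  # p itself still has the default labels

# Reassign to keep the new labels in p
p <- p + labs(x = "Flower color", y = "Number of flowers")
p
```

The point of the exercise is that `+` returns a new plot object, so a change is only kept when the result is assigned back to `p`.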

Color by species. The bars are stacked by default.

```{r}
# copy plot from above
# add fill Species
# position 'dodge', 'stack'
# theme_bw()
# theme_classic()
# theme_minimal()
# theme_dark()
# facet_wrap(vars(Species))
```
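One way the steps in the comments above might be combined is sketched here. The Flower.Color column is again recreated on a copy named `iris_col` (a name chosen for this example), and `theme_bw()` stands in for whichever theme is picked during the session.

```r
library(ggplot2)

# Recreate the Flower.Color column on a copy so the block runs on its own
set.seed(123)
iris_col <- iris
iris_col$Flower.Color <- sample(rep(c("Red", "Blue", "Yellow", "Green", "Purple"),
                                    times = c(40, 20, 30, 20, 40)),
                                replace = TRUE)

# Fill by species and place the bars side by side instead of stacking them
p_fill <- ggplot(iris_col,
                 aes(x = Flower.Color,
                     fill = Species)) +
  geom_bar(position = "dodge") +  # position = "stack" gives the default layout
  theme_bw()
p_fill

# Alternative: one panel per species
p_fill + facet_wrap(vars(Species))
```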

## Ordering columns

We can order the columns such that the count goes from lowest to highest. This is actually not that easy in R.

First, we see that the class of Flower.Color is character. Character values are sorted alphabetically on the axis, as we saw above.

```{r}
class(iris$Flower.Color)
```

Extract the number of flowers for each Flower.Color.

```{r}
dl_Flower.Color <- iris %>%
  group_by(Flower.Color) %>%
  summarise(n = n()) %>%
  arrange(desc(n))
dl_Flower.Color
dl_Flower.Color$Flower.Color
```

Change the class of the Flower.Color feature to factor and add levels according to the number of flowers with each color.

```{r}
iris$Flower.Color <- factor(iris$Flower.Color,
                            levels = dl_Flower.Color$Flower.Color)
```

Check the class now.

```{r}
class(iris$Flower.Color)
```

Now we make the same plot as before and see that the bars are ordered from the largest to the smallest Flower.Color group. The plot is saved in the variable p.

```{r}
p <- ggplot(iris,                     # dataframe
            aes(x = Flower.Color)) +  # x-value
  geom_bar()                          # type of plot
p
```
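The manual reordering above can also be sketched with `fct_infreq()` from the forcats package (part of the tidyverse). This is a swapped-in shortcut, not the approach used in the presentation, and `iris_col` is a name chosen for this example so the block runs on its own.

```r
library(ggplot2)
library(forcats)

# Recreate the Flower.Color column on a copy so the block runs on its own
set.seed(123)
iris_col <- iris
iris_col$Flower.Color <- sample(rep(c("Red", "Blue", "Yellow", "Green", "Purple"),
                                    times = c(40, 20, 30, 20, 40)),
                                replace = TRUE)

# fct_infreq() turns the column into a factor with the most frequent level first
iris_col$Flower.Color <- fct_infreq(iris_col$Flower.Color)
levels(iris_col$Flower.Color)

ggplot(iris_col,
       aes(x = Flower.Color)) +
  geom_bar()
```

The manual version is longer, but it shows what happens underneath: the order of the bars follows the order of the factor levels.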

We can also flip the chart. We update the plot, p, by reassignment.

```{r}
p <- p + coord_flip()
p  # show the flipped plot
```

Since we are working with colors, we can change the colors of the bars to match the groups.

R color chart [here](http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf)

```{r}
# Define color palette
color_palette <- c("Red" = "red3",
                   "Blue" = "cornflowerblue",
                   "Yellow" = "lightgoldenrod1",
                   "Green" = "darkolivegreen2",
                   "Purple" = "darkorchid3")
p <- p +
  aes(fill = Flower.Color) +                 # add the fill aesthetic
  scale_fill_manual(values = color_palette)  # set the fill color according to the color palette
print(p)
```

## Bar chart with `geom_col`

Another way to make a bar chart is by using `geom_col`. Unlike `geom_bar`, which only requires an x-value and automatically counts occurrences, `geom_col` requires both x- and y-values. This makes `geom_col` ideal for cases where you already have pre-calculated values that you want to use as the bar heights.

The mean sepal length within each color is calculated using the `summarize` function.

```{r}
mean_sepal_length_pr_color <- iris %>%
  group_by(Flower.Color) %>%
  summarize(mean_Sepal.Length = mean(Sepal.Length))
head(mean_sepal_length_pr_color)
```

```{r}
ggplot(mean_sepal_length_pr_color,
       aes(x = Flower.Color,
           y = mean_Sepal.Length)) +
  geom_col()
```
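To illustrate the difference between the two functions, the sketch below shows that `geom_col()` draws the same bars as `geom_bar(stat = "identity")`. For brevity it groups by the built-in Species column rather than Flower.Color and uses base R's `aggregate()`; the names `mean_sepal_length`, `p_col` and `p_identity` are assumptions made for this example.

```r
library(ggplot2)

# Pre-calculated summary: mean sepal length per species (example data)
mean_sepal_length <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)

# geom_col() uses the y-values as bar heights
p_col <- ggplot(mean_sepal_length,
                aes(x = Species,
                    y = Sepal.Length)) +
  geom_col()

# geom_bar() can do the same when told not to count rows
p_identity <- ggplot(mean_sepal_length,
                     aes(x = Species,
                         y = Sepal.Length)) +
  geom_bar(stat = "identity")

p_col
p_identity
```

Calling `geom_bar()` with a y-value but without `stat = "identity"` raises an error, because the default counting statistic does not accept both x and y.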
Binary file modified Teachers/Slideshow/FromExceltoR.pdf
Binary file not shown.
Binary file modified Teachers/Slideshow/FromExceltoR.pptx
Binary file not shown.
49 changes: 22 additions & 27 deletions Teachers/Slideshow/material_for_slides.qmd
@@ -12,71 +12,65 @@ library(tidyverse)
```{r}
set.seed(123)
df = data.frame(sample = sample(x = c('A','B','C','D'), size = 100, replace = T),
measure = sample(x = 1:20, size = 100, replace = T))
df = data.frame(Sample = sample(x = c('A','B','C','D'), size = 100, replace = T),
Measure = sample(x = 1:20, size = 100, replace = T))
head(df)
```

```{r}
ggplot(df,
aes(x = sample,
y = measure))
aes(x = Sample,
y = Measure))
ggsave('material_for_slides/ggplot2_additive_structure_plot1.png')
```

```{r}
ggplot(df,
aes(x = sample,
y = measure)) +
aes(x = Sample,
y = Measure)) +
geom_boxplot()
ggsave('material_for_slides/ggplot2_additive_structure_plot2.png')
```

```{r}
ggplot(df,
aes(x = sample,
y = measure,
fill = sample)) +
aes(x = Sample,
y = Measure,
fill = Sample)) +
geom_boxplot()
ggsave('material_for_slides/ggplot2_additive_structure_plot3.png')
```

```{r}
ggplot(df,
aes(x = sample,
y = measure,
fill = sample)) +
aes(x = Sample,
y = Measure,
fill = Sample)) +
geom_boxplot() +
labs(x = 'Sample',
y = 'Measure (cm)',
title = 'Boxplots of measurements stratified by sample')
labs(title = 'Boxplots of measurements stratified by sample')
ggsave('material_for_slides/ggplot2_additive_structure_plot4.png')
```

```{r}
ggplot(df,
aes(x = sample,
y = measure,
fill = sample)) +
aes(x = Sample,
y = Measure,
fill = Sample)) +
geom_boxplot() +
labs(x = 'Sample',
y = 'Measure (cm)',
title = 'Boxplots of measurements stratified by sample') +
labs(title = 'Boxplots of measurements stratified by sample') +
scale_fill_manual(values = c("pink", "lightgreen", "lavender", "lightblue"))
ggsave('material_for_slides/ggplot2_additive_structure_plot5.png')
```

```{r}
ggplot(df,
aes(x = sample,
y = measure,
fill = sample)) +
aes(x = Sample,
y = Measure,
fill = Sample)) +
geom_boxplot() +
labs(x = 'Sample',
y = 'Measure (cm)',
title = 'Boxplots of measurements stratified by sample') +
labs(title = 'Boxplots of measurements stratified by sample') +
scale_fill_manual(values = c("pink", "lightgreen", "lavender", "lightblue")) +
theme_bw()
@@ -94,6 +88,7 @@ country <- sample(x = c('France', 'Italy'), size = nrow(wine), replace = TRUE)
wine$Country <- country
head(wine)
wine %>% select(Type, Alcohol, Country, Magnesium, Phenols) %>% head()
```

```{r}