-
Notifications
You must be signed in to change notification settings - Fork 18
/
Copy pathBW_Gather.Rmd
82 lines (61 loc) · 2.85 KB
/
BW_Gather.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
---
title: "BW_Gather"
author: "Ben Wolin"
date: "2024-11-10"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Demonstrating the Gather() functionality in tidyr
Gather takes multiple columns and collapses into key-value pairs, duplicating all other columns as needed. You use gather() when you notice that you have columns that are not variables.
## Pulling in an example from the word bank data catalog.
https://energydata.info/dataset/world-climate-watch/resource/1631a4e8-a59a-4026-aa36-162df9b15340
```{r}
library(tidyr)
Consumption_raw <- read.csv('https://raw.githubusercontent.com/bwolin99/TestRepo/refs/heads/main/606%20Final/Meat_Consumption.csv',row.names = NULL)
head(Consumption_raw)
```
As we can see in this dataset there are many columns indicating the year in which the measurement was taken. We would like to group these years into one column and the total emissions into the other.
## Grouping with Gather()
```{r}
Consumption <- gather(Consumption_raw, key = 'year', value = 'Meat_Consumption',2:28)
head(Consumption)
```
Now we turned 28 columns into 3 using the Gather function.
## Expanding on Gather with Separate - added by LMG
In this section, we demonstrate how to use the `separate()` function from `tidyr` to split a composite column into multiple columns. For example, if the `country` column contains both country names and regions (e.g., `"USA-NorthAmerica"`), we can split it into two separate columns.
### Splitting the Country Column
```{r separate}
# Simulate composite column
Consumption_raw <- Consumption_raw %>%
mutate(country = paste0(country, "-Region"))
# Separate the composite column into two columns
Consumption_split <- Consumption_raw %>%
separate(country, into = c("country_name", "region"), sep = "-")
# View the first few rows of the updated data
head(Consumption_split)
```
After splitting the column, we can apply the gather() function to reshape the data as before:
```{r new gather}
# Use gather on the updated data
Consumption_final <- gather(Consumption_split, key = "year", value = "Meat_Consumption", 2:28)
# View the first few rows of the final dataset
head(Consumption_final)
```
Finally, we group the data by region to summarize average meat consumption:
```{r summarize}
# Summarize meat consumption by region
Consumption_summary <- Consumption_final %>%
group_by(region) %>%
summarize(avg_meat_consumption = mean(Meat_Consumption, na.rm = TRUE))
# View the summary
print(Consumption_summary)
```
### **Explanation of New Functionality**
1. **`separate()`**:
- Splits a single column into multiple columns based on a specified delimiter.
- Helps in preprocessing data when values in a single column encode multiple variables.
2. **`mutate()`**:
- Used here to simulate a composite column if the original data lacks such values.
- Adds flexibility to the example.