-
Notifications
You must be signed in to change notification settings - Fork 18
/
Copy pathTidyverse create Daniel Brusche.Rmd
62 lines (40 loc) · 2.18 KB
/
Tidyverse create Daniel Brusche.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
---
title: "Tidyverse create"
output: html_document
date: "2024-11-17"
---
Pulling data from Kaggle, we are examining maternal health risks for pregnant women using the dataset available at Maternal Health Risk Data.
```{r setup, include=FALSE}
Maternal_risk_data <- read.csv("C:/Users/Daniel.Brusche/Downloads/Maternal Health Risk Data Set.csv")
# Load the tidyverse package
library(tidyverse)
```
Since I'm looking at how age and blood pressure affect the maternal risk during pregnancy, I want to convert the RiskLevel column into a factor. This allows R to treat it as a categorical variable, rather than a string. It also enables us to order the levels so that "low risk" is seen as different from "high risk," creating a proper level ordering. This conversion also improves visualization and analysis.
```{r cars}
# Convert RiskLevel to a factor
health_data <- Maternal_risk_data %>%
mutate(
RiskLevel = factor(RiskLevel, levels = c("low risk", "mid risk", "high risk"))
)
# Check the data types
str(health_data)
```
From the visualization, we see that younger individuals have a higher proportion of low risk compared to those over 40. When exploring the relationship between diastolic and systolic blood pressure, we observe that as both increase, the likelihood of being classified in the high-risk maternal level also rises.
```{r pressure1, echo=FALSE}
health_data %>%
ggplot(aes(x = Age, fill = RiskLevel)) +
geom_histogram(binwidth = 1, position = "dodge", color = "black", alpha = 0.7) +
labs(title = "Age Distribution by Risk Level", x = "Age", y = "Count") +
scale_fill_manual(values = c("low risk" = "green", "mid risk" = "yellow", "high risk" = "red")) +
theme_minimal()
```
```{r pressure, echo=FALSE}
# Plot Blood Pressure (Systolic vs Diastolic) by Risk Level
health_data %>%
ggplot(aes(x = SystolicBP, y = DiastolicBP, color = RiskLevel)) +
geom_point(size = 4) +
labs(title = "Systolic vs Diastolic Blood Pressure by Risk Level",
x = "Systolic BP", y = "Diastolic BP") +
scale_color_manual(values = c("low risk" = "green", "mid risk" = "yellow", "high risk" = "red")) +
theme_minimal()
```