-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathherbert_catchment_heatmap.qmd
139 lines (104 loc) · 6.66 KB
/
herbert_catchment_heatmap.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
---
title: "Water Hot Plot: Fine-Scale Water Quality Visualization with Plotnine"
subtitle: "2024 Plotnine Competition Submission"
author: "Cameron Roberts"
format:
html:
code-fold: true
theme: cosmo
jupyter: python3
execute:
echo: true
warning: false
---
## Introduction
I am pleased to submit this Quarto document for the 2024 `Plotnine` competition. This visualization demonstrates the capabilities of `Plotnine` in creating compelling and informative data representations.
## Visualization Approach
I've adapted a successful visualization concept I had originally developed in R using `ggplot2`, translating and enhancing it with `Plotnine`. While I initially explored using a ridgeline plot for data concentration visualization, I discovered that a heatmap implementation in `Plotnine` yielded surprisingly effective results, surpassing my expectations from earlier attempts.
## Data Source and Description
The visualization is based on fine-scale water quality monitoring data from the Queensland Government Department of Environment, Science and Innovation (**DESI**). Specifically, it utilizes data from the Lower Herbert catchment near Ingham, Queensland, Australia.
**Key data characteristics:**
- Source: Water Quality and Investigations team, DESI
- Coverage: 17 in-situ nitrate (as N) sensors
- Sensor read frequency: 15-minute intervals
- Date Range: 2020 - 2024
- Aggregation: Monthly, across all monitoring sites
For more information on fine-scale water quality monitoring in Queensland, please refer to the [Queensland Governments Publications](https://www.publications.qld.gov.au/dataset/fine-scale-water-quality-monitoring-in-lower-burdekin-catchment).
## Data Import and Preparation
Data comes from DESI's Water Quality and Investigations monitoring network for the lower Herbert catchment near Ingham, Queensland. The network includes 17 in-situ nitrate sensors reading every 15 minutes since 2020. Due to remote deployment challenges, sensor uptime varies. Data from 2020-2024 across all sites was extracted and provides sufficient coverage to attain monthly data summaries for each monitoring site.
```{python}
import pandas as pd
from datetime import datetime
from plotnine import *
#import concentration data
df = pd.read_csv('data/catchment_NNO3_all.csv')
df.head()
```
Site-specific metadata provides context for the site locations and the site 'types' to help understand drivers for the installations.
- Site types are categorized as:
- **Reference**: Situated in the upper catchment above contaminant sources, intended to provide a natural baseline.
- **Impact**: Located directly downstream of land-use types known to contribute to nutrient to the waterways (ie. agriculture/urban land use)
- **End-of-system**: Located at the most seaward practical monitoring point along the river or creek. The
intent is to capture the maximum extent of upstream land use while avoiding the complexities of monitoring
in the estuary.
```{python}
#import site information
sitelist = pd.read_csv('data/sitelist.csv')
sitelist.head()
```
## Data joining and Preprocessing
In order to work with these data in `Plotnine` we must join the data frames to include the nessecary metadata for the visualisations we are wanting. The result is one dataframe that will allow for a simpler integration with the graphics.
```{python}
# Subset to include only 'GSnum' and 'Value' columns
subset_df = df[['GSnum', 'Value']]
# Group by 'GSnum' and calculate the median for the 'Value' and 'Quality' columns
median_summary = subset_df.groupby('GSnum').median().reset_index()
# Rename the 'Value' column to 'median'
median_summary.rename(columns={'Value': 'median'}, inplace=True)
# Perform a left join of the original DataFrame with the median summary DataFrame
merged_df = pd.merge(df, median_summary, on='GSnum', how='left')
merged_df = pd.merge(merged_df, sitelist[['GSnum', 'Site type', 'Site code']], on='GSnum', how='left')
# Concatenate columns with separator
merged_df['combined_col'] = merged_df['Site type'] + ' // ' + merged_df['Site code']
```
## Date Parsing and Monthly Aggregation
The data needs to be aggregated by month. A seperate column has been created to allow for '*prettier*' month names for the plot.
```{python}
# Parse the 'Date' column from the timestamp
merged_df['ts'] = pd.to_datetime(merged_df['ts'])
# Group by month and calculate median of 'Value' column
monthly_median = merged_df.groupby([merged_df['ts'].dt.month, 'name', 'combined_col'])['Value'].max()
# Convert the Series to a DataFrame
df = monthly_median.reset_index(name='value')
# Create a list of month names
month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sept', 'Oct', 'Nov', 'Dec']
# Map the month numbers to month names
df['month_name'] = df['ts'].apply(lambda x: month_names[x-1])
df
```
# Visualization
```{python}
(
ggplot(df, aes(x='ts', y='combined_col', fill='value')) +
geom_tile(aes(width=0.95, height=0.95)) +
theme_dark(base_size=11, base_family=None) +
geom_text(aes(label="value.round(2).astype(str)"), size=6, color='#DDDEDF', show_legend=False) +
scale_fill_cmap(cmap_name="inferno", name="Nitrate-N Concentration (mg/L)") +
#scale_fill_gradient(cmap_name="viridis", low="#3BC4A4", high="#CC334E") + # Adjust low and high colors as needed
scale_x_continuous(breaks=range(1, 13), labels=month_names, name="") +
scale_y_discrete(name="Site type and ID") + # Change y-axis label
theme(axis_text_x=element_text(angle=30, hjust=1.5),
legend_position='top', # Move legend to the top
legend_direction='horizontal') + # Horizontal orientation of legend
labs(title="Seasonal and Spatial Variations in Maximum Monthly \nNitrate-N Concentration (2020-2024)",
subtitle="A Heatmap Analysis of Queensland's Lower Herbert Catchment")
)
```
This graph displays the maximum monthly Nitrate-N concentration (mg/L) for various sites over 4 years of monitoring (2020 - 2024). Here are some key observations:
- The **Impact** and **End of System** sites display *higher* nitrate concentrations.
- **Reference** sites generally maintain **lower** concentrations throughout the year.
- Highest concentrations are seen between November - January (coinciding with the wet season onset in the region)
- Some sites show very high concentrations (above 4 mg/L) in certain months.
- The highest recorded concentration appears to be **6.95 mg/L** at the End of System site **CCC** in December.
- Some sites show clear seasonal patterns, while others have more sporadic high concentration months.
- Periods of elevated concentration outside of the wet season may indicate specific impacts from fertiliser re-application and unseasonal or unexpected rainfall driven runoff events.