Welcome to the Clustering Electricity Usage in Portugal project! This project clusters monthly electricity usage data across regions of Portugal. Using data analytics and clustering techniques, I aim to uncover country-level patterns that help in understanding regional electricity usage behavior, identifying trends, and supporting decisions on energy distribution and management.
This repository contains a complete workflow for analyzing electricity usage in Portugal, from data preprocessing and visualization to the application of clustering algorithms and the interpretation of results. Whether you are an energy analyst, a data scientist, or simply interested in the dynamics of electricity consumption in Portugal, this project offers a thorough exploration of regional usage patterns. I have also attempted to correlate the identified clusters with the Portuguese industrial sectors and the electricity end-users in each region.
Energy distribution management; Clustering algorithms; Customer segmentation; Electricity consumption; Machine learning; Regional consumption patterns
This project has the following directory structure; the next sections explain its contents.
ADD TREE
- Monthly electricity data (2023): E-REDES (file: smart_electricity_meter_portugal_2023.csv)
- Portuguese postcode data: Data Science for Social Good Portugal (file: cod_post_freg_matched.csv)
- Industry sectors per region (2022): PORDATA (file: industry_type_portugal_2022.xlsx)
- Electricity end-users per region (2022): PORDATA (file: elec_energy_by_type_portugal_2022.xlsx)
- Mean air temperature in several Portuguese regions (2022): NASA
- Data preprocessing in R (file: data_preprocessing_electricity_portugal.R)
- Clustering of electricity measurements using the K-shape algorithm in Python (file: k_shape_clustering_electricity.py)
- Data visualization in R (file: data_analysis_clustering_results.R): plot generation and analysis of the clustering results
- county_consumption_normalized_preprocessed.csv
- county_consumption_preprocessed.csv
- county_industries_normalized_preprocessed.csv
- county_industries_preprocessed.csv
- electricity_preprocessed.csv
- postcodes_preprocessed.csv
- clustered_electricity.csv: Electricity dataset containing each time series and its assigned cluster label (1-5), with outliers labeled 0
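
To give a feel for what the K-shape step listed above compares, here is a minimal sketch of its two core ingredients: z-normalization of each series and the shape-based distance (SBD), defined as one minus the maximum normalized cross-correlation over all shifts. This is not the code from k_shape_clustering_electricity.py. The function names and the 12-month sample values are hypothetical, and shifts are zero-padded and computed directly rather than via the FFT that K-shape implementations typically use.

```python
import math


def z_normalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    if std == 0:
        # A constant series has no shape; map it to all zeros.
        return [0.0] * n
    return [(v - mean) / std for v in series]


def cross_correlation(x, y, shift):
    """Dot product of x with y shifted by `shift` positions (zero-padded)."""
    n = len(x)
    total = 0.0
    for i in range(n):
        j = i - shift
        if 0 <= j < n:
            total += x[i] * y[j]
    return total


def shape_based_distance(x, y):
    """SBD = 1 - max over shifts of the normalized cross-correlation."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    norm = math.sqrt(sum(v * v for v in x)) * math.sqrt(sum(v * v for v in y))
    if norm == 0:
        return 1.0
    n = len(x)
    best = max(cross_correlation(x, y, s) for s in range(-(n - 1), n))
    return 1.0 - best / norm


# Hypothetical monthly consumption values (January to December), made up
# for illustration only; they are not taken from the project's datasets.
winter_peak = [120, 110, 95, 80, 70, 60, 58, 60, 72, 85, 100, 118]
winter_peak_scaled = [v * 3 for v in winter_peak]  # same shape, higher level
summer_peak = [60, 62, 70, 82, 95, 110, 120, 118, 100, 84, 70, 61]

d_same = shape_based_distance(z_normalize(winter_peak),
                              z_normalize(winter_peak_scaled))
d_diff = shape_based_distance(z_normalize(winter_peak),
                              z_normalize(summer_peak))
print(f"same shape, different scale: {d_same:.3f}")
print(f"winter peak vs summer peak:  {d_diff:.3f}")
```

Because each series is z-normalized first, two regions with the same seasonal profile but very different consumption levels end up at a distance close to zero, while regions with opposite seasonal profiles stay clearly apart. This is why a shape-based method suits grouping regions by usage pattern rather than by volume.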