Course project for ISYE 7406, completed as part of Georgia Tech's Online Master of Science in Analytics.
Inside the project directory, create a new virtual environment named env:
python -m venv ./env
In your preferred shell, run the appropriate activation script. For example, in Windows PowerShell:
.\env\Scripts\Activate.ps1
On macOS or Linux (bash/zsh), the equivalent is:
source ./env/bin/activate
Install the required dependencies from requirements.txt:
pip install -r ./requirements.txt
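Before running any of the scripts, it can help to confirm that the active interpreter is the one from env and that everything in requirements.txt is installed. The sketch below assumes a hypothetical check_env.py saved in the project root; the requirements parser is deliberately simplified and does not handle URLs, editable installs, or nested -r files.

```python
import re
import sys
from importlib import metadata
from pathlib import Path


def in_virtualenv() -> bool:
    """True when running inside a venv (sys.prefix differs from the base interpreter's)."""
    return sys.prefix != sys.base_prefix


def requirement_names(lines):
    """Extract bare package names from requirements.txt-style lines.

    Simplified parser: skips blank lines, comments, and pip options such as -r/-e,
    and strips version specifiers, extras, and environment markers.
    """
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        names.append(re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0])
    return names


def missing_requirements(lines):
    """Return the package names from `lines` that have no installed distribution."""
    missing = []
    for name in requirement_names(lines):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Inside virtual environment:", in_virtualenv())
    req_file = Path("requirements.txt")
    # Only check packages when run from the project root, where requirements.txt lives.
    if req_file.exists():
        missing = missing_requirements(req_file.read_text().splitlines())
        print("Missing packages:", ", ".join(missing) if missing else "none")
```

Running python ./check_env.py after activation should report True for the virtual environment and no missing packages once the pip install step has finished.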
The full dataset is included in this repository. To scrape and build it yourself, follow these steps:
- Run the 01-scrape.py script in src/data:
  python ./src/data/01-scrape.py
  This script may take 10 minutes or longer, depending on your connection speed. It downloads raw data files from ISO-NE to data/raw. You'll see the following output as the script progresses:
  Scraping forecast data...
  Getting content from web page...
  Extracting data file URLs...
  Downloading data files...
  Progress: 2555/2555
  Done scraping forecast data.
  Scraping system status data...
  Downloading content from web page...
  Progress: 7/7
  Done scraping system status data.
- Run the 01-transform.py script in src/features:
  python ./src/features/01-transform.py
  Like the scraping step, this script may take 10 minutes or longer. It reports its progress as it runs.
- Run the 02-join.py script in src/features:
  python ./src/features/02-join.py
- Run the 03-clean.py script in src/features:
  python ./src/features/03-clean.py
- Run the 04-modeling-prep.py script in src/features:
  python ./src/features/04-modeling-prep.py
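The five steps above can be chained so that each one runs only if the previous one succeeded. This is a minimal sketch, assuming a hypothetical run_pipeline.py saved in the project root and run with the virtual environment active; the helper names and the start parameter are illustrative, not part of the repository.

```python
import subprocess
import sys
from pathlib import Path

# Pipeline steps in the order this README lists them (paths relative to the project root).
STEPS = [
    Path("src/data/01-scrape.py"),
    Path("src/features/01-transform.py"),
    Path("src/features/02-join.py"),
    Path("src/features/03-clean.py"),
    Path("src/features/04-modeling-prep.py"),
]


def build_commands(steps=STEPS, python=sys.executable):
    """Return one [interpreter, script] command per step, preserving order."""
    return [[python, str(step)] for step in steps]


def run_pipeline(steps=STEPS, start=0, python=sys.executable):
    """Run the steps from index `start` onward, stopping at the first failure.

    Pass start=1 to skip the scraping step if its output in data/raw is already present.
    """
    for cmd in build_commands(steps, python)[start:]:
        print("Running:", " ".join(cmd), flush=True)
        # check=True raises subprocess.CalledProcessError on a non-zero exit code,
        # so later steps never run on incomplete output from an earlier one.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    not_found = [str(step) for step in STEPS if not step.exists()]
    if not_found:
        print("Run this from the project root; missing:", ", ".join(not_found))
    else:
        run_pipeline()
```

Using sys.executable means each step runs under the same interpreter that launched the driver, which is the env interpreter when the environment is active.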
EDA and modeling were performed in Jupyter notebooks using the data prepared by the Python scripts. The key notebooks are:
- eda.ipynb: Exploratory Data Analysis (correlation, clustering, etc.)
- logistic.ipynb: Logistic Regression Model Building
- svm.ipynb: SVM Model Building
- knn.ipynb: KNN Model Building
- ensemble.ipynb: Ensemble Model Building