This research project originated as a university assignment in my fifth semester of computer science studies. The goal was to use the "Future Sales" dataset from the Kaggle competition Predict Future Sales to predict the sales of individual items in the upcoming months across selected branches spread across the country.
The environment is managed with conda and was built on Python 3.8.
The packages are managed using poetry.
To get started, install conda, then create and activate a new environment using the following commands:
conda create --name rp python=3.8
conda activate rp
Then, begin by installing poetry:
conda install poetry
Finally, navigate to the root folder of the GitHub repository and execute the following command to install all packages:
poetry install
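After the installation, the setup can be sanity-checked from Python. The following is a minimal sketch, not part of the repository: the function name `check_environment` is hypothetical, and it assumes that conda exposes the active environment name through the `CONDA_DEFAULT_ENV` environment variable, which `conda activate` normally sets.

```python
import os
import sys


def check_environment(version_info=sys.version_info, environ=os.environ,
                      expected_env="rp", expected_python=(3, 8)):
    """Return a list of problems; an empty list means the setup looks right.

    Hypothetical helper: "rp" and Python 3.8 come from the setup steps above.
    """
    problems = []
    # conda stores the name of the active environment in CONDA_DEFAULT_ENV.
    active_env = environ.get("CONDA_DEFAULT_ENV")
    if active_env != expected_env:
        problems.append(
            f"active conda environment is {active_env!r}, expected {expected_env!r}"
        )
    # Compare only major.minor, since any 3.8.x patch release is acceptable.
    found = tuple(version_info[:2])
    if found != expected_python:
        problems.append(f"Python version is {found}, expected {expected_python}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Running the script inside the activated `rp` environment should print nothing if both checks pass.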
The datasets provided by the Kaggle competition need to be downloaded to their respective folders. Check the folder content for further information.
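Whether the downloads ended up in the right place can be checked with a short script. This is only a sketch under stated assumptions: the folder names `info` and `technical` are taken from this README, but the file names in `EXPECTED_FILES`, the function name `missing_files`, and the `data_root` argument are assumptions that should be adjusted to the actual folder content.

```python
from pathlib import Path

# Assumed file names per folder; verify against the folder content
# in the repository before relying on this mapping.
EXPECTED_FILES = {
    "technical": ["sales_train.csv", "test.csv", "sample_submission.csv"],
    "info": ["items.csv", "item_categories.csv", "shops.csv"],
}


def missing_files(data_root, expected=EXPECTED_FILES):
    """Return the relative paths of expected files not found under data_root."""
    root = Path(data_root)
    missing = []
    for folder, names in expected.items():
        for name in names:
            if not (root / folder / name).is_file():
                missing.append(Path(folder) / name)
    return missing


if __name__ == "__main__":
    # "." is a placeholder; pass the folder that holds the dataset folders.
    for path in missing_files("."):
        print("missing:", path)
```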
- Predicting Future Sales: main notebook, contains the EDA and feature engineering
- In Depth EDA: concise information from the EDA that would’ve inflated the main notebook
- Linear Regression: evaluation and training of the linear models. Contains the submission to the competition
- ARIMA: evaluation of a forecasting model
- info: folder containing detailed information, provided by the Kaggle competition
- technical: folder containing the train, test and submission template, provided by the Kaggle competition
- feature_engineering: folder containing additional, manually researched information used for feature engineering
- arima: folder containing data generated in the ARIMA analysis
- out: folder to store the serialized output from the feature engineering
- submissions: folder to store the final submission files to submit to the Kaggle competition
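As an illustration of how the feature engineering output could be serialized into the out folder and read back in another notebook, here is a minimal sketch using Python's standard `pickle` module. The helper names `save_features` and `load_features` and the `.pkl` suffix are hypothetical, and the notebooks may use a different serialization format.

```python
import pickle
from pathlib import Path


def save_features(obj, out_dir, name):
    """Pickle obj to <out_dir>/<name>.pkl and return the written path."""
    out_path = Path(out_dir) / f"{name}.pkl"
    # Create the output folder if it does not exist yet.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return out_path


def load_features(path):
    """Load a previously pickled object. Only unpickle files you trust."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)
```

A notebook would call `save_features(features, "out", "features")` once and `load_features("out/features.pkl")` wherever the result is needed, which avoids recomputing the feature engineering.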
The src folder contains additional modules that were used across the different notebooks.
Contains a default file that sets default styles to maintain a uniform look across the project.
- ConvertingDateValues.py: operations to handle the datetime datatype within a DataFrame
- ParseDataframe.py: operations that parse and extract information within a DataFrame
- ListActions.py: special operations on list elements
- FunctionExecTime.py: function that executes a requested function and prints its execution time
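To illustrate the idea behind an execution-time helper such as FunctionExecTime.py, the following is a minimal sketch and not the module's actual code. The name `timed_call` and its return value are assumptions made for this example.

```python
import time


def timed_call(func, *args, **kwargs):
    """Run func(*args, **kwargs), print how long it took, and return both.

    Hypothetical helper; returns a (result, elapsed_seconds) tuple.
    """
    # perf_counter is a monotonic clock intended for measuring short durations.
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{func.__name__} took {elapsed:.4f} s")
    return result, elapsed


if __name__ == "__main__":
    total, seconds = timed_call(sum, range(1_000_000))
```

Returning the elapsed time in addition to printing it makes it possible to collect timings across several feature engineering steps and compare them later.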
- .gitignore: for managing the repository
- poetry.lock & pyproject.toml: to manage the environment and packages for the project
Expand to view patches applied on an Apple Silicon machine.
Some issues were encountered after setting up the conda environment with poetry. The following fixes were applied.
First, check that the right conda environment is active:
conda activate rp
Expand to view patches applied on an Ubuntu machine.
First, check that the right conda environment is active:
conda activate rp
I had issues getting the debugger to run in PyCharm. I am not entirely sure which exact package was missing, but rerunning the following command alleviated the issue. I still encountered some very strange bugs along the way after coming back from a prolonged period of coding in a Mac environment.
conda install jupyter
Additionally, I reset all settings and caches at both the IDE and the project level. The IDE directories are mentioned here; the project-level configs are found in the .idea/ folder in the project root.