cedric-uden/hka-predicting_future_sales

Predicting Future Sales

📈 Introduction

This research project originated as a university assignment in my 5th semester of computer science studies. The goal was to use the "Future Sales" dataset from the Kaggle competition Predict Future Sales to predict the sales of individual items in the upcoming months across selected branches spread across the country.

🛠 Set-Up

Environment

The environment is managed with conda and was built on Python 3.8.

The packages are managed using poetry.

To get started, install conda, then create and activate a new environment with the following commands:

conda create --name rp python=3.8
conda activate rp

Then install poetry:

conda install poetry

Finally, navigate to the root folder of the repository and run the following command to install all packages:

poetry install
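
After installation, it can be useful to confirm that the interpreter and the active conda environment match the set-up described above. The following is a minimal sketch, not part of the repository: the function name `describe_environment` is hypothetical, and it assumes that `conda activate` has set the `CONDA_DEFAULT_ENV` environment variable in the current shell.

```python
import os
import sys


def describe_environment(expected_python=(3, 8), expected_env="rp"):
    """Summarize whether the interpreter and conda environment match the README.

    `expected_python` and `expected_env` default to the values used in the
    set-up steps (Python 3.8, environment name "rp").
    """
    # conda exports the name of the active environment in this variable
    active_env = os.environ.get("CONDA_DEFAULT_ENV")
    return {
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        "python_ok": tuple(sys.version_info[:2]) == tuple(expected_python),
        "conda_env": active_env,
        "conda_env_ok": active_env == expected_env,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `python_ok` or `conda_env_ok` is reported as `False`, the most likely cause is that the `rp` environment was not activated before starting Python.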

Datasets

The datasets provided by the Kaggle competition need to be downloaded to their respective folders. Check the folder content for further information.

𝌞 Contents

Research notebooks

  • Predicting Future Sales: main notebook, contains the EDA and feature engineering

  • In Depth EDA: additional findings from the EDA that would have bloated the main notebook

  • Linear Regression: evaluation and training of the linear models. Contains the submission to the competition

  • ARIMA: evaluation of a forecasting model

Data folder

  • info folder containing detailed information, provided by the Kaggle competition

  • technical folder containing the train, test and submission template, provided by the Kaggle competition

  • feature_engineering folder containing additional information, which was researched manually, used for feature engineering

  • arima folder containing data generated in the ARIMA analysis

  • out folder to store the serialized output from the feature engineering

  • submissions folder to store the final submission files to submit to the Kaggle competition
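
Before running the notebooks, the folder layout listed above can be checked with a few lines of Python. This is a hedged sketch rather than code from the repository: the subfolder names come from the list above, while the function name `missing_data_folders` and the parent folder name `data` used in the usage note are assumptions.

```python
from pathlib import Path

# Subfolder names taken from the "Data folder" list in this README.
EXPECTED_SUBFOLDERS = (
    "info",
    "technical",
    "feature_engineering",
    "arima",
    "out",
    "submissions",
)


def missing_data_folders(data_root):
    """Return the expected subfolders that do not exist under `data_root`."""
    root = Path(data_root)
    return [name for name in EXPECTED_SUBFOLDERS if not (root / name).is_dir()]
```

Calling `missing_data_folders("data")` from the repository root (assuming the data folder is named `data`) returns an empty list when every folder is in place, and otherwise the names that still need to be created or populated.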

Modules

The src folder contains additional modules that are used across the different notebooks.

It includes a default file that sets default styles to maintain a uniform look across the project.

Report

  • LaTeX source files and compiled PDF of the final report

Technicalities

⚠️ Troubleshooting

Issues on Apple Silicon (M1)

Some issues were encountered on an Apple Silicon machine after setting up the conda environment with poetry. The following fixes were applied.

First, check that the right conda environment is active:

conda activate rp

Issues with XGBoost

Issue: XGBoost Library (libxgboost.dylib) could not be loaded.

conda install -c conda-forge py-xgboost

Issue: cannot import name 'CUDF_concat' from 'xgboost.compat'

brew install xgboost
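
To see whether a fix took effect without opening a notebook, the import can be tested directly. The helper below is an illustrative sketch, not part of the repository: `check_import` is a hypothetical name, and it catches any exception because a library-loading failure may surface as something other than `ImportError`.

```python
import importlib


def check_import(module_name="xgboost"):
    """Try to import a module and return (ok, detail) instead of raising."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # ImportError or a library-loading error raised at import time
        return False, f"{type(exc).__name__}: {exc}"
    # Report the version when the module exposes one
    return True, getattr(module, "__version__", "unknown version")


if __name__ == "__main__":
    ok, detail = check_import("xgboost")
    print("xgboost import", "succeeded:" if ok else "failed:", detail)
```

Running this inside the activated `rp` environment before and after applying a patch shows whether the error message has changed or disappeared.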


Issues on Ubuntu

The following patches were applied on an Ubuntu machine.

First, check that the right conda environment is active:

conda activate rp

Issues with connecting to the debugger using PyCharm (Professional) IDE

I had trouble getting the debugger to run in PyCharm and am not entirely sure which package was missing. Rerunning the following command alleviated the issue, although I still encountered some very strange bugs after returning from a prolonged period of coding in a Mac environment.

conda install jupyter

Additionally, I reset all settings and caches at both the IDE and the project level. IDE directories are mentioned here; the project-level configs are found in the .idea/ folder in the project root.

