This project focuses on analyzing and predicting sales in an e-commerce dataset using various machine learning models. The main objective is to explore sales patterns, engineer relevant features, and build predictive models to forecast future sales performance. The project covers data preprocessing, feature engineering, model building, and evaluation.
Dataset credit: the data is publicly available on Kaggle: https://www.kaggle.com/datasets/fahmidachowdhury/e-commerce-sales-analysis/data
- Exploratory Data Analysis (EDA): To understand the distribution and relationships within the data.
- Feature Engineering: To enhance the dataset with meaningful features like price_review_interaction, price_range, and seasonal patterns.
- Model Building: To train machine learning models and compare their performance using metrics like MAE (Mean Absolute Error) and MSE (Mean Squared Error).
- Prediction: To forecast sales for the upcoming year based on historical data.
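As a rough illustration of the feature engineering step, the sketch below derives `price_review_interaction`, `price_range`, and simple seasonal features from a few made-up records. The input field names (`price`, `review_score`, `order_date`), the price thresholds, and the choice of a product for the interaction term are all assumptions for illustration; the notebooks may define these features differently.

```python
from datetime import date

# Hypothetical records: field names and values are illustrative assumptions,
# not the actual columns of the Kaggle dataset.
rows = [
    {"price": 12.5, "review_score": 4.0, "order_date": date(2023, 1, 15)},
    {"price": 80.0, "review_score": 3.5, "order_date": date(2023, 7, 2)},
    {"price": 250.0, "review_score": 4.8, "order_date": date(2023, 12, 20)},
]


def price_range(price):
    """Bucket a price into a coarse category (thresholds are assumed)."""
    if price < 50:
        return "low"
    if price < 200:
        return "medium"
    return "high"


def add_features(row):
    """Return a copy of the row with engineered features added."""
    out = dict(row)
    # Interaction term, assumed here to be a simple product.
    out["price_review_interaction"] = row["price"] * row["review_score"]
    out["price_range"] = price_range(row["price"])
    # Seasonal features derived from the order date.
    out["month"] = row["order_date"].month
    out["quarter"] = (row["order_date"].month - 1) // 3 + 1
    return out


features = [add_features(r) for r in rows]
```

Plain Python is used here to keep the sketch dependency-free; with a DataFrame the same logic would typically be expressed as column operations.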
- 📁 data/: Contains the data used in the project.
- 📁 notebooks/: Jupyter Notebooks with analysis and modeling code.
- 📁 results/: Results and visualizations generated by the project.
- 📄 README.md: This file.
- Clone the repository:
git clone https://github.com/your-username/e-commerce-sales-analysis.git
- Install the dependencies:
pip install -r requirements.txt
- Run the analysis: open the notebooks in the notebooks/ folder and run the cells to reproduce the analysis.
- Machine learning models were trained to predict future sales.
- Metrics such as MAE and MSE were used to evaluate model performance.
- Comparison charts between actual and predicted sales were generated for visualization.
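For reference, MAE and MSE can be computed as in the minimal sketch below. The function names mirror those in `sklearn.metrics`, which the notebooks may well use instead; the sample values are made up for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average of the squared differences; penalizes large errors more heavily."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Illustrative values only, not taken from the dataset.
actual_sales = [100.0, 150.0, 200.0]
predicted_sales = [110.0, 140.0, 240.0]

mae = mean_absolute_error(actual_sales, predicted_sales)  # errors: 10, 10, 40
mse = mean_squared_error(actual_sales, predicted_sales)
```

Because MSE squares each error, a single large miss (the 40 here) dominates it, while MAE weights all errors equally.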
- 🌲 Random Forest: MAE: 3,833.85, MSE: 42,981,625.24
- 📉 Linear Regression: MAE: 98,874.36, MSE: 18,736,420,807.01
- ⚡ XGBoost Regressor: MAE: 9,945.71, MSE: 180,299,575.99
- 📈 Gradient Boosting Regressor: MAE: 6,429.05, MSE: 74,975,642.94
- Charts were created to compare actual sales with those predicted by each model. These charts help to understand how well each model is performing.
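One way to compare the reported metrics side by side is to convert each MSE to RMSE (its square root), which is in the same units as MAE, and rank the models by MAE. This is a small sketch using only the figures listed in this README; the dictionary layout is an assumption, not code from the notebooks.

```python
import math

# MAE and MSE values as reported in this README.
reported = {
    "Random Forest": {"mae": 3833.85, "mse": 42_981_625.24},
    "Linear Regression": {"mae": 98_874.36, "mse": 18_736_420_807.01},
    "XGBoost Regressor": {"mae": 9945.71, "mse": 180_299_575.99},
    "Gradient Boosting Regressor": {"mae": 6429.05, "mse": 74_975_642.94},
}

# RMSE is the square root of MSE, expressed in the same units as the target.
rmse = {name: math.sqrt(m["mse"]) for name, m in reported.items()}

# Rank models from lowest to highest MAE (lower is better).
ranking = sorted(reported, key=lambda name: reported[name]["mae"])

for name in ranking:
    print(f"{name}: MAE={reported[name]['mae']:,.2f}, RMSE={rmse[name]:,.2f}")
```

On these figures, Random Forest has the lowest MAE and Linear Regression the highest, and the ordering by MSE is the same.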
- Deep Learning Models: Explore neural networks to see if they can improve predictions.
- Expand Dataset: Apply this methodology to different datasets to broaden the scope.
- Deploy Models: Deploy the best-performing model using a web app for real-time predictions.
- Communities: Identify communities of consumers with similar interests.
- Recommendation: Implement a personalized product recommendation system based on buying patterns.
- Contributions are welcome! Feel free to open an issue or submit a pull request.