
Data science techniques are applied to analyze and predict e-commerce sales. The work encompasses thorough data exploration, advanced feature engineering, and careful model evaluation using methods like Random Forest and XGBoost. Future enhancements involve deep learning, dataset expansion, and model deployment. Contributions are warmly welcomed.


3bento/ecommerce_sales_analysis

 
 


🛒 E-commerce Sales Analysis

📋 Overview

This project focuses on analyzing and predicting sales in an e-commerce dataset using various machine learning models. The main objective is to explore sales patterns, engineer relevant features, and build predictive models to forecast future sales performance. The project covers data preprocessing, feature engineering, model building, and evaluation.

Dataset credit: the public dataset was obtained from Kaggle: https://www.kaggle.com/datasets/fahmidachowdhury/e-commerce-sales-analysis/data

🚀 Project Objectives

  • Exploratory Data Analysis (EDA): To understand the distribution and relationships within the data.
  • Feature Engineering: To enhance the dataset with meaningful features like price_review_interaction, price_range, and seasonal patterns.
  • Model Building: To train machine learning models and compare their performance using metrics like MAE (Mean Absolute Error) and MSE (Mean Squared Error).
  • Prediction: To forecast sales for the upcoming year based on historical data.
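The feature engineering objective could be sketched roughly as follows. This is only an illustration, not the notebooks' actual code: the column names `price`, `review_score`, and `month`, the definition of `price_review_interaction` as a simple product, and the `price_range` bin edges are all assumptions made for the example.

```python
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with three illustrative engineered features.

    Assumes (hypothetically) that df has numeric `price` and `review_score`
    columns and an integer `month` column in the range 1..12.
    """
    out = df.copy()

    # Interaction term: assumed here to be price multiplied by review score.
    out["price_review_interaction"] = out["price"] * out["review_score"]

    # Bucket prices into coarse ranges; the bin edges are arbitrary examples.
    out["price_range"] = pd.cut(
        out["price"],
        bins=[0, 50, 200, float("inf")],
        labels=["low", "mid", "high"],
        include_lowest=True,
    )

    # Simple seasonal feature: calendar quarter derived from the month.
    out["quarter"] = (out["month"] - 1) // 3 + 1
    return out


sample = pd.DataFrame(
    {
        "price": [19.99, 120.0, 450.0],
        "review_score": [4.5, 4.0, 3.2],
        "month": [1, 7, 12],
    }
)
features = add_features(sample)
print(features)
```

Working on a copy keeps the raw data untouched, which makes it easier to rerun notebook cells without accumulating duplicate columns.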

🗂️ Project Structure

  • 📁 data/: Contains the data used in the project.
  • 📁 notebooks/: Jupyter Notebooks with analysis and modeling code.
  • 📁 results/: Results and visualizations generated by the project.
  • 📄 README.md: This file.

🛠️ How to Use

  1. Clone the repository:
    git clone https://github.com/3bento/ecommerce_sales_analysis.git
    
  2. Install the dependencies:
    pip install -r requirements.txt
    
  3. Run the analysis:
    Open the notebooks in the notebooks/ folder and run the cells to reproduce the analysis.

Results:

  • Machine learning models were trained to predict future sales.
  • Metrics such as MAE and MSE were used to evaluate model performance.
  • Comparison charts between actual and predicted sales were generated for visualization.
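For reference, the two metrics named above can be written out in a few lines. This is a dependency-free sketch with made-up numbers, not the project's evaluation code; the helper names `mae` and `mse` are chosen for the example, and scikit-learn's `mean_absolute_error` and `mean_squared_error` compute the same quantities.

```python
def mae(y_true, y_pred):
    """Mean Absolute Error: the average size of the errors, in sales units."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error: the average squared error, in squared sales units."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only; they are not taken from the dataset.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]

print(f"MAE: {mae(actual, predicted):.3f}")
print(f"MSE: {mse(actual, predicted):.3f}")
```

Because MSE squares each error, a single large miss (the 30-unit error here) dominates it far more than it dominates MAE, which is why the two metrics can rank models differently.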

📊 Results and Visualizations

Evaluated Models

  • 🌲 Random Forest: MAE: 3,833.85, MSE: 42,981,625.24
  • 📉 Linear Regression: MAE: 98,874.36, MSE: 18,736,420,807.01
  • ⚡ XGBoost Regressor: MAE: 9,945.71, MSE: 180,299,575.99
  • 📈 Gradient Boosting Regressor: MAE: 6,429.05, MSE: 74,975,642.94
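Since MSE is expressed in squared units, it can be easier to compare with MAE after taking its square root (RMSE). The sketch below does that arithmetic for the figures listed above and ranks the models by MAE; the dictionary layout and variable names are just for illustration.

```python
import math

# (MAE, MSE) pairs copied from the list of evaluated models.
reported = {
    "Random Forest": (3833.85, 42_981_625.24),
    "Linear Regression": (98_874.36, 18_736_420_807.01),
    "XGBoost Regressor": (9945.71, 180_299_575.99),
    "Gradient Boosting Regressor": (6429.05, 74_975_642.94),
}

# RMSE is the square root of MSE, so it shares units with MAE.
rmse = {name: math.sqrt(m[1]) for name, m in reported.items()}

# Rank models from lowest (best) to highest MAE.
ranked = sorted(reported, key=lambda name: reported[name][0])

for name in ranked:
    model_mae, _ = reported[name]
    print(f"{name:<28} MAE={model_mae:>12,.2f}  RMSE={rmse[name]:>12,.2f}")
```

On these reported numbers, Random Forest has the lowest MAE and Linear Regression the highest.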

Visualizations

  • Charts compare actual sales with the sales predicted by each model, showing how closely each model tracks the observed values.

🔍 Ideas for Future Work

  • Deep Learning Models: Explore neural networks to see if they can improve predictions.
  • Expand Dataset: Apply this methodology to different datasets to broaden the scope.
  • Deploy Models: Deploy the best-performing model using a web app for real-time predictions.
  • Communities: Identify communities of consumers with similar interests.
  • Recommendation: Implement a personalized product recommendation system based on buying patterns.

🤝 Contributions

  • Contributions are welcome! Feel free to open an issue or submit a pull request.

