This project provides a comprehensive analysis of a Brazilian e-commerce dataset. It includes exploratory data analysis (EDA) to uncover customer purchasing trends, demand patterns, and product-related insights. Additionally, an NLP-based sentiment analysis is performed on product reviews, coupled with machine learning modeling for predicting key outcomes.
- Source: Brazilian e-commerce dataset taken from Kaggle.
- Contents: Customer, product, and transaction data, including features like purchase time, product category, review scores, and text reviews.
- Analyze customer purchasing behaviors and identify demand patterns.
- Use NLP to evaluate customer sentiments through review analysis.
- Apply machine learning to predict aspects of customer behavior or product trends.
- Data Cleaning: Preprocessing to handle missing values, correct data types, and remove irrelevant information.
- Exploratory Data Analysis: Uncover patterns in customer behavior, demand peaks, and product popularity.
- NLP Analysis: Tokenization, sentiment analysis, and categorization of review text.
- Machine Learning Models: Model selection and evaluation for predictive analysis based on customer and transaction data.
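The data cleaning step above can be illustrated with a minimal sketch. It uses only the Python standard library and a few toy rows. The column names (`order_id`, `purchase_time`, `product_category`, `review_score`, `review_text`) and the `clean_rows` helper are assumptions for illustration, not the notebook's actual schema or code.

```python
import csv
import io
from datetime import datetime

# Toy rows standing in for an orders table; the column names are assumptions.
RAW = """order_id,purchase_time,product_category,review_score,review_text
o1,2018-01-05 10:30:00,toys,5,Great product
o2,2018-01-06 11:00:00,,4,
o3,not-a-date,books,3,ok
o4,2018-02-01 09:15:00,books,,late delivery
"""


def clean_rows(text):
    """Parse CSV text, correct data types, and drop rows that cannot be repaired."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        # Correct data types: parse the timestamp and skip rows where it is invalid.
        try:
            purchased = datetime.strptime(row["purchase_time"], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue
        # Handle missing values: label an empty category, keep a missing score as None.
        category = row["product_category"].strip() or "unknown"
        score = int(row["review_score"]) if row["review_score"].strip() else None
        cleaned.append({
            "order_id": row["order_id"],
            "purchase_time": purchased,
            "product_category": category,
            "review_score": score,
            "review_text": row["review_text"].strip(),
        })
    return cleaned


rows = clean_rows(RAW)
print(len(rows), "rows kept")
```

Whether to drop, label, or impute a missing value depends on the later analysis; the choices here are only one reasonable default.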
- Customer demographics and behavior.
- Product demand patterns and sales peaks.
- Order metrics, including delivery times and customer satisfaction.
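As a rough illustration of the order metrics and sales peaks listed above, the sketch below computes an average delivery time and finds the busiest purchase month. The timestamp pairs are made-up sample values, and the helper names `delivery_days` and `orders_per_month` are hypothetical; the notebook may compute these differently.

```python
from collections import Counter
from datetime import datetime
from statistics import mean

# Hypothetical (purchase, delivery) timestamp pairs; real column names may differ.
ORDERS = [
    ("2018-01-05 10:00:00", "2018-01-12 10:00:00"),
    ("2018-01-20 08:00:00", "2018-01-25 08:00:00"),
    ("2018-02-03 14:00:00", "2018-02-06 14:00:00"),
]

FMT = "%Y-%m-%d %H:%M:%S"


def delivery_days(purchased, delivered):
    """Return the time between purchase and delivery in days."""
    delta = datetime.strptime(delivered, FMT) - datetime.strptime(purchased, FMT)
    return delta.total_seconds() / 86400


def orders_per_month(orders):
    """Count orders by purchase month (YYYY-MM) to expose demand peaks."""
    return Counter(purchased[:7] for purchased, _ in orders)


avg_days = mean(delivery_days(p, d) for p, d in ORDERS)
monthly = orders_per_month(ORDERS)
peak_month, peak_count = monthly.most_common(1)[0]
print(f"average delivery: {avg_days:.1f} days; peak month: {peak_month} ({peak_count} orders)")
```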
The sentiment analysis leverages Natural Language Processing (NLP) techniques to evaluate product reviews, helping identify trends in customer satisfaction. Steps include:
- Tokenization of review text.
- Sentiment score calculation.
- Analysis of review sentiment across product categories.
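The three steps above can be sketched with a simple lexicon-based approach. This is a stand-in technique chosen for brevity, not necessarily the method used in the notebook. The tiny `LEXICON`, the sample reviews, and the function names are all assumptions, and a real lexicon would need to match the language of the reviews.

```python
import re
from collections import defaultdict

# Tiny illustrative polarity lexicon (assumption); a real analysis would use a
# full sentiment lexicon or a trained model in the language of the reviews.
LEXICON = {
    "great": 1, "good": 1, "fast": 1, "recommend": 1,
    "late": -1, "broken": -1, "bad": -1,
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def sentiment_score(text):
    """Mean polarity of the tokens found in the lexicon; 0.0 if none match."""
    hits = [LEXICON[token] for token in tokenize(text) if token in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def sentiment_by_category(reviews):
    """Average the per-review sentiment scores within each product category."""
    scores = defaultdict(list)
    for category, text in reviews:
        scores[category].append(sentiment_score(text))
    return {category: sum(vals) / len(vals) for category, vals in scores.items()}


# Hypothetical (category, review text) pairs.
REVIEWS = [
    ("toys", "Great product, fast delivery"),
    ("toys", "Arrived late and broken"),
    ("books", "Good book, I recommend it"),
]

by_category = sentiment_by_category(REVIEWS)
print(by_category)
```

Averaging only over matched tokens keeps long neutral reviews from diluting the score, at the cost of ignoring negation ("not good"), which a lexicon this simple cannot handle.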
Predictive modeling is applied to understand and forecast key factors within the e-commerce data, enhancing our understanding of customer actions and product preferences.
- Identified customer segments with varying purchasing behaviors.
- Determined the impact of certain product attributes on customer satisfaction.
- Developed a review-based sentiment profile for key product categories.
To run this project, you’ll need to download the Brazilian e-commerce dataset:
- Download the dataset from Kaggle’s Brazilian E-Commerce Dataset page.
- Save the downloaded `csv` files in the same directory as the Jupyter Notebook (`brazilian-e-commerce-eda-nlp-ml.ipynb`).
- Clone the repository.
git clone https://github.com/AnnaAnastasy/Brazil-E-Commerce.git
- Ensure Python 3.7 or later is installed.
- Install the required libraries listed in `requirements.txt`.
pip install -r requirements.txt
- Run the Notebook: Open and execute `brazilian-e-commerce-eda-nlp-ml.ipynb` in a Jupyter Notebook environment.
This project provides actionable insights into Brazilian e-commerce, revealing key trends in customer purchasing behavior, high-demand product categories, and peak sales periods. Sentiment analysis of customer reviews further highlighted areas for customer satisfaction improvements, such as delivery and product quality. Together, these insights can help e-commerce businesses optimize inventory, refine marketing strategies, and enhance the overall customer experience.