📈 Optimizing Customer Engagement in Online Retail

In this project, I used Python to analyze customer segments in the Online Retail dataset, applying cohort analysis, RFM analysis, and k-means clustering.

🔍 Problem Statement

Identify the customer segments in the dataset and prescribe a course of business action for each segment.

For example, one segment might be the customers who generate the most profit and purchase frequently.

📊 Data Overview

Source: The UCI Machine Learning Repository

This dataset contains all transactions that occurred between 1 December 2010 and 9 December 2011 for a non-store online retailer.

🖼️ Data Snapshot

🛠️ Data Exploration

🧹 Removed Null Values
✂️ Removed Duplicate Values
📍 Most transactions come from the UK
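A minimal pandas sketch of these three exploration steps. It assumes the Excel file has already been loaded into a DataFrame with the UCI column names (`InvoiceNo`, `CustomerID`, `Country`); the function name `clean_transactions` and the tiny demo frame are hypothetical.

```python
import pandas as pd

def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows containing missing values, then exact duplicate rows."""
    return df.dropna().drop_duplicates().reset_index(drop=True)

# Tiny illustrative frame (hypothetical values) using UCI-style column names
raw = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "536366", "536367"],
    "CustomerID": [17850.0, 17850.0, None, 13047.0],
    "Country": ["United Kingdom", "United Kingdom", "France", "United Kingdom"],
})

clean = clean_transactions(raw)

# Number of transaction rows per country, largest first
country_counts = clean["Country"].value_counts()
print(country_counts.head())
```

Here the duplicated first row and the row with a missing `CustomerID` are both removed, leaving two UK rows.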

📅 Cohort Analysis

A cohort is a group of users who share a common characteristic, such as the month in which they first purchased. Cohort analysis splits users into mutually exclusive groups and measures each group's behavior over time.

There are three types of cohort analysis:

📅 Time Cohorts: Groups customers by when they first purchased (or signed up) and tracks their behavior over time.
📦 Behavior Cohorts: Groups customers by the product or service they signed up for.
📏 Size Cohorts: Groups customers by their spending within a period.

For this project, I chose time cohorts. The steps are as follows:

🗓️ Identified cohort month for each customer (the month when the customer first transacted).

# First transaction month (Cohort Month) for each customer
df3['Cohort Month'] = df3.groupby('CustomerID')['InvoiceFormat'].transform('min')

🔢 Identified cohort index (difference between transaction month and cohort month) for each transaction.

# Cohort index: months between each transaction month (column x1) and the
# customer's cohort month (column y1); 1 means the cohort month itself
def diff(d, x1, y1):
    year_diff = d[x1].dt.year - d[y1].dt.year
    month_diff = d[x1].dt.month - d[y1].dt.month
    # Vectorized .dt arithmetic returns a Series aligned with d's index, so
    # gaps left by dropping nulls/duplicates cannot cause lookup errors
    return year_diff * 12 + month_diff + 1

📊 Grouped data by cohort month and cohort index.
📋 Developed a pivot table.
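A sketch of the grouping and pivot steps, under the assumption (common in cohort analysis, but not confirmed by the text) that each cell counts unique customers. `InvoiceFormat` is assumed here to be the invoice date truncated to the first day of its month; `cohort_pivot` and the demo data are hypothetical.

```python
import pandas as pd

def cohort_pivot(df: pd.DataFrame) -> pd.DataFrame:
    """Unique active customers per (Cohort Month, Cohort Index)."""
    counts = (
        df.groupby(["Cohort Month", "Cohort Index"])["CustomerID"]
        .nunique()
        .reset_index()
    )
    # Rows = cohorts, columns = months since first purchase
    return counts.pivot(index="Cohort Month", columns="Cohort Index", values="CustomerID")

# Hypothetical transactions; InvoiceFormat = invoice month (month-start dates)
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 3],
    "InvoiceFormat": pd.to_datetime(
        ["2010-12-01", "2011-01-01", "2010-12-01", "2011-02-01", "2011-01-01"]
    ),
})
tx["Cohort Month"] = tx.groupby("CustomerID")["InvoiceFormat"].transform("min")
tx["Cohort Index"] = (
    (tx["InvoiceFormat"].dt.year - tx["Cohort Month"].dt.year) * 12
    + (tx["InvoiceFormat"].dt.month - tx["Cohort Month"].dt.month)
    + 1
)

pivot = cohort_pivot(tx)

# Retention rate: divide each row by its cohort size (the Cohort Index 1 column)
retention = pivot.divide(pivot[1], axis=0)
print(retention.round(2))
```

Dividing by the first column turns raw counts into the retention percentages that a cohort heatmap usually displays.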

🔥 Developed a time cohort heatmap.
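The plotting code is not shown in this README, so here is a minimal sketch using plain matplotlib (seaborn's `heatmap` is a common alternative). It assumes a retention table shaped like a cohort pivot (rows = cohort months, columns = cohort index); the function name and the demo values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plot_retention_heatmap(retention: pd.DataFrame):
    """Draw a retention table (rows = cohorts, columns = cohort index) as a heatmap."""
    fig, ax = plt.subplots(figsize=(8, 4))
    im = ax.imshow(retention.to_numpy(dtype=float), cmap="Blues",
                   vmin=0, vmax=1, aspect="auto")
    ax.set_xticks(range(retention.shape[1]))
    ax.set_xticklabels(retention.columns)
    ax.set_yticks(range(retention.shape[0]))
    ax.set_yticklabels([str(m)[:7] for m in retention.index])  # e.g. "2010-12"
    ax.set_xlabel("Cohort Index (months since first purchase)")
    ax.set_ylabel("Cohort Month")
    # Annotate each non-missing cell with its retention percentage
    for r in range(retention.shape[0]):
        for c in range(retention.shape[1]):
            v = retention.iat[r, c]
            if not np.isnan(v):
                ax.text(c, r, f"{v:.0%}", ha="center", va="center", fontsize=8)
    fig.colorbar(im, ax=ax, label="Retention rate")
    fig.tight_layout()
    return fig, ax

# Hypothetical retention table
demo = pd.DataFrame(
    [[1.0, 0.5, 0.5], [1.0, np.nan, np.nan]],
    index=pd.to_datetime(["2010-12-01", "2011-01-01"]),
    columns=[1, 2, 3],
)
fig, ax = plot_retention_heatmap(demo)
```

Call `fig.savefig(...)` to write the chart to disk. Missing cells (months that have not happened yet for recent cohorts) are left blank.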

📌 Summary

💔 Roughly 10% of new customers are still purchasing a year after their first transaction, so retention is poor.
🎯 About 250 new customers join each month, which suggests that acquisition (marketing) efforts are working reasonably well.

📈 RFM Analysis

RFM stands for Recency, Frequency, and Monetary value. It measures:

📅 Recency: How recently a customer transacted.
🔄 Frequency: How often they transacted.
💰 Monetary: How much they spent.

These scores help group customers for further analysis.

🗓️ Recency

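The last transaction in the dataset was on 2011-12-09, so 2012-01-01, a date after the end of the data, was used as the snapshot date. Recency is the number of days between a customer's most recent transaction and that snapshot date.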
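A minimal sketch of the recency calculation, assuming the transactions carry the UCI `InvoiceDate` column as a datetime (the exact column used in the notebook is not shown). The `recency` helper and demo rows are hypothetical; the snapshot date comes from the text.

```python
import pandas as pd

# Snapshot date stated in the README
SNAPSHOT_DATE = pd.Timestamp("2012-01-01")

def recency(df: pd.DataFrame, snapshot: pd.Timestamp = SNAPSHOT_DATE) -> pd.DataFrame:
    """Days between each customer's most recent invoice and the snapshot date."""
    last_purchase = df.groupby("CustomerID")["InvoiceDate"].max()
    return (snapshot - last_purchase).dt.days.to_frame("Recency")

# Hypothetical transactions
tx = pd.DataFrame({
    "CustomerID": [12346, 12346, 12347],
    "InvoiceDate": pd.to_datetime(["2011-01-18", "2011-12-01", "2011-12-09"]),
})
print(recency(tx))
```

Customer 12346 last bought on 2011-12-01, giving a recency of 31 days; customer 12347 bought on the final day of the data, giving 23 days.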

🔢 Frequency

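# Frequency: number of invoice lines per customer
# (["InvoiceNo"].nunique() would count distinct invoices instead)
freq = df6.groupby(["CustomerID"])[["InvoiceNo"]].count()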

💵 Monetary

# Monetary: line total (quantity x unit price), summed per customer
df6["total"] = df6["Quantity"] * df6["UnitPrice"]
money = df6.groupby(["CustomerID"])[["total"]].sum()

📚 Clustering

Before applying K-means clustering, I addressed data skewness.

📊 RFM Distribution

The RFM values were right-skewed, so I applied a log transformation:
🔄 After Log Transformation
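A sketch of the log transformation followed by standardization, which produces the `scaled` matrix used by the k-means code. The column names `Recency`/`Frequency`/`Monetary`, the `prepare_for_kmeans` helper, and the choice to drop customers with non-positive values (the log is undefined there, and net returns can make Monetary negative) are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

COLS = ["Recency", "Frequency", "Monetary"]

def prepare_for_kmeans(rfm: pd.DataFrame) -> np.ndarray:
    """Log-transform and standardize the RFM columns."""
    # np.log needs strictly positive input; rows with any non-positive
    # RFM value (e.g. net returns) are dropped here as an assumption
    positive = rfm[(rfm[COLS] > 0).all(axis=1)]
    logged = np.log(positive[COLS])
    # Standardize to mean 0 / std 1 so no feature dominates Euclidean distance
    return StandardScaler().fit_transform(logged)

# Hypothetical RFM table; the last customer has a negative monetary total
rfm = pd.DataFrame({
    "Recency": [23, 31, 200, 310],
    "Frequency": [180, 12, 3, 1],
    "Monetary": [4300.5, 560.0, 90.2, -15.0],
})
scaled = prepare_for_kmeans(rfm)
print(scaled.shape)
```

Standardizing after the log step matters because k-means uses Euclidean distance, so features on larger scales would otherwise dominate the clustering.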

🤖 Implementing K-means

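from sklearn.cluster import KMeans

# Elbow method: fit k-means for k = 1..10 on the scaled RFM matrix `scaled`
# and record each model's inertia (within-cluster sum of squared distances)
inertia = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, n_init=10, random_state=42)
    kmeans.fit(scaled)
    inertia.append(kmeans.inertia_)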

📉 Elbow Curve

Based on the elbow in this curve, I chose 3 clusters.
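A sketch of fitting the final model with k = 3 and profiling each cluster in original units, which is how segments like the ones below are typically interpreted. The `segment` helper and the synthetic three-group data are hypothetical; `rfm` rows must line up one-to-one with the rows of `scaled`.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def segment(rfm: pd.DataFrame, scaled: np.ndarray, k: int = 3) -> pd.DataFrame:
    """Fit k-means with the chosen k and return mean RFM values per cluster."""
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    labeled = rfm.assign(Cluster=km.fit_predict(scaled))
    # Averages in original units make each cluster easy to interpret
    return labeled.groupby("Cluster")[["Recency", "Frequency", "Monetary"]].mean()

# Synthetic data with three well-separated groups of 20 customers each
rng = np.random.default_rng(0)
rfm = pd.DataFrame({
    "Recency": np.concatenate([rng.integers(20, 40, 20),
                               rng.integers(80, 120, 20),
                               rng.integers(250, 370, 20)]),
    "Frequency": np.concatenate([rng.integers(100, 200, 20),
                                 rng.integers(20, 40, 20),
                                 rng.integers(1, 5, 20)]),
    "Monetary": np.concatenate([rng.uniform(3000, 6000, 20),
                                rng.uniform(400, 900, 20),
                                rng.uniform(20, 100, 20)]),
})
scaled = StandardScaler().fit_transform(np.log(rfm))
profile = segment(rfm, scaled)
print(profile.round(1))
```

Fixing `random_state` keeps cluster ids stable between runs. The ids themselves are arbitrary, so segments are named from the profile rather than from the label numbers.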

🏷️ Customer Segments

🌟 Best Customers: High frequency, high monetary value, and recent transactions.
⚠️ At-Risk Customers: Long time since the last transaction, low spending.
💼 Average Customers: Regular transactions, moderate spending.
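One hedged way to attach these names to k-means cluster ids is to rank clusters by mean Monetary value. This heuristic is an assumption rather than the method confirmed by the README. It presumes exactly three clusters and that spending rank matches the recency/frequency pattern described above, which should be checked against the real profile. `name_segments` and the profile values are hypothetical.

```python
import pandas as pd

def name_segments(profile: pd.DataFrame) -> dict:
    """Map cluster ids to segment names by descending mean Monetary value."""
    order = profile["Monetary"].sort_values(ascending=False).index
    names = ["Best Customers", "Average Customers", "At-Risk Customers"]
    return dict(zip(order, names))

# Hypothetical cluster profile (mean RFM values per cluster id)
profile = pd.DataFrame(
    {
        "Recency": [250.0, 30.0, 95.0],
        "Frequency": [2.0, 150.0, 30.0],
        "Monetary": [60.0, 4500.0, 650.0],
    },
    index=pd.Index([0, 1, 2], name="Cluster"),
)
mapping = name_segments(profile)
print(mapping)
```

In this example, cluster 1 (highest spend, most recent) maps to Best Customers and cluster 0 (lowest spend, least recent) maps to At-Risk Customers.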

💡 Suggestions and Cluster Interpretation

⚠️ At-Risk Customers:
- Suggestion: Analyze why they left; offer sales or discounts to win them back.
💼 Average Customers:
- Suggestion: Convert to best customers through discounts, excellent support, and targeted promotions.
🌟 Best Customers:
- Suggestion: Focus advertising and product launches on this group. Heavy discounts aren’t needed.

🚀 Future Work

Expand Analysis: Include new customer features, like demographics and lifetime value.
Dynamic Segmentation: Automate real-time segmentation for evolving behaviors.
Advanced Models: Explore hierarchical clustering or DBSCAN for complex relationships.
Predictive Insights: Use predictive analytics to forecast customer behavior and recommend proactive strategies.

This project sets a strong foundation for tailored customer engagement, paving the way for smarter, data-driven business decisions! 😊

chaitanyasai-2021/Optimizing-Customer-Segmentation-in-Online-Retail

Folders and files

Latest commit

History

Repository files navigation

📈 Optimizing Customer Engagement in Online Retail

🔍 Problem Statement

📊 Data Overview

🖼️ Data Snapshot

🛠️ Data Exploration

📅 Cohort Analysis

📌 Summary

📈 RFM Analysis

🗓️ Recency

🔢 Frequency

💵 Monetary

📚 Clustering

🤖 Implementing K-means

🏷️ Customer Segments

💡 Suggestions and Cluster Interpretation

🚀 Future Work

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages