
This project analyzes customer behavior in online retail using cohort analysis, **Recency, Frequency, Monetary (RFM)** metrics, and K-means clustering to segment customers. It identifies key groups like Best, At-Risk, and Average Customers, offering strategies to enhance engagement and drive loyalty.


chaitanyasai-2021/Optimizing-Customer-Segmentation-in-Online-Retail


📈 Optimizing Customer Engagement in Online Retail

In this project, I analyzed customer segments in the Online Retail dataset using Python, employing cohort analysis, RFM analysis, and K-means clustering.

🔍 Problem Statement

Identify the customer segments in the dataset and prescribe a course of business action for each segment.

For example, one segment might be the customers who generate the most profit and visit frequently.

📊 Data Overview

Source: The UCI Machine Learning Repository

This dataset contains all transactions that occurred between 01/12/2010 and 09/12/2011 (DD/MM/YYYY) for a UK-based, non-store online retailer.

🖼️ Data Snapshot

Screenshot (516)

🛠️ Data Exploration

  • 🧹 Removed Null Values
  • ✂️ Removed Duplicate Values
  • 📍 Most transactions come from the UK

Screenshot (517)
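A minimal sketch of the cleaning steps above, assuming a pandas DataFrame with the UCI Online Retail column names. The helper names `clean_transactions` and `transactions_by_country` and the toy rows are hypothetical, not taken from the original notebook.

```python
import pandas as pd


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows containing nulls, then exact duplicate rows."""
    # Rows with a missing CustomerID (or other field) cannot be tied to a customer
    cleaned = df.dropna()
    # Fully identical rows are treated as duplicate records
    cleaned = cleaned.drop_duplicates()
    # A fresh 0..n-1 index keeps later positional loops and label lookups in sync
    return cleaned.reset_index(drop=True)


def transactions_by_country(df: pd.DataFrame) -> pd.Series:
    """Count transaction lines per country, most frequent first."""
    return df["Country"].value_counts()


# Toy rows for illustration only; column names follow the UCI Online Retail file
toy = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "536366", "536367"],
    "Quantity": [6, 6, 2, 3],
    "UnitPrice": [2.55, 2.55, 3.39, 1.85],
    "CustomerID": [17850.0, 17850.0, None, 13047.0],
    "Country": ["United Kingdom", "United Kingdom", "France", "United Kingdom"],
})
clean = clean_transactions(toy)
print(len(clean))                               # 2
print(transactions_by_country(clean).index[0])  # United Kingdom
```

Resetting the index after dropping rows matters here because any later code that indexes rows by position would otherwise hit missing labels.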

📅 Cohort Analysis

A cohort is a group of users who share a common characteristic, such as the month of their first purchase. Cohort analysis splits users into mutually exclusive cohorts and measures each cohort's behavior over time.

There are three types of cohort analysis:

  • 📅 Time Cohorts: Groups customers by when they first purchased (e.g., their first-purchase month) and tracks each group over time.
  • 📦 Behavior Cohorts: Groups customers by the product or service they signed up for.
  • 📏 Size Cohorts: Groups customers by their spending within a period.

For this project, I chose time cohorts. The steps are as follows:

  1. 🗓️ Identified cohort month for each customer (the month when the customer first transacted).

    # First transaction month (cohort month) for each customer;
    # InvoiceFormat holds each invoice date reduced to its month
    df3['Cohort Month'] = df3.groupby('CustomerID')['InvoiceFormat'].transform('min')
  2. 🔢 Identified the cohort index for each transaction: the number of months between the transaction month and the cohort month, counting the cohort month itself as 1.

    # Cohort index: months from the cohort month (column y1) to the invoice month (column x1), plus 1
    def diff(d, x1, y1):
        years = d[x1].dt.year - d[y1].dt.year
        months = d[x1].dt.month - d[y1].dt.month
        # Vectorized: avoids d[x1][i], which looks rows up by index label
        # and fails once rows have been dropped during cleaning
        return ((years * 12) + months + 1).tolist()
  3. 📊 Grouped data by cohort month and cohort index.

  4. 📋 Built a pivot table with cohort months as rows and cohort indices as columns.

Screenshot (518)

  5. 🔥 Plotted the pivot table as a time cohort heatmap.

Screenshot (519)
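The cohort steps above can be sketched end to end as follows. This is a minimal version assuming a cleaned DataFrame with `CustomerID` and a datetime `InvoiceDate` column; the function name `cohort_retention` and the toy data are hypothetical. It counts distinct customers per (cohort month, cohort index) cell, which is one common choice; the original notebook's exact aggregation may differ.

```python
import pandas as pd


def cohort_retention(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cohort-month x cohort-index table of retention rates (0 to 1)."""
    df = df.copy()
    # Month of each transaction, as a first-of-month Timestamp
    df["InvoiceMonth"] = df["InvoiceDate"].dt.to_period("M").dt.to_timestamp()
    # Step 1: cohort month = month of each customer's first transaction
    df["CohortMonth"] = df.groupby("CustomerID")["InvoiceMonth"].transform("min")
    # Step 2: cohort index = months since the cohort month, first month counted as 1
    df["CohortIndex"] = (
        (df["InvoiceMonth"].dt.year - df["CohortMonth"].dt.year) * 12
        + (df["InvoiceMonth"].dt.month - df["CohortMonth"].dt.month)
        + 1
    )
    # Step 3: distinct active customers per (cohort month, cohort index)
    counts = (
        df.groupby(["CohortMonth", "CohortIndex"])["CustomerID"]
        .nunique()
        .reset_index()
    )
    # Step 4: pivot so each row is a cohort and each column a month offset
    table = counts.pivot(index="CohortMonth", columns="CohortIndex", values="CustomerID")
    # Retention: divide each row by its cohort size (the CohortIndex == 1 column)
    return table.divide(table[1], axis=0)


# Toy data: customers 1 and 2 join in Dec 2010, customer 3 in Jan 2011
toy = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 3, 3],
    "InvoiceDate": pd.to_datetime([
        "2010-12-01", "2011-01-15", "2011-02-03",
        "2010-12-05", "2011-01-20", "2011-02-25",
    ]),
})
retention = cohort_retention(toy)
print(retention)
```

For step 5, the returned table can be passed to `seaborn.heatmap(retention, annot=True, fmt=".0%")` to draw the heatmap.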

📌 Summary

  1. 💔 Roughly 10% of new customers are still active after a year, so retention is poor.
  2. 🎯 About 250 new customers join each month, which suggests that customer acquisition (marketing) is working reasonably well.

📈 RFM Analysis

RFM stands for Recency, Frequency, Monetary. It evaluates:

  • 📅 Recency: How recently a customer transacted.
  • 🔄 Frequency: How often they transacted.
  • 💰 Monetary: How much they spent.

These scores help group customers for further analysis.

🗓️ Recency

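  • The last transaction in the dataset was on 2011-12-09, so recency was measured against 2012-01-01, a snapshot date after every transaction: a customer's recency is the number of days between their most recent purchase and that date.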
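A minimal sketch of the recency calculation, assuming a DataFrame with `CustomerID` and a datetime `InvoiceDate` column; the snapshot date comes from the text, while the function name `recency_days` and the toy rows are hypothetical.

```python
import pandas as pd

# Snapshot date from the text: a reference point after the last transaction (2011-12-09)
SNAPSHOT = pd.Timestamp("2012-01-01")


def recency_days(df: pd.DataFrame, snapshot: pd.Timestamp = SNAPSHOT) -> pd.DataFrame:
    """Days between each customer's most recent InvoiceDate and the snapshot date."""
    last_purchase = df.groupby("CustomerID")["InvoiceDate"].max()
    return (snapshot - last_purchase).dt.days.to_frame(name="Recency")


toy = pd.DataFrame({
    "CustomerID": [1, 1, 2],
    "InvoiceDate": pd.to_datetime(["2011-11-01", "2011-12-09", "2011-06-30"]),
})
print(recency_days(toy))  # customer 1: 23 days, customer 2: 185 days
```

Lower values mean more recent activity, so recency works in the opposite direction to frequency and monetary value.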

🔢 Frequency

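# Frequency: transaction lines (rows) per customer; .nunique() would count distinct invoices instead  
freq = df6.groupby(["CustomerID"])[["InvoiceNo"]].count()  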

💵 Monetary

# Monetary: line value (Quantity * UnitPrice), summed per customer  
df6["total"] = df6["Quantity"] * df6["UnitPrice"]  
money = df6.groupby(["CustomerID"])[["total"]].sum()  

Screenshot (520)
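One way to assemble the three metrics into a single per-customer RFM table, sketched under the assumption that the data has `CustomerID`, `InvoiceNo`, `InvoiceDate`, `Quantity` and `UnitPrice` columns; `rfm_table` and the toy rows are hypothetical. Frequency counts transaction lines, mirroring the `count()` call above.

```python
import pandas as pd


def rfm_table(df: pd.DataFrame, snapshot: pd.Timestamp) -> pd.DataFrame:
    """One row per customer with Recency (days), Frequency and Monetary columns."""
    df = df.assign(total=df["Quantity"] * df["UnitPrice"])
    grouped = df.groupby("CustomerID")
    return pd.DataFrame({
        # Days since the customer's latest purchase
        "Recency": (snapshot - grouped["InvoiceDate"].max()).dt.days,
        # Transaction lines per customer
        "Frequency": grouped["InvoiceNo"].count(),
        # Total spend: sum of Quantity * UnitPrice
        "Monetary": grouped["total"].sum(),
    })


toy = pd.DataFrame({
    "CustomerID": [1, 1, 2],
    "InvoiceNo": ["A1", "A2", "B1"],
    "InvoiceDate": pd.to_datetime(["2011-11-01", "2011-12-09", "2011-06-30"]),
    "Quantity": [2, 1, 10],
    "UnitPrice": [5.0, 3.0, 1.5],
})
rfm = rfm_table(toy, pd.Timestamp("2012-01-01"))
print(rfm)
```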


📚 Clustering

Before applying K-means clustering, I addressed data skewness.

  • 📊 RFM Distribution
    Screenshot (521)

    The data was right-skewed (a long tail of large values), so I applied a log transformation:

  • 🔄 After Log Transformation
    Screenshot (522)
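A minimal sketch of the log transformation followed by standardization, producing features like the `scaled` input used in the K-means loop. It assumes an RFM DataFrame with `Recency`, `Frequency` and `Monetary` columns and uses scikit-learn's `StandardScaler`; the function name `log_scale` and the toy values are hypothetical. Dropping customers with non-positive values is an assumption made here because the logarithm is undefined for them (cancellations in this dataset can make net spend negative).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

COLS = ["Recency", "Frequency", "Monetary"]


def log_scale(rfm: pd.DataFrame) -> pd.DataFrame:
    """Log-transform the RFM columns, then standardize each to mean 0, std 1."""
    # Assumption: keep only customers whose three values are all > 0 (log needs it)
    positive = rfm[(rfm[COLS] > 0).all(axis=1)]
    logged = np.log(positive[COLS])
    # Keep the customer index so cluster labels can be mapped back later
    return pd.DataFrame(
        StandardScaler().fit_transform(logged),
        index=logged.index,
        columns=COLS,
    )


rfm = pd.DataFrame({
    "Recency": [23, 185, 60, 10],
    "Frequency": [40, 1, 5, 3],
    "Monetary": [1200.0, 15.0, -20.0, 300.0],
}, index=[1, 2, 3, 4])
scaled = log_scale(rfm)
print(scaled.round(2))  # customer 3 is dropped (negative Monetary)
```

Returning a DataFrame rather than a bare array keeps track of which customers survived the filter; `KMeans.fit` accepts either.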

🤖 Implementing K-means

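from sklearn.cluster import KMeans  

# Elbow method: fit K-means for k = 1..10 on `scaled` (the log-transformed,  
# standardized RFM features) and record inertia, the within-cluster sum of squares  
inertia = []  
for i in range(1, 11):  
    kmeans = KMeans(n_clusters=i, n_init=10, random_state=42)  # fixed seed for reproducibility  
    kmeans.fit(scaled)  
    inertia.append(kmeans.inertia_)  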
  • 📉 Elbow Curve
    Screenshot (523)

Based on the elbow in the curve, I chose 3 clusters.
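A sketch of fitting the final 3-cluster model and profiling each cluster by its mean RFM values, which is how the clusters can be matched to the segments listed below. It assumes `rfm` holds the unscaled values and `scaled` the transformed features in the same row order; `fit_segments`, `profile`, the seed and the toy customers are hypothetical. Setting `n_init=10` and a fixed `random_state` keeps results reproducible across scikit-learn versions whose default `n_init` differs.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def fit_segments(rfm: pd.DataFrame, scaled, k: int = 3, seed: int = 42) -> pd.DataFrame:
    """Fit K-means on the scaled features and attach a Cluster label to each customer."""
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(scaled)
    # rfm rows must be in the same order as the rows of `scaled`
    return rfm.assign(Cluster=labels)


def profile(segmented: pd.DataFrame) -> pd.DataFrame:
    """Mean Recency/Frequency/Monetary and customer count per cluster."""
    summary = segmented.groupby("Cluster")[["Recency", "Frequency", "Monetary"]].mean()
    summary["Count"] = segmented.groupby("Cluster").size()
    return summary


# Toy customers in three obvious groups (recent/frequent, middling, lapsed)
rfm = pd.DataFrame({
    "Recency": [5, 8, 6, 60, 70, 65, 300, 320, 310],
    "Frequency": [200, 180, 220, 30, 25, 35, 2, 1, 3],
    "Monetary": [5000.0, 4500.0, 5200.0, 600.0, 550.0, 650.0, 40.0, 30.0, 50.0],
})
scaled = StandardScaler().fit_transform(np.log(rfm))
segmented = fit_segments(rfm, scaled)
print(profile(segmented))
```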

🏷️ Customer Segments

  1. 🌟 Best Customers: High-frequency, high-monetary value, recent transactions.
  2. ⚠️ At-Risk Customers: Long time since the last transaction, low spending.
  3. 💼 Average Customers: Regular transactions, moderate spending.

Screenshot (526)
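K-means cluster ids are arbitrary, so the names above have to be attached by inspecting each cluster's averages. The sketch below automates that with a simple rank-based score over a per-cluster profile of mean RFM values; the rule, the function name `name_segments` and the toy profile are hypothetical, a heuristic rather than the method used in the original analysis.

```python
import pandas as pd

NAMES = ["At-Risk Customers", "Average Customers", "Best Customers"]


def name_segments(profile: pd.DataFrame) -> dict:
    """Map cluster ids to segment names from their mean RFM values (k = 3 only)."""
    if len(profile) != len(NAMES):
        raise ValueError("expected exactly 3 clusters")
    # Higher score = more recent (lower Recency), more frequent, higher spend
    score = (
        (-profile["Recency"]).rank()
        + profile["Frequency"].rank()
        + profile["Monetary"].rank()
    )
    ordered = score.sort_values().index.tolist()  # lowest score first
    return dict(zip(ordered, NAMES))


toy_profile = pd.DataFrame({
    "Recency": [310.0, 6.3, 65.0],
    "Frequency": [2.0, 200.0, 30.0],
    "Monetary": [40.0, 4900.0, 600.0],
}, index=pd.Index([0, 1, 2], name="Cluster"))
print(name_segments(toy_profile))
```

Ranks are used instead of raw means so that the monetary column, which is on a much larger scale, does not dominate the score.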


💡 Suggestions and Cluster Interpretation

  1. ⚠️ At-Risk Customers:

    • Suggestion: Analyze why they left; offer sales or discounts to win them back.
  2. 💼 Average Customers:

    • Suggestion: Convert to best customers through discounts, excellent support, and targeted promotions.
  3. 🌟 Best Customers:

    • Suggestion: Focus advertising and product launches on this group. Heavy discounts aren’t needed.

Screenshot (528)


🚀 Future Work

  • Expand Analysis: Include new customer features, like demographics and lifetime value.
  • Dynamic Segmentation: Automate real-time segmentation for evolving behaviors.
  • Advanced Models: Explore hierarchical clustering or DBSCAN for complex relationships.
  • Predictive Insights: Use predictive analytics to forecast customer behavior and recommend proactive strategies.

This project sets a strong foundation for tailored customer engagement, paving the way for smarter, data-driven business decisions! 😊
