Customer Segmentation with RFM Analysis

Mehmet Nazlıay
5 min read · Mar 24, 2021

Hello Everyone,

I want to talk about Customer Segmentation with RFM Scores.

What is customer segmentation?

Customer segmentation is simply the grouping together of customers based on similarities they share with respect to any dimensions you deem relevant to your business. Dimensions could include customer needs, channel preferences, interest in specific product features, customer profitability — the list goes on.[1]

What are the business goals of segmentation?

  • Create segmented ads & marketing communications
  • Develop differentiated customer servicing
  • Target prospects with the greatest profit potential
  • Optimize your sales-channel mix

RFM Analysis

RFM is a method used for analyzing customer value. It is commonly used in database marketing and direct marketing and has received particular attention in retail and professional services industries.[2]

RFM stands for the three dimensions:

  • Recency — How recently did the customer purchase?
  • Frequency — How often do they purchase?
  • Monetary Value — How much do they spend?
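To make the three dimensions concrete, here is a minimal sketch on a made-up table of six invoices. The column names (customer, invoice_date, amount) and the reference date are assumptions chosen only for this illustration; they are not taken from the dataset used below.

```python
import pandas as pd

# Hypothetical toy transactions: one row per invoice
toy = pd.DataFrame({
    "customer": ["A", "A", "B", "C", "C", "C"],
    "invoice_date": pd.to_datetime([
        "2021-01-05", "2021-03-01", "2020-11-20",
        "2021-02-10", "2021-02-25", "2021-03-10",
    ]),
    "amount": [50.0, 20.0, 200.0, 10.0, 15.0, 30.0],
})

# Assumed reference date that plays the role of "today"
snapshot = pd.Timestamp("2021-03-15")

toy_rfm = toy.groupby("customer").agg(
    # Recency: days between the reference date and the last purchase
    recency=("invoice_date", lambda d: (snapshot - d.max()).days),
    # Frequency: number of invoices
    frequency=("invoice_date", "count"),
    # Monetary: total amount spent
    monetary=("amount", "sum"),
)
print(toy_rfm)
```

Customer C bought most recently and most often, while customer B spent the most in a single, older purchase. This is why the three dimensions are scored separately.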

The Dataset

We will analyze the Online Retail II dataset.

For the analysis, I will use retail data covering 5875 customers over a period of two years.

Understand the dataset:

Variables:

InvoiceNo: Invoice number. Nominal. A 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter ‘c’, it indicates a cancellation.
StockCode: Product (item) code. Nominal. A 5-digit integral number uniquely assigned to each distinct product.
Description: Product (item) name. Nominal.
Quantity: The quantities of each product (item) per transaction. Numeric.
InvoiceDate: Invoice date and time. Numeric. The day and time when a transaction was generated.
UnitPrice: Unit price. Numeric. Product price per unit in sterling (£).
CustomerID: Customer number. Nominal. A 5-digit integral number uniquely assigned to each customer.
Country: Country name. Nominal. The name of the country where a customer resides.

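Let's start by importing the data. In the CSV file used here, the InvoiceNo, UnitPrice, and CustomerID columns are named Invoice, Price, and Customer ID.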

import numpy as np
import pandas as pd
df = pd.read_csv("../input/online-retail-ii-uci/online_retail_II.csv")
df.head()
df.shape

Out: (1067371, 8)

df["TotalPrice"] = df["Quantity"] * df["Price"]

'TotalPrice' is the value of each invoice line (Quantity times Price), added as a new column. Later we will sum it per customer.

Cleaning the Dataset:

We will drop the rows that have no Customer ID.

df.dropna(subset=["Customer ID"], inplace=True)
df.shape

Out: (824364, 8)

  • We will drop orders that were placed by test accounts.
# Find the 'Customer ID' of every test account
test_account_id = df[df["StockCode"].str.contains("TEST", regex=False)]["Customer ID"].astype("int")
# Keep each test 'Customer ID' only once
test_account_id_list = list(set(test_account_id))
test_account_id_list
# Drop all orders placed by the test accounts
for i in test_account_id_list:
    delete_invoice = df[df["Customer ID"] == i].index
    df.drop(delete_invoice, inplace=True)
  • We will drop orders that were canceled.
# Canceled invoices contain the letter "C"
order_cancel = df[df["Invoice"].str.contains("C", regex=False)].index
df.drop(order_cancel, inplace=True)
  • We will drop postage lines.
post = df[df["StockCode"].str.contains("POST", regex=False)].index
df.drop(post, inplace=True)
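The same three cleaning rules can also be written as one boolean mask instead of repeated drop calls. The sketch below is one possible way to do it, not the code from my notebook: the helper name clean_transactions and the five toy rows are made up for illustration, and only the column names come from the dataset.

```python
import pandas as pd

def clean_transactions(data):
    """Sketch: drop test-account orders, cancellations, and postage lines."""
    data = data.dropna(subset=["Customer ID"])
    stock = data["StockCode"].astype(str)
    invoice = data["Invoice"].astype(str)
    # Customers with at least one line whose stock code contains "TEST"
    test_ids = data.loc[stock.str.contains("TEST", regex=False), "Customer ID"].unique()
    keep = (
        ~data["Customer ID"].isin(test_ids)          # not a test account
        & ~invoice.str.contains("C", regex=False)    # not a cancellation
        & ~stock.str.contains("POST", regex=False)   # not a postage line
    )
    return data[keep]

# Hypothetical rows, invented only to exercise each rule once
toy = pd.DataFrame({
    "Invoice": ["489434", "C489449", "489450", "489451", "489452"],
    "StockCode": ["85048", "22087", "TEST001", "POST", "21232"],
    "Customer ID": [13085.0, 16321.0, 12346.0, 12682.0, 12346.0],
})
cleaned = clean_transactions(toy)
print(cleaned)
```

Only the first row survives: the second is a cancellation, the fourth is postage, and the third and fifth belong to the customer who placed a test order.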

Calculate RFM Scores

Recency:

To compute recency, we need today's date and each customer's last purchase date. Because the invoices in the dataset are old, we will use the latest 'InvoiceDate' in the dataset as today's date.

We will find each customer's last 'InvoiceDate' and subtract it from today's date.

#Recency
# InvoiceDate is read from the CSV as text, so convert it to datetime first
df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])
# Use the latest invoice date in the dataset as today's date
today_date = df["InvoiceDate"].max()
recency_df = df.groupby("Customer ID").agg({"InvoiceDate": lambda x: (today_date - x.max()).days})
recency_df.rename(columns={"InvoiceDate": "Recency"}, inplace=True)
recency_df.head()

Frequency:

We will find how much each customer purchased. This is our frequency value. We will group the invoices of every customer.

# Count the rows (invoice lines) in each invoice of each customer
temp_df = df.groupby(["Customer ID", "Invoice"]).agg({"Invoice": "count"})
temp_df.head()
# Add up those counts per customer
freq_df = temp_df.groupby("Customer ID").agg({"Invoice": "sum"})
freq_df.rename(columns={"Invoice": "Frequency"}, inplace=True)
freq_df.head()

We grouped by customer and invoice, counted the rows in each invoice, and then summed those counts for every customer. Note that the resulting Frequency is the number of invoice lines per customer, not the number of distinct invoices. We assigned the result to a new data frame, "freq_df".
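The aggregation above counts rows per invoice and sums them, so an invoice with several products adds several units to Frequency. If you would rather count each invoice once, a minimal sketch of that alternative uses nunique. The toy data below is invented for the comparison, and the Lines and Invoices column names are illustrative.

```python
import pandas as pd

# Hypothetical data: customer 1 has two invoices (three lines in total),
# customer 2 has one invoice with two lines
toy = pd.DataFrame({
    "Customer ID": [1, 1, 1, 2, 2],
    "Invoice": ["A1", "A1", "A2", "B1", "B1"],
})

# Number of invoice lines per customer (what count followed by sum produces)
line_count = toy.groupby("Customer ID")["Invoice"].count()
# Number of distinct invoices per customer
invoice_count = toy.groupby("Customer ID")["Invoice"].nunique()

comparison = pd.DataFrame({"Lines": line_count, "Invoices": invoice_count})
print(comparison)
```

Which definition is better depends on the business question. The segment averages reported later in this post are based on the line count.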

Monetary:

Monetary is the total spending of the customer. We will group by 'Customer ID' and sum each customer's 'TotalPrice'.

monetary_df = df.groupby("Customer ID").agg({"TotalPrice": "sum"})
monetary_df.rename(columns={"TotalPrice": "Monetary"}, inplace=True)
monetary_df.head()

Score Values

We will combine recency_df, freq_df, and monetary_df into a new data frame called rfm.

rfm = pd.concat([recency_df, freq_df, monetary_df], axis=1)
rfm.head()

We will score them from 1 to 5. The best score is 5.

# A low Recency is good, so its labels run from 5 down to 1
rfm["RecencyScore"] = pd.qcut(rfm["Recency"], 5, labels=[5, 4, 3, 2, 1])
rfm["FrequencyScore"] = pd.qcut(rfm["Frequency"], 5, labels=[1, 2, 3, 4, 5])
rfm["MonetaryScore"] = pd.qcut(rfm["Monetary"], 5, labels=[1, 2, 3, 4, 5])
rfm["RFM_Score"] = rfm["RecencyScore"].astype(str) + rfm["FrequencyScore"].astype(str) + rfm["MonetaryScore"].astype(str)
rfm.head()

A new column, RFM_Score, was added by concatenating the three scores. The best customers have an RFM_Score of '555'.
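pd.qcut splits a column into equally sized groups, which is what turns raw values into scores from 1 to 5. One thing to watch for: when many customers share the same value, the quantile edges can coincide and qcut raises a ValueError. A common workaround, shown here as a sketch on made-up numbers rather than as part of my notebook, is to rank the values first.

```python
import pandas as pd

# Hypothetical values for ten customers
toy = pd.DataFrame({
    "Recency": [3, 10, 25, 60, 120, 200, 1, 45, 90, 300],
    "Frequency": [1, 1, 1, 2, 2, 3, 5, 8, 13, 21],
})

# A low Recency is good, so the labels are reversed
toy["RecencyScore"] = pd.qcut(toy["Recency"], 5, labels=[5, 4, 3, 2, 1])

# Frequency has repeated values, so rank first to break ties;
# method="first" ranks equal values in their order of appearance
toy["FrequencyScore"] = pd.qcut(
    toy["Frequency"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5]
)
print(toy)
```

With ranking, customers who have the same Frequency can land in neighbouring score groups, so treat this as a trade-off rather than a free fix.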

The Customers Segments

We will segment the customers using the Recency Score and the Frequency Score with regex (regular expressions). In each pattern, the first character matches the Recency Score and the second matches the Frequency Score.

seg_map = {
    r"[1-2][1-2]": "Hibernating",
    r"[1-2][3-4]": "At Risk",
    r"[1-2]5": "Can't Lose",
    r"3[1-2]": "About to Sleep",
    r"33": "Need Attention",
    r"[3-4][4-5]": "Loyal Customers",
    r"41": "Promising",
    r"51": "New Customers",
    r"[4-5][2-3]": "Potential Loyalists",
    r"5[4-5]": "Champions",
}
rfm["Segment"] = rfm["RecencyScore"].astype(str) + rfm["FrequencyScore"].astype(str)
rfm["Segment"] = rfm["Segment"].replace(seg_map, regex=True)
rfm.head()
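To see how the regex mapping behaves, here is a small sketch that applies a subset of the patterns to a few two-digit codes. The codes are made up, and the map is shortened on purpose so that one code has no matching pattern. Note that the ranges inside the brackets must use a plain hyphen.

```python
import pandas as pd

# A subset of the segment patterns: first digit = Recency Score,
# second digit = Frequency Score
partial_map = {
    r"[1-2][1-2]": "Hibernating",
    r"[1-2][3-4]": "At Risk",
    r"33": "Need Attention",
    r"[3-4][4-5]": "Loyal Customers",
    r"51": "New Customers",
    r"5[4-5]": "Champions",
}

# Hypothetical score codes; "41" is not covered by the subset above
codes = pd.Series(["55", "12", "33", "51", "14", "34", "41"])
segments = codes.replace(partial_map, regex=True)
print(pd.DataFrame({"code": codes, "segment": segments}))
```

A code that matches no pattern is left unchanged, so any two-digit value remaining in the Segment column is a sign that the map has a gap.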

Next, we will plot the number of customers in each segment.

import matplotlib.pyplot as plt
import seaborn as sns
rfm = rfm.reset_index()
# Count the unique customers in each segment, largest segment first
sq1 = rfm.groupby("Segment")["Customer ID"].nunique().sort_values(ascending=False).reset_index()
plt.figure(figsize=(14, 8))
# Leave the largest segment (row 0) out of the chart
sq1.drop([0], inplace=True)
sns.barplot(data=sq1, x="Segment", y="Customer ID", palette="Greens_d");
# Export the Customer IDs of the Champions segment
champions = pd.DataFrame()
champions["Customer_ID"] = rfm.loc[rfm["Segment"] == "Champions", "Customer ID"].values
champions.to_csv("champions.csv")

The last lines of the code export the customers in the Champions segment to a CSV file.

Comments

rfm[["Segment", "Recency", "Frequency", "Monetary"]].groupby("Segment").agg(["mean", "count"])

New Customers:

We have 81 customers in the New Customers segment. On average, their last purchase was 10.54 days ago, their Frequency is 8.82, and they have spent £705.

We can send them gift coupons to attract their attention.

At Risk:

We have 800 customers in the At Risk segment. On average, their last purchase was 394.45 days ago, their Frequency is 74.09, and they have spent £1306.

We can prepare a special campaign for them and announce it by email or message.

Need Attention:

We have 106 customers in the Need Attention segment. On average, their last purchase was 106.84 days ago, their Frequency is 55.44, and they have spent £1400.

Customers in this segment are close to moving into the At Risk segment. We can offer discounts on the products they buy most often.

You can view the full notebook here: https://www.kaggle.com/mhmtnzly/rfm-customer-segmentation

References

  1. http://www.mindofmarketing.net/2012/05/what-is-customer-segmentation/
  2. Wikipedia: https://en.wikipedia.org/wiki/RFM_(market_research)
