Python RFM Model for Customer Segmentation
RFM is a method for segmenting customers and analyzing customer lifetime value.
It is used in direct-to-consumer marketing, retail stores, e-commerce companies and professional service companies to gauge engagement strength and the likelihood that customers will stay with the company for the long term.
You’d be surprised to see how much revenue your returning customers generate. The RFM model can be your ally in the retention game.
What is RFM?
R -> Recency: How recently the customer was active, be it purchases or visits
F -> Frequency: How often the customer transacts or visits
M -> Monetary: How much the customer spends, a proxy for the customer's purchasing power
What are the benefits of the RFM Model?
It helps you answer the following questions:
Who are our top customers?
Which customers contribute to churn?
Who are our potential and valuable customers, and what is their lifetime value?
Which customers can be retained?
Which customers are likely to respond to engagement campaigns?
We will use simple Python programming to segment our customers.
We can group these parameters by:
Percentiles or quantiles
Pareto rule (80/20)
Business acumen
Let’s use the percentile grouping for our approach and get on with some Python code.
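For comparison, the Pareto (80/20) option from the list above can be sketched roughly as follows. This is only an illustration on made-up numbers, not part of the walkthrough: the `spend` values, the 0.8 cut-off and the `pareto_group` column name are all assumptions.

```python
import pandas as pd

# Hypothetical per-customer spend; the values are made up for illustration.
spend = pd.Series(
    [500.0, 250.0, 120.0, 60.0, 40.0, 30.0],
    index=["c1", "c2", "c3", "c4", "c5", "c6"],
    name="MonetaryValue",
)

# Sort customers from highest to lowest spend and compute each customer's
# cumulative share of total revenue.
ranked = spend.sort_values(ascending=False)
cum_share = ranked.cumsum() / ranked.sum()

# Customers whose cumulative share stays within 80% form the "top" group;
# everyone else falls into the "tail".
pareto = pd.DataFrame({"MonetaryValue": ranked, "CumShare": cum_share})
pareto["pareto_group"] = (pareto["CumShare"] <= 0.8).map({True: "top", False: "tail"})
print(pareto)
```

Unlike quartiles, the group sizes here depend on how concentrated the revenue is, so the "top" group can be much smaller than a quarter of the customer base.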
Shopping data set link for reference: Customer Segmentation | Kaggle
# Import libraries
import pandas as pd
from datetime import timedelta
import matplotlib.pyplot as plt
import squarify
import seaborn as sns

# Read dataset
online = pd.read_csv('data.csv', encoding="ISO-8859-1")
online['InvoiceDate'] = pd.to_datetime(online['InvoiceDate'])

# Drop NA values from online (dropna returns a new DataFrame, so reassign it)
online = online.dropna()

# Create TotalSum column for online dataset
online['TotalSum'] = online['Quantity'] * online['UnitPrice']

# Create snapshot date: one day after the most recent invoice
snapshot_date = online['InvoiceDate'].max() + timedelta(days=1)
print(snapshot_date)

# Grouping by CustomerID
data_process = online.groupby(['CustomerID']).agg({
    'InvoiceDate': lambda x: (snapshot_date - x.max()).days,  # days since last purchase
    'InvoiceNo': 'count',                                     # number of invoice lines
    'TotalSum': 'sum'                                         # total spend
})

# Rename the columns
data_process.rename(columns={
    'InvoiceDate': 'Recency',
    'InvoiceNo': 'Frequency',
    'TotalSum': 'MonetaryValue'
}, inplace=True)
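To see what this aggregation produces without downloading the dataset, here is a small sketch on a made-up transaction table. The column names mirror the dataset above, but the rows, `toy_snapshot` and `toy_rfm` are hypothetical. It also shows `nunique` next to `count`, because `count` tallies invoice lines while `nunique` tallies distinct invoices; which one you call "Frequency" is a modelling choice.

```python
import pandas as pd
from datetime import timedelta

# Made-up transactions; column names mirror the dataset used above.
toy = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 2],
    "InvoiceNo": ["A1", "A1", "A2", "B1", "B1"],
    "InvoiceDate": pd.to_datetime(
        ["2011-12-01", "2011-12-01", "2011-12-08", "2011-11-01", "2011-11-01"]
    ),
    "Quantity": [2, 1, 3, 5, 1],
    "UnitPrice": [10.0, 5.0, 2.0, 1.0, 4.0],
})
toy["TotalSum"] = toy["Quantity"] * toy["UnitPrice"]

# Snapshot date: one day after the most recent transaction.
toy_snapshot = toy["InvoiceDate"].max() + timedelta(days=1)

toy_rfm = toy.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda x: (toy_snapshot - x.max()).days),
    Lines=("InvoiceNo", "count"),       # rows (invoice lines), as in the code above
    Invoices=("InvoiceNo", "nunique"),  # distinct invoices, an alternative definition
    MonetaryValue=("TotalSum", "sum"),
)
print(toy_rfm)
```

Customer 1 has three invoice lines but only two invoices, which is the gap between the two frequency definitions.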
# Plot RFM distributions
# (sns.distplot is deprecated in recent seaborn releases; histplot with kde=True is the closest replacement)
plt.figure(figsize=(12, 10))

# Plot distribution of R
plt.subplot(3, 1, 1)
sns.histplot(data_process['Recency'], kde=True)

# Plot distribution of F
plt.subplot(3, 1, 2)
sns.histplot(data_process['Frequency'], kde=True)

# Plot distribution of M
plt.subplot(3, 1, 3)
sns.histplot(data_process['MonetaryValue'], kde=True)

# Show the plot
plt.show()
# Calculate Recency (R) and Frequency (F) groups
# Create labels for Recency and Frequency
# (Recency labels are reversed: the most recent customers get the highest score)
r_labels = range(4, 0, -1)
f_labels = range(1, 5)

# Assign these labels to 4 equal percentile groups
r_groups = pd.qcut(data_process['Recency'], q=4, labels=r_labels)

# Assign these labels to 4 equal percentile groups
f_groups = pd.qcut(data_process['Frequency'], q=4, labels=f_labels)

# Create new columns R and F
data_process = data_process.assign(R=r_groups.values, F=f_groups.values)
data_process.head()

# Create labels for MonetaryValue
m_labels = range(1, 5)

# Assign these labels to four equal percentile groups
m_groups = pd.qcut(data_process['MonetaryValue'], q=4, labels=m_labels)

# Create new column M
data_process = data_process.assign(M=m_groups.values)
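The quartile scoring can be checked on a handful of made-up values. The sketch below shows how the reversed labels give the most recent customers the highest R score. It also shows one common workaround, an assumption on my part rather than something the walkthrough uses, for the `ValueError` that `pd.qcut` raises when many tied values make the quartile edges collide.

```python
import pandas as pd

# Made-up recency values in days (lower means more recent).
recency = pd.Series([1, 3, 7, 14, 30, 60, 120, 365])

# Reversed labels: the most recent quartile gets the highest score (4).
r_score = pd.qcut(recency, q=4, labels=[4, 3, 2, 1])
print(r_score.tolist())

# Heavily tied values (common for Frequency) can make quartile edges collide,
# in which case pd.qcut raises a ValueError. Ranking first breaks the ties.
frequency = pd.Series([1, 1, 1, 1, 1, 1, 2, 9])
f_score = pd.qcut(frequency.rank(method="first"), q=4, labels=[1, 2, 3, 4])
print(f_score.tolist())
```

Note that breaking ties by rank means customers with identical frequency can land in different groups, so this trades strict fairness for equal-sized bins.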
# Concat RFM quartile values to create RFM Segments
def join_rfm(x):
    return str(x['R']) + str(x['F']) + str(x['M'])

data_process['RFM_Segment_Concat'] = data_process.apply(join_rfm, axis=1)
rfm = data_process
rfm.head()

# Count num of unique segments
rfm_count_unique = rfm['RFM_Segment_Concat'].nunique()
print(rfm_count_unique)

# Calculate RFM_Score
# (R, F and M are categorical columns, so cast them to int before summing)
rfm['RFM_Score'] = rfm[['R', 'F', 'M']].astype(int).sum(axis=1)
print(rfm['RFM_Score'].head())
# Define rfm_level function
def rfm_level(df):
    if df['RFM_Score'] >= 9:
        return 'Can\'t Loose Them'
    elif ((df['RFM_Score'] >= 8) and (df['RFM_Score'] < 9)):
        return 'Champions'
    elif ((df['RFM_Score'] >= 7
Potential: high potential to join our loyal customer segments. You can take appropriate steps such as giving them extra discounts or freebies, or making them eligible for an elite customer club!
Promising: showing promising signs in the quantity and value of their purchases, but it has been a while since they last bought something from you. You can target them based on their wishlists with limited-time discounts or bundled offers.
Needs Attention: made some initial purchases but have not been seen since. Was it a bad customer experience? Or a product-market fit problem? Let’s spend some resources building our brand awareness with them.
Require Activation: the poorest performers in our RFM model. They might have gone to our competitors for now and will require a different activation strategy to win them back.
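The step from scores to a per-segment summary table can be sketched on toy data as below. Treat it as an illustration under stated assumptions rather than the original code: the segment names come from the treemap labels used in this walkthrough and the 9 and 8 cut-offs from the function above, but every cut-off below 8, the `rfm_level_sketch` helper, the `RFM_Level` column name and the toy values are hypothetical.

```python
import pandas as pd

# Made-up RFM table standing in for the `rfm` DataFrame.
toy_scores = pd.DataFrame({
    "Recency": [2, 10, 40, 90, 200, 300, 360],
    "Frequency": [50, 30, 20, 10, 5, 2, 1],
    "MonetaryValue": [900.0, 500.0, 300.0, 150.0, 80.0, 40.0, 10.0],
    "RFM_Score": [12, 8, 7, 6, 5, 4, 3],
})

def rfm_level_sketch(score):
    """Map an RFM score (3-12) to a segment name."""
    if score >= 9:
        return "Can't Loose Them"
    elif score >= 8:
        return "Champions"
    # The cut-offs from here down are assumptions made for this sketch.
    elif score >= 7:
        return "Loyal"
    elif score >= 6:
        return "Potential"
    elif score >= 5:
        return "Promising"
    elif score >= 4:
        return "Needs Attention"
    return "Require Activation"

toy_scores["RFM_Level"] = toy_scores["RFM_Score"].apply(rfm_level_sketch)

# Mean R, F and M per segment, plus the number of customers in each segment.
toy_level_agg = toy_scores.groupby("RFM_Level").agg({
    "Recency": "mean",
    "Frequency": "mean",
    "MonetaryValue": ["mean", "count"],
}).round(1)

# Flatten the two-level column index into plain names.
toy_level_agg.columns = toy_level_agg.columns.droplevel()
toy_level_agg.columns = ["RecencyMean", "FrequencyMean", "MonetaryMean", "Count"]
print(toy_level_agg)
```

Because `groupby` sorts its keys, the rows come out in alphabetical order of segment name, which matters when labels are supplied by hand to a plot.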
# Flatten the multi-level column index and give the columns readable names
rfm_level_agg.columns = rfm_level_agg.columns.droplevel()
rfm_level_agg.columns = ['RecencyMean', 'FrequencyMean', 'MonetaryMean', 'Count']
# Create our plot and resize it.
fig = plt.gcf()
ax = fig.add_subplot()
fig.set_size_inches(16, 9)

# Draw a treemap in which each segment's area reflects its customer count
squarify.plot(sizes=rfm_level_agg['Count'],
              label=['Can\'t Loose Them',
                     'Champions',
                     'Loyal',
                     'Needs Attention',
                     'Potential',
                     'Promising',
                     'Require Activation'],
              alpha=.6)
plt.title("RFM Segments", fontsize=18, fontweight="bold")
plt.axis('off')
plt.show()
RFM analysis surfaces patterns and anomalies that tell you which customer segments to prioritize and help you form appropriate value-offering strategies. You can also add extra coefficients suited to your business, products, relative seasonal index, etc., and change them as you need.
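One way to read "extra coefficients" is as a weighted RFM score. The sketch below is an assumption about how that could look, not a prescribed method: the `weights` values are made up, and the toy `scores` table stands in for the R, F and M columns built earlier.

```python
import pandas as pd

# Made-up quartile scores (1-4) for three customers.
scores = pd.DataFrame(
    {"R": [4, 2, 1], "F": [3, 4, 1], "M": [4, 2, 2]},
    index=["c1", "c2", "c3"],
)

# Hypothetical weights: this business is assumed to value recency most.
weights = {"R": 0.5, "F": 0.3, "M": 0.2}

# Weighted sum of the three scores for each customer.
scores["WeightedScore"] = sum(scores[col] * w for col, w in weights.items())
print(scores)
```

With weights that sum to 1 the result stays on the same 1 to 4 scale as the individual scores, which keeps it easy to compare against the unweighted quartiles.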