Ryan Ashton Portfolio

Segmenting Customers with K-Means Clustering

Background

Not all customers spend equally. With the right data, we can check whether customers with similar characteristics form clusters with distinct spending behaviours. The dataset used here is fictitious: I generated it, taking inspiration from publicly available datasets. I have used this type of clustering in my work, but I will not be showcasing work from clients or previous employers.

Performing Exploratory Data Analysis

To first understand our dataset, we can use exploratory data analysis (EDA) techniques to review what data we have available for clustering.
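A typical first EDA pass can be sketched as below. This is a minimal illustration, not the author's code: the DataFrame is filled with random placeholder values, and the column names (`gender`, `age`, `annual_income`, `spending_score`) and value ranges are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the customer dataset: random values, NOT the author's data.
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "gender": rng.choice(["Female", "Male"], size=n),
    "age": rng.integers(18, 71, size=n),             # ages 18-70
    "annual_income": rng.integers(15, 141, size=n),  # assumed unit: thousands per year
    "spending_score": rng.integers(1, 101, size=n),  # assumed 1-100 scale
})

# Structure: column types, non-null counts and missing values per column.
df.info()
print(df.isna().sum())

# Summary statistics for the numeric features that are candidates for clustering.
summary = df[["age", "annual_income", "spending_score"]].describe()
print(summary)

# Category balance for the one non-numeric column.
print(df["gender"].value_counts())
```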

How does Gender play a role?

Let's determine whether spending differs between males and females across age groups.

It appears that females aged 30 to 40 are the largest spenders; spending drops between 40 and 50, only to rise again later in life.
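A comparison like this can be produced by binning age and averaging spending per gender within each band. The sketch below assumes hypothetical column names and decade-wide bands, and uses random data, so its numbers say nothing about the pattern described above.

```python
import numpy as np
import pandas as pd

# Hypothetical random data; column names are assumptions, not the author's.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "gender": rng.choice(["Female", "Male"], size=n),
    "age": rng.integers(18, 71, size=n),
    "spending_score": rng.integers(1, 101, size=n),
})

# Bin ages into bands; right=False makes each band [lower, upper), e.g. 30-39.
bins = [18, 30, 40, 50, 60, 71]
labels = ["18-29", "30-39", "40-49", "50-59", "60-70"]
df["age_band"] = pd.cut(df["age"], bins=bins, labels=labels, right=False)

# Mean spending score per gender within each age band (rows: bands, columns: gender).
spend_by_gender_age = (
    df.groupby(["age_band", "gender"], observed=False)["spending_score"]
    .mean()
    .unstack("gender")
)
print(spend_by_gender_age.round(1))
```

Plotting this table as one line per gender gives the kind of age-by-gender spending chart discussed above.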

Spend vs. Income

Does income affect the level of spending?

There doesn't appear to be a simple linear relationship between income and spending.
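One way to quantify the visual impression is Pearson's correlation coefficient. The sketch uses random placeholder arrays in place of the real income and spending columns. Note that r only measures linear association: a value near zero does not rule out structure such as distinct customer groups, which is what clustering looks for.

```python
import numpy as np

# Hypothetical random values standing in for the real columns.
rng = np.random.default_rng(1)
annual_income = rng.integers(15, 141, size=200).astype(float)
spending_score = rng.integers(1, 101, size=200).astype(float)

# Pearson's r: values near +1 or -1 mean a strong straight-line relationship,
# values near 0 mean no linear trend.
r = np.corrcoef(annual_income, spending_score)[0, 1]
print(f"Pearson r = {r:.2f}")
```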

Elbow Method - 2D

To decide how many clusters (k) the algorithm should use, we apply the elbow method. We compute the Within-Cluster Sum of Squares (WCSS) for a range of k values and look for the point at which its rate of decrease sharply diminishes: the "elbow". This point suggests a balance between minimising within-cluster variance and avoiding overfitting.
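For reference, the standard definition of WCSS for k clusters C_1, ..., C_k with centroids μ_1, ..., μ_k (the mean of the points in each cluster) is:

$$
\mathrm{WCSS}(k) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2
$$

Adding clusters generally lowers WCSS (it reaches zero when every customer is its own cluster), so the elbow method looks for the k beyond which extra clusters give little further reduction, rather than for the minimum.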

We use only two features here: spending score and annual income.

We will use 4 clusters to segment customers by spending score and annual income.
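The elbow curve for these two features can be computed as in the sketch below. It assumes scikit-learn's `KMeans`, which exposes WCSS as the `inertia_` attribute; the data is random and only stands in for the real income and spending columns.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical random 2-feature data: column 0 = annual income, column 1 = spending score.
rng = np.random.default_rng(2)
X = np.column_stack([
    rng.integers(15, 141, size=200),
    rng.integers(1, 101, size=200),
]).astype(float)

# Fit k-means for k = 1..10 and record WCSS for each k.
k_values = range(1, 11)
wcss = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(X)
    wcss.append(model.inertia_)

for k, w in zip(k_values, wcss):
    print(f"k={k:2d}  WCSS={w:,.0f}")
```

Plotting `wcss` against `k_values` (for example with `matplotlib.pyplot.plot(k_values, wcss, marker="o")`) gives the elbow chart; `n_init=10` reruns each fit from several starting centroids and keeps the best, which makes the curve less noisy.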

K-Means 2D

This presents 4 different customer types; in this case, we would need to customise our strategy for 4 different income tiers.
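Fitting the 4-cluster model and summarising each segment might look like the sketch below. It assumes scikit-learn's `KMeans` and hypothetical column names, on random data, so the resulting profile is illustrative only. A per-cluster table of size, mean income and mean spending is what a segment-specific strategy would be designed around.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical random data in place of the real customer table.
rng = np.random.default_rng(4)
df = pd.DataFrame({
    "annual_income": rng.integers(15, 141, size=200),
    "spending_score": rng.integers(1, 101, size=200),
})

# k = 4, as chosen from the elbow plot; fit_predict returns one cluster id per row.
features = ["annual_income", "spending_score"]
df["cluster"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(df[features])

# One row per segment: how many customers it holds and its average income and spending.
profile = df.groupby("cluster").agg(
    customers=("annual_income", "count"),
    mean_income=("annual_income", "mean"),
    mean_spending=("spending_score", "mean"),
)
print(profile.round(1))
```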

Elbow Method - 3D

Here we introduce a third dimension, age, and repeat the elbow method to determine how many clusters are required.
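The 3D elbow curve follows the same procedure with age added as a third column. In this sketch (random data, hypothetical column names, scikit-learn assumed) the features are used on their raw scales; since their ranges differ, a feature with a wider spread carries more weight in the distance calculation. Standardising the columns first (for example with scikit-learn's `StandardScaler`) is a common alternative; the original does not say which approach was used.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical random data with three features.
rng = np.random.default_rng(5)
df = pd.DataFrame({
    "age": rng.integers(18, 71, size=200),
    "annual_income": rng.integers(15, 141, size=200),
    "spending_score": rng.integers(1, 101, size=200),
})


def wcss_curve(data, k_max=10):
    """Return WCSS (scikit-learn's inertia_) for k = 1..k_max."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=0).fit(data).inertia_
        for k in range(1, k_max + 1)
    ]


# Same elbow procedure as the 2D case, now with age as a third feature.
wcss_3d = wcss_curve(df[["age", "annual_income", "spending_score"]])
for k, w in enumerate(wcss_3d, start=1):
    print(f"k={k:2d}  WCSS={w:,.0f}")
```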

K-Means 3D

Because we are clustering on the basis of 3 dimensions, we can visualise it with a 3-dimensional plot!
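A 3D scatter of the clustered customers can be drawn with matplotlib's `mplot3d` toolkit, colouring each point by its cluster id. The data is random, and `k = 5` is a hypothetical placeholder: the text does not state how many clusters the 3D elbow plot suggested.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to display the plot interactively
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (needed to register "3d" on older matplotlib)
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical random data: columns are age, annual income, spending score.
rng = np.random.default_rng(6)
X = np.column_stack([
    rng.integers(18, 71, size=200),
    rng.integers(15, 141, size=200),
    rng.integers(1, 101, size=200),
]).astype(float)

k = 5  # hypothetical: use whatever k the 3D elbow plot suggests
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

# One point per customer, coloured by cluster id.
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(projection="3d")
points = ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=labels, cmap="tab10", s=20)
ax.set_xlabel("Age")
ax.set_ylabel("Annual income")
ax.set_zlabel("Spending score")
ax.legend(*points.legend_elements(), title="Cluster")
# plt.show() or fig.savefig("kmeans_3d.png") to view or save the figure
```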

Conclusion

Whilst these visualisations are interesting to view, the end product is a dataset with a column indicating which cluster each customer belongs to. This dataset is then loaded into a CRM or marketing system, where the cluster label determines which marketing or sales strategy is applied to each customer. For example, the blue dots in the K-Means 3D chart indicate young, low-income, low spenders. It may not be worth encouraging this segment to spend more; instead, acquiring more customers of this type could increase overall revenue (e.g. "Tell a friend about our products!").
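Producing that dataset can be sketched as follows: attach the cluster id to each customer, map it to a campaign, and export the result. Everything here is hypothetical: the random data, the column names, `k = 5`, and the cluster-to-campaign mapping. K-means cluster ids are arbitrary, so in practice each id would be named only after inspecting its profile.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical random customer table.
rng = np.random.default_rng(7)
customers = pd.DataFrame({
    "customer_id": np.arange(1, 201),
    "age": rng.integers(18, 71, size=200),
    "annual_income": rng.integers(15, 141, size=200),
    "spending_score": rng.integers(1, 101, size=200),
})

# Attach a cluster id to every customer.
features = ["age", "annual_income", "spending_score"]
customers["cluster"] = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(
    customers[features]
)

# Hypothetical cluster-to-campaign mapping (e.g. "referral" for a segment worth growing).
campaigns = {0: "referral", 1: "loyalty", 2: "upsell", 3: "win-back", 4: "standard"}
customers["campaign"] = customers["cluster"].map(campaigns)

# The table handed to a CRM: one row per customer with its segment attached.
crm_export = customers[["customer_id", "cluster", "campaign"]]
csv_text = crm_export.to_csv(index=False)
print(csv_text.splitlines()[:3])
```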