Segmenting Customers with K-Means Clustering
Background
Not all customers spend equally, and with the right data we can determine whether customers with similar characteristics can be clustered and compared by their spending behaviour. This is a fictitious dataset I generated, inspired by publicly available datasets. I have used this type of clustering at work, but I will not be showcasing work from clients or previous employers.
Performing Exploratory Data Analysis
To understand our dataset first, we can use exploratory data analysis (EDA) techniques to review what data we have available for clustering.
How does Gender play a role?
Let's determine whether there is a spending difference between Males and Females by age.
It appears that Females aged between 30 and 40 are the largest spenders. Spending drops between 40 and 50, only to rise again later in life.
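A comparison like this comes down to grouping customers by gender and age band and averaging their spending score. Here is a minimal sketch of that grouping using only the standard library. The rows, the 10-year band width, and the function names are all assumptions for illustration, not the dataset or code behind the chart.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (gender, age, spending_score). Illustrative values only.
customers = [
    ("Female", 32, 78),
    ("Female", 37, 82),
    ("Female", 45, 40),
    ("Female", 63, 55),
    ("Male", 29, 66),
    ("Male", 34, 60),
    ("Male", 48, 42),
]


def age_band(age, width=10):
    """Return a label such as '30-39' for the band containing `age`."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def mean_spend_by_gender_and_age(rows):
    """Average spending score for each (gender, age band) pair."""
    groups = defaultdict(list)
    for gender, age, score in rows:
        groups[(gender, age_band(age))].append(score)
    return {key: mean(scores) for key, scores in groups.items()}


summary = mean_spend_by_gender_and_age(customers)
for (gender, band), average in sorted(summary.items()):
    print(f"{gender:<6} {band}: {average:.1f}")
```

With a real dataset the same idea is usually a one-line group-by in a dataframe library. The explicit version just makes the steps visible.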
Spend vs. Income
Does income affect the level of spending?
There doesn't appear to be a perfectly linear relationship between income and spending.
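One way to put a number on how linear a relationship is would be the Pearson correlation coefficient, where values near +1 or -1 indicate a strong linear relationship and values near 0 indicate a weak one. The sketch below computes it by hand. The income and spending values are made up for illustration and are not taken from the dataset.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_x * var_y)


# Hypothetical annual income (in thousands) and spending scores.
income = [15, 20, 40, 60, 80, 100]
spending = [40, 80, 50, 50, 20, 85]

r = pearson_r(income, spending)
print(f"Pearson r = {r:.2f}")
```

A low coefficient only rules out a straight-line relationship. Distinct groups of customers can still exist in the income and spending plane, which is what clustering looks for.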
Elbow Method - 2D
To determine how many clusters to set for the algorithm, we use the Elbow Method. This method plots the Within-Cluster Sum of Squares (WCSS) against the number of clusters, so we can see the point at which the rate of decrease in WCSS sharply diminishes. This point suggests a balance between minimising within-cluster variance and overfitting with too many clusters.
We use only 2 features here: spending score and annual income.
We will use 4 clusters to separate customers by spending score and annual income.
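To make the WCSS curve concrete, here is a minimal sketch that runs a plain K-Means (Lloyd's algorithm) for several values of k and records the WCSS for each. It uses only the standard library. The synthetic points and the farthest-point initialisation are my assumptions, chosen to keep the example deterministic. A library implementation such as scikit-learn's would normally be used instead, and it initialises centroids differently.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centroids(points, k):
    """Farthest-point initialisation (an assumption, used for determinism)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = init_centroids(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def wcss(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def make_blobs(centres, n_per=30, spread=5.0, seed=42):
    """Synthetic (income, spending score) points around the given centres."""
    rng = random.Random(seed)
    return [
        (rng.gauss(cx, spread), rng.gauss(cy, spread))
        for cx, cy in centres
        for _ in range(n_per)
    ]


# Hypothetical data with four well-separated groups.
points = make_blobs([(20, 20), (20, 80), (80, 20), (80, 80)])

curve = {k: wcss(points, kmeans(points, k)) for k in range(1, 9)}
for k, value in curve.items():
    print(f"k={k}: WCSS={value:.0f}")
```

On data like this the WCSS falls steeply up to the true number of groups and then flattens. That bend is the elbow. Real data rarely gives such a clean bend, so the choice of k usually involves some judgement.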
K-Means 2D
This presents 4 different customer types. In this case, we would need to customise our strategy for 4 different income tiers.
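Once a model has been fitted, each cluster can be summarised by its centroid, and a new customer can be assigned to the nearest one. The sketch below shows that step. The centroid values, the split thresholds, and the segment labels are hypothetical and are not the fitted values from this analysis.

```python
from math import dist

# Hypothetical centroids as (annual_income, spending_score).
centroids = {
    0: (25.0, 20.0),
    1: (25.0, 80.0),
    2: (85.0, 18.0),
    3: (85.0, 82.0),
}


def assign(customer, centroids):
    """Return the id of the centroid closest to `customer` (Euclidean distance)."""
    return min(centroids, key=lambda cid: dist(customer, centroids[cid]))


def describe(centroid, income_split=55.0, spend_split=50.0):
    """Label a centroid relative to assumed income and spending thresholds."""
    income, spend = centroid
    income_label = "high income" if income >= income_split else "low income"
    spend_label = "high spend" if spend >= spend_split else "low spend"
    return f"{income_label}, {spend_label}"


for cid, centre in centroids.items():
    print(f"cluster {cid}: {describe(centre)}")

new_customer = (90.0, 75.0)  # hypothetical (income, spending score)
segment = assign(new_customer, centroids)
print(f"new customer belongs to cluster {segment}")
```

Naming segments this way makes it easier to attach a strategy to each one, although the thresholds are a presentation choice and not something K-Means produces.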
Elbow Method - 3D
We introduce a third dimension here, 'Age', and repeat the Elbow Method to determine how many clusters are required.
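When features such as age, income and spending score sit on different scales, the one with the largest range can dominate the distances that K-Means relies on. A common precaution is to standardise each feature to zero mean and unit variance before clustering. Whether that was done for this analysis is not stated, so treat the following as a sketch under that assumption, with made-up rows.

```python
from statistics import mean, pstdev


def standardise(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stdevs = [pstdev(col) for col in columns]
    if any(s == 0 for s in stdevs):
        raise ValueError("cannot standardise a constant feature")
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stdevs))
        for row in rows
    ]


# Hypothetical rows: (age, annual_income, spending_score).
customers = [
    (23, 15000, 39),
    (35, 60000, 81),
    (47, 78000, 17),
    (58, 42000, 55),
    (31, 120000, 90),
]

scaled = standardise(customers)
for row in scaled:
    print(tuple(round(value, 2) for value in row))
```

After scaling, a one-unit difference means one standard deviation in every feature, so age contributes to the distance on the same footing as income.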
K-Means 3D
Because we are clustering on the basis of 3 dimensions, we can visualise it with a 3-dimensional plot!