August 12
Know your customers personally: A mathematical approach to segmentation
Back when I was a student at UMass Amherst, I would frequent a used clothing store near my home. I spent a lot of time and money there and enjoyed chatting with the owner, a friendly woman named Melissa. One day, I walked into the store, and Melissa immediately reached under the counter, pulled out a pair of shoes, and placed them in front of me. She said that as soon as she’d seen them, she’d known I’d want to buy them. And she was right. You can be right about your customers too, if you collect and use the same type of information that Melissa regularly learned about the visitors to her shop.
Melissa was able to make such an accurate prediction about my purchasing behavior because she had gathered a lot of data about me: behavioral data (my past purchases, and time I’d spent looking at items that I eventually didn’t buy), demographic data (she knew I was a student, that I was in my 20s, that I was from New York), and psychographic data (she knew I prefer small stores like hers to department stores, that I prefer to shop alone, and that I make decisions quickly). She had combined these data into a complex model of me as a shopper. She probably hadn’t done so consciously, but she had done so effectively. This allowed her to know enough about me to offer me a product that met my needs.
The same type of behavioral, demographic, and psychographic data that Melissa used to make predictions about what items I’d be likely to purchase can be used for segmentation, a statistical method of clustering your customers into groups that are likely to respond to similar offers. Melissa gathered the data through conversation and direct observation of her customers’ behavior, a reasonable task for the hundred or so regular customers she had. But if you’re serving an online population that numbers in the thousands, you’ll need to gather data more systematically: through web analytics, customer records, and – if you can recruit participants – self-report surveys. You may also be able to overlay demographic data available from panel companies or other database resources.
All of these variables combine to make a complex, multidimensional space in which your customers reside. This space can easily span variables of different types such as:
- Behavioral data – For example, how often the customer buys a new pair of shoes
- Demographic data – For example, the customer’s age
- Psychographic data – For example, how much the customer is influenced by reading others’ reviews of products
Each customer would occupy a unique point in this multidimensional space; the more data available, the higher the dimensionality. For example, if the only information you have is about how often the customer buys new shoes, you’d be looking at one-dimensional data (note: all data here are fictitious, and no conclusions should be drawn about real shoe shoppers based on these data):

I’ve indicated one customer in red. She’s bought three pairs of shoes in the past three months; we’ll track this customer as we add further dimensions.
If we add the information about the customer’s age, all the points along the number-of-shoes-bought line will fan out to fill a two-dimensional space:

Our red customer, we see, is 25 years old (and, as we already know, she’s bought three pairs of shoes).
We can add a third dimension to our picture by including the rating on the influence of reviews. This rating ranges only from 1 to 5, so we increase the distance between the numbers on that axis.

Unfortunately, it’s a little difficult to read a three-dimensional graph on a two-dimensional computer screen. Ideally, the third axis and each of the marks would be coming out toward you, the viewer, but we can’t show that, so you’ll just have to take my word that the red mark is out at 4, because she reports being moderately influenced by others’ reviews of products.
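Since the third axis is hard to draw, it may help to see the same idea as data. The Python sketch below stores each customer as a point with three coordinates and rescales every variable to run from 0 to 1. Min-max rescaling is an assumption of this sketch (one common way to keep ages in the tens from swamping a 1-to-5 rating), not necessarily how the graphs here were drawn. The `rescale` helper and every customer other than the red one are invented for illustration.

```python
# Each customer is a point: (pairs bought in 3 months, age, review influence 1-5).
# All values are fictitious, like the rest of the data in this post.
customers = [
    (3, 25, 4),   # the "red" customer from the text
    (5, 17, 1),
    (1, 45, 5),
    (0, 68, 2),
]

def rescale(points):
    """Min-max rescale every dimension to the range 0..1 (hypothetical helper)."""
    columns = list(zip(*points))          # one tuple per dimension
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        # A dimension with no spread carries no information, so map it to 0.0.
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(point, lows, highs))
        for point in points
    ]

scaled = rescale(customers)
print(scaled[0])  # the red customer's coordinates after rescaling
```

After rescaling, a one-step difference in the review rating counts for about as much as a difference of a dozen years in age, which is roughly what stretching the third axis does in the picture.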
We can’t even begin to draw the additional dimensions that would be added if we included more variables. However, the computer is capable of uniquely identifying each point in the multidimensional space. And the computer can see if there are clusters, groups of points that are similar to each other, and different from points not in the cluster. For example, in the data set above, we have
- a group of people in their teens and 20s who buy a lot of shoes and (though you can’t see this in the two-dimensional picture) say they are not influenced by reviews,
- a group of people in their 20s and 30s who buy a lot of shoes and are moderately influenced by reviews,
- a group of people in their 30s, 40s, and 50s who buy relatively few shoes and are highly influenced by reviews, and
- a group of people in their 60s and 70s who rarely buy shoes and are not particularly influenced by reviews.
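How might the computer find groups like these? One widely used approach is k-means clustering, sketched below in plain Python. This is an illustration under stated assumptions, not the method behind the pictures in this post: the twelve customers are made up (only the red one comes from the text), the number of clusters is fixed at four in advance, and the helper names `rescale`, `pick_start_centers`, and `kmeans` are hypothetical.

```python
import math

# Fictitious customers: (pairs bought in 3 months, age, review influence 1-5).
customers = [
    (5, 16, 1), (6, 19, 1), (5, 22, 1),
    (3, 25, 4), (4, 29, 3), (4, 33, 3),   # first entry is the "red" customer
    (1, 38, 5), (1, 45, 5), (2, 52, 5),
    (0, 63, 2), (1, 68, 2), (0, 74, 2),
]

def rescale(points):
    """Min-max rescale each dimension to 0..1 so no single variable dominates.
    Assumes every dimension has at least two distinct values."""
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    return [tuple((v - lo) / (hi - lo) for v, lo, hi in zip(p, lows, highs))
            for p in points]

def pick_start_centers(points, k):
    """Deterministic start: the first point, then repeatedly the point
    farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_rounds=100):
    """Return one cluster label (0..k-1) per point."""
    centers = pick_start_centers(points, k)
    labels = None
    for _ in range(max_rounds):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:      # nothing moved, so we have converged
            break
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels

labels = kmeans(rescale(customers), k=4)
print(labels)
```

With these made-up numbers, the four clusters line up with the four groups listed above, and the red customer shares a label with the other two customers in their 20s and 30s. Real data are rarely this tidy, and choosing the number of clusters is itself a judgment call.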
Our red dot would be in the second group. If we have additional information about our customers, we can either use it to make a more complex space for the segmentation, or we can use it to profile the different segments. For example, we might find out that the people in the first group are more influenced by store displays or engaging web sites, and have a stronger need to try out the shoes, than people in the second group. This would be useful information for anyone designing a web site intended to sell shoes to teens.
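Profiling a segment can be as simple as averaging a variable that was left out of the segmentation. The short Python sketch below does that for a hypothetical 1-to-5 score of how much store displays influence each customer. The segment names and the scores are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# (segment, store-display influence 1-5); all values are fictitious.
profiled = [
    ("teens and 20s", 5), ("teens and 20s", 4), ("teens and 20s", 3),
    ("20s and 30s", 2), ("20s and 30s", 3), ("20s and 30s", 1),
]

# Collect the scores for each segment.
scores_by_segment = defaultdict(list)
for segment, display_influence in profiled:
    scores_by_segment[segment].append(display_influence)

# Average score per segment: a one-line profile of each group.
profile = {segment: mean(scores)
           for segment, scores in scores_by_segment.items()}
print(profile)
```

A higher average for the first segment is the kind of pattern that would suggest putting more effort into visual presentation for that audience.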
Just as Melissa used her knowledge about me to present me with offers that I’d respond to (and to share jokes and stories she knew would interest me), a segmentation model can provide you with knowledge about your customers. You can use this knowledge to create a web presence (or a series of web presences) that speaks to your customers in a more personal way, increasing their connection to your brand and their value as customers.