October 10

Metric of the Month October: Factor Analysis

Hello and welcome to Molecular’s Metric of the Month. In this month’s post, we’ll be talking about factor analysis, a multivariate technique that is useful for seeing how variables are related to one another. We use a number of multivariate techniques at Molecular, most notably cluster analysis, a technique which identifies people who are similar to each other, and divides them into clusters, which form the basis for personas.
Both factors and clusters are created from quantitative data; at Molecular we generally use survey data, but multivariate analysis can be used with physical data (ranging from stars to animals to atoms) or virtual data (such as online behavior). Factor analysis allows us to reduce a large number of variables to a smaller number of factors. Understanding the factors underlying the observable variables can help you understand your customers and better meet their needs.

A factor is an unobservable, hypothetical variable that contributes to the variance of at least two of the observed variables. For example, we could posit a factor called Internet savvy and expect it to contribute to the way respondents answer questions like “I read blogs to learn about new products,” “I have uploaded videos to the Internet,” and “I would rather talk to my friends in person than in a chat room.” Note that the contribution can be negative: someone who rates high on Internet savvy would probably agree less with the statement “I would rather talk to my friends in person than in a chat room.”
There are various computer programs that will perform factor analysis (in the old days, people did it by hand, and it took days to analyze a data set with 100 respondents and 10 variables). Using one of a range of algorithms (it is the analyst’s job to determine which algorithm is most appropriate), the computer will find the factor that explains the greatest amount of variance in the data set. There will generally still be a fair amount of variability left in the data set, so the computer finds the factor that explains the greatest amount of the remaining variance, and so on. Unless some of the variables are completely redundant (for example, two of them are exactly the same), you’ll need N factors to explain all the variability in a data set of N variables. However, you can usually explain most of the variability with fewer factors, and an analyst can usually identify between 2 and 5 factors that are meaningful and describe the data set fairly well.
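To make the "explain the most variance, then the most of what remains" idea concrete, here is a minimal sketch. It uses principal component extraction (a close relative of factor analysis) by power iteration with deflation, not the algorithm of any particular statistics package, and the 3-variable correlation matrix is made up purely for illustration.

```python
import math

# Hypothetical correlation matrix for three survey variables (made-up numbers).
R = [
    [1.0, 0.6, 0.3],
    [0.6, 1.0, 0.2],
    [0.3, 0.2, 1.0],
]

def mat_vec(m, v):
    """Multiply a square matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dominant_component(m, iterations=500):
    """Power iteration: find the direction that explains the most variance in m."""
    v = normalize([1.0 / (i + 1) for i in range(len(m))])
    for _ in range(iterations):
        v = normalize(mat_vec(m, v))
    variance = sum(a * b for a, b in zip(v, mat_vec(m, v)))
    return variance, v

def extract_components(corr):
    """Repeatedly pull out the strongest component, then remove it (deflation)."""
    n = len(corr)
    remaining = [row[:] for row in corr]
    results = []
    for _ in range(n):
        variance, direction = dominant_component(remaining)
        results.append((variance, direction))
        # Subtract the variance just explained so the next pass sees only the remainder.
        for i in range(n):
            for j in range(n):
                remaining[i][j] -= variance * direction[i] * direction[j]
    return results

components = extract_components(R)
total_variance = len(R)  # each standardized variable contributes a variance of 1
proportions = [variance / total_variance for variance, _ in components]

for k, share in enumerate(proportions, start=1):
    print(f"Component {k} explains {share:.1%} of the variance")
```

With three variables, three components are needed to account for all of the variance, but the first one alone already accounts for more than half of it in this toy matrix.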

  • Each variable is assigned a weight, called a factor loading, for each factor
  • Factor loadings range from -1 to 1, with 0 indicating that the variable doesn’t contribute to the factor at all
  • A factor loading between 0 and 1 means that high scores on the variable lead to increases in the factor, while a factor loading between 0 and -1 means that low scores on the variable lead to increases in the factor
  • The same variables load onto different factors, but their contribution varies across factors
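One way to build intuition for loadings: when the variables are standardized and the factors are uncorrelated, a loading can be read as the correlation between a variable and a factor, which is why it stays between -1 and 1. The sketch below assumes we somehow already had Internet savvy scores for six respondents (in practice the factor is unobservable and is estimated from the data); all numbers are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient, always between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up Internet savvy scores for six respondents (illustrative only).
internet_savvy = [-1.5, -0.8, -0.2, 0.3, 0.9, 1.3]

# Made-up agreement ratings on a 1-5 scale for two statements.
answers = {
    "I read blogs to learn about new products": [1, 2, 3, 3, 4, 5],
    "I would rather talk to my friends in person than in a chat room": [5, 5, 4, 3, 2, 2],
}

loadings = {
    statement: pearson(internet_savvy, ratings)
    for statement, ratings in answers.items()
}

for statement, loading in loadings.items():
    print(f"{loading:+.2f}  {statement}")
```

The first statement comes out with a positive loading and the second with a negative one, matching the Internet savvy example earlier in the post.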

Factor analysis uncovers patterns that exist within the data. Exploring the data allows us to identify correlations across variables, some of which we may not have anticipated. The analyst does not decide a priori how much each variable contributes to each factor, and in some cases there may be surprises.

Let’s take as an example a very brief survey on how people shop for shoes. (These numbers are not based on actual data; no conclusions about footwear shopping and Internet usage should be inferred from this article.) We create a very short survey of seven statements, and ask how much respondents agree with each one:

  1. I read blogs to learn about shoes
  2. I only buy shoes if I can try them on first
  3. I have recently visited shoe companies’ websites
  4. I subscribe to one or more RSS feeds about fashion
  5. I would rather play games online than shop for shoes
  6. I have recently asked my friends about their shoe purchases
  7. I read user reviews of shoes online

A factor analysis might show three underlying factors, accounting for almost all of the variability.

  1. The first factor, accounting for almost one half of the variability, is research rigor. All the statements load onto this factor at least moderately, with statements 1, 3, and 7 loading most heavily. Note that statement 5 has a negative loading: the more research rigor the respondent has, the less likely they are to prefer playing games online to shopping for shoes
  2. The second factor, accounting for about one third of the variability, is Internet dependence. All the statements except 6 load fairly heavily on this factor, with statements 1 and 4 loading most heavily. Note that statement 2 has a negative loading; that is, the more Internet dependent the respondent is, the less likely they are to say they would only buy shoes they could try on
  3. The third factor, accounting for about one fifth of the variability, is readiness to buy. The statements that load most heavily on this factor are 3 and 6, though statements 1, 4, and 7 also have noticeable (though smaller) factor loadings

The computer program merely identifies factor loadings; it doesn’t indicate what they mean. Human beings are necessary to interpret the data and provide names for the factors (just as human beings are necessary to interpret segmentation data and make sense of personas).

Once we have determined the factor loadings, we can compute a factor score on each factor for each respondent. The score is computed by multiplying the respondent’s answer to each question by that question’s loading on the factor, giving a weighted answer, and then summing the weighted answers. This is repeated for each factor.
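The weighted-sum calculation described above can be sketched in a few lines. The loadings below are invented to roughly follow the pattern described for the shoe survey, and the respondent’s answers (on an assumed 1-5 agreement scale) are made up too. Statistical packages typically standardize the answers and use scoring coefficients rather than raw loadings, so treat this as an illustration of the idea rather than what any particular program does.

```python
# Hypothetical loadings for statements 1-7 on each factor (invented numbers).
loadings = {
    "research rigor":      [0.7,  0.4, 0.7, 0.4, -0.4, 0.4, 0.7],
    "Internet dependence": [0.6, -0.5, 0.4, 0.6,  0.5, 0.1, 0.4],
    "readiness to buy":    [0.3,  0.1, 0.5, 0.3,  0.0, 0.6, 0.3],
}

# One made-up respondent's agreement with statements 1-7 (1 = disagree, 5 = agree).
respondent_answers = [5, 2, 4, 4, 1, 3, 5]

def factor_score(answers, factor_loadings):
    """Weight each answer by its loading on the factor, then sum the weighted answers."""
    return sum(answer * loading for answer, loading in zip(answers, factor_loadings))

# Repeat the calculation for each factor.
scores = {
    factor: factor_score(respondent_answers, factor_loadings)
    for factor, factor_loadings in loadings.items()
}

for factor, score in scores.items():
    print(f"{factor}: {score:.1f}")
```

Note how the negative loadings pull a score down: this respondent’s agreement with statement 2 lowers their Internet dependence score.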

Factor analysis works very well with personas. Rather than profiling the personas on a dizzying array of variables and trying to make sense of them, you can simply note that:

  • Shoe-savvy Sue is high on research rigor and readiness to buy, and moderate on Internet dependence
  • In-and-out-of-the-store Ira is low on research rigor and Internet dependence, and high on readiness to buy
  • Mouse-happy Mary is high on research rigor and Internet dependence, and low on readiness to buy

You can use factors to create a space in which your personas reside. It makes it easy to see how they are similar and how they are different. For example, the above model makes it easy to see that:

  • Sue and Mary are similar on research rigor, and fairly similar on Internet dependence, but very different on readiness to buy; so while both will be amenable to online information, Sue is much more likely to actually make a purchase.
  • Sue and Ira are similar on readiness to buy, and somewhat similar on Internet dependence, but very different on research rigor. This suggests that the detailed online information which will appeal to Sue will probably cause Ira to navigate away; he would probably be better served by a simple interface that lets him access the information he needs quickly and make purchases as easily as possible.
  • Ira and Mary have very little in common at all.
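As a rough sketch of what it means for personas to "reside" in a factor space, we can treat each persona as a point with one coordinate per factor and measure the distances between them. The numeric coding (low = 1, moderate = 2, high = 3) is an assumption made only for this illustration; real factor scores would be continuous.

```python
import math
from itertools import combinations

# Assumed coding of the verbal levels; not part of the original analysis.
LEVEL = {"low": 1, "moderate": 2, "high": 3}

# Coordinates are (research rigor, Internet dependence, readiness to buy).
personas = {
    "Sue":  (LEVEL["high"], LEVEL["moderate"], LEVEL["high"]),
    "Ira":  (LEVEL["low"],  LEVEL["low"],      LEVEL["high"]),
    "Mary": (LEVEL["high"], LEVEL["high"],     LEVEL["low"]),
}

# Straight-line (Euclidean) distance between every pair of personas.
distances = {
    (a, b): math.dist(personas[a], personas[b])
    for a, b in combinations(personas, 2)
}

for (a, b), distance in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{a} and {b}: {distance:.2f}")
```

Under this coding, Ira and Mary end up farthest apart, while Sue sits about equally far from each of them, which matches the comparison above.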

Factor analysis provides a robust way to compare different types of customers and potential customers, and provides additional insight into how best to serve them.

More Info on How Factor Analysis Can Help Us Understand Digital Activity

Factor analysis has a long history in the social sciences and in marketing. Over the past few years, researchers have begun to apply the technique to data about the digital universe. Here are some examples that have application to the work we do at Molecular.

Further Reading

For those of you who’d like more details on this technique, here are some good overview articles (I am also happy to share my own expertise in the subject with anyone interested, either in person or via email):

  • Those of you who are more conceptually oriented might enjoy A. S. Kaplunovsky’s article “Why using factor analysis?” As you might guess from the title, Kaplunovsky is not a native English speaker, and his prose is sometimes difficult to navigate. However, he gives a thorough history and a good range of applications.
  • For a thorough mathematical explanation, complete with more Greek letters than you can shake a stick at, you can’t do better than the SAS manual. SAS does a great job of fully describing the mathematics underlying all its procedures, including factor analysis and the closely related principal components analysis. I have printouts on my desk, and the full set of manuals is available free online. Go to the SAS/STAT manual and search for Factor.
