Saturday, July 16, 2011

X501 - Week V - Part I - Using Six Sigma to Model Human Behavior

The main idea behind web analytics is solving Y = f(x), where Y is our behavior and x is the set of inputs that lead us to behave the way we do.
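To make Y = f(x) concrete, here is a minimal sketch that fits a straight line to a single input by ordinary least squares. The data (minutes on site as x, items added to cart as Y) and the names fit_line and f are made up for illustration; a real model of behavior would have many inputs and would rarely be this tidy.

```python
# Hypothetical data: one input x (minutes on site) and one output Y
# (items added to cart). The numbers are invented for illustration.
minutes = [1.0, 2.0, 3.0, 4.0, 5.0]
items = [1.1, 1.9, 3.2, 3.9, 5.1]


def fit_line(xs, ys):
    """Ordinary least squares for Y = intercept + slope * x (one input)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(minutes, items)


def f(x):
    """The fitted model: predicted Y for a given x."""
    return intercept + slope * x


print(f"Y = {intercept:.2f} + {slope:.2f} * x")
```

The fitted f is only an estimate of the relationship in this small sample, which is the point the rest of this post builds on.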

In that regard, the numerati and web analysts use many of the same tools that six sigma practitioners, black belts in particular, use as they go through the five-step DMAIC process.  That is a key reason I agree with Kaushik's 10/90 rule (spend 10% of the budget on tools and 90% on the people who use them).  Analytics software is a lot like a vending machine.  You plug in numbers, and it spits out statistics.  However, if you do not ask the right question, you will not collect the right data, and you will not be able to use any output the software generates.  It is not about data; it is about the right data, collected the right way, to answer the right questions.  You need people (six sigma black belts or analysts) to figure out the right questions and build the data collection plans.  As Avinash Kaushik writes in Web Analytics: An Hour a Day: "How data is captured is perhaps the most critical part of an analyst's ability to process the data and find insights."

I will now show how web analytics parallels the six sigma DMAIC process.

1. Define - What is the practical business question we are asking, what is the problem we want to solve?
That's the starting point of any scientific inquiry.  Since the purpose of analytics is solving Y=f(x), we need to define Y!  The question can be very specific, like "why do romantic-movie lovers click on rental car ads," or more universal, as in "what drives consumer purchasing decisions?"

2. Measure - You start collecting data on your website.  Kaushik writes: "Measuring how your website is delivering for your customers will help you focus your web analytics program and cause you to radically rethink the metrics that you need to measure to rate the performance of a website."  You also start mapping the process by which you currently use analytics. 

3.  Analyze - In a typical six sigma sense, this is where you look for the root causes of your output performance.  At this point, you are turning your practical business problem into a statistical problem.  You begin by using divergent thinking to identify all the potential inputs into your process.  There are millions of bits of information to sift through.  Tacoda "harvests 20 billion of these behavioral clues every day."  You begin to collect all the x's that may make up your model.  Next, you move to convergent thinking, where you narrow all your inputs down to the critical few and set aside the trivial many.  The skill comes in finding correlations between the inputs and the outputs, and in starting to build theoretical models of behavior based on these inputs, or, as Stephen Baker writes in The Numerati, "the key to this process is to find similarities and patterns."  One of the most intriguing points in the introduction to The Numerati is how correlations can be built through indirect relationships and assumptions.  Using the movies we watch (Netflix, Hulu, YouTube?) or the music we listen to (say Pandora, Spotify), one could make an assumption about our mood, and use our mood to figure out what products we may be interested in... There are so many opportunities for discovery.  A pioneering book on behavioral economics is Dan Ariely's "Predictably Irrational."  Ariely writes on his blog:
"Good news. There is a science called Behavioral Economics.  This attempts to understand people’s day to day decisions (where do I get my morning coffee?) and people’s big decisions (How much should I save for retirement?). Understanding HOW your users make decisions and WHY they make them is powerful. With this knowledge, companies can build more effective products, governments can create impactful policies and new ideas can gain faster traction."

4.  Improve - At this point, you have a hypothesized model, and you must evaluate its accuracy, or more correctly, its usefulness.  As statistician George Box said, "all models are wrong, but some are useful."  The criteria for a good model are, as described by Baker, that "they [the variables] must interact with one another mathematically just the way they do in the real world."   The role of this stage in developing a model is experimentation: you test the model's predictions against what actually happens.

5.  Control - Once you have a statistical model that works, you need to convert that statistical knowledge, or solution, into a practical solution, and use it in your marketing strategy.  As you continue to refine this solution set, it is likely that the statistical equation you derive gets simplified.  "All things being equal, the simplest solution tends to be the best one."  Will one model be enough?  Of course not, as each model is only valid within the dataset it was developed with, i.e. its inference space.  Baker alludes to this when he writes: "In the coming decade... we'll be modeled as workers, patients, soldiers, lovers, shoppers, and voters."
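Two ideas from the steps above can be sketched in a few lines: narrowing many candidate inputs down to the critical few (Analyze), and checking whether a new case falls inside the model's inference space (Control). Everything here is an assumption made for illustration: the input names, the invented numbers, the 0.8 correlation cutoff, and the use of a simple Pearson correlation as the only screening tool. In practice a black belt or analyst would combine several tools and would remember that correlation does not establish cause.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical candidate inputs (the x's), one value per visit.
candidates = {
    "pages_viewed": [2, 4, 6, 8, 10, 12],
    "minutes_on_site": [1, 3, 2, 5, 4, 6],
    "hour_of_day": [9, 21, 13, 8, 22, 14],
}
purchases = [1, 2, 3, 4, 5, 6]  # hypothetical output Y for the same visits


def critical_few(inputs, y, threshold=0.8):
    """Convergent step: keep inputs whose |r| with Y meets the cutoff."""
    scores = {name: pearson(xs, y) for name, xs in inputs.items()}
    return {name: r for name, r in scores.items() if abs(r) >= threshold}


def inference_space(inputs, names):
    """Range of each kept input in the data the model was built on."""
    return {name: (min(inputs[name]), max(inputs[name])) for name in names}


def within_inference_space(space, visit):
    """True if every kept input of a new visit lies inside the observed range."""
    return all(lo <= visit[name] <= hi for name, (lo, hi) in space.items())


kept = critical_few(candidates, purchases)
space = inference_space(candidates, kept)
print("critical few:", sorted(kept))
print("inference space:", space)
```

A visit with 40 pages viewed would fall outside the observed range of 2 to 12, so a model built on this data should not be trusted to predict it. That is the practical meaning of an inference space, and the reason one model is never enough.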

Baker goes even further, hinting at the idea of the "long tail," when he writes: "The trick is now to deliver to each of us the precise flavor and texture and color we want, at just the right price."  Analytics are key, and Six Sigma can help!
