Weak and Strong User Interest -- Characterizing AB testing for the masses
Having racked up nearly a decade of online marketing experience, I realize that I've had to create some lingo, which I use primarily to give business and technical folks a soft landing into our efforts on the marketing front.
While I doubt the concepts being expressed are patently new, I'm having considerable difficulty finding them characterized as such. Perhaps this is because I disdain traditional educational sources (having felt alienated by them), or perhaps it's something else. Either way, finding a seat at the table sans traditional education credentials is the challenge of any self-professed pretender to the moniker "intellectual". Experience and ability win in the end, but getting the other monkeys to listen in the meantime can be maddening.
Whatever the case, here goes...
In my involvement with marketing I've been called on to implement text blurbs and graphical creative that will be shown to users in the hopes of enticing them into a paid signup or upgrade, or sometimes just into sharing additional information with us so that we can better serve them.
On the web this creative takes the form of a hyperlink--the veritable backbone of the Internet, since it's often credited as its most compelling value addition to the life of an information consumer.
To track the performance of a hyperlink campaign, you need a few key performance indicators. Those metrics are essentially page views (or impressions), click-throughs, and conversions. Some folks also lump revenue in with conversions as a metric.
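To make those indicators concrete, here is a minimal Python sketch of how the raw counts might be turned into rates. The class name, field names, and the numbers are all my own illustrative assumptions, not figures from a real campaign, and I'm assuming the conversion rate is measured against clicks rather than impressions.

```python
from dataclasses import dataclass


@dataclass
class CampaignStats:
    """Raw counts for one creative (hypothetical field names)."""
    impressions: int
    clicks: int
    conversions: int

    def click_through_rate(self) -> float:
        # Share of impressions that led to a click.
        return self.clicks / self.impressions if self.impressions else 0.0

    def conversion_rate(self) -> float:
        # Share of clicks that went on to complete a conversion.
        return self.conversions / self.clicks if self.clicks else 0.0


# Made-up numbers, purely for illustration.
stats = CampaignStats(impressions=10_000, clicks=400, conversions=20)
print(f"Click-through rate: {stats.click_through_rate():.2%}")
print(f"Conversion rate:    {stats.conversion_rate():.2%}")
```

If you track revenue as well, it could be added as another field and divided by conversions in the same way.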
Further complicating the effort is split testing, or AB or multivariate testing if you prefer. This effort hypothesizes that one of "n" creatives or offers will appeal better than its counterparts. To determine which it is, we need to normalize for other variables such as time of day, day of week, the weather, localization factors, etc. The simplest way to do this is to serve all of the variants at the same time, in equal proportions. This means you'll show each variant to roughly the same number of people and use their responses to gauge which was most favored.
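One way to serve the variants concurrently and in roughly equal proportions is to hash each user into a bucket. This is only a sketch of that idea, not a description of the system I actually used; the function name, the test name, the variant labels, and the user ids are all hypothetical.

```python
import hashlib
from typing import Sequence


def assign_variant(user_id: str, variants: Sequence[str],
                   test_name: str = "discount_test") -> str:
    """Map a user to one variant, deterministically and roughly evenly."""
    # Hashing test name + user id keeps a user in the same bucket on every
    # visit, while different tests split the population differently.
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


# Simulate 10,000 hypothetical users and count how many land in each variant.
variants = ["offer_a", "offer_b"]
counts = {name: 0 for name in variants}
for i in range(10_000):
    counts[assign_variant(f"user-{i}", variants)] += 1
print(counts)
```

Because the assignment depends only on the user id and the test name, it doesn't care about time of day or day of week, which is the point of running the variants side by side.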
The "terms" (phrases really) I think I'm coining are "weak user interest" and "strong user interest", and they play out after the split test or simple creative is shown to users.
I define "weak user interest" as a user voting in the form of a click-through. If you trust that a click means something, then by clicking, a user is voting for the message or offer shown in a creative.
Further, "strong user interest" is a formerly "weakly interested" user completing the conversion process.
In a value pyramid, at the base you have viewing the creative--essentially worth nothing from the standpoint of measuring a specific user's interest. Above that is a click, the least expensive form of voting that we can observe--or weak user interest, since it requires a weak investment, possibly only that of curiosity. Above that is a completed sale, or conversion event--strong user interest in that the user survived the variables and resistance (such as requirement of a payment method) and emerged successfully.
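The pyramid can be expressed as a tiny classification, sketched here in Python. The enum and function names are hypothetical, and I'm assuming each user's activity has already been boiled down to two flags: did they click, and did they convert.

```python
from enum import IntEnum


class Interest(IntEnum):
    NONE = 0    # viewed the creative only: tells us nothing about interest
    WEAK = 1    # clicked through: the cheapest vote we can observe
    STRONG = 2  # clicked through and then completed the conversion


def interest_level(clicked: bool, converted: bool) -> Interest:
    # Strong interest is defined as a weakly interested user who converts,
    # so a conversion only counts here when a click preceded it.
    if clicked and converted:
        return Interest.STRONG
    if clicked:
        return Interest.WEAK
    return Interest.NONE


print(interest_level(clicked=True, converted=False).name)
print(interest_level(clicked=True, converted=True).name)
```

Using an ordered enum means the levels can be compared directly, which mirrors the idea of one tier sitting above another.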
Why bother with the difference?
Well, consider a scenario...
We wanted to test if a 10% discount performed as well as a 20% discount. So over the course of a week we showed half of our traffic the 10% offer and the other half the 20% offer.
Click-throughs on both were surprisingly similar. I say surprisingly because you would intuitively expect 20% to be more attractive to a given user than 10%.
However, the conversions on the 20% split were lower than those on the 10%. To the intuitive observer, this should have been the reverse.
So what was happening?
Well, as it turns out, other variables in the conversion process actually worked against the 20% case, resulting in fewer of those users successfully completing a conversion.
Performing a post mortem on the test based solely on the conversion metric would have resulted in a marketing decision that was not representative of the user population against which it was tested. In fact, the billing variables introduced between the click-through and the successful conversion had soured the test.
By seeing that the populations were responding similarly in terms of voting, we knew to investigate further and determined the problem. Therefore, measuring and reporting the entry and exit of each step that ultimately culminates in a conversion helps shed light on whether the conversion metric is a true gauge of user interest or rather a symptom of a process flaw. When the momentum begun with the expression of weak user interest isn't carried through into a state of strong user interest, it can be manifesting a phenomenon outside of the marketing effort altogether. Further, if you erroneously hold the marketing department accountable for flaws in your systems or billing support, the iterative experience will likely never gel into an understanding of user behaviors, and you'll never be able to fully optimize your marketing bang for the buck.
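Here is a rough Python sketch of that kind of step-by-step reporting. To be clear, the step names and every count below are invented for illustration; they are not the numbers from our test. The idea is simply to compare, for each step, what share of users carried on to the next one, and to flag the step where the two splits diverge most.

```python
def step_report(funnel):
    """For each step, return (step_name, users_entered, share_continuing).

    `funnel` is an ordered list of (step_name, users_reaching_step).
    """
    report = []
    for (name, entered), (_, reached_next) in zip(funnel, funnel[1:]):
        rate = reached_next / entered if entered else 0.0
        report.append((name, entered, rate))
    return report


def largest_gap(funnel_a, funnel_b):
    """Name the step whose continuation rate differs most between two splits."""
    gaps = [
        (abs(rate_a - rate_b), name)
        for (name, _, rate_a), (_, _, rate_b)
        in zip(step_report(funnel_a), step_report(funnel_b))
    ]
    return max(gaps)[1]


# Invented counts: similar voting (clicks), different outcomes after billing.
ten_pct = [("impression", 5000), ("click", 250),
           ("billing_form", 200), ("conversion", 100)]
twenty_pct = [("impression", 5000), ("click", 255),
              ("billing_form", 204), ("conversion", 51)]

for label, funnel in (("10% offer", ten_pct), ("20% offer", twenty_pct)):
    for name, entered, rate in step_report(funnel):
        print(f"{label:>9} | {name:<12} entered={entered:<5} continued={rate:.1%}")

print("Step to investigate:", largest_gap(ten_pct, twenty_pct))
```

In this made-up data the splits behave alike until the billing step, which would point the investigation at the process rather than at the offer.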
As such, characterizing weak and strong user interest can aid in the process of learning user preferences and distinguishing them from other factors, such as the opacity illustrated in my example.
I'd love your informed feedback. Is this approach useful or have I slept through a pertinent lesson?
Labels: AB testing, characterizing post mortem results, multivariate techniques, split testing
