Marketers and ad tech companies have increasingly turned good statistical science into pseudoscience for the sake of expediency.
In any optimization, you have an explore phase, where you're testing to figure out a winning strategy, and an exploit phase, where you put that winning strategy to work. Unfortunately, a pseudoscientific approach gives the false sense that you've figured out the winning strategy. Two of the biggest pseudo-optimization offenders are randomization of creative by impression and something called multi-armed bandit algorithms. Both of these offenders favor instant gratification over long-term results.
Multi-armed bandit algorithms
Standard A/B testing allocates impressions equally to all variations during the explore phase. So for an A/B test, each gets 50% of impressions, until enough data comes in to achieve statistical significance, at which point all impressions are shifted to the declared winner.
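As a rough illustration of that stopping rule, the sketch below runs a standard two-proportion z-test on a 50/50 split; the function name and sample numbers are made up for the example, not drawn from any particular product.

```python
import math

def z_test_two_proportions(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled conversion rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Equal allocation: 10,000 impressions each, 2.0% vs. 2.6% conversion rates.
z, p = z_test_two_proportions(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")  # p < 0.05, so shift impressions to B
```

Only after the p-value clears the chosen significance threshold does a proper A/B test shift all impressions to the winner.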
However, most dynamic creative optimization (DCO) ad tech -- and some conversion-rate optimization products -- use a bandit algorithm for testing.
The bandit method begins optimizing a split test immediately. As variations perform better or worse, the test gradually rebalances impressions toward the winners. In other words, exploration and exploitation occur simultaneously. Unfortunately, with impressions unevenly divided, it now takes much longer to reach statistical confidence in the results.
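To make the rebalancing concrete, here is a minimal sketch of one common bandit variant, Thompson sampling; it is a generic illustration with invented numbers, not any vendor's actual implementation.

```python
import random

def thompson_bandit(true_rates, impressions, seed=0):
    """Thompson sampling: each round, serve the arm with the highest
    sample drawn from its Beta posterior, then update that posterior."""
    rng = random.Random(seed)
    wins = [1] * len(true_rates)    # Beta(1, 1) uniform priors
    losses = [1] * len(true_rates)
    served = [0] * len(true_rates)
    for _ in range(impressions):
        samples = [rng.betavariate(wins[i], losses[i])
                   for i in range(len(true_rates))]
        arm = samples.index(max(samples))   # exploit the current best guess
        served[arm] += 1
        if rng.random() < true_rates[arm]:  # simulate a conversion
            wins[arm] += 1
        else:
            losses[arm] += 1
    return served

# Two creatives with 2.0% vs. 2.6% true conversion rates.
print(thompson_bandit([0.020, 0.026], impressions=20_000))
```

Note what the served counts show: the apparent loser receives far fewer impressions, which is precisely why a bandit takes much longer to accumulate enough data on that arm to reach statistical confidence.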
Nevertheless, marketers may find this approach appealing because the optimization happens in "real-time," before all the data has come in. It's as if a sports gambler changed bets after every point scored, thinking he was "optimizing" based on current information. A smart gambler would know better, and would wait to bet only after he had watched enough previous, completed games between the two teams. Don't be fooled by real-time; wait for enough information to know what works.
Short-term optimization can also hurt your long-term results. Suppose you launch a test in the middle of a trend, such as a seasonal spike: the trend can skew results toward a variation that would not have been the overall winner.
Data worsens when users receive multiple versions within a test
Randomizing which creative is served in real-time, based on each impression, sounds like statistical science. But because creative is randomly assigned for each impression, a person could see multiple variations in an A/B test over multiple impressions. In fact, with a multi-armed bandit, each user could increasingly see more of the winning creatives before conversion, regardless of what they had seen previously. When that happens, future optimizations based on a user's behavior become suspect. It's no longer valid to attribute the user's activity solely to the last ad they saw, because they've had exposure to more than one message.
In a well-executed A/B test, a user will only ever see one variation. The user can be tracked (on the desktop, at least) using cookies, and some companies now offer cross-device user tracking called "householding" to prevent test contamination. This people-based testing method is what a scientific, randomized controlled trial does in other fields.
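One simple way to guarantee a user always sees the same variation is deterministic hash-based assignment. The sketch below is a generic illustration; the function and identifier names are invented for the example.

```python
import hashlib

def assign_variant(user_id, experiment, variants):
    """Deterministically map a user to one variant, so every impression
    for this user in this experiment shows the same creative."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# The same user always gets the same creative within an experiment.
v1 = assign_variant("user-123", "hero-banner-test", ["A", "B"])
v2 = assign_variant("user-123", "hero-banner-test", ["A", "B"])
assert v1 == v2
```

Because the assignment is a pure function of the user and experiment IDs, no per-impression randomness can contaminate the test, and attribution to a single message stays valid.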
On the surface, a lot of the real-time optimization going on in marketing seems accurate. It even has reporting to back up the supposed performance increases. But beneath that rosy exterior is a tangled web of statistics and data science being improperly applied. Marketers need to demand that "data-driven" be backed by data science.