The promise of the data ecosystem is simple: Past consumer behavior can provide insights into future consumer purchase intent. Back in the early 1990s, researchers knew more than 80% of Saab drivers owned Macintosh computers, but they didn't know what that meant, or how best to leverage it. Today, correlations like that one are brought to bear through technology every nanosecond behind the browser.
But these correlations can only be as actionable as the data that powers them, and good data is hard to come by. Good data isn't simply a matter of a vendor announcing they have insights into millions of unique visitors across a network of sites. That claim sounds impressive, but it tells you nothing about the data you're actually buying. Does the seller talk about who those customers are, what sites they visited, where they come from, or what they were actually shopping for? In some cases, the data provider might not even know. Buyers themselves only know two irrefutable facts: the name of their data provider, and what they're going to pay.
Primary data
The vast majority of data available is not gathered from primary sources, like brand or retail sites. It's modeled from relatively small data samples that are gathered from primary sources. It's an extrapolation – pure and simple. So, while a brand may be trying to go after in-market camera shoppers, the data you're buying for them may merely be based on a faulty assumption assembled from a minuscule group of actual camera shoppers. The providers take what little insight they can gather, determine a common thread, and then create a bucket that's labeled "camera shopper."
I have no doubt that these providers can help advertisers access audiences of the sizes they claim, but I doubt the entire audience matches exactly what the advertiser is looking for. It's widely accepted that targeting has its limitations as a strategy, because the more precise the desired audience, the smaller the pool of potential customers. But when you think about the rampant inaccuracies in modeled data, it's clear that the pool might be even shallower than many expected. That should frighten advertisers spending to hit what they believe are their target consumers. The next time you hear about an anomaly indicating how 40% of "new moms" are male, check the source of the data being examined. Chances are, it is modeled data, not primary data.
Trust but verify
So, what can you do about this? Insisting on better, primary data is a clear step in the right direction. Want to target camera shoppers? Make sure that any data you're buying is transparent about its source. If a data provider claims that they have primary data from a certain brand, check that brand's own site with the Ghostery plug-in installed on your browser, and see which data companies have tags on that site. If your data company is not among them, then they can't be selling this data as a primary source. You can do this same sort of checking on retail sites too. You don't have to check more than a few to see whether or not your data source is a primary one.
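For anyone who wants to script a rough version of that check, here is a minimal sketch in Python. It is not how Ghostery works; it only scans a saved copy of a page's HTML for script, image, and iframe tags that point at other domains. The function names, the sample markup, and the domains (brand.test, example-vendor.test) are all made up for illustration. Tags injected by JavaScript or a tag manager after the page loads will not appear in static HTML, so a miss here is a reason to look closer in the browser, not proof on its own.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class TagCollector(HTMLParser):
    """Collects hostnames referenced by script, img, and iframe tags."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag not in ("script", "img", "iframe"):
            return
        src = dict(attrs).get("src")
        if not src:
            return
        # Relative URLs have no hostname and are skipped.
        host = urlparse(src).hostname
        if host:
            self.hosts.add(host.lower())


def third_party_hosts(html, site_domain):
    """Return hosts in the page that do not belong to site_domain."""
    collector = TagCollector()
    collector.feed(html)
    site_domain = site_domain.lower()
    return {
        h for h in collector.hosts
        if h != site_domain and not h.endswith("." + site_domain)
    }


def vendor_has_tag(html, site_domain, vendor_domain):
    """True if any third-party host is the vendor's domain or a subdomain."""
    vendor_domain = vendor_domain.lower()
    return any(
        h == vendor_domain or h.endswith("." + vendor_domain)
        for h in third_party_hosts(html, site_domain)
    )


# Hypothetical page source, saved from a brand site for inspection.
SAMPLE_HTML = (
    '<html><head>'
    '<script src="https://tags.example-vendor.test/pixel.js"></script>'
    '<script src="//cdn.brand.test/app.js"></script>'
    '</head><body><img src="/logo.png"></body></html>'
)

found = vendor_has_tag(SAMPLE_HTML, "brand.test", "example-vendor.test")
```

Pass the site's base domain (brand.test rather than www.brand.test) so that the brand's own subdomains are not counted as third parties.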
Advertisers know about primary data, but it's difficult to find, which likely sends them back to a data vendor. It's time to start demanding transparency from the data providers, and walking out the door if you can't get it. Ask your buyer or data partner where they're getting their data. There is a significant difference between the "camera shopper" data that comes from Best Buy, Nikon's brand site, or a professional photography blog and the modeled data that comes from extrapolation. Run primary data through a DSP, compare it side by side with a modeled audience, and let me know which one performs best. I'll save you the effort: campaigns run against primary data will outperform those run against modeled audience data every time.
Buying modeled data without any clear indication of where that data came from, let alone the opportunity to optimize against that data source, probably means you're throwing money away. You could very well be paying extra money on top of the normal CPM while achieving the same results you would have seen without the data.