Datasets
Although it’s one of the most basic ID matches, the link between hashed emails and postal addresses underpins many more granular identifications, including individuals and households, the presence of children, age, gender, ethnicity, income and more, McKinley said. So getting the match right is crucial for a wide range of audience measurement and ad targeting applications, he said.
“This is what everyone is relying on to determine who might be behind that IP address and with driving personification,” McKinley said.
TV measurement providers may lean on their own panels (as Nielsen does) or on panels from others, such as TVision or HyphaMetrics, for co-viewing estimates. But they then apply those estimates across big datasets that rely on hashed email and postal address matches to help determine age, gender, ethnicity and more, McKinley said.
“There’s this sort of daisy chain of huge datasets that have to fall into place to make this stuff accurate,” he said. “Otherwise, you’re just guessing.”
It is possible, though, to find higher-quality segments within even below-average databases. One provider that averaged only 44% email-postal match accuracy overall had 80% to 90% accuracy across more than a quarter of its IDs, the study found.
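The gap between that provider’s overall figure and its best segment implies the remaining IDs are considerably worse. The sketch below works through the arithmetic under loudly stated assumptions that are not from the study: exactly 25% of IDs sit at 85% accuracy (the midpoint of the reported range), and the 44% overall figure is a simple average weighted by ID count. The function name `implied_rest_accuracy` is hypothetical.

```python
def implied_rest_accuracy(overall: float, segment_share: float, segment_accuracy: float) -> float:
    """Solve overall = share * segment_accuracy + (1 - share) * rest for rest.

    Assumes overall accuracy is an ID-count-weighted average of the two groups.
    """
    return (overall - segment_share * segment_accuracy) / (1 - segment_share)


# Illustrative inputs only: 44% overall, 25% of IDs at an assumed 85% accuracy.
rest = implied_rest_accuracy(0.44, 0.25, 0.85)
print(f"{rest:.1%}")  # 30.3%
```

Under those assumptions, the other three-quarters of the database would match correctly only about 30% of the time. Because the study says “more than a quarter” and 80% to 90%, the real figure for the remainder could differ.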
Quality problems often stem from marketers, agencies or other buyers pushing to buy the largest possible data pools regardless of quality, McKinley said.
“It actually incentivizes all the data providers to include IDs and linkages they know are not good,” he said. But Truthset’s research indicates that smaller, better-validated data sets can deliver a return on investment of $1.56 per dollar spent, vs. 67 cents for the least-validated data sets.
One possible solution, McKinley said, would be to attach metadata to each ID indicating when the data was collected and when the match was made, so that older IDs, which are more likely to be wrong, can be automatically purged from databases.
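A minimal sketch of how such timestamp metadata could drive automatic purging is shown below. The record layout (`IdRecord`), the helper `purge_stale` and the one-year cutoff are all hypothetical illustrations, not anything McKinley or Truthset described. The sketch drops a record if either its collection date or its match date is older than the cutoff.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class IdRecord:
    """Hypothetical ID record carrying provenance metadata."""
    hashed_email: str
    postal_address: str
    collected_at: datetime  # when the underlying data was collected
    matched_at: datetime    # when the email-postal match was made


def purge_stale(records, max_age, now=None):
    """Keep only records whose collection AND match dates fall within max_age."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - max_age
    # The older of the two timestamps decides whether the record survives.
    return [r for r in records if min(r.collected_at, r.matched_at) >= cutoff]


# Usage with a fixed clock and an assumed one-year retention window.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
records = [
    IdRecord("hash_a", "123 Main St", now - timedelta(days=30), now - timedelta(days=10)),
    IdRecord("hash_b", "456 Oak Ave", now - timedelta(days=900), now - timedelta(days=20)),
]
fresh = purge_stale(records, max_age=timedelta(days=365), now=now)
print(len(fresh))  # 1
```

The second record is dropped because its data was collected roughly two and a half years earlier, even though its match is recent. A real system might weight the two timestamps differently or decay confidence gradually rather than using a hard cutoff.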