Nearly half the data used for ad targeting is wrong

October 10, 2023 09:30 AM

A crucial data match that underpins everything from targeting ads to measuring TV audiences is right only 51% of the time, according to a new study by Truthset, a firm that evaluates consumer database accuracy.

The study explored matches between hashed (anonymized) email addresses and postal addresses, covering 15 data providers with 790 million unique hashed email addresses and 133 million postal addresses. Findings are to be presented at a Coalition for Innovative Media Measurement Summit in New York today.

Average accuracy among data providers ranged from 32% to 69%. Even within individual provider databases, groups of IDs had a wide range of accuracy, so even the worst providers have some high-quality matches, said Scott McKinley, CEO of Truthset.

Truthset evaluates data from providers by comparing it to a database of 20 million records from trusted, verified sources, the company said. Truthset has not publicly identified those trusted, verified sources. But its reports have been evaluated and used increasingly in recent years by groups including CIMM and the Advertising Research Foundation. Agencies and marketers such as Publicis Groupe’s Epsilon and Heineken have said they use Truthset to improve targeting accuracy and outcomes of campaigns.
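The article does not describe how Truthset scores a provider, so the following is only a minimal sketch of what comparing provider data to a reference set could look like. The function name, the dictionary layout and the choice to skip links that are absent from the reference set are all assumptions made for illustration, not Truthset's method.

```python
def match_accuracy(provider_links, truth_links):
    """Share of a provider's email-to-address links that agree with a reference set.

    Both arguments map a hashed email to a postal-address key (hypothetical
    layout). Links whose hashed email is missing from the reference set cannot
    be checked, so this sketch skips them rather than counting them as wrong.
    """
    checked = 0
    correct = 0
    for hashed_email, address in provider_links.items():
        if hashed_email not in truth_links:
            continue  # no verified record to compare against
        checked += 1
        if truth_links[hashed_email] == address:
            correct += 1
    # Return None when nothing could be checked, instead of dividing by zero.
    return correct / checked if checked else None
```

Under these assumptions, a provider with two checkable links, one of which agrees with the reference set, would score 0.5.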

Moves, multiple email addresses

One of the major drivers of inaccurate matches in the study appears to be the age of the data, McKinley said. Other possible factors include people who have moved frequently or who have multiple postal or email addresses, he said.

Database size, however, didn’t matter, which is one of the more disturbing parts of the research, he said. There was little correlation between how large a database was and how accurate it was.

The study was done for CIMM, and in reviewing the findings with members of the group this summer, McKinley said, “We got a lot of ‘This can’t be true. Oh my God, this is my life’s work. What have I been doing for 20 years?’”

But McKinley said when he separately presented findings to some major players in data and measurement, he was told their own research has shown similar levels of mismatches, an issue they’ve been trying to address.

Datasets

While it’s one of the most basic ID matches, the combination of hashed email and postal addresses underpins many more granular identifications of people and households, including presence of children, age, gender, ethnicity, income and more, McKinley said. So getting the match right is crucial for a wide variety of audience measurement and ad targeting applications, he said.

“This is what everyone is relying on to determine who might be behind that IP address and with driving personification,” McKinley said.

TV measurement providers may lean on their own panel (in the case of Nielsen) or others such as TVision or HyphaMetrics to provide co-viewing estimates. But then they apply those estimates across big data sets that rely on matches of hashed email and postal addresses to help determine age, gender, ethnicity and more, McKinley said.

“There’s this sort of daisy chain of huge datasets that have to fall into place to make this stuff accurate,” he said. “Otherwise, you’re just guessing.”

It is possible, though, to find higher-quality segments within even below-average databases. One provider that averaged only 44% email-postal match accuracy overall had 80% to 90% accuracy across more than a quarter of its IDs, the study found.

Quality problems often stem from marketers, agencies or other buyers pushing to buy the largest possible data pools regardless of quality, McKinley said.

“It actually incentivizes all the data providers to include IDs and linkages they know are not good,” he said. But Truthset’s research indicates smaller, better-validated data sets can deliver return on investment of $1.56 per dollar spent vs. 67 cents for the least validated data sets.

Among possible solutions, McKinley said, would be attaching metadata to each ID indicating when the data was collected and when the match was made, so that older IDs, which are more likely to be wrong, can be automatically purged from databases.
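As a rough illustration of that idea, the sketch below attaches a collection date and a match date to each link and drops any link older than a cutoff. The record layout, the field names and the age threshold are hypothetical; the study does not specify a format or say how old is too old.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class IdLink:
    """One hashed-email-to-postal-address link plus its age metadata (hypothetical layout)."""
    hashed_email: str
    postal_address_id: str
    collected_on: date  # when the underlying data was collected
    matched_on: date    # when the email-postal match was made


def purge_stale(links, today, max_age_days):
    """Keep only links whose collection and match dates fall within max_age_days of today."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        link for link in links
        if link.collected_on >= cutoff and link.matched_on >= cutoff
    ]
```

A provider could run a filter like this on a schedule, with the threshold set from its own evidence on how quickly matches go stale.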
