Getting representative ‘big data’
VideoAmp does have processes in place to ensure its big data is representative, said Josh Chasin, the company’s chief measurability officer. It matches its households against data from Experian, the credit-reporting and third-party data firm, so it can weight its data to help ensure proper ethnic representation.
But there’s an art to weighting that not all companies use, Chasin said. “We’re not weighting to bring households in line with census data. We’re weighting to bring people in households into line. One of the reasons that matters, for example, is that Hispanics tend to live in larger households than average.”
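The distinction Chasin draws can be illustrated with a toy calculation. The sketch below is not VideoAmp's method. It assumes a made-up two-group sample and a made-up person-level target, and simply shows that a group living in larger households holds a bigger share of people than of households, so weights have to be computed in the same unit as the target.

```python
# Toy sample; every number here is hypothetical, not census or VideoAmp data.
SAMPLE = [
    {"group": "hispanic", "hh_size": 4, "households": 10},
    {"group": "non_hispanic", "hh_size": 2, "households": 90},
]

# Hypothetical population target expressed as a share of PERSONS.
PERSON_TARGET = {"hispanic": 0.19, "non_hispanic": 0.81}


def counts(sample, unit):
    """Count each group in households or in persons."""
    return {
        row["group"]: row["households"] * (row["hh_size"] if unit == "persons" else 1)
        for row in sample
    }


def shares(sample, unit):
    """Each group's share of the sample, measured in the given unit."""
    c = counts(sample, unit)
    total = sum(c.values())
    return {group: n / total for group, n in c.items()}


def person_weights(sample, target):
    """Weight per group = target person share / sample person share."""
    current = shares(sample, "persons")
    return {group: target[group] / current[group] for group in current}


def weighted_person_shares(sample, weights):
    """Person shares after applying the group weights."""
    c = counts(sample, "persons")
    weighted = {group: n * weights[group] for group, n in c.items()}
    total = sum(weighted.values())
    return {group: n / total for group, n in weighted.items()}


if __name__ == "__main__":
    print(shares(SAMPLE, "households"))
    print(shares(SAMPLE, "persons"))
    w = person_weights(SAMPLE, PERSON_TARGET)
    print(weighted_person_shares(SAMPLE, w))
```

In this toy sample, Hispanic households are 10% of households but about 18% of people, so a household-level comparison would understate how many Hispanic viewers the sample already contains.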
VideoAmp also recognizes that set-top box and smart TV data sets have different biases, he said. Because the company commingles data from both, it can see those biases and use the differences to get “a more diverse profile of consumers than you get with one separately.”
While set-top-box data clearly doesn’t capture viewing by households that watch only over-the-air, smart TV data can. That’s where VideoAmp turns to ensure it has the proper mix of OTA households in its sample, Chasin said. The company also subscribes to the Advertising Research Foundation’s Universe Study of Device and Account Sharing (DASH study) for the same purpose.
Comscore, like VideoAmp, uses Experian data as part of its effort to ensure its measurement is ethnically representative as well as to adjust for such factors as presence of children and household income, David Algranati, the company’s chief product officer, said in a statement.
All data collection processes have biases, he said, and panels or survey recruitment can allow conscious or unconscious bias to creep in, if phone recruiters respond differently to an accented voice, for example. It can be more expensive to recruit Spanish-speaking households into a panel because of the need for bilingual recruiters, he said, providing a disincentive for companies. On the other hand, cable and satellite operators—whose data fuels Comscore measurement—have an incentive to communicate bilingually because it can help increase their customer base, he said.
Third-party data is often wrong
But ultimately, the big-data providers lean on third-party data to ensure representation. And one problem is that third-party databases are often wrong.
The Association of National Advertisers’ Alliance for Inclusive and Multicultural Marketing has been tracking accuracy of third-party databases in identifying ethnicity, and while it’s found modest improvements, there’s still plenty of error.
AIMM has been using the firm Truthset to evaluate accuracy of third-party databases, said Jim Spaeth, co-founder of analytics firm Sequent Partners, which is working on the project. Truthset collects data from a variety of third-party firms, using hashed email addresses to match their data to other independent, validated databases. The process generates “Truthscores” estimating how often the databases are right in identifying people’s ethnicity, gender or other factors. By providing their data, the providers get feedback that helps them improve accuracy over time, in addition to third-party scores they can show to clients.
However, they may not want to show those scores, because the data is often wrong. In a worst case, Truthset has found one database where people identified as African American actually were African American only 27% of the time, and another where Hispanic designations were right only 20% of the time.
Across Truthset’s full set of vetted data providers, the results are better but still far from perfect. AIMM reported in March that, as of the fourth quarter of 2021, third-party databases were accurate 83% of the time when identifying someone as Hispanic and 77% of the time when identifying someone as Black or white.
Accuracy has improved somewhat, at least in identifying people as Black or Hispanic, though the error rate has increased substantially for white and Asian people since the third quarter of 2020, with Asian identifications correct only 61% of the time late last year.
Experian is one of 21 third-party data providers now being evaluated by Truthset, according to a recent statement by the Advertising Research Foundation, though it’s not clear if it was part of the fourth-quarter data cited by AIMM. Experian representatives didn’t respond to requests for comment on the accuracy of how it classifies people ethnically.
But one positive sign is that the most recent Truthset average scores reported by the ARF for third-party providers on accuracy in ethnic classification—an average that does include Experian—appear to show improvement from what AIMM reported for late last year.
Those scores showed providers were right about African American identification 81% of the time, Hispanic 92% and Asian 84%.
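For readers who want a feel for what a score like these measures, here is a minimal sketch. It is not Truthset's actual process, which the company has not detailed here. The sketch assumes emails are normalized and hashed with SHA-256, that records are joined on that hash, and that the score is the share of a provider's claims for a label that a validated reference confirms. All records and function names are made up for illustration.

```python
import hashlib


def hash_email(email):
    """Normalize and hash an email. SHA-256 is an assumption for this sketch."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def label_accuracy(provider, reference, label):
    """Among matched records the provider tags with `label`, return the
    share the reference agrees with, or None if there are no such records."""
    matched_keys = provider.keys() & reference.keys()
    claimed = [reference[k] for k in matched_keys if provider[k] == label]
    if not claimed:
        return None
    return sum(ref == label for ref in claimed) / len(claimed)


# Hypothetical provider file: hashed email -> claimed ethnicity.
PROVIDER = {
    hash_email(email): label
    for email, label in [
        ("a@example.com", "hispanic"),
        ("B@example.com ", "hispanic"),  # normalizes to b@example.com
        ("c@example.com", "hispanic"),
        ("d@example.com", "asian"),
        ("e@example.com", "hispanic"),   # absent from the reference, so unscored
    ]
}

# Hypothetical validated reference file keyed the same way.
REFERENCE = {
    hash_email(email): label
    for email, label in [
        ("a@example.com", "hispanic"),
        ("b@example.com", "hispanic"),
        ("c@example.com", "white"),
        ("d@example.com", "asian"),
    ]
}

if __name__ == "__main__":
    for label in ("hispanic", "asian", "black"):
        print(label, label_accuracy(PROVIDER, REFERENCE, label))
```

Scoring only the records a provider has tagged with a label, rather than every record, mirrors how the figures above are phrased: how often a database is right when it identifies someone as a given ethnicity.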
No takers for MRC evaluation
Certainly the Media Rating Council (MRC) would like to evaluate Experian and other third-party data providers, but so far they haven’t applied for accreditation, as MRC CEO George Ivie has called on them to do.
But the MRC will likely get a chance anyway, because evaluating the third-party data used to ensure representative measurement will be part of its audit of Comscore, which has applied for MRC accreditation, Ivie said. There is some economy of scale here, since the same audit might also be used should VideoAmp apply for MRC accreditation, which executives of that company have said they intend to do. Both companies use Experian.
Getting minority representation right is crucial, Ivie said, and will be a critical part of any big-data audience measurement system getting accreditation.
“It’s an enormous part of what we do,” he said. “I’ve been hauled in front of Congress on more than one occasion to speak about this matter. …They’re deadly serious about the fact that our audience measurement products should be complete in their coverage and representation of Americans, and we take it super seriously.”
Clearly the industry is moving toward using passive data collection from big household device sets and matching that data against other data sets to ensure a representative sample.
“We match all this stuff together and come away kind of fat, dumb and happy, thinking we’re all good,” Ivie said. “But the problem is, we’re not all good.”
The data and associated matching and attribution processes “have inaccuracies built into them, and some of those inaccuracies are particularly pointed in the race and ethnic areas,” Ivie said.
Ivie pointed to Procter & Gamble Co. Chief Brand Officer Marc Pritchard’s extensive discussion at the ANA Media Conference about efforts to invest more in minority-owned media. “As an industry,” he said, “we have to have an infrastructure in place to support that.”
Value beyond media measurement
Besides just getting audience measurement right, better ethnicity data will help marketers do a better job overall, Sequent’s Spaeth noted, by ensuring they’re seeing accurate representations of the market and lowering effective costs when targeting ethnic audiences.
That was exactly the experience of a Tecate Beer campaign aimed at Hispanic buyers, which used Truthset to improve the accuracy of its targeting, said Rebekah Kennedy, director of consumer data strategy for Heineken Co.
“Typically it’s very hard to target ethnicity online,” Kennedy said. “It’s like 50% accuracy rate, which is not very good.” Using Truthset to “tweak it and make it better is hugely beneficial and makes sense.”
In an effort to make ethnic and other representation better generally across the industry, the Advertising Research Foundation has launched a new project to evaluate the bias in research panels and big data sets. It’s inviting brands, agencies, media and research companies to put samples of their first-party customer relationship management databases, automated content recognition systems or other big data sets through testing with Truthset to anonymously score their accuracy on ethnic and other classifications.
The question is how many will take the ARF up on the offer.
“One of the biggest challenges is that many of the advertisers just don’t care” about issues of data accuracy in ethnic representation, Spaeth said, based on experience in private consulting. “Always there are other issues that are more important.”