Marketers are, of course, increasingly obsessed with amassing and deploying consumer data. But Truthset thinks that the marketing world needs to take a step back and first consider the accuracy of all that data—because a lot of it may be seriously flawed.
Toward that end, the San Francisco-based firm just launched the Truthset Data Collective, made up of 20 players in the data space—including Epsilon, Verisk, Fluent, Alliant and TargetSmart—that are together focused on “validating the accuracy of consumer data,” per the group’s mission statement.
Truthset, which was founded in 2019 by veterans of Nielsen, Salesforce, LiveRamp and Procter & Gamble, has actually been quietly working with its partners in private beta over the past few years. What that has meant in practice is that Truthset got access to massive overlapping datasets related to millions of consumers and billions of data points, giving it the ability to compare and benchmark various data segments algorithmically.
The endgame is not only good PR—data companies coming together to champion accuracy is obviously a good look—but an opportunity for Truthset to further position itself as the go-to data-validation service for marketers looking to better target customers.
Ad Age’s Simon Dumenco spoke with Founder-CEO Scott McKinley and President-Chief Revenue Officer Chip Russo about their company and the Truthset Data Collective.
Ad Age: So before we dive into the Truthset Data Collective, let’s talk about Truthset itself. What’s your elevator pitch as a company?
Scott McKinley: Basically, we set out to create a service that allows anybody who is using consumer data to measure the accuracy of every record they’re using. At its essence, our service lets a user of consumer data look at a confidence score and decide whether or not to include an ID in a given operation.
Ad Age: Give us an example. Like, say I need to reach Hispanic consumers and I’ve rented a list.
McKinley: So, if you’re building an audience that’s supposed to be Hispanics, that data is going to come from all over the place—inferences, probabilistic guesses, some deterministic data—and there’s going to be some amount of error in that file. We allow the person who’s looking at that audience to know exactly which records are likely to actually be Hispanic and which ones aren’t.
Ad Age: How do you do that?
McKinley: When we set up Truthset, we decided to put together a census-level view of everything. We went to 20 data providers, asked them to send all of their data to us, and we built an algorithm that runs across all of those data providers. We have independent truth sets—we call them validation sets—that we use to test and train our models.
So what happens, just to simplify it, is we first take every data provider and we test them by themselves against this validation set to see how good they are at getting every attribute right. So if a provider is 60% accurate at getting gender right—which is pretty bad, but it’s also pretty standard—that’s their score for getting gender right. And we assign confidence scores to every single record for 25 different attributes.
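The per-provider scoring step McKinley describes can be sketched roughly as follows. This is a toy illustration, not Truthset’s actual algorithm; the data structures, function name and sample records are all invented for the example:

```python
def provider_accuracy(provider_data, validation_set, attribute):
    """Share of overlapping records where the provider agrees with the validation set."""
    matches = total = 0
    for record_id, truth in validation_set.items():
        claimed = provider_data.get(record_id, {}).get(attribute)
        if claimed is None:
            continue  # provider makes no claim for this record
        total += 1
        if claimed == truth[attribute]:
            matches += 1
    return matches / total if total else 0.0

# Toy example: a provider that gets gender right on 2 of 3 overlapping records.
validation = {
    "id1": {"gender": "F"},
    "id2": {"gender": "M"},
    "id3": {"gender": "F"},
}
provider = {
    "id1": {"gender": "F"},
    "id2": {"gender": "F"},  # wrong
    "id3": {"gender": "F"},
}
print(round(provider_accuracy(provider, validation, "gender"), 2))  # 0.67
```

In practice this accuracy score would then feed into record-level confidence scores across each of the 25 attributes, but that weighting is Truthset’s proprietary layer.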