Why Political Data Is a Complete Mess
Politics is hardly ever a clean business, but the voter data underlying campaign strategies can be a real mess.
While today's political data technologies and analytics may look automated and highly precise, the information feeding these systems often comes in far less sophisticated forms, including scanned printouts filled with nonstandardized data. It might even be delivered on a floppy disk.
When Sherrie Preische, a Democratic data cruncher in New Jersey, requested information this spring on election outcomes by precinct in Burlington County, she received a 46-page scan of a paper report from the county clerk's office showing the number of voters registered in each of hundreds of voting districts, how many people voted and for which candidates. Ms. Preische, a partner at Fifty-One Percent, a political data and analytics startup that serves local campaigns in New Jersey, said some election results data is available as an easily digestible digital download from the Burlington County website, but the precinct-level data was not available that way.
"Some county clerks have election data available in a nice form," said Ms. Preische. "Some of them you feel like they're taking it out of a box somewhere." Burlington County is certainly not alone.
Younger campaign staff might not even recognize the format used to store recent election results data passed along to one political data consultant: a floppy disk (shown above). (The consultant asked not to be named.)
The core voter data that political campaigns and consultancies rely on comes from over 3,000 county clerk offices in the U.S. And that election outcome data and voter record information is far from standardized or "clean." From stale addresses to the variety of labels given to voting areas, such as wards, districts and precincts -- some voting sections are known as "beats" in Mississippi -- there are countless variables political consultants, campaign staff, volunteers, polling researchers and data analysts must navigate before they can make sense of the data.
When organizations make a New Jersey Open Public Records Act request for election results data, explained Burlington County Clerk Tim Tyler, they would have to add the information to their own digital databases by manually typing it in from scanned printed tallies. "You would probably have to take the .pdf that we sent you, take the information from there yourself and put it into your own databases," he said.
Not only are the states diverse in how they organize and disseminate this public data, there are discrepancies within the states, too. "The Vermont voter file has 42 different spellings of Burlington," said Allen Fuller, COO at Republican data firm Voter Gravity, referring to misspellings of the Vermont city.
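One common way to collapse spelling variants like these is fuzzy string matching against a list of known place names. The sketch below is only an illustration, not a description of Voter Gravity's process: the function name canonical_city, the sample city list and the 0.8 similarity cutoff are all assumptions, and it relies on Python's standard difflib module.

```python
import difflib

def canonical_city(raw, known_cities, cutoff=0.8):
    """Map a possibly misspelled city name to the closest known name.

    Hypothetical helper: known_cities and the cutoff are illustrative
    assumptions. Returns None when nothing is similar enough.
    """
    # Compare in lowercase, but return the original canonical spelling.
    lookup = {name.lower(): name for name in known_cities}
    matches = difflib.get_close_matches(
        raw.strip().lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None

# Illustrative list of canonical names (an assumption, not real voter-file data).
VERMONT_CITIES = ["Burlington", "Montpelier"]

print(canonical_city("Burlingtn", VERMONT_CITIES))
print(canonical_city("Rutland", VERMONT_CITIES))
```

A cutoff that is too low would merge genuinely different towns, so anything the function cannot match is returned as None and left for a person to review.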
People managing and updating any type of data set need to ensure each piece of information in each data field is in a standardized form in order for automated systems to filter and analyze it. Yet even the simplest address can be expressed in several ways in voter registration data. East Main Street could be displayed as Main Street East, Main St. East, Main Street E, and so on. "You end up with just a ton of possibilities," said Mr. Fuller.
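To make the standardization step concrete, here is a minimal sketch of how the street-name variants above could be reduced to a single form. It is an assumption-laden illustration rather than any firm's actual cleansing routine: the function normalize_street and its small lookup tables are hypothetical, and production systems typically lean on much larger postal abbreviation lists.

```python
import re

# Small illustrative lookup tables (assumptions, far from exhaustive).
DIRECTIONS = {
    "e": "east", "w": "west", "n": "north", "s": "south",
    "east": "east", "west": "west", "north": "north", "south": "south",
}
SUFFIXES = {
    "st": "street", "street": "street",
    "ave": "avenue", "avenue": "avenue",
    "rd": "road", "road": "road",
}

def normalize_street(raw):
    """Return a street name with the direction first and words spelled out."""
    # Lowercase, drop punctuation such as the period in "St.", split into words.
    tokens = re.sub(r"[^\w\s]", "", raw.lower()).split()
    if not tokens:
        return ""
    if len(tokens) > 1 and tokens[-1] in DIRECTIONS:
        # Trailing direction ("Main Street E"): move it to the front.
        tokens = [DIRECTIONS[tokens[-1]]] + tokens[:-1]
    elif tokens[0] in DIRECTIONS:
        # Leading direction ("E Main St"): spell it out.
        tokens[0] = DIRECTIONS[tokens[0]]
    # Expand a trailing street-type abbreviation ("st" becomes "street").
    tokens[-1] = SUFFIXES.get(tokens[-1], tokens[-1])
    return " ".join(tokens).title()

for variant in ["East Main Street", "Main Street East", "Main St. East", "Main Street E"]:
    print(variant, "->", normalize_street(variant))
```

Rules this simple break down on real street names (a street literally named "E Street," or "St" meaning "Saint"), which is part of why the cleanup Mr. Fuller describes is so laborious.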
Rural addresses have their own quirks. An address in a desolate part of Oklahoma, for example, could be as inexact as "2 miles past the railroad tracks" in a given database.
"States have precinct names like 'Fire Station 103' but also IDs like 'az417,'"said Mr. Fuller. "The Secretary of State file has the ID but activists on the ground are used to working with the names."
And, yes, deceased people could end up on walk or phone-banking lists. "I guarantee you there are dead people on your voter file and I guarantee you there are going to be more dead people on your voter file tomorrow," said Bill Russell, voter file consultant at Democratic data and analytics firm TargetSmart.
The messier the data, the more confusion in real-world scenarios. A list provided to a volunteer headed out to knock on doors to remind supporters to vote might include incorrect or jumbled information if the data isn't scrubbed and standardized.
Analysts need to use voter history, party-affiliation data and party-supplied voter IDs to help campaigns determine which voters are likely supporters and which are better left alone. In Virginia, for instance, voters do not designate party affiliation when registering, and political databases might include multiple IDs from party organizations or analytics firms showing the degree to which a voter is a likely supporter of a party.
Mr. Russell has witnessed plenty of disorderly political data in the 15 years he's dealt with voter files. When TargetSmart, which partners with NGP VAN, one of the most widely adopted data platforms for Democratic campaigns, puts voter information through its cleansing process, the company makes changes to more than 20% of the records on average, he said.
Around 5% are duplicates, meaning the same voter is present in data from more than one state or county, he said. As anyone who's received mail sent to a previous resident of her home can attest, the data doesn't always keep pace with people as they move from place to place.
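A simplified picture of de-duplication: treat records that share a normalized name and date of birth as the same person, and keep the most recent registration. This is a sketch under stated assumptions, not TargetSmart's method; the field names (name, dob, county, registered) and the sample records are invented for illustration, and real matching has to cope with name changes, typos and people who happen to share a name and birthday.

```python
from datetime import date

def dedupe_voters(records):
    """Keep one record per (name, date of birth), preferring the newest registration.

    Hypothetical record layout: dicts with "name", "dob", "county", "registered".
    """
    latest = {}
    for rec in records:
        # Normalize case and whitespace so trivially different spellings match.
        key = (" ".join(rec["name"].lower().split()), rec["dob"])
        kept = latest.get(key)
        if kept is None or rec["registered"] > kept["registered"]:
            latest[key] = rec
    return list(latest.values())

# Invented sample data: the same voter appears in two counties after a move.
sample = [
    {"name": "Jane Q. Public", "dob": date(1980, 5, 17),
     "county": "Burlington", "registered": date(2010, 9, 1)},
    {"name": "jane q. public ", "dob": date(1980, 5, 17),
     "county": "Camden", "registered": date(2014, 3, 12)},
    {"name": "John Doe", "dob": date(1975, 1, 2),
     "county": "Burlington", "registered": date(2008, 6, 30)},
]

for voter in dedupe_voters(sample):
    print(voter["name"].strip(), voter["county"])
```

Keeping the newest registration reflects the assumption that a voter who moved is now registered at the later address; a cautious workflow would flag the older record for review rather than discard it outright.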
That is especially true in urban locales filled with renters, said Ms. Preische. "When you get into urban centers -- and we have several of them in New Jersey -- people move several times a year. People's data in those kinds of situations are just much harder to track and much harder to use," she said. "It's not that they don't vote, but tracking them from year-to-year and having them carry along all of their consumer data and all that stuff, it's very messy; it doesn't translate well."
Political researchers, media buying firms and analytics companies often layer on consumer data in the hopes of understanding groups of voters more holistically by integrating demographic or ethnic data associated with their neighborhoods, TV viewing data, purchase data and other supplementary information.
Those dealing in political and voter data don't expect federal standards to be put in place any time soon, if ever, but some states are modernizing election and voter-data reporting rules and systems.
The State of California awarded CGI, the international tech services firm that built the notoriously problematic HealthCare.gov site, a $39 million contract in 2013 to implement its centralized voter registration and reporting system, dubbed VoteCal. It will encompass data from the state's county election offices; be used to de-duplicate and update voter registration records; help election officials keep track of voter party affiliations; and generate election results and voter data reports.
However, even as political campaigns grow more data hungry, it might take a while for other governments to adopt 21st-century technologies. When asked by Ad Age whether New Jersey or Burlington County has plans to upgrade reporting to more software-friendly data formats, Mr. Tyler responded, "You're probably the first person I've heard talk about it."