A Data Lab Rat in the Big City: Why Trackers Couldn't Trap This City Dweller
I am my loyalty card -- or am I?
When I purchased cereal and seltzer at a regional grocery-store chain last month, information about the items I purchased -- the cost, packaging types, date and store location -- was sucked into a database associated with a shopper ID number that can be linked back to my name and other personal data. This data is among multiple streams of information marketers gather about me and is crucial to painting a complete picture of who I am as a consumer.
But that picture, it turns out, is rather blurry. I asked companies that handle offline and online data to assist in a three-week tracking experiment. During that period, the data collectors had no clue I bought a whole sea bass at the farmer's market a couple of days after the grocery store visit (I paid cash), and some even struggled to pin down something as seemingly basic as my home address. This admittedly unscientific study showed that, even though I agreed to let the trackers watch me all they wanted, there were plenty of gaps in what they could learn about me.
Companies that forage for data nuggets about us sometimes are willing to share what consumer segment they've lumped us into -- one that drives a midlevel sedan or loves sports. I wanted to know how they reach those conclusions.
After three weeks during which five companies tracked my loyalty cards, mobile location and other information, it became clear that dark data corners like these are especially abundant in crowded cities such as the one I live in.
While some of their observations were spot on, my trackers found me only sporadically and couldn't always shadow me in my home city of Jersey City, N.J., which is apparently the Bermuda Triangle of data collection. This might comfort privacy advocates, but marketers pouring millions of dollars into data insights should consider that data's limitations.
This limited experiment managed to expose serious flaws in tracking urban dwellers; in particular, it's difficult to distinguish an individual, such as myself, from neighbors belonging to varied ethnic, financial and demographic groups like those in Jersey City. Anyone following U.S. population trends knows how problematic that is. The young people marketers always pursue are more city-centric than ever. According to Nielsen's "Millennials: Breaking the Myths of this No Strings Attached Generation" report published this year, 62% of millennials said they prefer living in urban areas where they can be near eateries, shops and their workplaces. The research firm found that growth in U.S. cities outpaces growth beyond their limits for the first time since the 1920s.
Few companies in the consumer-data industry are willing to draw back the curtain in the way Ad Age requested. Most declined to participate, citing legal or privacy issues, or concerns that too much proprietary information would be exposed. Three of the five project partners were especially instrumental in exposing holes in urban data: 4Info harvests location data through mobile-phone apps; Catalina grabs exact product-purchase information from grocery and drugstore loyalty programs; and Speedeon Data compiles public and third-party data in an attempt to understand who's in my household and neighborhood.
These companies and the other two firms that took part (Truste and Crimson Hexagon) normally wouldn't store my data in such a way that it is easily associated with my personal identity. While large data aggregators thrive by linking information about people using personally identifiable information, they stress that they are interested in doing so only to corral people into targetable segments. These companies have secure warehouses of aggregated data linked to shopper numbers and device IDs, usually storing personally identifiable data separately, if at all. The partner companies I worked with needed my explicit permission to show me the information they had collected about me.
People often think of data tracking as it relates to what we do online, where data flows continuously and immediately. The most interesting stuff compiled for this project, though, came from the real world. I visited numerous places during the three-week research period, but to see many of them plotted on a map and listed by 4Info with latitude, longitude, date and time was eye-opening.
Ultimately, 4Info wants to figure out which of the locations it spots me in is my actual home address. It takes that piece of inferred personal data to partners like Acxiom and Speedeon, who match it to data about who lives there. That combined information is then stripped of personal identifiers and plugged into an audience segment.
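The hand-off described above can be pictured with a minimal sketch. It is an illustration only, not 4Info's or its partners' actual system: the `Household` record, the `to_segment_record` function, the sample address and the segment label are all hypothetical. It assumes the match is a simple lookup by address and that the device ID is the only key kept after personal identifiers are dropped.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Household:
    """Hypothetical partner-side record keyed by street address."""
    address: str
    surname: str
    segment: str  # e.g. "sports fan"


def to_segment_record(
    device_id: str,
    inferred_address: str,
    households: Dict[str, Household],
) -> Optional[Dict[str, str]]:
    """Match an inferred home address to household data, then keep only
    the device ID and the audience segment (no name, no address)."""
    match = households.get(inferred_address)
    if match is None:
        return None  # no household found, so no segment can be assigned
    return {"device_id": device_id, "segment": match.segment}


# Made-up sample data for illustration.
households = {
    "12 Example St": Household("12 Example St", "Doe", "sports fan"),
}
record = to_segment_record("device-123", "12 Example St", households)
print(record)
```

The point of the sketch is the order of operations: the address is needed to make the match, but it does not survive into the segment record.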
But 4Info needs something to work with. In order for the company to keep track of me, I downloaded mobile applications including the ABC News app, an exercise app called Daily Workouts, and AroundMe, which lists restaurants, gas stations and other places nearby based on current location. All the apps required me to allow location data tracking -- a common request for ad-supported apps that send phone location data to firms like 4Info for ad targeting.
At the end of the experiment, 4Info gave me data showing around 20 locations its system spotted me in, some of them multiple times, based on where I was when I opened an app. The company found me near Grand Central, where I take the subway to and from the Ad Age offices. It tracked me at Liberty State Park on the Hudson in Jersey City. And it eyed me at Yulie's Place, a Hispanic food joint near my apartment that I walk by often -- but have never gone into. It also noticed a trip I made to Niagara Falls, N.Y.
A lot of what these data hunters readily retrieve about other consumers remained elusive in my situation. Not only am I not an app super-user or loyalty-card addict, but I also live in a large city where the places I frequent are not easily distinguishable from others near them. More important, my apartment is in a multifamily rowhouse with others on either side of it, making it difficult for any mobile tracking system to pinpoint my precise address.
4Info never found my address even though I opened the tracked apps in my home. That's partly because the system is designed not to jump to erroneous conclusions. Not until it registers a user multiple times at a single residence does it assign that residence as a home address. During the three-week experiment, it came close (two neighboring houses two times each), but not close enough.
Without that key piece of data, the connections that 4Info could have made between my mobile device, my current location, where I live, my interests and previous purchases were tenuous at best.
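The home-assignment rule described above can be sketched roughly as follows. This is a guess at the general shape of such a rule, not 4Info's algorithm: the `infer_home` function and the `MIN_SIGHTINGS` value are hypothetical. The article says only that a device must be seen "multiple times" at one residence and that two sightings at each of two neighboring houses fell short, so a threshold of three is an assumption.

```python
from collections import Counter
from typing import Optional, Sequence

# Assumed threshold: two sightings at one house were not enough in the
# experiment, so this sketch requires at least three.
MIN_SIGHTINGS = 3


def infer_home(
    sightings: Sequence[str], min_sightings: int = MIN_SIGHTINGS
) -> Optional[str]:
    """Return the residence seen most often, or None when no single
    residence clearly reaches the required number of sightings."""
    top_two = Counter(sightings).most_common(2)
    if not top_two:
        return None
    residence, count = top_two[0]
    if count < min_sightings:
        return None  # not seen often enough at any one residence
    if len(top_two) > 1 and top_two[1][1] == count:
        return None  # tie between residences: too ambiguous to assign
    return residence


# Two neighboring houses, two sightings each, as in the experiment.
observed = ["house_a", "house_b", "house_a", "house_b"]
print(infer_home(observed))
```

Under these assumptions the rowhouse scenario yields no home address, because neither neighboring house reaches the threshold.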
"Because our algorithm requires a high degree of certainty before it associates a device with a household, it can take more time, and we simply didn't see your device often enough within the time frame of the test to be assured of a match," Kirsten McMullen, 4Info's chief privacy officer, told me.
As a result, I got fewer ads, and the ones I did get were irrelevant. I was served a mobile banner ad for a Swarovski necklace; I don't wear jewelry. An ad in the workout app pictured a gurgling baby and promoted Gerber yogurt; I don't have kids (and hopefully the algorithms haven't discovered I'm pregnant).
Most location data has its fuzzy spots, said Sue Davidson, senior VP-analytics and accountability at digital agency R/GA, who added, "It's probably going to get better."
"You have to be working with all materially different forms of identity linkage to overcome the problems you are raising," said Rick Erwin, president-consumer insights and targeting at Experian Marketing Services, which partners with 4Info. He also suggested the fact that city dwellers change residences a lot is "a bigger problem."
The city-shopper disconnect
Another reason why marketers can't easily pin down urbanites? How we shop.
Many city dwellers shop for food at smaller markets that aren't aligned with loyalty programs or other means of tracking shoppers. That prevents companies that gather data from large retailers from knowing most of what I buy on a regular basis.
Consumers join loyalty programs for discounts, but the programs are even more valuable to the chains that offer them -- and their data partners. Long before the tracking project began, I was an A&P my+Rewards cardholder and was accustomed to receiving targeted coupons along with my paper receipt when making larger shopping trips. I wanted more reporting fodder, though, so Catalina asked me to join other programs from which it gleans shopper data. Scannable plastic tags from Rite Aid's Wellness+ program and the ShopRite Price Plus Club now dangle from my key ring.
By joining those programs I allowed the retailers to track what I bought from them and attach that information to a shopper ID associated with my name and other personal data. What shows up in Catalina's database is a near-mirror reflection of what I see on my printed receipts. My shopper IDs for each store are linked to around 50 items I purchased during the three weeks from Catalina's retail partners.
The A&P data Catalina showed me goes back to November 2013 and encompasses purchases of another 100-or-so items. I can see I purchased two boxes of Kashi GoLean Crisp cereal at "Store 3" (which I know is A&P) for a total of $8.58 in March. The data, which lists the product's UPC, triggered a coupon for the same item. In addition to enabling customized discounts in-store, Catalina partners with 4Info to aim mobile ads at people based on their in-store purchases. I received no targeted offers at A&P, ShopRite or Rite Aid during the tracking period.
Catalina's product data is precise, but there are gaps in produce purchase info. Retailers usually have store-specific PLUs, or price lookup codes, associated with vegetables, fruits, fresh herbs and the like. Because those codes vary from store to store, Catalina doesn't know that I bought two artichokes from A&P on Sept. 6.
But such purchases don't necessarily matter to the big CPG companies that tap Catalina data to learn about who's buying their packaged goods.
More smoggy city data
General information that accurately reflects city residents can also be difficult to obtain and in many cases is irrelevant in my tightly packed, multiethnic neighborhood. Jersey City is the 21st most ethnically diverse city in the U.S., and the most ethnically diverse city on the East Coast, according to Speedeon. And it's a proverbial sardine can: Jersey City has a population density of 16,700 people per square mile; cities like Omaha, Neb., and Columbus, Ohio, have about 2,680 people per square mile.
Drilling down to the zip+4 level, the modeled data intended to define the small area around my home is spotty. Speedeon data is correct in suggesting I am college educated and enjoy travel. It also says I'm likely to be Asian or Southeast Asian. Many of my neighbors are Asian, but I am not. Neither is the musician from Peru who lives below me, or the Puerto Ricans, Dominicans and African-Americans on my block.
When companies can't determine home addresses or other specific data to categorize consumers, "A marketer might default and say, 'What are people like in that neighborhood?'" said Speedeon Chief Operating Officer Joshua Shale. "Jersey City is designed to defeat that."
What the research revealed about people living in tightly packed places is typical, suggested Peter Vandre, senior VP, digital analytics practice leader at marketing tech consultancy Merkle. "I think if you were to rinse and repeat this exercise for a bunch of people living in other very dense urban locations you would get very similar results."
Steve Simpson isn't so quick to write off urban data efforts. The global head of data analytics for Starcom MediaVest Group said it's important to remember that marketers can work with large data brokers and consultancies like Merkle, Epsilon, Acxiom and Experian, which gather lots of disparate data and try to connect it to individuals using keys like an email address or home address. My experiment's time frame also likely limited insights. "If they looked at you for six months they would identify patterns in that data," he said.
Before anyone living in a bustling metropolis is lulled into a false sense of data obscurity, let's remember that most of us supply companies with data about our purchases, locations, online interactions, social connections, driving habits, exercise routines and more, making much of what we spend our time and money on at least somewhat visible to marketers.
If anything, this brief tracking experiment highlighted the fact that much of the information floating around about us exists in databases that are not easily linked, due to technical hurdles as well as privacy and security safeguards. That's why companies with lots of first-party data -- stuff collected through their own properties -- such as Google, Facebook and Amazon, are formidable players in the consumer data arena.
"There are thousands of companies out there that do some little piece of the puzzle," said Mr. Erwin.
The firms participating in this project offered a fascinating peek at my bit part in today's consumer-data economy. There's no question we should be aware of the data we disseminate and its potential impact on our lives, but in some ways the project was a reminder that information that can be quite telling about individuals is not always evident in data. The data sleuths didn't track the futility of my watching a late-season Mets game at a bar with an unreliable internet connection, and they missed that I paid cash to see a rock show in Brooklyn. Indeed, much of who we really are lies between the rows and columns of our data dossiers.