
Machine Learning: Beware the Bias

By Julie Wittes Schlack

Our biases are expressed in the very data we collect, infecting its analysis as well, writes Julie Wittes Schlack. Credit: CurvaBezier/iStock

In the past year or so, I've been struck by how we tend to anthropomorphize the machines and mechanize the human beings whose data we rely on. The machines are "ingesting," "consuming," and "learning," while people's life stories, videos and photos of loved ones -- the meaningful moments of their lives -- are flattened into consumable "documents" or "transactions."

This observation is not an excuse to rant against artificial intelligence. On the contrary, the more I work with machine learning, the more appreciative I am of its accuracy and productivity benefits. Rather, it is a call to recognize why human intelligence and compassion are, and will remain, necessary companions to AI.

Why? Because AI falsely seduces us into believing that the software's conclusions are somehow "objective," free of the bias that inevitably informs purely human-based methods of interpretation.

We bring our biases -- our tendency to interpret comments through the filter of our own beliefs -- to the training task itself. For example, to make our automated systems smarter, we teach them to read the nuance in even a simple phrase like "Love it!" and decide whether the sentiment is sarcastic or sincere. If I'm the person assembling the training set and correcting the system's errors, I'm likely to perpetuate my own conviction that people who use that phrase about the flavor of Cheez Whiz or the music of ABBA can't possibly be sincere. But my Euro-pop-loving colleague might teach the machine the opposite lesson.
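To make that concrete, here's a minimal sketch of how two annotators' labels bend the same algorithm in opposite directions. The comments, labels and scikit-learn pipeline are my own invention for illustration, not the actual system described above:

```python
# A hypothetical sketch: identical comments, labeled by two annotators
# with different convictions, yield classifiers that disagree on the
# same new input. All data is invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

comments = [
    "Love it! Cheez Whiz on everything",
    "Love it! ABBA forever",
    "This product ruined my day",
    "Absolutely delicious, will buy again",
    "Great, another fee. Love it!",
    "Love it! Best purchase this year",
]

# Annotator A hears sarcasm whenever "Love it!" is aimed at Cheez Whiz
# or ABBA; Annotator B takes those same comments at face value.
labels_a = ["sarcastic", "sarcastic", "negative", "positive", "sarcastic", "positive"]
labels_b = ["positive",  "positive",  "negative", "positive", "sarcastic", "positive"]

model_a = make_pipeline(CountVectorizer(), MultinomialNB()).fit(comments, labels_a)
model_b = make_pipeline(CountVectorizer(), MultinomialNB()).fit(comments, labels_b)

test = ["Love it! ABBA's greatest hits on repeat"]
print(model_a.predict(test))  # likely ['sarcastic'] -- Annotator A's bias
print(model_b.predict(test))  # likely ['positive']  -- Annotator B's bias
```

Same raw comments, same algorithm; the only difference is whose convictions produced the labels.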

What's more, our biases are expressed in the very data we collect, because often the context in which the data was collected is crucial to its proper interpretation. For instance, a generally smart concept-extraction tool we've been using at C Space told us that members of several financial services communities were having an inordinately high volume of conversation about music. Why? Because they were discussing "records" and "CDs" … but not the kind made by John Legend.
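Here's a toy version of that failure mode, with invented keyword lists and comments standing in for the real concept-extraction tool:

```python
# A hypothetical sketch of a context-blind concept extractor: bare
# keyword lists stand in for the real tool, and the comment is invented.
CONCEPT_KEYWORDS = {
    "music": {"records", "cds", "album", "concert"},
    "banking": {"account", "deposit", "interest", "rate"},
}

def extract_concepts(comment):
    """Tag a comment with every concept whose keywords appear in it."""
    words = set(comment.lower().split())
    return {concept for concept, kws in CONCEPT_KEYWORDS.items() if words & kws}

# A member of a financial-services community discussing savings vehicles:
print(extract_concepts("i moved my cds into a new deposit account"))
# -> {'music', 'banking'} (set order may vary). "CDs" here means
# certificates of deposit, but without context the extractor hears music.
```

Disambiguating "CDs" takes context the keywords alone don't carry: which community the comment came from, and what vocabulary surrounds it.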

In another case, we've been making the machine smarter by training it to recognize that "pumpkin" is relevant to Halloween, but those training sets are useful only if we're feeding it training data that was harvested in October. Pumpkin content mined in spring is more likely to be about gardening, while mentions during Restaurant Week are more likely to be enticements to sample Middle Eastern cuisine.
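One way to respect that seasonality, sketched with invented dates and posts, is to filter the training data to the window in which the word's meaning actually holds:

```python
# A hypothetical sketch: the same word, "pumpkin", signals different
# topics depending on when the post was harvested. Data is invented.
from datetime import date

posts = [
    (date(2017, 10, 12), "Carving a pumpkin for the porch tonight"),  # Halloween
    (date(2017, 4, 3), "Starting pumpkin seeds in the cold frame"),   # gardening
    (date(2017, 8, 15), "Restaurant Week special: pumpkin kibbeh"),   # dining out
]

def october_only(examples):
    """Keep only posts from October, the window in which 'pumpkin'
    reliably signals Halloween rather than gardening or dining."""
    return [(d, text) for d, text in examples if d.month == 10]

training_set = october_only(posts)
print(training_set)  # only the Halloween-season post survives
```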

While most of our work with machine learning focuses on text analytics and image recognition, the bias inherent in a data set influences a wide range of algorithms applied to all kinds of data. For instance, when Pokémon Go was first released, users quickly discovered a paucity of Pokémon locations in black neighborhoods. That's because its initial database of locations was drawn from a prior augmented reality game whose locations reflected the activities and requests of its predominantly white, English-speaking male user base -- a population less likely to frequent those neighborhoods.

This is just one of many examples of how machine learning is inevitably based on what is, not on what could be; on established patterns rather than possibility. So as useful as it is, by relying exclusively or even predominantly on machine learning, we risk constraining our ability to innovate. Data indicates what people do within the realm of what's available and observable, but it has neither conscience nor imagination. If, in the early 1960s, publishers had trained a computer to recognize great American literature, we would still be reading little but white male authors today. That's because, as data scientist Cathy O'Neil has observed, "… Big Data processes codify the past. They do not invent the future." Breakthroughs are not forged by millions of data points, but by a spark in the mind and heart of a single individual.

Recognizing and overcoming bias requires self-awareness, and that's a uniquely human -- or at least organic -- trait.
