The Truth About Online Commenters, According to The Guardian

Newspaper Does Big Data Deep Dive on Reader Comments, Learns a Thing or Two


In an ambitious bit of data-driven journalism, British newspaper The Guardian commissioned a study of the more than 70 million reader comments posted on its site since 2006. What was learned? Well, the headline on the report gives a hint: "The dark side of Guardian comments."

Some specifics:

[A]rticles written by women attract more abuse and dismissive trolling than those written by men, regardless of what the article is about. Although the majority of our regular opinion writers are white men, we found that those who experienced the highest levels of abuse and dismissive trolling were not. The 10 regular writers who got the most abuse were eight women (four white and four non-white) and two black men. Two of the women and one of the men were gay. And of the eight women in the "top 10", one was Muslim and one Jewish.

And the 10 regular writers who got the least abuse? All men.

A sample visualization from The Guardian's study of reader comments.

In the in-depth report authored by Becky Gardiner, Mahana Mansfield, Ian Anderson, Josh Holder, Daan Louter and Monica Ulmanu, The Guardian says that it decided to treat its millions of comments, including those blocked by moderators over the years, "as a huge data set to be explored rather than a problem to be brushed under the carpet."

In the process, the paper has produced a remarkably comprehensive and transparent look at the online community, or lack thereof, that congregates around its journalism, as well as at the nature of the journalism it produces. For instance, the report offers visualizations of findings such as: "While the number of articles published increased over time, the writers' gender gap stayed pretty much the same" and "Conversations about crosswords, cricket, horse racing and jazz were respectful; discussions about the Israel/Palestine conflict were not. Articles about feminism attracted very high levels of blocked comments. And so did rape."

The report also includes short videos from Guardian journalists talking about how they grapple with hateful reader comments.

Data nerds will appreciate a separate post titled "How we analysed 70m comments on the Guardian website" (sample detail: "We wrote the code in Scala, and deployed it to an Elastic MapReduce cluster on AWS").
