The Report Button Won’t Save Us: The Hidden Reality of Content Moderation

We’ve all been there. You’re scrolling through social media, maybe Facebook or Twitter, and you see it. A comment so hateful it makes your stomach turn. Someone telling a stranger they don’t deserve to exist because of who they love. A meme mocking a religious minority with a “joke” about violence. You hesitate for a moment, then click the report button. A cheerful little message pops up: “Thank you. We’ll review this content.”

And then… nothing. Days pass. Maybe the content disappears, maybe it doesn’t. You never hear back. You move on with your life, vaguely hoping someone, somewhere, handled it.

There is an uncomfortable question nobody asks: what really happens after I click that button?

The answer isn’t reassuring. Behind that cheerful “thank you” lies a vast, largely invisible machinery of human suffering, broken algorithms and corporate buck-passing that protects platforms far more than it protects people. And if we want online spaces that don’t constantly fail the most vulnerable among us, we need to understand how that machinery really works and why it’s designed the way it is.

Cleaning Your Clean Feed

Let’s start with a truth that might surprise you: when you report hate speech, an algorithm probably isn’t making the final call. A human being is, and they are almost certainly not sitting in a Silicon Valley office with free kombucha and a standing desk.

They’re more likely in Manila, Gurgaon or somewhere in the American Midwest. They’re probably a contractor, not a full-time employee. They might be earning just a few dollars an hour to watch the absolute worst material humanity can produce so that you don’t have to.

Media scholar Sarah T. Roberts spent years investigating this hidden workforce, and what she found is disturbing. In her book Behind the Screen, Roberts (2019) documents how commercial content moderation has become an “essential practice” for social media platforms, yet it remains a “yucky job”, as one Microsoft spokesperson memorably called it (Roberts, 2019, p. 39). Moderators review child exploitation material, graphic violence, hate speech and animal abuse. They do it under conditions that are, to put it mildly, not great.

These workers are “fractured organizationally and geographically” (Roberts, 2019, p. 39), deliberately scattered across different employment arrangements that make it hard for them to organise or even recognise each other. Some work in-house at tech giants, but often as temporary employees or contractors. Some work for specialised “boutique” firms that manage a single brand. Many work in call-centre-style operations, particularly in the Philippines and India. Some do “microlabor”: digital freelance work where they’re paid pennies per image reviewed, with no benefits, no job security and no real connection to the company whose brand they’re protecting.

They sign non-disclosure agreements that prevent them from talking about what they see. They receive minimal psychological support, and when they burn out, which they often do, there’s always another worker ready to take their place in the global race to the bottom for cheap labour.

Clicking the report button kicks off a chain reaction, activating a global supply chain of precarious, underpaid, psychologically damaging work. The platforms that profit from your engagement would really prefer you didn’t think too hard about that.

It’s Not a Bug, It’s a Design Feature

But surely, you might think, the problem is just scale. Billions of users, millions of posts per minute: of course some bad stuff slips through. If we just had better algorithms, more moderators or clearer rules, we could fix this.

That’s what platforms would like you to believe. However, the evidence suggests something more troubling: harmful content persists not in spite of platform design but because of it.

Take Reddit. Media researcher Adrienne Massanari (2017) spent years studying the platform and found that its very architecture (the karma point system, the upvote/downvote mechanics, the way content aggregates across subreddits) creates fertile ground for what she calls “toxic technocultures”. The karma system doesn’t just neutrally reflect what users value; it actively shapes behaviour by rewarding certain kinds of content and punishing others. Posts that align with the site’s dominant culture (young, white, male, geeky and often anti-feminist) get upvoted and amplified. Voices that challenge that culture get buried or harassed into silence.

Consider the 2014 “Fappening”, when stolen nude photos of Jennifer Lawrence and other celebrities were distributed across Reddit and the platform’s algorithms actively amplified the content. The subreddit hosting the images gained 100,000 subscribers in 24 hours. Links to the stolen photos appeared constantly on /r/all, the site’s front page for non-logged-in users. Reddit administrators hesitated to ban the subreddit, and one reason was simple: in six days, users purchased enough “Reddit gold” (a paid currency) to run the entire site for a month (Greenberg, 2014).

Reddit profited from the non-consensual distribution of intimate images, and its design (the voting system, the aggregation algorithm and the hands-off moderation philosophy) made that profiteering possible.

Scholar Ariadna Matamoros-Fernández (2017) coined the concept of “platformed racism”, which recognises that social media platforms, through their moderation practices, can create a safe harbour for racism. The concept can be adapted to the “Fappening” by describing it as an instance of platformed sexism: sexism that has been amplified and manufactured not just by users but by Reddit’s fundamental design.

Platforms claim to be neutral conduits for user expression, but their design choices around what gets amplified, what gets hidden, what’s easy to report and what isn’t reflect values and priorities.

And those priorities? Engagement, growth and advertiser comfort. Not justice.

Case Study: Andrew Tate and the Affiliate Loophole

You may have heard that Andrew Tate, the controversial influencer, was banned from social media. In 2022, after declaring that women “belong” to their male partners and that depression “isn’t real”, he was removed from YouTube, TikTok, Instagram and Facebook. The platforms made public statements. The accounts disappeared. Problem solved, right?

Not even close.

In early 2024, a VICE News investigation revealed something startling: YouTube had been generating thousands of pounds in advertising revenue from channels actively recruiting for Tate’s “The Real World” business academy, an online programme that critics describe as a pyramid scheme operating in a “cult-like atmosphere” (VICE News, 2024). These weren’t obscure accounts. The largest had over 600,000 subscribers and had accumulated more than 450 million views before finally being terminated.

Here’s the detail that should concern you: YouTube only took action after VICE News flagged the channels and asked for comment. This reveals systemic failures in how platforms enforce their own rules.

While Tate’s official accounts were banned, a vast network of affiliate channels continued posting his content. Videos with titles like “Andrew Tate’s $100,000 Birthday Celebration” and “Andrew Tate – I’m Still Standing (Music Video)” appeared months after the supposed ban. Researchers found these channels functioned as recruitment arms for Tate’s business. They pushed his “hateful rhetoric and business ventures” to young male audiences (VICE News, 2024).

The two largest Tate-affiliated channels terminated by YouTube were monetised. YouTube confirmed they carried advertising that generated revenue for the platform, though the company declined to specify exactly how much. Callum Hood from the Center for Countering Digital Hate put it bluntly: “Not only does high-engagement content like Andrew Tate’s videos generate revenue for the platform but proper content moderation also costs money. It’s sometimes more profitable for YouTube to turn a blind eye and try to get away with the bare minimum” (VICE News, 2024).

Content like Tate’s exploits how algorithms work. Joanna Schroeder, who studies gender and media representation, explains that platforms often serve young male users increasingly extreme content because “algorithms push content that is often extreme. Extreme views, hate-filled views get a lot of traction on places like YouTube” (Wilson, 2022). What starts as gaming content can gradually lead viewers into the “manosphere”, a network of anti-feminist and misogynistic communities that researchers describe as “one big cesspool of hate” (Wilson, 2022).

Australian campaigner Nathan Pope, who has worked to expose Tate’s operations, described the pattern of platforms ignoring affiliate-uploaded content as systemic: “Social media platforms are willing to facilitate the blatant exploitation of children for profit. Something needs to be done” (VICE News, 2024).

The Tate case demonstrates the gap between what these platforms say and what they do. The ban was announced with fanfare and created the impression of decisive action. Behind the scenes, the platforms’ own profit incentives and algorithmic design allowed Tate’s content to continue reaching millions. The affiliate network exploited a fundamental weakness in content moderation: platforms struggle to address coordinated campaigns that don’t use the banned individual’s official accounts. Enforcement only happened when journalists started asking questions.

What This Means for You (And What We Should Demand)

So let’s return to that report button. The one you clicked in frustration, hoping someone would fix the problem.

Your report joined a queue of millions. Somewhere a contract worker, probably underpaid and probably in a country you’ve never visited and probably working a gruelling shift, glanced at it. They had seconds to decide. They may not have understood the cultural context. They may have been following a training manual written in English and machine-translated into their language. They may have erred on the side of leaving content up because removing content creates more work than leaving it. Or they may have removed it and the person who posted it may have appealed and a different moderator may have restored it.

The platform recorded your click as evidence that its community flagging system is working. Advertisers saw that the platform is responsive to user concerns. Maybe the platform’s quarterly transparency report noted a slight increase in proactive detection of hate speech. The content you reported might still be there. Or it might be gone and replaced by something nearly identical from a different account.

This is not a system designed for justice. It’s a system designed for scale and efficiency and liability management. Every design choice, from the report button to the opaque removal process to the reliance on outsourced labour, reflects the fundamental priority to protect the platform first and users second.

So what would a better system look like?

Transparency. We should know who moderates content, under what conditions, with what training and using what cultural knowledge. The “black box” of moderation protects platforms from accountability. It doesn’t protect users from harm.

Local expertise. Facebook’s “trusted partner” program, which connects the company with civil society organisations that understand local contexts, is a step in the right direction. As one study found, however, the program is opaque, inconsistent and unknown to many who need it most (Sinpeng et al., 2021, p. 20). Platforms should publicly identify trusted partners in every country, fund their work and give them real power to escalate serious concerns.

Accountability for design choices. When recommendation algorithms amplify hate speech, platforms should be able to explain why and change course. When affiliate networks circumvent bans, as in the Tate case, platforms should be held responsible for the harm that continues on their watch. When platforms profit from content that harms vulnerable people, as Reddit did during the Fappening and as YouTube did with Tate recruitment videos, they should face consequences.

As these platforms penetrate further into our personal lives and gain greater political influence, users’ ability to obtain real recourse when they encounter harmful content should be a priority. At the same time, all users need to be clear-eyed about the real-world consequences of the simple report button, and demand a system that does not exist merely to limit liability but promotes justice for content moderators, victims of abuse and everyday users.

References

Business for Social Responsibility. (2018). Human rights impact assessment: Facebook in Myanmar. https://about.fb.com/wp-content/uploads/2018/11/bsr-facebook-myanmar-hria_final.pdf

Greenberg, A. (2014). Hacked celeb pics made Reddit enough cash to run its servers for a month. Wired. http://www.wired.com/2014/09/celeb-pics-reddit-gold/

Massanari, A. (2017). #Gamergate and The Fappening: How Reddit’s algorithm, governance, and culture support toxic technocultures. New Media & Society, 19(3), 329-346. https://doi.org/10.1177/1461444815608807

Matamoros-Fernández, A. (2017). Platformed racism: The mediation and circulation of an Australian race-based controversy on Twitter, Facebook and YouTube. Information, Communication & Society, 20(6), 930-946. https://doi.org/10.1080/1369118X.2017.1293130

Roberts, S. T. (2019). Understanding commercial content moderation. In Behind the screen: Content moderation in the shadows of social media (pp. 33-72). Yale University Press.

Sinpeng, A., Martin, F. R., Gelber, K., & Shields, K. (2021). Facebook: Regulating hate speech in the Asia Pacific. Department of Media and Communications, The University of Sydney. https://hdl.handle.net/2123/25116.36

VICE News. (2024, February 7). YouTube profited from Andrew Tate recruitment videos despite ‘banning them’. https://www.vice.com/en/article/youtube-profited-from-andrew-tate-recruitment-videos-despite-banning-them/

Wilson, B. (2022, September 6). Andrew Tate’s been banned from social media. But his harmful content still reaches young men. CBC News. https://www.cbc.ca/news/andrew-tate-social-media-bans-harmful-content-1.6573978

ARIN6902: Digital Policy and Governance