The Delusion of Safe Platforms: Why Social Media’s War on Hate Speech Is A Sham

After years of using social media, I have come to believe one thing: the platforms’ anti-hate campaigns are just a show.

In October 2022, Elon Musk walked into Twitter’s San Francisco headquarters carrying a sink. It was a joke, a literal “let that sink in” moment.

But what followed was no laughing matter. Within weeks, he had fired roughly 80% of the company’s Trust and Safety team, the very people responsible for moderating hate speech. His reasoning? Platforms had become too restrictive. Free speech, he said, was being choked to death.

Fast forward: a peer-reviewed study published in PLOS ONE concluded that hate speech on X had grown 50% overall since Musk’s purchase, with transphobic hate speech up 260% and racist content up 42% (Hickey et al., 2025).

Meanwhile, X’s own transparency report revealed that, out of 56 million reports of hate speech filed in the first half of 2024, the platform suspended just 940,218 accounts for abuse, harassment, and hateful conduct combined: roughly one suspension for every 60 reports (X Corp., 2024).
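As a quick sanity check, the ratio can be reproduced in a few lines of Python. This is only a sketch using the two figures quoted above; the function name `reports_per_suspension` is made up for illustration and is not part of any X tooling.

```python
def reports_per_suspension(reports: int, suspensions: int) -> float:
    """Return how many reports were filed for each account suspension."""
    if suspensions <= 0:
        raise ValueError("suspensions must be positive")
    return reports / suspensions


# Figures as quoted above from X's H1 2024 transparency report.
REPORTS_H1_2024 = 56_000_000
SUSPENSIONS_H1_2024 = 940_218

ratio = reports_per_suspension(REPORTS_H1_2024, SUSPENSIONS_H1_2024)
print(f"About one suspension for every {ratio:.0f} reports")
```

Note that reports and suspensions are different units (many reports can target one account), so the ratio is a rough indicator rather than a precise enforcement rate.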

These figures should make all of us deeply uncomfortable. Not because hate speech exists on the internet, but because the systems built to prevent it are structurally incapable of working.

Hate speech can be defined as speech that “expresses, encourages, stirs up, or incites hatred against a group” based on race, ethnicity, gender, religion, or sexual orientation (Parekh, 2012, as cited in Flew, 2021).

Its victims struggle to take part in community life and to lead independent, satisfying personal lives. The issue is not whether this harm is real; it clearly is. The question is whether platforms can be relied upon to prevent it. And the answer, as the evidence increasingly shows, is no.

Wait, Aren’t Platforms Supposed to Be Fighting This?

Here is the main contradiction: social media platforms profit from the spread of hateful content while claiming to combat it.

This is not a case of good intentions failing. It is a structural conflict built into the business model, and until we recognise it as such, no policy adjustment will solve it.

Advertising is the source of revenue for social media companies. Advertisers buy eyeballs: the attention of users who are consuming content.

And the uncomfortable fact is that outrage, anger, and hostility are shockingly efficient at attracting attention. As media critic Sarah T. Roberts (2019, p. 34) explains, platforms moderate content only to “protect their corporate or platform brand”; the well-being of users targeted by hate speech is not the priority.

Your Feed Is an Outrage Machine, and That’s No Accident

Think about the last time you scrolled through social media and got angry about something. Most likely you kept scrolling, and perhaps even clicked on the post, commented on it, or shared it.

Researchers have found that this response is no accident. It is engineered.

One of the most thorough audits of social media algorithms to date, published in PNAS Nexus in 2025 by Milli and colleagues, found that Twitter’s ranking algorithm amplified “emotionally charged, out-group hostile content” significantly more than a simple chronological feed would have.

More importantly, users reported that this content made them feel worse, yet the algorithm kept pushing it because it increased engagement (Milli et al., 2025).

This is what researchers call the attention economy: platforms compete for your limited attention, and the surest way to win it is to provoke your emotions. Technologist and designer Tobias Rose-Stockwell put it plainly: emotional responses such as outrage are strong indicators of interest. So content that triggers outrage gets prioritised above everything else (quoted in Munn, 2020).
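To make the mechanism concrete, here is a toy Python sketch of the difference between a chronological feed and an engagement-ranked one. Everything in it is an assumption for illustration: the `Post` fields, the sample posts, and the scoring weights are invented, and this is not X’s or any other platform’s actual ranking code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    text: str
    posted_at: int  # hypothetical timestamp; larger means newer
    likes: int
    shares: int
    replies: int


def engagement_score(post: Post) -> int:
    # Hypothetical weights: shares and replies count for more than likes.
    return post.likes + 3 * post.shares + 2 * post.replies


def chronological_feed(posts: list[Post]) -> list[Post]:
    # Newest first, regardless of how people reacted.
    return sorted(posts, key=lambda p: p.posted_at, reverse=True)


def engagement_feed(posts: list[Post]) -> list[Post]:
    # Highest engagement score first, regardless of recency.
    return sorted(posts, key=engagement_score, reverse=True)


# Invented sample data: the oldest post is also the most inflammatory.
posts = [
    Post("Photo of my lunch", posted_at=3, likes=12, shares=0, replies=1),
    Post("Local library extends opening hours", posted_at=2, likes=30, shares=2, replies=3),
    Post("Inflammatory post attacking an out-group", posted_at=1, likes=40, shares=25, replies=60),
]

print([p.text for p in chronological_feed(posts)])
print([p.text for p in engagement_feed(posts)])
```

In this toy data the oldest post tops the engagement-ranked feed purely because it drew the most reactions. The ranking never needs to know that a post is hostile; it only needs hostility to generate clicks.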

Another study found that social media’s reward feedback loop conditions users to express more and more outrage over time, because posts with more moral outrage get more likes and shares, which act as positive feedback.

The authors noted that design choices

“aimed at satisfying other goals such as profit maximisation via user engagement can indirectly affect moral behaviour because outrage-provoking content draws high engagement” (Brady et al., 2020).

That is to say: hate speech is not passively hosted on platforms. It is actively being rewarded by their algorithms.

Researcher Ariadna Matamoros-Fernández (2017, p. 931) calls this “platformed racism”. Platforms are not merely tools for amplifying hateful discourse; they also help produce it through their structure, algorithmic logic, and business model. The site is not a neutral host. It is an active player.

X Marks the Contradiction

The story of X under Elon Musk is the most vivid real-life illustration of this structural failure, yet it is also a mirror reflecting an issue that is present throughout the industry.

In late 2022, when Musk replaced the previous regime, he framed the existing moderation system as censorship. His vision was a free speech platform where content would be downranked rather than removed. On paper this makes sense; as he put it, “freedom of speech, not freedom of reach.” In practice, it exposed how hollow platform moderation had always been.

The Center for Countering Digital Hate (CCDH) put this to the test in August 2023. Using X’s official reporting tools, researchers reported 300 posts containing hate speech, including Holocaust denial, anti-Black racism, and neo-Nazi content.

A week later, 86% of those posts were still live. More troublingly, major brand advertisements were running directly alongside this content, despite X CEO Linda Yaccarino’s public claims that the platform had “built brand safety and content moderation tools that have never existed before” (CCDH, 2023).

X sued CCDH over this research, arguing the data was “cherry-picked.” The lawsuit was dismissed in March 2024. The judge concluded the case was “about punishing the Defendants for their speech” (Stray, 2026).

The greatest irony is that a platform that claims to champion free speech was trying to silence researchers who were documenting its shortcomings.

In my opinion, this hypocrisy is the clearest sign that platform moderation is a performance rather than a promise. The same structural tension applies to Meta, YouTube, and TikTok: their advertising revenue depends on the time users spend on the platform, and hateful, inflammatory content is very effective at keeping them there.

According to Matamoros-Fernández (2017, p. 933), platforms “perform a rhetoric of neutrality” while constantly intervening in public discourse in ways that serve their commercial interests. Self-identifying as a neutral platform does not make you one.

“Downrank, Don’t Delete”: A Policy That Protects No One

X claims that posts labelled as violating its hateful conduct policy receive 81–85% fewer impressions (X Corp., 2024). Sounds substantial, right?

But consider what that number hides: on a platform with hundreds of millions of users, even a small fraction of remaining reach can amount to millions of people. And a post that was viewed by 1 million users before being downranked has already done a lot of damage.
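A rough back-of-the-envelope calculation shows the scale. The sketch below applies X’s reported 81–85% reduction to a purely hypothetical baseline of 10 million impressions; the baseline and the function name `remaining_impressions` are assumptions for illustration, not figures from the transparency report.

```python
def remaining_impressions(baseline: int, reduction_pct: int) -> int:
    """Impressions left after cutting `baseline` by `reduction_pct` percent."""
    if not 0 <= reduction_pct <= 100:
        raise ValueError("reduction_pct must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding noise.
    return baseline * (100 - reduction_pct) // 100


# Hypothetical baseline: a post that would otherwise get 10 million impressions.
BASELINE = 10_000_000

for pct in (81, 85):  # the reduction range X reports for labelled posts
    left = remaining_impressions(BASELINE, pct)
    print(f"{pct}% fewer impressions still leaves {left:,}")
```

Under that assumption, a labelled post would still collect between 1.5 and 1.9 million impressions.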

A study of Facebook and YouTube by Munn (2020) found that the design architecture of those platforms gives rise to “outrage cascades”: viral outbursts of hostile content, fuelled by the ease of sharing and by algorithms that favour emotional content. Removing a post after the fact does nothing about the cascade that has already taken place.

Matamoros-Fernández (2017, p. 938) documented this empirically in her research on the Adam Goodes racial vilification scandal on Australian social media. Liking just one anti-Goodes Facebook page led the algorithm to recommend a cascade of further racist meme pages.

On YouTube, anti-Goodes content led to recommendations featuring public figures who comment on racism. The algorithm does not stumble into promoting hate; it is designed to.

Roberts (2019, p. 35) adds a further footnote: the industry-standard choice to allow all content to be uploaded without pre-screening “was a business decision on the part of the social media companies themselves, and certainly not a foregone conclusion based on technological necessity.”

Platforms chose this system. They could have chosen differently. This is what makes the platforms’ “we’re working on it” message sound so hollow to me. The moderation system is reactive by design, since a truly proactive system would also kill the interaction that brings in advertising revenue. The conflict of interest cannot be resolved under the existing business model.

This Isn’t Just Someone Else’s Problem

You might be thinking: this only affects people who get targeted by hate speech online, not me. However, the effects of amplifying hate through algorithms reach far beyond those who are directly targeted.

Research from Myanmar provides the most devastating illustration of what happens when this dynamic goes unchecked. Over several years, Facebook’s algorithm boosted anti-Rohingya propaganda, spreading content that framed the Muslim minority as dangerous outsiders.

Facebook later admitted that it had not done enough to prevent its platform from being used to incite violence. By that time, a genocide had taken place. The UN Fact-Finding Mission found that social media had played a determining role in the crisis.

Australia’s regulatory efforts in recent years have been anything but mild. In late 2024, the country passed the world’s first social media age ban, which came into full force in December 2025, and it has fought X Corp in the Federal Court over content takedowns. But notice that even these measures target the content itself: who can see it and who can post it. None of these laws directly asks what kind of algorithmic design makes the amplification and spread of this content possible in the first place. Regulators are chasing outcomes while the structural causes go undiscussed.

This is the regulatory gap that matters: we are debating whether platforms delete content fast enough, when the deeper issue is whether the incentive structure that generates and spreads this content can be reformed at all without changing the business model.

The answer, according to the evidence, seems to be: not voluntarily. Platforms will moderate only as much as they need to keep advertisers satisfied and regulators at bay. They will not moderate in ways that significantly reduce engagement, because engagement is what they sell.

So What Would Actually Work?

None of this means the problem is hopeless. But the solutions must match the scale of the structural problem.

Algorithmic accountability, not just content removal. EU regulators have started to move in this direction: the Digital Services Act requires platforms to assess their systems and reduce the systemic risks posed by their recommendation systems. This is the appropriate framing.

Advertising model reform. As long as engagement-maximising algorithms are tied to advertising revenue, the incentives will remain misaligned. Options worth considering include chronological feeds, subscription plans, and mandatory algorithmic audits.

Transparency as a baseline. That we know so little about how these algorithms work is itself a policy failure. X sued a research organisation for studying its platform. That should not be legally possible.

The platforms will tell you they take your online safety seriously. The data tells a different story. Until we stop treating hate speech as a content problem and start treating it as a structural, economic and design problem, the moderation theatre will persist, and the harms will keep accumulating.

So here’s a question worth thinking about: if platforms could redesign their algorithms tomorrow to actually reduce hate speech, at the cost of 20 percent of their advertising income, would they? And if not, what does that tell us about who these platforms really serve?

References

Brady, W. J., Crockett, M. J., & Van Bavel, J. J. (2020). The MAD model of moral contagion: The role of motivation, attention, and design in the spread of moralized content online. Perspectives on Psychological Science, 15(4), 978–1010. https://doi.org/10.1177/1745691620917336

Center for Countering Digital Hate (CCDH). (2023, September 13). X content moderation failure. https://counterhate.com/research/twitter-x-continues-to-host-posts-reported-for-extreme-hate-speech/

Flew, T. (2021). Hate speech and online abuse. In Regulating platforms (pp. 115–118). Polity.

Hickey, D., Fessler, D. M. T., Lerman, K., & Burghardt, K. (2025). X under Musk’s leadership: Substantial hate and no reduction in inauthentic activity. PLOS ONE, 20(2), e0313293. https://doi.org/10.1371/journal.pone.0313293

Matamoros-Fernández, A. (2017). Platformed racism: The mediation and circulation of an Australian race-based controversy on Twitter, Facebook and YouTube. Information, Communication & Society, 20(6), 930–946. https://doi.org/10.1080/1369118X.2017.1293130

Milli, S., Carroll, M., Wang, Y., Pandey, S., Zhao, S., & Dragan, A. D. (2025). Engagement, user satisfaction, and the amplification of divisive content on social media. PNAS Nexus, 4(3), pgaf062. https://doi.org/10.1093/pnasnexus/pgaf062

Munn, L. (2020). Angry by design: Toxic communication and technical architectures. Humanities and Social Sciences Communications, 7, Article 53. https://doi.org/10.1057/s41599-020-00550-7

Roberts, S. T. (2019). Behind the screen: Content moderation in the shadows of social media. Yale University Press.

Stray, J. (2026, April 8). What are the politics of a platform? What the data says about content moderation on X. Tech Policy Press. https://www.techpolicy.press/what-are-the-politics-of-a-platform-what-the-data-says-about-content-moderation-on-x/

X Corp. (2024). Global transparency report: H1 2024. https://transparency.x.com/en

ARIN6902: Digital Policy and Governance