Why Harmful Content Persists Despite Moderation: Meta and the Limits of Platform Governance

Meta’s 2025 Shift and the Question It Reopened

*Meta’s January 2025 announcement on “more speech and fewer mistakes.” Source: Meta (2025).*

In January 2025, Meta announced major changes to its content moderation policies, claiming to prioritize “more speech and fewer mistakes” (Meta, 2025).

The stated goal was to reduce the wrongful removal of lawful speech. Yet the announcement raises a harder question: if platforms moderate content less rigidly, how will they address the harmful content that moderation is meant to contain?

This question is usually framed as a dichotomy between removing too much content and removing too little, with most commentators arguing that the answer lies somewhere between the two. That framing is inadequate.

The primary issue is that the frameworks that platforms employ to assess and moderate harmful content do not account for how harm actually manifests online.

The categories of harm are often too narrow. The relevant context is difficult to assess. Platform design often amplifies hostility. And moderation often arrives too late. Meta’s 2025 policy shift makes these structural issues more visible.

Why Online Harm Is More Than Offence

Part of the challenge is that online harm is still treated predominantly as a matter of offence. In practice it does more than offend: it creates an unwelcoming environment and makes people less willing to take part in public discussion online.

*Adults in Australia who had seen or personally experienced online hate in the previous 12 months. Source: eSafety Commissioner (2025).*

It also raises the cost of participation for groups that are already targeted. Seen in this broader context, moderation is not simply about whether statements are rude or controversial. It involves a whole range of contextual, social, and group-based risks that vary across geographic, linguistic, and political settings.

A platform’s written rules may be thorough and clearly articulated, yet still fail to protect people in any meaningful way if those rules cannot recognize the everyday forms that rule-breaking takes.

*Cyberbullying and harassment as a user-facing platform problem. Source: UNICEF.*

What Meta Actually Changed

*Meta’s reported hate speech prevalence on Facebook. Source: Meta Transparency Center.*

These changes were more than symbolic. Meta stated that it would focus enforcement on illegal and high-severity violations and replace third-party fact-checking with a community notes model in the United States (Meta, 2025). The company justified the shift by arguing that its earlier approach had overreached:

“This approach has gone too far. As well-intentioned as many of these efforts have been, they have expanded over time to the point where we are making too many mistakes, frustrating our users and too often getting in the way of the free expression we set out to enable.”

Fewer mistaken removals may make enforcement feel fairer to some users. But if more harmful content remains visible as a result, the greatest cost falls on the users who are exposed to it.

This helps clarify the governance implications of Meta’s change. A platform that concentrates on the clearest, most blatant, or most obviously illegal violations is not neutral. It is making a judgment about which types of harm warrant quick action and which can be left unaddressed for longer.

This is critical because many forms of online abuse do not begin with overt threats or explicit hate speech. Instead, they involve ridicule, insinuation, coded language, targeted repetition, organized harassment, or individual posts that seem trivial but are harmful because of their cumulative effect.

The key issue is that harassment and hate often work through accumulation rather than through a single dramatic post. A single post may not seem serious enough to trigger the platform’s strongest enforcement. Yet enough ridicule, hostile mentions, or coordinated pile-ons can together create a humiliating environment. Because such posts accumulate gradually, the new policies seem likely to make it easier for low-level harassment and hate to spread, even while many severe posts are still removed.

Reuters described a significant change in the way Meta oversees political content, while the Oversight Board found that, in a high-risk situation, gaps in policy and delayed responses had allowed harmful content to remain online for too long (Meta Oversight Board, 2025; Reuters, 2025). The case shows that moderation outcomes depend on how risk is assessed and on decisions about who will bear the cost, which is often users.

Why Platforms Struggle to Recognize Harm

Platforms want to sort harm into clean, distinct categories, but in reality harms are neither clean nor stable.

*Meta’s reported hate speech removals on Facebook and Instagram. Source: Statista, based on Meta data.*

Harmful speech is context-sensitive and depends on many variables, including tone, local slang, cultural references, a country’s politics, and what the audience, including the people targeted, already knows. This problem is especially clear in research on Facebook in the Asia Pacific region.

Sinpeng et al. (2021) show that hate speech is deeply shaped by language and local context, and that Facebook’s globally applied Community Standards and automated systems do not capture that complexity. A post may not fit neatly into any category defined by a specific policy, yet still be perceived as hostile, belittling, or threatening. What a system classifies as borderline may be clearly harmful in context.

If harmful content is consistently missed because it is too context-specific, too coded, or too socially embedded to fit a category, users do not simply conclude that moderation is imperfect. They lose confidence in the platform’s capacity to recognize the problem at all.

This is more serious in Meta’s case because the company has signalled that it wants lighter controls. If a platform already struggles to pinpoint context-dependent harm, scaling back enforcement will only widen that blind spot (Sinpeng et al., 2021).

How Platforms Amplify Harm

Misrecognition is only part of the story. Even when a platform does identify some social harm, significant gaps remain in how it responds. Social media platforms are built to keep audience attention moving, and the content that provokes the most intense reactions tends to travel farthest and appear most often, spreading through reposts, replies, recommendations, and user coordination.

*Most reported experiences of online hate occurred on social media, often involving strangers. Source: eSafety Commissioner (2025).*

Under these circumstances, harmful content does not need to persist indefinitely to do damage. Limited circulation, enough repetition, and modest visibility are sufficient to create a hostile experience for the users it targets.

This is why it is helpful to consider platforms as environments that can amplify speech rather than as neutral containers for speech. Harm is often shaped by the interaction of design, governance, and user culture, rather than by individual bad actors alone.

Massanari’s (2017) analysis of Reddit demonstrates how toxicity can become embedded in platform cultures through visibility, rapid participation, and poor governance. In the Australian context, Matamoros-Fernández (2017) shows that online racism is shaped not only by platform policies but also by user practices, business models, and technical architecture. The governance problem, then, is whether a platform allows hate speech to gain significant reach, social momentum, and user reinforcement before anyone intervenes.

Meta’s 2025 shift should be evaluated in this light, because mistaken removals are not the only factor at play. Once governance becomes less restrictive, content circulates first and enforcement follows later. Even highly harmful content that is eventually removed may already have been shared, repeated, screenshotted, and used as a social cue for further abuse.

The main concern is not just that too few removals occur, but that the platform allows certain forms of harm to circulate while it waits for stronger signals, clearer violations, or community-driven correction.

Why Moderation Often Arrives Too Late

*Public attitudes toward how well social media companies address online harassment. Source: Pew Research Center (2021).*

On large platforms, most content is not pre-moderated. It is posted first and judged later, often only after users report it. By then, the damage may be done. A post may be seen, shared, copied, mocked, and folded into a wider pattern of abuse before a moderator reaches it.

The Oversight Board’s own wording is striking here:

“These actions were too late. By this time, all three pieces of content had been posted.”

This shows why “eventual removal” can be far more frustrating for users than for the platform. The post may be removed, but the social exclusion, fear, and pile-on may remain. Roberts (2019) shows that commercial content moderation is costly and reactive, and that its labour is largely hidden.

From the outside, moderation appears to be a simple technical system that filters out harmful material. In reality, moderation happens under conditions of scale, speed, exhaustion, ambiguity, and judgement, which makes it far messier than policy pages suggest. In Meta’s case, the promise of safety can feel empty to users even when the rules are well written, because formal safety policies mostly address what happens at the point of intervention, not how long harmful content stays visible, how far it spreads, and how much damage it causes before the platform finally acts (Roberts, 2019).

There are many reasons for concern about people’s experiences with content moderation tools, or the lack of them. Perhaps the most serious is the loss of trust. Once users see the reporting system fail to act on abuse, their trust begins to erode. Frustrated users stop reporting, and report volumes can decline sharply. Others become resigned and report only the most obvious and egregious abuse. Moderation that is selective, inconsistent, or opaque is at best partially effective. The persistence of harmful content is therefore not only a failure of moderation but also a failure of governance (Roberts, 2019).

Why Moderation Alone Is Not Enough

*Only 38% of social media users said content controls improved their experience. Source: Ofcom (2024).*

If moderation often arrives too late, simply removing more content will achieve little. This is why Woods and Perrin (2021) argue for a duty of care.

A duty of care shifts the question from whether platforms respond to harmful content to whether they design systems that are likely to cause harm. It calls for scrutiny of recommendation systems, reporting systems, moderation, user safety features, internal escalation, and other structural elements, including the economic interests that shape moderation priorities. It is a holistic approach that treats harm as a foreseeable consequence of system design rather than only a failure of moderation.

Given that, the question becomes: how are users being needlessly exposed to risk? This is even more pertinent after Meta’s 2025 changes, which suggest a shift in how the company is willing to address risk: how much responsibility it is willing to take, and how much protection it is willing to build for users and communities. In short, these changes reflect a governance model that treats harm as something to be managed after the fact, even though the platform itself shapes the risks users face (Woods & Perrin, 2021).

The Limits of Platform Governance

This case represents more than a disagreement about where to draw the line on freedom of expression. It illustrates a deeper concern about how platforms manage harm.

Harmful content persists not only because some posts are overlooked, but because platforms define harm narrowly, fail to account for context, amplify hostility, and intervene too late within largely reactive systems.

This is why moderation cannot be the primary response. A more credible approach would start somewhere else: with the platforms’ systems themselves, how they are structured, what they incentivize, and how much responsibility companies are willing to take for what they build.

*The Digital Services Act reflects a broader shift toward systemic platform regulation rather than post-by-post moderation alone. Source: Council of the European Union.*

As long as visibility comes first and harm is something to be managed later, moderation will remain reactive and will always arrive late.

References

Massanari, A. L. (2017). #Gamergate and The Fappening: How Reddit’s algorithm, governance, and culture support toxic technocultures. New Media & Society, 19(3), 329–346. https://doi.org/10.1177/1461444815608807

Matamoros-Fernández, A. (2017). Platformed racism: The mediation and circulation of an Australian race-based controversy on Twitter, Facebook and YouTube. Information, Communication & Society, 20(6), 930–946. https://doi.org/10.1080/1369118X.2017.1293130

Meta. (2025, January 7). More speech and fewer mistakes. About Meta. https://about.fb.com/news/2025/01/meta-more-speech-fewer-mistakes/

Meta Oversight Board. (2025, April 23). Wide-ranging decisions protect speech and address harms. Oversight Board. https://www.oversightboard.com/news/wide-ranging-decisions-protect-speech-and-address-harms/

Reuters. (2025, April 23). Meta’s oversight board rebukes company over policy overhaul. Reuters. https://www.reuters.com/sustainability/boards-policy-regulation/metas-oversight-board-rebukes-company-over-policy-overhaul-2025-04-23/

Roberts, S. T. (2019). Behind the screen: Content moderation in the shadows of social media. Yale University Press.

Sinpeng, A., Martin, F., Gelber, K., & Shields, K. (2021). Facebook: Regulating hate speech in the Asia Pacific. The University of Sydney & The University of Queensland. https://doi.org/10.25910/j09v-sq57

Woods, L., & Perrin, W. (2021). Obliging platforms to accept a duty of care. In M. Moore & D. Tambini (Eds.), Regulating Big Tech: Policy responses to digital dominance (pp. 93–109). Oxford University Press. https://repository.essex.ac.uk/31444/


ARIN6902: Digital Policy and Governance