You Report, It Protects

When you encounter hate speech on social media and click report, fill in your reasons, and submit evidence, you believe the mechanism will bring protection. But what you often receive is a cold “not violating community guidelines”. Anger, confusion, and powerlessness follow: why does the platform treat it as “compliant speech” when it is blatant discrimination and harm?

This kind of reporting failure is not accidental but a long-standing reality of social media. Reporting works less as an error-correction mechanism than as a rule-verification mechanism:

It does not verify whether the harm is real; it tests whether the details of the content are “legitimate” within the framework of the rules (Gillespie, 2018).

As Meta’s practice has shown, user participation is not about “governance” but about repeatedly confirming the platform’s rules. In other words, while you think you are resisting hate speech, you are actually helping the platform confirm which forms of hatred can be “legalized” and allowed to exist.

Video from YouTube: https://www.youtube.com/watch?v=gyQH7JiGBLE&t=8s

The Trap of Proceduralism: Is the Oversight Board Truly Independent?

To understand this “rationalization” mechanism, we need to examine a key institution: the Oversight Board. The Board is often described as an independent, impartial third-party body dedicated to content moderation decisions, established to address the crisis of trust inherent in platform self-regulation. But can we truly place our trust in it?

On April 23, 2025, the Oversight Board issued its ruling on the Gender Identity Debate Videos case, allowing the retention of two “gender identity videos” that had been heavily reported by users.
In the case:

A video from Facebook shows a transgender woman being questioned and humiliated while using a women’s restroom, implying she was trespassing on female space.

Another video, posted on Instagram, documented some spectators disapproving of underage transgender athletes who won women’s sports competitions in the United States. The video also referred to the athletes as “male students who consider themselves girls,” implying that transgender athletes are taking resources meant for women.

The people shown in the videos have been subjected to continuous verbal attacks on social media, in some cases raising concerns for the personal safety of underage transgender individuals. Many users reported the content and protested collectively in an attempt to stop this hatred. However, the Oversight Board disregarded the harm caused and ultimately determined that the content did not constitute “direct incitement of hatred or violence against specific groups”, on the grounds that it concerned “public discussion issues”, and ruled that it was “not violating rules”.

The Board finds neither post violates Meta’s Hateful Conduct policy. A violation consists of two elements: (i.) a “direct attack” in the form of prohibitions listed under the “Do not post” section of the policy; (ii.) that targets a person or group on the basis of a listed protected characteristic. For both posts, the absence of a “direct attack” under the revised rules means there is no violation. The Board notes that “gender identity” remains a protected characteristic under Meta’s Hateful Conduct policy.

Why is the ruling bound so tightly to the rules? The answer lies in the relationship between the Oversight Board and the platform: the Board’s funding comes from Meta, and its rulings are confined to Meta’s platforms and grounded in Meta’s content policies, values, and human rights responsibilities. Although the Board strives to operate independently, it is not a fully external overseer, so this ruling reads more like the rules proving themselves.

Image from Pixabay: https://pixabay.com/images/download/x-784077_1920.jpg

The platform transforms complex questions of social harm into “computable decision-making problems” through rules and processes (Gillespie, 2018). Real damage is replaced by the platform’s technical terminology. The platform has controlled the making and revising of the rules from the start, while the Oversight Board claims to offer “procedural justice” in return. Victims feel that the rules have betrayed justice, and the credibility of the Board’s independence is shaken. This is where the seeds of distrust take root.

Has the platform narrowed the boundaries of hatred and made speech more free?

Meta’s policy update during the ruling process provides a seemingly contradictory answer: hate violations have been narrowed to direct attacks on the protected characteristics of others, while “freedom of speech” has been further expanded, for example by allowing the LGBTQ+ community to be described as “mentally abnormal”. By narrowing the boundaries of hate speech, the platform broadens the space for expressing hostility.

More precisely, this approach is a form of strategic ambiguity (Caplan, 2018). In academic discussions, hate speech is often understood as the denigration and dehumanization of a particular group and the incitement of hostility towards it. It need not directly incite violence, but it gradually deepens the systemic structure of discrimination (Flew, 2021). Meta’s approach reduces all of this to a single checkable criterion: whether the content directly incites violence. More subtle exclusionary behaviors are repackaged as “discussions on public issues”.

The underlying problem with this strategic ambiguity is that while the platform appears to be moderately relaxing the boundaries of speech, it is evading its own responsibility for the consequences of speech. When hatred is narrowed to the actionable definition of “direct incitement to violence”, systemic humiliation, repeated stigmatization, and the erosion of the everyday dignity of specific groups are legitimately classified as “opinion expression”. The platform thereby gains a technical zone of exemption: it does not have to judge whether a statement constitutes psychological violence or cultural harm, and relies only on algorithms and content reviewers to judge “directness” mechanically. As a result, “freedom” of speech becomes a one-way right: the party with the power to speak can use hostile labels without hesitation, while the targeted group must bear the psychological and practical costs of this “legitimate expression”. The platform appears to balance freedom and safety, but through technical rule-making it reduces complex moral judgments to yes-or-no questions, depriving the affected community of the right to speak about its own experience.

Image from Cartoon Movement: https://www.cartoonmovement.com/cartoon/likely-consequence-free-speech?utm_source=chatgpt.com

The result is an unequal freedom: those who attach derogatory labels to a group have the cover of “debate”, while the group being labeled suffers increasingly heavy harm. When the dignity of a transgender person is treated as a “debatable” topic, the platform is in fact exercising a unilateral interpretive power (Noble, 2018): it decides what constitutes “reasonable hostility” and what constitutes “undeserved violence”, and the victim can only accept the judgment of this set of rules. The real freedom here is not speech itself but the platform’s ultimate power to define hateful conduct.

Can Community Notes keep up with the spread of hate speech?

Faced with a crisis of trust, Meta introduced the Community Notes mechanism in 2025, replacing platform arbitration with user consensus. Meta responds to free-speech criticism by delegating power to community users, allowing content to remain with context added through consensus among users with diverse perspectives. But is this shift really just a way of outsourcing responsibility?

The “surveillance capitalism” framework proposed by Shoshana Zuboff (2019) can help us understand this issue: platforms act as institutional designers, transferring the responsibility for review from themselves to users while continuing to control the data infrastructure and algorithmic power. Users bear the burden of explaining whether content is harmful and whether it is missing context. We do not deny that such annotation helps free expression across cultural and situational contexts, but we should also notice that the platform is retreating behind the scenes. We are also concerned about an asymmetry in time: user notes appear only once consensus is reached, by which point content that incites hatred may already have spread widely. By the time the notes surface, the damage may already have been done.

Image from United Nations: https://www.un.org/en/hate-speech/impact-and-prevention/why-tackle-hate-speech

From this perspective, Community Notes is more like a means of delaying accountability: it makes the platform appear to be “in action” but cannot keep up with the spread of hidden hatred.

What is the core fact?

Looking at the platform’s business model reveals a core fact: the central goal of social media platforms is not to reduce harm but to maximize the time users spend engaged. Within this logic, hate speech has a dual function. It is both an object that platforms need to control and govern and a content resource for the attention economy, one that drives user interaction (Crawford & Gillespie, 2016).

The platform’s algorithm has developed a settled tendency to prioritize content that triggers users’ anger, fear, or indignation (Vosoughi, Roy, & Aral, 2018). This is not an accidental technical failure but a deliberately designed feature: negative emotions trigger stronger participation, algorithms detect the reactions and promote similar content, and over time this forms a cyclical “anger economy” (Wardle & Derakhshan, 2017). The platform clearly will not give up topics such as transgender groups, which attract heavy attention and readily provoke conflict and opposition. The ultimate result is that the platform profits from the high traffic brought by hate speech, while the cost of harm is passed on to victims and reviewers.
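The cycle described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not a description of any real platform’s ranking system: the function name, the two content categories, and the engagement rates are all hypothetical, and the only rule modeled is that each round’s exposure is reallocated in proportion to the reactions observed in the previous round.

```python
def simulate_feedback_loop(engagement_rate, rounds=5):
    """Return the exposure share of each content type after every round.

    engagement_rate maps a content type to the (hypothetical) probability
    that a viewer reacts to it.
    """
    # Start with equal exposure for every content type.
    share = {k: 1.0 / len(engagement_rate) for k in engagement_rate}
    history = [dict(share)]
    for _ in range(rounds):
        # Reactions observed this round = exposure * probability of reacting.
        reactions = {k: share[k] * engagement_rate[k] for k in share}
        total = sum(reactions.values())
        # The ranker gives next round's exposure in proportion to reactions.
        share = {k: reactions[k] / total for k in reactions}
        history.append(dict(share))
    return history


# Hypothetical rates: outrage-provoking content draws three times the reactions.
rates = {"outrage": 0.3, "neutral": 0.1}
history = simulate_feedback_loop(rates)
for round_number, shares in enumerate(history):
    print(round_number, round(shares["outrage"], 3))
```

With these assumed rates, the share of exposure given to outrage-provoking content rises from 0.5 to 0.75 to 0.9 within two rounds, even though nothing about the content itself has changed. The point of the sketch is that a small difference in reaction rates compounds once exposure is tied to engagement.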

This so-called anger economy is no longer a niche phenomenon hidden in a corner but a rule built deep into the platform’s content recommendation mechanism. When algorithms keep pushing adversarial and aggressive speech to more users, hateful expressions that once circulated only in a niche keep entering the public eye and gain a reach disproportionate to their actual impact. Even more alarming, platforms often show a tendency towards “negative governance” when dealing with hate speech, such as narrowing identification standards and reducing proactive reviews. These behaviors quietly work in step with the algorithmic mechanisms: on the one hand, platforms rely on intensified emotions to gain traffic, while on the other hand, they use loose rules to avoid responsibility. In the end, hatred becomes an indispensable driving force in the platform’s commercial operation.

The integrity report released by Meta in 2025 shows that its handling of hate content has dropped to its lowest level since 2018, while the “enforcement error rate” has decreased by 50%, a reduction achieved by cutting back proactive reviews. From this we can also see that when regulatory pressure conflicts with commercial interests, the latter always wins. Giving up hate speech means giving up profit, and platforms are institutions designed for profit. Naturally, platforms will not make choices that harm their own interests, so the “rationalization” of hatred continues, and the harm with it.

Reporting is not the end

We do not object to transgender issues being discussed, but we cannot accept the unfairness when discussion turns into targeted harm. The reporting mechanism gives users a false sense of participating in governance, but in reality it only absorbs users into the platform’s existing rule system. To break out of the dilemma of “you report, it protects”, we can no longer rely solely on submitting evidence, nor can we trust that platforms will judge fairly on their own.

True accountability should shift from “rule self-verification” to “structural accountability”. The question is no longer only where a given piece of content violates community rules, but how the platform defines a hate speech violation in the first place, and what systemic consequences that definition has in practice. Platforms certainly have the ability to draw the boundary between public issues and hate speech. When discussion turns into targeted, dehumanizing attacks that cause profound harm, can it still be considered a “debate on issues”?

This shift also challenges the platform’s attention-centered business model. That model may not change for a long time, and since platforms are unlikely to tighten their boundaries voluntarily, external institutional constraints (such as government supervision) become particularly important.

When poplars and oaks are brought down by a storm, it is not because they grew weaker but because the wind grew stronger (Nightingale, 2011).

We don’t want to see every report and effort to remove hate speech become a tacit approval of the platform’s existing rules.

Image from Adobe Stock: https://stock.adobe.com/au/search?k=stop+hate

References

Gillespie, T. (2018). Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions That Shape Social Media. New Haven: Yale University Press.

Caplan, R. (2018). Content or Context Moderation? Data & Society Research Institute.

Flew, T. (2021). Hate Speech and Online Abuse. In Regulating Platforms. Cambridge: Polity.

Noble, S. U. (2018). Algorithms of Oppression: How Search Engines Reinforce Racism. New York: NYU Press.

Zuboff, S. (2019). The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. New York: PublicAffairs.

Crawford, K., & Gillespie, T. (2016). What is a flag for? Social media reporting tools and the vocabulary of complaint. New Media & Society, 18(3), 410-428.

Vosoughi, S., Roy, D., & Aral, S. (2018). The spread of true and false news online. Science, 359(6380), 1146-1151.

Wardle, C., & Derakhshan, H. (2017). Information Disorder: Toward an interdisciplinary framework for research and policy making. Council of Europe Report.

Nightingale, V. (Ed.). (2011). The handbook of media audiences (pp. 75-76). Wiley-Blackwell.

ARIN6902: Digital Policy and Governance
