Are content moderation algorithms making things worse on hate content and online harm?: The case of Facebook

The ubiquitous influence of social media platforms on our everyday affairs has made it pivotal to interrogate the positive discourses situating them as democratising spaces of free speech and enriching individuals’ capacity for social participation outside any overarching state control. Yet, the very sites deemed as capable of enriching rational deliberation and public discussions have been witnessing an explosion of hate content and online harm as faced by users, especially marginalised individuals.

Fig.1. Source: https://memberpress.com/blog/choose-social-media-platforms/

Social media platforms have come under heavy scrutiny because of inconsistencies in the ways their policies are applied with respect to cultural differences and hate speech (Matamoros-Fernandez, 2017). Contrary to popular rhetoric, platforms are not value-neutral spaces that simply facilitate free speech. In fact, as the platform scholar Gillespie (2010, 2018) reminds us, social media platforms are profit-driven cultural intermediaries that deliberately invoke the rhetoric of ‘value-neutrality’ to sell the promise of free participation and expression for all while being actively involved in content moderation.

Amidst discussions on how social media platforms perpetuate online harm and hate speech in ways that disrupt claims of a post-racial world, it becomes imperative to highlight that, despite their public accessibility, they are privately owned platforms, the majority of which are based in Silicon Valley and invested in generating profits for their owners and stakeholders. This is made possible through the commodification of large amounts of user-generated data, leading to what Zuboff (2019) calls surveillance capitalism.

The argument holds that even when platforms acknowledge moderation, they discursively centre the rhetoric of ‘equality for all’ to position themselves yet again as neutral sites of open participation and free expression. This conveniently obscures how the profit-driven ambitions and ideological beliefs of a homogenous, White, elite, male-dominated big tech industry find their manifestation in the algorithmic infrastructure that insidiously shapes user experience on these platforms (Gillespie, 2010, 2018).

Three examples illustrate how Facebook’s content moderation policies are making things worse. The first is a post made by U.S. congressman Clay Higgins. The second is a post by a BLM activist that was taken down as hate speech against White people. The third is the case of implicit racism against Palestinians, who were exposed to harmful content.

Content moderation policies may be the “problem”

At no point can the platform’s content moderation policies be seen as external to its commercial imperatives: its business model is built on the extraction of data, as data becomes the new raw material (Srnicek, 2017). This raises serious concerns over user privacy and has implications for preventing hate speech and online harm. While commercial social media platforms like Facebook initially began with no content moderation, their ubiquitous spread and usage made content moderation essential to their business models.

Fig.2. Source: https://www.thebureauinvestigates.com/stories/2024-05-17/explainer-what-is-content-moderation

In reality, content moderation policies are insidiously shaped by a White representational frame, to the point that their loose, race-blind definition of hate speech willingly contributes to the privileging of dominant groups. Facebook’s operational definition of hate speech is premised on the principles of ‘fundamental equality,’ and its procedures and policies are oriented towards ‘giving voice’ and ‘serving everyone’ (Siapera & Viejo-Otero, 2021). Under this principle of equality, all users are treated alike, so that all genders, races, and nationalities are equally protected despite the reality of the legacy of colonialism and the racialisation of Black and Indigenous people as the ‘other’.

The centrality of the Whiteness norm in shaping content moderation policies on hate speech, which serves to protect the epistemic authority of White people, can be gauged through the Facebook post made by the U.S. congressman Clay Higgins in the wake of a terrorist attack in London. The post, essentialising Muslims into Orientalist categories, read,

“Kill them all. For the sake of all that is good and righteous, kill them all” (Archive.Today, n.d.).

Fig.3. The Facebook post by U.S. congressman Clay Higgins stands as a case of hate speech bypassing Facebook’s content moderation policies.
Source: https://archive.is/95FO1

On the other hand, a post made by the Black Lives Matter activist Didi Delgado that read “All white people are racist. Start from this reference point, or you’ve already failed” received a different response (The DiDi Delgado, 2017).

Fig.4. Post made by BLM activist Didi Delgado, which received a different response.
Source: https://www.facebook.com/THEDiDiDelgado/photos/a.271621723285520.1073741828.268977940216565/278984872549205/?type=1&theater

The post was removed, and her Facebook account was suspended for seven days. This example illustrates that, because the hate speech definition takes a race-blind approach, the direct outcome is the equating of the oppressor and the oppressed, which ultimately privileges the former. In doing so, systems of power are maintained.

Bias in algorithmic content moderation?

Content moderation has become a key activity and structural feature of platforms, as Gillespie (2018) argues, because it effectively facilitates and maintains a product that is then sold to users. Content moderation began as a form of human labor where teams of human moderators flagged content containing potential hate speech or problematic content (Siapera, 2022).

Race is the organising principle here: these jobs, characterised by precarity, meagre wages, and exploitative working conditions in which moderators are exposed to harmful content, were and still are primarily exported to structurally marginalised groups in developing countries that have relaxed governance rules and minimal state intervention (Simantics, 2026). This unequal division of labor, where workers in the Global South work for piece rates with no employment contract or security while the profits from their surplus labor are concentrated in the Global North, is reminiscent of the historical colonial relations of work where racialised bodies were exploited to secure the fruits of industrial capitalism for the colonisers.

Fig.5. Source: https://medium.com/@harrycardillo/facebook-decides-what-comments-are-most-relevant-for-you-56372b0db892

As Gillespie (2020) argues, platforms partake in the discursive justification of AI in content moderation in a way that becomes self-fulfilling for meeting their own ambitions for further growth: “platforms have reached a scale where only AI solutions seem viable; AI solutions allow platforms to grow further”.

Yet the depoliticisation of AI content moderation is intentional. It obfuscates how the large-scale datasets used for training algorithms are primarily developed and based in formerly colonising Western nations, often at the risk of reflecting these countries’ existing sexist, racist, homophobic, and ableist power structures (Peterson-Salahuddin, 2024). Thus, AI content moderation is embedded in existing power structures to the degree that every decision behind assessing content for its potential harm is underpinned by ideological biases, biases that mirror those of the White, heterosexual, able-bodied male elites who are simply motivated by their desire for capital accumulation.

Fig.6. Source: https://www.bbc.co.uk/news/technology-59348915

To this end, scholars like Matamoros-Fernandez (2017) have coined the term ‘platformed racism’ to indicate that platforms are implicated in manufacturing and amplifying racist discourse, and that their mechanisms of governance reproduce, but can also challenge, social inequalities.

For the most part, social media companies conceal their role in perpetuating racial dynamics through their unfettered centring of the value-neutral discourse while, in reality, allowing certain nefarious speech and user interests to bypass moderation.

A recent report by Business for Social Responsibility (BSR), commissioned by Meta itself, showcased the prevalence of stricter policies on Palestinian speech than on Israeli speech during the May 2021 crisis on both Facebook and Instagram (BSR, 2022). The report mentioned that the hashtag #AlAqsa was put on a hashtag block list by a Meta contract worker who drew on the list of terms published by the US Department of the Treasury. This case offers a counternarrative to the optimistic discourses on the democratising potential of social media platforms, underlining how racist practices unfold and are amplified within these platforms and thereby disrupting the myth of neutrality that platform owners so conscientiously try to associate their platforms with. This results in what Noble (2018) terms ‘algorithmic oppression’: algorithms that perpetuate oppressive social structures.

The concept of platformed racism is useful for understanding the role of platforms themselves in sustaining and naturalising the racial dynamics of power, and for highlighting that it is not the technology that is biased but the engineers designing the algorithmic systems and the training datasets used. However, it de-historicises the platforms in that it does not account for how platforms and, by extension, their content moderation policies are implicated in maintaining the colonial structures of power. The term ‘coloniality’, as pioneered by Quijano (2007), highlights the persistence of the hierarchical power relations that constituted historical colonialism, with these relations insidiously shaping and defining global ideas, aspirations, notions of modernity, and rationality. In this light, new technologies do not exist outside the epistemological legacy of colonialism but are actively involved in perpetuating dominant power structures.

Thus, when epistemic authority is reserved for a certain racial group (Whites), it exposes the structurally vulnerable groups to online harm as they continue to be systematically excluded from this Whiteness representational frame.

Fig.7. Source: https://www.business-humanrights.org/en/latest-news/facebook-allegedly-approves-hate-speech-incitement-paid-ads-against-palestinians/

This has been the case with Facebook approving an Israeli advertisement that openly called for violence against pro-Palestinian voices, exposing them to hate speech that seemingly did not violate the platform’s content moderation standards (Biddle, 2023).

The ads explicitly demanded a “holocaust for the Palestinians” and called to wipe out “Gazan women and children and elderly”, while other posts described Palestinian kids from Gaza as “future terrorists” and made references to “Arab pigs” (Biddle, 2023).

We can see that these posts, which espoused hate speech against Palestinians, were not flagged by Facebook’s AI content moderation, indicating something deeper at play. Common to these posts is the centering of an Orientalist discourse that essentialises Palestinians into dehumanising stereotypes, perpetuating their supposed ontological inferiority.

Thus, we witness the rise of what Massanari (2017) calls ‘toxic technocultures’, where platform affordances and the underlying racist bias of the technical infrastructure have been exploited to mediate cultural practices like amplifying hate and racist discourses against Palestinians. In fact, a quantitative report published by the Arab Centre for Media Freedom, Development, and Research (I’LAM) highlighted that Facebook accounted for 30 per cent of the racism and incitement it documented (MEM, 2020). Facebook absolved itself of responsibility for perpetuating online harm against Palestinians, and for exposing them to a shadow ban that heavily censored their speech, by taking recourse to the convenient narrative of ‘technical glitches’. This narrative obscures how its moderation policies themselves emerge out of racial structures of power and out of its collusion with governments, which in itself disrupts the myth of ‘neutrality’.

What are government regulations doing?

Fig.8. Source: https://theconversation.com/a-better-way-to-regulate-online-hate-speech-require-social-media-companies-to-bear-a-duty-of-care-to-users-163808

In Australia, the regulation of online hate speech and harm is still in its infancy. Australia has a “Safety by Design” framework that was developed by the eSafety Commissioner. But its positive potential is limited because it is a voluntary code of practice for mitigating risk in the way companies design their products and policies, and many companies do not adhere to it (Gelber, 2021).

In the aftermath of the live-streamed Christchurch massacre, the federal Parliament also enacted a new Online Safety Act that regulates some harmful content, like cyberbullying of children and livestream broadcasts that could incite violence (Gelber, 2021).

What are the pitfalls of the regulations? What can be the way out?

While the Act is more expansive than Safety by Design in the types of harm it covers, it does not include the full spectrum of racial hate and violence navigated by Black and Indigenous people. While these moves are a step in the right direction, they are still far from tackling specific types of hate speech and holding platform companies responsible for biases in their technical infrastructure, content moderation policies, and training datasets.

This responsibility becomes difficult to realise in the current neoliberal regime of capitalism, as interaction between private companies and state governments is rapidly increasing to realise their collective goal of capital accumulation. It is here that enacting a duty of care becomes paramount, whereby social media companies are held to industry standards that may involve the responsibility of undertaking risk assessment (Woods & Perrin, 2022). Thus, a regulatory approach that makes it compulsory for platforms like Facebook to enact a duty of care to their users can be a potential way of mitigating the scope of online harm and hate content.

References

Archive.Today. (n.d.). Captain Higgins Congress. https://archive.is/95FO1

Biddle, S. (2023, November 21). Facebook allegedly approves paid ads containing hate speech & incitement against Palestinians. Business and Human Rights Centre. https://www.business-humanrights.org/en/latest-news/facebook-allegedly-approves-hate-speech-incitement-paid-ads-against-palestinians/

BSR. (2022). Human rights due diligence of Meta’s impacts in Israel and Palestine in May 2021: Insights and recommendations. https://www.bsr.org/reports/BSR_Meta_Human_Rights_Israel_Palestine_English.pdf

Gelber, K. (2021, July 14). A better way to regulate online hate speech: Require social media companies to bear a duty of care to users. The Conversation. https://theconversation.com/a-better-way-to-regulate-online-hate-speech-require-social-media-companies-to-bear-a-duty-of-care-to-users-163808

Gillespie, T. (2010). The politics of ‘platforms’. New Media & Society, 12(3), 347-364. https://doi.org/10.1177/1461444809342738

Gillespie, T. (2018). Custodians of the Internet: Platforms, content moderation, and the hidden decisions that shape social media. Yale University Press.

Gillespie, T. (2020). Content moderation, AI, and the question of scale. Big Data & Society, 7(2), 1-5. https://doi.org/10.1177/2053951720943234

Massanari, A. (2017). #Gamergate and the Fappening: How Reddit’s algorithm, governance, and culture support toxic technocultures. New Media & Society, 19(3), 329-346. https://doi.org/10.1177/1461444815608807

Matamoros-Fernandez, A. (2017). Platformed racism: The mediation and circulation of an Australian race-based controversy on Twitter, Facebook and YouTube. Information, Communication & Society, 20(6), 930-946. https://doi.org/10.1080/1369118X.2017.1293130

MEM. (2020, September 12). Report: Facebook most common platform for anti-Palestine racism in Israel. https://www.middleeastmonitor.com/20200912-report-facebook-most-common-platform-for-anti-palestinian-racism-in-israel/

Noble, S.U. (2018). Algorithms of oppression: How search engines reinforce racism. NYU Press.

Peterson-Salahuddin, C. (2024). Repairing the harm: Toward an algorithmic reparations approach to hate speech content moderation. Big Data & Society, 11(2), 1-13.

Quijano, A. (2007). Coloniality and modernity/rationality. Cultural Studies, 21(2-3), 168-178. https://doi.org/10.1080/09502380601164353

Siapera, E. (2022). AI content moderation, racism and (de)Coloniality. International Journal of Bullying Prevention, 4, 55-65. https://doi.org/10.1007/s42380-021-00105-7

Siapera, E., & Viejo-Otero, P. (2021). Governing hate: Facebook and digital racism. Television & New Media, 22(2), 112-130. https://doi.org/10.1177/1527476420982232

Simantics. (2026, March 19). Content moderation & the exploitation of the Global South [Video]. YouTube. https://youtu.be/2FPVBOSsOGw?si=IFXKromQnWq38K-A

Srnicek, N. (2017). The challenges of platform capitalism: Understanding the logic of a new business model. Juncture, 23(4), 1-4. https://doi.org/10.1111/newe.12023

The DiDi Delgado. (2017, May 4). My personal page just got banned for the following post. It’s official. “Racist” is a slur. [Post]. Facebook. https://www.facebook.com/THEDiDiDelgado/photos/a.271621723285520.1073741828.268977940216565/278984872549205/?type=1&theater

Woods, L., & Perrin, W. (2022). Obliging platforms to accept a duty of care. In M. Moore & D. Tambini (Eds.), Regulating big tech: Policy responses to digital dominance (pp. 93-109). Oxford University Press.

Zuboff, S. (2019). The age of surveillance capitalism: The fight for a human future at the new frontier of power. PublicAffairs.

Image Sources:

BBC News. (2021, November 19). Facebook gives users ‘more control’ over news feed. https://www.bbc.co.uk/news/technology-59348915

Bluehouse, Z. (2019, November 6). How to choose which social media platforms to use. Memberpress. https://memberpress.com/blog/choose-social-media-platforms/

Cardillo, H. (2021, August 13). Facebook decides what comments are “most relevant” for you. Censorship, or just a loving social media partner? Medium. https://medium.com/@harrycardillo/facebook-decides-what-comments-are-most-relevant-for-you-56372b0db892

Jackson, J. (2024, May 17). What is content moderation? The Bureau of Investigative Journalism. https://www.thebureauinvestigates.com/stories/2024-05-17/explainer-what-is-content-moderation

ARIN6902: Digital Policy and Governance