Hate Speech and Content Moderation in the Era of ChatGPT

Monitoring and moderating online hate speech has always been a challenging task, with platforms, users and regulatory bodies constantly revamping their approaches to keep pace with technological evolution.

The most recent example of such new technologies is the intelligent language processing model ChatGPT, developed by OpenAI. This blog post will attempt to fit the consumer-facing AI product into the frameworks of platform governance and discuss the relevant concepts covered in week 6’s material, focusing on hate speech, online harms and moderation.

Case Study

The news article ‘ChatGPT’s ‘liberal’ bias allows hate speech toward GOP, men: research’ (Mitchell, 2023) explores the biases ChatGPT displays in its moderation of hate speech directed at the political left and right. According to the article, a report from the Manhattan Institute, a conservative New York City-based policy and economics think tank, details how OpenAI’s content moderation system, which powers ChatGPT, appears to be biased against the political right. The report claims the system is more tolerant of hate-style speech directed at right-wing and conservative views, that it is particularly harsh towards middle-class individuals, and that it flags content unevenly by gender, race, religion and nationality (for instance, Canadians, Italians, Russians, Germans, Chinese and Brits appear to be more shielded against hate speech than Americans in ChatGPT’s language system). In other instances, comments identical in content were more likely to be classified as hateful when targeted at women than when targeted at men. The report also places ChatGPT in the “left-libertarian quadrant” with a “left economic bias”.

The report further explains that OpenAI’s content moderation system is more likely to classify negative comments as hateful when they are directed at groups perceived as disadvantaged in left-leaning hierarchies of perceived vulnerability; negative comments directed at conservatives and Republicans, it noted, are more permissible than those directed at liberals and Democrats. Lisa Palmer, chief AI strategist for consulting firm AI Leaders, said the data supports what the AI community has long known to be true, and noted that the findings can help in taking action to rectify the situation.

Conservatives were found to be less protected from potential hate-like speech on ChatGPT than liberals, according to new data (Mitchell, 2023).

ChatGPT as a platform

There is a valid argument for treating ChatGPT as a platform, as it fits several of the platform characteristics described in week 3’s material. The most relevant ones include: it operates on a data-driven business model, relying on vast amounts of text data to produce accurate and relevant responses. This data is continually fed into the AI system, allowing it to learn and improve its language skills over time. Additionally, ChatGPT is an example of combinatorial innovation: it is built by combining multiple machine learning models and algorithms to create a language model that can understand and generate human-like language. Finally, governance structures are inherent to the ChatGPT business model, as it needs to keep multiple stakeholders simultaneously engaged with the platform. This includes ensuring that the language model is accurate and unbiased, while also protecting user privacy and preventing the spread of harmful content.

The ‘duty of care’ approach towards hate speech under ChatGPT

According to Parekh (2012), hate speech is loosely defined as any speech that expresses or encourages hatred against a particular group distinguished by race, ethnicity, gender, religion, nationality, sexual orientation or disability. Hate speech need not be violent or lead to public violence. Usually, it targets easily identifiable individuals or groups and stigmatises them by ascribing undesirable qualities, making the target group appear a legitimate object of hostility. In the case of ChatGPT’s biases, although neither political extreme is a marginalised group, this consistent display of bias could influence the thinking of its users, which could in turn incite more user-generated hate speech across other social platforms.

The ‘duty of care’ approach proposed by Woods and Perrin (2021) indicates that the focus for content moderation should be on regulating the platform itself, including its software and business systems, rather than just the content, and that operators should comply with regulations based on risk management and may do so in innovative ways. However, ChatGPT’s mode of information input and output differs from that of traditional social platforms. ChatGPT is trained using the Transformer architecture, a type of neural network that processes sequences of data such as text. The model is trained on a large dataset of text collected from various sources on the internet, such as websites, books and articles. Training involves feeding text sequences to the model and teaching it to predict the next word or sequence of words. ChatGPT is designed to understand the context and meaning of words, which helps it generate more coherent and contextually appropriate responses. The large and constantly updated dataset used for training helps to minimise the risk of the model being biased towards certain types of language or topics, including hate speech (Hughes, 2023).
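The next-word-prediction objective described above can be illustrated with a toy bigram model. This is a drastically simplified sketch of the training idea only; ChatGPT itself uses Transformer networks over billions of tokens, and the corpus and function names here are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Toy illustration of next-word prediction: count word bigrams in a tiny
# corpus, then predict the word most often seen after a given word.
corpus = "the model predicts the next word the model learns from text".split()

bigrams = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    bigrams[current_word][next_word] += 1

def predict_next(word):
    """Return the most frequent word seen after `word` in the corpus."""
    if word not in bigrams:
        return None
    return bigrams[word].most_common(1)[0][0]

print(predict_next("the"))  # "model" follows "the" twice, "next" only once
```

The same principle — adjusting a model so its predicted continuations match the training text — is what allows whatever patterns (and biases) exist in that text to end up in the model.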

Within this specific design, there are a few potential loopholes that induce biases. For one, the dataset may be skewed towards certain perspectives or demographics, which could lead the model to develop biases against other groups or individuals (in this instance, the articles fed into the system could favour the political left due to recency bias). Another potential loophole is that the dataset may contain offensive or hate-filled language, which the model could learn and replicate. Additionally, the model may learn to associate certain words or phrases with negative sentiments or stereotypes, creating the potential for unintentional bias. This might explain why ChatGPT offers stronger protection from hate speech to certain demographics than to others. Within the current design of machine learning, it is difficult to incorporate human empathy and common sense into a language model, so the responsibility for the ‘duty of care’ falls to human moderation.
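The first loophole — a skewed dataset producing uneven protection — can be made concrete with a hypothetical sketch: a crude classifier that scores text by how often its words appeared in examples labelled “hateful”. The group names, examples and scoring are invented for illustration and bear no relation to OpenAI’s actual classifier.

```python
from collections import Counter

# If the labelled training data mentions one group more often in hateful
# examples, identical sentences about different groups get different scores.
hateful_examples = [
    "group_a people are terrible",
    "group_a people ruin everything",
    "group_b people are terrible",
]

hate_counts = Counter(word for text in hateful_examples for word in text.split())
total = sum(hate_counts.values())

def hate_score(text):
    """Fraction of the hateful-corpus word mass covered by the text's words."""
    return sum(hate_counts[w] for w in text.split()) / total

# Identical sentences, different target group -> different scores,
# purely because of the dataset's skew.
print(hate_score("group_a people are terrible"))  # 0.75
print(hate_score("group_b people are terrible"))  # ~0.67
```

The asymmetry here comes entirely from the label distribution, not from anything in the sentences themselves — the mechanism the report attributes to ChatGPT’s uneven treatment of demographic groups.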

ChatGPT and its moderation challenges

When it comes to moderating a transnational digital platform like ChatGPT that operates in multiple languages, it is important to address the concerns raised by Sander (2020): failure to consider local contexts and to adequately consult relevant stakeholders in moderation policies can lead to the spread of hate speech and disinformation on platforms. OpenAI has developed a filtering system to prevent ChatGPT from generating harmful content. This system detects potentially sensitive topics and language, which are then reviewed by human moderators who can remove or adjust the model’s responses. OpenAI has also implemented an ethical review process for sensitive use cases of the model. However, there are breaking points within this human intervention process. One example is the Time investigation ‘OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic’ (Perrigo, 2023). The low wages and poor working conditions of the moderators, whose job is to provide labelled instances of violence, hate speech and sexual abuse so the AI can identify these types of toxicity in real-world situations, raise red flags about the concentration of moderator demographics and the black-box nature of the moderation process. The situation echoes what Punathambekar and Mohan (2019) described: underpaid workers in India and the Philippines faced with the dull task of moderating content for Facebook.
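The flag-then-review workflow described above — automated detection first, human moderators for the uncertain middle — can be sketched as a simple routing function. The thresholds, term list and scoring function are illustrative assumptions, not OpenAI’s actual system.

```python
# Hypothetical sketch: an automated score decides whether a model response
# is published directly, queued for human review, or blocked outright.
BLOCK_THRESHOLD = 0.9
REVIEW_THRESHOLD = 0.5

# Crude stand-in for a real classifier: per-term sensitivity weights.
SENSITIVE_TERMS = {"hate_term": 0.6, "slur": 0.95}

def sensitivity_score(text):
    """Return the highest weight of any flagged term in the text."""
    return max((SENSITIVE_TERMS.get(w, 0.0) for w in text.lower().split()),
               default=0.0)

def route(text, review_queue):
    """Route a response to publish, human review, or block."""
    score = sensitivity_score(text)
    if score >= BLOCK_THRESHOLD:
        return "blocked"
    if score >= REVIEW_THRESHOLD:
        review_queue.append(text)  # held for a human moderator
        return "queued"
    return "published"

queue = []
print(route("a harmless answer", queue))     # published
print(route("contains a hate_term", queue))  # queued
print(route("contains a slur", queue))       # blocked
```

Everything that lands in the middle band is exactly the labour the Time investigation describes: humans, not the model, absorb the ambiguous and often traumatic cases.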

Policy Framework and existing regulations

ChatGPT, as an AI product, is not subject to specific laws or regulations, but OpenAI has implemented its own ethical and safety guidelines for the model’s development and use. These include the aforementioned filtering system to prevent the generation of harmful content and an ethical review process for sensitive use cases. In terms of broader legal frameworks, countries such as the United States, where OpenAI operates, have relatively weak laws and rely more on voluntary action by tech companies to combat hate speech and harassment. Moreover, as OpenAI and its biggest investor, Microsoft, are not part of the coalition behind the Santa Clara Principles 2.0, it remains unclear whether ChatGPT is governed by the principles (Santa Clara Principles on Transparency and Accountability in Content Moderation, 2022).

Recent efforts towards AI regulation come mostly from non-governmental bodies such as UNESCO. As reported in ‘UNESCO Calls on All Governments to Implement AI Global Ethical Framework Without Delay’ (2023), UNESCO has issued the first global ethical framework for the use of artificial intelligence (AI). The ‘Recommendation on the Ethics of Artificial Intelligence’ guides countries on how to reduce the risks associated with AI while maximising its benefits. It includes values, principles and policy recommendations to address issues such as discrimination, stereotyping, gender inequality, disinformation, and human and environmental rights. The framework also includes a Readiness Assessment tool to help countries identify the competencies and skills needed in their workforce to regulate the AI sector effectively. Additionally, it requires that countries regularly report on their progress and practices related to AI, with periodic reports submitted every four years, and it recognises that self-regulation by the industry is insufficient to prevent the ethical harms caused by AI. This framework attempts to comprehensively address the policy, practice and design challenges of moderation: it identifies accountable parties, names the problems with black-box AI, demands more transparency, and provides a set of tools for policy assessment. However, the lack of incentives could mean a tough road ahead for its real-world implementation.

Uncertain future for language based AI regulation

Based on the previous discussions, it is clear that the lack of oversight of text-based machine intelligence products allows their creators the freedom to sidestep regulation and keep their moderation processes in a black box. Policymakers are generally several steps behind the front line of AI development, and therefore often rely on self-regulation and self-reporting by platform companies. The fact that regulators lack the tools needed to measure policy success makes monitoring AI products like ChatGPT all the more difficult. As the news coverage shows, the moderation process is imperfect: companies hold the power to make the tricky calls, while the labour of moderation is undervalued and seldom recognised. And as Mitchell (2023) noted, once biases are built into AI systems, they are hard to remove. Companies with financial incentives in mind are therefore more likely to oppose regulatory change. With all that said, what is established in the Santa Clara Principles 2.0 also applies to monitoring the success of AI policy: fundamental principles such as human rights, understandable rules and policies, integrity and explainability remain relevant. And Suzor’s call to create novel institutions capable of facilitating varied, worldwide and dispersed independent research into the implications of content moderation for users across the globe is an idealistic solution to this unclear regulatory problem.

New research found that ChatGPT has political biases built into its system, though it often denies this when users ask (The Manhattan Institute, 2023).

On a side note, as a form of self-regulation, OpenAI needs to take a more robust approach to its duty of care towards ChatGPT’s users. One measure it could adopt to address the bias is developing a comprehensive training programme for its AI that includes ethical considerations, and investing in a diverse dataset encompassing multiple perspectives and demographics (Moehring, 2023). Another potential solution is introducing a dedicated department to oversee content moderation, with transparent processes and guidelines to ensure that the language model is accurate and unbiased. Moreover, OpenAI could take advantage of explainable AI to gain insight into its model’s decision-making process, making it easier to spot potential biases and rectify them proactively (Frąckiewicz, 2023).

To conclude, hate speech and harassment, and especially the tolerance of such behaviour, is both an ethical and a technological issue. The hope of an AI advanced enough to capture and eliminate such biases may be far-fetched for now, and policymakers and enforcers may have to adopt a more technologically savvy approach to detecting and tackling unethical behaviours like hate speech that are mistakenly incorporated into these AI models.


Frąckiewicz, M. (2023). ChatGPT App and Explainable AI: Balancing Transparency and Accuracy – TS2 SPACE. TS2. https://ts2.space/en/chatgpt-app-and-explainable-ai-balancing-transparency-and-accuracy

Hughes, A. (2023, January 4). ChatGPT: Everything you need to know about OpenAI’s GPT-3 tool. BBC Science Focus Magazine. https://www.sciencefocus.com/future-technology/gpt-3/

Mitchell, A. (2023, March 9). ChatGPT’s bias allows hate speech toward GOP, men: report. New York Post. https://nypost.com/2023/03/14/chatgpts-bias-allows-hate-speech-toward-gop-men-report/

Moehring, C. (2023). The Human Need for Ethical Guidelines Around ChatGPT. University of Arkansas. https://walton.uark.edu/insights/posts/the-human-need-for-ethical-guidelines-around-chatgpt.php

Parekh, B. (2012). Is there a case for banning hate speech? In M. Herz and P. Molnar (eds), The Content and Context of Hate Speech: Rethinking Regulation and Responses (pp. 37–56). Cambridge: Cambridge University Press.

Perrigo, B. (2023, January 18). Exclusive: The $2 Per Hour Workers Who Made ChatGPT Safer. Time. https://time.com/6247678/openai-chatgpt-kenya-workers/

Punathambekar, A., & Mohan, S. (2019). Digital Platforms, Globalization and Culture. https://doi.org/10.5040/9781501340765.ch-011

Santa Clara Principles on Transparency and Accountability in Content Moderation. (2022). Santa Clara Principles. https://santaclaraprinciples.org/open-consultation/

UNESCO Calls on All Governments to Implement AI Global Ethical Framework Without Delay. (2023, March 30). Datanami. https://www.datanami.com/this-just-in/unesco-calls-on-all-governments-to-implement-ai-global-ethical-framework-without-delay/

Woods, L. and Perrin, W. (2021). ‘Obliging Platforms to Accept a Duty of Care’, in M. Moore & D. Tambini (eds.), Regulating Big Tech: Policy Responses to Digital Dominance. Oxford: Oxford University Press, pp. 93–109.
