Can you 100% rely on social media platforms to moderate online harm and hate speech? Challenges and the way forward

The power of internet platforms over people has grown in recent years: more than four billion people worldwide now use the internet, and social media has integrated seamlessly into daily life (Saurwein & Spencer, 2021). It serves as the primary venue for people to interact, engage, and exercise their right to free expression (Sinpeng et al., 2021).

Social media has significantly increased information mobility and human interaction. However, people bring varying levels of tolerance for individual differences to these interactions. Intolerant attitudes and words can breed hatred, which can result in online harm and hate speech (Flew, 2021).

What is Online Harm? Which forms of online harm do you know?

Online harm takes several forms: cyberbullying, adult cyber abuse, image-based abuse, and illegal and restricted online content. Cyberbullying covers the widest range of behaviour, such as sending mean texts, hacking, spreading secrets and rumours, or impersonating someone online. The “Nth Room” case is a typical crime combining several of these forms of online harm. It was a criminal case in South Korea between 2018 and 2020 involving extortion, cybersex trafficking, and the dissemination of pornographic videos over the Telegram app, and it was recently streamed on Netflix as the documentary “Cyber Hell”.

(Image source: Netflix)

The major issue is that social media has become a megaphone amplifying the dangers of online hate speech (United Nations, n.d.). In the United States, 41% of internet users reported experiencing online harassment in 2017, and 75% of users claimed to have been affected by online abuse on social media, with the most common triggers being gender, appearance, race, politics, religious beliefs, and sexual orientation (Flew, 2021).

In addition, social media may play a greater and more direct role in hate crimes. One example is the murder of Labour MP Jo Cox by far-right extremist Thomas Mair in 2016; Mair had posted hostile comments about Cox on social media in the run-up to the UK’s EU referendum.

So, what is hate speech? Have you ever received hate speech on social media?

Unlike crude language used in a private family argument, hate speech can harm others immediately and its effects can persist for a long time. It can be defined as inciting people to stigmatise specific groups or persons through inflammatory discourse; in more severe circumstances, it results in physical violence (Flew, 2021). Those most vulnerable to hate speech are often disadvantaged or marginalised groups in society, such as women, Black people, and LGBTQ+ people.

(Image source: Human Rights Watch)

Moderation is core to digital platforms, playing a role as critical as data itself (Gillespie, 2018). Platforms should therefore take primary responsibility for moderating online harm and hate speech.

Then, how do platforms moderate content?

Each platform has its own standards. Facebook, for example, maintains a set of Community Standards outlining what types of content are not allowed. Its approach to moderating hate speech combines human review with an automated Content Moderation System (CMS). Facebook also operates a wider content regulation ecosystem.

That ecosystem involves three areas: Public and Content Policy, Global Operations, and Engineering and Product, teams that inform each other by working cross-functionally. To fully understand regional circumstances, Facebook also collaborates widely: with platform users who volunteer labour, with firms providing outsourced moderation services, with country experts and market specialists, and with trusted partner organisations (Sinpeng et al., 2021).

TikTok likewise refreshed its Community Guidelines on 21 March this year, aiming to provide greater security and transparency and to help users better understand the rules.

Despite these standards, review structures and advanced technology, platforms often fail at moderating hate speech. In one survey, 79% of those interviewed were dissatisfied with how social media platforms address online harm and moderate hate speech (Flew, 2021).

So, what led to the failure? And, what challenges do platforms have while moderating content?

The first challenge: how to balance public interest and platform interest?

The content we see on social media has already been filtered by the platform’s algorithms; in other words, it is the content the platform wants us to see. These algorithms are commercial instruments rather than neutral content sorters: their purpose is to attract maximum attention for maximum profit.

Consider, for example, the case of YouTube’s “Adpocalypse”.

This profit-driven algorithmic architecture can deem a previously discarded video “valuable and profitable” again simply because it can be played with ads to increase platform revenue (Kumar, 2019). This has led users to question whether social media platforms can be trusted to moderate content properly and remain accountable to the public.
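The commercial logic described above can be illustrated with a minimal sketch. The scoring weights and post data below are invented for illustration only; no platform’s actual ranking formula is public. The point is structural: a feed ordered purely by predicted engagement will surface provocative content whenever outrage attracts more interaction.

```python
# Toy engagement-based feed ranker (hypothetical weights, invented posts).
# Real platform ranking systems are far more complex and are not public.

def engagement_score(post):
    # Reactions, comments and shares all count as "attention", regardless
    # of whether that attention is appreciative or outraged.
    return post["likes"] + 3 * post["comments"] + 5 * post["shares"]

def rank_feed(posts):
    # Highest predicted engagement first.
    return sorted(posts, key=engagement_score, reverse=True)

posts = [
    {"id": "calm_news", "likes": 120, "comments": 5, "shares": 2},
    {"id": "outrage_bait", "likes": 40, "comments": 60, "shares": 30},
]

for p in rank_feed(posts):
    print(p["id"], engagement_score(p))
```

Even though the calm post has three times as many likes, the provocative post wins the top slot because comments and shares, the interactions outrage tends to generate, are weighted more heavily.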

The second challenge: how to balance automation and human judgement to make moderation accurate?

Effectively capturing hate speech requires analysis that relies on contextual associations: a thorough understanding of local knowledge and historical context. Sometimes, assessing the harm level of specific stereotypes also requires communicating with the affected group. Platform classifiers and community standards alone cannot be relied upon to identify these (Sinpeng et al., 2021).

Take the phrase “ruthless Indian savages”: it meets most platforms’ definitions of hate speech. The text is, in fact, quoted from the US Declaration of Independence, and when assessed in its historical context it may not be classified as hate speech at all. This implies that user motivation and context are crucial factors in recognising hate speech.
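Why context-blind detection fails here can be shown with a minimal sketch. The blocklist below is a tiny invented example, and real moderation classifiers are machine-learning models rather than word lists, but the underlying problem is the same: a filter that only inspects the words flags the historical quotation exactly as it would flag genuine abuse.

```python
# Naive keyword-based hate speech filter (invented blocklist, for
# illustration only). It has no notion of quotation, intent, or history.

BLOCKLIST = {"savages", "vermin", "subhuman"}

def naive_flag(text):
    # Normalise each word and check it against the blocklist.
    words = {w.strip('.,;:"').lower() for w in text.split()}
    return bool(words & BLOCKLIST)

# The phrase discussed above, quoted from the Declaration of Independence:
print(naive_flag("ruthless Indian savages"))   # flagged as hate speech
print(naive_flag("a peaceful discussion"))     # not flagged
```

The filter cannot distinguish citation from endorsement, which is precisely the gap that human reviewers with historical and local knowledge are needed to close.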

In this light, page moderators should play a crucial role in regulating hate speech, and these gatekeepers need a high level of competence. At present, however, the majority of moderators are volunteers or low-paid workers. For instance, Facebook’s page moderators include some 10 million underpaid Indian and Filipino workers, most of whom lack relevant community management expertise (Sinpeng et al., 2021). It is reasonable for users to doubt the accuracy and professionalism of such content review.

In addition, the language coverage of detection systems is the basis for accurate content moderation, especially on multinational platforms like Facebook and Instagram, whose users are global.

Facebook’s content review in the Asia Pacific region is a case in point. The region is highly diverse in cultural, linguistic and religious backgrounds, and its people use more than 2,300 languages, yet Facebook’s language detector can currently monitor only 40 of them (Sinpeng et al., 2021). These language gaps mean automated detection systems often fail to accurately capture hate speech.
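The consequence of this coverage gap can be sketched as follows. The supported-language set and routing logic here are hypothetical, not Facebook’s actual pipeline: the point is that a post in a language the automated detector does not support must either be escalated to humans or pass through unchecked, and with 2,300+ languages against roughly 40 supported ones, the unchecked path dominates.

```python
# Hypothetical moderation routing by language (invented supported set).
# Automated review only covers a handful of languages; everything else
# must be escalated or it silently goes unmoderated.

SUPPORTED_LANGUAGES = {"en", "es", "fr", "de", "pt"}  # illustrative subset

def route_post(post_language):
    if post_language in SUPPORTED_LANGUAGES:
        return "automated_review"
    # Without this fallback, posts in unsupported languages would never
    # be checked at all -- the failure mode described above.
    return "human_review_queue"

print(route_post("en"))   # handled by the automated system
print(route_post("ceb"))  # Cebuano: unsupported, escalated to humans
```

In practice the human review queue is itself under-resourced and often lacks speakers of the escalated languages, so the gap is narrowed but not closed.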

The third challenge: how to safeguard transparency, fairness and human rights in the process of moderation?

In recent years, questions about how platforms make content moderation decisions have broadened. Scholars have observed that calls for greater transparency in social media content moderation are prevalent, and yet go virtually unanswered (Gillespie, 2018).

The first controversy: users are generally confused about exactly what content and behaviour will be sanctioned by the platform. Even when content can easily be identified as offending, users are often unable to understand the platform’s moderation decisions because they lack access to the relevant rules.

Typically, the content producer receives only a notice such as “Your post contains malicious or misleading content”, with no additional explanation.

The second controversy: platforms cannot provide adequate justification for their arbitration outcomes. Few platforms offer unreserved information on how they implement community guidelines, especially under the influence of various stakeholders (Dragiewicz et al., 2018).

The third controversy: biased algorithms may threaten human rights. Modern content moderation is an extremely complex affair, involving different contributors, automated agents, and the platform itself. This gives users reason to worry that the moderation system may be influenced by powerful external stakeholders, such as law enforcement authorities or other government officials.

Users may fear becoming targets of government or corporate organisations seeking to suppress their freedom of expression because of conflicting interests. For example, one user from Turkey, who regularly posts about political topics on Facebook, expressed concern that his account could be suspended at any time due to organised activity by the Turkish government (Suzor et al., 2019).

The way forward

Firstly, improving transparency in content moderation is one of the most vital steps for the future of platforms.

The Santa Clara Principles require platforms to be more transparent in two ways: first, by providing individual notification of specific decisions; and second, by regularly publishing aggregated information on past content moderation.

The Santa Clara Principles call for platforms to inform users of which specific content is offending and which specific rules it violates. Platforms must also disclose how that content was detected, including whether it was flagged through automated monitoring or by a human decision, and who made that decision (Santa Clara Principles, n.d.).

Therefore, an effective moderation procedure should ensure that users are informed of the banned content, the moderation process, and those in charge of it, together with the relevant URL or an excerpt of the community standard concerned, so they can grasp the restriction promptly and in detail (Suzor et al., 2019).

AI plays a significant role in content moderation, which makes increasing the transparency of its work essential. Helping users better understand AI’s role in content review is an effective step.

Platforms should notify users in advance of what specific content automated moderation will filter directly as offending. They should also regularly release statistics about automated moderation practices, including details of the content processed by AI and the quantity of content filtered (Santa Clara Principles, n.d.).

Besides automated processing, content moderation also involves a large amount of human decision-making. When it comes to human involvement in the review process, we are most curious about what role reviewers play, what their cultural backgrounds are, whether they have the expertise to review, and whether they have good moral character.

Therefore, platforms should disclose the training courses all reviewers attend before induction, and make public the processes and rules for recruiting them, in order to demonstrate that reviewers, both within the platform and at third-party outsourcing companies, are competent to make unbiased and accurate decisions.

Secondly, platforms should ensure that the process of content moderating upholds fairness and human rights.

Platforms should give users a truly meaningful opportunity to appeal, enabling them to contest any removed content or suspended account at any time (Santa Clara Principles, n.d.). Furthermore, introducing external scrutiny is an effective way to safeguard the fairness of the review process: an external review authority that is structurally independent of the platform helps identify particular problems in that process.

Finally, how to improve the accuracy of content moderation on multinational platforms is a challenge that needs to be overcome in the future.

The most serious problem is how to improve the system’s understanding of specific contexts, historical backgrounds, and differing levels of acceptability of speech.

Aggregating specific decisions is not sufficient to improve the system’s overall understanding of review complexity. New types of institutions may need to be established in the future to develop a deeper understanding of the complexity of the review process (Suzor et al., 2019), alongside continued support for, and improved research into, the differences between global and local diversity in moderation systems.

#hate speech #online harm #moderation #challenges #social media


Dragiewicz, M., Burgess, J., Matamoros-Fernandez, A., Salter, M. W., Suzor, N., Woodlock, D., & Harris, B. (2018). Technology facilitated coercive control: domestic violence and the competing roles of digital media platforms. Feminist Media Studies, 18(4), 609–625.

Flew, T. (2021). Regulating platforms. Polity, pp. 91-96.

Gillespie, T. (2018). Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions That Shape Social Media. Yale University Press.

Kumar, S. (2019). The algorithmic dance: YouTube’s Adpocalypse and the gatekeeping of cultural content on digital platforms. Internet Policy Review, 8(2).

Santa Clara Principles. (n.d.). The Santa Clara Principles on Transparency and Accountability in Content Moderation.

Saurwein, F., & Spencer, C. (2021). Automated Trouble: The Role of Algorithmic Selection in Harms on Social Media Platforms. Media and Communication, 9(4), 222–233.

Sinpeng, A., Martin, F. R., Gelber, K., & Shields, K. (2021). Facebook: Regulating Hate Speech in the Asia Pacific. Department of Media and Communications, The University of Sydney.

Suzor, N. P., West, S. M., Quodling, A., & York, J. (2019). What Do We Mean When We Talk About Transparency? Toward Meaningful Transparency in Commercial Content Moderation. International Journal of Communication (Online), 1526–.

United Nations. (n.d.). Say #NoToHate – The impacts of hate speech and actions you can take.
