When Hate Hides in Plain Sight: Algospeak, Emojis and the Limits of Platform Moderation

Have you noticed that social media is filling up with expressions that are hard to understand at first glance? You may feel that you can’t keep up with the times, but are these words really just Generation Z buzzwords?

Of course, some of them are just online slang. But others are deliberately altered to bypass filters, moderation systems, and platform rules, a practice often called “algospeak”.

What is Algospeak?

Algospeak is a form of language that helps users avoid having their content automatically deleted, flagged, or restricted by a platform’s systems. Words like “unalive”, “panini”, and “le dollar bean” seem playful and harmless at first glance, but they reflect a trend in which users keep learning how to work around platform rules in order to express themselves.

Algospeak is commonly understood as abbreviating, misspelling, or substituting specific words, for example, “seggs” for “sex” or “clock app” for “TikTok,” when creating a social media post with the particular goal to circumvent a platform’s content moderation systems (Steen et al., 2023).

“Algospeak” examples. Source: The Washington Post.
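To make the mechanism concrete, here is a minimal sketch of why simple substitutions defeat a fixed word list. It is an illustration only: `BLOCKED_TERMS`, `ALGOSPEAK_MAP`, `naive_filter`, and `normalise` are hypothetical names invented for this example, and real moderation systems are far more complex than a word lookup.

```python
import re

# Hypothetical blocklist, used only for illustration.
BLOCKED_TERMS = {"sex", "tiktok"}

# Hypothetical substitution table, built from the examples quoted above.
ALGOSPEAK_MAP = {"seggs": "sex", "clock app": "tiktok"}


def naive_filter(post: str) -> bool:
    """Return True if the post contains a blocked term as a whole word."""
    words = re.findall(r"[a-z]+", post.lower())
    return any(word in BLOCKED_TERMS for word in words)


def normalise(post: str) -> str:
    """Replace known coded terms with their plain equivalents."""
    text = post.lower()
    for coded, plain in ALGOSPEAK_MAP.items():
        text = text.replace(coded, plain)
    return text


if __name__ == "__main__":
    post = "new video on the clock app"
    print(naive_filter(post))             # the coded post passes the word list
    print(naive_filter(normalise(post)))  # caught only once the code is known
```

The second check only works because “clock app” is already in the table. As soon as users coin a new substitute, the table is out of date again, which is why this kind of filter is always one step behind.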

When Algospeak Carries Hate

Algospeak is not hate speech in itself. However, the problem becomes more serious when the same adaptive strategy is used not just to bypass moderation but to convey hate indirectly. Once altered language, coded phrases, or emojis begin to hide harmful meanings, algospeak becomes part of how hate speech is expressed.

Some recent reports show that this is no longer just a theoretical concern.

Example of the juice box emoji being used as coded antisemitic language on social media. Source: ADL

In 2025, CyberWell identified a rapidly expanding trend of antisemitic hate speech that uses emojis and coded language to target Jews across social platforms. For example, users write “juice”, which sounds similar to “Jews”, to evade content moderation, and use emojis such as 🤑 to promote false claims about Jewish greed and control over the economy. On the surface, these expressions are indirect or even vague.

This example helps to show that hate speech is not simply an insult or a gratuitously vulgar comment. What distinguishes it is that it targets people not as individuals but as members of a group.

Hate speech has been defined as speech that ‘expresses, encourages, stirs up, or incites hatred against a group of individuals distinguished by a particular feature or set of features such as race, ethnicity, gender, religion, nationality, and sexual orientation’ (Parekh, 2012, p. 40).

Hate speech also undermines human dignity, creates a sense of fear, and marks the target group as unwelcome in public life.

Some hate speech avoids obviously offensive words and instead conveys hostility through words and symbols that seem harmless, amusing, or merely puzzling to outsiders. For the targeted group, however, the harmful meaning is still clear and can cause real harm. This is the point at which the impact of algospeak needs to be taken more seriously.

It becomes dangerous not because the language has changed, but because the changed language can disguise hate and make it less obvious on the surface. Hate can then survive through the adaptive strategies associated with algospeak, as well as through emojis and other indirect expressions.

Why Platforms Struggle to Keep Up

For platforms, this creates a serious governance problem: it becomes harder to identify the hateful meaning hidden beneath these humorous and creative words and expressions.

The work of setting rules and exceptions is almost never-ending – there will always be new cases that do not clearly fit within established categories (Suzor, 2019).

This also shows why algospeak deserves more attention. It changes not only how people speak online, but also how hate itself is hidden, spread, and reviewed. In practice, this means that platform moderation no longer deals mainly with obvious abuse. Instead, it increasingly confronts forms of harmful expression that are indirect, coded, and highly context-dependent, which turns moderation into a matter of careful judgment.

As in the CyberWell example, a post may contain no direct insults yet still convey hateful meaning through emojis and other coded means, and so slip past the platform’s moderation. This poses a serious challenge for moderation systems.

The complex process of sorting user-uploaded material into either the acceptable or the rejected pile is far beyond the capabilities of software or algorithms alone (Roberts, 2019).

Algorithmic moderation clearly has limitations. Human reviewers therefore have to make careful judgments about the intention and likely consequences of content, and those judgments may depend on specific cultures and communities.

In other words, such harmful content persists online not only because it is “hidden”, but also because moderation systems, and even human moderators, lack the relevant background knowledge.

In the face of a form of online expression as flexible as algospeak, moderation is no longer a fixed task but a constantly moving target. Platforms need not only to remove harmful content efficiently, but also to have the cultural knowledge to adapt to changing expressions.

Overall, the hardest cases to deal with are not the clearest ones but the more unusual expressions. These may be an abstract and seemingly irrelevant phrase, a humorous emoji, or an obscure expression that contains no insulting words but retains a hateful meaning. It is precisely these cases that platforms find difficult to police consistently, yet that the targeted groups readily understand.

Case Study

The Meta Oversight Board recently reviewed a case that clearly shows how hate speech can hide behind a seemingly innocuous expression and slip past a platform’s moderation. In early 2026, the Oversight Board reviewed two posts, one from Facebook and one from Instagram, in which monkey emojis were used to refer to Black people. One was a short video posted on Facebook by a Brazilian user, and the other was a comment under an Instagram video in Ireland. In both cases, neither Meta’s automated systems nor its human reviewers initially removed the post. The Oversight Board later ruled that both posts violated Meta’s Hateful Conduct Community Standard, because they conveyed dehumanizing meanings by comparing Black people with animals.

“Stop Letting Racists Hide Behind an Emoji”. Source: Versus

The key point of this case is that neither post contained overtly racist language. Instead, the hateful meaning was conveyed through emojis, which in other contexts are simply tools for expressing emotion more vividly. The Board expressed concern about the accuracy of enforcement of the Hateful Conduct policy, especially in assessing emojis used as algospeak. The case shows that users could readily grasp the harmful meaning of the posts in context, yet both the automated systems and the human reviewers failed to assess them accurately.

The problem, then, is not just that Meta made a mistake. The bigger problem is that platforms still find it difficult to classify accurately content that conveys hateful meaning through shared cultural associations rather than overtly offensive words.

This case also shows that there is still a clear gap between the way users understand harmful content and the way platforms identify and manage it.

What Should Platforms Do Next?

If platforms want to manage coded hate speech effectively, the first step is not simply to remove more content, but to make moderation more transparent and accountable. On many social media platforms, users often do not know why a piece of content has been removed, especially when it involves emojis, coded language, or indirect expressions. Although many platforms now tell users why content was removed, these explanations are usually very general and give little detail about why a specific post was judged to violate the rules.

TikTok content removal notice stating that a video violated its “Harassment and bullying” policy. Source: The Verge

For this reason, platforms should make clear which symbols or language may be treated as hateful expression, and give detailed reasons. More importantly, when a user’s content is judged incorrectly, the platform should provide a genuinely effective channel for appeal and complaint instead of ignoring the user. If users only receive perfunctory notices and vague rules, they will never be able to understand the platform’s decisions, which in turn encourages the continued emergence of coded language.

Of course, if a platform cannot identify coded hate speech, or can never clearly explain the reasons for its judgments, then external oversight and enforcement by third parties are also necessary. The Oversight Board’s intervention in the Meta case illustrates this. External review and regulatory pressure help push platforms to correct errors and improve their moderation practice.

In addition, coded hate is often highly dependent on context, which means that platforms cannot rely only on fixed keyword lists or automatic detection tools. As recommended in the Meta case, the platform should improve the automatic detection of violative emoji use by regularly reviewing its training data. The enforcement process should also direct content to reviewers with the relevant language and regional expertise. For example, in 2023, Twitter incorporated coded language into its violent speech policy, which shows that a platform can not only deal with obviously harmful speech, but also keep updating its policy to address more obscure expressions.

A sign at Twitter headquarters is shown in San Francisco. Source: AP Photo

Moreover, different kinds of online harm need different kinds of responses, because different sectors face different problems, and not everyone understands those problems in the same way.

For example, risks on gaming platforms often arise from real-time voice communication, player interaction, and harassment. On social media, by contrast, these problems are more likely to occur through comment sections, tags, emojis, and other forms of public expression.

More importantly, platforms should not make the rules on their own. Because the meaning of coded hate often depends on specific cultural and social backgrounds, platforms need to cooperate with people who really understand these contexts. They should invite experts in different fields, community organizations, and regulators to work with them and keep adjusting governance standards for different situations, so that the rules are specific enough to tackle the problems arising from algospeak and related phenomena.

Conclusion

Online hate speech is not always loud, direct, or easy to identify. In many situations today, it is hidden in emojis, coded language, and other indirect forms of expression. These look harmless on the surface, and can even make an expression vivid and interesting, but they carry clear harmful meanings in specific contexts. This is a major challenge for platform governance. The question today is not only whether platforms can identify clearly abusive words, but also whether they can keep up with expressions of hatred that constantly adapt to new rules.

Platforms need clearer and more transparent rules, a continually improving moderation system, and more flexible governance measures to cope with different risks in different online spaces. As hateful expression becomes more adaptable, platform governance itself must also become more flexible and respond to complex problems in a timely way.

Overall, the problem is not only what people say on the Internet, but also how platforms judge which expressions are harmful, which content is ignored, and who is left unprotected in the end.

If a word or emoji looks harmless to us, does that always mean it is harmless to others? This question may not have an easy answer, but it is one that platforms and users can no longer ignore.

References

Associated Press. (2023, December 19). Twitter’s New ‘Violent Speech’ Policy Similar to Past Rules. https://apnews.com/article/1912497f4e4f444f1ba123cbb2d1ad59

CyberWell. (2025). Regarding Meta Oversight Board cases involving coded language and racial discrimination via emojis. https://cyberwell.org/reports/cyberwell-policy-recommendations-regarding-meta-oversight-board-cases-involving-coded-language-and-racial-discrimination-via-emojis/

Meta Oversight Board. (2026, February 10). Stop Algospeak, Including Emojis, Used for Hate Speech. https://www.oversightboard.com/news/stop-algospeak-including-emojis-used-for-hate-speech/

Parekh, B. (2012). Is there a case for banning hate speech? In M. Herz & P. Molnar (Eds.), The Content and Context of Hate Speech: Rethinking Regulation and Responses (pp. 37–56). Cambridge University Press.

Roberts, S. T. (2019). Behind the screen: Content moderation in the shadows of social media. Yale University Press.

Steen, E., Yurechko, K., & Klug, D. (2023). You Can (Not) Say What You Want: Using Algospeak to Contest and Evade Algorithmic Content Moderation on TikTok. Social Media + Society, 9(3).

Suzor, N. P. (2019). Lawless: The secret rules that govern our digital lives. Cambridge University Press.

ARIN6902: Digital Policy and Governance