Analysis

Reading Between the Lines: The Importance of Human Moderators for Online Implicit Extremist Content Moderation

02 Dec 2025

Introduction

Online harmful content is one of the most pressing challenges facing democratic societies today. Beyond overtly illegal content, subtle or implicit extremist content corrodes the foundations of pluralism, trust, and social cohesion, exploiting the accessibility of digital platforms to spread extremist ideology, problematic polarisation, and disinformation. While regulatory frameworks such as the European Union’s (EU) Regulation on the Dissemination of Terrorist Content Online (TCO) and the Digital Services Act (DSA), as well as the terms of service (ToS) of host service providers (HSPs), provide guidance for the moderation of overtly illegal content, they are insufficiently equipped to address the growing sophistication of implicit extremist messaging.

Based on its feasibility study on the development of an assessment framework for terrorist, extremist, and borderline content, the ICCT concluded that the term borderline content is not particularly helpful when discussing which content does and does not fall under the freedom of expression, as it lacks a clear link to a legal norm. Instead, we opted for the term implicit extremist content. Implicit extremist content often conveys harmful messages through coded language, irony, humour, or cultural references, making detection and moderation exceptionally challenging. Automated systems, increasingly relied upon by major platforms, are ill-suited for this task. At the same time, human moderation is essential for interpreting context, intent, and nuance, yet it is being systematically reduced. This analysis examines the proliferation and characteristics of implicit extremist content, the challenges posed by current regulatory frameworks and automated detection systems, and the indispensable role of human moderators in effective detection and proportionate moderation, concluding with recommendations for policymakers and practitioners.

The Proliferation of Online Harmful Content and its Impact

Extremist actors have harnessed the reach of social media platforms to destabilise societies and normalise violence. Terrorist propaganda, divisive rhetoric, and subtle forms of harmful content target not only individuals and vulnerable groups but also the democratic structures meant to protect them. Left unchecked, these narratives can amplify grievances, deepen social fractures, and erode confidence in institutions. Social media platforms - central arenas for political debate, cultural exchange, and social interaction - have become key conduits for spreading these harmful messages.

Technological innovation has amplified the scale and sophistication of online radicalisation. Generative artificial intelligence (AI) and large language models (LLMs) enable extremist actors to produce content at unprecedented speed and volume, often in multiple languages. Researchers have already come across several examples of AI-generated material being used by extremist groups. The use of AI-generated images to unfavourably portray politicians or targeted minority groups has recently become more common. Some of these cases may be illegal or unlawful, constituting slander, hate speech, or even incitement to violence, and as such warranting sanctioning. Other examples, however, may fall within the range of implicit extremist content, where the unlawfulness or illegality is not clear from the outset. In such cases, host service providers might not intervene. Invoking freedom of expression to justify such non-intervention is, however, in some of these cases, premature. Take, for instance, the Dutch AI-generated song Wij zeggen nee, nee, nee tegen een AZC (We say no, no, no to an asylum seekers' centre), which contains racist caricatures and references to the great replacement theory and xenophobic anti-immigration rhetoric. Since the references are somewhat implicit, however, the product is not explicitly in violation of Spotify's ToS and remains part of its charts.

The ease of downloading and sharing extremist videos, sermons, and propaganda has furthermore accelerated self-radicalisation and reduced the need for in-person recruitment. Algorithms amplify this material through recommendation systems, while AI-assisted creation of fake accounts magnifies its impact.

Even if content does not directly lead to recruitment or incite hatred, it can normalise violence and deepen societal polarisation. Young people are disproportionately exposed and susceptible. Because of their intensive use of social media, streaming platforms, and online gaming environments, vulnerable and impressionable minors risk greater exposure to emotionally charged narratives, while interactive and gamified experiences deepen engagement. Emotions and engagement, although clearly not inherently problematic, can be manipulated and weaponised for terrorist and extremist purposes. In addition, content often migrates from mainstream platforms to encrypted channels where monitoring is even more challenging.

The societal consequences extend beyond individual radicalisation. The online space is easily abused by those contributing to problematic polarisation in society. Extremist messaging frames society as divided into opposing camps, making dialogue impossible and hostility toward targeted groups acceptable. Implicit extremist content thrives in echo chambers, where confirmation bias reinforces conspiracy theories and exclusionary ideologies. Anonymity and closed networks lower barriers to the expression of hostility, while advanced technologies, including deepfakes and AI-generated disinformation, enable content to spread rapidly and evade fact-checking. Extremist actors increasingly rely on emotionally resonant and culturally specific content to engage audiences and bypass both automated detection and human scrutiny. Fake accounts mimic human behaviour to remain active for long periods, while the interaction between online and offline environments helps normalise extremist narratives in mainstream discourse before extremist actors migrate vulnerable followers to alternative platforms like Telegram or Discord, where messaging becomes more explicit.

To reach wider audiences, extremists increasingly use subtle, seemingly humorous or ironic content often referred to as “memes for the masses.” These are crafted to be ambiguous: deniable to outsiders, but clearly recognisable to insiders. Humour lowers psychological resistance, making extremist ideas appear harmless, which helps recruit individuals gradually into more extreme environments. Because of these tactics, extremists manage to escape regulatory or platform enforcement while influencing attitudes and behaviours. 

These tactics thus serve a double purpose: evading detection but also normalising certain narratives. Implicit extremist content can subtly delegitimise institutions, deepen intergroup hostility, and erode trust in democratic norms. Its covert nature also allows extremists to adapt rapidly to changing platform policies and enforcement practices, maintaining a persistent presence in digital spaces.

The Current Regulatory Framework

The EU has developed a layered regulatory approach to mitigate online harm. The TCO mandates rapid removal of terrorist content and imposes transparency and notification obligations, while the DSA broadens the scope to all illegal content, requiring very large online platforms (VLOPs) and very large online search engines (VLOSEs) to assess systemic risks and provide access to researchers. Both legal frameworks represent a shift - at least in their objectives - from voluntary cooperation to regulatory accountability and human rights-compliant governance.

However, implicit extremist content remains inadequately addressed. Ambiguous definitions, divergent interpretations across jurisdictions, and reliance on broad “illegal content” criteria leave platforms with insufficient guidance. The DSA’s expansive definitions do not translate into operational instructions for detecting coded or subtle extremist messaging. As a result, it is difficult to determine the lawfulness of implicit extremist content.


Recognising Implicit Extremist Content

Implicit extremist content is thus characterised by ambiguity and coded messaging, making it inherently difficult to identify. Recognising such content requires attention to subtle signals and contextual indicators rather than explicit keywords or images.

The ICCT has identified five operationalised indicator categories to detect implicit extremist content:

  1. Concealment of Meaning: Signals that content deliberately obscures its intent through coded language or euphemisms. Concealment can also take the form of a misleading cover image, misspellings or otherwise altered text and images, blurred content, or the use of humour and irony.

  2. Harmful Alliances or Affiliations: Indications of ideological or organisational connections to extremist groups, in particular the use of hateful right-wing extremist or jihadist symbols, emojis, coded slogans, slang, or acronyms.

  3. Problematic References to Historical or Current Context: Selective use of history, cultural narratives, or symbols to situate content within an extremist framework. This could include the denial or questioning of established crimes, such as the dismissal of judicial rulings, disputing evidence, non-recognition of victims, or minimisation of victimhood. It could also include false or misleading claims about past or present crimes, such as claims about crimes that never occurred or the misattribution of responsibility. Justification of current or potential future crimes through references to past crimes - real or alleged - and falsified historical claims aimed at denying the existence or territorial legitimacy of a state, or the right of a people to self-determination, could also fall within this category. Finally, the glorification or positive portrayal of individuals or groups involved in adjudicated crimes, or in spreading antisemitic, Islamophobic, or otherwise extremist narratives, as well as the promotion or endorsement of books, films, or other products known for spreading any of the narratives mentioned above, would also be considered problematic.

  4. Implicit Action Triggers: Subtle prompts encouraging online or offline action, including recruitment or acts of violence or self-harm. These prompts could instil a perceived need for retaliation, self-defence, or the protection of a group from reputational damage. They could also push ideas of compensation or reward for action undertaken, exert peer pressure, invoke a code of honour, or provide sacred justification for action.

  5. Presumed Intent to Cause Harm: Assessment of the potential harmful consequences or intentions behind the content, recognising that these are often inferred rather than explicit. Such content could be intended to normalise hateful or violent narratives, propagate hateful and violent conspiracies, foster hate or hostility toward an out-group, and even trigger harmful online or offline actions, such as harassment, doxxing, or intimidation.


These indicators are interdependent and context-sensitive. A meme or ironic statement may appear innocuous to outsiders but carry a specific extremist message for insiders. Their operationalisation requires trained professionals who can interpret cultural, ideological, and social nuances that automated systems cannot reliably assess. 
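To illustrate how such indicator-based assessments might be operationalised in practice, the sketch below shows one hypothetical way a moderation tool could record a trained reviewer's findings against the five categories. All class names, fields, and the escalation rule are illustrative assumptions, not part of the ICCT framework itself.

```python
# Hypothetical data model for capturing a human reviewer's findings against the
# five indicator categories described above. Names and thresholds are illustrative.
from dataclasses import dataclass, field
from enum import Enum, auto


class Indicator(Enum):
    CONCEALMENT_OF_MEANING = auto()          # coded language, euphemisms, irony
    HARMFUL_ALLIANCES = auto()               # extremist symbols, slogans, acronyms
    PROBLEMATIC_CONTEXT_REFERENCES = auto()  # denial, falsification, glorification
    IMPLICIT_ACTION_TRIGGERS = auto()        # subtle prompts to act or retaliate
    PRESUMED_INTENT_TO_HARM = auto()         # inferred intent to normalise hate


@dataclass
class Finding:
    indicator: Indicator
    evidence: str      # the coded phrase, symbol, or reference observed
    context_note: str  # cultural or ideological context supplied by the reviewer


@dataclass
class ModeratorAssessment:
    content_id: str
    reviewer_id: str
    findings: list[Finding] = field(default_factory=list)

    def flagged(self) -> set[Indicator]:
        return {f.indicator for f in self.findings}

    def needs_second_reviewer(self) -> bool:
        # Illustrative rule: because the indicators are interdependent, content
        # matching two or more categories is routed to a second human reviewer
        # rather than being removed automatically.
        return len(self.flagged()) >= 2
```

Structuring assessments in this way keeps the reviewer's contextual reasoning attached to the decision, which in turn supports transparency reporting and appeals.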


Why Automated Detection is Insufficient

Automated detection systems, including AI-driven tools, have proven adequately capable of identifying overt terrorist propaganda and some illegal content. They are, however, ill-equipped to detect implicit extremist content, and they do little to support proportionate moderation: detection often leads automatically to immediate removal rather than a balanced consideration of the degree of moderation needed (for instance, shadow-banning instead of removal). The algorithms behind these automated detection systems struggle to interpret context, cultural nuance, humour, irony, or coded references. Consequently, extremist actors exploit these limitations, deliberately crafting ambiguous content that bypasses filters while remaining effective at radicalising audiences. Dog whistles, coded emojis, and historical or cultural references intelligible only to in-groups are frequently deployed. Humour and irony, particularly through memes, serve both as rhetorical shields and as recruitment tools, normalising extremist ideas while deflecting external criticism.

Automated systems also face challenges in assessing intent, subtext, or likelihood of harm, increasing both false negatives and false positives. Misclassifying legitimate discourse as harmful threatens freedom of expression, while failing to detect subtle forms of extremism allows harmful content to spread. 

Considering these challenges, the recent platform trends of reducing human moderators and overrelying on automated enforcement are very concerning. The examples are numerous. For instance, after the 2022 acquisition of Twitter (now X), the company fired an estimated 4,400 content moderators working as contractors in the US. In 2024, another 362 full-time content moderators were laid off, in addition to 1,213 staff members and 224 engineers dedicated to trust and safety. Other platforms have followed suit. ByteDance, which operates TikTok, fired approximately 800 content moderators in Malaysia and the Netherlands in 2024, and plans to lay off hundreds more in Germany and the United Kingdom in its effort to phase out the trust and safety department and replace it with AI systems. Similarly, Meta plans to replace its human moderators, as indicated by internal documents. The company that runs Facebook, Instagram, and WhatsApp has already severed its contract with a content moderation company in Spain, laying off up to 2,000 moderators specialising in languages such as Spanish, French, Portuguese, Dutch, Catalan, and Hebrew. Additionally, Meta seeks to eliminate part of its risk management division in a push to automate its review processes.

These developments are especially worrying since a recent MIT study found a 95 percent failure rate among corporate AI implementations. Inconsistent enforcement, opaque community guidelines, lack of transparency concerning moderation decisions, and corporate incentives prioritising efficiency and maximisation of engagement over public safety further exacerbate these risks. The most telling example has been that of the algorithms of Google and Facebook, which were found to amplify harmful content to maximise user engagement. Meta has since announced a new approach to addressing misinformation, which might worsen these dynamics. Instead of amending its algorithms, Meta will replace its independent fact-checkers and some of its expert moderation with community notes. While the wisdom of crowds is theoretically more democratic, experts warn that community notes cannot contend with social media algorithms that drive polarisation. This development might amplify misinformation and extremism, as it has done on other algorithmic platforms like X. These trends thus exacerbate existing shortcomings, leaving critical gaps in safeguarding vulnerable populations, in moderating in a legitimate and rule-of-law-compliant manner, in respecting freedom of expression, and in avoiding disproportionate moderation decisions.


The Indispensable Role of Human Moderators

Human moderators are indeed essential for the detection and mitigation of implicit extremist content. Unlike AI, humans can interpret context, cultural references, and ideological subtext, allowing them to differentiate between satire, parody, and coded harmful messaging. Human moderators are thus crucial for operationalising the ICCT-identified indicators and for making the nuanced judgements about content that cannot reliably be automated.

Effective and proportionate moderation requires structured guidance, training in ideological and cultural literacy, and safeguards against bias. A four-eyes principle, in which multiple moderators review content, enhances consistency and reliability. Human expertise ensures that platforms can balance protection against harm with respect for freedom of expression, a task that automated systems alone cannot fulfil.
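As a purely illustrative sketch, and not a description of any platform's actual system, the snippet below shows how a four-eyes rule could be combined with a graduated set of moderation actions so that ambiguous cases never default to removal. The action scale, the disagreement rule, and all names are assumptions.

```python
# Illustrative sketch of a four-eyes workflow with graduated, proportionate actions.
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # Ordered from least to most restrictive; removal is the last resort.
    NO_ACTION = "no_action"
    LABEL = "label"            # add a context or warning label
    DOWNRANK = "downrank"      # reduce algorithmic amplification
    SHADOW_BAN = "shadow_ban"  # restrict reach without removal
    REMOVE = "remove"          # reserved for clear ToS or legal violations


@dataclass
class Review:
    reviewer_id: str
    proposed_action: Action
    rationale: str  # documented reasoning supports transparency and appeals


def four_eyes_decision(first: Review, second: Review) -> Action:
    """Require two independent reviewers before any action is taken.

    On disagreement, the less restrictive action applies pending senior
    review, so ambiguity never defaults to removal (an illustrative policy
    choice, not a prescription).
    """
    if first.reviewer_id == second.reviewer_id:
        raise ValueError("The two reviews must come from different moderators")
    if first.proposed_action == second.proposed_action:
        return first.proposed_action
    severity = list(Action)  # definition order doubles as a severity scale
    return min(first.proposed_action, second.proposed_action, key=severity.index)
```

Ordering the possible actions from least to most restrictive makes proportionality explicit: two reviewers must converge before the more restrictive option is applied.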

Platforms’ reliance on AI creates the illusion of comprehensive moderation, masking the continued presence of subtle extremist messaging. The absence of contextual judgment not only undermines content detection but also risks delegitimising moderation processes when legitimate discourse is misclassified, leading to large numbers of false positives. Private governance without democratic oversight leaves critical decisions on online safety to corporate discretion, often misaligned with societal interests.

Conclusion and Recommendations

Implicit extremist content represents a complex, evolving threat. Its ambiguity, context-dependence, and adaptability allow it to bypass both legal frameworks and automated moderation, putting individuals, vulnerable groups (in particular, youth), democratic institutions, and social cohesion at risk. Human moderators are indispensable, providing the nuanced judgment required to interpret subtle extremist messaging.

Policy Recommendations:

  1. Clarify operational definitions: Regulators should provide precise, context-informed definitions of implicit extremist content.

  2. Mandate hybrid moderation models: Platforms should combine AI-assisted pre-screening with trained human moderators.

  3. Invest in moderator expertise: Platforms should realise that ongoing cultural, ideological, and contextual training is essential, alongside measures to mitigate bias.

  4. Enhance transparency and accountability: Platforms should publish detailed moderation reports, including criteria and rationale for decisions.

  5. Develop a sector-wide code of conduct: A sector-wide code of conduct, coupled with a certification scheme, could inform consumers about how HSPs conduct their detection and moderation, setting standards for the share of human assessment, the clarity of the terms used in the terms of service, the filters implemented to protect youth and vulnerable groups, transparency about moderation decisions, and appeals procedures.

  6. Promote cross-sector collaboration: Coordination among governments, academia, civil society, and tech companies can improve threat assessment and response.

  7. Focus on youth protection: Specific strategies are needed to prevent exploitation of minors, including digital literacy and resilience initiatives.

  8. Align platform incentives with public interest: Regulatory frameworks should ensure that corporate priorities do not undermine societal safety.