The Estonian Language Institute has published a comprehensive benchmark that measures how well large language models (LLMs) resist Russian propaganda. Open-weight models ranked near the top of the list, with Nvidia's Nemotron and Alibaba's Qwen delivering results comparable to Anthropic's best offerings. OpenAI's GPT-5.4 emerged as the highest-scoring individual system, with 54 percent of its responses rated “Exemplary” and a mean score of 88.9.

Recent frontier models show a marked improvement over their predecessors. Claude 3.5 Haiku, released in 2024, recorded a mean rating of 73.1, placing it in the bottom third of models evaluated in 2026. Models from only a few years ago, in other words, fall well short of the resistance levels that current systems achieve. The upward trend suggests that developers are increasingly prioritizing safeguards against state-sponsored disinformation.

Google's performance, however, remains mixed. The institute's detailed results show that Gemini 2.5 Pro, the company's most propaganda-resistant model to date, posts a mean score of 82, although its susceptibility to maliciously worded prompts raises concerns. Its successor, Gemini 3.5 Flash, fell to a mean of 73, on par with Anthropic models released two years earlier. The drop is especially pronounced when the model is queried in Russian, where scores are significantly lower than in English.

Some open-weight systems exhibit similar language-specific weaknesses. Moonshot's Kimi K2 and StepFun's Step 3.5 Flash both saw their resistance scores dip when tested with Russian prompts. The pattern points to a broader challenge: many LLMs, despite strong overall performance, lack the nuanced linguistic and cultural understanding needed to flag propaganda across languages.

Beyond raw numbers, the study cites research from King’s College professor Gregory Asmolov, who argues that the Russian government is leveraging technical alliances with BRICS nations to influence AI models. By embedding “culturally sensitive” viewpoints into training data, Moscow hopes to shape how LLMs interpret politically charged content. The Estonian benchmark therefore serves not only as a performance metric but also as a warning about geopolitical manipulation of AI.

Industry observers note that the benchmark’s emphasis on multilingual testing could drive future development cycles. If developers integrate more robust cross‑language defenses, the gap between English‑only and multilingual resistance may narrow. For now, the data suggests that open-weight models are leading the charge, while some of the biggest proprietary players still have work to do.

Stakeholders ranging from policymakers to AI researchers are watching the results closely. The benchmark offers a concrete yardstick for measuring progress, but it also highlights the need for continuous vigilance as state actors refine their propaganda tactics. As LLMs become more embedded in everyday communication, ensuring they can discern and reject disinformation—especially in languages beyond English—remains a critical frontier.

This article was written with the assistance of AI.

Open-Weight LLMs Lead in Resisting Russian Propaganda, Study Finds
