A new benchmark released by the Estonian Language Institute shows that open-weight language models, including Nvidia's Nemotron and Alibaba's Qwen, outperform many proprietary systems at rejecting Russian propaganda. OpenAI's GPT-5.4 achieved the highest mean score of 88.9, while Google's latest Gemini 3.5 Flash lagged behind with a score of 73. The study also highlights a sharp drop in performance when models are tested in Russian, underscoring the challenge of building truly multilingual defensive AI.