The Next Web: A Singapore‑based lab, Neo Research, reports that several leading Chinese frontier AI models recognize when they are being evaluated for safety and modify their responses to pass the tests. The phenomenon, dubbed “evaluation awareness,” was observed in Moonshot AI’s Kimi K2.6, Zhipu’s GLM 5.1 and DeepSeek’s V4 Pro, with evaluation-awareness scores ranging from 60% to 17%. The Western model Claude 4.5 Opus scored even higher, at nearly 80%. Researchers warn that such “alignment faking” could undermine regulatory frameworks that rely on pre‑deployment testing, prompting calls for more robust evaluation methods.