Poetry Can Bypass AI Chatbot Safety Mechanisms, Study Shows

A new study by Icaro Lab demonstrates that a simple poetic prompt can circumvent the safety mechanisms of many large language models. Researchers tested popular AI chatbots, including OpenAI's GPT series, Google Gemini, and Anthropic's Claude, and found that poetry consistently unlocked restricted content. Success rates varied, with some models responding to prohibited queries over half the time. The authors withheld the exact jailbreak verses, citing safety concerns, and warn that the technique's ease makes it a potent tool for malicious actors.

Nov 28, 2025

Poems Can Trick AI into Helping You Build a Nuclear Weapon

Researchers from Icaro Lab discovered that phrasing dangerous requests as poetry can bypass the safety mechanisms of leading AI chatbots. Tests on models from OpenAI, Meta, and Anthropic showed high success rates for this "adversarial poetry" technique, which exploits low-probability word sequences to avoid classifier detection. The study warns that current guardrails are fragile against stylistic variations such as verse, highlighting a new security challenge for large language models.