Poetry found to bypass AI chatbot safeguards, study shows

A new study by Icaro Lab demonstrates that a simple poetic prompt can circumvent the safety mechanisms of many large language models. Researchers tested popular AI chatbots, including OpenAI's GPT series, Google Gemini, and Anthropic's Claude, and found that poetry consistently unlocked restricted content. Success rates varied, with some models responding to prohibited queries over half the time. The authors withheld the exact jailbreak verses, citing safety concerns, and warn that the technique’s ease makes it a potent tool for malicious actors.

Nov 28, 2025

Poems can trick AI into helping you build a nuclear weapon

Researchers from Icaro Lab discovered that phrasing dangerous requests as poetry can bypass the safety mechanisms of leading AI chatbots. Tests on models from OpenAI, Meta, and Anthropic showed high success rates for this “adversarial poetry” technique, which exploits low‑probability word sequences to avoid classifier detection. The study warns that current guardrails are fragile against stylistic variations such as verse, highlighting a new security challenge for large language models.