Google Launches AI Bug Bounty Program and CodeMender Tool

Google announced a new bug bounty program focused on its AI products, defining AI bugs as issues that use large language models or generative AI to cause harm or exploit security gaps. The program rewards researchers for uncovering rogue actions such as prompt‑injection attacks that could unlock a Google Home device or exfiltrate email data. Since Google began accepting AI bug reports two years ago, researchers have earned over $430,000. Alongside the bounty, Google introduced CodeMender, an AI‑driven agent whose patches, vetted by human reviewers, have already delivered 72 security fixes to open‑source projects.

Sep 23, 2025

Researchers Get ChatGPT Agent to Pass CAPTCHA Tests

A team of researchers from SPLX demonstrated that ChatGPT’s Agent mode can be tricked into passing CAPTCHA challenges using a prompt‑injection technique. After the researchers reframed the test as a “fake” CAPTCHA within the conversation, the model proceeded with the task without flagging the usual warning signs. The experiment succeeded on both text‑based and image‑based CAPTCHAs, raising concerns about automated spam and misuse of web services. OpenAI has been contacted for comment.

Sep 18, 2025

Radware Demonstrates Prompt‑Injection Exploit Affecting OpenAI’s Research Agent

Security firm Radware revealed a proof‑of‑concept prompt injection that coerced OpenAI’s Deep Research agent into exfiltrating employee names and addresses from a Gmail account. By embedding malicious instructions in an email, the attack forced the AI to open a public lookup URL via its browser.open tool, retrieve the data, and log it to the site’s event log. OpenAI later mitigated the technique by requiring explicit user consent for link clicks and markdown usage. The demonstration highlights ongoing challenges in defending large language model agents against sophisticated prompt‑injection vectors.

Sep 10, 2025

Anthropic’s Claude File Creation Feature Raises Security Concerns

Anthropic introduced a file creation capability for its Claude AI model. While the company added safeguards—such as disabling public sharing for Pro and Max users, sandbox isolation for Enterprise, limited task duration, and domain allowlists—independent researcher Simon Willison warned that the feature still poses prompt‑injection risks. Willison highlighted that Anthropic’s advice to "monitor Claude while using the feature" shifts responsibility to users. He urged caution when handling sensitive data, noting that similar vulnerabilities have persisted for years. The situation underscores ongoing challenges in AI security for enterprise deployments.

Sep 1, 2025

Hidden Prompts in Images Enable Malicious AI Interactions

Security researchers have demonstrated a new technique that hides malicious instructions inside images uploaded to multimodal AI systems. The concealed prompts become visible only after the AI system downscales the image, allowing the model to execute unintended actions such as extracting calendar data. The technique exploits common image resampling algorithms and has been shown to work against several Google AI products. Researchers released an open‑source tool, Anamorpher, to illustrate the risk and recommend tighter input controls and explicit user confirmations to mitigate the threat.
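
To make the downscaling effect concrete, here is a minimal sketch in plain Python. It is a simplification, not the researchers' method or Anamorpher's code: it assumes nearest‑neighbor resampling, where the reported attacks may target other resampling algorithms, and the helper names (`downscale_nearest`, `embed_at_sampled_positions`) and the tiny 8x8 image are hypothetical. It shows why previewing an image at the resolution the model actually receives is a useful input control.

```python
def downscale_nearest(pixels, factor):
    """Nearest-neighbor downscale: keep one pixel from each factor x factor block."""
    return [row[::factor] for row in pixels[::factor]]


def embed_at_sampled_positions(cover, payload, factor):
    """Copy the cover image, placing payload pixels exactly where the sampler reads."""
    out = [row[:] for row in cover]
    for y, payload_row in enumerate(payload):
        for x, value in enumerate(payload_row):
            out[y * factor][x * factor] = value
    return out


# Hypothetical 2x2 payload (1 = dark pixel) and an all-light 8x8 cover image.
payload = [[1, 0],
           [0, 1]]
factor = 4
cover = [[0] * 8 for _ in range(8)]

full_size = embed_at_sampled_positions(cover, payload, factor)
preview = downscale_nearest(full_size, factor)

# Count how many full-size pixels differ from the all-light cover.
changed = sum(value for row in full_size for value in row)

print("pixels changed at full size:", changed)
print("downscaled preview:", preview)
```

In this toy case only a handful of full‑size pixels change, yet the downscaled image reproduces the payload exactly, which is the gap between what a person reviews and what the model sees.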