Tags: prompt injection

Mar 25, 2026

Anthropic Unveils Auto Mode for Claude Code, Giving AI Autonomous Action with Safety Guardrails

Anthropic has introduced an "auto mode" for its Claude Code AI, allowing the system to automatically execute actions it deems safe while blocking those that appear risky. The feature, now in research preview, adds a safety layer that checks for dangerous behavior and prompt‑injection attacks before any action runs. Auto mode works with Claude Sonnet 4.6 and Opus 4.6 and is recommended for isolated, sandboxed environments. The rollout targets Enterprise and API users and follows Anthropic’s recent releases of Claude Code Review and Dispatch for Cowork, reflecting a broader industry move toward more autonomous coding tools.

Feb 20, 2026

Hacker Exploits AI Coding Tool Cline to Install OpenClaw, Highlighting Prompt Injection Risks

A security researcher discovered that a hacker leveraged a vulnerability in the open‑source AI coding agent Cline to silently install the open‑source AI agent OpenClaw on users' computers. The attack used a prompt‑injection technique against Anthropic's Claude, demonstrating how autonomous software can be hijacked. The incident underscores growing concerns about AI‑driven security threats and prompted calls for tighter safeguards, such as OpenAI's new Lockdown Mode for ChatGPT.

Feb 18, 2026

Anthropic Unveils Claude Sonnet 4.6, Boosting Computer Interaction and Security

Anthropic announced the release of Claude Sonnet 4.6, an upgraded mid‑range AI model that can code at a level comparable to its larger Opus series and interact with computers much like a human user. The model demonstrated human‑baseline performance on the OSWorld benchmark, handling tasks such as form filling and tab switching without specialized connectors. Anthropic also highlighted improved resistance to prompt‑injection attacks and a beta‑tested 1 million‑token context window, signaling stronger safety and scalability. The launch coincides with a surge in Claude’s popularity and a high‑profile advertising campaign targeting rival OpenAI.

Feb 16, 2026

OpenClaw’s Promise Meets Security Flaws in AI Agent Platform

OpenClaw, an open‑source framework that lets AI agents communicate across popular messaging apps, has generated excitement for its potential to automate tasks. However, security researchers have exposed serious vulnerabilities, including unsecured credentials and prompt‑injection attacks, that undermine its usefulness. The Moltbook experiment, an AI‑focused social network built with OpenClaw, highlighted how anyone could impersonate agents and manipulate the system. Experts caution that while the technology offers unprecedented productivity, its current lack of robust safeguards makes it risky for everyday use.

Feb 3, 2026

AI Agent Networks Face Growing Security Dilemma as Kill Switches Fade

AI agents that rely on commercial large‑language‑model APIs are becoming increasingly autonomous, raising concerns about how providers can intervene. Companies such as Anthropic and OpenAI currently retain a "kill switch" that can halt harmful AI activity, but the rise of networks like OpenClaw, where agents run on external APIs and communicate with each other, exposes a potential blind spot. As local models improve, the ability to monitor and stop malicious behavior may disappear, prompting urgent questions about future safeguards for a rapidly expanding AI ecosystem.

Jan 28, 2026

AI Prompt Injections Threaten Smart Home Devices

Researchers have uncovered a new class of AI‑driven attacks called prompt injections, or “promptware,” that can manipulate large language models to issue unauthorized commands to connected home devices. Demonstrations showed that hidden prompts embedded in everyday messages could cause a virtual assistant to unlock doors, adjust heating or reveal user location. While major tech firms have begun implementing safeguards, the threat highlights a gap in traditional security tools. Experts recommend regular software updates, cautious handling of unknown messages, limiting AI access to personal data, and employing human‑in‑the‑loop controls to reduce exposure.

Jan 28, 2026

Moltbot Emerges as Open‑Source Personal AI Assistant After Rebranding from Clawdbot

Moltbot, formerly known as Clawdbot, is an open‑source personal AI assistant that lets users automate tasks such as calendar management, messaging, and flight check‑ins. Created by Austrian developer Peter Steinberger, the project was renamed after a copyright challenge from Anthropic but kept its lobster‑themed branding. Moltbot quickly attracted thousands of developers, earning over 44,200 stars on GitHub, and sparked market buzz that lifted Cloudflare shares. While praised for its flexibility and on‑device operation, experts warn that its ability to execute arbitrary commands introduces security risks like prompt injection, urging cautious setup on isolated systems.

Jan 28, 2026

Moltbot AI Agent Draws Praise and Security Scrutiny

Moltbot, an open‑source AI agent that runs locally on a range of devices, is gaining attention for its ability to handle tasks such as calendar management, email composition, and data logging through chat platforms like WhatsApp and iMessage. While users celebrate its convenience, security experts warn that its admin‑level access can be exploited via prompt‑injection attacks and exposed credentials, prompting the developers to issue patches and stress careful configuration.

Jan 13, 2026

Anthropic Launches Claude Cowork Feature for macOS Users

Anthropic introduced Cowork, a new capability for its Claude AI that lets subscribers grant the chatbot access to a macOS folder. Users can chat with Claude to organize files, rename items, and generate spreadsheets or documents from the folder's contents. The feature, currently limited to Claude Max subscribers at $100 per month, also links to connectors for app integration and works with the Claude Chrome extension. Anthropic cautions that Cowork is in a research preview, recommending use only on non‑sensitive data and noting defenses against prompt‑injection attacks.

Jan 13, 2026

Anthropic Launches Cowork, a User-Friendly Version of Claude Code

Anthropic introduced Cowork, a new tool that brings the capabilities of Claude Code to a broader audience through a simple folder‑based interface. Integrated into the Claude Desktop app, Cowork lets users designate a folder for the AI to read and modify files, with instructions given via the regular chat window. The feature is currently in a research preview and is limited to Max subscribers, though a waitlist exists for other plans. Anthropic highlighted use cases such as assembling expense reports from receipt photos and warned users about potential risks like prompt injection and ambiguous commands.

Jan 12, 2026

Anthropic Launches Claude Cowork AI Agent Feature

Anthropic introduced Claude Cowork, a new AI‑agent capability for its Claude chatbot, as a research preview available in the macOS app for Claude Max subscribers. The feature lets users grant Claude access to local folders so it can read, edit, or create files, handling tasks such as reorganizing downloads, generating spreadsheets, or drafting reports. Claude Cowork also integrates with services like Asana, Notion, PayPal, and Chrome, offering continuous updates and parallel task execution. Anthropic highlighted safety concerns, noting the model’s ability to delete files and the risk of prompt‑injection attacks, and urged users to join a waitlist if they are not yet subscribers.

Jan 8, 2026

OpenAI Tightens ChatGPT URL Controls After Prompt Injection Attacks

OpenAI responded to two prompt‑injection exploits, ShadowLeak and Radware's ZombieAgent, by limiting how ChatGPT handles URLs. The new guardrails restrict the model to opening only exact URLs supplied by users and block automatic appending of characters. While these changes stopped the immediate threats, experts warn that such fixes are temporary and that more fundamental solutions are needed to secure AI assistants.

Dec 22, 2025

OpenAI Acknowledges Ongoing Prompt Injection Risk in Atlas Browser

OpenAI has publicly recognized that prompt injection attacks remain a persistent threat to its Atlas AI browser. The company says the risk is unlikely to be fully eliminated and is investing in continuous defenses, including a reinforcement‑learning‑based automated attacker that simulates malicious inputs. OpenAI’s updates aim to detect and flag suspicious prompts, while it also advises users to limit agent autonomy and access. The UK National Cyber Security Centre echoed the concern, noting that prompt‑injection attacks may never be completely mitigated. Other AI firms such as Anthropic and Google are taking similar defensive approaches.

Dec 2, 2025

Researchers Find Large Language Models May Prioritize Syntax Over Meaning

A joint study by MIT, Northeastern University and Meta reveals that large language models can rely heavily on sentence structure, sometimes answering correctly even when the words are nonsensical. By testing prompts that preserve grammatical patterns but replace key terms, the researchers demonstrated that models often match syntax to learned responses, highlighting a potential weakness in semantic understanding. The findings shed light on why certain prompt‑injection techniques succeed and suggest avenues for improving model robustness. The team plans to present the work at an upcoming AI conference.

Nov 25, 2025

Anthropic Launches Claude Opus 4.5, Boosting Coding and Agent Performance While Tackling Prompt‑Injection Risks

Anthropic has introduced Claude Opus 4.5, billing it as the most capable model for coding, AI agents, and computer‑use tasks. The new version brings stronger research abilities, improved spreadsheet and slide handling, and new features in Claude Code and consumer apps that integrate with Excel, Chrome, and desktop environments. While the company claims Opus 4.5 is harder to deceive with prompt‑injection attacks, safety testing shows it still yields to some malicious requests. The model is now available through Anthropic’s apps, API and major cloud providers.

Nov 25, 2025

Anthropic Unveils Opus 4.5 with Expanded Claude Tools and New Infinite Chat Feature

Anthropic has launched Opus 4.5, the latest version of its flagship AI model, delivering stronger performance in coding, computer use, and office tasks. The update rolls out broader access to existing Claude tools, including the Claude for Chrome extension for all Max users, and introduces a new "infinite chat" capability that eliminates context‑window limits for paying customers. Claude for Excel is now generally available to Max, Team, and Enterprise users, offering native spreadsheet assistance with support for pivot tables, charts, and file uploads. Early internal tests show notable gains in accuracy and efficiency, while Anthropic touts Opus 4.5 as its safest model to date.

Nov 19, 2025

Critics Question Microsoft’s AI Security Warning

Microsoft warned that its new AI feature could infect computers and steal data, but experts say the safeguard relies on users clicking through permission prompts. Scholars and critics argue that habituated users may ignore warnings, making the protection ineffective. The debate touches on past "ClickFix" attacks, accusations that the warning is a legal cover‑your‑back ("CYA") move, and broader concerns that AI integrations from major tech firms are becoming the default despite security risks.

Nov 18, 2025

Google Unveils Gemini 3 AI Model with Deeper Understanding and New Agentic Tools

Google announced Gemini 3, its most advanced AI model to date, highlighting improved ability to grasp user intent and richer multimodal features. The model can transform long video lectures into interactive flash cards and analyze sports footage for performance insights. Gemini 3 will appear in AI Mode in Search and in AI Overviews for Pro and Ultra subscribers, and it powers the new agentic platform Antigravity, which can autonomously plan and execute software tasks. The company also noted enhancements in security against prompt‑injection attacks and reduced sycophancy. Gemini 3’s advanced capabilities are initially available to Google AI Ultra subscribers.

Nov 14, 2025

OpenAI’s ChatGPT Atlas Raises Security Concerns Over AI‑Powered Browsing

OpenAI’s new AI‑driven web browser, ChatGPT Atlas, promises to automate tasks such as travel booking and grocery ordering, but cybersecurity experts warn that the technology introduces a range of vulnerabilities. Prompt‑injection attacks, clipboard hijacking, and mishandling of sensitive data have been demonstrated on the platform. Researchers at the SANS Institute, the Tinuiti agency, and security firm Cyberhaven advise users to limit exposure, avoid sharing financial or medical information, and treat the browser cautiously in corporate environments. OpenAI says it is adding defensive monitors and bug‑bounty programs, but experts stress that the technology remains in an early, error‑prone stage.

Oct 25, 2025

Security Risks Loom Over AI-Powered Browser Agents

AI‑enhanced browsers such as OpenAI’s ChatGPT Atlas and Perplexity’s Comet promise to automate web tasks, but cybersecurity experts warn that their deep access to user data creates significant privacy and security concerns. Researchers from Brave highlight prompt‑injection attacks as a systemic challenge, where malicious web content can trick agents into exposing credentials or performing unwanted actions. Both OpenAI and Perplexity have introduced mitigations like logged‑out modes and real‑time detection, yet experts stress that the threat remains unresolved. Users are advised to limit agent permissions and adopt strong authentication to safeguard personal information.
