Technical description
Singapore's CSA, GovTech, IMDA and Google's joint AI Agents Sandbox — a four-month empirical study of computer-use agents in real public-sector workflows published 20 May 2026 — identified indirect prompt injection as the most prominent cybersecurity risk, explicitly noting the capability to trigger remote code execution (RCE). The finding arose from testing computer-use agents in automated QA, AI safety testing, and social assistance workflows against government digital services. The sandbox documented that agents interacting with web content, documents, or external system outputs can be manipulated into performing unintended actions — including executing arbitrary code — through malicious payloads embedded in content the agent processes rather than direct user instruction.
Attack vector
Indirect prompt injection via environmental content: a malicious actor embeds injection payloads in web pages, documents, API responses, or any external content an agent retrieves and processes. The agent, treating retrieved content as trusted context, follows the embedded instructions. In computer-use agents with access to shell commands, code execution, or file system operations, this pathway can achieve full RCE without any direct user interaction.
Affected systems
All agentic AI deployments where agents process external content (web browsing agents, document-processing agents, email agents, RAG-based agents, computer-use agents). Particularly high-risk: agents with tool-call capabilities that include shell execution, code interpreters, file write access, or external API calls with ambient credentials.
Mitigation
Architectural mitigations: (1) Strictly separate instructional content (from the system prompt and trusted user input) from retrieved/environmental content — treat all external content as untrusted data, not instructions. (2) Implement tool-call allowlists with the minimum necessary permissions; never grant ambient credential access to external content retrieval tools. (3) Deploy output validation layers before any tool-call execution is triggered by agent reasoning. (4) Log all tool calls with correlation IDs and flag anomalous instruction patterns in retrieved content. (5) Test all agentic deployments with indirect prompt injection test suites before production release — treat this as a mandatory security gate, not an optional QA step.