What happened
Researchers from the University of Science and Technology of China, National University of Singapore, Singapore Management University, and Shanghai AI Laboratory published a preprint on May 27, 2026 (arXiv:2605.28201) formalising the 'Sleeper Attack' threat model for LLM agents. Tested on 1,896 instances across seven LLMs (open-source and closed-source), the study shows that adversarial content injected into tool-returned data, web pages, or MCP context can persist in agent state (session context, memory, reusable skills) across multiple interactions and activate via a benign user query — achieving higher attack success rates than single-interaction baselines even on agents that appeared resistant to direct prompt injection.
Why it matters
Existing defensive postures for agentic AI — including most prompt-injection defences — assume that adversarial content must trigger harmful behaviour within the same user request. Sleeper Attacks invalidate this assumption: a malicious instruction planted in an agent's memory could remain dormant for days or weeks before being triggered by a completely unrelated benign request, making detection and attribution dramatically harder. This research, co-authored from Singapore, has direct relevance to enterprises deploying memory-enabled or long-running agentic AI systems.
Action needed
Review whether deployed agents have persistent memory or reusable skill stores, and apply stricter controls: restrict writes to memory from external content, add integrity checks on loaded skills, and implement behavioural monitoring that looks for anomalous tool calls correlated across sessions — not just within a single request.