What happened
Microsoft's AI Red Team published an updated taxonomy of failure modes in agentic AI systems on June 4, adding seven new categories derived from a year of red-team engagements against production deployments, including Microsoft Security Copilot and MCP ecosystems. The new categories are: agentic supply-chain compromise, goal hijacking, inter-agent trust escalation, computer-use agent visual attacks, session context contamination, MCP/plugin abuse, and capability/architecture disclosure. The post describes how an open-source agentic framework (OpenClaw) rapidly accumulated thousands of deployments while harbouring 336 confirmed malicious plugins, which illustrates how agent ecosystems can scale faster than security review.
Why it matters
Unlike the 2025 first edition (which was forward-looking), this update is based on confirmed exploit chains in production: zero-click data exfiltration and lateral movement were documented from external-origin inputs alone, with no user interaction beyond initial agent deployment. Human-in-the-loop bypass was identified as the most exploited failure mode, directly challenging the assumption that approval prompts provide meaningful security. Microsoft recommends agent SBOM generation, per-task identity verification, and least-privilege scoping for each tool call.
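As a rough illustration of least-privilege scoping per tool call, the sketch below checks each call against the scopes granted to the current task rather than to the agent as a whole. It is not taken from Microsoft's post: the task names, tool names, scope strings, and the `authorize` helper are all hypothetical, and a real deployment would enforce this in the agent runtime or gateway.

```python
from dataclasses import dataclass

# Hypothetical: scopes are granted per task, not per agent.
TASK_SCOPES = {
    "summarize-inbox": {"mail.read"},
    "schedule-meeting": {"calendar.read", "calendar.write"},
}

# Hypothetical: the single scope each tool requires.
TOOL_REQUIRED_SCOPE = {
    "read_mail": "mail.read",
    "send_mail": "mail.send",
    "create_event": "calendar.write",
}


@dataclass(frozen=True)
class ToolCall:
    task: str  # the task the agent is currently executing
    tool: str  # the tool the agent wants to invoke


def authorize(call: ToolCall) -> bool:
    """Allow a call only if the current task was granted the tool's scope."""
    required = TOOL_REQUIRED_SCOPE.get(call.tool)
    if required is None:
        return False  # unknown tools are denied by default
    return required in TASK_SCOPES.get(call.task, set())
```

Under these assumptions, a "summarize-inbox" task may call `read_mail` but not `send_mail`, so an injected instruction to forward mail fails at the authorization check without relying on an approval prompt.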
Action needed
Map the seven new failure modes to your deployed agent architectures. In particular, audit whether human-approval prompts can be bypassed and whether MCP/plugin registries have been reviewed for malicious entries.
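A first pass at the plugin registry audit could look like the sketch below. It assumes you can export installed plugin payloads as bytes and that you maintain your own list of known-bad SHA-256 hashes and a set of plugin names that have passed review. The `audit_plugins` function and its inputs are hypothetical and do not correspond to any real registry API.

```python
import hashlib
from typing import Dict, Iterable, List, Set


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a plugin payload as a hex string."""
    return hashlib.sha256(data).hexdigest()


def audit_plugins(
    installed: Dict[str, bytes],      # hypothetical export: plugin name -> payload
    known_bad_hashes: Iterable[str],  # hashes from your own threat intelligence
    reviewed: Set[str],               # plugin names that passed security review
) -> Dict[str, List[str]]:
    """Flag plugins that match a known-bad hash or were never reviewed."""
    bad = set(known_bad_hashes)
    report: Dict[str, List[str]] = {"malicious": [], "unreviewed": []}
    for name, payload in sorted(installed.items()):
        if sha256_hex(payload) in bad:
            report["malicious"].append(name)
        elif name not in reviewed:
            report["unreviewed"].append(name)
    return report
```

Hash matching only catches exact copies of known-bad payloads, so the "unreviewed" list is the more important output: it shows where the ecosystem has grown faster than review.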