Microsoft AI Red Team Updates Agentic AI Failure Mode Taxonomy — 7 New Modes from 12 Months of Production Red Teaming

What happened

Microsoft's AI Red Team published an updated taxonomy of failure modes in agentic AI systems on June 4, adding seven new categories derived from a year of red-team engagements against production deployments including Microsoft Security Copilot and MCP ecosystems. The new categories are: agentic supply-chain compromise, goal hijacking, inter-agent trust escalation, computer-use agent visual attacks, session context contamination, MCP/plugin abuse, and capability/architecture disclosure. The post describes how an open-source agentic framework (OpenClaw) rapidly accumulated thousands of deployments while harbouring 336 confirmed malicious plugins — illustrating how agent ecosystems can scale faster than security review.

Why it matters

Unlike the forward-looking 2025 first edition, this update is based on confirmed exploit chains in production. Zero-click data exfiltration and lateral movement were documented from external-origin inputs alone, with no user interaction beyond initial agent deployment. Human-in-the-loop bypass was identified as the most exploited failure mode, which directly challenges the assumption that approval prompts provide meaningful security. Microsoft recommends agent SBOM generation, per-task identity verification, and least-privilege scoping for each tool call.
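As a rough illustration of least-privilege scoping per tool call, the sketch below checks every call against a grant issued for a single task. It is not taken from Microsoft's post: the names `TaskGrant`, `ScopeError`, `authorize_tool_call`, and the tool and scope strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskGrant:
    """Permissions issued for one task only (hypothetical structure)."""
    task_id: str
    allowed_tools: frozenset
    allowed_scopes: frozenset


class ScopeError(PermissionError):
    """Raised when a tool call exceeds what the task grant allows."""


def authorize_tool_call(grant, tool, required_scopes):
    """Return True if the grant covers the tool and every scope it needs.

    Raises ScopeError otherwise, so the caller fails closed.
    """
    if tool not in grant.allowed_tools:
        raise ScopeError(f"tool {tool!r} not granted for task {grant.task_id}")
    missing = set(required_scopes) - grant.allowed_scopes
    if missing:
        raise ScopeError(f"missing scopes for {tool!r}: {sorted(missing)}")
    return True


# Example grant: a read-only summarization task (illustrative values).
grant = TaskGrant(
    task_id="summarize-inbox",
    allowed_tools=frozenset({"mail.read"}),
    allowed_scopes=frozenset({"mailbox:read"}),
)
```

With this grant, a call to `mail.read` passes, while an attempt to call `mail.send` raises `ScopeError` even if injected content persuades the agent to try it. The check runs in code on every call, so it does not rely on a human approving a prompt.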

Action needed

Map the seven new failure modes to your deployed agent architectures; specifically audit whether human-approval prompts can be bypassed and whether MCP/plugin registries have been reviewed for malicious entries.
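One way to start on this action is sketched below, assuming you keep an inventory of installed plugins and a record of which controls address each failure mode. The failure-mode names come from the taxonomy update. The helper names, plugin names, and the known-malicious list are hypothetical placeholders, not a real feed or API.

```python
# The seven new failure modes named in the taxonomy update.
NEW_FAILURE_MODES = [
    "agentic supply-chain compromise",
    "goal hijacking",
    "inter-agent trust escalation",
    "computer-use agent visual attacks",
    "session context contamination",
    "MCP/plugin abuse",
    "capability/architecture disclosure",
]


def unmapped_modes(coverage):
    """Return failure modes that have no control recorded in `coverage`.

    `coverage` maps a failure-mode name to a list of controls (hypothetical format).
    """
    return [mode for mode in NEW_FAILURE_MODES if not coverage.get(mode)]


def flag_plugins(installed, known_malicious):
    """Return installed plugin names that match the known-malicious list.

    Matching is case-insensitive and ignores surrounding whitespace.
    """
    bad = {name.strip().lower() for name in known_malicious}
    return sorted(p for p in installed if p.strip().lower() in bad)


# Illustrative inputs only; none of these names refer to real plugins.
coverage = {"goal hijacking": ["system-prompt integrity check"]}
installed = ["Calendar-Sync", "pdf-tools"]
known_malicious = ["calendar-sync"]

gaps = unmapped_modes(coverage)
flagged = flag_plugins(installed, known_malicious)
```

Exact-name matching only catches entries that are already known to be malicious. It is a first pass and does not replace reviewing what each plugin can actually do.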
