Agentic Red-Team Tools (12 Systems) — Systemic Sandbox Escape and API Key Exfiltration via Agent-Phishing (arXiv 2606.24496)

What happened

On June 23, 2026, researchers from Cracken published the first systematic security audit of 12 widely used agentic offensive-security platforms: CAI, RedAmon, PentestAgent, DarkMoon, PentAGI, AIRecon, PentestGPT, METATRON, Nebula, Xalgorix, Artemis, and STRIX. They found that 10 of the 12 are vulnerable to full sandbox escape and host-level RCE, 11 of the 12 leak LLM provider API keys, and all 12 are susceptible to unbounded weaponization that bypasses their guardrails. The key attack is 'agent-phishing': the attacker stages realistic-looking malicious artifacts (e.g., a fake password-vault tool, 'pwcrypt') on a honeypot target, and the agent downloads and executes them as part of its normal workflow. No explicit prompt injection is required. Across 10 agents and 6 frontier LLMs (including Claude Opus 4.8, GPT-5.5, Gemini 3.1 Pro, and DeepSeek V4 Pro), the attack achieved a 97.8% RCE success rate.

Why it matters

These agentic red-team platforms are increasingly deployed in real operational security contexts. An adversary who controls a penetration-test target can turn the testing agent against the organization running it: stealing LLM API keys, establishing persistence, escaping containers via Docker socket mounts, and achieving full host compromise on the operator's machine. A security tool becomes a liability, and the result is a novel, high-impact attack class against AI-augmented security operations.

Attack vector

The attacker controls a penetration-test target host and stages malicious but realistic-looking binaries or tools on it. The agentic red-team system discovers, downloads, and executes them during normal operation, triggering reverse shells or memory-corruption exploits that escalate to sandbox escape and host RCE.

Affected systems

CAI, RedAmon, PentestAgent, DarkMoon, PentAGI, AIRecon, PentestGPT, METATRON, Nebula, Xalgorix, Artemis, STRIX (all versions audited as of June 2026)

Mitigation

No single patch is available; the required mitigations are architectural. Enforce strict least-privilege container configurations (no --privileged, no Docker socket mounts), network-segment worker environments, treat all tool outputs as untrusted, and require human-in-the-loop approval before any binary is executed. See the paper for a detailed secure architecture: https://arxiv.org/abs/2606.24496
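
The container-level items above (no --privileged, no Docker socket mounts, segmented networking) can be spot-checked from `docker inspect` output. The following is a minimal sketch and is not taken from the paper: the function name `audit_container`, the finding strings, and the choice to flag host networking as a segmentation problem are all assumptions made for illustration. It reads only the `HostConfig.Privileged`, `HostConfig.Binds`, `HostConfig.NetworkMode`, and `Mounts[].Source` fields of an inspect entry.

```python
from typing import Any

# Common host paths for the Docker daemon socket (assumed defaults;
# a daemon configured with a custom socket path would not be caught).
DOCKER_SOCKET_PATHS = {"/var/run/docker.sock", "/run/docker.sock"}


def audit_container(inspect_entry: dict[str, Any]) -> list[str]:
    """Return a list of findings for one element of `docker inspect` JSON.

    Hypothetical helper for illustration; an empty list means none of the
    three checks below fired, not that the container is safe.
    """
    findings: list[str] = []
    host_config = inspect_entry.get("HostConfig") or {}

    # Check 1: --privileged gives the container near-host capabilities.
    if host_config.get("Privileged"):
        findings.append("privileged container")

    # Check 2: a mounted Docker socket lets the container drive the daemon.
    # Bind mounts are listed as "source:dest[:options]" strings in
    # HostConfig.Binds and as objects with a "Source" key in Mounts.
    sources = {b.split(":", 1)[0] for b in (host_config.get("Binds") or [])}
    sources |= {m.get("Source") for m in (inspect_entry.get("Mounts") or [])}
    if sources & DOCKER_SOCKET_PATHS:
        findings.append("docker socket mounted")

    # Check 3 (assumption): treat host networking as a segmentation failure.
    if host_config.get("NetworkMode") == "host":
        findings.append("host network mode")

    return findings
```

`docker inspect <container>` prints a JSON array, so each element of the parsed array would be passed to `audit_container` in turn. A check like this covers only the configuration side; it does not address untrusted tool output or approval gates before binary execution.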