Definition
Agent-phishing is a twist on classic phishing where the target being tricked is not a human but an autonomous AI agent — for example, a security-testing AI agent that is deceived by crafted content into leaking its own access keys or breaking out of its sandbox. Researchers found this works across a wide range of real agentic red-teaming tools.
Why it matters
As companies deploy autonomous AI agents for security testing and other sensitive tasks, this shows the agents themselves — not just their human operators — are a phishing target, opening a new class of exploitation.