Definition
An attack in which an adversary hijacks a running AI agent — taking control of its actions mid-task — by feeding it malicious instructions through the data or tools it interacts with. The attacker essentially 'steers' the agent to execute harmful commands (run malware, exfiltrate data) as if those commands came from the legitimate operator.
Why it matters
Because agents have broad system access — developer credentials, code repositories, cloud infrastructure — a successful agentjacking can be as damaging as a full network intrusion. It was documented in production enterprise environments in 2026.