Definition
The practice of having a dedicated team, internal or external, deliberately try to make an AI system misbehave: generating harmful content, leaking private data, yielding to manipulation, or failing in safety-critical ways. It mirrors ethical hacking in cybersecurity, adapted specifically to AI systems.
Why it matters
Standard software testing often misses AI-specific failure modes. Red-teaming before deployment is a primary way organisations discover how their AI can be abused, and regulators increasingly expect evidence that it has been done.