AI red-teaming

Definition

The practice of having a dedicated team — internal or external — try to make an AI system behave badly: generating harmful content, leaking private data, being manipulated, or failing in safety-critical ways. It mirrors the cybersecurity practice of ethical hacking but is specifically adapted for AI systems.

Why it matters

Standard software testing does not catch AI-specific failure modes. Red-teaming before deployment is the primary way organisations discover how their AI can be abused, and regulators are increasingly expecting evidence that it has been done.

References

NIST AI Risk Management Framework (AI RMF 1.0)

Track this in the live feed See how this plays out in real AI security and governance developments.

Open the feed →

Definition

Why it matters

Related terms

Demonstrated by recent findings

Findings on this topic (11)

References