Governance  ·  Glossary

Jailbreak severity benchmark

A formal scoring system that rates how dangerous a successful AI jailbreak is — measuring factors such as how far the safety bypass went, what harmful capabilities became accessible, how easily it can be repeated, and what real-world harm could result. The White House and Anthropic are actively developing the first government-industry version of such a benchmark.
Without an agreed severity scale, governments and companies have no shared language for deciding when an AI model is too dangerous to deploy or must be recalled — the benchmark is the foundation for any credible AI model governance regime.
References
Politico: White House talks with Anthropic shift to setting AI security rules
Track this in the live feed See how this plays out in real AI security and governance developments.
Open the feed →