Guidelines  ·  2026-05-14

UK AI Security Institute: Frontier Models Have Broken All Prior Trend Lines for Autonomous Cyber Capability

GuidelinesHigh impactUnited Kingdom
The UK AI Security Institute (AISI), conducting pre-deployment evaluations on behalf of the British government, published independent research on May 13, 2026, showing that Anthropic's Claude Mythos Preview and OpenAI's GPT-5.5 have substantially exceeded all prior forecasting trends for autonomous cyber task completion. The AISI had previously estimated that frontier models' 80% reliability cyber time horizon was doubling approximately every 5 months (down from 8-month doubling in November 2025). Mythos Preview and GPT-5.5 have now outpaced all measured trend lines: Mythos became the first model to complete both AISI cyber ranges (solving 'The Last Ones' 32-step attack in 6/10 attempts and completing 'Cooling Tower' — previously unsolved — in 3/10 attempts). Independent research from METR corroborated a ~4-month doubling time since late 2024.
The AISI report provides quantitative, government-backed evidence that frontier AI capability is accelerating faster than prior models predicted. The shift from 5-month to 4-month doubling (and the outperformance of both Claude Mythos and GPT-5.5 on cyber ranges) indicates a discontinuity in capability scaling. This directly underpins the 3–5 month window cited by Palo Alto and congressional lawmakers: if autonomous cyber task complexity is doubling every 4–5 months, organizations have approximately one doubling cycle to harden defenses before current-generation models can autonomously execute multistage attacks. The AISI is developing more demanding evaluations (new cyber ranges, active cyber defenses) to reflect real-world conditions, establishing a baseline for future capability benchmarking.
CISOs should use the 3–5 month window as a planning horizon for vulnerability detection and patch acceleration programs. Benchmark internal vulnerability triage and patch deployment velocity against the rate at which frontier models are identifying new flaws. Evaluate whether current patching timelines (often 30–60 days) are sufficient given AI-assisted exploitation velocity. Consider adopting 'zero standing privilege' architectures and 'assume breach' postures that reduce exposure even when patches lag.
Sources
Researchers say AI just broke every benchmark for autonomous cyber capability
See this in the live feed Explore related AI security and governance findings — updated every morning.
Open the feed →