Strategic Report  ·  2026-05-06

Our evaluation of OpenAI's GPT-5.5 cyber capabilities

Strategic Report · High impact · United Kingdom
The UK AI Safety Institute (AISI) evaluated OpenAI's GPT-5.5 on cyber capability tasks, using capture-the-flag-style exercises designed to assess vulnerability research and exploitation skills. GPT-5.5 is the second model (after Anthropic's Claude Mythos Preview) to complete AISI's corporate network attack simulation end-to-end, a multi-step exercise estimated to take a human approximately 20 hours. Results from an early checkpoint suggest GPT-5.5 reaches a level of cyber performance similar to Claude Mythos Preview, indicating that multiple frontier developers are converging on advanced offensive cyber capabilities.
Two independent frontier models from different developers now demonstrate end-to-end autonomous cyber intrusion capabilities in structured testing. This signals that advanced offensive cyber AI is no longer a one-off capability but a reproducible outcome across the frontier lab ecosystem, compressing the timeline for defensive organizations to prepare for AI-augmented attacks.
Convene your red team and cyber defense leads to review AISI's published evaluation methodology and assess whether your organization's threat model accounts for multi-step autonomous intrusions. Update incident response playbooks to include scenarios where attackers leverage AI for reconnaissance, lateral movement, and exploitation at machine speed.
Sources
UK AI Safety Institute