Our evaluation of OpenAI's GPT-5.5 cyber capabilities

What happened

The UK AI Safety Institute (AISI) evaluated OpenAI's GPT-5.5 on cyber capability tasks, using capture-the-flag-style exercises designed to assess vulnerability research and exploitation skills. GPT-5.5 is the second model, after Anthropic's Claude Mythos Preview, to complete AISI's corporate network attack simulation end-to-end, a multi-step exercise estimated to take a human approximately 20 hours. Results from an early checkpoint suggest GPT-5.5 reaches a level of cyber performance similar to Claude Mythos Preview, indicating that multiple frontier developers are converging on advanced offensive cyber capabilities.

Why it matters

Two independent frontier models from different developers now demonstrate end-to-end autonomous cyber intrusion capabilities in structured testing. This signals that advanced offensive cyber AI is no longer a one-off capability but a reproducible outcome across the frontier lab ecosystem, compressing the timeline for defensive organizations to prepare for AI-augmented attacks.

Action needed

Convene your red team and cyber defense leads to review AISI's published evaluation methodology and assess whether your organization's threat model accounts for multi-step autonomous intrusions. Update incident response playbooks to include scenarios where attackers leverage AI for reconnaissance, lateral movement, and exploitation at machine speed.
