AISI Engineering Playbook: An Open Guide to Building AI Evaluation Capabilities

What happened

The UK AI Security Institute (AISI) published its Engineering Playbook on June 18, 2026, open-sourcing the full evaluation infrastructure stack it has developed for assessing frontier AI models. The Playbook is structured around five layers (Evaluate, Isolate, Connect, Run, and Scale) and documents the methods, practices, and supporting infrastructure required to run rigorous, reproducible AI evaluations at scale. AISI describes it as 'a complete resource that captures the methods and practices we've developed while evaluating frontier AI systems,' and says it is intended to allow researchers and organisations to 'stand up rigorous evaluation capability without starting from zero.'

The release builds on AISI's Inspect AI toolkit, which METR adopted for the 228-task time-horizon evaluations cited in the International AI Safety Report 2026 and which Apollo Research adopted after deprecating its internal framework. The Playbook accompanies that toolkit with infrastructure documentation covering secure sandboxing, model-provider proxying, compute management, and supercomputer-scale inference. It is freely available at engineering-playbook.aisi.org.uk.

Why it matters

For any organisation (government, enterprise, or research body) seeking to run credible independent evaluations of frontier AI models, the Playbook is an open, detailed reference from a government body that evaluates frontier systems directly. Adopting it reduces the risk of designing an evaluation programme from scratch and aligns that programme with tooling already used by other evaluators such as METR and Apollo Research.

Action needed

Share the Playbook with your AI safety and evaluation team as the baseline reference for any internal frontier model evaluation programme. Then assess which of the five infrastructure layers (Evaluate, Isolate, Connect, Run, Scale) your current setup lacks, and prioritise closing those gaps.
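
For teams that want to record the gap assessment in a repeatable form, a minimal sketch in Python follows. Only the five layer names come from the Playbook; the function name `missing_layers` and the example setup are hypothetical, and listing gaps in Playbook order is an assumption made here for readability, not a priority order stated by AISI.

```python
# The five layer names are taken from the Playbook; everything else in this
# block is a hypothetical illustration, not part of AISI's tooling.
LAYERS = ("Evaluate", "Isolate", "Connect", "Run", "Scale")


def missing_layers(covered):
    """Return the Playbook layers absent from `covered`, in Playbook order."""
    unknown = set(covered) - set(LAYERS)
    if unknown:
        # Catch typos such as "Sandbox" instead of "Isolate".
        raise ValueError(f"unknown layer(s): {sorted(unknown)}")
    return [layer for layer in LAYERS if layer not in covered]


if __name__ == "__main__":
    # Hypothetical example: a team that can write and run evaluations
    # but has no sandboxing, provider proxying, or large-scale inference.
    current_setup = {"Evaluate", "Run"}
    print(missing_layers(current_setup))  # ['Isolate', 'Connect', 'Scale']
```

The output is only a starting list; which gap to close first depends on the models and risks your programme covers.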