Pre-deployment AI evaluation (third-party)

An independent safety and capability assessment of an AI model conducted by an external organisation before the model is released to the public. The evaluating body is given early access to the model and runs structured tests to identify dangerous capabilities (such as helping create weapons or executing autonomous tasks beyond safe bounds) and to measure whether the model's safety controls actually work.
Third-party pre-deployment evaluation is increasingly a regulatory expectation: several major labs have already adopted it as a voluntary commitment, and some emerging frameworks require it. Boards should ask whether the AI systems they deploy have undergone such evaluations, and whether those evaluations were conducted independently of the developer.
References
METR: Summary of Predeployment Evaluation of GPT-5.6 Sol
AISI Engineering Playbook: An Open Guide to Building AI Evaluation Capabilities