Muse Spark Contemplating Safety & Preparedness Report

What happened

Meta's AI Safety & Preparedness team published its safety and preparedness evaluation of Muse Spark Contemplating, the company's deep-reasoning model that extends Muse Spark with multi-agent orchestration at inference time. The report covers the three risk domains in Meta's Advanced AI Scaling Framework: Chemical & Biological, Cybersecurity, and Loss of Control. Key finding: Muse Spark Contemplating's extended reasoning and multi-agent orchestration 'retains the same risk thresholds as Muse Spark' and 'does not introduce qualitatively new risk vectors,' and the same multi-layered mitigations are assessed as adequate.

The report includes cross-model comparisons against GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro on capability benchmarks (e.g., WMDP-Bio, WMDP-Cyber, ProtocolQA) and on refusal/robustness evaluations. It discloses that Muse Spark Contemplating scored 'high risk' for Chemical and/or Biological risks in an unmitigated assessment, with mitigations bringing the deployment posture to 'moderate or lower risk.' It also introduces a dedicated Loss of Control section evaluating reliable monitorability and misaligned propensities, a category of growing importance for governance practitioners.

Why it matters

As reasoning models with multi-agent orchestration become the deployment standard, this report sets a reference point for what frontier-lab transparency looks like for an incremental but capability-expanding model update. Safety teams and CISOs should compare Meta's evaluation methodology and risk-threshold framework against those of Anthropic and OpenAI to identify gaps in their own AI vendor due-diligence processes.

Action needed

Forward to AI security and procurement teams as a reference for vendor due-diligence checklists; compare Meta's Chemical & Biological and Loss of Control evaluation methodology against the AI supplier assessments in your existing vendor governance framework.
