Position: AI Security Policy Should Target Systems, Not Models

What happened

A multi-institutional research team published a preprint (arXiv:2605.09504, not peer-reviewed) arguing that AI security policy should shift its attention from access restrictions on individual frontier models to system-level capability assessment. The paper presents two experiments. In the first, a swarm of five 1.2-billion-parameter models achieved a 45.8% Effective Harm Rate in jailbreak attacks against GPT-4o, producing 49 critical-severity breaches. In the second, the same models performed combined source code analysis and binary fuzzing against a vulnerable C application with 9 planted CWEs. When scaffolded with regex pattern detection and AddressSanitizer-based crash classification, they recovered 9 of 9 vulnerabilities (100% recall) in approximately four minutes on a consumer MacBook. The central claim: "the offensive capabilities motivating [model access] restrictions reside primarily in the scaffold around the model and are reproducible with small open-weights models on commodity hardware."
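To make "scaffold" concrete, here is a minimal, hypothetical sketch of the deterministic pieces named in the second experiment: a regex pass that flags risky C library calls, a classifier that maps AddressSanitizer report types to CWE identifiers, and the recall metric used to score the run. The function names, regex patterns, and CWE mappings are illustrative assumptions, not the authors' implementation, and the sketch leaves out the language models and the fuzzer entirely.

```python
"""Hypothetical sketch of scaffold components like those the preprint describes.

Not the authors' code: function names, patterns, and CWE mappings are
illustrative assumptions.
"""
from __future__ import annotations

import re

# Assumed static-analysis pass: C calls commonly tied to memory-safety
# or injection weaknesses, each mapped to a representative CWE ID.
RISKY_CALLS = {
    "gets": "CWE-242",     # Use of Inherently Dangerous Function
    "strcpy": "CWE-120",   # Buffer Copy without Checking Size of Input
    "sprintf": "CWE-120",  # Unbounded formatted write into a buffer
    "system": "CWE-78",    # OS Command Injection, if input reaches it
}
# \b keeps "fgets(" and "strncpy(" from matching "gets(" / "strcpy(".
CALL_RE = re.compile(r"\b(" + "|".join(RISKY_CALLS) + r")\s*\(")

# Assumed mapping from AddressSanitizer error types to CWE IDs.
ASAN_TO_CWE = {
    "heap-buffer-overflow": "CWE-122",   # Heap-based Buffer Overflow
    "stack-buffer-overflow": "CWE-121",  # Stack-based Buffer Overflow
    "heap-use-after-free": "CWE-416",    # Use After Free
    "double-free": "CWE-415",            # Double Free
}
# ASan reports look like "==PID==ERROR: AddressSanitizer: <type> ...";
# double frees are reported as "attempting double-free".
ASAN_RE = re.compile(r"ERROR: AddressSanitizer: (?:attempting )?([\w-]+)")


def scan_source(source: str) -> list[tuple[int, str, str]]:
    """Return (line_number, function, cwe) for each risky call in C source."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in CALL_RE.finditer(line):
            name = match.group(1)
            findings.append((lineno, name, RISKY_CALLS[name]))
    return findings


def classify_crash(stderr: str) -> str | None:
    """Map an ASan report to a CWE ID; None if no ASan report is present."""
    match = ASAN_RE.search(stderr)
    if match is None:
        return None
    return ASAN_TO_CWE.get(match.group(1), "unclassified")


def recall(found: set[str], planted: set[str]) -> float:
    """Fraction of planted vulnerabilities the pipeline recovered."""
    return len(found & planted) / len(planted)


if __name__ == "__main__":
    sample = "char buf[16];\nstrcpy(buf, argv[1]);\ngets(buf);\n"
    print(scan_source(sample))
    print(classify_crash(
        "==42==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x1"))
    # Hypothetical labels: recovering all 9 planted bugs gives recall 1.0.
    planted = {f"V{i}" for i in range(1, 10)}
    print(recall(planted, planted))
```

Neither component calls a language model, which illustrates how deterministic tooling can carry much of the classification work; this summary does not specify how the paper's small models divide labor with these steps.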

Why it matters

If offensive capabilities are reproducible at low cost with open-weights models on commodity hardware, then access restrictions on individual frontier models provide little defensive value. This challenges the rationale for restricted releases like Anthropic's Mythos Preview and suggests that AI security policy should focus on system architectures, scaffolding techniques, and deployment contexts rather than on model access alone. The paper is an unreviewed preprint, and its vulnerability-discovery result comes from a single test application with planted flaws; treat the findings as preliminary but policy-relevant.

Action needed

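CISOs and AI security leads should review their internal security posture for scaffold-based attack vectors, such as automated jailbreak swarms and model-driven static analysis and fuzzing pipelines, by Q3, regardless of whether their organizations use frontier or open-weights models.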
