Sockpuppeting: Universal Single-Line Jailbreak Affects 11 Major LLMs

Technical description

Trend Micro researchers disclosed 'Sockpuppeting', a jailbreak technique that bypasses safety guardrails on 11 major LLMs using a single line of code exploiting the API assistant prefill feature. Successfully extracted functional malware code and confidential system prompts.

Attack vector

Injection of a fake acceptance into the assistant-role message via standard API prefill feature, exploiting the model's self-consistency tendency to continue prohibited output. Requires only API access supporting assistant prefill—no model weights, optimisation, or specialised tooling.

Affected systems

GPT-4o, GPT-4o-mini, Claude 4 Sonnet, Gemini 2.5 Flash (most susceptible at 15.7% ASR), and 7 other major LLMs. Three models blocked at API layer.

Mitigation

Implement message-ordering validation that blocks assistant-role messages at the API layer. Apply output filtering for known attack patterns. Monitor API usage for anomalous prefill patterns.

Sockpuppeting: Universal Single-Line Jailbreak Affects 11 Major LLMs

Technical description

Attack vector

Affected systems

Mitigation

Sources