NeuralTrust: Chain-of-Thought Hijacking Research — Up to 100% Safety Bypass on Frontier Reasoning Models

What happened

NeuralTrust published research on June 25, 2026 demonstrating Chain-of-Thought Hijacking, a jailbreak technique that buries harmful prompts under thousands of tokens of benign reasoning to dilute the model's refusal signals. The reported attack success rates are 99% on Gemini 2.5 Pro and 100% on Grok 3 Mini. NeuralTrust positions in-flight safety verification, as offered by its platform, as the mitigation.

Why it matters

The research demonstrates that 'reasoning more' does not equal 'being safer': the very capability that makes Large Reasoning Models (LRMs) powerful becomes an exploitable attack surface. Any deployment of frontier reasoning models in agentic or customer-facing contexts is exposed.

Applicability

Teams deploying reasoning models (Gemini 2.5, OpenAI o-series, Grok) in agentic or user-facing roles should implement continuous in-flight intent monitoring rather than relying solely on initial prompt filters. They should also evaluate LLM firewall vendors for reasoning-chain inspection capability.
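
To make the difference between a one-shot filter and in-flight monitoring concrete, here is a minimal toy sketch. It is not NeuralTrust's method or product. The names `FLAGGED_TERMS`, `toy_risk_score`, and `first_flagged_index`, the window size, and the threshold are all hypothetical, and the keyword match stands in for a real intent classifier. The sketch only illustrates why a score averaged over the whole context can be diluted by padding, while a check over a sliding window of recent tokens is not.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional

# Hypothetical placeholder vocabulary. A real deployment would call a trained
# intent classifier here; keyword matching is used only to keep this runnable.
FLAGGED_TERMS = {"bypass_auth"}


def toy_risk_score(tokens: List[str]) -> float:
    """Return the fraction of tokens that are flagged (0.0 for empty input)."""
    if not tokens:
        return 0.0
    flagged = sum(1 for t in tokens if t.lower() in FLAGGED_TERMS)
    return flagged / len(tokens)


def first_flagged_index(
    stream: Iterable[str],
    window: int = 32,          # assumed window size, chosen arbitrarily
    threshold: float = 0.03,   # assumed threshold, chosen arbitrarily
    score: Callable[[List[str]], float] = toy_risk_score,
) -> Optional[int]:
    """Score a sliding window of recent tokens as they arrive.

    Returns the index of the token at which the windowed score first reaches
    the threshold, or None if it never does.
    """
    recent: deque = deque(maxlen=window)
    for i, tok in enumerate(stream):
        recent.append(tok)
        if score(list(recent)) >= threshold:
            return i
    return None


# Toy input: 5000 benign tokens followed by one flagged token.
padding = ["step"] * 5000
tokens = padding + ["bypass_auth"]

whole_context = toy_risk_score(tokens)  # 1 / 5001, far below the threshold
hit = first_flagged_index(tokens)       # windowed score is 1 / 32 at the last token

print(f"whole-context score: {whole_context:.5f}")
print(f"first flagged index: {hit}")
```

In this toy setup the whole-context average shrinks as more padding is added, whereas the windowed score at the flagged token does not depend on how much benign text came before it. A production monitor would need a far stronger scoring function and tuned thresholds. The point here is only where in the pipeline the check runs.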