Attack  ·  Glossary

CoT forgery (chain-of-thought forgery)

A specific technique within chain-of-thought hijacking where an attacker injects text that perfectly mimics the internal 'thinking' style of an AI reasoning model. Because the model uses writing style — not secure structural tags — to distinguish its own thoughts from external input, forged reasoning text is accepted as if the model generated it itself, bypassing safety checks.
This is a structural flaw in how current AI reasoning models work — not a bug that can be patched with a software update. It means that safety guardrails built into reasoning models can be systematically defeated by anyone who understands the model's reasoning style.
References
ICML 2026 — Prompt Injection as Role Confusion (Ye, Cui, Hadfield-Menell)
Track this in the live feed See how this plays out in real AI security and governance developments.
Open the feed →