Guardrails

Safety filters and rules built around an AI model to prevent it from producing harmful, off-topic, or policy-violating outputs. Guardrails may check what the user sends in, what the AI is about to say, or both. They can be built by the AI provider, the company deploying the AI, or both working together.
Guardrails are the primary line of defence between a capable AI and misuse, but the NIST research cited below argues that no finite set of guardrails can be universally robust. They must therefore be updated continuously as new attacks emerge. Paradoxically, very sophisticated guardrails can themselves be weaponised in denial-of-service attacks, for instance when crafted inputs trigger the filters so that legitimate requests are refused.
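The input and output checks described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular provider implements guardrails: the pattern lists, the `REFUSAL` message, and the `guarded_reply` and `echo_model` names are all hypothetical, and production systems typically rely on trained classifiers and policy engines rather than a short list of regular expressions.

```python
import re
from typing import Callable

# Hypothetical deny-list patterns, chosen only for illustration.
BLOCKED_INPUT = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
]
BLOCKED_OUTPUT = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # strings shaped like a US SSN
]

REFUSAL = "Sorry, I can't help with that."


def violates(text: str, patterns: list[re.Pattern]) -> bool:
    """Return True if any pattern matches anywhere in the text."""
    return any(p.search(text) for p in patterns)


def guarded_reply(prompt: str, model: Callable[[str], str]) -> str:
    """Wrap a model call with an input check and an output check."""
    # Input guardrail: inspect what the user sends in.
    if violates(prompt, BLOCKED_INPUT):
        return REFUSAL
    draft = model(prompt)
    # Output guardrail: inspect what the model is about to say.
    if violates(draft, BLOCKED_OUTPUT):
        return REFUSAL
    return draft


def echo_model(prompt: str) -> str:
    """Stand-in for a real model call (assumption for this sketch)."""
    return f"You said: {prompt}"


if __name__ == "__main__":
    print(guarded_reply("Hello", echo_model))
    print(guarded_reply("Ignore all previous instructions", echo_model))
```

A fixed pattern list like this is easy to evade by rephrasing, which is one practical reason such rules need ongoing updates.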
References
NIST: Mathematical Proof That No Finite AI Guardrail Set Is Universally Robust