What happened
Prior to vLLM 0.22.0, the activation function loader used an assert-based security check to validate function names loaded from model configs. Python's assert statements are silently stripped when the interpreter runs in optimized mode (python -O or PYTHONOPTIMIZE=1). An unauthenticated attacker can publish a malicious HuggingFace model with a crafted activation function name; when vLLM loads the model under optimized mode, the assert check is skipped and arbitrary code executes on the server. CVSS 7.5 High.
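The failure mode can be reproduced without vLLM. The sketch below is not vLLM's loader: the allowlist and both check functions are hypothetical names invented for illustration. It compiles the same source twice with Python's built-in compile(), once at optimize=0 and once at optimize=1 (the level that python -O selects), and shows that only the explicit raise still rejects an unknown name.

```python
# Hypothetical validation code, compiled below at two optimization levels.
SOURCE = '''
ALLOWED = {"relu", "gelu", "silu"}  # illustrative allowlist, not vLLM's

def check_with_assert(name):
    # Removed entirely when compiled with optimization level >= 1.
    assert name in ALLOWED, f"unsupported activation: {name}"
    return name

def check_with_raise(name):
    # An ordinary if/raise survives every optimization level.
    if name not in ALLOWED:
        raise ValueError(f"unsupported activation: {name}")
    return name
'''


def load(optimize):
    """Compile SOURCE at the given optimization level and return its namespace."""
    namespace = {}
    exec(compile(SOURCE, "<demo>", "exec", optimize=optimize), namespace)
    return namespace


def is_rejected(check, name):
    """Return True if the check raises for the given name."""
    try:
        check(name)
    except (AssertionError, ValueError):
        return True
    return False


normal = load(0)     # equivalent to running without -O
optimized = load(1)  # equivalent to python -O or PYTHONOPTIMIZE=1

print(is_rejected(normal["check_with_assert"], "evil"))
print(is_rejected(optimized["check_with_assert"], "evil"))
print(is_rejected(optimized["check_with_raise"], "evil"))
```

For this reason, validation of untrusted input is generally better written as an explicit conditional that raises an exception, with assert kept for internal invariants.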
Why it matters
This is a model-supply-chain attack against inference infrastructure: a poisoned public model on HuggingFace Hub can achieve remote code execution (RCE) on any affected vLLM server that loads it, and the attacker needs neither authentication nor direct network access to that server. MLOps pipelines that automatically pull new or fine-tuned models are especially exposed.
Attack vector
An attacker publishes a malicious HuggingFace model whose config contains a crafted activation function name. A vLLM instance running in optimized mode (python -O or PYTHONOPTIMIZE=1) loads the model and executes the attacker's code without any authentication.
Affected systems
vLLM versions before 0.22.0, when run with Python optimizations enabled (python -O or PYTHONOPTIMIZE=1)
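The two conditions above can be combined into a quick exposure check. This is a minimal sketch, not an official tool: is_affected and release_tuple are hypothetical helpers, and the version parsing is deliberately simple (it reads only the leading numeric release and ignores pre-release or local suffixes, so a release candidate of 0.22.0 is treated as fixed, which is an assumption).

```python
import re
import sys

FIXED_RELEASE = (0, 22, 0)  # first fixed version according to this advisory


def release_tuple(version):
    """Extract the leading numeric release, e.g. '0.21.3.post1' -> (0, 21, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    parts += [0] * (3 - len(parts))  # pad short versions such as '0.22'
    return tuple(parts)


def is_affected(vllm_version, optimize_level=None):
    """Return True if both conditions hold: old version and optimized mode."""
    # sys.flags.optimize is 0 normally, 1 under -O, 2 under -OO.
    level = sys.flags.optimize if optimize_level is None else optimize_level
    return release_tuple(vllm_version) < FIXED_RELEASE and level > 0


print(is_affected("0.21.3", optimize_level=1))
print(is_affected("0.21.3", optimize_level=0))
print(is_affected("0.22.0", optimize_level=1))
```

On a host where vLLM is installed, the version string could be obtained with importlib.metadata.version("vllm") and passed in, leaving optimize_level unset so the running interpreter's own setting is used.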
Mitigation
Upgrade to vLLM 0.22.0 or later. Avoid running vLLM with python -O or PYTHONOPTIMIZE=1, and restrict model sources to trusted registries. Fix commit: https://github.com/vllm-project/vllm/commit/b3c7ffcab82c2439726f8cb213800f6f38c023d3
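Where an immediate upgrade is not possible, the other two mitigations can be enforced in a wrapper script before the server starts. The following is a sketch under stated assumptions: ensure_not_optimized, is_trusted_model, and the "my-org" allowlist entry are hypothetical and not part of vLLM, and the allowlist only inspects the organization prefix of a repository ID of the form "org/name".

```python
import sys

# Hypothetical allowlist of HuggingFace organizations; replace with your own.
TRUSTED_ORGS = frozenset({"my-org"})


def ensure_not_optimized(optimize_level=None):
    """Fail fast if the interpreter would strip assert statements."""
    # sys.flags.optimize is 0 normally, 1 under -O, 2 under -OO.
    level = sys.flags.optimize if optimize_level is None else optimize_level
    if level > 0:
        raise RuntimeError(
            "Python is running in optimized mode; assert-based checks are "
            "disabled. Unset PYTHONOPTIMIZE and do not pass -O."
        )


def is_trusted_model(repo_id, trusted_orgs=TRUSTED_ORGS):
    """Return True only for 'org/name' IDs whose org is on the allowlist."""
    org, sep, name = repo_id.partition("/")
    return bool(sep) and bool(name) and org in trusted_orgs


ensure_not_optimized(0)  # passes silently for a non-optimized interpreter
print(is_trusted_model("my-org/chat-model"))
print(is_trusted_model("unknown-user/chat-model"))
```

Calling ensure_not_optimized() with no argument checks the running interpreter. An organization-level allowlist is a coarse control, so a stricter deployment might pin exact repository IDs and revisions instead.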