vLLM Arbitrary Code Execution via Malicious HuggingFace Model — Assert Bypass in Optimized Mode

What happened

Prior to vLLM 0.22.0, the activation function loader used an assert-based security check to validate function names loaded from model configs. Python's assert statements are silently stripped when the interpreter runs in optimized mode (python -O or PYTHONOPTIMIZE=1). An unauthenticated attacker can publish a malicious HuggingFace model with a crafted activation function name; when vLLM loads the model under optimized mode, the assert check is skipped and arbitrary code executes on the server. CVSS 7.5 High.
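
The underlying Python behavior can be reproduced in isolation. The sketch below is not vLLM's code: the allowlist, the activation names, and the run_snippet helper are hypothetical and exist only to show that an assert-based check is enforced in default mode and skipped under -O.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for an assert-based allowlist check; not vLLM's code.
SNIPPET = """
ALLOWED = {"gelu", "relu", "silu"}
name = "not_an_allowed_name"
try:
    assert name in ALLOWED, "unexpected activation name"
    print("check skipped")
except AssertionError:
    print("check enforced")
"""


def run_snippet(*interpreter_flags):
    """Run SNIPPET in a fresh interpreter and return what it printed."""
    # Drop PYTHONOPTIMIZE so only the explicit flags decide the optimization level.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONOPTIMIZE"}
    result = subprocess.run(
        [sys.executable, *interpreter_flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
        check=True,
        env=env,
    )
    return result.stdout.strip()


normal = run_snippet()         # assert runs and rejects the name
optimized = run_snippet("-O")  # assert is compiled away, so nothing rejects it
print("default mode:  ", normal)
print("optimized mode:", optimized)
```

The same source produces two different outcomes depending only on an interpreter flag, which is why assert is unsuitable for validating untrusted input.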

Why it matters

A poisoned public model on HuggingFace Hub can achieve remote code execution on any vLLM server that loads it under optimized mode, with no authentication and no direct network access to the server required. MLOps pipelines that automatically pull new or fine-tuned models are especially exposed. This is a model supply-chain attack against inference infrastructure.

Attack vector

An attacker publishes a malicious HuggingFace model whose config contains a crafted activation function name. A vLLM instance running with python -O or PYTHONOPTIMIZE=1 loads the model and executes the attacker's code; no authentication is required.

Affected systems

vLLM versions earlier than 0.22.0, when run with Python optimizations enabled (python -O or PYTHONOPTIMIZE=1).
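
Both conditions can be checked from inside the Python environment that runs vLLM. The following is a rough sketch, assuming the package is installed under the distribution name "vllm"; the helper names are hypothetical, and the version parsing is deliberately simple (it ignores pre-release and local suffixes), so treat the result as a hint rather than an authoritative verdict.

```python
import re
import sys
from importlib import metadata
from typing import Optional

FIXED_VERSION = (0, 22, 0)  # first fixed release according to this advisory


def release_tuple(version):
    """Extract the leading numeric release, e.g. '0.21.3.post1' -> (0, 21, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = [int(part) for part in match.group().split(".")]
    parts += [0] * (3 - len(parts))  # pad '0.22' to (0, 22, 0)
    return tuple(parts)


def is_exposed(version, optimize_level):
    """True when the version predates the fix and assert statements are stripped."""
    return release_tuple(version) < FIXED_VERSION and optimize_level > 0


def check_current_environment() -> Optional[bool]:
    """Exposure status of this interpreter, or None if vLLM is not installed."""
    try:
        installed = metadata.version("vllm")  # assumed distribution name
    except metadata.PackageNotFoundError:
        return None
    # sys.flags.optimize is 0 by default, 1 under -O, and 2 under -OO.
    return is_exposed(installed, sys.flags.optimize)


print("exposed:", check_current_environment())
```

The check must run in the same interpreter configuration as the server (same flags and environment variables), since sys.flags.optimize reflects only the current process.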

Mitigation

Upgrade to vLLM 0.22.0 or later. Until the upgrade is in place, avoid running vLLM with python -O or PYTHONOPTIMIZE=1, and restrict model sources to trusted registries. Fix: https://github.com/vllm-project/vllm/commit/b3c7ffcab82c2439726f8cb213800f6f38c023d3
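
For code that performs similar validation, the general remedy is to replace assert with an explicit conditional and raise, which the interpreter never strips. The sketch below illustrates that pattern only; the allowlist and the function name are hypothetical, and the linked commit remains the authoritative description of what vLLM actually changed.

```python
# Hypothetical allowlist for illustration; not vLLM's actual registry.
ALLOWED_ACTIVATIONS = frozenset({"gelu", "relu", "silu"})


def validate_activation_name(name):
    """Return name if allowlisted; otherwise raise, even under python -O."""
    if name not in ALLOWED_ACTIVATIONS:
        raise ValueError(f"unsupported activation function: {name!r}")
    return name


for candidate in ("gelu", "os.system"):
    try:
        print("accepted:", validate_activation_name(candidate))
    except ValueError as exc:
        print("rejected:", exc)
```

Because the rejection is an ordinary if statement followed by raise, it behaves identically in default and optimized mode.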