Technical description
vLLM versions 0.8.0 and later are vulnerable to an out-of-memory (OOM) denial-of-service attack in the VideoMediaIO.load_base64() method. When processing video/jpeg data URLs, the method splits the base64 data string on commas to extract JPEG frames and does not enforce any limit on the number of frames. An attacker can craft a single API request containing thousands of comma-separated base64 JPEG frames, causing the server to decode every frame into memory until the process exhausts available memory and crashes. The vulnerability is reachable through the OpenAI-compatible chat completions API endpoint, which requires no authentication when no auth layer is placed in front of it.
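The pattern described above can be sketched as follows. This is an illustration only, not vLLM's actual code: the function names decode_frames_unbounded and decode_frames_bounded and the MAX_FRAMES value are hypothetical, and plain base64 decoding stands in for real JPEG decoding so that the example stays self-contained.

```python
import base64

# Hypothetical cap chosen for illustration; not a value taken from vLLM.
MAX_FRAMES = 32


def decode_frames_unbounded(data: str) -> list[bytes]:
    """Mimic the described behaviour: one decoded buffer per comma-separated chunk.

    Memory use grows with the number of chunks the caller supplies.
    """
    return [base64.b64decode(chunk) for chunk in data.split(",")]


def decode_frames_bounded(data: str, max_frames: int = MAX_FRAMES) -> list[bytes]:
    """Same split, but reject the input before decoding if it has too many frames."""
    # Counting commas avoids building the full list of chunks just to measure it.
    frame_count = data.count(",") + 1
    if frame_count > max_frames:
        raise ValueError(
            f"video/jpeg payload has {frame_count} frames, limit is {max_frames}"
        )
    return [base64.b64decode(chunk, validate=True) for chunk in data.split(",")]
```

The check runs before any frame is decoded, so an oversized payload costs one pass over the string rather than one allocation per frame.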
Attack vector
A single unauthenticated HTTP request to the vLLM /v1/chat/completions endpoint is sufficient, provided it carries a crafted video/jpeg data URL containing thousands of comma-separated base64-encoded JPEG frames. No authentication is required if the API is exposed without an auth layer, which is common in self-hosted deployments.
Affected systems
vLLM 0.8.0 and all later versions through at least the disclosure date. vLLM is one of the most widely deployed open-source LLM inference servers, used for hosting models including Llama, Mistral, Qwen, and others in enterprise and cloud environments.
Mitigation
Apply the patch from commit 58ee614 in the vLLM repository. If immediate patching is not possible: place vLLM inference endpoints behind an authenticated API gateway, apply request-size limits and input validation before video data URLs reach the vLLM process, and enable OOM monitoring to detect attack attempts.
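The request-size limit and input validation mentioned above could be sketched as a check that runs in a gateway or proxy before the request body is forwarded to vLLM. This is a minimal sketch under stated assumptions, not a vetted filter: check_request, count_video_jpeg_frames, MAX_BODY_BYTES and MAX_VIDEO_FRAMES are hypothetical names and values, and the code assumes the data URL begins with "data:video/jpeg" and that the first comma separates the header from the frame data, as in standard data URL syntax.

```python
import json

# Hypothetical limits for illustration; tune them to the deployment.
MAX_BODY_BYTES = 8 * 1024 * 1024
MAX_VIDEO_FRAMES = 32
VIDEO_JPEG_PREFIX = "data:video/jpeg"


def _iter_strings(node):
    """Yield every string value nested anywhere in a decoded JSON document."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from _iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from _iter_strings(item)


def count_video_jpeg_frames(url: str) -> int:
    """Return the number of comma-separated frames, or 0 if not a video/jpeg data URL."""
    if url[: len(VIDEO_JPEG_PREFIX)].lower() != VIDEO_JPEG_PREFIX:
        return 0
    # Assumption: the first comma ends the data URL header.
    _header, separator, data = url.partition(",")
    if not separator:
        return 0
    return data.count(",") + 1


def check_request(raw_body: bytes) -> tuple[bool, str]:
    """Decide whether a chat completions request body may be forwarded."""
    if len(raw_body) > MAX_BODY_BYTES:
        return False, "request body too large"
    try:
        payload = json.loads(raw_body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False, "body is not valid JSON"
    for text in _iter_strings(payload):
        if count_video_jpeg_frames(text) > MAX_VIDEO_FRAMES:
            return False, "too many frames in video/jpeg data URL"
    return True, "ok"
```

Walking every string in the body, rather than a fixed field path, avoids depending on the exact request schema. A check like this reduces exposure but does not replace the patch, since it only covers requests that pass through the gateway.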