vLLM Audio Transcription Endpoint Decompression Bomb (25 MB OPUS → 14.9 GB PCM)

What happened

Before vLLM 0.23.1rc0, the /v1/audio/transcriptions endpoint limited the size of the compressed upload but placed no limit on the size of the decoded PCM output. A crafted 25 MB OPUS file can expand to approximately 14.9 GB of float32 PCM at decode time, roughly a 600-fold expansion. This exhausts memory and causes denial of service on the inference server. The issue is rated CVSS 6.5 (Medium) and was published 2026-06-22.
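The figures above can be checked with some back-of-the-envelope arithmetic. This is only a sketch: it assumes decimal units (1 MB = 10^6 bytes) and 4 bytes per float32 sample, and the 16 kHz mono format is an assumption made for illustration, since the advisory does not state the sample rate or channel count.

```python
# Back-of-the-envelope arithmetic for the sizes quoted in the advisory.
# Assumptions (not from the advisory): decimal units, mono audio, 16 kHz.

COMPRESSED_BYTES = 25_000_000          # 25 MB upload
DECODED_BYTES = 14_900_000_000         # ~14.9 GB of float32 PCM
BYTES_PER_SAMPLE = 4                   # one float32 sample
ASSUMED_SAMPLE_RATE_HZ = 16_000        # assumption, for illustration only
ASSUMED_CHANNELS = 1                   # assumption, for illustration only

# How much larger the decoded audio is than the upload.
expansion_ratio = DECODED_BYTES / COMPRESSED_BYTES

# Number of float32 samples held in memory after decoding.
total_samples = DECODED_BYTES / BYTES_PER_SAMPLE

# Playback duration those samples would represent at the assumed format.
duration_hours = total_samples / (ASSUMED_SAMPLE_RATE_HZ * ASSUMED_CHANNELS) / 3600

print(f"expansion ratio: ~{expansion_ratio:.0f}x")
print(f"decoded samples: ~{total_samples:.3e}")
print(f"equivalent duration at assumed format: ~{duration_hours:.1f} hours")
```

Under these assumptions the ratio comes out at about 596x, and the decoded buffer corresponds to roughly 65 hours of mono audio. A limit on upload size alone therefore says very little about how much memory the decode step will need.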

Why it matters

Any vLLM deployment that exposes audio transcription can be taken offline by a single unauthenticated request containing a crafted OPUS file, which disrupts all LLM inference served by that instance. The risk is highest for production multimodal AI services, where an outage of one instance interrupts all the inference traffic it serves.

Attack vector

An attacker sends a POST request containing a crafted 25 MB OPUS file to /v1/audio/transcriptions. The server decodes it to ~14.9 GB of PCM, which exhausts memory.

Affected systems

vLLM 0.x versions earlier than 0.23.1rc0 with audio transcription enabled.

Mitigation

Upgrade to vLLM 0.23.1rc0 or later. The fix was made in this pull request: https://github.com/vllm-project/vllm/pull/44970
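The general idea behind defending against this class of bug is to bound the decoded output, not only the upload. The sketch below is not the code from the vLLM fix. `DecodedAudioTooLarge`, `collect_pcm_with_cap`, `fake_decoder`, and the 512 MiB budget are hypothetical names and values chosen for illustration, and the sketch assumes the decoder can hand back PCM in chunks so that the cap can be enforced incrementally.

```python
from typing import Iterable, Iterator

# Hypothetical budget for decoded PCM per request; not a vLLM setting.
MAX_DECODED_BYTES = 512 * 1024 * 1024


class DecodedAudioTooLarge(Exception):
    """Raised when decoded PCM would exceed the configured budget."""


def collect_pcm_with_cap(
    chunks: Iterable[bytes], max_bytes: int = MAX_DECODED_BYTES
) -> bytes:
    """Accumulate decoded PCM chunks, aborting before the budget is exceeded."""
    buffer = bytearray()
    for chunk in chunks:
        # Check before appending so memory use never passes max_bytes.
        if len(buffer) + len(chunk) > max_bytes:
            raise DecodedAudioTooLarge(
                f"decoded audio exceeds {max_bytes} bytes; rejecting request"
            )
        buffer.extend(chunk)
    return bytes(buffer)


def fake_decoder(n_chunks: int, chunk_size: int) -> Iterator[bytes]:
    """Stand-in for a streaming audio decoder: lazily yields silent PCM."""
    for _ in range(n_chunks):
        yield bytes(chunk_size)


if __name__ == "__main__":
    # A small input fits within the budget.
    pcm = collect_pcm_with_cap(fake_decoder(4, 1024), max_bytes=8192)
    print(f"accepted {len(pcm)} bytes")

    # An oversized input is rejected after only a few chunks are buffered.
    try:
        collect_pcm_with_cap(fake_decoder(10, 1024), max_bytes=4096)
    except DecodedAudioTooLarge as exc:
        print(f"rejected: {exc}")
```

Because the decoder is consumed lazily, the oversized input is rejected once the running total would pass the budget, so the full decoded buffer is never materialized. A check of this kind does not replace upgrading to a fixed release.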