What happened
In vLLM versions before 0.23.1rc0, the /v1/audio/transcriptions endpoint limits the size of the compressed upload but not the size of the decoded PCM output. A 25 MB OPUS file can expand to approximately 14.9 GB of float32 PCM at decode time, which exhausts memory and causes a denial of service on the inference server. CVSS 6.5 (Medium), published 2026-06-22.
Why it matters
Any vLLM deployment exposing audio transcription can be taken offline by a single unauthenticated request containing a crafted OPUS file, disrupting all LLM inference served by that instance. This is especially impactful for production multimodal AI services.
Attack vector
An attacker sends a POST request with a crafted 25 MB OPUS file to /v1/audio/transcriptions. The server decodes it to roughly 14.9 GB of PCM, which exhausts memory.
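The scale of the expansion can be checked with simple arithmetic. The sketch below is illustrative only: it assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes), and the helper names are hypothetical rather than part of vLLM.

```python
FLOAT32_BYTES = 4  # size of one float32 PCM sample


def decoded_pcm_bytes(num_frames, channels=1):
    """Bytes of float32 PCM for a given number of frames per channel."""
    return num_frames * channels * FLOAT32_BYTES


def expansion_ratio(compressed_bytes, decoded_bytes):
    """How many times larger the decoded audio is than the upload."""
    return decoded_bytes / compressed_bytes


# Figures from the advisory, assuming decimal units.
upload_bytes = 25 * 10**6            # 25 MB compressed upload
decoded_bytes = 14_900_000_000       # ~14.9 GB of float32 PCM

ratio = expansion_ratio(upload_bytes, decoded_bytes)
total_samples = decoded_bytes // FLOAT32_BYTES

print(f"expansion ratio: {ratio:.0f}x")        # 596x
print(f"float32 samples: {total_samples:,}")   # 3,725,000,000
```

An upload limit of 25 MB therefore says little about peak memory: under these assumptions the decoded buffer is about 596 times larger than the request body.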
Affected systems
vLLM versions before 0.23.1rc0 with audio transcription enabled
Mitigation
Upgrade to vLLM 0.23.1rc0 or later. PR fix: https://github.com/vllm-project/vllm/pull/44970
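For illustration, the kind of check that was missing is a cap on decoded size, not only on upload size. The following is a minimal sketch of such a guard, not the code from the linked PR: the function name, exception class, and budget value are all hypothetical, and it assumes the frame count and channel count can be read from the audio metadata before the full decode.

```python
class DecodedAudioTooLarge(ValueError):
    """Raised when decoding would exceed the configured memory budget."""


def enforce_decoded_budget(num_frames, channels, max_decoded_bytes,
                           bytes_per_sample=4):
    """Reject audio whose float32 PCM size would exceed max_decoded_bytes.

    num_frames and channels are assumed to come from container metadata,
    so the check can run before any large buffer is allocated.
    """
    decoded = num_frames * channels * bytes_per_sample
    if decoded > max_decoded_bytes:
        raise DecodedAudioTooLarge(
            f"decoded audio would need {decoded} bytes, "
            f"limit is {max_decoded_bytes}"
        )
    return decoded


# Hypothetical budget: 512 MiB of decoded PCM per request.
BUDGET = 512 * 1024 * 1024

# One minute of 16 kHz mono audio fits comfortably.
small = enforce_decoded_budget(16_000 * 60, 1, BUDGET)

# A stream that would decode to ~14.9 GB is rejected up front.
try:
    enforce_decoded_budget(3_725_000_000, 1, BUDGET)
    rejected = False
except DecodedAudioTooLarge:
    rejected = True
```

Metadata can itself be forged, so a real implementation would likely also need to bound memory while decoding in chunks; this sketch only shows the size comparison.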