vLLM Sparse Tensor Validation Missing in Multimodal Embeddings — Denial of Service

What happened

CVE-2026-56340 was published to NVD on 2026-06-20 (CVSS 8.8 High). vLLM versions 0.10.2 through 0.12.x are missing input validation for sparse tensor indices in their multimodal embeddings processing pipeline. PyTorch's default configuration disables sparse tensor invariant checks for performance, so crafted tensors with negative or out-of-bounds indices pass through to processing undetected, crashing the server. The fix in vLLM 0.13.0 adds explicit validation before sparse tensor operations.

Why it matters

vLLM is a widely used open-source LLM inference engine in production AI deployments. An unauthenticated attacker who can submit embedding requests to a multimodal vLLM endpoint can crash the inference server with a single malformed request, denying service to every user of that deployment. In multi-tenant GPU inference environments, the crash also takes down other tenants sharing the same server.

Attack vector

An attacker submits a crafted embedding request containing a malformed sparse tensor with negative or out-of-bounds indices. Because PyTorch disables sparse tensor invariant checks by default and vLLM performs no validation of its own, the malformed tensor reaches the multimodal embeddings processing path, where it triggers undefined behaviour and crashes the inference server.
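
The missing check amounts to a bounds test on every index before the tensor is used. The following is a minimal sketch in plain Python, assuming the tensor uses the COO layout (one row of indices per dimension, one column per stored element). The function name validate_coo_indices is hypothetical and this is not the code from the vLLM 0.13.0 fix. In PyTorch itself, comparable checking can be requested with torch.sparse.check_sparse_tensor_invariants or by passing check_invariants=True to torch.sparse_coo_tensor.

```python
from typing import Sequence


def validate_coo_indices(
    indices: Sequence[Sequence[int]], shape: Sequence[int]
) -> None:
    """Raise ValueError if any COO index is negative or out of bounds.

    Hypothetical helper for illustration only.
    `indices` has one row per tensor dimension and one column per
    stored element; `shape` gives the size of each dimension.
    """
    if len(indices) != len(shape):
        raise ValueError(
            f"expected {len(shape)} index rows, got {len(indices)}"
        )
    nnz = len(indices[0]) if indices else 0
    for dim, (row, size) in enumerate(zip(indices, shape)):
        if len(row) != nnz:
            raise ValueError(f"index row {dim} has inconsistent length")
        for idx in row:
            # Rejects both negative and too-large indices.
            if not 0 <= idx < size:
                raise ValueError(
                    f"index {idx} out of bounds for dimension {dim} "
                    f"of size {size}"
                )


# A well-formed 2x3 tensor with entries at (0, 2) and (1, 0) passes.
validate_coo_indices([[0, 1], [2, 0]], (2, 3))
```

Calling the same function with a row such as [0, -1] or a column index of 3 for a dimension of size 3 raises ValueError, so the request can be rejected before it reaches any sparse tensor operation.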

Affected systems

vLLM >= 0.10.2 and < 0.13.0
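
To check whether an installed copy falls in this range, a small script along these lines may help. It is a sketch under assumptions: the helper names are hypothetical, only the leading major.minor.patch numbers are compared, and pre-release or development suffixes (for example an rc build of 0.13.0) are ignored, so such builds should be checked by hand or with a full version parser.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

FIRST_AFFECTED = (0, 10, 2)  # lower bound stated in the advisory
FIRST_FIXED = (0, 13, 0)     # first release containing the fix


def parse_release(text: str) -> Tuple[int, int, int]:
    """Extract the leading major.minor[.patch] numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_affected(text: str) -> bool:
    """True if the version lies in the range >= 0.10.2 and < 0.13.0."""
    return FIRST_AFFECTED <= parse_release(text) < FIRST_FIXED


def installed_vllm_is_affected() -> Optional[bool]:
    """Check the locally installed vllm package; None if it is not installed."""
    try:
        return is_affected(version("vllm"))
    except PackageNotFoundError:
        return None
```

Tuple comparison keeps the range test readable, and returning None for a missing package avoids reporting a host without vLLM as either safe or vulnerable.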

Mitigation

Upgrade vLLM to version 0.13.0 or later. Advisory: https://github.com/vllm-project/vllm/security/advisories/GHSA-mcmc-2m55-j8jj