Vulnerability  ·  2026-05-11

Ollama Heap Out-of-Bounds Read (CVE-2026-7482 'Bleeding Llama') — Critical Memory Leak in 300k+ Deployments

Vulnerability · High impact · Global · CVE-2026-7482
Ollama before version 0.17.1 contains a heap out-of-bounds read vulnerability in the GGUF model loader. The /api/create endpoint accepts attacker-supplied GGUF files where the declared tensor offset and size exceed the file's actual length. During model quantization, the server reads past the allocated heap buffer, leaking arbitrary process memory.
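The missing check described above can be illustrated with a minimal Python sketch. It is not Ollama's actual loader code (Ollama is written in Go), and the names `TensorInfo`, `tensor_in_bounds`, `out_of_bounds_tensors`, and `data_start` are hypothetical. It assumes each tensor declares a byte offset relative to the start of the tensor data region plus a byte size, and flags any tensor whose declared range runs past the end of the file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorInfo:
    """Hypothetical view of one tensor entry from a model file header."""
    name: str
    offset: int  # declared byte offset, relative to the tensor data region
    size: int    # declared byte size of the tensor payload


def tensor_in_bounds(t: TensorInfo, data_start: int, file_len: int) -> bool:
    """Return True only if the declared byte range lies inside the file."""
    if data_start < 0 or t.offset < 0 or t.size < 0:
        return False
    # Python integers do not overflow; a Go or C implementation would also
    # need to guard this addition against wraparound.
    end = data_start + t.offset + t.size
    return end <= file_len


def out_of_bounds_tensors(tensors, data_start: int, file_len: int):
    """Return the names of tensors whose declared range exceeds the file."""
    return [t.name for t in tensors
            if not tensor_in_bounds(t, data_start, file_len)]


# Example: a 1,024-byte file whose tensor data region starts at byte 256.
example = [
    TensorInfo("ok.weight", offset=0, size=512),
    TensorInfo("inflated.weight", offset=512, size=4096),
]
bad = out_of_bounds_tensors(example, data_start=256, file_len=1024)
```

A loader applying a check of this kind would reject the file before reading any tensor data, instead of trusting the header's declared sizes.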
Remote, unauthenticated. An attacker uploads a crafted GGUF model file with an inflated tensor shape via HTTP POST to the /api/create endpoint of an exposed Ollama server, triggering the out-of-bounds heap read. The leaked data is then exfiltrated via the /api/push endpoint to an attacker-controlled registry.
Affects Ollama versions before 0.17.1 (GitHub: 171k+ stars, 16k+ forks). Roughly 300,000 Ollama servers globally are likely affected. The impact is particularly severe in environments where Ollama is chained to Claude Code or other agent tools, because all inference outputs flow through the vulnerable server's memory.
Upgrade to Ollama 0.17.1 or later immediately. Isolate all Ollama instances behind authentication proxies or API gateways (the REST API has no built-in authentication). Limit network access to Ollama endpoints. Audit existing deployments for internet exposure. Deploy WAF rules to detect suspicious GGUF file uploads. Until patched, separate Ollama from sensitive data flows and agent tool pipelines.
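As a starting point for the audit step, the following Python sketch queries an instance's /api/version endpoint and compares the reported version against 0.17.1, the first fixed release named in this advisory. The helper names (`parse_version`, `is_patched`, `fetch_version`) are illustrative, and the sketch assumes the endpoint returns JSON with a "version" field. It ignores pre-release suffixes, so a string such as "0.17.1-rc1" would be treated as patched; adjust that if pre-release builds are in use.

```python
import json
import re
import sys
import urllib.request

PATCHED = (0, 17, 1)  # first fixed release according to this advisory


def parse_version(text: str) -> tuple:
    """Extract (major, minor, patch) from strings like '0.17.1' or 'v0.17.1'."""
    m = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if m is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in m.groups())


def is_patched(version_text: str) -> bool:
    """True if the version is at or above the fixed release."""
    return parse_version(version_text) >= PATCHED


def fetch_version(base_url: str, timeout: float = 5.0) -> str:
    """Ask an Ollama instance for its version (assumes a JSON 'version' field)."""
    url = f"{base_url.rstrip('/')}/api/version"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["version"]


if __name__ == "__main__":
    # Usage: python check_ollama.py http://host-a:11434 http://host-b:11434
    for base in sys.argv[1:]:
        try:
            version = fetch_version(base)
        except (OSError, KeyError, ValueError) as exc:
            print(f"{base}: could not determine version ({exc})")
            continue
        status = "patched" if is_patched(version) else "VULNERABLE"
        print(f"{base}: {version} ({status})")
```

Run it only against hosts you are authorized to assess. An instance that answers this request from an untrusted network is exposed without authentication, regardless of the version it reports.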
Sources
The Hacker News
Cyera Research (Bleeding Llama)
CVE.org CVE-2026-7482
Ollama GitHub Release v0.17.1