Ollama Heap Out-of-Bounds Read (CVE-2026-7482, 'Bleeding Llama'): Critical Memory Disclosure in 300k+ Deployments

Technical description

Ollama versions before 0.17.1 contain a heap out-of-bounds read vulnerability in the GGUF model loader. The /api/create endpoint accepts attacker-supplied GGUF files whose declared tensor offset and size extend beyond the file's actual length. During model quantization, the server reads past the end of the allocated heap buffer, leaking process memory that lies beyond it.
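
The missing check described above can be illustrated with a minimal sketch. It is not Ollama's actual patch and does not parse the real GGUF on-disk layout. TensorInfo and check_tensor_bounds are hypothetical names, and the descriptor is reduced to a name, a declared byte offset, and a declared byte size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorInfo:
    """Simplified, hypothetical tensor descriptor (not the real GGUF layout)."""
    name: str
    offset: int  # declared byte offset into the tensor data region
    size: int    # declared byte length of the tensor


def check_tensor_bounds(tensors, data_len):
    """Raise ValueError if any declared tensor range leaves the data region.

    data_len is the number of bytes actually present in the file's tensor
    data region, measured from the file itself rather than from its header.
    """
    for t in tensors:
        if t.offset < 0 or t.size < 0:
            raise ValueError(f"{t.name}: negative offset or size")
        # Written as a subtraction so the same comparison stays safe in
        # fixed-width integer languages, where offset + size could wrap.
        if t.offset > data_len or t.size > data_len - t.offset:
            raise ValueError(
                f"{t.name}: declared range [{t.offset}, {t.offset + t.size}) "
                f"exceeds {data_len} available bytes"
            )
```

A loader would run a check of this kind on every declared tensor before reading or quantizing any tensor data, and reject the file on the first failure.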

Attack vector

Remote and unauthenticated. The attacker sends a crafted GGUF model file with an inflated tensor shape in an HTTP POST to the /api/create endpoint of an exposed Ollama server, which triggers the out-of-bounds heap read. The leaked data is then exfiltrated through the /api/push endpoint to an attacker-controlled registry.

Affected systems

Ollama versions before 0.17.1 (GitHub: 171k+ stars, 16k+ forks). Roughly 300,000 Ollama servers worldwide are likely affected. The impact is greatest in environments where Ollama is chained to Claude Code or other agent tools, because all inference output passes through the vulnerable server's memory.
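
Whether a given install falls in the affected range can be decided with a numeric version comparison. The following is a minimal sketch, assuming version strings of the form MAJOR.MINOR.PATCH with an optional leading "v". The names PATCHED, parse_version, and is_affected are illustrative, and any pre-release suffix is ignored.

```python
import re

# First fixed release according to this advisory.
PATCHED = (0, 17, 1)


def parse_version(text):
    """Parse 'MAJOR.MINOR.PATCH' (optional leading 'v') into a tuple of ints."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version_text):
    """Return True if the version is older than the first fixed release."""
    # Tuples compare component by component, so 0.9.6 sorts before 0.17.1.
    # A plain string comparison would get that case wrong.
    return parse_version(version_text) < PATCHED
```

For example, is_affected("0.9.6") is True and is_affected("0.17.1") is False. Because suffixes are ignored, a pre-release build of the fixed version should be verified by hand.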

Mitigation

Upgrade to Ollama 0.17.1 or later immediately. Because the REST API has no built-in authentication, place every Ollama instance behind an authenticating proxy or API gateway and restrict network access to Ollama endpoints. Audit existing deployments for internet exposure, and deploy WAF rules to detect suspicious GGUF file uploads. Until patched, separate Ollama from sensitive data flows and agent tool pipelines.
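
As a starting point for the exposure audit, the sketch below checks whether a host answers Ollama's GET /api/version endpoint without any credentials. It is an illustration only: probe_ollama is a hypothetical helper, the base URL is supplied by the caller, and it should be run only against systems you are authorized to test.

```python
import http.client
import json
import urllib.error
import urllib.request


def probe_ollama(base_url, timeout=3.0):
    """Return the reported version if /api/version answers without credentials.

    Returns None when the host is unreachable, rejects the request (for
    example an authenticating proxy replying 401), or sends a response that
    does not look like Ollama's version reply.
    """
    url = base_url.rstrip("/") + "/api/version"
    # Ignore any configured HTTP proxy so the probe measures direct reachability.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            if resp.status != 200:
                return None
            body = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError, http.client.HTTPException):
        # Covers refused connections, timeouts, HTTP error statuses, bad JSON.
        return None
    version = body.get("version") if isinstance(body, dict) else None
    return version if isinstance(version, str) else None
```

A non-None result from an untrusted network position suggests the instance is reachable without an authentication layer in front of it. It does not by itself prove that /api/create is exploitable.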
