Technical description
A heap out-of-bounds read vulnerability in Ollama's GGUF model loader allows attackers to trigger memory corruption during model quantization. The /api/create endpoint accepts attacker-supplied GGUF files in which the declared tensor offset and size exceed the file's actual length. While quantizing such a file (the affected code is in fs/ggml/gguf.go and server/quantization), Ollama reads beyond the bounds of the allocated buffer, enabling arbitrary code execution in the context of the Ollama server process.
Attack vector
An attacker can craft a malicious GGUF model file and submit it to the /api/create endpoint. This is possible whenever the attacker can reach that endpoint, either because the organization's Ollama instance is exposed or because the attacker has internal network access. When the server processes the file, the out-of-bounds read is triggered, allowing the attacker to execute arbitrary code on the server hosting Ollama, potentially gaining full control of the system and access to all models and data managed by the instance.
Affected systems
Ollama versions prior to 0.17.1. Ollama is widely deployed for local LLM inference and model management in enterprise environments, developer workstations, and research labs.
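When auditing many hosts, the "prior to 0.17.1" condition has to be compared numerically, since a plain string comparison would rank 0.9.x above 0.17.1. The following Go sketch shows one way to do that. The names parseVersion, isPatched, and fixedVersion are hypothetical, and the sketch assumes version strings of the form MAJOR.MINOR.PATCH with an optional leading "v". Any suffix after "-" or "+" is ignored, which is a simplifying assumption: pre-release builds should be checked by hand.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// fixedVersion is the first release stated to contain the fix.
var fixedVersion = [3]int{0, 17, 1}

// parseVersion converts a string such as "0.17.1" or "v0.17.1" into its
// numeric components. Suffixes after "-" or "+" are dropped (assumption).
func parseVersion(s string) ([3]int, error) {
	var v [3]int
	s = strings.TrimPrefix(strings.TrimSpace(s), "v")
	if i := strings.IndexAny(s, "-+"); i >= 0 {
		s = s[:i]
	}
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("unexpected version format %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, fmt.Errorf("invalid version component %q", p)
		}
		v[i] = n
	}
	return v, nil
}

// isPatched reports whether version is at or above fixedVersion.
func isPatched(version string) (bool, error) {
	v, err := parseVersion(version)
	if err != nil {
		return false, err
	}
	for i := range v {
		if v[i] != fixedVersion[i] {
			return v[i] > fixedVersion[i], nil
		}
	}
	return true, nil // exactly the fixed version
}
```

The input string could come from an inventory system or from the output of `ollama --version` on each host. A parse error should be treated as "unknown, investigate" rather than as patched.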
Mitigation
Upgrade to Ollama version 0.17.1 or later immediately. Organizations should audit all Ollama instances (including developer laptops and edge deployments) to ensure they are patched. If immediate patching is not feasible, restrict access to the /api/create endpoint via network segmentation or authentication controls, and monitor for suspicious model upload activity.
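As a stopgap where patching is delayed, access to /api/create can be restricted at a proxy placed in front of Ollama. The Go sketch below shows the filtering logic only. It is an illustration under stated assumptions, not an Ollama feature: isCreatePath and denyCreate are hypothetical names, and the allow callback stands in for whatever authentication or source-address policy the organization already uses. The path is normalized before comparison, and matched case-insensitively, so that variants such as a doubled slash are also rejected; this is deliberately stricter than necessary.

```go
package main

import (
	"net/http"
	"path"
	"strings"
)

// isCreatePath reports whether a request path resolves to /api/create
// after normalization (duplicate slashes, trailing slash, ".." segments).
func isCreatePath(p string) bool {
	clean := path.Clean("/" + p)
	return strings.EqualFold(clean, "/api/create")
}

// denyCreate wraps next and rejects requests to /api/create with 403
// unless the caller-supplied allow function approves the request.
func denyCreate(next http.Handler, allow func(*http.Request) bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if isCreatePath(r.URL.Path) && !allow(r) {
			http.Error(w, "model creation is disabled", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```

In a deployment, next would typically be a reverse proxy handler pointing at the Ollama server, and Ollama itself would be bound to a loopback or otherwise unreachable address so that the filter cannot be bypassed. Rejected requests are also a natural place to log suspicious upload attempts.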