Gibberish results running gemma4 on my integrated GPU

Apr 03, 2026

I’d never really tried running LLMs on my machine, and decided to see if I could run gemma4. So I installed ollama, and the first thing I noticed was a warning that it would run models on my CPU rather than my GPU, because it did not detect any NVIDIA GPU. So began a journey to run gemma4 on my Intel Meteor Lake integrated GPU.

It turned out not to be that hard: you just need to set an environment variable on the ollama service.

Edit the service:

sudo systemctl edit ollama

And add:

[Service]
Environment="OLLAMA_VULKAN=1"
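systemctl edit only writes a drop-in override; the running service keeps its old environment until it is restarted, so the flag takes effect after something like:

```shell
# Restart ollama so the new Environment= line is picked up
sudo systemctl restart ollama

# Verify the override is active for the unit
systemctl show ollama --property=Environment
```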

When I ran the model with ollama run gemma4 and gave it a prompt, it responded with a lot of gibberish. So I removed the flag and ran it again, and the model behaved just fine.

I wasn’t sure whether the flag had worked at all. Trying to find out, I came across ollama ps, which shows how much of a loaded model sits on the CPU and how much on the GPU. With OLLAMA_VULKAN=1, it showed a 68%/32% split (68% on the CPU); with the flag removed, it showed 100% CPU. So, for some reason, ollama decided to split the model between the two, which forced data to move back and forth between CPU and GPU, and somewhere in that hand-off something must have gone wrong.

To test whether the split was the problem, I ran a smaller model, ollama run phi3.5:3.8b. This time ollama ps showed 100% GPU, and the model worked just fine.
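Another way to probe the split hypothesis (an idea I haven’t tried, not something I verified) would be to keep gemma4 but force every layer onto the GPU with ollama’s num_gpu option, which maps to llama.cpp’s n_gpu_layers:

```shell
# Ask the API to load gemma4 fully on the GPU (num_gpu 999 ≈ "all layers").
# Assumes the default ollama endpoint on localhost:11434; if the model
# genuinely doesn't fit in GPU-visible memory, this may fail to load.
curl http://localhost:11434/api/generate -d '{
  "model": "gemma4",
  "prompt": "Say hello",
  "options": { "num_gpu": 999 }
}'
```

If the output is clean with a forced full offload, that would point the finger at the partial-offload path rather than the Vulkan backend itself.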

I’d love to know why ollama decided to split the model like that. I have 54 GiB of memory in my system; maybe ollama could not see, through the Vulkan driver, how much of it was available to the GPU, and concluded it would not have enough? The second question is why the split breaks anything at all: even a split model should still work, so there must be a bug somewhere that introduces the gibberish.
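If the memory-detection theory is right, the numbers ollama sees should match what the Vulkan driver reports. On a system with vulkan-tools installed, a rough check (the exact output layout varies between vulkaninfo versions) would be:

```shell
# Show the memory heaps the Vulkan driver exposes for the iGPU.
# On integrated GPUs the device-local heap is carved out of system RAM,
# so its reported size is the budget ollama would be working against.
vulkaninfo | grep -A 4 "memoryHeaps"
```

Comparing that heap size against the model size ollama reports would show whether ollama simply believed the GPU was too small.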

#gemma4 #gpu #llm #ollama