ollama - 💡(How to fix) Fix gemma4:e4b only offloads 2.8 GiB to ROCm GPU despite 7.5 GiB available [3 comments, 2 participants]

GitHub stats
ollama/ollama#15749 · fetched 2026-04-23 07:23:24
Comments: 3 · Participants: 2 · Timeline events: 6 · Reactions: 0
Timeline (top): commented ×3, labeled ×1, mentioned ×1, subscribed ×1


What is the issue?

  • GPU: AMD Radeon RX 6600 XT (gfx1032, 8 GB VRAM)
  • OS: Linux
  • Ollama: 0.21.0
  • Backend: ROCm

Issue: Ollama reports 7.5 GiB available on ROCm0 but only offloads 2.8 GiB of model weights to GPU, putting 6.6 GiB on CPU.

ollama ps shows: 68% CPU / 32% GPU


Tried:

  • OLLAMA_NUM_GPU=999
  • OLLAMA_GPU_OVERHEAD=0
  • OLLAMA_FLASH_ATTENTION=0
  • HSA_OVERRIDE_GFX_VERSION=10.3.0

None of these changed how the weights were distributed.

Note: Other models (qwen3.5:9b, qwen3-vl) work fine on this GPU.

Relevant log output

gpu memory id=0 library=ROCm available="7.5 GiB" free="8.0 GiB" minimum="457.0 MiB" overhead="0 B"
model weights device=ROCm0 size="2.8 GiB"
model weights device=CPU   size="6.6 GiB"
offloaded 42/43 layers to GPU

OS

Linux

GPU

AMD

CPU

AMD

Ollama version

0.21.0

extent analysis

TL;DR

The issue might be resolved by investigating and adjusting how Ollama allocates memory for this model. Although the log reports 42 of 43 layers offloaded, only 2.8 GiB of the 9.4 GiB of model weights were placed on ROCm0, so most of the weight memory appears to sit outside the offloaded layers.

Guidance

  • Review the Ollama documentation on GPU memory allocation, particularly for larger models, to see whether any setting controls which parts of a model are placed on the GPU.
  • Compare gemma4:e4b with the models that run fine on this GPU (qwen3.5:9b, qwen3-vl), focusing on model size, architecture, any settings used for those models, and the per-device weight sizes reported when each one loads.
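  • Consider experimenting further with OLLAMA_NUM_GPU and other GPU-related environment variables, even though the attempts so far had no effect. Note that the log already reports 42 of 43 layers offloaded, so raising the layer count can move at most one more layer and is unlikely to account for the 6.6 GiB held on the CPU. The layer count can also be set per request through the num_gpu option of the Ollama API.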
  • Check the ROCm runtime version and settings to confirm compatibility with the AMD Radeon RX 6600 XT (gfx1032). Note that HSA_OVERRIDE_GFX_VERSION=10.3.0 does not indicate a ROCm version; it tells the ROCm runtime to treat the GPU as gfx1030, a target for which ROCm ships prebuilt kernels.
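To compare gemma4:e4b against qwen3.5:9b or qwen3-vl on equal terms, the CPU/GPU split can be read from Ollama's /api/ps endpoint, which reports each loaded model's total `size` and its `size_vram`. The sketch below assumes the server is on its default address (`http://localhost:11434`); `processor_split` and `report_loaded_models` are hypothetical helper names, and the percentage calculation only approximates what `ollama ps` displays.

```python
import json
from urllib.request import urlopen

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434"


def processor_split(size: int, size_vram: int) -> tuple:
    """Return (cpu_percent, gpu_percent), approximating the `ollama ps` column.

    Whatever part of the model's memory is not in VRAM is counted as CPU.
    """
    if size <= 0 or size_vram < 0 or size_vram > size:
        raise ValueError("expected size > 0 and 0 <= size_vram <= size")
    cpu = round(100 * (size - size_vram) / size)
    return cpu, 100 - cpu


def report_loaded_models(base_url: str = OLLAMA_URL) -> None:
    """Print the CPU/GPU split of every model the server currently has loaded."""
    with urlopen(f"{base_url}/api/ps", timeout=5) as resp:
        models = json.load(resp).get("models", [])
    for m in models:
        cpu, gpu = processor_split(m["size"], m["size_vram"])
        print(f'{m.get("name", m.get("model"))}: {cpu}% CPU / {gpu}% GPU')
```

Running `report_loaded_models()` once with each model loaded gives directly comparable numbers; a model that fits entirely should show `0% CPU / 100% GPU`.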

Example

The issue does not include the exact commands used to run the model, so the fix cannot be tied to a specific invocation. Reviewing how the model is loaded (CLI, API request, or Modelfile), and in particular any GPU-related parameters such as num_gpu, could provide insights.
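One way to test the layer-count parameter explicitly, independent of server environment variables, is to pass `num_gpu` in the `options` of an /api/generate request. This is a minimal sketch assuming a local server on the default port; `build_generate_request` and `generate` are hypothetical helpers, and the model name is taken from the issue title.

```python
import json
from typing import Optional
from urllib.request import Request, urlopen

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(model: str, prompt: str, num_gpu: Optional[int] = None,
                           base_url: str = OLLAMA_URL) -> Request:
    """Build a non-streaming POST to /api/generate, optionally setting num_gpu."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if num_gpu is not None:
        # num_gpu is the number of layers Ollama should try to place on the GPU.
        payload["options"] = {"num_gpu": num_gpu}
    return Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, num_gpu: Optional[int] = None) -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urlopen(build_generate_request(model, prompt, num_gpu), timeout=600) as resp:
        return json.load(resp)["response"]
```

After calling `generate("gemma4:e4b", "Hello", num_gpu=999)`, the `model weights device=...` lines in the server log show whether the split moved. Because the log already reports 42/43 layers offloaded, this is mainly a way to rule the parameter in or out rather than an expected fix.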

Notes

The issue seems specific to this model (gemma4:e4b) running on Ollama 0.21.0 with this hardware. Because other models run correctly on the same GPU, the problem more likely lies in how this particular model interacts with Ollama's memory management than in the GPU setup itself.
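The mismatch can be made concrete by summarizing the load log: nearly all layers are reported as offloaded, yet most of the weight bytes stay on the CPU. The sketch below parses the exact log lines quoted in this issue; the regular expressions and the `summarize` helper are hypothetical and written only for this line format, which may differ in other Ollama versions.

```python
import re

# Matches lines such as: model weights device=ROCm0 size="2.8 GiB"
WEIGHTS_RE = re.compile(r'model weights device=(\S+)\s+size="([\d.]+) (GiB|MiB)"')
# Matches lines such as: offloaded 42/43 layers to GPU
OFFLOAD_RE = re.compile(r"offloaded (\d+)/(\d+) layers to GPU")


def summarize(log: str) -> dict:
    """Sum model-weight sizes per device (in GiB) and read the offloaded layer count."""
    weights = {}  # device name -> GiB
    for device, value, unit in WEIGHTS_RE.findall(log):
        gib = float(value) / 1024 if unit == "MiB" else float(value)
        weights[device] = weights.get(device, 0.0) + gib
    m = OFFLOAD_RE.search(log)
    layers = (int(m.group(1)), int(m.group(2))) if m else None
    return {"weights_gib": weights, "layers": layers}


# Log lines copied from the issue.
LOG = """\
gpu memory id=0 library=ROCm available="7.5 GiB" free="8.0 GiB" minimum="457.0 MiB" overhead="0 B"
model weights device=ROCm0 size="2.8 GiB"
model weights device=CPU   size="6.6 GiB"
offloaded 42/43 layers to GPU
"""

if __name__ == "__main__":
    s = summarize(LOG)
    total = sum(s["weights_gib"].values())
    gpu_share = s["weights_gib"].get("ROCm0", 0.0) / total
    done, all_layers = s["layers"]
    print(f"weights: {total:.1f} GiB total, {gpu_share:.0%} on ROCm0")
    print(f"layers: {done}/{all_layers} offloaded ({done / all_layers:.0%})")
```

For the issue's log this reports roughly 30% of weight bytes on ROCm0 against about 98% of layers offloaded, which is why per-layer settings alone are unlikely to close the gap.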

Recommendation

Apply workaround: The problem appears model-specific and the environment-variable changes had no effect, so the next step is a closer look at model-specific settings or optimizations that could improve GPU utilization. Upgrading Ollama is not the priority, since there is no clear indication that a newer version would resolve this particular issue.
