ollama - 💡(How to fix) Fix Vulkan runtime, allow more than 64GB VRAM with splitted memory heaps. [1 participants]

ollama · 2026-04-03 19:31:47


GitHub stats

ollama/ollama#15302 • Fetched 2026-04-08 02:44:26


Author: rjmalagon

Participants: rjmalagon

Timeline (top): labeled ×1

Code Example


On Linux with AMDGPU, when more than 64 GB of GTT memory is assigned, the Vulkan driver (RADV here) splits it into two memory heaps. Ollama counts the whole assigned GTT pool, but gets stuck once usage goes beyond the larger of the two split heaps.

My current setup is a GFX900-level AMD APU with 96 GB of main RAM, of which 82 GB is assigned as GTT (amdgpu.gttsize=82000).

These are the memory heaps reported by vulkaninfo:

=================================
memoryHeaps: count = 2
        memoryHeaps[0]:
                size   = 29376905216 (0x6d7000000) (27.36 GiB)
                budget = 29367087104 (0x6d66a3000) (27.35 GiB)
                usage  = 0 (0x00000000) (0.00 B)
                flags:
                        None
        memoryHeaps[1]:
                size   = 58753810432 (0xdae000000) (54.72 GiB)
                budget = 58734174208 (0xdacd46000) (54.70 GiB)
                usage  = 0 (0x00000000) (0.00 B)
                flags: count = 1
                        MEMORY_HEAP_DEVICE_LOCAL_BIT

This is what the Ollama Vulkan runtime detects:

time=2026-04-03T19:23:01.730Z level=INFO source=types.go:42 msg="inference compute" id=00000000-0900-0000-0000-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan1 description="AMD Radeon Vega 8 Graphics (RADV RAVEN)" libdirs=ollama,vulkan driver=0.0 pci_id=0000:09:00.0 type=iGPU total="82.1 GiB" available="82.1 GiB"
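As a quick check on these numbers, the sketch below converts the two vulkaninfo heap sizes to GiB. Their sum, about 82.08 GiB, matches the 82.1 GiB total Ollama reports, which suggests Ollama pools both heaps when estimating capacity, while the only heap flagged MEMORY_HEAP_DEVICE_LOCAL_BIT holds 54.72 GiB. The byte values are copied from the output above; the dictionary layout is only an illustration.

```python
GIB = 1024 ** 3  # vulkaninfo's "GiB" is 2**30 bytes

# memoryHeaps from the vulkaninfo output above (sizes in bytes).
heaps = [
    {"index": 0, "size": 29376905216, "device_local": False},
    {"index": 1, "size": 58753810432, "device_local": True},
]

# Pooled capacity versus the largest single device-local heap.
total_gib = sum(h["size"] for h in heaps) / GIB
largest_local_gib = max(h["size"] for h in heaps if h["device_local"]) / GIB

print(f"sum of all heaps:     {total_gib:.2f} GiB")          # 82.08 GiB
print(f"largest device-local: {largest_local_gib:.2f} GiB")  # 54.72 GiB
```

If capacity is budgeted against the pooled figure, Ollama can plan more GPU memory than the device-local heap alone can hold.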

When usage goes past the larger heap (above 54.7 GiB in this example, loading Qwen3.5 27B in bf16), Ollama does not crash; it just gets stuck loading the model indefinitely.

time=2026-04-03T19:05:34.815Z level=INFO source=device.go:240 msg="model weights" device=Vulkan1 size="49.3 GiB"
time=2026-04-03T19:05:34.815Z level=INFO source=device.go:245 msg="model weights" device=CPU size="2.5 GiB"
time=2026-04-03T19:05:34.815Z level=INFO source=device.go:251 msg="kv cache" device=Vulkan1 size="4.7 GiB"
time=2026-04-03T19:05:34.832Z level=INFO source=device.go:262 msg="compute graph" device=Vulkan1 size="789.3 MiB"
time=2026-04-03T19:05:34.832Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="39.6 MiB"
time=2026-04-03T19:05:34.833Z level=INFO source=device.go:272 msg="total memory" size="57.2 GiB"
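Summing only the Vulkan1 lines of this log gives the memory Ollama plans to place on the GPU. The sketch below does that arithmetic and compares the result with the budget vulkaninfo reports for memoryHeaps[1], the device-local heap. Because the log rounds each value to 0.1 GiB, the roughly 0.07 GiB overshoot it finds is within rounding error, so treat it as consistent with the reporter's description rather than as proof.

```python
GIB = 1024 ** 3

# Vulkan1 entries from the Ollama log above, in GiB.
vulkan1_gib = {
    "model weights": 49.3,
    "kv cache": 4.7,
    "compute graph": 789.3 / 1024,  # logged as 789.3 MiB
}

# memoryHeaps[1] (device-local) budget from vulkaninfo, in bytes.
HEAP1_BUDGET_BYTES = 58734174208
heap1_budget_gib = HEAP1_BUDGET_BYTES / GIB

needed_gib = sum(vulkan1_gib.values())
print(f"Vulkan1 plan: {needed_gib:.2f} GiB, heap[1] budget: {heap1_budget_gib:.2f} GiB")
print("plan exceeds the device-local heap:", needed_gib > heap1_budget_gib)
```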

extent analysis

TL;DR

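The issue can likely be mitigated by making Ollama account for the split memory heaps Vulkan reports: GPU offload should be sized against the device-local heap (54.72 GiB here) rather than against the 82.1 GiB total Ollama currently detects, which matches the sum of both heaps.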

Guidance

Review the Vulkan specification for VkPhysicalDeviceMemoryProperties: each memory type belongs to exactly one heap, and each allocation is made from a single memory type, so one allocation cannot span two heaps.
Verify that Ollama's capacity estimate and allocation logic track usage per heap, rather than as one pooled total, on devices that report multiple heaps.
Consider adding a check that detects when the planned GPU allocation exceeds the largest device-local heap (memoryHeaps[1], 54.72 GiB in this case) rather than the pooled total, and adjust the allocation strategy accordingly, for example by placing some buffers in the other heap or by offloading fewer layers.
Investigate whether Vulkan extensions can help track per-heap capacity; VK_EXT_memory_budget, for example, reports a budget and current usage for each heap (vulkaninfo prints these as budget and usage). Heaps themselves cannot be merged, so allocations still have to be distributed across them.

Example

Without more context on Ollama's implementation, no exact patch can be given, but allocations should be made with awareness of which heap each memory type belongs to and which heaps carry the MEMORY_HEAP_DEVICE_LOCAL_BIT flag.
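A minimal sketch of heap-aware placement, assuming each GPU buffer must fit entirely inside one heap's remaining budget (a single Vulkan allocation comes from one memory type, and each memory type belongs to one heap). It uses first-fit decreasing: the largest buffers are placed first, device-local heaps are tried before the others, and anything that fits nowhere is reported as spilled, for example to stay on the CPU. The Heap class, the place() function, and the buffer names are hypothetical illustrations, not Ollama or ggml code, and whether the non-device-local heap's memory types are acceptable for these buffers depends on the driver.

```python
from dataclasses import dataclass

GIB = 1024 ** 3
MIB = 1024 ** 2


@dataclass
class Heap:
    index: int
    budget: int          # bytes still available in this heap
    device_local: bool


def place(buffers, heaps):
    """Assign each (name, size_bytes) buffer to a single heap (hypothetical helper).

    A buffer is never split across heaps. Device-local heaps are tried first,
    then the others, largest budget first. Buffers are placed largest-first
    (first-fit decreasing). Returns (placement, spilled), where `spilled`
    lists buffers that fit in no heap.
    """
    order = sorted(heaps, key=lambda h: (not h.device_local, -h.budget))
    remaining = {h.index: h.budget for h in heaps}
    placement, spilled = {}, []
    for name, size in sorted(buffers, key=lambda b: b[1], reverse=True):
        for heap in order:
            if size <= remaining[heap.index]:
                remaining[heap.index] -= size
                placement[name] = heap.index
                break
        else:  # no heap had room for this buffer
            spilled.append(name)
    return placement, spilled


# Budgets from the vulkaninfo output; sizes from the Vulkan1 log lines.
heaps = [
    Heap(index=0, budget=29367087104, device_local=False),
    Heap(index=1, budget=58734174208, device_local=True),
]
buffers = [
    ("weights", int(49.3 * GIB)),
    ("kv_cache", int(4.7 * GIB)),
    ("compute_graph", int(789.3 * MIB)),
]

placement, spilled = place(buffers, heaps)
print(placement)  # {'weights': 1, 'kv_cache': 1, 'compute_graph': 0}
print(spilled)    # []
```

With these inputs the weights and KV cache land in the device-local heap and the compute graph falls back to heap 0 instead of overfilling heap 1. Treating each item as one whole buffer is a simplification; Vulkan also caps the size of a single allocation (maxMemoryAllocationSize), which a real allocator would have to respect as well.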

Notes

The exact solution may depend on the specifics of Ollama's Vulkan integration and how it manages memory allocations across different heaps.
Understanding the implications of the MEMORY_HEAP_DEVICE_LOCAL_BIT flag on memory allocation and access patterns is essential for a proper fix.

Recommendation

Apply a fix in Ollama: adjust its memory allocation logic to handle the split-heap case correctly, so that it makes efficient use of the device-local heap without overfilling it. This approach is recommended because it addresses the observed issue directly instead of waiting on external driver or runtime updates.


