ollama - Performance decrease on Beelink GTR9 Pro AMD Ryzen™ AI Max+ 395

ollama, 2026-06-02 21:21:25

What is the issue?

After upgrading from 0.24.0 to 0.30.0 I noticed a performance decrease. I tested with a simple prompt, "test your speed", and the --verbose flag.

Ubuntu 26.04

My ollama override.conf:

Environment="OLLAMA_NUM_GPU=99"
Environment="OLLAMA_MAX_LOADED_MODELS=3"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KEEP_ALIVE=60m"
Environment="HSA_OVERRIDE_GFX_VERSION=11.5.1"
Environment="ROCR_VISIBLE_DEVICE=0"
Environment="HIP_VISIBLE_DEVICE=0"
Environment="OLLAMA_LLM_LIBRARY=rocm"
Environment="TORCH_BLAS_PREFER_HIPBLASLT=1"

v0.24.0

qwen3.6:35b

total duration:       23.206342987s
load duration:        10.715541252s
prompt eval count:    13 token(s)
prompt eval duration: 77.061094ms
prompt eval rate:     168.70 tokens/s
eval count:           542 token(s)
eval duration:        12.26131007s
eval rate:            44.20 tokens/s
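The reported rates can be cross-checked against the raw counts and durations. The following is a minimal sketch, not part of Ollama itself: `parse_duration_seconds` and `parse_verbose_stats` are hypothetical helper names, and the parser assumes durations are printed only with an `s` or `ms` suffix, as they are in this report.

```python
import re

# Stats block copied from the v0.24.0 qwen3.6:35b run in this report.
SAMPLE = """\
total duration:       23.206342987s
load duration:        10.715541252s
prompt eval count:    13 token(s)
prompt eval duration: 77.061094ms
prompt eval rate:     168.70 tokens/s
eval count:           542 token(s)
eval duration:        12.26131007s
eval rate:            44.20 tokens/s
"""


def parse_duration_seconds(value: str) -> float:
    """Convert a duration such as '77.061094ms' or '12.26s' to seconds.

    Assumption: only the 's' and 'ms' suffixes appear, as in this report.
    """
    match = re.fullmatch(r"([0-9.]+)(ms|s)", value)
    if match is None:
        raise ValueError(f"unsupported duration format: {value!r}")
    number, unit = match.groups()
    return float(number) / 1000.0 if unit == "ms" else float(number)


def parse_verbose_stats(text: str) -> dict[str, float]:
    """Turn 'key: value' lines into a dict of floats (durations in seconds)."""
    stats: dict[str, float] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a 'key: value' shape
        key, value = key.strip(), value.strip()
        if key.endswith("duration"):
            stats[key] = parse_duration_seconds(value)
        elif key.endswith("count") or key.endswith("rate"):
            # Keep only the leading number, dropping 'token(s)' or 'tokens/s'.
            stats[key] = float(value.split()[0])
    return stats


stats = parse_verbose_stats(SAMPLE)
recomputed = stats["eval count"] / stats["eval duration"]
print(f"reported {stats['eval rate']:.2f} tokens/s, "
      f"recomputed {recomputed:.2f} tokens/s")
```

The eval rate is the eval count divided by the eval duration, so load time does not affect it. A slow model load therefore does not explain a lower eval rate.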

gemma4:26b

total duration:       20.519295811s
load duration:        6.97770851s
prompt eval count:    19 token(s)
prompt eval duration: 81.412507ms
prompt eval rate:     233.38 tokens/s
eval count:           693 token(s)
eval duration:        13.174333915s
eval rate:            52.60 tokens/s

v0.30.0

qwen3.6:35b

total duration:       12.825772198s
load duration:        213.465616ms
prompt eval count:    4 token(s)
prompt eval duration: 52.509ms
prompt eval rate:     76.18 tokens/s
eval count:           372 token(s)
eval duration:        12.531793s
eval rate:            29.68 tokens/s

gemma4:26b

total duration:       41.32533364s
load duration:        16.666850597s
prompt eval count:    19 token(s)
prompt eval duration: 158.064ms
prompt eval rate:     120.20 tokens/s
eval count:           753 token(s)
eval duration:        24.498851s
eval rate:            30.74 tokens/s
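To put a number on the regression, the eval rates reported for the two versions can be compared directly. This is a small sketch using only the figures from this report; `percent_change` is a hypothetical helper name.

```python
# Eval rates (tokens/s) copied from the --verbose output in this report.
reported = {
    "qwen3.6:35b": {"v0.24.0": 44.20, "v0.30.0": 29.68},
    "gemma4:26b": {"v0.24.0": 52.60, "v0.30.0": 30.74},
}


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means slower)."""
    return (new - old) / old * 100.0


changes = {
    model: percent_change(rates["v0.24.0"], rates["v0.30.0"])
    for model, rates in reported.items()
}

for model, change in changes.items():
    print(f"{model}: {change:+.1f}% eval rate")
```

By this arithmetic, generation is roughly 33% slower for qwen3.6:35b and roughly 42% slower for gemma4:26b on v0.30.0. The prompt eval rates are harder to compare for qwen3.6:35b, because the two runs report different prompt token counts (13 versus 4).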

Maybe I am missing a configuration setting.

Relevant log output

OS

Linux

GPU

AMD

CPU

AMD

Ollama version

0.30.0
