ollama - Bug: Gemma 4 model warmup timeout

Error Message

May 17 12:36:37 df-mini02 ollama[1667]: .time=2026-05-17T12:36:37.195Z level=INFO source=llama_server.go:886 msg="waiting for llama-server to become available" status="llm server error"
May 17 12:36:39 df-mini02 ollama[1667]: .time=2026-05-17T12:36:39.489Z level=INFO source=llama_server.go:886 msg="waiting for llama-server to become available" status="llm server error"
May 17 12:37:37 df-mini02 ollama[1667]: time=2026-05-17T12:37:37.237Z level=INFO source=llama_server.go:886 msg="waiting for llama-server to become available" status="llm server error"

Code Example

May 17 12:36:37 df-mini02 ollama[1667]: .time=2026-05-17T12:36:37.195Z level=INFO source=llama_server.go:886 msg="waiting for llama-server to become available" status="llm server error"
May 17 12:36:39 df-mini02 ollama[1667]: ..time=2026-05-17T12:36:39.110Z level=INFO source=llama_server.go:886 msg="waiting for llama-server to become available" status="llm server not responding"
May 17 12:36:39 df-mini02 ollama[1667]: .time=2026-05-17T12:36:39.489Z level=INFO source=llama_server.go:886 msg="waiting for llama-server to become available" status="llm server error"
May 17 12:36:57 df-mini02 ollama[1667]: ...............................................
May 17 12:36:57 df-mini02 ollama[1667]: common_init_result: added <eos> logit bias = -inf
May 17 12:36:57 df-mini02 ollama[1667]: common_init_result: added <|tool_response> logit bias = -inf
May 17 12:36:57 df-mini02 ollama[1667]: common_init_result: added <turn|> logit bias = -inf
May 17 12:36:57 df-mini02 ollama[1667]: llama_context: constructing llama_context
May 17 12:36:57 df-mini02 ollama[1667]: llama_context: n_seq_max     = 1
May 17 12:36:57 df-mini02 ollama[1667]: llama_context: n_ctx         = 65536
May 17 12:36:57 df-mini02 ollama[1667]: llama_context: n_ctx_seq     = 65536
May 17 12:36:57 df-mini02 ollama[1667]: llama_context: n_batch       = 512
May 17 12:36:57 df-mini02 ollama[1667]: llama_context: n_ubatch      = 512
May 17 12:36:57 df-mini02 ollama[1667]: llama_context: causal_attn   = 1
May 17 12:36:57 df-mini02 ollama[1667]: llama_context: flash_attn    = enabled
May 17 12:36:57 df-mini02 ollama[1667]: llama_context: kv_unified    = false
May 17 12:36:57 df-mini02 ollama[1667]: llama_context: freq_base     = 1000000.0
May 17 12:36:57 df-mini02 ollama[1667]: llama_context: freq_scale    = 1
May 17 12:36:57 df-mini02 ollama[1667]: llama_context: n_ctx_seq (65536) < n_ctx_train (262144) -- the full capacity of the model will not be utilized
May 17 12:36:57 df-mini02 ollama[1667]: llama_context: Vulkan_Host  output buffer size =     1.00 MiB
May 17 12:36:57 df-mini02 ollama[1667]: llama_kv_cache_iswa: creating non-SWA KV cache, size = 65536 cells
May 17 12:36:57 df-mini02 ollama[1667]: llama_kv_cache:    Vulkan0 KV buffer size =   360.00 MiB
May 17 12:36:57 df-mini02 ollama[1667]: llama_kv_cache: size =  360.00 MiB ( 65536 cells,   5 layers,  1/1 seqs), K (q4_0):  180.00 MiB, V (q4_0):  180.00 MiB
May 17 12:36:57 df-mini02 ollama[1667]: llama_kv_cache: attn_rot_k = 1, n_embd_head_k_all = 512
May 17 12:36:57 df-mini02 ollama[1667]: llama_kv_cache: attn_rot_v = 1, n_embd_head_k_all = 512
May 17 12:36:58 df-mini02 ollama[1667]: llama_kv_cache_iswa: creating     SWA KV cache, size = 1536 cells
May 17 12:36:58 df-mini02 ollama[1667]: llama_kv_cache:    Vulkan0 KV buffer size =    84.38 MiB
May 17 12:36:58 df-mini02 ollama[1667]: llama_kv_cache: size =   84.38 MiB (  1536 cells,  25 layers,  1/1 seqs), K (q4_0):   42.19 MiB, V (q4_0):   42.19 MiB
May 17 12:36:58 df-mini02 ollama[1667]: llama_kv_cache: attn_rot_k = 1, n_embd_head_k_all = 256
May 17 12:36:58 df-mini02 ollama[1667]: llama_kv_cache: attn_rot_v = 1, n_embd_head_k_all = 256
May 17 12:36:58 df-mini02 ollama[1667]: sched_reserve: reserving ...
May 17 12:36:58 df-mini02 ollama[1667]: sched_reserve: resolving fused Gated Delta Net support:
May 17 12:36:58 df-mini02 ollama[1667]: sched_reserve: fused Gated Delta Net (autoregressive) enabled
May 17 12:36:58 df-mini02 ollama[1667]: sched_reserve: fused Gated Delta Net (chunked) enabled
May 17 12:36:58 df-mini02 ollama[1667]: sched_reserve:    Vulkan0 compute buffer size =   517.50 MiB
May 17 12:36:58 df-mini02 ollama[1667]: sched_reserve: Vulkan_Host compute buffer size =   143.31 MiB
May 17 12:36:58 df-mini02 ollama[1667]: sched_reserve: graph nodes  = 3007
May 17 12:36:58 df-mini02 ollama[1667]: sched_reserve: graph splits = 2
May 17 12:36:58 df-mini02 ollama[1667]: sched_reserve: reserve took 126.37 ms, sched copies = 1
May 17 12:36:58 df-mini02 ollama[1667]: common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
May 17 12:37:36 df-mini02 ollama[1667]: time=2026-05-17T12:37:36.817Z level=INFO source=llama_server.go:886 msg="waiting for llama-server to become available" status="llm server not responding"
May 17 12:37:37 df-mini02 ollama[1667]: time=2026-05-17T12:37:37.237Z level=INFO source=llama_server.go:886 msg="waiting for llama-server to become available" status="llm server error"
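
The KV cache sizes reported in the log can be sanity-checked with a little arithmetic. The sketch below assumes the standard ggml `q4_0` layout (blocks of 32 values stored in 18 bytes) and works backwards from the logged buffer sizes to the number of K values stored per cell per layer. The function name `implied_values_per_cell` is hypothetical, and the resulting widths are inferred from the log rather than stated in it.

```python
# Assumption: q4_0 packs 32 values into an 18-byte block
# (a 2-byte scale plus 16 bytes of 4-bit values).
Q4_0_BLOCK_VALUES = 32
Q4_0_BLOCK_BYTES = 18
MIB = 1024 * 1024


def implied_values_per_cell(size_mib: float, cells: int, layers: int) -> float:
    """Return how many q4_0 values per cell per layer a buffer of this size implies."""
    bytes_per_cell_layer = size_mib * MIB / (cells * layers)
    return bytes_per_cell_layer * Q4_0_BLOCK_VALUES / Q4_0_BLOCK_BYTES


# Figures taken from the log: K (q4_0) sizes, cell counts, and layer counts.
non_swa_width = round(implied_values_per_cell(180.00, 65536, 5))
swa_width = round(implied_values_per_cell(42.19, 1536, 25))

print("non-SWA K values per cell per layer:", non_swa_width)
print("SWA K values per cell per layer:", swa_width)
```

Under that assumption the logged sizes are internally consistent: both widths come out as whole multiples of the head sizes printed in the log (512 and 256), so the cache allocation itself does not look like the source of the stall.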

What is the issue?

This happens while loading Gemma 4 26B. Before 0.30.0 the model loaded in 1 min; with 0.30.0 the load times out. Environment: BC-250, Vulkan, Ubuntu 26.04, Ollama 0.30.0, Gemma 4 26B, no mmproj, no MTP.
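
To put a number on the stall, the gap between the warmup line and the next status message can be computed from the journal timestamps. This is a minimal sketch that assumes all lines fall on the same day; the helper names `clock` and `warmup_gap_seconds` are hypothetical, and the sample lines are copied from the log in this report.

```python
from datetime import datetime
from typing import List, Optional

WARMUP_MARKER = "warming up the model"
WAIT_MARKER = "waiting for llama-server to become available"

# Three consecutive lines copied from the log in this report.
LOG_LINES = [
    "May 17 12:36:58 df-mini02 ollama[1667]: sched_reserve: reserve took 126.37 ms, sched copies = 1",
    "May 17 12:36:58 df-mini02 ollama[1667]: common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)",
    'May 17 12:37:36 df-mini02 ollama[1667]: time=2026-05-17T12:37:36.817Z level=INFO source=llama_server.go:886 msg="waiting for llama-server to become available" status="llm server not responding"',
]


def clock(line: str) -> datetime:
    """Parse the HH:MM:SS field of a journal line such as 'May 17 12:36:58 host ...'."""
    return datetime.strptime(line.split()[2], "%H:%M:%S")


def warmup_gap_seconds(lines: List[str]) -> Optional[float]:
    """Seconds from the warmup line to the next 'waiting for llama-server' line."""
    start = None
    for line in lines:
        if WARMUP_MARKER in line:
            start = clock(line)
        elif start is not None and WAIT_MARKER in line:
            return (clock(line) - start).total_seconds()
    return None  # no warmup line, or nothing logged after it


print(warmup_gap_seconds(LOG_LINES))
```

For the excerpt in this report the gap is 38 seconds, during which nothing else is logged. Running the same function over a log from a version that loads successfully would show whether the warmup step itself is where the extra time goes.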

Relevant log output

OS

Linux

GPU

AMD

CPU

AMD

Ollama version

ollama version is 0.30.0-rc17
