ollama - Allow setting the type of K/V Cache in Modelfile



Please allow the K/V cache type to be set from the Modelfile as a parameter. At the moment, the Ollama server only supports setting the K/V cache type through an environment variable: OLLAMA_KV_CACHE_TYPE.

Context: I'm using Ollama 0.30.0-rc17 with the K/V cache type set to q8_0 on the Ollama server. All models of around 30B parameters run fine. But with this GGUF from HuggingFace: https://huggingface.co/bartowski/moonshotai_Kimi-Linear-48B-A3B-Instruct-GGUF, I get the following error:

llama-server process has terminated: llama_init_from_model: V cache type q8_0 with block size 32 does not divide n_embd_head_v=72
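
The error comes down to a divisibility check: a quantized cache type stores values in fixed-size blocks, so the head dimension has to be a multiple of the block size. Here is a minimal sketch of that check, using the block size 32 for q8_0 and the head dimension 72 from the error message. The function name and the table are illustrative only and are not taken from llama.cpp or Ollama.

```python
# Elements per quantization block. The value 32 for q8_0 comes from the
# error message; f16 is unquantized, so its block size is 1.
BLOCK_SIZE = {"f16": 1, "q8_0": 32}


def cache_type_fits(cache_type: str, head_dim: int) -> bool:
    """Return True if head_dim is a whole number of blocks for cache_type."""
    return head_dim % BLOCK_SIZE[cache_type] == 0


# n_embd_head_v=72 from the error: 72 = 2 * 32 + 8, so q8_0 does not fit.
print(cache_type_fits("q8_0", 72))   # False
print(cache_type_fits("f16", 72))    # True
# A head dimension of 128 (hypothetical example) is a multiple of 32.
print(cache_type_fits("q8_0", 128))  # True
```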

This means that I need to use OLLAMA_KV_CACHE_TYPE=f16 to resolve the issue with Kimi Linear. But because the variable applies to the whole server, doing so forces me to cut the context size of every other model roughly in half so that they still fit in VRAM.
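
The "roughly half" figure can be checked with a back-of-the-envelope estimate. This sketch assumes the usual ggml storage costs: 2 bytes per element for f16, and 34 bytes per block of 32 elements for q8_0 (32 int8 values plus one 16-bit scale). The model shape below is hypothetical and does not describe any specific model.

```python
# Approximate storage cost per cached element, in bytes.
BYTES_PER_ELEMENT = {
    "f16": 2.0,
    "q8_0": 34 / 32,  # 32 int8 values + one f16 scale per block
}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, cache_type):
    """Estimate K/V cache size: one K and one V tensor per layer."""
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return elements * BYTES_PER_ELEMENT[cache_type]


# Hypothetical model shape, for illustration only.
shape = dict(n_layers=48, n_kv_heads=8, head_dim=128, n_ctx=32768)

f16_size = kv_cache_bytes(cache_type="f16", **shape)
q8_size = kv_cache_bytes(cache_type="q8_0", **shape)

# The ratio does not depend on the shape: f16 needs about 1.88x the memory,
# so the same VRAM budget holds a little over half the context.
print(round(f16_size / q8_size, 2))
```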

I tried adding a KV_CACHE parameter to the Modelfile in this branch: https://github.com/lukaz17/ollama/tree/kv-cache-type-per-model, and the change is very straightforward.
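
The requested behavior is a per-model override with a fallback to the server-wide setting. The sketch below only illustrates that lookup order and is not the code from the branch above. The `kv_cache` key and the function name are hypothetical, and the `f16` default is assumed to match the server default.

```python
import os

DEFAULT_CACHE_TYPE = "f16"  # assumed server default


def resolve_kv_cache_type(model_params, env=None):
    """Pick the cache type for one model.

    model_params: parameters parsed from a Modelfile (hypothetical dict).
    env: mapping of environment variables; defaults to os.environ.
    """
    env = os.environ if env is None else env
    per_model = model_params.get("kv_cache")  # hypothetical parameter name
    if per_model:
        return per_model
    return env.get("OLLAMA_KV_CACHE_TYPE", DEFAULT_CACHE_TYPE)


server_env = {"OLLAMA_KV_CACHE_TYPE": "q8_0"}

# A model that cannot use q8_0 overrides the server-wide setting.
print(resolve_kv_cache_type({"kv_cache": "f16"}, server_env))  # f16
# Every other model keeps the server-wide q8_0 and its larger context.
print(resolve_kv_cache_type({}, server_env))                   # q8_0
```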
