ollama - 💡(How to fix) Allow setting the K/V cache type in Modelfile

ollama 2026-05-21 06:36:07

Code Example

llama-server process has terminated: llama_init_from_model: V cache type q8_0 with block size 32 does not divide n_embd_head_v=72

Please allow the K/V cache type to be set as a parameter in the Modelfile. At the moment, Ollama Server only supports setting the K/V cache type through the environment variable KV_CACHE_TYPE.

Context: I'm running Ollama Server 0.30.0-rc17 with the K/V cache type set to q8_0. All models at around 30B parameters run fine, but this GGUF from Hugging Face, https://huggingface.co/bartowski/moonshotai_Kimi-Linear-48B-A3B-Instruct-GGUF, fails with:

llama-server process has terminated: llama_init_from_model: V cache type q8_0 with block size 32 does not divide n_embd_head_v=72

This means that I need to use KV_CACHE_TYPE=f16 to get Kimi Linear to load. Because the setting is global, doing so forces me to cut the context size of my other models by about half so that they still fit in VRAM.
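To make the two constraints concrete, here is a minimal Python sketch of the divisibility check behind the error message and of the rough memory cost of each cache type. The block sizes and per-block byte counts are the ones GGML uses for these types as far as I know (q8_0 packs 32 int8 values plus one f16 scale into 34 bytes). The helper names and the model dimensions in the demo are made up for illustration, and the size formula assumes ordinary attention layers, so it is only an estimate for a hybrid model like Kimi Linear.

```python
# Elements per quantization block for each cache type (f16 is unblocked).
BLOCK_SIZE = {"f16": 1, "q8_0": 32, "q4_0": 32}
# Bytes per block: q8_0 = 32 int8 values + one f16 scale,
# q4_0 = 16 bytes of packed 4-bit values + one f16 scale.
BLOCK_BYTES = {"f16": 2, "q8_0": 34, "q4_0": 18}


def cache_type_fits(cache_type: str, head_dim: int) -> bool:
    """Return True if the head dimension is a whole number of blocks."""
    return head_dim % BLOCK_SIZE[cache_type] == 0


def bytes_per_element(cache_type: str) -> float:
    """Average storage cost of one cached element."""
    return BLOCK_BYTES[cache_type] / BLOCK_SIZE[cache_type]


def kv_cache_bytes(cache_type: str, n_layers: int, n_kv_heads: int,
                   head_dim: int, n_ctx: int) -> float:
    """Estimate K/V cache size; the factor 2 covers K and V."""
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return elements * bytes_per_element(cache_type)


if __name__ == "__main__":
    # n_embd_head_v=72 from the error: 72 % 32 == 8, so q8_0 is rejected.
    print(cache_type_fits("q8_0", 72))   # False
    print(cache_type_fits("f16", 72))    # True

    # Hypothetical dimensions for some other model, for comparison only.
    dims = dict(n_layers=48, n_kv_heads=8, head_dim=128, n_ctx=32768)
    ratio = kv_cache_bytes("f16", **dims) / kv_cache_bytes("q8_0", **dims)
    print(f"f16 cache is {ratio:.2f}x the size of q8_0")
```

Under these assumptions f16 costs 2 bytes per element against 1.0625 for q8_0, which is where the "about half the context" trade-off comes from.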

I tried adding KV_CACHE to the Modelfile here: https://github.com/lukaz17/ollama/tree/kv-cache-type-per-model, and it was very straightforward.
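The behaviour being asked for is a simple lookup order: a per-model setting wins, then the server-wide environment variable, then a default. The Python sketch below only illustrates that order. It is not the code from the linked branch, and the `kv_cache_type` parameter key, the function name, and the `f16` default are assumptions; the environment variable name is the one used in this post.

```python
import os
from typing import Mapping, Optional

# Cache types mentioned in this thread plus q4_0; treated here as the valid set.
VALID_TYPES = ("f16", "q8_0", "q4_0")


def resolve_kv_cache_type(model_params: Mapping[str, str],
                          env: Optional[Mapping[str, str]] = None,
                          default: str = "f16") -> str:
    """Pick the cache type: per-model value, then environment, then default."""
    env = os.environ if env is None else env
    candidates = (
        model_params.get("kv_cache_type"),  # hypothetical Modelfile parameter
        env.get("KV_CACHE_TYPE"),           # server-wide setting from the post
    )
    for value in candidates:
        if value:
            if value not in VALID_TYPES:
                raise ValueError(f"unsupported K/V cache type: {value!r}")
            return value
    return default


if __name__ == "__main__":
    server_env = {"KV_CACHE_TYPE": "q8_0"}
    # Kimi Linear opts out of quantization; other models keep the global q8_0.
    print(resolve_kv_cache_type({"kv_cache_type": "f16"}, env=server_env))
    print(resolve_kv_cache_type({}, env=server_env))
```

With a lookup like this, only the one model whose head dimension is incompatible with q8_0 pays the f16 memory cost.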
