ollama - 💡(How to fix) Fix ollama launch claude - wrong model context window [4 comments, 2 participants]

GitHub stats
ollama/ollama#15013 · Fetched 2026-04-08 01:17:17
Comments: 4 · Participants: 2 · Timeline events: 10 · Reactions: 0
Timeline (top): commented ×4, mentioned ×2, subscribed ×2, closed ×1

Error Message

time=2026-03-22T18:17:43.929Z level=WARN source=runner.go:187 msg="truncating input prompt" limit=32768 prompt=55247 keep=4 new=32768


What is the issue?

Claude does not recognise the model's context window: ollama ps reports a 32768-token context, while /context in Claude assumes 200k tokens.

❯ ollama ps
NAME            ID              SIZE      PROCESSOR    CONTEXT    UNTIL
qwen3.5:0.8b    f3817196d142    2.8 GB    100% GPU     32768      4 minutes from now
 /context
  ⎿  Context Usage
     ⛀ ⛁ ⛁ ⛀ ⛀   qwen3.5:0.8b · 24k/200k tokens (12%)
     ⛀ ⛁ ⛁ ⛁ ⛁
     ⛶ ⛶ ⛶ ⛶ ⛶   Estimated usage by category
     ⛶ ⛶ ⛶ ⛶ ⛶   ⛁ System prompt: 3.1k tokens (1.6%)
     ⛶ ⛝ ⛝ ⛝ ⛝   ⛁ System tools: 15.5k tokens (7.8%)
                 ⛁ MCP tools: 4.5k tokens (2.3%)
                 ⛁ Custom agents: 64 tokens (0.0%)
                 ⛁ Skills: 393 tokens (0.2%)
                 ⛁ Messages: 32.5k tokens (16.3%)
                 ⛶ Free space: 111k (55.4%)
                 ⛝ Autocompact buffer: 33k tokens (16.5%)

To reproduce

ollama launch claude --model qwen3.5:0.8b

Then use the session for a while until the context window fills up.

Relevant log output

[GIN] 2026/03/22 - 18:17:30 | 200 | 24.644508083s |       127.0.0.1 | POST     "/v1/messages?beta=true"
time=2026-03-22T18:17:43.929Z level=WARN source=runner.go:187 msg="truncating input prompt" limit=32768 prompt=55247 keep=4 new=32768

OS: macOS
GPU: Apple
CPU: Apple
Ollama version: 0.18.2

extent analysis

Fix Plan

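The report points to a mismatch between the two sides rather than to a window that is simply full. ollama ps shows the model loaded with a 32768-token context, while Claude's /context display assumes a 200k-token window. Because the client believes it still has free space, it sends a 55247-token prompt, and the Ollama runner truncates it to 32768 tokens (the "truncating input prompt" warning). To fix this, make the two sides agree: either raise the context length Ollama loads the model with, if the model and available memory allow it, or keep the conversation within the 32768-token limit.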

Here are the steps to fix the issue:

  • Increase the context length Ollama loads models with by setting the OLLAMA_CONTEXT_LENGTH environment variable for the Ollama server (a larger context uses more memory):
    • Run the following command before starting the server: export OLLAMA_CONTEXT_LENGTH=65536
  • Alternatively, set the context length per model with a Modelfile:
    • Add the following line to a Modelfile based on the model, then build it with ollama create: PARAMETER num_ctx 65536
  • Changing the truncation logic in runner.go does not resolve the mismatch. The "truncating input prompt" warning is the runner enforcing the context size the model was loaded with, so the prompt still has to fit:
    • If the context length cannot be raised, compact or clear the conversation in Claude (for example with /compact) before it grows past the CONTEXT value shown by ollama ps

Illustrative sketch of the truncation the warning describes, reconstructed from its limit, prompt, keep, and new fields (not the actual runner.go source):

// Sketch only; not the actual runner.go source. Assumes numKeep <= numCtx
// and import "log/slog".
func truncateInput(tokens []int, numCtx, numKeep int) []int {
    if len(tokens) <= numCtx {
        return tokens
    }
    // Keep the first numKeep tokens and the most recent ones; drop the
    // oldest tokens in between so the result is exactly numCtx long.
    discard := len(tokens) - numCtx
    kept := append([]int{}, tokens[:numKeep]...)
    kept = append(kept, tokens[numKeep+discard:]...)
    slog.Warn("truncating input prompt", "limit", numCtx, "prompt", len(tokens), "keep", numKeep, "new", len(kept))
    return kept
}

Verification

To verify that the fix worked, you can:

  • Run ollama ps and confirm that the CONTEXT column shows the new value instead of 32768
  • Run a long session and confirm that the Ollama server log no longer prints the "truncating input prompt" warning
  • Compare the total shown by /context in Claude with the CONTEXT value; if Claude still assumes 200k tokens, prompts can be truncated again once the conversation outgrows the loaded context
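
The first check can be scripted. The following Go sketch reads the CONTEXT column from ollama ps output for a given model. The function name loadedContext is hypothetical, and the sketch assumes the fixed-width, space-aligned table layout shown in the report; it is not an official parser, and other Ollama versions may format the table differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// loadedContext is a hypothetical helper. It finds the CONTEXT column by its
// byte offset in the header row and reads the value at that offset in the
// row whose NAME matches model. ASSUMPTION: columns are space-aligned.
func loadedContext(psOutput, model string) (int, error) {
	lines := strings.Split(strings.Trim(psOutput, "\n"), "\n")
	if len(lines) < 2 {
		return 0, fmt.Errorf("no loaded models in output")
	}
	col := strings.Index(lines[0], "CONTEXT")
	if col < 0 {
		return 0, fmt.Errorf("no CONTEXT column in header")
	}
	for _, row := range lines[1:] {
		fields := strings.Fields(row)
		if len(fields) == 0 || fields[0] != model || len(row) <= col {
			continue
		}
		cell := strings.Fields(row[col:])
		if len(cell) == 0 {
			continue
		}
		return strconv.Atoi(cell[0])
	}
	return 0, fmt.Errorf("model %q not found", model)
}

func main() {
	// Table copied from the report.
	ps := "NAME            ID              SIZE      PROCESSOR    CONTEXT    UNTIL\n" +
		"qwen3.5:0.8b    f3817196d142    2.8 GB    100% GPU     32768      4 minutes from now\n"
	ctx, err := loadedContext(ps, "qwen3.5:0.8b")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("loaded context:", ctx) // 32768
}
```

Splitting rows on whitespace alone would not work here, because values such as "2.8 GB" and "100% GPU" contain spaces; that is why the sketch relies on the header offset instead.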

Extra Tips

  • Update Ollama to the latest version to pick up recent fixes and features.
  • Consider implementing a mechanism to monitor the context window size and alert when it reaches a certain threshold to prevent issues like this in the future.
