claude-code: [BUG] Auto-mode permission levels don't compact context


Preflight Checklist

  • I have searched existing issues and this hasn't been reported yet
  • This is a single bug report (please file separate reports for different bugs)
  • I am using the latest version of Claude Code

What's Wrong?

I'm running Claude Code against Ollama (started with ollama launch claude), with qwen3.6:27b-q8_0 running locally on my GPU. With the permission level set to auto-mode so the model can perform tasks autonomously, I see a huge slowdown at each authorization step, whenever auto-mode reviews an action.

I did some digging and found that, unlike the main agent, auto-mode neither compacts its own context nor inherits the main agent's compacted context. Its context keeps growing, which gradually slows down generation for the auto-mode model.

The slowdown is large. The Ollama logs below show the difference:

Main Agent (95K tokens):

[GIN] 2026/05/29 - 10:22:09 | 200 |   19.0036097s |       127.0.0.1 | POST     "/v1/messages?beta=true"
time=2026-05-29T10:22:09.439-04:00 level=DEBUG source=sched.go:404 msg="context for request finished" runner.name=registry.ollama.ai/library/qwen3.6:27b-q8_0 runner.inference="[{ID:GPU-cfb0ba39-843d-1317-9aad-0e9a190e6dc2 Library:CUDA}]" runner.size="38.5 GiB" runner.vram="38.5 GiB" runner.parallel=1 runner.pid=62588 runner.model=H:\ai\ollama\models\blobs\sha256-005f96c1e053bc16570f6a9e848847dcdd85b11d0e09e10e7865ce0316a17b5b runner.num_ctx=128000
time=2026-05-29T10:22:09.439-04:00 level=DEBUG source=sched.go:309 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/qwen3.6:27b-q8_0 runner.inference="[{ID:GPU-cfb0ba39-843d-1317-9aad-0e9a190e6dc2 Library:CUDA}]" runner.size="38.5 GiB" runner.vram="38.5 GiB" runner.parallel=1 runner.pid=62588 runner.model=H:\ai\ollama\models\blobs\sha256-005f96c1e053bc16570f6a9e848847dcdd85b11d0e09e10e7865ce0316a17b5b runner.num_ctx=128000 duration=2562047h47m16.854775807s
time=2026-05-29T10:22:09.439-04:00 level=DEBUG source=sched.go:327 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3.6:27b-q8_0 runner.inference="[{ID:GPU-cfb0ba39-843d-1317-9aad-0e9a190e6dc2 Library:CUDA}]" runner.size="38.5 GiB" runner.vram="38.5 GiB" runner.parallel=1 runner.pid=62588 runner.model=H:\ai\ollama\models\blobs\sha256-005f96c1e053bc16570f6a9e848847dcdd85b11d0e09e10e7865ce0316a17b5b runner.num_ctx=128000 refCount=0
time=2026-05-29T10:22:09.558-04:00 level=DEBUG source=sched.go:672 msg="evaluating already loaded" model=H:\ai\ollama\models\blobs\sha256-005f96c1e053bc16570f6a9e848847dcdd85b11d0e09e10e7865ce0316a17b5b
time=2026-05-29T10:22:09.579-04:00 level=DEBUG source=server.go:1550 msg="completion request" images=0 prompt=95825 format=""

Auto-mode agent (260K+ tokens!!!):

[GIN] 2026/05/29 - 10:23:46 | 200 |          1m5s |       127.0.0.1 | POST     "/v1/messages?beta=true"
time=2026-05-29T10:23:46.177-04:00 level=DEBUG source=sched.go:404 msg="context for request finished" runner.name=registry.ollama.ai/library/qwen3.6:27b-q8_0 runner.inference="[{ID:GPU-cfb0ba39-843d-1317-9aad-0e9a190e6dc2 Library:CUDA}]" runner.size="38.5 GiB" runner.vram="38.5 GiB" runner.parallel=1 runner.pid=62588 runner.model=H:\ai\ollama\models\blobs\sha256-005f96c1e053bc16570f6a9e848847dcdd85b11d0e09e10e7865ce0316a17b5b runner.num_ctx=128000
time=2026-05-29T10:23:46.177-04:00 level=DEBUG source=sched.go:309 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/qwen3.6:27b-q8_0 runner.inference="[{ID:GPU-cfb0ba39-843d-1317-9aad-0e9a190e6dc2 Library:CUDA}]" runner.size="38.5 GiB" runner.vram="38.5 GiB" runner.parallel=1 runner.pid=62588 runner.model=H:\ai\ollama\models\blobs\sha256-005f96c1e053bc16570f6a9e848847dcdd85b11d0e09e10e7865ce0316a17b5b runner.num_ctx=128000 duration=2562047h47m16.854775807s
time=2026-05-29T10:23:46.177-04:00 level=DEBUG source=sched.go:327 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3.6:27b-q8_0 runner.inference="[{ID:GPU-cfb0ba39-843d-1317-9aad-0e9a190e6dc2 Library:CUDA}]" runner.size="38.5 GiB" runner.vram="38.5 GiB" runner.parallel=1 runner.pid=62588 runner.model=H:\ai\ollama\models\blobs\sha256-005f96c1e053bc16570f6a9e848847dcdd85b11d0e09e10e7865ce0316a17b5b runner.num_ctx=128000 refCount=0
time=2026-05-29T10:23:46.646-04:00 level=DEBUG source=sched.go:672 msg="evaluating already loaded" model=H:\ai\ollama\models\blobs\sha256-005f96c1e053bc16570f6a9e848847dcdd85b11d0e09e10e7865ce0316a17b5b
time=2026-05-29T10:23:46.748-04:00 level=DEBUG source=server.go:1550 msg="completion request" images=0 prompt=260051 format=""

Unless there's a way to remedy this on my end, I recommend this be looked into. It's causing a huge slowdown between processes.
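For anyone who wants to compare the two requests without reading the raw logs, here is a minimal Python sketch that pulls the prompt= field out of the "completion request" debug lines and prints the ratio. The helper name prompt_sizes and the regular expression are illustrative assumptions based only on the log format shown above, not on any documented Ollama log schema. The sketch treats prompt= as an opaque size figure; the logs themselves do not establish whether it counts tokens or characters.

```python
import re

# Assumption: "completion request" debug lines look like the ones in the
# excerpts above, with a numeric prompt= field on the same line.
PROMPT_RE = re.compile(r'msg="completion request".*?\bprompt=(\d+)')


def prompt_sizes(log_text: str) -> list[int]:
    """Return every prompt= value found on "completion request" lines, in order."""
    return [int(m.group(1)) for m in PROMPT_RE.finditer(log_text)]


# The two "completion request" lines copied from the logs in this report.
sample = '''
time=2026-05-29T10:22:09.579-04:00 level=DEBUG source=server.go:1550 msg="completion request" images=0 prompt=95825 format=""
time=2026-05-29T10:23:46.748-04:00 level=DEBUG source=server.go:1550 msg="completion request" images=0 prompt=260051 format=""
'''

main_agent, auto_mode = prompt_sizes(sample)
print(f"main agent prompt={main_agent}")
print(f"auto-mode prompt={auto_mode}")
print(f"ratio={auto_mode / main_agent:.2f}x")  # prints ratio=2.71x
```

To use it on a real session, pass the saved server log text to prompt_sizes instead of the sample string.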

What Should Happen?

Auto-mode should review the request about as fast as, if not faster than, the main agent.

Error Messages/Logs

Steps to Reproduce

  • In a VS Code terminal, set the OLLAMA_DEBUG environment variable to 1, then run ollama serve to start a server whose logs you can read. Make sure the most recently started Ollama instance is the one running with that setting enabled.
  • In a separate VS Code terminal, run ollama launch claude and choose any model.
  • In that terminal, press Shift+Tab to cycle through the permission levels until you reach auto-mode.
  • Give the model a request that requires an auto-mode review. Better yet, have the model plan a complex request.
  • Let the model run autonomously until it begins compacting context, then wait for a task that requires an auto-mode review.
  • Once the review finishes, check the Ollama server logs immediately and compare the prompt sizes of the two requests.
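The first and last steps can be combined into a small Python sketch that starts the server with debug logging and prints only the lines that report a prompt size. This is one possible way to do it, not part of the original report: the function names debug_env, is_completion_request, and watch_server are hypothetical, and the sketch assumes that ollama is on PATH and that no other Ollama instance is already bound to the same port.

```python
import os
import subprocess


def debug_env(base=None):
    """Copy the environment and enable Ollama debug logging (OLLAMA_DEBUG=1)."""
    env = dict(os.environ if base is None else base)
    env["OLLAMA_DEBUG"] = "1"
    return env


def is_completion_request(line):
    """True for the debug lines that carry the prompt= size field."""
    return 'msg="completion request"' in line


def watch_server():
    """Run `ollama serve` with debug logging and echo only prompt-size lines.

    Assumption: `ollama` is on PATH. stderr is merged into stdout so the
    filter sees the log lines whichever stream the server writes them to.
    """
    proc = subprocess.Popen(
        ["ollama", "serve"],
        env=debug_env(),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        errors="replace",
    )
    try:
        for line in proc.stdout:
            if is_completion_request(line):
                print(line.rstrip())
    finally:
        proc.terminate()


# Usage: call watch_server() in one terminal, then run `ollama launch claude`
# in a second terminal and follow the remaining steps above.
```

Filtering on the message text keeps the output to one line per request, which makes it easier to spot the jump in prompt size when an auto-mode review runs.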

Claude Model

Other

Is this a regression?

I don't know

Last Working Version

No response

Claude Code Version

2.1.156 (Claude Code)

Platform

Other

Operating System

Windows

Terminal/Shell

VS Code integrated terminal

Additional Information

No response
