claude-code: [BUG] Claude Code hangs with local Ollama on a trivial prompt, while direct /v1/messages works

Preflight Checklist

  • I have searched existing issues and this hasn't been reported yet
  • This is a single bug report (please file separate reports for different bugs)
  • I am using the latest version of Claude Code

What's Wrong?

Claude Code hangs when using a local Ollama backend through an Anthropic-compatible configuration.

The problem happens even with a trivial prompt such as "ciao".

Important detail: the same Ollama model responds correctly when called directly through Ollama's Anthropic-compatible /v1/messages endpoint, so this does not look like a basic Ollama connectivity or basic /v1/messages compatibility problem.

This reproduces with local Ollama models on my setup and does not appear to be specific to a single model.

What Should Happen?

Claude Code should return a normal response for a trivial prompt, just like the same model does when queried directly through Ollama's Anthropic-compatible /v1/messages endpoint.

Error Messages/Logs

Environment:
- Claude Code: 2.1.114
- Ollama: 0.21
- OS: Ubuntu 25.10
- Kernel: 6.17.0-20-generic
- CPU: AMD Ryzen 7 6800H
- CPU cores: 8
- Logical CPUs / threads: 16
- RAM: 30 GiB
- GPU: no discrete GPU
- Graphics hardware present on the machine: AMD Radeon 680M integrated graphics
- Actual inference mode on this setup: CPU-only

Ollama systemd override:
[Service]
User=lupo
Group=lupo
Environment="HOME=/home/lupo"
Environment="OLLAMA_MODELS=/home/lupo/.ollama/models"
Environment="OLLAMA_CONTEXT_LENGTH=65536"
Environment="OLLAMA_NUM_PARALLEL=1"
Environment="OLLAMA_KEEP_ALIVE=1m"
CPUAccounting=yes
CPUQuota=400%
CPUQuotaPeriodSec=10ms
Nice=10
IOSchedulingClass=idle

Claude Code / Anthropic-compatible configuration:
- ANTHROPIC_AUTH_TOKEN=ollama
- ANTHROPIC_BASE_URL=http://127.0.0.1:11434

Main reproduction model:
- qwen2.5-coder:7b-opencode-32k

Direct Ollama Anthropic-compatible request that succeeds:
curl http://127.0.0.1:11434/v1/messages \
  -H 'Content-Type: application/json' \
  -H 'x-api-key: ollama' \
  -H 'anthropic-version: 2023-06-01' \
  -d '{
    "model": "qwen2.5-coder:7b-opencode-32k",
    "max_tokens": 64,
    "messages": [
      {
        "role": "user",
        "content": "ciao"
      }
    ],
    "stream": false
  }'

Response:
{"id":"msg_e1a1726675e5761fd7896619","type":"message","role":"assistant","model":"qwen2.5-coder:7b-opencode-32k","content":[{"type":"text","text":"Ciao! Come posso aiutarti oggi?"}],"stop_reason":"end_turn","usage":{"input_tokens":31,"output_tokens":12}}

Relevant Ollama log line from successful direct test:
[GIN] 2026/04/20 - 19:45:21 | 200 | 7.100566535s | 127.0.0.1 | POST "/v1/messages"

Relevant Ollama log line observed during Claude Code usage:
[GIN] 2026/04/20 - 19:35:59 | 500 | 5m57s | 127.0.0.1 | POST "/v1/messages?beta=true"

Additional successful direct-test model/runtime details:
qwen2.context_length = 32768
llama_context: n_ctx = 32768
time=2026-04-20T19:45:17.491+02:00 level=INFO source=server.go:1402 msg="llama runner started in 3.29 seconds"
[GIN] 2026/04/20 - 19:45:21 | 200 | 7.100566535s | 127.0.0.1 | POST "/v1/messages"
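
The two request log lines above differ in three fields: status (200 vs 500), duration (about 7 s vs 5m57s), and request path (without and with ?beta=true). The following is a minimal shell sketch for pulling those fields out of a longer Ollama log. The function name summarize_gin is hypothetical, and the script assumes the "|"-separated GIN layout shown in this report.

```shell
#!/bin/sh
# summarize_gin: hypothetical helper, not part of Ollama or Claude Code.
# Reads log text on stdin and prints status, duration and request for every
# GIN access-log line that targets /v1/messages.
summarize_gin() {
  awk -F'|' '/\[GIN\]/ && /\/v1\/messages/ {
    # Trim the padding around each "|"-separated field.
    for (i = 2; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
    printf "status=%s duration=%s request=%s\n", $2, $3, $5
  }'
}

# Usage (assumes Ollama runs as a systemd unit named "ollama"):
#   journalctl -u ollama --no-pager | summarize_gin
```

Listing the requests side by side makes it easier to see whether every slow or failing request carries the ?beta=true query string, or whether some plain /v1/messages requests fail as well.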

Steps to Reproduce

  1. Start Ollama locally on 127.0.0.1:11434

  2. Configure Claude Code to use Ollama through the environment variables ANTHROPIC_AUTH_TOKEN=ollama and ANTHROPIC_BASE_URL=http://127.0.0.1:11434

  3. Launch Claude Code with a local model, for example: claude --model qwen2.5-coder:7b-opencode-32k

  4. Send a trivial prompt: ciao

  5. Observe that Claude Code hangs and does not return a response in the UI.

  6. Compare this with a direct Ollama Anthropic-compatible request to /v1/messages using the same model, which succeeds and returns a normal response.
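
Steps 2 to 5 can be sketched as a non-interactive run, which makes the hang easier to time and to repeat. The environment values and the model tag are the ones given in this report. run_repro is a hypothetical helper name, and the sketch assumes that Claude Code's print mode (claude -p) goes through the same request path as the interactive UI, which the report does not confirm.

```shell
#!/bin/sh
# Step 2: point Claude Code at the local Ollama server (values from this report).
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://127.0.0.1:11434

# Step 3: model tag used in the report.
MODEL=qwen2.5-coder:7b-opencode-32k

# run_repro: hypothetical helper. Sends the trivial prompt in print mode and
# gives up after the given number of seconds (default 600).
# coreutils `timeout` exits with status 124 when the limit is reached.
run_repro() {
  timeout "${1:-600}" claude -p "ciao" --model "$MODEL"
}

# Usage:
#   run_repro 600; echo "exit status: $?"
```

An exit status of 124 would indicate that no response arrived within the limit, which can then be matched against the duration reported in the Ollama log for the same request.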

Claude Model

Other

Is this a regression?

I don't know

Last Working Version

No response

Claude Code Version

2.1.114

Platform

Anthropic API

Operating System

Ubuntu/Debian Linux

Terminal/Shell

Other

Additional Information

No response

extent analysis

TL;DR

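The only request-level difference visible in the logs is the ?beta=true query parameter: the Claude Code request to /v1/messages?beta=true ends in HTTP 500 after 5m57s, while the direct request to /v1/messages returns HTTP 200 in about 7 s. Making Claude Code's request match the direct one, specifically by removing the beta=true parameter, is the first thing to try, but the report does not confirm that this parameter is the cause.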

Guidance

  • Compare the failing Claude Code request with the successful direct Ollama request. The one difference visible in the logs is that the failing request goes to /v1/messages?beta=true.
  • Verify that the Ollama model and configuration used by Claude Code match the successful direct test. Note that the systemd override sets OLLAMA_CONTEXT_LENGTH=65536, while the direct test ran with n_ctx = 32768.
  • Reproduce the issue with a minimal configuration to isolate the cause.
  • If the comparison points at beta=true, look for a way to remove or change that parameter so that Claude Code's request matches the direct one.

Example

The report contains no code to fix: the difference under investigation lies in configuration and request parameters rather than in user code.
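
Although the issue itself needs no code change, a small script can help compare request parameters. The sketch below replays the direct request from the report while changing one variable at a time. build_payload and replay are hypothetical helper names. Appending ?beta=true to the URL only imitates the query string seen in the log: Claude Code's real request also carries headers and a body that the report does not show. Streaming is included as a further variable on the assumption that Claude Code may ask for streamed responses.

```shell
#!/bin/sh
# Assumed values, taken from the report; override via the environment if needed.
BASE_URL="${BASE_URL:-http://127.0.0.1:11434}"
MODEL="${MODEL:-qwen2.5-coder:7b-opencode-32k}"

# build_payload MODEL PROMPT STREAM: hypothetical helper that prints the JSON body.
# No JSON escaping is done, so MODEL and PROMPT must not contain quotes or backslashes.
build_payload() {
  printf '{"model":"%s","max_tokens":64,"messages":[{"role":"user","content":"%s"}],"stream":%s}\n' \
    "$1" "$2" "${3:-false}"
}

# replay QUERY STREAM: hypothetical helper. Sends the request and prints the
# HTTP status and total time; the response body is discarded.
replay() {
  curl -sS --max-time "${MAX_TIME:-600}" -o /dev/null \
    -w "query=$1 stream=$2 http=%{http_code} total=%{time_total}s\n" \
    "${BASE_URL}/v1/messages$1" \
    -H 'Content-Type: application/json' \
    -H 'x-api-key: ollama' \
    -H 'anthropic-version: 2023-06-01' \
    -d "$(build_payload "$MODEL" ciao "$2")"
}

# Usage: change one variable at a time.
#   replay ''           false   # the direct request from the report
#   replay '?beta=true' false   # same body, query string seen in the failing log line
#   replay '?beta=true' true    # additionally ask for a streamed response
```

If all three variants return 200 in a few seconds, the query string alone is unlikely to explain the hang, and the remaining differences in Claude Code's request become the next thing to inspect.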

Notes

The provided information suggests that the issue may be related to the beta=true parameter in the Claude Code request, which is not present in the successful direct Ollama test. However, without further information or debugging, it is difficult to determine the exact cause.

Recommendation

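Try a workaround that removes or changes the beta=true parameter in Claude Code's requests, since this is the main visible difference between the successful direct Ollama test and the failing Claude Code usage. The report does not identify a setting that controls this parameter, so treat this as something to test rather than a confirmed fix.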
