claude-code: [BUG] Claude Code hangs with local Ollama on a trivial prompt, while direct /v1/messages works

Code Example

Environment:
- Claude Code: 2.1.114
- Ollama: 0.21
- OS: Ubuntu 25.10
- Kernel: 6.17.0-20-generic
- CPU: AMD Ryzen 7 6800H
- CPU cores: 8
- Logical CPUs / threads: 16
- RAM: 30 GiB
- GPU: no discrete GPU
- Graphics hardware present on the machine: AMD Radeon 680M integrated graphics
- Actual inference mode on this setup: CPU-only

Ollama systemd override:
[Service]
User=lupo
Group=lupo
Environment="HOME=/home/lupo"
Environment="OLLAMA_MODELS=/home/lupo/.ollama/models"
Environment="OLLAMA_CONTEXT_LENGTH=65536"
Environment="OLLAMA_NUM_PARALLEL=1"
Environment="OLLAMA_KEEP_ALIVE=1m"
CPUAccounting=yes
CPUQuota=400%
CPUQuotaPeriodSec=10ms
Nice=10
IOSchedulingClass=idle
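
The override above sets `CPUQuota=400%`, which systemd treats as the CPU time of four CPUs. On this 16-thread machine that is a quarter of the available CPU time, which matters for CPU-only inference. The following is a minimal Python sketch for reading those values out of the override text. The helper names `parse_override` and `quota_in_cpus` are illustrative and not part of systemd or Ollama, and the override is abbreviated to the lines relevant here.

```python
import re

# Abbreviated copy of the override from this report.
OVERRIDE = '''\
[Service]
Environment="OLLAMA_CONTEXT_LENGTH=65536"
Environment="OLLAMA_NUM_PARALLEL=1"
Environment="OLLAMA_KEEP_ALIVE=1m"
CPUQuota=400%
Nice=10
'''


def parse_override(text):
    """Split a systemd drop-in into Environment= variables and other directives."""
    env, directives = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and section headers such as [Service].
        if not line or line.startswith(("#", ";", "[")):
            continue
        key, _, value = line.partition("=")
        if key == "Environment":
            name, _, val = value.strip('"').partition("=")
            env[name] = val
        else:
            directives[key] = value
    return env, directives


def quota_in_cpus(cpu_quota):
    """Convert a CPUQuota percentage such as '400%' into whole-CPU equivalents."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)%", cpu_quota.strip())
    if match is None:
        raise ValueError(f"unexpected CPUQuota value: {cpu_quota!r}")
    return float(match.group(1)) / 100.0


env, directives = parse_override(OVERRIDE)
cpus = quota_in_cpus(directives["CPUQuota"])
logical_cpus = 16  # taken from the environment listing in this report
print(f"context length: {env['OLLAMA_CONTEXT_LENGTH']}")
print(f"CPU budget: {cpus:g} of {logical_cpus} logical CPUs")
```

Running the same reproduction with the quota temporarily lifted would show whether the hang is a throughput problem rather than a protocol problem.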

Claude Code / Anthropic-compatible configuration:
- ANTHROPIC_AUTH_TOKEN=ollama
- ANTHROPIC_BASE_URL=http://127.0.0.1:11434

Main reproduction model:
- qwen2.5-coder:7b-opencode-32k

Direct Ollama Anthropic-compatible request that succeeds:
curl http://127.0.0.1:11434/v1/messages \
  -H 'Content-Type: application/json' \
  -H 'x-api-key: ollama' \
  -H 'anthropic-version: 2023-06-01' \
  -d '{
    "model": "qwen2.5-coder:7b-opencode-32k",
    "max_tokens": 64,
    "messages": [
      {
        "role": "user",
        "content": "ciao"
      }
    ],
    "stream": false
  }'

Response:
{"id":"msg_e1a1726675e5761fd7896619","type":"message","role":"assistant","model":"qwen2.5-coder:7b-opencode-32k","content":[{"type":"text","text":"Ciao! Come posso aiutarti oggi?"}],"stop_reason":"end_turn","usage":{"input_tokens":31,"output_tokens":12}}
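
For comparison with whatever Claude Code sends, it helps to note how small this successful exchange is. The sketch below, using only the Python standard library, pulls the text and token usage out of the response shown above. The function name `summarize_message` is illustrative.

```python
import json

# The response body from the direct test, copied from this report.
RESPONSE = (
    '{"id":"msg_e1a1726675e5761fd7896619","type":"message","role":"assistant",'
    '"model":"qwen2.5-coder:7b-opencode-32k",'
    '"content":[{"type":"text","text":"Ciao! Come posso aiutarti oggi?"}],'
    '"stop_reason":"end_turn","usage":{"input_tokens":31,"output_tokens":12}}'
)


def summarize_message(raw):
    """Extract the text blocks, stop reason, and token usage from a Messages response."""
    message = json.loads(raw)
    text = "".join(
        block.get("text", "")
        for block in message.get("content", [])
        if block.get("type") == "text"
    )
    usage = message.get("usage", {})
    return {
        "text": text,
        "stop_reason": message.get("stop_reason"),
        "input_tokens": usage.get("input_tokens"),
        "output_tokens": usage.get("output_tokens"),
    }


summary = summarize_message(RESPONSE)
print(summary)
```

The direct test consumed 31 input tokens. A request that carries a system prompt or tool definitions would be larger, so the two runs are not necessarily comparable in cost.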

Relevant Ollama log line from successful direct test:
[GIN] 2026/04/20 - 19:45:21 | 200 | 7.100566535s | 127.0.0.1 | POST "/v1/messages"

Relevant Ollama log line observed during Claude Code usage:
[GIN] 2026/04/20 - 19:35:59 | 500 | 5m57s | 127.0.0.1 | POST "/v1/messages?beta=true"

Additional successful direct-test model/runtime details:
qwen2.context_length = 32768
llama_context: n_ctx = 32768
time=2026-04-20T19:45:17.491+02:00 level=INFO source=server.go:1402 msg="llama runner started in 3.29 seconds"
[GIN] 2026/04/20 - 19:45:21 | 200 | 7.100566535s | 127.0.0.1 | POST "/v1/messages"
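
The two `[GIN]` lines above differ in status, latency, and request path. A small Python sketch for extracting those fields makes the comparison explicit and can be pointed at a longer log. The regular expression is written against the two lines quoted in this report, and the names `parse_gin` and `latency_seconds` are illustrative.

```python
import re

GIN_LINE = re.compile(
    r'\[GIN\]\s+(?P<date>\S+)\s+-\s+(?P<time>\S+)\s*\|\s*(?P<status>\d{3})\s*\|\s*'
    r'(?P<latency>[^|]+?)\s*\|\s*(?P<client>[^|]+?)\s*\|\s*'
    r'(?P<method>[A-Z]+)\s+"(?P<path>[^"]*)"'
)

UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0,
                "ms": 1e-3, "µs": 1e-6, "us": 1e-6, "ns": 1e-9}


def latency_seconds(text):
    """Convert a Go-style duration such as '5m57s' or '7.100566535s' to seconds."""
    parts = re.findall(r"(\d+(?:\.\d+)?)(h|ms|µs|us|ns|m|s)", text)
    if not parts:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(number) * UNIT_SECONDS[unit] for number, unit in parts)


def parse_gin(line):
    """Return status, latency in seconds, method, and path for one [GIN] line."""
    match = GIN_LINE.search(line)
    if match is None:
        return None
    return {
        "status": int(match["status"]),
        "seconds": latency_seconds(match["latency"]),
        "method": match["method"],
        "path": match["path"],
    }


direct = parse_gin('[GIN] 2026/04/20 - 19:45:21 | 200 | 7.100566535s | 127.0.0.1 | POST "/v1/messages"')
claude = parse_gin('[GIN] 2026/04/20 - 19:35:59 | 500 | 5m57s | 127.0.0.1 | POST "/v1/messages?beta=true"')
for label, entry in (("direct", direct), ("claude-code", claude)):
    print(f"{label:12} {entry['status']} {entry['seconds']:8.1f}s {entry['path']}")
```

The Claude Code request ran for 357 seconds before Ollama answered with HTTP 500, so the failure is reported by the server after a long wait rather than rejected immediately.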

Preflight Checklist

- I have searched existing issues and this hasn't been reported yet
- This is a single bug report (please file separate reports for different bugs)
- I am using the latest version of Claude Code

What's Wrong?

Claude Code hangs when using a local Ollama backend through Anthropic-compatible configuration.

The problem happens even with a trivial prompt like ciao.

Important detail: the same Ollama model responds correctly when called directly through Ollama's Anthropic-compatible /v1/messages endpoint, so this does not look like a basic Ollama connectivity or basic /v1/messages compatibility problem.

This reproduces with local Ollama models on my setup and does not appear to be specific to a single model.

What Should Happen?

Claude Code should return a normal response for a trivial prompt, just like the same model does when queried directly through Ollama's Anthropic-compatible /v1/messages endpoint.

Error Messages/Logs

Relevant Ollama log line from successful direct test:
[GIN] 2026/04/20 - 19:45:21 | 200 | 7.100566535s | 127.0.0.1 | POST "/v1/messages"

Relevant Ollama log line observed during Claude Code usage:
[GIN] 2026/04/20 - 19:35:59 | 500 | 5m57s | 127.0.0.1 | POST "/v1/messages?beta=true"

Steps to Reproduce

1. Start Ollama locally on 127.0.0.1:11434.
2. Configure Claude Code to use Ollama through:
   - ANTHROPIC_AUTH_TOKEN=ollama
   - ANTHROPIC_BASE_URL=http://127.0.0.1:11434
3. Launch Claude Code with a local model, for example: claude --model qwen2.5-coder:7b-opencode-32k
4. Send a trivial prompt: ciao
5. Observe that Claude Code hangs and does not return a response in the UI.
6. Compare this with a direct Ollama Anthropic-compatible request to /v1/messages using the same model, which succeeds and returns a normal response.
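
The direct curl test and the Claude Code request differ in at least the `?beta=true` query string seen in the Ollama log. One way to narrow the cause is to start from the request that works and change one property at a time. The Python sketch below builds such variants with the standard library only. It rests on assumptions: that Claude Code also streams and sends a system prompt and tool definitions is not shown in this report, and the placeholder system text and the `echo` tool are hypothetical stand-ins, not what Claude Code actually sends.

```python
import json
import time
import urllib.error
import urllib.request

BASE_URL = "http://127.0.0.1:11434"
MODEL = "qwen2.5-coder:7b-opencode-32k"

# Hypothetical tool definition, present only to exercise the "tools" field.
PLACEHOLDER_TOOL = {
    "name": "echo",
    "description": "Return the given text unchanged.",
    "input_schema": {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    },
}

# Each variant changes one property relative to the curl request that succeeds.
VARIANTS = [
    ("baseline", {}),
    ("beta query", {"beta": True}),
    ("streaming", {"stream": True}),
    ("system prompt", {"system": "You are a coding assistant."}),
    ("tools", {"tools": [PLACEHOLDER_TOOL]}),
]


def build_request(beta=False, stream=False, system=None, tools=None, max_tokens=64):
    """Build a POST to /v1/messages that mirrors the curl test plus optional changes."""
    url = f"{BASE_URL}/v1/messages" + ("?beta=true" if beta else "")
    body = {
        "model": MODEL,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": "ciao"}],
        "stream": stream,
    }
    if system is not None:
        body["system"] = system
    if tools:
        body["tools"] = tools
    headers = {
        "Content-Type": "application/json",
        "x-api-key": "ollama",
        "anthropic-version": "2023-06-01",
    }
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )


def send(request, timeout=60.0):
    """Send one request and return (status, elapsed seconds)."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()  # drain the body, whether JSON or an event stream
            status = response.status
    except urllib.error.HTTPError as error:
        status = error.code
    except OSError as error:  # covers URLError, timeouts, refused connections
        status = f"failed: {error}"
    return status, time.monotonic() - started


def run_matrix(timeout=60.0):
    """Run every variant in turn and print its status and latency."""
    for label, options in VARIANTS:
        status, elapsed = send(build_request(**options), timeout=timeout)
        print(f"{label:14} {status} {elapsed:.1f}s")
```

Calling `run_matrix()` with Ollama running prints one line per variant. The first variant that stalls or returns 500 points at the property worth reporting upstream.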

Claude Model

Other

Is this a regression?

I don't know

Last Working Version

No response

Claude Code Version

2.1.114

Platform

Anthropic API

Operating System

Ubuntu/Debian Linux

Terminal/Shell

Other

Additional Information

No response

extent analysis

TL;DR

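The only difference visible in the Ollama logs is the request path: Claude Code calls /v1/messages?beta=true, while the direct request that succeeds calls /v1/messages. That makes the beta=true query parameter a suspect, but the report does not confirm it as the cause. The Claude Code request ran for 5m57s before Ollama returned HTTP 500, and its body may also differ from the minimal curl test.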

Guidance

- Compare the request Claude Code sends with the direct request that succeeds, starting with the beta=true query parameter and then the request body.
- Verify that the Ollama model and configuration match the successful direct test. Note that the override sets OLLAMA_CONTEXT_LENGTH=65536 while the model loaded in the direct test reports n_ctx = 32768.
- Reproduce the issue with a minimal configuration to isolate the cause.
- If Claude Code offers a way to drop or change the beta=true parameter, test with it removed so the request path matches the direct Ollama request.

Example

The report does not call for a code change: the issue concerns configuration and request parameters rather than application code.

Notes

The logs show that the failing Claude Code request includes beta=true and the successful direct test does not, so the parameter may be involved. The report contains no Claude Code debug output and no capture of the failing request body, so the exact cause cannot be determined from this information alone.

Recommendation

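As a first test, try to send the Claude Code request without the beta=true parameter, since this is the main difference visible between the successful direct Ollama test and the failed Claude Code usage. The report does not identify a Claude Code setting that does this, so treat it as an experiment rather than a confirmed workaround.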
