ollama: /v1/chat/completions returns 400 "unexpected EOF" intermittently for cloud proxy models

GitHub stats
ollama/ollama#16082 · Fetched 2026-05-11 03:13:20
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0


Bug Description

Ollama's /v1/chat/completions endpoint returns 400 "unexpected EOF" intermittently when using cloud proxy models (e.g. glm-5.1:cloud, deepseek-v4-pro:cloud). The error is not deterministic — the same request can succeed or fail depending on timing.

A second variant, 400 "cannot parse request body", also occurs on the same endpoint with the same models.

Environment

  • Ollama version: 0.23.2
  • OS: Linux (WSL2, Ubuntu 22.04)
  • GPU: RTX 4080 SUPER
  • Affected models: All :cloud models (remote proxy) — glm-5.1:cloud, deepseek-v4-pro:cloud, kimi-k2.6:cloud
  • Local models work fine
  • Endpoints affected: /v1/chat/completions only

Reproduction

# This fails intermittently with 400 "unexpected EOF":
curl http://127.0.0.1:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"deepseek-v4-pro:cloud","messages":[{"role":"user","content":"hi"}],"max_tokens":1}'

# The native /api/chat endpoint always works:
curl http://127.0.0.1:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"deepseek-v4-pro:cloud","messages":[{"role":"user","content":"hi"}],"stream":false,"options":{"num_predict":1}}'
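
Because the failure is intermittent, a single curl call says little about how often it occurs. The reproduction above can be looped to estimate a failure rate. What follows is a minimal sketch using only the Python standard library; the function names (`build_body`, `send_once`, `summarize`, `probe`) and the default of 200 attempts are illustrative assumptions, not part of the report.

```python
import json
import urllib.error
import urllib.request
from collections import Counter

# Endpoint and model taken from the report; adjust for your setup.
URL = "http://127.0.0.1:11434/v1/chat/completions"
MODEL = "deepseek-v4-pro:cloud"


def build_body(model: str = MODEL) -> bytes:
    """Serialize the same minimal request used in the curl reproduction."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": "hi"}],
        "max_tokens": 1,
    }
    return json.dumps(payload).encode("utf-8")


def send_once(url: str, body: bytes, timeout: float = 60.0) -> tuple[int, str]:
    """POST the body once and return (HTTP status, response text)."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Non-2xx responses (such as the 400s in this report) land here.
        return err.code, err.read().decode("utf-8", "replace")


def summarize(results: list[tuple[int, str]]) -> dict:
    """Count statuses and compute the share of non-200 responses."""
    counts = Counter(status for status, _ in results)
    total = len(results)
    failures = total - counts.get(200, 0)
    return {
        "total": total,
        "by_status": dict(counts),
        "failure_rate": failures / total if total else 0.0,
    }


def probe(n: int = 200, url: str = URL) -> dict:
    """Send the same request n times and summarize the outcomes."""
    body = build_body()
    return summarize([send_once(url, body) for _ in range(n)])
```

Calling `probe()` against a running Ollama instance returns the total, a per-status count, and the failure rate. Running it against a local model as a control would help confirm that only :cloud models are affected.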

Observations

  1. Error is intermittent: The same request succeeds most of the time and fails roughly 5-10% of the time
  2. Only affects cloud models: Local GGUF models never exhibit this behavior
  3. Only affects /v1/chat/completions: The native /api/chat endpoint always works correctly
  4. Error responses:
    • {"error": {"message": "unexpected EOF", "type": "invalid_request_error"}}
    • {"error": {"message": "cannot parse request body"}}
  5. Response time on error: 0-11 ms, which suggests the request is rejected before it reaches the model rather than timing out
  6. Larger requests fail more often: Requests with 20 tool definitions (~74KB body) fail more frequently than simple requests
  7. Two error variants: Both unexpected EOF and cannot parse request body occur on /v1/chat/completions
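
The observations above suggest two things worth capturing when triaging a failure: which of the two error variants came back, and whether the response arrived within the 0-11 ms window. The sketch below does both, and also builds a padded request to probe the size dependence from observation 6. It assumes the OpenAI-style `tools` schema; the tool names, the filler descriptions, and the 11 ms cutoff used as a threshold are hypothetical choices for illustration.

```python
import json

# Error messages quoted in the report.
KNOWN_VARIANTS = ("unexpected EOF", "cannot parse request body")
# The report saw failures return in 0-11 ms; treat that as "fast" here.
FAST_REJECT_MS = 11.0


def classify_error(status: int, body_text: str, elapsed_ms: float) -> dict:
    """Label a response with its error variant and whether it was a fast reject."""
    message = None
    try:
        parsed = json.loads(body_text)
        error = parsed.get("error") if isinstance(parsed, dict) else None
        if isinstance(error, dict):
            message = error.get("message")
    except json.JSONDecodeError:
        pass  # Body was not JSON; leave message as None.
    variant = message if message in KNOWN_VARIANTS else None
    return {
        "is_known_variant": status == 400 and variant is not None,
        "variant": variant,
        "fast_reject": elapsed_ms <= FAST_REJECT_MS,
    }


def build_tool_request(model: str, n_tools: int, description_len: int = 3500) -> bytes:
    """Build a chat request padded with n_tools hypothetical tool definitions."""
    tools = [
        {
            "type": "function",
            "function": {
                "name": f"example_tool_{i}",  # hypothetical name
                "description": "x" * description_len,  # filler to grow the body
                "parameters": {"type": "object", "properties": {}},
            },
        }
        for i in range(n_tools)
    ]
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": "hi"}],
        "max_tokens": 1,
        "tools": tools,
    }
    return json.dumps(payload).encode("utf-8")
```

With the defaults, `build_tool_request(model, 20)` produces a body a little over 70 KB, in the same size class as the ~74 KB requests mentioned in the report. Comparing failure rates between this body and the minimal one would test whether size is really a factor.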

Frequency

Over 3 days of monitoring, we observed 36 occurrences across glm-5.1:cloud and deepseek-v4-pro:cloud models. Errors occur throughout the day, roughly every 30-60 minutes during active use.

Example log entries:

2026-05-07 20:44:58 ERROR: Non-retryable client error: Error code: 400 - {'error': {'message': 'unexpected EOF', ...}}
2026-05-10 20:16:58 ERROR: API call failed after 3 retries. HTTP 400: unexpected EOF | provider=ollama-cloud model=deepseek-v4-pro:cloud msgs=2 tokens=~4,038
2026-05-10 20:54:05 ERROR: HTTP 400: Error code: 400 - {'error': {'message': 'cannot parse request body'}}
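
Counts like the 36 occurrences above can be tallied from client logs rather than by hand. The following is a sketch only: it assumes log lines shaped like the three examples (a `YYYY-MM-DD HH:MM:SS ERROR:` prefix and an optional `model=` field), and the names `parse_line` and `tally` are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line shape, based on the three example entries in the report.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ERROR: (.*)$")
MODEL_RE = re.compile(r"\bmodel=(\S+)")
VARIANTS = ("unexpected EOF", "cannot parse request body")

SAMPLE_LINES = [
    "2026-05-07 20:44:58 ERROR: Non-retryable client error: Error code: 400 - "
    "{'error': {'message': 'unexpected EOF', ...}}",
    "2026-05-10 20:16:58 ERROR: API call failed after 3 retries. HTTP 400: "
    "unexpected EOF | provider=ollama-cloud model=deepseek-v4-pro:cloud "
    "msgs=2 tokens=~4,038",
    "2026-05-10 20:54:05 ERROR: HTTP 400: Error code: 400 - "
    "{'error': {'message': 'cannot parse request body'}}",
]


def parse_line(line: str):
    """Return a dict for a matching error line, or None if it is unrelated."""
    match = LINE_RE.match(line.strip())
    if not match:
        return None
    text = match.group(2)
    variant = next((v for v in VARIANTS if v in text), None)
    if variant is None:
        return None
    model = MODEL_RE.search(text)
    return {
        "time": datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"),
        "variant": variant,
        "model": model.group(1) if model else None,  # not every line names a model
    }


def tally(lines) -> Counter:
    """Count occurrences of each error variant across the given log lines."""
    events = (parse_line(line) for line in lines)
    return Counter(event["variant"] for event in events if event)
```

Sorting the parsed events by `time` and taking differences between neighbours would give the interval between failures, which could be compared against the 30-60 minute estimate.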

Impact

This makes cloud models unreliable for production use with any OpenAI-compatible client (Hermes AI gateway, Codex CLI, etc.), as they all use /v1/chat/completions.
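
The first log entry shows the client treating the 400 as non-retryable, which is the usual policy for client errors. Since these particular 400s appear to be transient, one client-side mitigation is to retry only when the body carries one of the two known messages. This is a hedged sketch, not a fix: the second log entry shows a call failing even after 3 retries. The names `is_transient_400` and `call_with_retry` and the backoff values are assumptions.

```python
import time
from typing import Callable, Tuple

# Messages from the report that appear to mark a transient failure.
TRANSIENT_MESSAGES = ("unexpected EOF", "cannot parse request body")


def is_transient_400(status: int, body_text: str) -> bool:
    """True only for a 400 whose body contains one of the known messages."""
    return status == 400 and any(msg in body_text for msg in TRANSIENT_MESSAGES)


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    attempts: int = 3,
    base_delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() up to `attempts` times, backing off exponentially.

    `send` is any zero-argument callable returning (status, body_text).
    Other responses, including genuine 400s, are returned immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = send()
        last_attempt = attempt == attempts - 1
        if last_attempt or not is_transient_400(status, body):
            return status, body
        sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

Matching on the message text keeps genuine client errors, such as a malformed request, from being retried.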

Workaround

Using the native /api/chat endpoint directly avoids the issue entirely, but this requires rewriting client code and loses OpenAI API compatibility.
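
To limit how much client code has to change, the request can be translated at one point before it is sent to /api/chat. The sketch below covers only the fields that appear in the two curl commands from the report (`model`, `messages`, `stream`, and `max_tokens` mapped to `options.num_predict`); the function name `to_native_chat` is hypothetical, and any other OpenAI parameters would need their own mapping.

```python
def to_native_chat(openai_request: dict) -> dict:
    """Translate a minimal OpenAI-style chat request into an /api/chat body.

    Only the fields used in the report's reproduction are handled.
    """
    native = {
        "model": openai_request["model"],
        "messages": openai_request["messages"],
        # Set stream explicitly, as the report's /api/chat example does.
        "stream": bool(openai_request.get("stream", False)),
    }
    max_tokens = openai_request.get("max_tokens")
    if max_tokens is not None:
        # The report's native example expresses the limit as options.num_predict.
        native["options"] = {"num_predict": max_tokens}
    return native
```

Applied to the failing request from the reproduction, this yields the same body as the working /api/chat call. The response shape also differs between the two endpoints, so the client's response handling needs a matching adapter.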

Related Issues

  • #15419 — Frequent 503 errors with cloud models
  • #16066 — Cloud models: tool_call.function.arguments truncated or 502 Bad Gateway
