ollama: /v1/chat/completions returns 400 "unexpected EOF" intermittently for cloud proxy models

Fyryxm · 2026-05-10T13:21:54Z



ollama/ollama#16082•Fetched 2026-05-11 03:13:20



Bug Description

Ollama's /v1/chat/completions endpoint returns 400 "unexpected EOF" intermittently when using cloud proxy models (e.g. glm-5.1:cloud, deepseek-v4-pro:cloud). The error is not deterministic — the same request can succeed or fail depending on timing.

A second variant, 400 "cannot parse request body", also occurs on the same endpoint with the same models.

Environment

Ollama version: 0.23.2
OS: Linux (WSL2, Ubuntu 22.04)
GPU: RTX 4080 SUPER
Affected models: All :cloud models (remote proxy) — glm-5.1:cloud, deepseek-v4-pro:cloud, kimi-k2.6:cloud
Local models work fine
Endpoints affected: /v1/chat/completions only

Reproduction

# This fails intermittently with 400 "unexpected EOF":
curl http://127.0.0.1:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"deepseek-v4-pro:cloud","messages":[{"role":"user","content":"hi"}],"max_tokens":1}'

# The native /api/chat endpoint always works:
curl http://127.0.0.1:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"deepseek-v4-pro:cloud","messages":[{"role":"user","content":"hi"}],"stream":false,"options":{"num_predict":1}}'
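Because the failure is intermittent, a single curl call rarely shows it. The sketch below is one way to repeat the reproduction request and tally the results. The helper names `probe_once` and `summarize_statuses` are hypothetical, and counting every non-200 status as a failure is an assumption made for illustration.

```bash
# Hypothetical helper: send one request and print only its HTTP status code.
# $1 = endpoint URL, $2 = JSON request body.
probe_once() {
  curl -s -o /dev/null -w '%{http_code}\n' "$1" \
    -H "Content-Type: application/json" \
    -d "$2"
}

# Hypothetical helper: read one status code per line from stdin and print
# how many requests were seen and how many were not 200.
summarize_statuses() {
  local total=0 failed=0 code
  while read -r code; do
    if [ -z "$code" ]; then continue; fi
    total=$((total + 1))
    if [ "$code" != "200" ]; then failed=$((failed + 1)); fi
  done
  echo "total=$total failed=$failed"
}

# Example usage (not run here; needs a local Ollama server):
#   body='{"model":"deepseek-v4-pro:cloud","messages":[{"role":"user","content":"hi"}],"max_tokens":1}'
#   for i in $(seq 1 200); do
#     probe_once http://127.0.0.1:11434/v1/chat/completions "$body"
#   done | summarize_statuses
```

Running the same loop against `/api/chat` with the native body gives a side-by-side comparison of the two endpoints.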

Observations

1. Error is intermittent: the same request succeeds most of the time and fails ~5-10% of the time.
2. Only affects cloud models: local GGUF models never exhibit this behavior.
3. Only affects /v1/chat/completions: the native /api/chat endpoint always works correctly.
4. Error responses:
   - {"error": {"message": "unexpected EOF", "type": "invalid_request_error"}}
   - {"error": {"message": "cannot parse request body"}}
5. Response time on error: 0-11ms (indicates the request is rejected before reaching the model, not a timeout).
6. Larger requests fail more often: requests with 20 tool definitions (~74KB body) fail more frequently than simple requests.
7. Two error variants: both unexpected EOF and cannot parse request body occur on /v1/chat/completions.
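To check the size dependence noted above, it helps to generate request bodies of a controlled size. The following is a minimal sketch, not the reporter's actual payload: `make_body` is a hypothetical helper, the tool definitions are dummies, and the padding of 3600 characters per description is only chosen so that 20 tools land near the ~74KB mentioned in the observations.

```bash
# Hypothetical helper: build an OpenAI-style chat request with N dummy tools.
# $1 = model name, $2 = number of tools, $3 = padding chars per description.
make_body() {
  local model=$1 n=$2 pad=${3:-0} tools="" filler i
  # Build a filler string of $pad 'x' characters to inflate each description.
  printf -v filler '%*s' "$pad" ''
  filler=${filler// /x}
  for ((i = 1; i <= n; i++)); do
    if [ -n "$tools" ]; then tools+=","; fi
    tools+="{\"type\":\"function\",\"function\":{\"name\":\"tool_$i\",\"description\":\"dummy $filler\",\"parameters\":{\"type\":\"object\",\"properties\":{}}}}"
  done
  printf '{"model":"%s","messages":[{"role":"user","content":"hi"}],"max_tokens":1,"tools":[%s]}' \
    "$model" "$tools"
}

# Example usage (not run here; needs a local Ollama server):
#   make_body deepseek-v4-pro:cloud 20 3600 > big.json
#   curl -s -o /dev/null -w '%{http_code} %{time_total}s\n' \
#     http://127.0.0.1:11434/v1/chat/completions \
#     -H "Content-Type: application/json" -d @big.json
```

Printing `%{time_total}` next to the status code also makes it easy to confirm whether failures return within a few milliseconds, as reported.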

Frequency

Over 3 days of monitoring, we observed 36 occurrences across glm-5.1:cloud and deepseek-v4-pro:cloud models. Errors occur throughout the day, roughly every 30-60 minutes during active use.

Example log entries:

2026-05-07 20:44:58 ERROR: Non-retryable client error: Error code: 400 - {'error': {'message': 'unexpected EOF', ...}}
2026-05-10 20:16:58 ERROR: API call failed after 3 retries. HTTP 400: unexpected EOF | provider=ollama-cloud model=deepseek-v4-pro:cloud msgs=2 tokens=~4,038
2026-05-10 20:54:05 ERROR: HTTP 400: Error code: 400 - {'error': {'message': 'cannot parse request body'}}
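The first log line shows the client treating the 400 as non-retryable, even though the same request can succeed on a later attempt. One possible client-side mitigation, which is not proposed in the issue itself, is to retry only when the 400 carries one of the two messages reported here. The function name `should_retry` is hypothetical, and the second log line suggests retries do not always help, so this is a sketch rather than a fix.

```bash
# Hypothetical helper: exit 0 (retry) only for the two 400 variants
# reported in this issue; exit 1 for everything else.
# $1 = HTTP status code, $2 = response body.
should_retry() {
  local status=$1 body=$2
  if [ "$status" != "400" ]; then return 1; fi
  case $body in
    *'unexpected EOF'* | *'cannot parse request body'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage (not run here):
#   if should_retry "$status" "$response_body"; then
#     sleep 1   # back off briefly, then resend the identical request
#   fi
```

Matching on the message text keeps genuine client errors, such as a malformed request, from being retried.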

Impact

This makes cloud models unreliable for production use with any OpenAI-compatible client (Hermes AI gateway, Codex CLI, etc.), as they all use /v1/chat/completions.

Workaround

Using the native /api/chat endpoint directly avoids the issue entirely, but this requires rewriting client code and loses OpenAI API compatibility.
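The rewrite this workaround requires can be sketched from the two curl commands in the reproduction: `max_tokens` moves to `options.num_predict`, and `stream` is set to `false`. The helpers `json_escape` and `native_chat_body` below are hypothetical and cover only a single user message. The escaping handles backslashes, double quotes, newlines, and tabs, which is an assumption about what typical prompts contain rather than a complete JSON encoder.

```bash
# Hypothetical helper: escape a string for use inside a JSON string literal.
# Handles backslash, double quote, newline, and tab only.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\t'/\\t}
  printf '%s' "$s"
}

# Hypothetical helper: build a native /api/chat body from the fields an
# OpenAI-style call would carry.
# $1 = model, $2 = user prompt, $3 = max tokens (becomes options.num_predict).
native_chat_body() {
  local model=$1 prompt=$2 max_tokens=$3
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"stream":false,"options":{"num_predict":%d}}' \
    "$model" "$(json_escape "$prompt")" "$max_tokens"
}

# Example usage (not run here; needs a local Ollama server):
#   curl http://127.0.0.1:11434/api/chat \
#     -H "Content-Type: application/json" \
#     -d "$(native_chat_body deepseek-v4-pro:cloud 'hi' 1)"
```

The response shape differs as well: `/api/chat` returns the reply under `message.content` instead of `choices[0].message.content`, so response parsing in the client needs the same kind of adjustment.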

Related Issues

#15419 — Frequent 503 errors with cloud models
#16066 — Cloud models: tool_call.function.arguments truncated or 502 Bad Gateway
