ollama - 💡(How to fix) Fix Intermittent connection reset on :cloud model proxy after upgrading 0.20.2 → 0.22.0 (no GIN log entry) [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
ollama/ollama#15910Fetched 2026-05-01 05:33:19
View on GitHub
Comments
0
Participants
1
Timeline
1
Reactions
0
Participants
Timeline (top)
labeled ×1

After upgrading the macOS Ollama app from 0.20.2 → 0.22.0 (no intermediate 0.21.x), the local daemon started intermittently RST-ing client connections on POST /v1/chat/completions requests targeting :cloud-suffixed models. The distinctive signature is that the failed requests never appear in the daemon's GIN access log — the daemon resets the TCP connection before its request handler runs, so the failure is invisible from the daemon's perspective.

A daemon restart (Quit Ollama → re-open) appears to clear the issue temporarily.

Error Message

Logs

The smoking gun: client-side failures vs server-side log gap

The Ollama daemon's GIN access log shows no entries during the 72-second window in which the client made 3 attempts that all received TCP RST.

Ollama server.log — chat/completions requests on 2026-04-30 (14:30–14:45 window):

HH:MM:SS STATUS TIMING 14:30:18 | 200 | 20.25s 14:30:38 | 200 | 20.21s 14:30:52 | 200 | 8.39s 14:30:56 | 200 | 2.36s 14:31:00 | 200 | 3.34s 14:31:04 | 200 | 3.91s 14:31:07 | 200 | 2.78s 14:31:10 | 200 | 2.64s 14:31:21 | 200 | 7.48s 14:31:30 | 200 | 8.16s 14:31:34 | 200 | 2.93s 14:31:39 | 200 | 4.87s 14:31:48 | 200 | 7.82s 14:32:09 | 200 | 20.34s 14:32:35 | 200 | 2.80s 14:32:45 | 200 | 8.27s 14:33:02 | 200 | 16.97s 14:33:12 | 200 | 8.81s 14:33:28 | 200 | 14.67s 14:33:45 | 200 | 14.69s 14:35:05 | 200 | 11.16s 14:35:12 | 200 | 7.78s 14:35:22 | 200 | 5.15s 14:35:36 | 200 | 866.0ms 14:35:38 | 502 | 1.61s 14:35:44 | 200 | 3.03s 14:35:55 | 200 | 11.11s ← LAST LOGGED REQUEST before the gap ───────────────────────────────────────────────────────────────────── ⏳ 72-second gap — NO entries in server.log During this gap, the client made 3 retry attempts. All 3 received TCP RST in <10s elapsed each. None appear here. ───────────────────────────────────────────────────────────────────── 14:37:07 | 200 | rest of day continues normally 14:37:55 | 200 | 14:38:44 | 200 | 14:38:45 | 200 | 14:38:46 | 200 | 14:38:47 | 200 | 14:39:03 | 200 | 14:39:04 | 200 | 14:39:06 | 200 | 14:39:12 | 200 | 14:39:37 | 200 |

Notable: the 502 at 14:35:38 was logged. So when the daemon's upstream returns an error, the GIN log records it normally. Only this RST class of failure is invisible.

Client-side trace (Hermes Agent — Python httpx + openai SDK)

The user's terminal output for one failed session:

⚠️ API call failed (attempt 1/3): ReadError 🔌 Provider: ollama-cloud Model: kimi-k2.6:cloud 🌐 Endpoint: http://127.0.0.1:11434/v1 📝 Error: [Errno 54] Connection reset by peer ⏱️ Elapsed: 0.08s Context: 2 msgs, ~15,405 tokens ⏳ Retrying in 2.9s (attempt 1/3)...

⚠️ API call failed (attempt 2/3): ReadError 🔌 Provider: ollama-cloud Model: kimi-k2.6:cloud 🌐 Endpoint: http://127.0.0.1:11434/v1 📝 Error: [Errno 54] Connection reset by peer ⏱️ Elapsed: 3.16s Context: 2 msgs, ~15,405 tokens ⏳ Retrying in 4.7s (attempt 2/3)...

⚠️ API call failed (attempt 3/3): ReadError 🔌 Provider: ollama-cloud Model: kimi-k2.6:cloud 🌐 Endpoint: http://127.0.0.1:11434/v1 📝 Error: [Errno 54] Connection reset by peer ⏱️ Elapsed: 8.07s Context: 2 msgs, ~15,405 tokens

⚠️ Max retries (3) exhausted — trying fallback... 🔄 Primary model failed — switching to fallback: openrouter/free via openrouter

Each attempt's elapsed time is the round-trip up to RST: 0.08 s on the first attempt is essentially "TCP connect succeeded, request body sent, RST received." Subsequent attempts include retry backoff time so are longer cumulatively.

Same machine. Same model. localhost connection — no network involvement.

Pattern of recurrence in client logs (one day)

The client's structured log shows the failure recurring throughout 2026-04-30, not just once:

2026-04-30 03:00:10 Fallback activated: kimi-k2.6:cloud → openrouter/free (cron task) 2026-04-30 14:19:24 Fallback activated: kimi-k2.6:cloud → openrouter/free 2026-04-30 14:19:40 Title generation failed: <html><title>Error 400 (Bad Request)!!1</title></html> 2026-04-30 14:20:51 Fallback activated: kimi-k2.6:cloud → openrouter/free 2026-04-30 14:21:10 Title generation failed: <html><title>Error 400 (Bad Request)!!1</title></html> 2026-04-30 14:33:33 Fallback activated: kimi-k2.6:cloud → openrouter/free 2026-04-30 14:36:02 Fallback activated: kimi-k2.6:cloud → openrouter/free ← the example above 2026-04-30 14:37:07 Title generation failed: <html><title>Error 400 (Bad Request)!!1</title></html>

(The 400 errors on title-generation are a separate code path that hits https://ollama.com/v1/ directly rather than the local daemon — included for completeness; they're not the primary issue but demonstrate that the client is exercising both paths.)

Daemon was healthy throughout

~/.ollama/logs/app.log shows normal startup, no panics, no recovered errors, no "broken pipe" or EOF events at any point during the failure window. Greps across all rolled app logs (app.log, app-1.logapp-5.log) for panic|fatal|recovered|connection.*close|broken pipe|EOF return zero matches. From the daemon's perspective, nothing went wrong — it never knew about the 3 failed requests.

Current run startup trace (clean):

time=2026-04-30T14:23:13.009-07:00 level=INFO source=app_darwin.go:217 msg="starting Ollama" version=0.22.0 OS=Darwin/25.3.0 time=2026-04-30T14:23:13.013-07:00 level=INFO source=app.go:239 msg="initialized tools registry" time=2026-04-30T14:23:13.013-07:00 level=INFO source=app.go:254 msg="starting ollama server" time=2026-04-30T14:23:13.014-07:00 level=INFO source=app.go:285 msg="starting ui server" port=60600 time=2026-04-30T14:23:13.148-07:00 level=INFO source=updater_darwin.go:405 msg="using v13+ SMAppService for login registration"

Server config (relevant non-default-only excerpts):

OLLAMA_HOST=http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE=5m0s OLLAMA_NUM_PARALLEL=1 OLLAMA_MAX_QUEUE=512 OLLAMA_MAX_LOADED_MODELS=0 OLLAMA_NO_CLOUD=false OLLAMA_REMOTES=[ollama.com]

Root Cause

After upgrading the macOS Ollama app from 0.20.2 → 0.22.0 (no intermediate 0.21.x), the local daemon started intermittently RST-ing client connections on POST /v1/chat/completions requests targeting :cloud-suffixed models. The distinctive signature is that the failed requests never appear in the daemon's GIN access log — the daemon resets the TCP connection before its request handler runs, so the failure is invisible from the daemon's perspective.

A daemon restart (Quit Ollama → re-open) appears to clear the issue temporarily.

Fix Action

Workaround

Restart the daemon (Quit Ollama → re-open). Issue clears for an unpredictable period before returning.

Code Example

[GIN] 2026/04/30 - 14:29:26 | 200 | 24.435999083s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2026/04/30 - 14:35:44 | 200 |  3.025813125s |       127.0.0.1 | POST     "/v1/chat/completions"

---

[GIN] 2026/04/30 - 14:35:38 | 502 |  1.606942792s |       127.0.0.1 | POST     "/v1/chat/completions"

---

curl -v -X POST http://127.0.0.1:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2.6:cloud",
    "messages": [
      {"role": "system", "content": "<~60KB of text>"},
      {"role": "user", "content": "hi"}
    ],
    "tools": [<~36 OpenAI-format tool definitions>],
    "tool_choice": "auto",
    "max_tokens": 5,
    "stream": true,
    "stream_options": {"include_usage": true}
  }'

---

< HTTP/1.1 400 Bad Request
* Connection reset by peer
* Recv failure: Connection reset by peer

---

## Logs

### The smoking gun: client-side failures vs server-side log gap

The Ollama daemon's GIN access log shows **no entries** during the 72-second
window in which the client made 3 attempts that all received TCP RST.

**Ollama server.log`chat/completions` requests on 2026-04-30 (14:3014:45 window):**


HH:MM:SS  STATUS  TIMING
14:30:18  | 200 |    20.25s
14:30:38  | 200 |    20.21s
14:30:52  | 200 |     8.39s
14:30:56  | 200 |     2.36s
14:31:00  | 200 |     3.34s
14:31:04  | 200 |     3.91s
14:31:07  | 200 |     2.78s
14:31:10  | 200 |     2.64s
14:31:21  | 200 |     7.48s
14:31:30  | 200 |     8.16s
14:31:34  | 200 |     2.93s
14:31:39  | 200 |     4.87s
14:31:48  | 200 |     7.82s
14:32:09  | 200 |    20.34s
14:32:35  | 200 |     2.80s
14:32:45  | 200 |     8.27s
14:33:02  | 200 |    16.97s
14:33:12  | 200 |     8.81s
14:33:28  | 200 |    14.67s
14:33:45  | 200 |    14.69s
14:35:05  | 200 |    11.16s
14:35:12  | 200 |     7.78s
14:35:22  | 200 |     5.15s
14:35:36  | 200 |   866.0ms
14:35:38  | 502 |     1.61s
14:35:44  | 200 |     3.03s
14:35:55  | 200 |    11.11s     ← LAST LOGGED REQUEST before the gap
─────────────────────────────────────────────────────────────────────
72-second gap — NO entries in server.log
   During this gap, the client made 3 retry attempts.
   All 3 received TCP RST in <10s elapsed each.
   None appear here.
─────────────────────────────────────────────────────────────────────
14:37:07  | 200 |  rest of day continues normally
14:37:55  | 200 |
14:38:44  | 200 |
14:38:45  | 200 |
14:38:46  | 200 |
14:38:47  | 200 |
14:39:03  | 200 |
14:39:04  | 200 |
14:39:06  | 200 |
14:39:12  | 200 |
14:39:37  | 200 |


Notable: the `502` at 14:35:38 *was* logged. So when the daemon's upstream
returns an error, the GIN log records it normally. Only this RST class of
failure is invisible.

### Client-side trace (Hermes AgentPython httpx + openai SDK)

The user's terminal output for one failed session:


⚠️  API call failed (attempt 1/3): ReadError
   🔌 Provider: ollama-cloud  Model: kimi-k2.6:cloud
   🌐 Endpoint: http://127.0.0.1:11434/v1
   📝 Error: [Errno 54] Connection reset by peer
   ⏱️  Elapsed: 0.08s  Context: 2 msgs, ~15,405 tokens
Retrying in 2.9s (attempt 1/3)...

⚠️  API call failed (attempt 2/3): ReadError
   🔌 Provider: ollama-cloud  Model: kimi-k2.6:cloud
   🌐 Endpoint: http://127.0.0.1:11434/v1
   📝 Error: [Errno 54] Connection reset by peer
   ⏱️  Elapsed: 3.16s  Context: 2 msgs, ~15,405 tokens
Retrying in 4.7s (attempt 2/3)...

⚠️  API call failed (attempt 3/3): ReadError
   🔌 Provider: ollama-cloud  Model: kimi-k2.6:cloud
   🌐 Endpoint: http://127.0.0.1:11434/v1
   📝 Error: [Errno 54] Connection reset by peer
   ⏱️  Elapsed: 8.07s  Context: 2 msgs, ~15,405 tokens

⚠️ Max retries (3) exhausted — trying fallback...
🔄 Primary model failed — switching to fallback: openrouter/free via openrouter


Each attempt's elapsed time is the round-trip up to RST: 0.08 s on the first
attempt is essentially "TCP connect succeeded, request body sent, RST received."
Subsequent attempts include retry backoff time so are longer cumulatively.

Same machine. Same model. `localhost` connection — no network involvement.

### Pattern of recurrence in client logs (one day)

The client's structured log shows the failure recurring throughout 2026-04-30,
not just once:


2026-04-30 03:00:10  Fallback activated: kimi-k2.6:cloud → openrouter/free  (cron task)
2026-04-30 14:19:24  Fallback activated: kimi-k2.6:cloud → openrouter/free
2026-04-30 14:19:40  Title generation failed: <html><title>Error 400 (Bad Request)!!1</title></html>
2026-04-30 14:20:51  Fallback activated: kimi-k2.6:cloud → openrouter/free
2026-04-30 14:21:10  Title generation failed: <html><title>Error 400 (Bad Request)!!1</title></html>
2026-04-30 14:33:33  Fallback activated: kimi-k2.6:cloud → openrouter/free
2026-04-30 14:36:02  Fallback activated: kimi-k2.6:cloud → openrouter/free   ← the example above
2026-04-30 14:37:07  Title generation failed: <html><title>Error 400 (Bad Request)!!1</title></html>


(The 400 errors on title-generation are a separate code path that hits
`https://ollama.com/v1/` directly rather than the local daemon — included
for completeness; they're not the primary issue but demonstrate that the
client is exercising both paths.)

### Daemon was healthy throughout

`~/.ollama/logs/app.log` shows normal startup, no panics, no recovered
errors, no "broken pipe" or EOF events at any point during the failure
window. Greps across all rolled app logs (`app.log`, `app-1.log``app-5.log`)
for `panic|fatal|recovered|connection.*close|broken pipe|EOF` return zero
matches. From the daemon's perspective, nothing went wrong — it never knew
about the 3 failed requests.

Current run startup trace (clean):

time=2026-04-30T14:23:13.009-07:00 level=INFO source=app_darwin.go:217  msg="starting Ollama" version=0.22.0 OS=Darwin/25.3.0
time=2026-04-30T14:23:13.013-07:00 level=INFO source=app.go:239         msg="initialized tools registry"
time=2026-04-30T14:23:13.013-07:00 level=INFO source=app.go:254         msg="starting ollama server"
time=2026-04-30T14:23:13.014-07:00 level=INFO source=app.go:285         msg="starting ui server" port=60600
time=2026-04-30T14:23:13.148-07:00 level=INFO source=updater_darwin.go:405 msg="using v13+ SMAppService for login registration"


Server config (relevant non-default-only excerpts):

OLLAMA_HOST=http://127.0.0.1:11434
OLLAMA_KEEP_ALIVE=5m0s
OLLAMA_NUM_PARALLEL=1
OLLAMA_MAX_QUEUE=512
OLLAMA_MAX_LOADED_MODELS=0
OLLAMA_NO_CLOUD=false
OLLAMA_REMOTES=[ollama.com]
RAW_BUFFERClick to expand / collapse

What is the issue?

Intermittent connection reset on :cloud model proxy after upgrading to 0.22.0

Summary

After upgrading the macOS Ollama app from 0.20.2 → 0.22.0 (no intermediate 0.21.x), the local daemon started intermittently RST-ing client connections on POST /v1/chat/completions requests targeting :cloud-suffixed models. The distinctive signature is that the failed requests never appear in the daemon's GIN access log — the daemon resets the TCP connection before its request handler runs, so the failure is invisible from the daemon's perspective.

A daemon restart (Quit Ollama → re-open) appears to clear the issue temporarily.

Environment

  • macOS: Darwin 25.3.0 (arm64)
  • Ollama version (current): 0.22.0 — installed from the macOS app
  • Previous version (worked): 0.20.2 (in service from at least 2026-04-05 through 2026-04-25; see ~/.ollama/logs/server-{4,5}.log)
  • Upgrade timestamp: 2026-04-29 13:59:57 PDT (per app-3.log first INFO line and Info.plist mtime)
  • Failure first observed: 2026-04-30 (the day after the upgrade)
  • Cloud model that triggers it: kimi-k2.6:cloud (the only :cloud model in active use here, but I'd expect it to apply to all :cloud models)
  • Client: Hermes Agent v0.12.0 (Python 3.11, openai SDK 2.31.0, httpx 0.28.1) — but the bug also reproduces with curl from the same machine, so the client is not implicated.

Symptoms

When a client opens a TCP connection to 127.0.0.1:11434 and sends a POST /v1/chat/completions request:

  • Sometimes (most of the time): daemon responds normally with HTTP 200 in seconds.
  • Sometimes (intermittent, hard to reproduce on demand): TCP connection is RST within ~80 ms of the request being sent. Client reports [Errno 54] Connection reset by peer (macOS ECONNRESET).

The distinctive forensic signature:

  • Failed requests are NOT logged in ~/.ollama/logs/server.log. The GIN access log contains lines for every request the daemon's HTTP handler sees (200s, 502s, 4xx), but the RST-ing requests have no corresponding line. This means the connection is being closed/reset before gin runs the request handler — i.e., somewhere in HTTP server pre-handler code, TLS negotiation, body framing, or a panic recovery path.
  • There's nothing in app.log either. No "panic", "recovered", "EOF", "broken pipe", "too large", or similar low-level error lines anywhere in the recent app or server logs that correspond to these failures.

Sample successful requests in server.log:

[GIN] 2026/04/30 - 14:29:26 | 200 | 24.435999083s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2026/04/30 - 14:35:44 | 200 |  3.025813125s |       127.0.0.1 | POST     "/v1/chat/completions"

A 502 was logged once when the daemon attempted to forward to ollama.com and got a backend error — but those at least appear in the log:

[GIN] 2026/04/30 - 14:35:38 | 502 |  1.606942792s |       127.0.0.1 | POST     "/v1/chat/completions"

The RST'd requests are completely absent.

Reproduction (best effort)

The bug is probabilistic — I have not been able to reproduce it deterministically. Synthetic load with identical request shapes (84 KB body, 36 tools array, streaming with stream_options.include_usage, ~15K-token context) succeeded 30/30 in concurrent batch tests after a daemon restart. The bug surfaces several minutes later under real client load and persists until daemon restart.

Best minimal repro shape (will RST occasionally; may need a fresh daemon that's been running for some time):

curl -v -X POST http://127.0.0.1:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2.6:cloud",
    "messages": [
      {"role": "system", "content": "<~60KB of text>"},
      {"role": "user", "content": "hi"}
    ],
    "tools": [<~36 OpenAI-format tool definitions>],
    "tool_choice": "auto",
    "max_tokens": 5,
    "stream": true,
    "stream_options": {"include_usage": true}
  }'

When it fails:

< HTTP/1.1 400 Bad Request
* Connection reset by peer
* Recv failure: Connection reset by peer

(Exit code from curl: 7 / 56 depending on the exact moment of RST.)

What does NOT trigger it

I bisected request features against an isolated, freshly-restarted daemon and the following ALL succeeded individually:

  • Large body (60 KB+) without tools
  • 36 tools without large body
  • Streaming without tools
  • Non-streaming with 36 tools
  • Non-:cloud models (local-only models work fine throughout)

So it's not deterministic on any single request feature. It only manifests on :cloud model proxy, sporadically, after the daemon has been running for some time and/or has been under client load.

Workaround

Restart the daemon (Quit Ollama → re-open). Issue clears for an unpredictable period before returning.

Hypothesis

This looks like a regression in Ollama 0.22.0's :cloud proxy code — possibly:

  • A goroutine/connection pool leak that exhausts something resource-bound, causing the server to start dropping connections at the TCP layer
  • A panic in the cloud-proxy handler that isn't being caught by the standard recovery middleware (which is why GIN doesn't log it — gin.Recovery() would normally log a panic but here we see nothing)
  • A race in the cloud auth/upstream-connection setup that causes the connection to be closed during request body read

The fact that 0.20.2 was reliable for the same workload and 0.22.0 fails for it suggests a specific change between those versions in the cloud-proxy code path is to blame. The version skip (no 0.21.x in the upgrade) means there's a larger surface area of change to bisect.

Files I can attach if requested

  • ~/.ollama/logs/server.log (current — contains successful requests + the missing-failed-request gap)
  • ~/.ollama/logs/app.log (current)
  • ~/.ollama/logs/server-{1..5}.log and app-{1..5}.log (rolled, including the 0.20.2 → 0.22.0 transition)

I have not collected a tcpdump capture during the RST since the bug is too intermittent to pre-stage one, but I can leave one running if it would help.

Relevant log output

## Logs

### The smoking gun: client-side failures vs server-side log gap

The Ollama daemon's GIN access log shows **no entries** during the 72-second
window in which the client made 3 attempts that all received TCP RST.

**Ollama server.log — `chat/completions` requests on 2026-04-30 (14:30–14:45 window):**


HH:MM:SS  STATUS  TIMING
14:30:18  | 200 |    20.25s
14:30:38  | 200 |    20.21s
14:30:52  | 200 |     8.39s
14:30:56  | 200 |     2.36s
14:31:00  | 200 |     3.34s
14:31:04  | 200 |     3.91s
14:31:07  | 200 |     2.78s
14:31:10  | 200 |     2.64s
14:31:21  | 200 |     7.48s
14:31:30  | 200 |     8.16s
14:31:34  | 200 |     2.93s
14:31:39  | 200 |     4.87s
14:31:48  | 200 |     7.82s
14:32:09  | 200 |    20.34s
14:32:35  | 200 |     2.80s
14:32:45  | 200 |     8.27s
14:33:02  | 200 |    16.97s
14:33:12  | 200 |     8.81s
14:33:28  | 200 |    14.67s
14:33:45  | 200 |    14.69s
14:35:05  | 200 |    11.16s
14:35:12  | 200 |     7.78s
14:35:22  | 200 |     5.15s
14:35:36  | 200 |   866.0ms
14:35:38  | 502 |     1.61s
14:35:44  | 200 |     3.03s
14:35:55  | 200 |    11.11s     ← LAST LOGGED REQUEST before the gap
─────────────────────────────────────────────────────────────────────
⏳ 72-second gap — NO entries in server.log
   During this gap, the client made 3 retry attempts.
   All 3 received TCP RST in <10s elapsed each.
   None appear here.
─────────────────────────────────────────────────────────────────────
14:37:07  | 200 |  rest of day continues normally
14:37:55  | 200 |
14:38:44  | 200 |
14:38:45  | 200 |
14:38:46  | 200 |
14:38:47  | 200 |
14:39:03  | 200 |
14:39:04  | 200 |
14:39:06  | 200 |
14:39:12  | 200 |
14:39:37  | 200 |


Notable: the `502` at 14:35:38 *was* logged. So when the daemon's upstream
returns an error, the GIN log records it normally. Only this RST class of
failure is invisible.

### Client-side trace (Hermes Agent — Python httpx + openai SDK)

The user's terminal output for one failed session:


⚠️  API call failed (attempt 1/3): ReadError
   🔌 Provider: ollama-cloud  Model: kimi-k2.6:cloud
   🌐 Endpoint: http://127.0.0.1:11434/v1
   📝 Error: [Errno 54] Connection reset by peer
   ⏱️  Elapsed: 0.08s  Context: 2 msgs, ~15,405 tokens
⏳ Retrying in 2.9s (attempt 1/3)...

⚠️  API call failed (attempt 2/3): ReadError
   🔌 Provider: ollama-cloud  Model: kimi-k2.6:cloud
   🌐 Endpoint: http://127.0.0.1:11434/v1
   📝 Error: [Errno 54] Connection reset by peer
   ⏱️  Elapsed: 3.16s  Context: 2 msgs, ~15,405 tokens
⏳ Retrying in 4.7s (attempt 2/3)...

⚠️  API call failed (attempt 3/3): ReadError
   🔌 Provider: ollama-cloud  Model: kimi-k2.6:cloud
   🌐 Endpoint: http://127.0.0.1:11434/v1
   📝 Error: [Errno 54] Connection reset by peer
   ⏱️  Elapsed: 8.07s  Context: 2 msgs, ~15,405 tokens

⚠️ Max retries (3) exhausted — trying fallback...
🔄 Primary model failed — switching to fallback: openrouter/free via openrouter


Each attempt's elapsed time is the round-trip up to RST: 0.08 s on the first
attempt is essentially "TCP connect succeeded, request body sent, RST received."
Subsequent attempts include retry backoff time so are longer cumulatively.

Same machine. Same model. `localhost` connection — no network involvement.

### Pattern of recurrence in client logs (one day)

The client's structured log shows the failure recurring throughout 2026-04-30,
not just once:


2026-04-30 03:00:10  Fallback activated: kimi-k2.6:cloud → openrouter/free  (cron task)
2026-04-30 14:19:24  Fallback activated: kimi-k2.6:cloud → openrouter/free
2026-04-30 14:19:40  Title generation failed: <html><title>Error 400 (Bad Request)!!1</title></html>
2026-04-30 14:20:51  Fallback activated: kimi-k2.6:cloud → openrouter/free
2026-04-30 14:21:10  Title generation failed: <html><title>Error 400 (Bad Request)!!1</title></html>
2026-04-30 14:33:33  Fallback activated: kimi-k2.6:cloud → openrouter/free
2026-04-30 14:36:02  Fallback activated: kimi-k2.6:cloud → openrouter/free   ← the example above
2026-04-30 14:37:07  Title generation failed: <html><title>Error 400 (Bad Request)!!1</title></html>


(The 400 errors on title-generation are a separate code path that hits
`https://ollama.com/v1/` directly rather than the local daemon — included
for completeness; they're not the primary issue but demonstrate that the
client is exercising both paths.)

### Daemon was healthy throughout

`~/.ollama/logs/app.log` shows normal startup, no panics, no recovered
errors, no "broken pipe" or EOF events at any point during the failure
window. Greps across all rolled app logs (`app.log`, `app-1.log``app-5.log`)
for `panic|fatal|recovered|connection.*close|broken pipe|EOF` return zero
matches. From the daemon's perspective, nothing went wrong — it never knew
about the 3 failed requests.

Current run startup trace (clean):

time=2026-04-30T14:23:13.009-07:00 level=INFO source=app_darwin.go:217  msg="starting Ollama" version=0.22.0 OS=Darwin/25.3.0
time=2026-04-30T14:23:13.013-07:00 level=INFO source=app.go:239         msg="initialized tools registry"
time=2026-04-30T14:23:13.013-07:00 level=INFO source=app.go:254         msg="starting ollama server"
time=2026-04-30T14:23:13.014-07:00 level=INFO source=app.go:285         msg="starting ui server" port=60600
time=2026-04-30T14:23:13.148-07:00 level=INFO source=updater_darwin.go:405 msg="using v13+ SMAppService for login registration"


Server config (relevant non-default-only excerpts):

OLLAMA_HOST=http://127.0.0.1:11434
OLLAMA_KEEP_ALIVE=5m0s
OLLAMA_NUM_PARALLEL=1
OLLAMA_MAX_QUEUE=512
OLLAMA_MAX_LOADED_MODELS=0
OLLAMA_NO_CLOUD=false
OLLAMA_REMOTES=[ollama.com]

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.22.0

extent analysis

TL;DR

The most likely fix for the intermittent connection reset issue on the :cloud model proxy after upgrading to Ollama 0.22.0 is to downgrade to version 0.20.2, as it was reliable for the same workload.

Guidance

  • Verify that the issue is indeed related to the :cloud model proxy by checking if the problem persists when using non-:cloud models.
  • Check the Ollama daemon's configuration and logs for any errors or warnings that may indicate a resource leak or panic in the cloud-proxy handler.
  • Consider collecting a tcpdump capture during the RST to further analyze the issue.
  • If downgrading to 0.20.2 is not feasible, try to identify the specific change between 0.20.2 and 0.22.0 that may be causing the issue.

Example

No code snippet is provided as the issue is related to a specific version of Ollama and its configuration.

Notes

The issue seems to be related to a regression in Ollama 0.22.0's :cloud proxy code, possibly due to a goroutine/connection pool leak or a panic in the cloud-proxy handler. The fact that 0.20.2 was reliable for the same workload suggests that downgrading may be a viable solution.

Recommendation

Downgrade to Ollama version 0.20.2, as it was reliable for the same workload, to fix the intermittent connection reset issue on the :cloud model proxy. This is recommended because the issue is likely related to a regression in the 0.22.0 version, and downgrading to a previous version that worked may resolve the problem.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

ollama - 💡(How to fix) Fix Intermittent connection reset on :cloud model proxy after upgrading 0.20.2 → 0.22.0 (no GIN log entry) [1 participants]