claude-code: Socket connection closed unexpectedly during long agentic sessions (Bun: no SO_KEEPALIVE, keepalives disabled via feature flags)



Bug: "Socket connection was closed unexpectedly" during long agentic sessions

Summary

Claude Code CLI consistently throws "API Error: The socket connection was closed unexpectedly" during long agentic tasks. The error is systemic: it occurs across all projects, prompts, and directories, always mid-session and never at startup. The session must be fully restarted to recover.


Environment

  • Claude Code version: 2.1.143
  • OS: Arch Linux (Manjaro), kernel 6.12.85-1
  • Install method: Official installer (curl | bash) + pacman/AUR (same binary, both 2.1.143)
  • Binary runtime: Bun 1.3.14 / Node v24.3.0 (confirmed via strings on the binary)

Error

API Error: The socket connection was closed unexpectedly.
For more information, pass `verbose: true` in the second argument to fetch()

The error message itself is a raw Bun/JSC runtime string (hardcoded in the JSC engine section of the binary). It surfaces with no actionable guidance and no automatic retry.


Root Cause Analysis

Finding 1: Bun does not set SO_KEEPALIVE on its TCP sockets

Confirmed live via ss -tnop state established '( dport = :443 )':

# Every other application on the system:
chrome  → 34.194.3.76:443    timer:(keepalive,16sec,0)   ← SO_KEEPALIVE ON
code    → 4.228.31.153:443   timer:(keepalive,39sec,0)   ← SO_KEEPALIVE ON

# Claude Code (before workaround):
claude  → 160.79.104.10:443  (no timer)                  ← SO_KEEPALIVE NOT SET
claude  → 2607:6bc0::10:443  (no timer)                  ← SO_KEEPALIVE NOT SET

Without SO_KEEPALIVE, idle TCP connections receive no probe packets. During agentic tasks, the HTTP/2 connection sits idle while tools execute locally. That idle window is long enough for the Anthropic server, Cloudflare, or an intermediate NAT router to silently drop the connection. When Claude Code makes the next API call on the dropped connection, Bun throws the error.
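
To make the socket-level state behind the ss output concrete, here is a minimal C sketch that reads and sets SO_KEEPALIVE on a descriptor. The helper names keepalive_enabled and enable_keepalive are hypothetical and are not taken from Bun or Claude Code; the sketch only illustrates the standard getsockopt/setsockopt calls involved.

```c
#include <stdbool.h>
#include <sys/socket.h>

/* Returns true if SO_KEEPALIVE is enabled on fd; false if it is
 * disabled or the option cannot be read. (Hypothetical helper.) */
bool keepalive_enabled(int fd) {
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &value, &len) != 0)
        return false;
    return value != 0;
}

/* Turns SO_KEEPALIVE on; returns 0 on success, -1 on error (errno set).
 * (Hypothetical helper.) */
int enable_keepalive(int fd) {
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}
```

A freshly created TCP socket has the option off, which matches the "(no timer)" rows above; once it is enabled, the kernel starts a keepalive timer for the connection.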

Finding 2: All application-level keepalives are disabled via GrowthBook feature flags

Inspecting ~/.claude.json (cachedGrowthBookFeatures):

"tengu_bridge_poll_interval_config": {
  "heartbeat_interval_ms": 0,
  "session_keepalive_interval_ms": 0,
  "session_keepalive_interval_v2_ms": 0
}

All three keepalive/heartbeat intervals are 0. There is no fallback mechanism at the application layer when the TCP layer provides none.

The environment variable CLAUDE_CODE_REMOTE_SEND_KEEPALIVES exists in the binary and appears to override this, but it is not documented and users cannot be expected to discover it.

Finding 3: Stale connection retry only fires for background agents, not CLI

Binary strings show two distinct code paths:

tengu_streaming_stale_connection_retry   ← exists for background agents
cli_nonstreaming_fallback_started        ← exists for CLI
Error streaming (non-streaming fallback disabled):

In practice, the CLI path does not recover — the error propagates directly to the user with no retry. Background agents have a retry path; CLI sessions do not.

Finding 4: NODE_OPTIONS=--dns-result-order=ipv4first is not fully honored by Bun

After setting NODE_OPTIONS=--dns-result-order=ipv4first as a workaround, active connections still include IPv6:

192.168.1.150:36660   → 160.79.104.10:443       ← IPv4 (api.anthropic.com)
[2804:...]:47658      → [2607:6bc0::10]:443      ← IPv6 (api.anthropic.com) — still present
[2804:...]:56560      → [2600:1901:0:3084::]:443 ← IPv6 (other Claude service)

Bun uses its own native DNS resolution for some code paths, bypassing Node.js's dns module settings. The flag has partial effect only.

Finding 5: Error persists after forcing SO_KEEPALIVE via LD_PRELOAD

As a workaround, SO_KEEPALIVE was forced on all sockets via an LD_PRELOAD shim:

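/* Excerpt: orig_socket is the real socket(), resolved with dlsym(RTLD_NEXT, "socket"). */
int socket(int domain, int type, int protocol) {
    int fd = orig_socket(domain, type, protocol);
    /* Mask off SOCK_NONBLOCK / SOCK_CLOEXEC flags before comparing the type. */
    if (fd >= 0 && (type & 0xf) == SOCK_STREAM) {
        int opt = 1;
        setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &opt, sizeof(opt));
    }
    return fd;
}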

After applying this, ss -tnop confirms keepalive timers are now active on all claude sockets. A stress test (two consecutive 90-second idle windows) passed successfully.

However, the error continued to occur during shorter tasks. This indicates a second, independent cause: the Bun HTTP/2 client does not gracefully handle server-initiated connection closure (HTTP/2 GOAWAY frame or TCP RST) and surfaces it as a fatal unrecoverable error instead of transparently retrying on a new connection.


Reproduction

Run a long agentic task with multiple tool-call turns and significant idle time between turns:

claude --dangerously-skip-permissions -p "Run one at a time:
1. date
2. python3 -c \"import time; time.sleep(90); print('survived 90s')\"
3. python3 -c \"import time; time.sleep(90); print('survived second 90s')\"
4. date"

Without fixes: fails during step 2 or 3 with the socket error.
With LD_PRELOAD SO_KEEPALIVE fix: passes (3m21s, both sleeps survived).
During normal interactive agentic use: still fails intermittently even with the fix.


Suggested Fixes (for Anthropic)

Fix 1 — Set SO_KEEPALIVE on API sockets (one line, highest impact):

socket.setKeepAlive(true, 30000); // or equivalent in Bun's net API

This is the single most impactful change. Chrome and VS Code both do this; Claude Code does not.

Fix 2 — Enable heartbeat_interval_ms by default for CLI sessions: The GrowthBook flag currently forces 0 for all keepalive intervals. CLI sessions should have a non-zero default (e.g. 30s) independent of the server-side feature flag value.

Fix 3 — Implement transparent retry on stale connection for CLI: The background agent path (tengu_streaming_stale_connection_retry) already handles this. The CLI path should get the same treatment instead of surfacing a fatal error.

Fix 4 — Handle HTTP/2 GOAWAY gracefully: When the server closes an HTTP/2 connection (normal lifecycle behavior), Bun should transparently open a new connection and retry the request rather than propagating the low-level socket error to the user.

Fix 5 — Improve the error message: "pass verbose: true in the second argument to fetch()" is a raw Bun internal error. Users cannot act on this. At minimum it should say the session can be resumed with --continue.
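
As an illustration of the transparent retry described in Fix 3 and Fix 4, here is a minimal C sketch of a retry wrapper. It is not Claude Code's or Bun's implementation: request_fn, send_with_retry, and flaky_request are hypothetical names, and the set of errno values treated as "stale connection" is an assumption. A real client would also have to open a new connection before each retry and retry only requests that are safe to repeat.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical request callback: returns 0 on success, -1 with errno set. */
typedef int (*request_fn)(void *ctx);

/* Assumed set of errors that usually mean the peer or a middlebox
 * closed the connection, so a fresh connection may succeed. */
static bool is_stale_connection_error(int err) {
    return err == ECONNRESET || err == EPIPE || err == ETIMEDOUT;
}

/* Calls send_request up to max_attempts times, retrying only on
 * stale-connection errors. Returns 0 on success, -1 otherwise. */
int send_with_retry(request_fn send_request, void *ctx, int max_attempts) {
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (send_request(ctx) == 0)
            return 0;
        if (!is_stale_connection_error(errno) || attempt == max_attempts)
            return -1;
        /* A real client would reconnect here before the next attempt. */
    }
    return -1;
}

/* Illustrative stub: fails with ECONNRESET until the counter hits 0. */
int flaky_request(void *ctx) {
    int *remaining_failures = ctx;
    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        errno = ECONNRESET;
        return -1;
    }
    return 0;
}
```

With the stub failing twice and a limit of three attempts, send_with_retry succeeds on the third call. With more failures than attempts, it gives up and returns -1, at which point the error would still need a user-facing message such as the one proposed in Fix 5.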


Workaround (for users until fixed)

Add to ~/.zshrc or ~/.bashrc:

export CLAUDE_CODE_REMOTE_SEND_KEEPALIVES=true
export BUN_CONFIG_HTTP_IDLE_TIMEOUT=300
export BUN_CONFIG_HTTP_RETRY_COUNT=3
export CLAUDE_STREAM_IDLE_TIMEOUT_MS=120000
export NODE_OPTIONS="--dns-result-order=ipv4first"

For a more reliable fix, also apply the SO_KEEPALIVE shim:

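mkdir -p ~/.local/lib   # the output directory may not exist yet
cat > /tmp/ka.c << 'EOF'
#define _GNU_SOURCE   /* needed for RTLD_NEXT */
#include <stddef.h>
#include <sys/socket.h>
#include <dlfcn.h>
/* Interposes socket() and enables SO_KEEPALIVE on every stream socket. */
int socket(int domain, int type, int protocol) {
    static int (*orig)(int,int,int);
    if (!orig) orig = dlsym(RTLD_NEXT, "socket");  /* look up the real socket() */
    int fd = orig(domain, type, protocol);
    /* Mask off SOCK_NONBLOCK / SOCK_CLOEXEC flags before comparing the type. */
    if (fd >= 0 && (type & 0xf) == SOCK_STREAM) {
        int v = 1;
        setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &v, sizeof(v));
    }
    return fd;
}
EOF
gcc -shared -fPIC -O2 -o ~/.local/lib/libkeepalive.so /tmp/ka.c -ldl
# Note: exporting LD_PRELOAD from the shell rc file applies the shim to every program started from that shell.
echo 'export LD_PRELOAD="$HOME/.local/lib/libkeepalive.so${LD_PRELOAD:+:$LD_PRELOAD}"' >> ~/.zshrc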

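Also lower the TCP keepalive idle time, which is the delay before the first probe is sent (requires sudo). The Linux default is 7200 seconds (2 hours), far longer than the idle windows described above: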

sudo sysctl -w net.ipv4.tcp_keepalive_time=60
echo "net.ipv4.tcp_keepalive_time=60" | sudo tee /etc/sysctl.d/99-claude-keepalive.conf
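
The sysctl above changes the idle time system-wide. As a per-socket alternative, the following C sketch sets the Linux keepalive timings on a single TCP socket and estimates how long a silently dead peer takes to detect. The function names are hypothetical, and folding this into the shim is an assumption that was not tested in this report; it would need to be limited to AF_INET/AF_INET6 sockets, since the IPPROTO_TCP options do not apply to other stream sockets.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enables keepalive on one TCP socket with explicit timings (Linux).
 *   idle_s:  seconds of silence before the first probe
 *   intvl_s: seconds between unanswered probes
 *   count:   unanswered probes before the kernel drops the connection
 * Returns 0 on success, -1 on the first failing setsockopt (errno set). */
int set_keepalive_timings(int fd, int idle_s, int intvl_s, int count) {
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof(intvl_s)) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) != 0)
        return -1;
    return 0;
}

/* Approximate seconds until a silently dead peer is detected:
 * the idle period plus one interval per unanswered probe. */
int dead_peer_detection_seconds(int idle_s, int intvl_s, int count) {
    return idle_s + intvl_s * count;
}
```

With the Linux defaults (7200 s idle, 75 s interval, 9 probes) the estimate is 7875 seconds. With 60 s idle, 10 s interval, and 3 probes it drops to 90 seconds. For keeping a NAT mapping alive, the idle value is the one that matters, because it controls how often an otherwise silent connection sends traffic.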

Evidence References

Finding | Confirmed via
No SO_KEEPALIVE on claude sockets | ss -tnop live output
All keepalives at 0ms | ~/.claude.json GrowthBook cache
Binary is Bun 1.3.14 | strings on binary
Error is Bun/JSC hardcoded string | strings on binary, line 42426
Stale-connection retry missing for CLI | strings code path analysis
IPv6 connections despite ipv4first flag | ss -tnop after applying NODE_OPTIONS
Error persists after SO_KEEPALIVE forced | Live reproduction during diagnostic session
90s idle stress test passes with shim | Controlled test, two 90s windows
