claude-code - ✅(Solved) Fix stdio MCP server killed every ~5 min when JSON-RPC ping response misses timeout window [1 pull requests, 2 comments, 3 participants]

claude-code · 2026-04-29 05:09:44

anthropics/claude-code#54544•Fetched 2026-04-30 06:42:44

Stdio MCP servers (reproduced with the official Telegram plugin v0.0.6) are killed by Claude Code at ~5-minute intervals when the server fails to respond to a JSON-RPC ping request within what appears to be a timeout window. The kill arrives as a SIGTERM to the server process, accompanied by red "Telegram MCP server failed" text and yellow "1 claude.ai connector needs auth" text in the status bar. Inbound channel notifications are dropped until the user manually runs /mcp.

This bug appears to be the upstream cause of several existing issues: #50607, #53617, #43049, #45985, #46334.

Workaround

No reliable workaround is known. Running /mcp manually after each disconnect is the only way to recover. A self-restart sidecar (killing the bot's stale poller and spawning a fresh one) helps only partially: it addresses the kill side but doesn't restore CC's view of the connection.

PR fix notes

PR #1424: fix(telegram): v0.0.7 reliability rollup — state-dir, PID guard, ppid watchdog, install stdout

Repository: anthropics/claude-plugins-official
Author: noahzweben
State: open | merged: False
Link: https://github.com/anthropics/claude-plugins-official/pull/1424

Description (problem / solution / changelog)

Four independent fixes for v0.0.7. Commits 3 and 4 fix regressions introduced by #1349.

1. Skills ignore TELEGRAM_STATE_DIR / CLAUDE_CONFIG_DIR (223c9b2)

server.ts:26 already honors TELEGRAM_STATE_DIR, but the /telegram:access and /telegram:configure skills hardcode ~/.claude/channels/telegram/ in 11 places — skill writes and server reads diverge, pairing/allowlist edits silently no-op. Skills now resolve the dir via shell expansion first; server gets CLAUDE_CONFIG_DIR fallback. Adds Bash(echo *) / Bash(chmod *) to allowed-tools.

Fixes #931, fixes #914, fixes #933, fixes #851; addresses anthropics/claude-code#37173.
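The lookup order described in item 1 can be sketched as a small pure function. This is a minimal illustration, not the code from the PR: the function name `resolveStateDir` is hypothetical, and the `channels/telegram` sub-path under `CLAUDE_CONFIG_DIR` is an assumption based on the hardcoded default quoted above.

```typescript
import { posix as path } from "node:path";

// Hypothetical helper: the precedence order follows the PR text, but the
// layout under CLAUDE_CONFIG_DIR is an assumption, not confirmed by the source.
function resolveStateDir(
  env: Record<string, string | undefined>,
  home: string,
): string {
  // 1. An explicit override always wins.
  if (env.TELEGRAM_STATE_DIR) return env.TELEGRAM_STATE_DIR;
  // 2. Otherwise fall back to the Claude config dir, if set (assumed sub-path).
  if (env.CLAUDE_CONFIG_DIR) {
    return path.join(env.CLAUDE_CONFIG_DIR, "channels", "telegram");
  }
  // 3. Default location that the skills previously hardcoded.
  return path.join(home, ".claude", "channels", "telegram");
}
```

Having the skills and the server share one resolver like this is what keeps skill writes and server reads from diverging.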

2. PID-lockfile SIGTERM can hit a recycled PID (6ceddea)

#1349's lockfile stores only a PID. After enough churn the OS recycles it — potentially to the new launch's own bun run wrapper or any unrelated process. Now verify ps -p <pid> -o args= contains server.ts before SIGTERM (execFileSync, no shell).

Hardens #1349; partial mitigation for #1459 item 3.
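The guard in item 2 might look roughly like the sketch below. The `ps -p <pid> -o args=` call and the `server.ts` substring check come from the PR description; the function names and the exact structure are hypothetical.

```typescript
import { execFileSync } from "node:child_process";

// Pure check, split out so it can be exercised without spawning anything.
function looksLikeTelegramServer(psArgs: string): boolean {
  return psArgs.includes("server.ts");
}

// Returns the command line of `pid`, or "" if it cannot be determined.
function commandLineOf(pid: number): string {
  try {
    // execFileSync passes arguments directly, so no shell is involved.
    return execFileSync("ps", ["-p", String(pid), "-o", "args="], {
      encoding: "utf8",
    }).trim();
  } catch {
    return ""; // ps exits non-zero when the PID is not running
  }
}

// Hypothetical wrapper: only signal the PID from the lockfile if it still
// appears to be a previous server.ts instance.
function terminateStaleServer(pid: number): boolean {
  if (!looksLikeTelegramServer(commandLineOf(pid))) return false;
  process.kill(pid, "SIGTERM");
  return true;
}
```

A substring match is a heuristic: it narrows the chance of signalling a recycled PID but does not eliminate it, which fits the PR's own "partial mitigation" wording.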

3. Orphan-watchdog ppid check false-fires on normal startup (1efdff0) — #1349 regression

The watchdog's process.ppid !== bootPpid check fires when the bun-run/shell wrapper exits or execs during normal startup and we get reparented to init — plugin self-terminates ~5s after launch. Dropped the ppid check; stdin-close is the correct signal (kernel closes the MCP pipe on any CLI death regardless of intermediate wrappers), so ppid was both unnecessary and harmful.

Fixes #1467. This is the actual root cause of #1459 item 3 and likely #1425 (not PID-recycling as commit 2 theorized — though that guard remains a valid safety improvement).
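The stdin-close signal that item 3 relies on can be illustrated as follows. This is a sketch under assumptions, not the plugin's watchdog: `watchStdinClose` is a hypothetical name, and it takes any event emitter so the logic can be exercised without a real pipe.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical helper: invoke `onParentGone` once when the given stream ends
// or closes. In a stdio MCP server the stream would be process.stdin.
function watchStdinClose(stdin: EventEmitter, onParentGone: () => void): void {
  let fired = false;
  const fire = () => {
    if (fired) return; // 'end' and 'close' can both fire; react only once
    fired = true;
    onParentGone();
  };
  stdin.once("end", fire);
  stdin.once("close", fire);
}

// Usage in a server (not executed here):
// watchStdinClose(process.stdin, () => process.exit(0));
```

Unlike a ppid comparison, this does not care how many wrapper processes sit between the CLI and the server, only whether the pipe is still open.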

4. `bun install` stdout corrupts MCP JSON-RPC handshake (1efdff0)

bun install --no-summary in the start script writes to stdout, which is the MCP transport. The harness sees non-JSON during handshake → "Failed to connect". Redirect install output to stderr (1>&2). Verified bun run --shell=bun supports the redirect.

Fixes #1470; addresses #1425 on Windows.
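To see why stray installer output breaks the handshake in item 4, consider a minimal check of the kind a stdio client could apply to each line it reads. The function names are hypothetical and the sample non-JSON text in the comment is made up for illustration; this is not how the Claude Code harness is known to be implemented.

```typescript
// Returns true if `line` parses as a JSON-RPC 2.0 message object.
function isJsonRpcLine(line: string): boolean {
  try {
    const msg: unknown = JSON.parse(line);
    return (
      typeof msg === "object" &&
      msg !== null &&
      (msg as { jsonrpc?: unknown }).jsonrpc === "2.0"
    );
  } catch {
    return false; // not JSON at all, e.g. installer progress text
  }
}

// Diagnostics belong on stderr so stdout stays a clean transport.
function logDiagnostic(text: string): void {
  process.stderr.write(text + "\n");
}
```

Anything a start script prints to stdout before the server takes over fails a check like this, which is why the PR redirects install output with `1>&2`.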

Testing

bun build server.ts --target=bun ✅ (all commits)
bun run --shell=bun redirect smoke test: echo OUT 1>&2 → stderr ✅
ps -p $$ -o args= smoke-tested on Linux

Net diff: +57/−28. Plugin v0.0.7.

Changed files

external_plugins/telegram/.claude-plugin/plugin.json (modified, +1/-1)
external_plugins/telegram/package.json (modified, +1/-1)
external_plugins/telegram/server.ts (modified, +19/-12)
external_plugins/telegram/skills/access/SKILL.md (modified, +18/-8)
external_plugins/telegram/skills/configure/SKILL.md (modified, +18/-6)

Code Example

{"method":"ping","jsonrpc":"2.0","id":2}\n

Environment

macOS (Darwin kernel 24.5.0)
Claude Code 2.1.121 and 2.1.122 (both reproduced)
Telegram plugin v0.0.6 (anthropics/claude-plugins-official)
--channels flag enabled
Stdio transport (bun-run wrapper → server.ts)

Reproduction

1. Start claude --channels and verify Telegram MCP connects successfully.
2. Pipe the bot's stderr to a log file (e.g., wrap the plugin start script to tee stderr) so you can observe MCP traffic.
3. Add a small bot.on('message:text', ...) log line that records timestamp + process.uptime() to confirm bot lifetime, and a process.stdin.on('data', chunk => ...) log line to capture small stdin packets.
4. Wait without interacting. At approximately 5 minutes from spawn, observe one of two outcomes:
   - Outcome A (kill): bot receives SIGTERM at uptime ~300s, exits cleanly. A new bot is spawned by CC. Inbound notifications stop until /mcp.
   - Outcome B (survive): bot receives a JSON-RPC ping (41 bytes: {"method":"ping","jsonrpc":"2.0","id":N}\n), responds via the MCP SDK's built-in ping handler, and continues running. Subsequent pings arrive at ~5-min intervals.

The outcome appears to depend on whether the bot's response reaches CC within some timeout. Bots actively handling tool calls or notifications miss the window more often than idle bots — but even idle bots are killed sometimes.
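The stdin logging used in the reproduction could be written along these lines. It is a sketch only: `describeStdinChunk` and its log format are hypothetical, and the issue does not show the reporter's actual instrumentation.

```typescript
// Hypothetical formatter for the stdin log line used during reproduction.
function describeStdinChunk(chunk: Buffer, uptimeSeconds: number): string {
  const preview = chunk.toString("utf8").trimEnd();
  return `[uptime ${uptimeSeconds.toFixed(0)}s] stdin ${chunk.length} bytes: ${preview}`;
}

// Wiring (not executed here). Write to stderr, never stdout, because stdout
// is the MCP transport. The extra 'data' listener is for observation only.
// process.stdin.on("data", (chunk: Buffer) =>
//   process.stderr.write(describeStdinChunk(chunk, process.uptime()) + "\n"));
```

Logging the byte count alongside the uptime makes the 41-byte ping easy to pick out from larger tool-call traffic.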

Observed Pattern (8 bots over ~1 hour)

Bot lifetime	Outcome	Notes
300s	SIGTERM kill	no ping received before kill
300s	SIGTERM kill	activity present (inbound + tool call)
300s	SIGTERM kill	activity present
301s	SIGTERM kill	activity present
632s+	killed by user /mcp	ping at uptime 342s (id:2) — survived
632s+	killed by user /mcp	ping at uptime 347s (id:2) — survived
948s	killed by user /mcp	ping at uptime 338s (id:2) — survived
1000s+	still alive at posting	pings at id:2 (338s), id:3 (659s), id:4 (961s) — all survived

Empirically, 4 of the 8 bots (50%) were killed at the first 5-min mark; the survivors were pinged successfully and kept running until the user reconnected via /mcp (one was still alive at posting).
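The figures in the table can be checked with a few lines of arithmetic. The values below are transcribed from the table; the variable names are illustrative only.

```typescript
// Data transcribed from the observed-pattern table.
const killedAtFirstMark = 4;
const totalBots = 8;
const pingUptimes = [338, 659, 961]; // seconds, longest-lived bot (id:2..id:4)

// Share of bots killed at the first 5-minute mark.
const killRate = killedAtFirstMark / totalBots;

// Gap between consecutive pings on the surviving bot.
const intervals = pingUptimes.slice(1).map((t, i) => t - pingUptimes[i]);
```

The two gaps come out to 321 s and 302 s, consistent with the ~5-minute ping cadence reported above.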

Wire-protocol Evidence

The 41-byte stdin packet that arrives at the 5-min mark is exactly:

{"method":"ping","jsonrpc":"2.0","id":2}\n

Subsequent pings increment the id (id:3, id:4, …) at ~5-min intervals.

The MCP SDK auto-responds to ping with {"jsonrpc":"2.0","id":N,"result":{}}. Bots that successfully complete this roundtrip survive; bots that don't receive a ping before the timeout (or whose response isn't received in time) are killed via SIGTERM.
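The roundtrip described above amounts to the following request/response mapping. This is an illustrative re-implementation, not the MCP SDK's own handler; `replyToPing` is a hypothetical name.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
}

// Returns the serialized reply for a ping request, or null for anything else.
function replyToPing(line: string): string | null {
  const req = JSON.parse(line) as Partial<JsonRpcRequest>;
  if (req.jsonrpc !== "2.0" || req.method !== "ping" || req.id === undefined) {
    return null;
  }
  // An empty result object echoing the request id is the whole response.
  return JSON.stringify({ jsonrpc: "2.0", id: req.id, result: {} });
}
```

The reply itself is trivial to produce, which suggests the misses come from scheduling or delivery delays rather than from the work of answering.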

The status-bar text "1 claude.ai connector needs auth · /mcp" appears at the same moment as the kill, suggesting the ping check is part of a broader auth/connection re-check cycle that happens to share the kill path.

Effect on Users

Telegram messages dropped without user awareness (CC marks server failed, bot keeps polling Telegram, but notifications/claude/channel writes are not relayed).
Manual /mcp reconnect required every disconnect cycle.
Severely degraded UX for anyone using CC channel mode for remote work.

Relationship to PR #1424 (anthropics/claude-plugins-official)

PR #1424 addresses bot-side reliability bugs (orphan watchdog false-fires, bun install stdout corrupting MCP handshake). Those are real and worth shipping, but they are a different layer — even with #1424 merged, the ping-timeout kill cycle described here would persist whenever the bot is busy enough to miss a ping response window.

Cross-References

#50607 — Telegram inbound notifications silently drop after /mcp reconnect (this issue describes the same root cause)
#53617 — StdIO MCP server disconnects after successful tools/call response when progress notifications are emitted
#43049 — Telegram plugin: inbound notifications not reaching session
#45985 — Telegram channel MCP server disconnects after 5s, drops messages (mismeasured timing; actual interval is 5 min, not 5s)
#46334 — Channels (Telegram) silently disabled by tengu_harbor feature flag
PR #1424 (anthropics/claude-plugins-official) — bot-side reliability fixes, related but addresses a different layer

Suggested Mitigations

Increase the ping response timeout for stdio MCP servers, or eliminate the kill-on-miss behavior in favor of explicit notifications/cancelled semantics.
Auto-reconnect stdio MCP servers when the connection is closed by the ping-timeout path. Currently, the visible behavior is that stdio servers do not auto-reconnect — the user must manually /mcp.
Surface the ping/keepalive failure to the user with a more accurate status message (e.g. "MCP server didn't respond to keepalive in time, killing and respawning") rather than the current "MCP server failed" + auth warning, which is misleading.
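One way to picture the first two mitigations together is a keepalive policy that tolerates a few missed pings and then respawns the server instead of marking it failed. Everything here is hypothetical: Claude Code's actual keepalive logic is not shown in the issue, and the names and the miss threshold are assumptions made for illustration.

```typescript
type KeepaliveAction = "ok" | "retry" | "respawn";

interface KeepalivePolicy {
  maxConsecutiveMisses: number; // hypothetical tunable
}

// Decide what to do after one ping attempt, given how many misses preceded it.
function nextKeepaliveAction(
  answeredInTime: boolean,
  consecutiveMisses: number,
  policy: KeepalivePolicy,
): { action: KeepaliveAction; consecutiveMisses: number } {
  // Any timely answer resets the counter.
  if (answeredInTime) return { action: "ok", consecutiveMisses: 0 };
  const misses = consecutiveMisses + 1;
  return misses >= policy.maxConsecutiveMisses
    ? { action: "respawn", consecutiveMisses: misses } // reconnect automatically
    : { action: "retry", consecutiveMisses: misses }; // tolerate a busy server
}
```

With a threshold above 1, a bot that is briefly busy gets another chance, and a bot that is genuinely stuck is replaced without the user having to run /mcp.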

extent analysis

TL;DR

Increase the ping response timeout for stdio MCP servers or implement auto-reconnect to prevent servers from being killed due to ping-timeout.

Guidance

Investigate the current ping response timeout value and consider increasing it to a higher value to accommodate busy bots.
Implement auto-reconnect for stdio MCP servers when the connection is closed by the ping-timeout path to minimize manual intervention.
Review the status message displayed to the user when the MCP server fails to respond to keepalive and update it to provide more accurate information.
Consider implementing a keepalive mechanism that allows the MCP server to respond to pings even when busy with other tasks.

Example

No code snippet is provided as the issue does not specify the exact implementation details of the MCP server or the ping response timeout.

Notes

The suggested mitigations may require modifications to the MCP server implementation or the Claude Code plugin. It is essential to test these changes thoroughly to ensure they do not introduce new issues.

Recommendation

Apply the suggested mitigations, specifically increasing the ping response timeout or implementing auto-reconnect, to prevent MCP servers from being killed due to ping-timeout and improve the overall user experience.
