openclaw - 💡(How to fix) Fix Track and log pending MCP tool calls in claude-live-session for post-mortem visibility

Symptom

When a 180s claude-cli no-output watchdog fires, operators have no visibility into which MCP tool was hanging at the moment of the kill. In the 2026-05-07 wedge investigation, the JSONL trajectory was lost to a session reset, leaving only outcome=error in the gateway logs and making the hang impossible to root-cause.

Proposed change

In dist/claude-live-session-DdjZupHR.js (around handleClaudeLiveLine / the JSONL line parser), track outstanding tool_use_start events that haven't yet matched a tool_use_result:

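```js
const pendingTools = new Map(); // toolUseId → { name, startedAt, warned, args? }

// in the JSONL line parser (event = one parsed line)
if (event.type === "tool_use_start") {
  // remember when the call began so the sweep can compute its age
  pendingTools.set(event.id, { name: event.name, startedAt: Date.now(), warned: false });
} else if (event.type === "tool_use_result") {
  // the call finished, successfully or not; stop tracking it
  pendingTools.delete(event.id);
}
```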
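To try this bookkeeping outside the dist bundle, here is a self-contained sketch that feeds raw JSONL lines into the same kind of map. It assumes the event shape used above (`type`, `id`, `name` on `tool_use_start`; `type`, `id` on `tool_use_result`), which may not match the actual claude-cli stream fields. `createPendingToolTracker` and the injectable `now` clock are hypothetical names for illustration.

```javascript
// Sketch: a pending-tool tracker fed one JSONL line at a time.
// ASSUMPTION: events look like {"type":"tool_use_start","id":...,"name":...}
// and {"type":"tool_use_result","id":...}, mirroring the proposal; the real
// claude-cli stream may use different field names.
function createPendingToolTracker(now = Date.now) {
  const pendingTools = new Map(); // toolUseId -> { name, startedAt, warned }

  function onLine(line) {
    let event;
    try {
      event = JSON.parse(line);
    } catch {
      return; // skip partial or non-JSON lines instead of throwing
    }
    if (!event || typeof event !== "object") return;

    if (event.type === "tool_use_start") {
      pendingTools.set(event.id, { name: event.name, startedAt: now(), warned: false });
    } else if (event.type === "tool_use_result") {
      pendingTools.delete(event.id);
    }
  }

  return { onLine, pendingTools };
}
```

Skipping unparseable lines keeps one truncated line from breaking the tracker, and injecting `now` lets a test control `startedAt` without waiting on a real clock.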

Then add a periodic sweep (e.g. every 30s) that warns on any tool pending past a threshold (e.g. 90s):

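```js
// periodic sweep: every 30s, warn once about any tool call pending >90s
const pendingSweep = setInterval(() => {
  const now = Date.now();
  for (const [id, entry] of pendingTools) {
    const ageMs = now - entry.startedAt;
    if (ageMs > 90_000 && !entry.warned) {
      parent.log.warn({ tool: entry.name, ageMs, toolUseId: id }, "MCP tool pending >90s — possible hang");
      entry.warned = true; // warn once per call, not on every sweep
    }
  }
}, 30_000);
pendingSweep.unref(); // diagnostic only: don't keep the process alive; clearInterval(pendingSweep) when the session ends
```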
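The sweep above depends on `parent` and a real timer, which makes it awkward to exercise in a test. One way to structure it, sketched here under the assumption that `log` has the same `warn(fields, message)` shape as `parent.log`, is to split a pure sweep pass from the timer that drives it. `sweepPendingTools` and `startPendingToolSweep` are hypothetical names, and the message is built from the threshold so it stays accurate if 90s is tuned.

```javascript
// Sketch: one pass of the pending-tool sweep, factored out so it can be
// unit-tested with a fake clock. `log` stands in for parent.log (assumed
// to expose warn(fields, message)). Names here are hypothetical.
function sweepPendingTools(pendingTools, { now = Date.now(), thresholdMs = 90_000, log }) {
  const warnedIds = [];
  for (const [id, entry] of pendingTools) {
    const ageMs = now - entry.startedAt;
    if (ageMs > thresholdMs && !entry.warned) {
      log.warn(
        { tool: entry.name, ageMs, toolUseId: id },
        `MCP tool pending >${thresholdMs / 1000}s, possible hang`
      );
      entry.warned = true; // warn once per call, not on every sweep
      warnedIds.push(id);
    }
  }
  return warnedIds;
}

// Drive the sweep from a timer (30s by default). unref() stops this purely
// diagnostic timer from holding the Node.js process open; the caller should
// clearInterval() the returned handle when the session ends.
function startPendingToolSweep(pendingTools, log, { intervalMs = 30_000, thresholdMs = 90_000 } = {}) {
  const timer = setInterval(
    () => sweepPendingTools(pendingTools, { now: Date.now(), thresholdMs, log }),
    intervalMs
  );
  if (typeof timer.unref === "function") timer.unref();
  return timer;
}
```

Because `warned` lives on the map entry, a hung call produces a single warning no matter how many sweeps see it, and the entry is still removed normally when its `tool_use_result` arrives.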

This is purely diagnostic: no behavior change, no fallback, no kill. It does, however, mean the next wedge incident will have the data we need without relying on the JSONL trajectory surviving a reset.
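The periodic warning runs on its own schedule. To also put the pending calls into the watchdog's kill summary, the tracker's map can be snapshotted at kill time. The sketch below formats such a snapshot, oldest call first. `describePendingTools` is a hypothetical helper, and where its string gets attached depends on the watchdog's kill path, which the proposal does not show.

```javascript
// Sketch: summarize still-pending tool calls at the moment the no-output
// watchdog kills claude-cli. Hypothetical helper; the caller decides where
// the returned string goes (e.g. appended to the kill summary).
function describePendingTools(pendingTools, now = Date.now()) {
  if (pendingTools.size === 0) return "no MCP tools pending";
  return [...pendingTools]
    .map(([id, entry]) => ({ id, name: entry.name, ageMs: now - entry.startedAt }))
    .sort((a, b) => b.ageMs - a.ageMs) // oldest (most likely hung) first
    .map(({ id, name, ageMs }) => `MCP tool ${name} pending ${Math.round(ageMs / 1000)}s (toolUseId=${id})`)
    .join("; ");
}
```

Sorting by age puts the call most likely responsible for the silence at the front of the summary.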

Why now

The 2026-05-07 cascade investigation (#79365) hit the wall of "we know it's a hung tool call, but we don't know which tool": the trajectory was wiped by the post-bounce reset, and only outcome=error survived in gateway.log. The pending-tools tracker would have reported "MCP tool anima.<x> pending 180s" directly in the kill summary instead of leaving us to guess.

Notes

Pairs naturally with #79365 (the resume-id auto-discard fix). That fix breaks the cascade; this one tells you what to fix at the root.

Filed by Tecton (Claude Code CLI agent).