openclaw - 💡(How to fix) Fix Observability gap — gateway.log lacks per-turn response duration and content hash; byte-identical stuck-loops are undetectable from gateway logs alone

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

gateway.log records cli exec, claude live session reuse, claude live session turn: durationMs=X rawLines=Y, and claude live session close: reason=... events. None of these record:

  • Response content length (bytes)
  • Response content hash (e.g., sha256 prefix)

When a session enters a byte-identical stuck-loop (model produces or claude-cli emits the same response for every prompt), the loop is invisible in gateway.log; investigation requires reading the per-session .jsonl transcript.

Root Cause

gateway.log records cli exec, claude live session reuse, claude live session turn: durationMs=X rawLines=Y, and claude live session close: reason=... events. None of these record:

  • Response content length (bytes)
  • Response content hash (e.g., sha256 prefix)

When a session enters a byte-identical stuck-loop (model produces or claude-cli emits the same response for every prompt), the loop is invisible in gateway.log; investigation requires reading the per-session .jsonl transcript.

Fix Action

Fix / Workaround

Proposed patch (partial)

In finishTurn(session, output) (currently emits claude live session turn: durationMs=X rawLines=Y), add:

  • outBytes=<utf8-byte-count>
  • outHash=<sha256-first-12-hex-chars>

Important caveat

This patch alone would NOT have caught the 2026-05-12 incident. During the stuck-loop, finishTurn did NOT fire for any of the 11 synthetic turns (no turn: lines appeared in gateway.log for that window). The synthetic short-circuit emits the assistant message via a separate code path. A complete fix requires locating the synthetic-emission site and emitting an equivalent turn-log line there too (with model=<synthetic> flagged), so the stuck-loop becomes visible from gateway.log alone.

Code Example

const __outText = ... // narrow `output` to its assistant-text representation
const __outBytes = Buffer.byteLength(__outText, "utf8");
const __outHash = require("crypto").createHash("sha256").update(__outText).digest("hex").slice(0, 12);
cliBackendLog.info(`claude live session turn: provider=... durationMs=... rawLines=... outBytes=${__outBytes} outHash=${__outHash}`);
RAW_BUFFERClick to expand / collapse

Summary

gateway.log records cli exec, claude live session reuse, claude live session turn: durationMs=X rawLines=Y, and claude live session close: reason=... events. None of these record:

  • Response content length (bytes)
  • Response content hash (e.g., sha256 prefix)

When a session enters a byte-identical stuck-loop (model produces or claude-cli emits the same response for every prompt), the loop is invisible in gateway.log; investigation requires reading the per-session .jsonl transcript.

Case study

2026-05-12 incident, full timeline in companion Issue A. The 11 stuck responses were byte-identical (sha256 02df1d51bd7e9f1c3b44b1e0e431b8704bb80a8f8fc10e286894a61a51042429, 78 bytes each); the loop was only detected by transcript inspection.

Proposed patch (partial)

In finishTurn(session, output) (currently emits claude live session turn: durationMs=X rawLines=Y), add:

  • outBytes=<utf8-byte-count>
  • outHash=<sha256-first-12-hex-chars>

A pseudo-diff against the bundled claude-live-session-*.js:

const __outText = ... // narrow `output` to its assistant-text representation
const __outBytes = Buffer.byteLength(__outText, "utf8");
const __outHash = require("crypto").createHash("sha256").update(__outText).digest("hex").slice(0, 12);
cliBackendLog.info(`claude live session turn: provider=... durationMs=... rawLines=... outBytes=${__outBytes} outHash=${__outHash}`);

Important caveat

This patch alone would NOT have caught the 2026-05-12 incident. During the stuck-loop, finishTurn did NOT fire for any of the 11 synthetic turns (no turn: lines appeared in gateway.log for that window). The synthetic short-circuit emits the assistant message via a separate code path. A complete fix requires locating the synthetic-emission site and emitting an equivalent turn-log line there too (with model=<synthetic> flagged), so the stuck-loop becomes visible from gateway.log alone.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING