openclaw - 💡(How to fix) Fix LLM idle timeout error silently dropped when agentRunStarted is true [2 pull requests]

StepCodex · 2026-05-21T12:23:00Z

Error Message

When an LLM idle timeout occurs after the agent has started (e.g., after tool calls), the error is written to the session log but never broadcast to connected clients. Users see no error feedback; the response silently stops.

The session JSONL file records the message "LLM idle timeout (120s): no response from model", and the session ends there: no final or error event is broadcast, so connected clients (ACP bridges, etc.) never receive it.

Fix Action

Fixed

Fixed by PR: fix(gateway): surface resolved chat errors (https://github.com/openclaw/openclaw/pull/84953)
Fixed by PR: fix(gateway): broadcast idle timeout errors to clients after agent run started (https://github.com/openclaw/openclaw/pull/85176)

Reproduction

1. Start an agent session via the gateway (e.g., through ACP bridge / Ki-Agents).
2. The agent begins processing: it reads skills and makes tool calls (so agentRunStarted = true).
3. On a subsequent LLM call, the model fails to produce any token within the idle timeout window (default 120s).
4. The timeout error is logged to the session JSONL file but never reaches the client.
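
The idle window in the reproduction can be pictured with a small watchdog. This is only a rough sketch, assuming the window restarts whenever a token arrives; `IdleWatchdog`, `touch`, and `isExpired` are hypothetical names invented for this example and are not OpenClaw's implementation. Only the 120s default comes from the issue.

```typescript
// Hypothetical sketch, not OpenClaw's code. All times are in milliseconds.
class IdleWatchdog {
  private readonly idleTimeoutMs: number;
  private lastActivityMs: number;

  constructor(idleTimeoutMs: number, startMs: number) {
    this.idleTimeoutMs = idleTimeoutMs;
    this.lastActivityMs = startMs;
  }

  // Call whenever the model emits a token; restarts the idle window.
  touch(nowMs: number): void {
    this.lastActivityMs = nowMs;
  }

  // True once no token has arrived for the full idle window.
  isExpired(nowMs: number): boolean {
    return nowMs - this.lastActivityMs >= this.idleTimeoutMs;
  }
}

const watchdog = new IdleWatchdog(120000, 0); // 120s default from the issue
watchdog.touch(30000); // a token arrives at t = 30s
const stillWaiting = !watchdog.isExpired(100000); // 70s idle: inside the window
const timedOut = watchdog.isExpired(150000); // 120s idle: window exhausted
```

Passing the clock in explicitly keeps the sketch deterministic; a real implementation would more likely use timers.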

Session Log Evidence

{"type":"custom","customType":"openclaw:prompt-error","data":{"error":"LLM idle timeout (120s): no response from model | LLM idle timeout (120s): no response from model",...}}

The session ends here — no final/error event is broadcast.
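
To check whether a given session hit this problem, the JSONL file can be scanned for entries like the one above. The following is a minimal sketch: the `SessionEntry` shape is inferred only from the single log line in this issue, and `findPromptErrors` is a hypothetical helper, not part of OpenClaw.

```typescript
// Shape inferred from the one log line in the issue; other fields are unknown.
interface SessionEntry {
  type?: string;
  customType?: string;
  data?: { error?: string };
}

// Parses one JSONL line, returning undefined for blank or malformed lines.
function parseLine(line: string): SessionEntry | undefined {
  const trimmed = line.trim();
  if (trimmed === "") return undefined;
  try {
    return JSON.parse(trimmed) as SessionEntry;
  } catch {
    return undefined;
  }
}

// Hypothetical helper: collects the error text of every prompt-error entry.
function findPromptErrors(jsonl: string): string[] {
  const errors: string[] = [];
  for (const line of jsonl.split("\n")) {
    const entry = parseLine(line);
    if (
      entry !== undefined &&
      entry.type === "custom" &&
      entry.customType === "openclaw:prompt-error" &&
      typeof entry.data?.error === "string"
    ) {
      errors.push(entry.data.error);
    }
  }
  return errors;
}
```

A session whose last entry is a prompt-error, with no matching error event seen by the client, would be consistent with the behavior described here.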

Root Cause

File: src/gateway/server-methods/chat.ts (.then() handler, ~line 2692 in main)

if (!agentRunStarted) {
  // Agent never started → processes deliveredReplies, broadcasts final/error ✅
  broadcastChatFinal(...);
} else if (!hasBeforeAgentRunGate) {
  // Agent started → only updates transcript, NO broadcast ❌
  await emitUserTranscriptUpdate();
}

The timeout error flows like this:

1. run.ts handles the timeout by returning an error payload ({ text: "...", isError: true }), not by throwing an exception.
2. The error payload is collected in deliveredReplies via the deliver callback.
3. The .then() handler checks agentRunStarted, which is true because the agent had already started (it made tool calls).
4. The code only calls emitUserTranscriptUpdate(); neither broadcastChatError() nor broadcastChatFinal() is called.
5. Meanwhile, .catch() (which does call broadcastChatError()) is never reached because run.ts returned normally instead of throwing.

Result: The error payload sits in deliveredReplies but is never broadcast. Connected clients (ACP bridges, etc.) never receive any error event.
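
The key point in this flow is that a promise resolved with an error payload runs .then() and skips .catch(). A minimal, self-contained sketch of that behavior follows; `runAgentStub` and `traceHandlers` are hypothetical stand-ins written for this example, not the actual run.ts or chat.ts code.

```typescript
type ReplyPayload = { text: string; isError?: boolean };

// Stand-in for run.ts: the timeout is reported by resolving with an error
// payload rather than by rejecting the promise.
function runAgentStub(): Promise<ReplyPayload> {
  return Promise.resolve({
    text: "LLM idle timeout (120s): no response from model",
    isError: true,
  });
}

// Records which handler runs when the promise resolves with an error payload.
function traceHandlers(): Promise<string[]> {
  const trace: string[] = [];
  return runAgentStub()
    .then((payload) => {
      trace.push(payload.isError === true ? "then (error payload)" : "then");
    })
    .catch(() => {
      trace.push("catch"); // only reached if the promise rejects
    })
    .then(() => trace);
}
```

Because nothing rejects, any error handling that lives only in .catch() is bypassed; the .then() branch has to inspect the payload itself.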

Expected Behavior

Clients should receive a state: "error" chat event with the timeout error message, the same as other error scenarios.
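
On the client side, handling such an event could look roughly like the sketch below. Only `state: "error"` is taken from the issue; the rest of the `ChatEvent` shape (including `runId`, `errorMessage`, and the `final` variant's `text`) and the `renderChatEvent` helper are assumptions made for illustration, not the gateway's actual wire format.

```typescript
// Hypothetical event shape: only `state: "error"` comes from the issue.
type ChatEvent =
  | { state: "final"; runId: string; text: string }
  | { state: "error"; runId: string; errorMessage: string };

// Turns an event into the text a client might show to the user.
function renderChatEvent(event: ChatEvent): string {
  switch (event.state) {
    case "error":
      return `Run ${event.runId} failed: ${event.errorMessage}`;
    case "final":
      return event.text;
  }
}
```

With an event like this, a timeout would surface as a visible failure instead of a response that silently stops.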

Suggested Fix

In the .then() handler, when agentRunStarted = true, check deliveredReplies for payloads with isError: true. If found, call broadcastChatError() to notify connected clients:

} else {
  // Agent started — check for error payloads that weren't streamed
  const errorPayloads = deliveredReplies
    .filter((entry) => entry.payload.isError);
  if (errorPayloads.length > 0) {
    const errorMsg = errorPayloads
      .map((entry) => entry.payload.text)
      .filter(Boolean)
      .join(" | ");
    broadcastChatError({
      context,
      runId: clientRunId,
      sessionKey,
      errorMessage: errorMsg,
    });
  } else if (!hasBeforeAgentRunGate) {
    await emitUserTranscriptUpdate().catch(...);
  }
}
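
The filtering step of the suggested fix can be isolated as a pure function, which makes it easy to test without the gateway. This is a sketch under assumptions: the `DeliveredReply` type is reduced to the two fields the snippet above reads, and `collectErrorMessage` is a hypothetical helper name, not a function in the OpenClaw codebase.

```typescript
// Reduced to the fields the suggested fix reads; the real types may differ.
type ReplyPayload = { text?: string; isError?: boolean };
type DeliveredReply = { payload: ReplyPayload };

// Returns the joined error text, or undefined when no error payload exists.
function collectErrorMessage(
  deliveredReplies: DeliveredReply[],
): string | undefined {
  const errorPayloads = deliveredReplies.filter(
    (entry) => entry.payload.isError === true,
  );
  if (errorPayloads.length === 0) return undefined;
  return errorPayloads
    .map((entry) => entry.payload.text)
    .filter((text): text is string => Boolean(text))
    .join(" | ");
}

const timeoutText = "LLM idle timeout (120s): no response from model";
const message = collectErrorMessage([
  { payload: { text: "partial answer" } },
  { payload: { text: timeoutText, isError: true } },
  { payload: { text: timeoutText, isError: true } },
]);
```

A caller would broadcast an error when the result is defined and fall back to the transcript update otherwise, mirroring the branch structure of the suggested fix.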

Environment

OpenClaw version: main branch (bde07ddb)
Model: glm-5-turbo (via anthropic-messages API)
Connection: ACP bridge (Ki-Agents gateway)
Idle timeout: 120s (default)
