openclaw - 💡(How to fix) Fix [Feature] Error-recovery plugin hook (on_run_error / on_failed_reply) for silent run death [1 participants]

openclaw • 2026-04-08 12:20:49

openclaw/openclaw#63140•Fetched 2026-04-09 07:57:56

Author

Hugo0

Participants

Hugo0

Fix Action

Fix / Workaround

Checked docs.openclaw.ai/automation/hooks and plugins/architecture on 2026.4.8: the 28 provider runtime hooks are all about model/transport/auth lifecycle (refreshOAuth, classifyFailoverReason, buildMissingAuthMessage, etc.) — none fire on a failed reply path. before_agent_reply (#20067) is success-path only. reply_dispatch is interception, not error recovery.

As a workaround we built a log-tailing bash watchdog that greps for Embedded agent failed before reply, extracts the channel ID from the preceding lane=session:... line, and posts a recovery message via openclaw message send --channel discord --target <id>. Works, but it's fragile (depends on log format) and should not be user-space code.
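The parsing step of that watchdog can be sketched in TypeScript to show why it is fragile. This is an illustration only, not the original bash script: the function name `findSilentFailures` is hypothetical, and the log layout is assumed from the two fragments quoted in this issue (a `lane=session:...` line followed by an `Embedded agent failed before reply` line).

```typescript
// Sketch of the watchdog's parsing step. The log layout is an assumption
// based on the two log fragments quoted in this issue.
interface SilentFailure {
  channelId: string;
  reason: string;
}

// Matches e.g. "lane=session:agent:main:discord:channel:1234".
const LANE_RE = /lane=session:agent:[^:\s]+:discord:channel:(\d+)/;
// Matches "Embedded agent failed before reply: <reason>".
const FAIL_RE = /Embedded agent failed before reply:?\s*(.*)$/;

function findSilentFailures(lines: string[]): SilentFailure[] {
  const failures: SilentFailure[] = [];
  let lastChannelId: string | undefined;
  for (const line of lines) {
    const lane = LANE_RE.exec(line);
    if (lane) {
      // Remember the most recent lane so a following failure can be attributed to it.
      lastChannelId = lane[1];
      continue;
    }
    const fail = FAIL_RE.exec(line);
    if (fail && lastChannelId !== undefined) {
      failures.push({ channelId: lastChannelId, reason: fail[1].trim() });
      // Reset so a later failure is not attributed to a stale lane.
      lastChannelId = undefined;
    }
  }
  return failures;
}
```

Every regular expression here encodes a guess about the log format, so any change to the gateway's log wording silently disables recovery. That coupling is the main argument for a first-class hook.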

Problem

When an embedded agent run fails before producing a reply (logged as Embedded agent failed before reply: <reason>), the error is surfaced only to the gateway log. On Discord specifically, users see the typing indicator disappear and assume the agent is still working — no recovery message is ever posted to the channel.

Observed in production

Today on a 2026.4.8 Discord deployment:

4 silent deaths in 5 hours (users had to reprompt)
1 confirmed case where [diagnostic] lane task error: lane=session:agent:main:discord:channel:<id> was logged but the channel received no message — user waited 25+ minutes

Proposed hook shape

// Plugin hook — fires when an embedded run errors before producing a reply
on_run_error(ctx: {
  sessionKey: string,        // 'agent:main:discord:channel:1234'
  channelId?: string,        // extracted from sessionKey when possible
  runId: string,
  error: Error,
  errorCategory: 'auth' | 'timeout' | 'overload' | 'unknown',
  attempts: number,          // how many failover attempts were made
  elapsedMs: number,
}): Promise<{ reply?: string } | void>

Return a { reply } object to post a recovery message to the originating channel; return void to leave the silence in place (current behavior).
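To make the return contract concrete, here is a minimal sketch of how a host might dispatch the hook. None of these names exist in OpenClaw today: `RunErrorContext`, `OnRunError`, `channelIdFromSessionKey`, and `dispatchRunError` are hypothetical, and the "first reply wins" ordering is an assumption rather than part of the proposal.

```typescript
// Hypothetical host-side dispatch for the proposed hook.
type ErrorCategory = 'auth' | 'timeout' | 'overload' | 'unknown';

interface RunErrorContext {
  sessionKey: string;
  channelId?: string;
  runId: string;
  error: Error;
  errorCategory: ErrorCategory;
  attempts: number;
  elapsedMs: number;
}

type OnRunError = (ctx: RunErrorContext) => Promise<{ reply?: string } | void>;

// Assumed key layout, taken from the example 'agent:main:discord:channel:1234'.
function channelIdFromSessionKey(sessionKey: string): string | undefined {
  const parts = sessionKey.split(':');
  const i = parts.indexOf('channel');
  return i >= 0 && i + 1 < parts.length ? parts[i + 1] : undefined;
}

// Calls each registered hook in order; the first non-empty reply wins.
// Resolves to undefined when every hook returns void (the current silent behavior).
async function dispatchRunError(
  hooks: OnRunError[],
  ctx: RunErrorContext,
): Promise<string | undefined> {
  for (const hook of hooks) {
    try {
      const result = await hook(ctx);
      if (result && result.reply) return result.reply;
    } catch {
      // A failing recovery hook should not mask the original run error.
    }
  }
  return undefined;
}
```

The host would then post the resolved string to the originating channel, or do nothing when the result is `undefined`.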

Related

#54964 (session zombie state after init failure — similar silent-death class)
#43661 (session hang during compaction timeout)
before_agent_reply hook (#20067) — success-path counterpart

extent analysis

TL;DR

Implement the proposed on_run_error plugin hook to handle embedded agent run failures and provide a recovery message to the user.

Guidance

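Review the proposed on_run_error hook shape. The hook is a feature request and, per the issue, does not exist in 2026.4.8; once the host exposes it, a plugin can implement it to catch embedded agent run errors before a reply is produced.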
Use the provided ctx object to extract relevant information, such as channelId and errorCategory, to determine the appropriate recovery message.
Return a { reply } object from the on_run_error hook to post a recovery message to the originating channel.
Consider logging the error and recovery message for auditing and debugging purposes.
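The guidance above can be sketched as a small helper that picks a message per `errorCategory` and logs the failure for auditing. This assumes the hook lands with the proposed context fields; `RECOVERY_MESSAGES` and `buildRecoveryReply` are hypothetical names, and the message wording is a placeholder.

```typescript
type ErrorCategory = 'auth' | 'timeout' | 'overload' | 'unknown';

// Placeholder wording; adjust to the deployment.
const RECOVERY_MESSAGES: Record<ErrorCategory, string> = {
  auth: 'The agent could not authenticate with its model provider. Please notify an admin.',
  timeout: 'The agent timed out before replying. Please try again.',
  overload: 'The model provider is overloaded. Please retry in a few minutes.',
  unknown: 'The agent failed before replying. Please try again.',
};

function buildRecoveryReply(ctx: {
  runId: string;
  errorCategory: ErrorCategory;
  attempts: number;
  elapsedMs: number;
}): { reply: string } {
  const seconds = Math.round(ctx.elapsedMs / 1000);
  // Log the failure so recovery messages can be audited later.
  console.error(
    `[on_run_error] run=${ctx.runId} category=${ctx.errorCategory} ` +
      `attempts=${ctx.attempts} elapsed=${seconds}s`,
  );
  return { reply: RECOVERY_MESSAGES[ctx.errorCategory] };
}
```

A hook body could then simply return `buildRecoveryReply(ctx)`. Keeping the messages in a `Record<ErrorCategory, string>` means the compiler flags any category that lacks a message.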

Example

// Example implementation of the proposed on_run_error hook.
// The hook is not part of OpenClaw yet; this follows the shape proposed in the issue.
async function on_run_error(ctx: {
  sessionKey: string,
  channelId?: string,
  runId: string,
  error: Error,
  errorCategory: 'auth' | 'timeout' | 'overload' | 'unknown',
  attempts: number,
  elapsedMs: number,
}): Promise<{ reply?: string } | void> {
  // Returning { reply } asks the host to post this text to the originating channel.
  const recoveryMessage = `Error occurred during agent run: ${ctx.errorCategory}`;
  return { reply: recoveryMessage };
}

Notes

The proposed on_run_error hook would let a plugin handle embedded agent run failures and post a recovery message to the user. It is a proposal only: the final signature and behavior depend on how the maintainers choose to implement it.

Recommendation

Adopt the proposed on_run_error hook once it is available to handle embedded agent run failures and post a recovery message to the user. Unlike the current log-tailing bash watchdog, which is the actual workaround and depends on the log format, a first-class hook would be more robust and maintainable.

#agent execution #callback error #memory management #API rate limit #retriever error
