openclaw - ✅(Solved) Fix [Feature]: Emit first-class plugin hooks for model failover and terminal all-models-failed outcomes [1 pull requests, 1 participants]

GitHub stats: openclaw/openclaw#70976 (fetched 2026-04-24 10:37:13)
Comments: 0 · Participants: 1 · Timeline: 2 (cross-referenced ×2) · Reactions: 0

Add first-class plugin hooks for model failover decisions and terminal "all models failed before reply" outcomes.

Root Cause

Problem to solve

OpenClaw already detects when a run fails over to another model and when a run collapses before reply because no usable model remains. Today those signals are mainly exposed through logs such as embedded_run_failover_decision and Embedded agent failed before reply: ....

Fix Action

Fixed

PR fix notes

PR #70990: feat(plugins): add model failover and terminal failure hooks

Description (problem / solution / changelog)

PR Draft — #70976 model failover hooks

Title

feat(plugins): add model failover and terminal failure hooks

Summary

  • Problem: OpenClaw exposes model failover and terminal all-models-failed outcomes mostly through logs, so plugins and operators cannot react in-process without log scraping.
  • Why it matters: This makes alerting and automation brittle, delayed, and hard to build on top of the existing failover path.
  • What changed: Added two fire-and-forget plugin hooks, model_failover and model_failure_terminal, and emitted them from the existing failover-decision and terminal before-reply failure seams.
  • What did NOT change (scope boundary): This PR does not add non-verbose fallback notices, workspace-context re-injection, or circuit-breaker/policy behavior.
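For a plugin author, the practical effect is that failover activity arrives as in-process callbacks instead of log lines to scrape. The following is a minimal sketch, assuming a plugin supplies optional handlers keyed by the PR's two hook names (model_failover and model_failure_terminal). The PluginHooks shape, the payload fields, and the alerts sink are hypothetical and do not reflect OpenClaw's actual plugin API:

```typescript
// Hypothetical payload shapes; the real OpenClaw event fields may differ.
interface ModelFailoverEvent {
  fromModel: string;
  toModel?: string; // assumed: absent when no fallback candidate remains
  reason: string;
}

interface ModelFailureTerminalEvent {
  message: string; // assumed to be the already-sanitized terminal message
  attemptCount: number;
}

// Hypothetical plugin surface: optional handlers keyed by the PR's hook names.
interface PluginHooks {
  model_failover?: (event: ModelFailoverEvent) => void | Promise<void>;
  model_failure_terminal?: (event: ModelFailureTerminalEvent) => void | Promise<void>;
}

// In-memory sink standing in for a pager, metrics counter, or webhook.
const alerts: string[] = [];

const alertingPlugin: PluginHooks = {
  model_failover(event) {
    const target = event.toModel ?? "none";
    alerts.push(`failover ${event.fromModel} -> ${target} (${event.reason})`);
  },
  model_failure_terminal(event) {
    alerts.push(`terminal failure after ${event.attemptCount} attempts: ${event.message}`);
  },
};

// Simulate the host invoking each hook once.
alertingPlugin.model_failover?.({ fromModel: "model-a", toModel: "model-b", reason: "rate_limit" });
alertingPlugin.model_failure_terminal?.({ message: "All models failed", attemptCount: 2 });
```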

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #70976
  • Related #65824
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: N/A
  • Missing detection / guardrail: N/A
  • Contributing context (if known): N/A

Regression Test Plan (if applicable)

N/A

User-visible / Behavior Changes

  • Plugins can now register model_failover hooks and receive structured failover decision events.
  • Plugins can now register model_failure_terminal hooks and receive structured before-reply terminal failure events, including fallback-attempt summaries for all-models-failed outcomes.
  • No default user-facing fallback messaging behavior changed.
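To make "structured" concrete, the payloads might be modeled as a discriminated union so that a handler can narrow on the hook kind. Every type and field name here (runId, failedModel, nextModel, attempts, FailoverReason) is an assumption for illustration. The PR's actual definitions live in src/plugins/hook-types.ts and are not reproduced on this page:

```typescript
// Hypothetical payload types; field names are illustrative, not OpenClaw's.
type FailoverReason = "rate_limit" | "auth" | "timeout" | "overloaded" | "unknown";

interface FallbackAttempt {
  provider: string;
  model: string;
  reason: FailoverReason;
}

interface ModelFailoverEvent {
  kind: "model_failover";
  runId: string;
  failedModel: string;
  reason: FailoverReason;
  nextModel?: string; // undefined when the chain is exhausted
}

interface ModelFailureTerminalEvent {
  kind: "model_failure_terminal";
  runId: string;
  message: string; // sanitized, user-safe text
  attempts: FallbackAttempt[]; // one entry per model tried, in order
}

type ModelHookEvent = ModelFailoverEvent | ModelFailureTerminalEvent;

// Narrow on the discriminant to build a one-line operator summary.
function describe(event: ModelHookEvent): string {
  switch (event.kind) {
    case "model_failover":
      return `${event.runId}: ${event.failedModel} failed (${event.reason}), next=${event.nextModel ?? "none"}`;
    case "model_failure_terminal": {
      const chain = event.attempts.map((a) => `${a.provider}/${a.model}:${a.reason}`).join(" -> ");
      return `${event.runId}: all models failed [${chain}]`;
    }
  }
}
```

Keeping the attempt list ordered lets an operator see the whole fallback chain in one event, rather than correlating several log lines.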

Diagram (if applicable)

Before:
[model error] -> [log failover decision] -> [fallback or final failure]
                     \-> plugins must scrape logs

After:
[model error] -> [log failover decision + model_failover hook] -> [fallback or final failure]
                                                               \-> [model_failure_terminal hook on before-reply collapse]
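The "After" flow can be sketched as a fallback loop that keeps the existing log line and emits a hook at the same point. This is a simplified, synchronous illustration using assumed names (runWithFallback, callModel, emit, log). The real seams are in failover-observation.ts and run.ts, and the log text below is not OpenClaw's actual format. The emit callback is assumed to be fire-and-forget, so it cannot block or abort the loop:

```typescript
// Hypothetical sketch of the "After" flow; names, payloads and log text are illustrative.
type HookName = "model_failover" | "model_failure_terminal";
type Emit = (hook: HookName, payload: Record<string, unknown>) => void; // assumed fire-and-forget

interface RunResult {
  ok: boolean;
  model?: string;
  reply?: string;
}

function runWithFallback(
  models: readonly string[],
  callModel: (model: string) => string, // throws on a provider error
  emit: Emit,
  log: (line: string) => void,
): RunResult {
  let attempts = 0;
  for (const [i, model] of models.entries()) {
    try {
      return { ok: true, model, reply: callModel(model) };
    } catch (err) {
      attempts += 1;
      const reason = err instanceof Error ? err.message : String(err);
      const next = models[i + 1];
      // Existing behavior: log the decision. New: emit the same facts as a hook.
      log(`failover decision: ${model} -> ${next ?? "none"} (${reason})`);
      emit("model_failover", { model, next, reason });
    }
  }
  // No usable model remains: the before-reply collapse.
  log("all models failed before reply");
  emit("model_failure_terminal", { attempts });
  return { ok: false };
}
```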

Security Impact (required)

  • New permissions/capabilities? (Yes)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation:
    • This adds new plugin observability hooks. To reduce leakage risk, the new event surface does not expose raw provider errors and uses the sanitized terminal message path for model_failure_terminal.
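One way to produce a sanitized terminal message is to redact token-like substrings, collapse multi-line provider output, and cap the length before the text reaches any hook. The regex patterns and the 200-character limit below are assumptions. This page does not show how OpenClaw's sanitized message path actually works:

```typescript
// Hypothetical sanitizer; OpenClaw's actual sanitized-message path is not shown here.
const SECRET_PATTERNS: RegExp[] = [
  /sk-[A-Za-z0-9_-]{8,}/g, // API-key-like tokens (assumed pattern)
  /Bearer\s+[A-Za-z0-9._-]+/gi, // bearer tokens echoed back in error text
];

function sanitizeTerminalMessage(raw: string, maxLength = 200): string {
  let text = raw;
  for (const pattern of SECRET_PATTERNS) {
    text = text.replace(pattern, "[redacted]");
  }
  // Collapse whitespace so multi-line provider blobs become a single line.
  text = text.replace(/\s+/g, " ").trim();
  // Truncate to maxLength characters, ending with an ellipsis.
  return text.length > maxLength ? `${text.slice(0, maxLength - 1)}…` : text;
}
```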

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: local source checkout
  • Model/provider: N/A (hook plumbing)
  • Integration/channel (if any): N/A
  • Relevant config (redacted): default local test config

Steps

  1. Register a plugin hook for model_failover or model_failure_terminal using the hook runner test helpers.
  2. Trigger the corresponding hook runner path or failover seam.
  3. Verify the handler receives the structured event payload and that hook failures do not break the main flow.
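The verification steps above can be mirrored in a dependency-free check. InMemoryHookRunner is a hypothetical stand-in for the hook runner test helpers mentioned in step 1; the real helpers and the Vitest suite are not shown here. The sketch records payloads, emits the same hook twice to cover repeated decisions, and reports zero handlers run when nothing is registered:

```typescript
// Hypothetical stand-in for the hook runner; not OpenClaw's real test helpers.
type HookName = "model_failover" | "model_failure_terminal";
type Handler = (event: Record<string, unknown>) => void;

class InMemoryHookRunner {
  private readonly handlers = new Map<HookName, Handler[]>();

  register(name: HookName, handler: Handler): void {
    this.handlers.set(name, [...(this.handlers.get(name) ?? []), handler]);
  }

  // Returns how many handlers ran; 0 means the emission was a no-op.
  emit(name: HookName, event: Record<string, unknown>): number {
    const list = this.handlers.get(name) ?? [];
    for (const handler of list) {
      try {
        handler(event);
      } catch {
        // Fail-open: a throwing handler must not abort the caller.
      }
    }
    return list.length;
  }
}

// Step 1: register a recording handler.
const runner = new InMemoryHookRunner();
const received: Record<string, unknown>[] = [];
runner.register("model_failover", (event) => { received.push(event); });

// Step 2: trigger the seam twice to cover repeated failover decisions.
runner.emit("model_failover", { failedModel: "model-a", reason: "timeout" });
runner.emit("model_failover", { failedModel: "model-b", reason: "rate_limit" });
```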

Expected

  • model_failover fires on failover decisions.
  • model_failure_terminal fires on terminal before-reply failures.
  • Hook execution is fire-and-forget / fail-open.
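Fire-and-forget and fail-open together mean the failover path neither waits for handlers nor lets them throw into it. Here is a minimal sketch of that contract, using a hypothetical dispatchVoid helper rather than OpenClaw's actual runVoidHook, whose signature is not shown on this page:

```typescript
// Hypothetical fire-and-forget dispatcher; OpenClaw's runVoidHook may differ.
type VoidHandler<E> = (event: E) => void | Promise<void>;

function dispatchVoid<E>(
  handlers: readonly VoidHandler<E>[],
  event: E,
  onError: (err: unknown) => void,
): void {
  for (const handler of handlers) {
    try {
      const result = handler(event);
      // Do not await: attach a catch so a rejected promise is reported, not thrown.
      Promise.resolve(result).catch(onError);
    } catch (err) {
      // Synchronous throw: report it and keep going (fail-open).
      onError(err);
    }
  }
}
```

In this sketch the dispatcher returns before async handlers settle, so a slow alerting webhook cannot add latency to the fallback decision. The trade-off is that hook delivery is best-effort.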

Actual

  • Added targeted tests for both hook types and failover-observation hook emission.
  • Full plugin Vitest suite passed after the change.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
    • reviewed the source seams where failover decisions and terminal failures are emitted
    • verified the new hook names are registered in the plugin hook surface
    • verified targeted tests pass for runner plumbing and failover-observation emission
    • verified the full plugin Vitest suite passes
  • Edge cases checked:
    • no-op when no hook is registered
    • fail-open behavior when a hook throws
    • repeated decision emissions still invoke hooks correctly
  • What you did not verify:
    • full-project tsc --noEmit on this host, because it OOMs due to an existing machine limitation
    • a live end-to-end provider outage scenario outside the test suite

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps:

Risks and Mitigations

  • Risk: Hook payload scope may be seen as too narrow for v1 because it omits richer source-model context.
    • Mitigation: Keep this PR intentionally minimal and source-backed; richer payload fields can be added in follow-up work if maintainers want them.
  • Risk: Hook emission could accidentally interfere with failover paths if blocking.
    • Mitigation: Both hooks are wired through existing fire-and-forget runVoidHook behavior.
  • Risk: Terminal failure event content could leak too much provider detail.
    • Mitigation: model_failure_terminal uses the sanitized message path and structured attempt summaries instead of raw provider error blobs.
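The mitigation for the third risk, structured attempt summaries instead of raw provider error blobs, might look like the following mapping. It keeps the provider, the model, a coarse reason, and the status code, and drops the body entirely. The RawAttempt and AttemptSummary types and the status-to-reason table are hypothetical, not the PR's implementation:

```typescript
// Hypothetical shapes: what the runtime sees vs. what a hook would receive.
interface RawAttempt {
  provider: string;
  model: string;
  status?: number;
  rawBody: string; // full provider error payload; may echo request data
}

type AttemptReason = "rate_limit" | "auth" | "server_error" | "unknown";

interface AttemptSummary {
  provider: string;
  model: string;
  reason: AttemptReason;
  status?: number;
}

// Assumed classification by HTTP status only; a real classifier would use more signals.
function classify(status: number | undefined): AttemptReason {
  if (status === 429) return "rate_limit";
  if (status === 401 || status === 403) return "auth";
  if (status !== undefined && status >= 500) return "server_error";
  return "unknown";
}

function summarizeAttempts(attempts: readonly RawAttempt[]): AttemptSummary[] {
  // rawBody is intentionally not destructured, so provider payloads never reach plugins.
  return attempts.map(({ provider, model, status }) => ({
    provider,
    model,
    reason: classify(status),
    ...(status !== undefined ? { status } : {}),
  }));
}
```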

Changed files

  • src/agents/pi-embedded-runner/run.ts (modified, +8/-0)
  • src/agents/pi-embedded-runner/run/failover-observation.model-failover-hook.test.ts (added, +138/-0)
  • src/agents/pi-embedded-runner/run/failover-observation.ts (modified, +51/-1)
  • src/auto-reply/reply/agent-runner-execution.ts (modified, +36/-1)
  • src/plugins/hook-types.ts (modified, +45/-0)
  • src/plugins/hooks.model-failover.test.ts (added, +231/-0)
  • src/plugins/hooks.ts (modified, +30/-0)

[Feature]: Emit first-class plugin hooks for model failover and terminal all-models-failed outcomes

Summary

Add first-class plugin hooks for model failover decisions and terminal "all models failed before reply" outcomes.

Problem to solve

OpenClaw already detects when a run fails over to another model and when a run collapses before reply because no usable model remains. Today those signals are mainly exposed through logs such as embedded_run_failover_decision and Embedded agent failed before reply: ....

That leaves a real automation gap:

  • alerting depends on polling or tailing gateway logs
  • plugins cannot react in-process to failover decisions
  • existing hooks like agent_end are too coarse to explain which model failed, whether fallback was attempted, or why the chain was exhausted

This is currently one part of the broader #65824 item about silent model fallback, but the hook/event gap is narrow enough to track separately.

Proposed solution

Add a dedicated plugin hook surface for:

  • model failover decisions during a run
  • terminal before-reply failures when no usable model remains

The goal is to let plugins and operators react to failover activity through structured in-process events instead of log parsing.

Alternatives considered

  • Keep relying on logs: works today, but it is brittle, delayed, and depends on parsing stderr text.
  • Reuse agent_end: too coarse, loses the actual failover chain and reason details.
  • Add more log lines only: improves observability but still does not create a real automation surface.

Impact

  • Affected: plugin authors, operators, alerting workflows, provider-health automation
  • Severity: annoying to workflow-blocking during provider outages
  • Frequency: intermittent, but high-value when it happens
  • Consequence: silent failures, delayed alerts, brittle monitoring, poor user-facing explanations

Evidence/examples

OpenClaw already has the relevant internal seam:

  • structured failover logging in src/agents/pi-embedded-runner/run/failover-observation.ts
  • terminal before-reply failure logging in src/agents/pi-embedded-runner/run.ts

That means the runtime already knows the right facts. The missing piece is a first-class plugin/event surface.

Related references:

  • #65824 meta issue, item #1 explicitly notes there is no externally exposed fallback event for hooks
  • #63821 asks for better failover observability via logs, which is related but weaker than a real hook surface
  • #61485 expands plugin hook capability in nearby lifecycle areas, showing precedent for growing the hook system
  • #64085 wants provider health / circuit-breaker behavior, which also benefits from structured failover events

Additional information

This issue is intentionally scoped only to the hook/event gap. It does not try to solve the other parts of #65824 item #1, such as non-verbose fallback notices or workspace-context re-injection.

Extent analysis

TL;DR

Implement a dedicated plugin hook surface for model failover decisions and terminal before-reply failures to enable in-process event handling.

Guidance

  • Identify the existing internal seams in failover-observation.ts and run.ts that already log relevant failover information.
  • Design a new plugin hook surface that exposes structured events for model failover decisions and terminal failures.
  • Consider the requirements of plugin authors, operators, and alerting workflows to ensure the new hook surface provides sufficient information.
  • Review related issues (#65824, #63821, #61485, #64085) to understand the broader context and potential use cases.

Example

The exact shape depends on implementation details not covered here; the new hook surface might take the form of a custom event emitter or a callback interface.
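As a sketch of the custom event emitter option mentioned above (not what PR #70990 does; the PR routes both hooks through the existing runVoidHook path), a typed wrapper around Node's EventEmitter could look like this. The event map fields are assumptions. EventEmitter.emit calls listeners synchronously, and a throwing listener would otherwise propagate to the caller and skip later listeners, so each listener is wrapped to keep emission fail-open:

```typescript
import { EventEmitter } from "node:events";

// Hypothetical event map; names mirror the PR's hooks, fields are assumptions.
interface FailoverEvents {
  model_failover: { failedModel: string; nextModel?: string; reason: string };
  model_failure_terminal: { message: string; attemptCount: number };
}

class FailoverEventBus {
  private readonly emitter = new EventEmitter();
  readonly listenerErrors: unknown[] = [];

  // Wrap each listener so one failing plugin cannot break the run or other listeners.
  on<K extends keyof FailoverEvents>(name: K, listener: (event: FailoverEvents[K]) => void): void {
    this.emitter.on(name, (event: FailoverEvents[K]) => {
      try {
        listener(event);
      } catch (err) {
        this.listenerErrors.push(err);
      }
    });
  }

  // Returns true if at least one listener was registered for this event.
  emit<K extends keyof FailoverEvents>(name: K, event: FailoverEvents[K]): boolean {
    return this.emitter.emit(name, event);
  }
}
```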

Notes

This solution focuses on the hook/event gap and does not attempt to solve other aspects of #65824 item #1. The new hooks should be emitted alongside the existing log lines rather than replacing them, so current log-based monitoring keeps working.

Recommendation

Implement the proposed dedicated plugin hook surface (as PR #70990 does) instead of relying on log parsing or reusing existing hooks; it is a more robust and scalable solution.
