openclaw: Plugin hooks register twice per gateway run (gateway loader + embedded acpx loader); first ~7s window has telemetry-null hooks [1 comment, 1 participant]

openclaw/openclaw#72943 · Fetched 2026-04-28 06:29:50
Comments: 1 · Participants: 1 · Timeline: 2 (closed ×1, commented ×1) · Reactions: 0

Error Message

  • before_agent_start fires → handler closure tries to call telemetry.tracer.startSpan(...) → telemetry is null/undefined → silent failure or thrown error caught by the hook dispatcher.

Root Cause

In plain English: the same plugin's hooks register twice during gateway startup — once via the gateway's plugin loader at ~+2 seconds, and a second time via a separate "plugins" loader at ~+9 seconds (after the embedded acpx runtime starts). The plugin's service.start() callback only runs on the second pass. So during the first ~7 seconds after the gateway logs "ready", any agent invocation has hooks fire but produce no spans (the closure-resolved telemetry is null because service.start() hasn't run yet). It's a timing-dependent silent dropout — invisible under normal use, breaks first-traffic-after-restart workflows, and doubles diagnostic-log noise.

Problem

In a single gateway start, the deep-observability plugin's hook-registration block is logged twice:

2026-04-27T15:51:29.946+00:00 [gateway] [otel] Registered message_received hook (via api.on)
2026-04-27T15:51:29.947+00:00 [gateway] [otel] Registered message_sent hook (via api.on)
... (full hook list, source = [gateway])

2026-04-27T15:51:38.084+00:00 [plugins] [otel] Subscribed to OpenClaw diagnostic events
2026-04-27T15:51:38.085+00:00 [plugins] [otel] ✅ Integrated with OpenClaw diagnostics (cost tracking enabled)
2026-04-27T15:51:38.085+00:00 [plugins] [otel] ✅ Observe PoC pipeline active
...
2026-04-27T15:51:38.454+00:00 [plugins] [otel] Registered message_received hook (via api.on)
... (full hook list AGAIN, source = [plugins])

The first registration comes from the gateway's plugin loader at +2.0s after gateway boot. The second comes from the embedded acpx runtime backend's plugin loader at +9.5s. Both passes register hook handlers through api.on(...), but the plugin's service.start() callback (which calls initTelemetry()) fires only on the SECOND pass: the "[otel] ✅ Observe PoC pipeline active" line appears only once, under the [plugins] source at +9.5s.

Implication

During the gap between +2s (hooks registered) and +9.5s (telemetry initialized), agent invocations have their hooks fire but produce no spans:

  • before_agent_start fires → handler closure tries to call telemetry.tracer.startSpan(...) → telemetry is null/undefined → silent failure or thrown error caught by the hook dispatcher.
  • The hook handlers from the first (gateway-loader) registration don't have telemetry wired at all.
  • Even after telemetry init, both loaders' handlers fire for each event, doubling work.
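The three effects above can be reproduced in a small self-contained model. This is only a sketch under stated assumptions: HookBus, Telemetry, and simulateBoot are hypothetical stand-ins written for illustration, not OpenClaw's actual loader or dispatcher, and the explicit throw stands in for whatever the real handler does when telemetry is missing.

```typescript
// Hypothetical, simplified model of the boot sequence; not OpenClaw's real API.
type Handler = (event: string) => void;

interface Telemetry {
  startSpan(name: string): void;
}

class HookBus {
  dropped = 0; // handler errors swallowed by the dispatcher
  private handlers = new Map<string, Handler[]>();

  on(hook: string, handler: Handler): void {
    const list = this.handlers.get(hook) ?? [];
    list.push(handler);
    this.handlers.set(hook, list);
  }

  emit(hook: string): void {
    for (const handler of this.handlers.get(hook) ?? []) {
      try {
        handler(hook);
      } catch (_err) {
        this.dropped += 1; // mirrors a dispatcher that catches handler errors
      }
    }
  }
}

interface BootResult {
  spansInBootWindow: number;
  droppedInBootWindow: number;
  spansAfterStart: number;
}

function simulateBoot(): BootResult {
  const bus = new HookBus();
  const spans: string[] = [];
  let telemetry: Telemetry | null = null; // assigned only by start()

  // register(): runs on every loader pass; the handler closes over `telemetry`.
  const register = (): void => {
    bus.on("before_agent_start", (event) => {
      if (telemetry === null) throw new Error("telemetry not initialized");
      telemetry.startSpan(event);
    });
  };

  // service.start(): in the reported behavior, runs on the second pass only.
  const start = (): void => {
    telemetry = { startSpan: (name) => { spans.push(name); } };
  };

  register();                     // pass 1: gateway loader
  bus.emit("before_agent_start"); // agent invoked inside the boot window
  const spansInBootWindow = spans.length;
  const droppedInBootWindow = bus.dropped;

  register();                     // pass 2: embedded runtime loader
  start();
  bus.emit("before_agent_start"); // agent invoked after telemetry init

  return {
    spansInBootWindow,
    droppedInBootWindow,
    spansAfterStart: spans.length - spansInBootWindow,
  };
}
```

In this model the boot-window invocation yields zero spans and one swallowed error, while the later invocation yields two spans for a single event because both registrations fire.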

This is a real source of dropped first-after-restart traffic, particularly for short-lived sessions or restart-resilience tests. It probably also contributes to the partial sub-agent span coverage seen in outshift-open/openclaw-deep-observability#38, where sub-agent spawning races with the second loader's init.

Reproducer

Start the gateway in any sandbox or host environment with a plugin that has both:

  • A synchronous register() that calls api.registerHook() and api.on().
  • A service.start() callback that finishes async setup (e.g. OTel SDK init).

Tail the gateway log; observe two registration blocks at different timestamps with [gateway] then [plugins] source prefixes.
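Spotting the two blocks by eye works, but the check can also be scripted. The sketch below assumes the log line layout shown in the excerpt above (ISO timestamp, then a [source] prefix, then "[otel] Registered <hook> hook"); the function names parseRegistrations and findDoubleRegistrations are made up for this example, and the pattern may need adjusting for other plugins or log formats.

```typescript
// Assumes the line layout from the excerpt above; names are illustrative only.
interface Registration {
  timestamp: Date;
  source: string; // e.g. "gateway" or "plugins"
  hook: string;   // e.g. "message_received"
}

interface DoubleRegistration {
  hook: string;
  sources: string[];
  gapMs: number; // time between first and last registration of the hook
}

const REGISTERED = /^(\S+) \[([^\]]+)\] \[otel\] Registered (\S+) hook\b/;

function parseRegistrations(log: string): Registration[] {
  const out: Registration[] = [];
  for (const line of log.split("\n")) {
    const match = REGISTERED.exec(line.trim());
    if (match === null) continue; // not a hook-registration line
    const [, ts, source, hook] = match;
    if (ts && source && hook) {
      out.push({ timestamp: new Date(ts), source, hook });
    }
  }
  return out;
}

function findDoubleRegistrations(regs: Registration[]): DoubleRegistration[] {
  // Group registrations by hook name.
  const byHook = new Map<string, Registration[]>();
  for (const reg of regs) {
    const list = byHook.get(reg.hook) ?? [];
    list.push(reg);
    byHook.set(reg.hook, list);
  }

  // Report hooks that were registered from more than one source.
  const result: DoubleRegistration[] = [];
  byHook.forEach((list, hook) => {
    const sources = Array.from(new Set(list.map((r) => r.source)));
    if (sources.length < 2) return;
    const times = list.map((r) => r.timestamp.getTime());
    result.push({ hook, sources, gapMs: Math.max(...times) - Math.min(...times) });
  });
  return result;
}
```

Feeding it the captured gateway log should list each hook that appears under both [gateway] and [plugins], along with the time between the two registrations.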

Tested-against: OpenClaw v2026.4.9 with outshift-open/openclaw-deep-observability (deep-observability main HEAD 2026-04-26).

Proposed fix

Option A — Single-load the plugin under one consistent registry.

Investigate whether the gateway's plugin loader and the embedded acpx runtime backend's plugin loader can share state. The double-load looks like an artifact of the embedded runtime being a separate plugin host. Coalescing them is the right shape but may require non-trivial refactoring.

Option B — Make hook registration idempotent.

The plugin loader should track already-registered hooks for each plugin id and refuse to double-register. Cheap to implement; eliminates the double-handler-fire side effect even if the timing gap remains.
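A minimal sketch of what Option B could look like, assuming registrations are keyed on the plugin id plus hook name (the combination named in the test plan). IdempotentHookRegistry and its methods are hypothetical; the real loader's data structures are not shown in this issue.

```typescript
// Hypothetical registry; illustrates the idea, not OpenClaw's implementation.
type HookHandler = (payload: unknown) => void;

class IdempotentHookRegistry {
  // hook name -> (plugin id -> handler)
  private readonly byHook = new Map<string, Map<string, HookHandler>>();

  /** Returns true if the handler was stored, false if it was a duplicate. */
  register(pluginId: string, hook: string, handler: HookHandler): boolean {
    let byPlugin = this.byHook.get(hook);
    if (byPlugin === undefined) {
      byPlugin = new Map<string, HookHandler>();
      this.byHook.set(hook, byPlugin);
    }
    if (byPlugin.has(pluginId)) return false; // second pass: refuse to re-register
    byPlugin.set(pluginId, handler);
    return true;
  }

  /** Invokes every handler for the hook; returns how many ran. */
  emit(hook: string, payload: unknown): number {
    let invoked = 0;
    this.byHook.get(hook)?.forEach((handler) => {
      handler(payload);
      invoked += 1;
    });
    return invoked;
  }
}
```

Two design choices are worth weighing before adopting something like this. Refusing the second registration keeps the first pass's handler, which only helps if that handler can see services initialized later; replacing the stored handler on the second pass is the alternative. Keying on plugin id plus hook name also collapses multiple handlers that one plugin might legitimately attach to the same hook, so a real implementation may need a finer-grained key.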

Option C — Document the two-phase load explicitly so plugin authors know when they can rely on services.

Plugins would learn to put hook registration in register() (idempotent under double-load if option B lands) and ALL business logic that depends on services in service.start(). This matches what the deep-observability plugin already does — but the contract is undocumented, so most plugin authors won't know to follow it.

A + B + C together is the right shape; B + C alone closes the user-visible bug at lower cost.

Alternatives considered

  • Delay agent acceptance until both registration passes complete. Heavy; would require tracking "are we in the boot window" and queueing requests. Doesn't fit the gateway's startup model.
  • Disable the embedded acpx runtime backend by default. Out of scope — that runtime exists for valid reasons.

Test plan

  • Repro test: start gateway with the deep-observability plugin (or a minimal repro plugin), invoke an agent immediately after gateway logs "ready", assert that spans are produced (not dropped due to telemetry-not-yet-initialized).
  • Idempotency test (option B): assert hook registration is no-op on second register call for the same plugin/hook combination.
  • Regression: existing single-load plugins continue to work.

Risk / blast radius

  • Option B (idempotent hook registration) is the lowest-risk fix and addresses the user-visible double-handler-fire side effect. Backwards-compatible.
  • Option A (single-load coalescing) has higher refactor cost and may require coordination with the embedded runtime owners.
  • Option C (docs) is risk-free but only helps plugin authors who read the docs.

Open questions

  1. Is the double-load expected / by design, or accidental? If by design, what's the intent?
  2. Are the gateway plugin loader and embedded acpx runtime backend's plugin loader code paths the same module under different invocation contexts, or genuinely different code?
  3. Do other plugins (besides deep-observability) hit this? If so, are they known to be partially broken in the boot window?

Sibling findings filed/being filed: plugin install path (#72938), mDNS crash (#72939), async register truncated (#72941), gateway stop no-op (#72942).

extent analysis

TL;DR

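Implement idempotent hook registration (Option B) to eliminate the double-handler-fire side effect, and document the two-phase load (Option C) so plugin authors keep service-dependent logic in service.start(). Option B alone does not close the timing gap before telemetry is initialized.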

Guidance

  • Investigate the feasibility of Option B (idempotent hook registration), which involves tracking already-registered hooks for each plugin id and refusing to double-register.
  • Consider implementing Option C (document the two-phase load explicitly) to inform plugin authors about the expected behavior and ensure they put hook registration in register() and business logic in service.start().
  • Evaluate the potential benefits and risks of Option A (single-load the plugin under one consistent registry), which may require non-trivial refactoring.
  • Test the proposed fixes using the provided reproducer and test plan to ensure that spans are produced correctly and idempotency is maintained.

Notes

The proposed fixes aim to address the user-visible bug and improve the overall reliability of the system. However, the root cause of the double-load issue and the intent behind the current design are still unclear, which may impact the effectiveness of the proposed solutions.

Recommendation

Apply Option B (idempotent hook registration) as it is the lowest-risk fix and addresses the user-visible double-handler-fire side effect, while also being backwards-compatible. This solution can be implemented independently of the other options and provides a clear improvement to the current behavior.
