openclaw - ✅(Solved) Fix CLI silently falls back to embedded mode when gateway is unreachable [1 pull requests, 1 comments, 2 participants]

openclaw/openclaw#71416 (fetched 2026-04-26 05:13:03)

When the gateway is unreachable, openclaw agent silently falls back to embedded mode and writes a single line to stderr that's easy to miss. There's no banner on stdout, no exit-code change, and the JSON result looks superficially like a normal gateway response. Operators who don't tail .err will believe a gateway-side run happened when it didn't.

Error Message

Gateway agent failed; falling back to embedded: Error: gateway closed (1006 abnormal closure (no close frame)): no close reason

Root Cause

Embedded vs. gateway runs diverge on observability (no OTel spans), on tool/plugin loadout, and on session-state semantics. Silent fallback creates a class of "I thought I tested X but I tested embedded-X" bugs that are very hard to diagnose. We hit this in our 2026-04-25 retest after the gateway crashed (separately filed).

Fix Action

Fixed

PR fix notes (cross-referenced PR; it closes #71414 and lists #71416 as out of scope)

PR #71478: fix(heartbeat): clamp scheduler delay to Node setTimeout cap (#71414)

Description (problem / solution / changelog)

Closes #71414.

Bug

When agents.defaults.heartbeat.every resolves to >2_147_483_647 ms (~24.85d), scheduleNext() in src/infra/heartbeat-runner.ts called setTimeout(fn, delay) with the raw oversized delay. Node clamps any delay > 2^31-1 to 1 ms, fires the callback, and the heartbeat re-arms with the same oversized value — a tight loop that floods logs with TimeoutOverflowWarning: ... Timeout duration was set to 1. and crashes the gateway with exit code 1.

Reproduces with the reporter's recipe: { "agents": { "defaults": { "heartbeat": { "every": "365d" } } } }.

Fix

Clamp the computed delay to HEARTBEAT_MAX_TIMEOUT_MS = 2_147_483_647 ms before calling setTimeout. Worst case is now one heartbeat every ~24.85d instead of crash-loop. Warn once per process when the clamp fires, so a misconfigured 365d is still visible without flooding logs.
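
The clamp described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the constant name `HEARTBEAT_MAX_TIMEOUT_MS` comes from the PR description, while `clampTimerDelay`, its `warn` parameter, and `ONE_YEAR_MS` are hypothetical names chosen for this example. The real helper in src/utils/timer-delay.ts may look different.

```typescript
// Largest delay Node's setTimeout accepts: 2^31 - 1 ms (about 24.85 days).
const HEARTBEAT_MAX_TIMEOUT_MS = 2_147_483_647;

// "365d" expressed in milliseconds: 31_536_000_000, well above the cap.
const ONE_YEAR_MS = 365 * 24 * 60 * 60 * 1000;

// Module-level flag so the warning is emitted at most once per process.
let warnedAboutClamp = false;

// Hypothetical helper: returns a delay that is safe to pass to setTimeout.
function clampTimerDelay(
  delayMs: number,
  warn: (message: string) => void = console.warn,
): number {
  // Treat NaN and negative values as "fire as soon as possible".
  if (Number.isNaN(delayMs) || delayMs < 0) {
    return 0;
  }
  if (delayMs > HEARTBEAT_MAX_TIMEOUT_MS) {
    if (!warnedAboutClamp) {
      warnedAboutClamp = true;
      warn(
        `heartbeat delay ${delayMs} ms exceeds the setTimeout cap; ` +
          `clamping to ${HEARTBEAT_MAX_TIMEOUT_MS} ms`,
      );
    }
    return HEARTBEAT_MAX_TIMEOUT_MS;
  }
  return delayMs;
}
```

A scheduler would then call `setTimeout(fn, clampTimerDelay(delay))` instead of passing the raw delay, so an oversized value waits the maximum supported time rather than firing after 1 ms.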

This is a defense-in-depth fix at the scheduler layer. loadConfig-level rejection (suggested in the issue) is a broader change with more blast radius and a separate semantic question — some users likely want every: 365d to mean "effectively never", and the clamped behaviour matches that intent better than a hard error does.

Test

New src/infra/heartbeat-runner.scheduler.test.ts case: sets heartbeat.every: "365d" with fake timers, advances 60s, and asserts runSpy was never invoked. With the bug present, runSpy would have been called tens of thousands of times during the advance.

Lint clean: pnpm oxlint src/infra/heartbeat-runner.ts src/infra/heartbeat-runner.scheduler.test.ts — 0 warnings, 0 errors.

Out of scope (deliberately)

  • Wrapper/supervisor auto-respawn after gateway exit code 1 (mentioned in the issue) — that lives in container/wrapper code, separate concern.
  • CLI silent embedded-mode fallback — tracked separately at #71416.

🤖 generated with assistance from Claude Code Co-authored-by: HCL [email protected]

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/agents/runtime-auth-refresh.ts (modified, +2/-2)
  • src/gateway/call.ts (modified, +2/-1)
  • src/gateway/client.ts (modified, +3/-2)
  • src/gateway/probe.ts (modified, +3/-2)
  • src/gateway/server-chat.ts (modified, +3/-3)
  • src/gateway/server-methods/agent-job.ts (modified, +4/-5)
  • src/gateway/server-methods/agent-wait-dedupe.ts (modified, +2/-2)
  • src/infra/heartbeat-runner.scheduler.test.ts (modified, +18/-0)
  • src/infra/heartbeat-runner.timeout-warning.test.ts (added, +70/-0)
  • src/infra/heartbeat-runner.ts (modified, +11/-1)
  • src/utils/timer-delay.test.ts (added, +34/-0)
  • src/utils/timer-delay.ts (added, +19/-0)


Reproduction

  1. Stop the gateway (e.g. kill the openclaw-gateway process inside the container, or unset its port).
  2. Run any openclaw agent invocation.
  3. Inspect stdout vs. stderr.

Observed behaviour

stdout (the --json payload) contains a normal-looking run result. stderr contains exactly:

Gateway agent failed; falling back to embedded: Error: gateway closed (1006 abnormal closure (no close frame)): no close reason

Exit code is 0. The result is not the same as a gateway run (no telemetry/observability spans land in the otel pipeline, sub-agent context can differ, tool registries may differ).
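
Because the exit code and stdout give no signal, the only way to detect the fallback today is to inspect stderr. A minimal sketch of such a check, assuming the message prefix stays as quoted above (it may change between versions); `FALLBACK_MARKER` and `ranEmbeddedFallback` are hypothetical names, not part of openclaw:

```typescript
// Prefix of the stderr line quoted in this issue. Assumption: the wording
// is stable enough to match on; a future release may change it.
const FALLBACK_MARKER = "Gateway agent failed; falling back to embedded";

// Returns true when any line of the captured stderr text contains the marker.
function ranEmbeddedFallback(stderrText: string): boolean {
  return stderrText
    .split(/\r?\n/)
    .some((line) => line.includes(FALLBACK_MARKER));
}
```

A wrapper script could capture stderr from each run and fail its own job when `ranEmbeddedFallback` returns true, which restores the non-zero exit that the CLI does not provide.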

Expected behaviour

At least one of:

  • Print a clearly-marked banner on stdout (e.g. ⚠ EMBEDDED FALLBACK — gateway unreachable) before the JSON payload.
  • Emit a meta.transport: "embedded" field in the JSON result so downstream consumers can detect.
  • Exit non-zero unless --allow-embedded-fallback is passed (opt-in, not opt-out).
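
Two of the proposals above, the `meta.transport` field and the opt-in exit code, can be sketched together. This is an illustration of the requested behaviour under stated assumptions, not existing openclaw code: `meta.transport` and `--allow-embedded-fallback` are the issue's proposals, and `Transport`, `RunResult`, `tagTransport`, and `exitCodeFor` are hypothetical names.

```typescript
// Which path actually executed the run.
type Transport = "gateway" | "embedded";

// Assumed loose shape of the JSON result; the real schema is not shown in the issue.
interface RunResult {
  [key: string]: unknown;
  meta?: Record<string, unknown>;
}

// Returns a copy of the result with meta.transport set, preserving other fields.
function tagTransport(result: RunResult, transport: Transport): RunResult {
  return { ...result, meta: { ...(result.meta ?? {}), transport } };
}

// Embedded fallback exits non-zero unless the caller explicitly opted in.
function exitCodeFor(transport: Transport, allowEmbeddedFallback: boolean): number {
  return transport === "embedded" && !allowEmbeddedFallback ? 1 : 0;
}
```

With this shape, a downstream consumer only needs to read `meta.transport` from the JSON payload, and scripts that never pass the opt-in flag fail loudly when the gateway is down.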

Environment

OpenClaw 2026.4.12 (1c0672b).

extent analysis

TL;DR

To address the silent fallback issue, modify the openclaw agent to print a clear banner on stdout or include a meta.transport field in the JSON result when falling back to embedded mode.

Guidance

  • Verify the current behavior by stopping the gateway, running any openclaw agent invocation, and comparing stdout, stderr, and the exit code.
  • Consider adding an --allow-embedded-fallback option so that fallback is opt-in, with the default being a non-zero exit when the gateway is unreachable.
  • Add a meta.transport: "embedded" field to the JSON result whenever the run executes in embedded mode.
  • Print a clearly marked banner on stdout before the JSON payload when falling back to embedded mode.

Example

No code snippet is provided due to the lack of specific implementation details.

Notes

All of these changes belong in the openclaw CLI code path that catches the gateway failure and falls back to embedded mode. PR #71478 does not implement any of them; it lists the silent fallback as out of scope.

Recommendation

Until the CLI signals the fallback itself, check stderr for the "Gateway agent failed; falling back to embedded" line after each run. Longer term, add the stdout banner or the meta.transport field so that an embedded run cannot be mistaken for a gateway run.
