openclaw - ✅(Solved) Fix Gateway service pinned to old install path after upgrade; newer path breaks /v1/chat/completions reliability [1 pull request, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#74047 · Fetched 2026-04-30 06:29:23
Comments: 1 · Participants: 2 · Timeline: 3 · Reactions: 0
Timeline (top): cross-referenced ×2, commented ×1

A staged migration to a newer OpenClaw install exposed a split-install / service-path problem:

  • a newer user install existed at /home/ubuntu/.npm-global/lib/node_modules/openclaw (2026.4.26)
  • the active gateway service was still pinned to the older system install at /usr/lib/node_modules/openclaw (2026.4.21)
  • local/runtime tests against the newer install could recognize and use openai-codex/gpt-5.5
  • but when the gateway service was repointed to the newer install, /v1/chat/completions became unreliable / timed out
  • rolling the service back to the older path restored stability

This makes upgrades confusing and blocks safe promotion of newer models, because CLI/runtime behavior and service behavior diverge depending on which install path is active.

Error Message

ExecStart="/usr/bin/node" --unhandled-rejections=warn "/usr/lib/node_modules/openclaw/dist/entry.js" gateway --port 18789


Root Cause

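The gateway unit's ExecStart hard-codes the older system install at /usr/lib/node_modules/openclaw (2026.4.21), so upgrading the user install to 2026.4.26 does not change the runtime the service actually launches. CLI/runtime behavior and service behavior therefore diverge depending on which install path is active. The issue does not establish why /v1/chat/completions times out once the service is repointed to the newer install.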

Fix Action

Fixed

PR fix notes

PR #70306: fix(acp+gateway): clean final emit, fallback visibility, legacy unit resolve

Description (problem / solution / changelog)

Problem

Three related rough edges in ACP/Codex orchestration and gateway ops:

  1. Parent sessions could fail to surface a coherent final answer after sessions_spawn(runtime="acp"). Mid-flight snippet flushes compacted whitespace and truncated aggressively, so short multi-line Codex key/value output could collapse into a single line before any clean final answer reached the parent.
  2. Harness fallback visibility was weak. Operators could not easily answer "did this actually run in Codex or fall back?" without grepping logs.
  3. Legacy hosts running clawdbot-gateway.service exposed CLAWDBOT_SYSTEMD_UNIT, but the relevant resolver paths only honored OPENCLAW_SYSTEMD_UNIT, so status/restart tooling could target the canonical openclaw-gateway.service instead of the actual running unit.

Solution

  • src/agents/acp-spawn-parent-stream.ts
    • Accumulate non-commentary child output into a dedicated final buffer.
    • On phase === "end", emit a single normalized <agent> final: system event with preserved newlines, line-aware truncation, and whitespace cleanup.
    • On phase === "error", surface the partial transcript before the error text.
    • Keep the existing mid-flight snippet behavior for progress chatter.
  • src/agents/harness/selection.ts
    • Record a 16-slot in-memory ring of harness selection diagnostics with requested runtime, selected harness, fallback usage, and reason.
  • src/auto-reply/reply/commands-acp/diagnostics.ts
    • Extend /acp doctor to show recent harness selections, recent harness fallbacks, and the last 5 diagnostic entries.
  • src/daemon/systemd.ts and src/cli/update-cli/restart-helper.ts
    • Fall back to CLAWDBOT_SYSTEMD_UNIT when OPENCLAW_SYSTEMD_UNIT is unset so legacy hosts resolve the actual running unit correctly.
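
The unit-name fallback described in the last bullet can be sketched as follows. This is an illustration under assumptions, not the code in src/daemon/systemd.ts: `resolveGatewayUnit` is a hypothetical name, the environment is passed in as a plain record so the sketch stays self-contained, and it assumes the variables hold a full unit name.

```typescript
// Hypothetical helper illustrating the legacy-unit fallback; names are assumptions.
type UnitEnv = Record<string, string | undefined>;

// Canonical unit name mentioned in the PR description.
const DEFAULT_GATEWAY_UNIT = "openclaw-gateway.service";

function resolveGatewayUnit(env: UnitEnv): string {
  // Prefer the current variable when it is set to a non-empty value.
  const preferred = (env["OPENCLAW_SYSTEMD_UNIT"] ?? "").trim();
  if (preferred !== "") return preferred;

  // Fall back to the legacy variable exposed by older hosts.
  const legacy = (env["CLAWDBOT_SYSTEMD_UNIT"] ?? "").trim();
  if (legacy !== "") return legacy;

  // Neither is set: use the canonical unit.
  return DEFAULT_GATEWAY_UNIT;
}
```

On a legacy host that only exports CLAWDBOT_SYSTEMD_UNIT=clawdbot-gateway.service, this returns the legacy unit instead of the canonical one, which is the behavior the PR describes for status/restart tooling.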

Files changed

  • src/agents/acp-spawn-parent-stream.ts
  • src/agents/acp-spawn-parent-stream.final-output.test.ts (new)
  • src/agents/harness/selection.ts
  • src/agents/harness/selection.test.ts
  • src/auto-reply/reply/commands-acp/diagnostics.ts
  • src/cli/update-cli/restart-helper.ts
  • src/cli/update-cli/restart-helper.test.ts
  • src/daemon/systemd.ts

Tests run

  • pnpm test src/agents/acp-spawn-parent-stream.test.ts
  • pnpm test src/agents/acp-spawn-parent-stream.final-output.test.ts
  • pnpm test src/agents/harness/selection.test.ts
  • pnpm test src/auto-reply/reply/commands-acp.test.ts
  • pnpm test src/cli/update-cli/restart-helper.test.ts
  • pnpm test src/daemon/inspect.test.ts
  • pnpm tsgo
  • pnpm format:check on touched files
  • repo pre-commit hook (oxlint / oxfmt) clean

Limitations

  • Parent follow-through is mitigated, not fully fixed.
  • sessions_spawn(runtime="acp") still returns an accepted envelope immediately; the clean child final reaches the parent via the relay side-channel rather than the tool result itself.
  • A true attach-and-wait or resume-on-child-completion primitive would be a larger behavior change and should land separately.
  • The new smoke coverage is unit-test based, not a live Codex roundtrip.
  • The harness diagnostic ring is process-local and resets on gateway restart.
  • Docs / changelog are not updated here.
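
For context on the process-local ring mentioned above, a fixed-capacity in-memory ring could look roughly like this. It is a sketch only: `DiagnosticRing` and the `HarnessSelection` field names are hypothetical, and the real structure in src/agents/harness/selection.ts may differ.

```typescript
// Hypothetical shape of one diagnostic entry; field names are assumptions.
interface HarnessSelection {
  requestedRuntime: string;
  selectedHarness: string;
  usedFallback: boolean;
  reason: string;
}

// Fixed-capacity ring: once full, recording a new entry drops the oldest one.
class DiagnosticRing<T> {
  private readonly slots: T[] = [];
  private readonly capacity: number;

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
  }

  record(entry: T): void {
    this.slots.push(entry);
    if (this.slots.length > this.capacity) this.slots.shift();
  }

  // Returns up to `count` most recent entries, oldest first.
  recent(count: number = this.capacity): T[] {
    return count <= 0 ? [] : this.slots.slice(-count);
  }
}

// 16 slots, matching the size stated in the PR description.
const harnessDiagnostics = new DiagnosticRing<HarnessSelection>(16);
```

Because the ring lives in process memory, its contents disappear when the gateway restarts, which is the limitation noted in the list.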

Explicit note on parent follow-through

This change substantially improves the normal-path symptom where the parent appeared to "move on" without surfacing a clean final answer. The relay now emits a coherent final system event on completion, and the compact-whitespace collapse bug is covered by tests. However, the underlying architecture is unchanged: if the parent stops generating before the child-completion event arrives, the parent can still appear to move on. A true fix requires a larger orchestration change.
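
To make "preserved newlines, line-aware truncation, and whitespace cleanup" concrete, here is a minimal sketch of one way such normalization could work. `normalizeFinalOutput`, the `[truncated]` marker, and the character budget are all assumptions for illustration; the actual logic in src/agents/acp-spawn-parent-stream.ts is not shown in this page.

```typescript
// Hypothetical normalizer: keeps line breaks, tidies whitespace inside lines,
// and truncates only at line boundaries.
function normalizeFinalOutput(raw: string, maxChars: number): string {
  const cleaned: string[] = [];
  for (const rawLine of raw.replace(/\r\n?/g, "\n").split("\n")) {
    // Collapse runs of spaces/tabs and drop the trailing space, if any.
    const line = rawLine.replace(/[ \t]+/g, " ").replace(/ $/, "");
    // Skip leading blank lines and collapse consecutive blank lines into one.
    const previous = cleaned.length > 0 ? cleaned[cleaned.length - 1] : "";
    if (line === "" && previous === "") continue;
    cleaned.push(line);
  }
  // Drop a trailing blank line left by a final newline.
  while (cleaned.length > 0 && cleaned[cleaned.length - 1] === "") cleaned.pop();

  const kept: string[] = [];
  let used = 0;
  for (const line of cleaned) {
    // Each line after the first also costs one newline character.
    const cost = line.length + (kept.length > 0 ? 1 : 0);
    if (used + cost > maxChars) {
      kept.push("[truncated]"); // marker text is an assumption
      break;
    }
    kept.push(line);
    used += cost;
  }
  return kept.join("\n");
}
```

With this approach, short multi-line key/value output stays on separate lines instead of collapsing into one, and an over-budget transcript is cut between lines rather than mid-line.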

Changed files

  • src/agents/acp-spawn-parent-stream.final-output.test.ts (added, +224/-0)
  • src/agents/acp-spawn-parent-stream.ts (modified, +73/-2)
  • src/agents/harness/selection.test.ts (modified, +57/-1)
  • src/agents/harness/selection.ts (modified, +58/-0)
  • src/auto-reply/reply/commands-acp/diagnostics.ts (modified, +26/-0)
  • src/cli/update-cli/restart-helper.test.ts (modified, +14/-0)
  • src/cli/update-cli/restart-helper.ts (modified, +8/-0)
  • src/daemon/systemd.ts (modified, +11/-0)

Code Example

ExecStart="/usr/bin/node" --unhandled-rejections=warn "/usr/lib/node_modules/openclaw/dist/entry.js" gateway --port 18789


Environment / evidence

Current machine state after rollback:

  • /home/ubuntu/.npm-global/lib/node_modules/openclaw: 2026.4.26
  • /usr/lib/node_modules/openclaw: 2026.4.21
  • /usr/bin/openclaw currently resolves to the system install (/usr/lib/node_modules/openclaw/openclaw.mjs)
  • clawdbot-gateway.service is pinned to:
ExecStart="/usr/bin/node" --unhandled-rejections=warn "/usr/lib/node_modules/openclaw/dist/entry.js" gateway --port 18789
  • active model after rollback:
    • primary: openai-codex/gpt-5.4
    • fallback: openai-codex/gpt-5.4
  • clawdbot-gateway.service is active again
  • /v1/chat/completions smoke test returned OK after rollback
  • config validates after rollback
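
The pinned path in the ExecStart line above can be checked mechanically. The following is a minimal sketch, not OpenClaw code: `installRootFromExecStart` and `isPinnedElsewhere` are hypothetical helpers, and the pattern assumes the entry script lives at `<root>/dist/entry.js` under a `node_modules/openclaw` directory, as in the unit shown.

```typescript
// Hypothetical helper: extract the OpenClaw install root from an ExecStart line.
function installRootFromExecStart(execStart: string): string | null {
  // Capture the (optionally quoted) path that ends in /node_modules/openclaw,
  // immediately followed by /dist/entry.js.
  const match = execStart.match(/"?([^"\s]+\/node_modules\/openclaw)\/dist\/entry\.js"?/);
  return match?.[1] ?? null;
}

// True when the service launches an install other than the preferred one.
function isPinnedElsewhere(execStart: string, preferredRoot: string): boolean {
  const root = installRootFromExecStart(execStart);
  return root !== null && root !== preferredRoot;
}

// Values taken from the environment section of this issue.
const reportedExecStart =
  'ExecStart="/usr/bin/node" --unhandled-rejections=warn "/usr/lib/node_modules/openclaw/dist/entry.js" gateway --port 18789';
const userInstallRoot = "/home/ubuntu/.npm-global/lib/node_modules/openclaw";

console.log(installRootFromExecStart(reportedExecStart));
console.log(isPinnedElsewhere(reportedExecStart, userInstallRoot));
```

A check like this only reports which install the unit launches; it says nothing about whether the other install is safe to promote.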

What I expected

  • upgrading OpenClaw in user space should not leave the live gateway effectively pinned to an older runtime without a clear warning or migration path
  • if the newer runtime can successfully resolve and use openai-codex/gpt-5.5 in local/native tests, the gateway-path migration should be either:
    • safe, or
    • fail with a clearer compatibility/installation error
  • the active install used by the CLI and the active install used by the gateway service should be easier to understand and reconcile

What happened instead

During the staged migration:

  1. the user install was updated to 2026.4.26
  2. the newer runtime could recognize openai-codex/gpt-5.5
  3. openai-codex/gpt-5.5 tested successfully through local/native paths
  4. the running gateway service was found to be hard-coded to the older /usr/lib/node_modules/openclaw install
  5. repointing the service to the updated install made /v1/chat/completions unreliable (requests timed out)
  6. the migration had to be rolled back to restore a known-good gateway
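
Step 5 is the kind of failure a bounded smoke probe can catch before promotion. The sketch below is an assumption-laden illustration: `probeChatCompletions` and the `SendProbe` callback are hypothetical, and the caller is expected to supply the function that actually issues the HTTP POST (with whatever auth and model payload the gateway needs), so the sketch makes no claims about the gateway's request format.

```typescript
// Minimal response shape so the sketch does not depend on DOM or Node typings.
interface ProbeResponse {
  ok: boolean;
  status: number;
}

// Caller-supplied function that performs the real request against the URL.
type SendProbe = (url: string) => Promise<ProbeResponse>;

type ProbeResult =
  | { healthy: true; status: number }
  | { healthy: false; reason: string };

function probeChatCompletions(
  send: SendProbe,
  baseUrl: string,
  timeoutMs: number,
): Promise<ProbeResult> {
  // Map both success and failure to a ProbeResult so this promise never rejects.
  const attempt = send(`${baseUrl}/v1/chat/completions`).then(
    (res): ProbeResult =>
      res.ok
        ? { healthy: true, status: res.status }
        : { healthy: false, reason: `HTTP ${res.status}` },
    (err: unknown): ProbeResult => ({ healthy: false, reason: String(err) }),
  );

  // Resolve as unhealthy if the request has not settled within the budget.
  const timeout = new Promise<ProbeResult>((resolve) => {
    const timer = setTimeout(
      () => resolve({ healthy: false, reason: `timed out after ${timeoutMs} ms` }),
      timeoutMs,
    );
    // Cancel the timer once the request settles so it does not linger.
    void attempt.then(() => clearTimeout(timer));
  });

  return Promise.race([attempt, timeout]);
}
```

Treating a timeout as an explicit unhealthy result, rather than waiting indefinitely, gives a rollback script a clear signal to act on.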

User impact

  • newer models may appear to work in one path but not be safe to promote in the actual gateway service
  • upgrades can leave a split-brain state between user CLI install vs system service install
  • recovery currently requires manual service-path inspection and rollback

Final stable state after rollback

  • active model: openai-codex/gpt-5.4
  • fallback: openai-codex/gpt-5.4
  • temporary 5.5 config entries removed
  • clawdbot-gateway.service active
  • /v1/chat/completions smoke test OK
  • config validates

Backups created during rollback

  • /home/ubuntu/.openclaw/openclaw.json.bak-pre-gpt55-retry-20260429T030508Z
  • /home/ubuntu/.config/systemd/user/clawdbot-gateway.service.bak-pre-openclaw-20260429T0305

Questions / ask

Could you clarify or fix:

  1. what the supported upgrade path is when both a user install and a system install exist
  2. whether the gateway service should auto-track the preferred install, or at least warn when it is pinned to an older one
  3. why the newer service path can break /v1/chat/completions reliability even when local/native tests against that runtime succeed
  4. whether there is a recommended migration procedure for safely promoting openai-codex/gpt-5.5 through the gateway path
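
Question 2 asks for a warning when the service is pinned to an older install. A minimal sketch of such a check follows; `compareVersions` and `pinnedInstallWarning` are hypothetical names, the message text is invented for illustration, and the comparison assumes purely numeric dotted versions such as 2026.4.21.

```typescript
// Compare dotted numeric versions; returns -1, 0, or 1.
// Assumption: every segment is a plain integer (no pre-release suffixes).
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff < 0 ? -1 : 1;
  }
  return 0;
}

interface InstallInfo {
  root: string;
  version: string;
}

// Returns a warning when the service runs a different, older install; otherwise null.
function pinnedInstallWarning(service: InstallInfo, preferred: InstallInfo): string | null {
  if (service.root === preferred.root) return null;
  if (compareVersions(service.version, preferred.version) >= 0) return null;
  return (
    `gateway service is pinned to ${service.root} (${service.version}), ` +
    `which is older than ${preferred.root} (${preferred.version})`
  );
}

// Paths and versions as reported in this issue.
const warning = pinnedInstallWarning(
  { root: "/usr/lib/node_modules/openclaw", version: "2026.4.21" },
  { root: "/home/ubuntu/.npm-global/lib/node_modules/openclaw", version: "2026.4.26" },
);
console.log(warning);
```

Segments are compared numerically, so 2026.4.9 sorts before 2026.4.21, which a plain string comparison would get wrong.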

Extended analysis

TL;DR

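The gateway service and the CLI resolve to different OpenClaw installs. Aligning clawdbot-gateway.service with the newer install is the intended end state, but the reporter found that repointing it made /v1/chat/completions unreliable, so the change needs a rollback path until the timeouts are explained.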

Guidance

  • Inspect the ExecStart line of clawdbot-gateway.service to confirm which OpenClaw install it launches. Before repointing it to the newer install at /home/ubuntu/.npm-global/lib/node_modules/openclaw, back up the unit file: the reporter's attempt made /v1/chat/completions time out and had to be rolled back.
  • Check the OpenClaw documentation for a supported upgrade path when both a user install and a system install exist, to ensure a smooth transition.
  • Test the gateway service with the newer install to identify any potential issues with /v1/chat/completions reliability.
  • Consider implementing a warning system to notify when the gateway service is pinned to an older install, to prevent similar issues in the future.

Example

No code example is provided as the issue is related to configuration and service management rather than code.

Notes

The issue seems to be related to the coexistence of two OpenClaw installs and the gateway service being pinned to the older one. The solution involves ensuring the gateway service uses the same install as the user install. However, the exact steps may vary depending on the specific OpenClaw configuration and environment.

Recommendation

Keep the gateway on the known-good older install until the timeouts on the newer path are understood. When retrying, back up the unit file and config as the reporter did, repoint clawdbot-gateway.service to the newer OpenClaw install, and run a /v1/chat/completions smoke test before promoting; roll back if it fails. Longer term, investigate the supported upgrade path and add a warning for services pinned to an older install.
