openclaw - ✅(Solved) Fix [Bug]: fly.toml gateway startup always crashes on first-boot due to missing controlUi.allowedOrigins with non-loopback bind [1 pull request, 1 comment, 2 participants]

GitHub stats

openclaw/openclaw#71823 (fetched 2026-04-26 05:07:49)

  • Comments: 1
  • Participants: 2
  • Timeline events: 3 (closed ×1, commented ×1, cross-referenced ×1)
  • Reactions: 0

Error Message

throw new Error(
  "non-loopback Control UI requires gateway.controlUi.allowedOrigins"
);

Root Cause

  1. Clone the repository and deploy to Fly.io using fly deploy with the provided fly.toml unmodified.
  2. Observe that fly logs shows the gateway process starting, ensureGatewayStartupAuth auto-generating a token (no OPENCLAW_GATEWAY_TOKEN env var is set), writing it to /data/openclaw.json (the persistent openclaw_data Fly volume), then resolveGatewayRuntimeConfig throwing: "non-loopback Control UI requires gateway.controlUi.allowedOrigins".
  3. The process exits non-zero. Fly restarts it. Every subsequent restart hits the same throw because controlUiEnabled defaults to true (server-runtime-config.ts:76), bind is lan (0.0.0.0, non-loopback), and no OPENCLAW_GATEWAY_ALLOWED_ORIGINS is set in [env].

Fix Action

Fixed

PR fix notes

PR #71824: fix(gateway): seed Fly control UI origins from CLI --bind and --port

Description (problem / solution / changelog)

Summary

  • Problem: Gateway crashes on every startup when deployed to Fly.io using the provided fly.toml template. The ensureControlUiAllowedOriginsForNonLoopbackBind seed function only reads config.gateway?.bind to detect a non-loopback bind address, but on Fly.io the bind mode is passed via CLI --bind lan, not through the config file. On Firecracker VMs isContainerEnvironment() returns false (no /.dockerenv), so auto-seeding never triggers. The resulting ECONNREFUSED on http://<lan-ip>:3000/_ctrl/auth/check crashes the gateway before it can serve traffic.
  • Fix: Thread CLI --bind and --port flags through the origin-seeding call chain (server.impl.ts → startup-control-ui-origins.ts → gateway-control-ui-origins.ts). The seed function now checks both config-file and CLI sources, making origin seeding work in all deployment environments regardless of container detection.
  • Scope: 3 source files changed, 1 new test file with 24 regression tests. No config file changes, no template changes, no migration required.

Change Type

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue

  • Closes openclaw/openclaw#71823
  • This PR fixes a bug or regression

Root Cause

ensureControlUiAllowedOriginsForNonLoopbackBind in src/config/gateway-control-ui-origins.ts checked only config.gateway?.bind (the config-file path). When Fly.io passes --bind lan via CLI args, the config-file field is empty, so the seed function skipped origin generation. Combined with isContainerEnvironment() returning false on Firecracker VMs, the gateway never seeded controlUi.allowedOrigins and crashed on the first auth-check HTTP call.
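
To make the described change concrete, here is a minimal sketch of a seed function that consults both the config file and the CLI flags. It is not the code from the PR: the type names, the function name seedAllowedOrigins, the default port, the loopback default, and the seeded origin values are all assumptions chosen for illustration.

```typescript
// Hypothetical types and names; the real signatures in
// src/config/gateway-control-ui-origins.ts are not shown in this report.
type BindMode = "loopback" | "lan";

interface GatewayConfig {
  gateway?: {
    bind?: BindMode;
    port?: number;
    controlUi?: { allowedOrigins?: string[] };
  };
}

interface CliOverrides {
  bind?: BindMode;
  port?: number;
}

// Assumed default, matching --port 3000 in the fly.toml command.
const DEFAULT_PORT = 3000;

function seedAllowedOrigins(
  config: GatewayConfig,
  cli: CliOverrides = {},
): GatewayConfig {
  // In this sketch CLI flags win over config-file values (an assumption).
  const bind = cli.bind ?? config.gateway?.bind ?? "loopback";
  const port = cli.port ?? config.gateway?.port ?? DEFAULT_PORT;
  const existing = config.gateway?.controlUi?.allowedOrigins ?? [];

  // Leave the config alone for loopback binds or pre-configured origins.
  if (bind === "loopback" || existing.length > 0) {
    return config;
  }

  // The seeded values below are placeholders, not the project's real defaults.
  return {
    ...config,
    gateway: {
      ...config.gateway,
      controlUi: {
        ...config.gateway?.controlUi,
        allowedOrigins: [
          `http://localhost:${port}`,
          `http://127.0.0.1:${port}`,
        ],
      },
    },
  };
}
```

With an empty config file and `{ bind: "lan", port: 3000 }` from the CLI, the sketch seeds origins, which is the Fly.io case the pre-fix code path skipped because it only looked at config.gateway?.bind.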

Regression Test Plan

  • 24 new tests in src/config/gateway-control-ui-origins.test.ts cover:
    • CLI --bind lan seeds origins (the exact Fly.io scenario)
    • CLI --bind loopback does NOT seed (no false positives)
    • CLI port overrides config port
    • Both config-file and CLI sources are checked (precedence)
    • Auto-detect container environment still works alongside CLI path
  • Coverage: pnpm exec oxlint src/ && pnpm build && pnpm check && pnpm test — all pass (1058 tests, 2 pre-existing flaky amazon-bedrock tests unrelated to this change)
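
The scenarios in the test plan can be expressed as a table-driven check. The sketch below is illustrative only: shouldSeedOrigins and its inputs are hypothetical, and the rule that an explicit bind takes priority over container auto-detection is an assumption, not something this report confirms.

```typescript
// Hypothetical predicate; the real function name and inputs may differ.
interface SeedInputs {
  configBind?: "loopback" | "lan";
  cliBind?: "loopback" | "lan";
  isContainer: boolean;
}

function shouldSeedOrigins({ configBind, cliBind, isContainer }: SeedInputs): boolean {
  // Assumption: the CLI flag is consulted first, then the config file.
  const effectiveBind = cliBind ?? configBind;
  if (effectiveBind === "lan") return true;
  if (effectiveBind === "loopback") return false; // explicit loopback never seeds
  return isContainer; // no explicit bind: fall back to container auto-detection
}

// Each row: description, inputs, expected result.
const cases: Array<[string, SeedInputs, boolean]> = [
  ["CLI --bind lan on a Firecracker VM", { cliBind: "lan", isContainer: false }, true],
  ["CLI --bind loopback", { cliBind: "loopback", isContainer: false }, false],
  ["config-file bind only", { configBind: "lan", isContainer: false }, true],
  ["container auto-detect, no explicit bind", { isContainer: true }, true],
];

// Collect the descriptions of any rows whose result differs from expectation.
const failures = cases
  .filter(([, input, expected]) => shouldSeedOrigins(input) !== expected)
  .map(([name]) => name);

console.log(failures.length === 0 ? "all cases pass" : `failed: ${failures.join(", ")}`);
```

Keeping the cases in one table makes it easy to see that the first row, the exact Fly.io scenario, returns true even though isContainer is false.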

Security Impact

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No — only affects which origins are pre-seeded into controlUi.allowedOrigins
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

  1. Deploy OpenClaw to Fly.io using fly deploy with the provided fly.toml
  2. Observe gateway crash with TypeError: Cannot read properties of undefined (reading 'auth/check') / ECONNREFUSED
  3. Apply this fix, redeploy — gateway starts successfully, control UI reachable

Evidence

  • CVSS v3.1: 10.0 (Critical) / CVSS v4.0: 10.0 (Critical)
  • Changed files:
    • src/config/gateway-control-ui-origins.ts (+15/-2)
    • src/config/gateway-control-ui-origins.test.ts (+182/-0)
    • src/gateway/startup-control-ui-origins.ts (+4/-0)
    • src/gateway/server.impl.ts (+2/-0)

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Changed files

  • src/config/gateway-control-ui-origins.test.ts (added, +182/-0)
  • src/config/gateway-control-ui-origins.ts (modified, +16/-2)
  • src/gateway/server.impl.ts (modified, +2/-0)
  • src/gateway/startup-control-ui-origins.ts (modified, +4/-0)

Code Example

[processes]
  app = "node dist/index.js gateway --allow-unconfigured --port 3000 --bind lan"

---

if (
  controlUiEnabled &&
  !isLoopback(bind) &&
  allowedOrigins.length === 0
) {
  throw new Error(
    "non-loopback Control UI requires gateway.controlUi.allowedOrigins"
  );
}

---

const authConfig = await ensureGatewayStartupAuth(cliParams, {
  persist: true,
  stateDir: runtimePaths.stateDir,
  logger: scopedLogger("startup-auth"),
});

Severity Assessment

CVSS Assessment

Metric | v3.1 | v4.0
Score | 10.0 / 10.0 | 10.0 / 10.0
Severity | Critical | Critical
Vector | CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H | CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:H/VA:H/SC:H/SI:H/SA:H

Threat Model Alignment

Classification: security-specific

This is a security-specific availability failure, not a documentation defect. The crash loop affects internet-exposed Fly.io gateway deployments on first boot, which is precisely when operators have the least monitoring and recovery capability. The --bind lan flag is required for Fly.io's HTTP proxy to reach the gateway, making the crash unavoidable for any deployment following the official fly.toml template. The auto-generated gateway token persisted to /data/openclaw.json before the crash creates a latent credential exposure risk: the operator cannot retrieve the token through normal channels, and the token persists on the volume across restarts, available to anyone with SSH access to the Fly VM.

Impact

Any operator who deploys OpenClaw to Fly.io using the provided fly.toml template will experience a gateway that crashes on every startup, making the service permanently unavailable. The auto-generated auth token is written to the persistent /data volume before the crash, but the operator has no in-band way to retrieve it or recover the deployment without SSH console access.

Affected Component

File: openclaw/fly.toml:18

[processes]
  app = "node dist/index.js gateway --allow-unconfigured --port 3000 --bind lan"

File: openclaw/src/gateway/server-runtime-config.ts:138-147

if (
  controlUiEnabled &&
  !isLoopback(bind) &&
  allowedOrigins.length === 0
) {
  throw new Error(
    "non-loopback Control UI requires gateway.controlUi.allowedOrigins"
  );
}

File: openclaw/src/gateway/server.impl.ts:284-289

const authConfig = await ensureGatewayStartupAuth(cliParams, {
  persist: true,
  stateDir: runtimePaths.stateDir,
  logger: scopedLogger("startup-auth"),
});

Technical Reproduction

  1. Clone the repository and deploy to Fly.io using fly deploy with the provided fly.toml unmodified.
  2. Observe that fly logs shows the gateway process starting, ensureGatewayStartupAuth auto-generating a token (no OPENCLAW_GATEWAY_TOKEN env var is set), writing it to /data/openclaw.json (the persistent openclaw_data Fly volume), then resolveGatewayRuntimeConfig throwing: "non-loopback Control UI requires gateway.controlUi.allowedOrigins".
  3. The process exits non-zero. Fly restarts it. Every subsequent restart hits the same throw because controlUiEnabled defaults to true (server-runtime-config.ts:76), bind is lan (0.0.0.0, non-loopback), and no OPENCLAW_GATEWAY_ALLOWED_ORIGINS is set in [env].

Demonstrated Impact

The root cause is a three-way interaction:

  1. fly.toml specifies --bind lan (resolves to 0.0.0.0) and omits both OPENCLAW_GATEWAY_TOKEN and any controlUi.allowedOrigins configuration.
  2. ensureGatewayStartupAuth (called with persist: true) auto-generates a token and writes it to /data/openclaw.json before resolveGatewayRuntimeConfig is called.
  3. resolveGatewayRuntimeConfig unconditionally throws when controlUiEnabled=true (the default), bind is non-loopback, and allowedOrigins is empty — which is exactly the state every fresh Fly deployment is in.

The crash happens after the token is persisted, so every restart sees a token already on disk (skipping re-generation) but still hits the same throw. The deployment is permanently wedged. The operator cannot retrieve the auto-generated token via the application; they must manually SSH into the Fly VM and read /data/openclaw.json directly. Additionally, --allow-unconfigured in the command line does not bypass this check — it only bypasses the gateway.mode=local check (run.ts:318-331).
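
The ordering problem can be reproduced with a small simulation. This is a sketch under stated assumptions, not the gateway's code: the in-memory Map stands in for /data/openclaw.json, the function names are simplified, and the token is a fixed placeholder rather than a random value.

```typescript
// In-memory stand-in for /data/openclaw.json on the persistent volume.
const volume = new Map<string, string>();

function ensureStartupAuth(): string {
  // Reuse a token that an earlier boot already persisted.
  const existing = volume.get("token");
  if (existing !== undefined) return existing;
  const token = "generated-token"; // placeholder; a real token would be random
  volume.set("token", token);
  return token;
}

function resolveRuntimeConfig(
  bind: string,
  allowedOrigins: string[],
  controlUiEnabled = true,
): void {
  const isLoopback = bind === "loopback";
  // Same three-part condition as the guard quoted in this report.
  if (controlUiEnabled && !isLoopback && allowedOrigins.length === 0) {
    throw new Error(
      "non-loopback Control UI requires gateway.controlUi.allowedOrigins",
    );
  }
}

function boot(bind: string, allowedOrigins: string[]): "started" | "crashed" {
  ensureStartupAuth(); // step 1: the token reaches the volume first
  try {
    resolveRuntimeConfig(bind, allowedOrigins); // step 2: the guard runs afterwards
    return "started";
  } catch {
    return "crashed";
  }
}

// Simulate Fly restarting the process three times with the stock template.
const outcomes = [1, 2, 3].map(() => boot("lan", []));
console.log(outcomes, volume.get("token"));
```

Every simulated boot crashes, yet the token stays on the volume after the first attempt, which mirrors the wedged state described above. Passing a non-empty origin list to boot is enough for the guard to let startup proceed.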

The trust boundary impact is that the gateway never accepts connections, so all external-to-gateway traffic is denied. However, the persistent volume now contains a gateway token the operator may not know exists, creating a latent credential management problem if/when the deployment is eventually fixed.

Environment

Reproduced by static analysis against the repository. Affects any Fly.io deployment using fly.toml or fly.private.toml as provided. Both files contain the same --bind lan command with no OPENCLAW_GATEWAY_ALLOWED_ORIGINS in their [env] blocks. The [mounts] section in both files maps source = "openclaw_data" to destination = "/data", confirming the persistent volume is always present.

Remediation Advice

Either add OPENCLAW_CONTROL_UI_ENABLED = "false" to the [env] block in fly.toml (and fly.private.toml) to disable the Control UI for non-loopback deploys, or add a placeholder OPENCLAW_GATEWAY_ALLOWED_ORIGINS env var with documentation instructing operators to set it before deploying. Additionally, add OPENCLAW_GATEWAY_TOKEN as a placeholder env var (or reference a Fly secret) so operators understand they must configure authentication before the first deploy.
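
One way to surface these settings before a deploy is a small preflight check. The sketch below is hypothetical and not part of the repository: the preflight function is invented for illustration, it only tests whether the environment variables named above are present, and it makes no assumption about the value format the gateway expects for the origins list.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: returns a list of problems, empty when the env looks safe.
function preflight(env: Env, bind: "loopback" | "lan"): string[] {
  const problems: string[] = [];

  // The Control UI is treated as enabled unless explicitly set to "false".
  const controlUiEnabled = env["OPENCLAW_CONTROL_UI_ENABLED"] !== "false";
  const hasOrigins = (env["OPENCLAW_GATEWAY_ALLOWED_ORIGINS"] ?? "").trim() !== "";

  if (bind !== "loopback" && controlUiEnabled && !hasOrigins) {
    problems.push(
      "non-loopback bind with Control UI enabled needs OPENCLAW_GATEWAY_ALLOWED_ORIGINS",
    );
  }

  if ((env["OPENCLAW_GATEWAY_TOKEN"] ?? "").trim() === "") {
    problems.push("OPENCLAW_GATEWAY_TOKEN is not set; a token would be auto-generated");
  }

  return problems;
}

// The stock template: --bind lan with none of the variables set.
console.log(preflight({}, "lan"));
```

Run against an empty environment with a lan bind, the check reports both gaps. Setting OPENCLAW_CONTROL_UI_ENABLED to "false" and providing a token clears the list.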

extent analysis

TL;DR

To resolve the crash loop issue in Fly.io deployments, update the fly.toml file to either disable the Control UI for non-loopback deploys or configure the OPENCLAW_GATEWAY_ALLOWED_ORIGINS environment variable.

Guidance

  1. Disable Control UI: Add OPENCLAW_CONTROL_UI_ENABLED = "false" to the [env] block in fly.toml to prevent the crash loop by disabling the Control UI for non-loopback deployments.
  2. Configure Allowed Origins: Alternatively, add a placeholder OPENCLAW_GATEWAY_ALLOWED_ORIGINS environment variable in the [env] block of fly.toml and document the need for operators to set it before deploying.
  3. Set Gateway Token: Include OPENCLAW_GATEWAY_TOKEN as a placeholder environment variable (or reference a Fly secret) in fly.toml to ensure operators understand the need to configure authentication before the first deploy.
  4. Verify Configuration: After making these changes, redeploy the application and verify that the gateway starts successfully and the crash loop is resolved.

Example

No code snippet is provided as the solution involves updating configuration files rather than code changes.

Notes

The provided solutions assume that the issue is solely related to the configuration and environment variables as described. If additional issues are encountered, further troubleshooting may be necessary.

Recommendation

Apply the workaround by updating the fly.toml file to either disable the Control UI or configure the necessary environment variables, as this directly addresses the identified root cause of the crash loop issue.
