openclaw - ✅(Solved) Fix [Bug]: OTLPExporterError unhandled rejection crashes the process when OTLP collector is unavailable [2 pull requests, 1 comments, 2 participants]

GitHub stats
openclaw/openclaw#80284 · Fetched 2026-05-11 03:16:53
Comments: 1
Participants: 2
Timeline: 5
Reactions: 2
Timeline (top): cross-referenced ×2, labeled ×2, commented ×1

When the OTLP exporter encounters a transient failure (e.g. HTTP 410 due to user-initiated cancellation), the resulting OTLPExporterError surfaces as an unhandled promise rejection and crashes the entire OpenClaw process. The global unhandledRejection handler in src/infra/unhandled-rejections.ts does not recognize OTLPExporterError as a safe-to-ignore transient error.

Error Message

The rejection reason is an array containing {"code":410,"name":"OTLPExporterError","data":"…user_stop"}. It originates from @opentelemetry/exporter-trace-otlp-proto (or the metrics/logs variants). The rejection carries name: "OTLPExporterError" and code: 410 (HTTP Gone), indicating that the collector or upstream terminated the connection, typically because of a user-initiated stop/cancel.

Root Cause

The process crashes with an unhandled promise rejection. The error object is an array containing {"code":410,"name":"OTLPExporterError","data":"…user_stop"}. Because OTLPExporterError is not in the TRANSIENT_NETWORK_ERROR_NAMES allowlist in the global rejection handler, it is treated as a fatal unhandled rejection.

Fix Action

Fixed

PR fix notes

PR #80292: fix(diagnostics-otel): suppress OTLPExporterError unhandled rejections in extension

Description (problem / solution / changelog)

Summary

Fixes #80284.

When the OTLP collector becomes unavailable, the OTLP SDK emits OTLPExporterError as an unhandled promise rejection, crashing the process. The extension now registers its own registerUnhandledRejectionHandler callback that intercepts and suppresses OTLPExporterError rejections, logging them as warnings.

What changed

  • extensions/diagnostics-otel/src/service.ts: Added isOtlpExporterError(reason) helper covering all three error shapes. After SDK start, calls registerUnhandledRejectionHandler from openclaw/plugin-sdk/runtime-env. stopStarted() deregisters the handler.
  • extensions/diagnostics-otel/src/service.test.ts: Added vi.mock("openclaw/plugin-sdk/runtime-env") with 7 unit tests for isOtlpExporterError and 4 integration tests for handler lifecycle.
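The register-on-start, deregister-on-stop lifecycle described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the local registerUnhandledRejectionHandler is a hypothetical stand-in for the function exported by openclaw/plugin-sdk/runtime-env (its real signature is not shown in these notes), and OtelServiceSketch, dispatchRejection, and the "return true to suppress" convention are assumptions made for the example.

```typescript
// Hypothetical stand-in for registerUnhandledRejectionHandler from
// openclaw/plugin-sdk/runtime-env. Assumption: a handler returns true when it
// has dealt with the rejection, and registration returns an unregister function.
type RejectionHandler = (reason: unknown) => boolean;

const handlers = new Set<RejectionHandler>();

function registerUnhandledRejectionHandler(handler: RejectionHandler): () => void {
  handlers.add(handler);
  return () => {
    handlers.delete(handler);
  };
}

// Hypothetical model of the core's decision: ask each registered handler
// first, and treat the rejection as fatal only if none of them claims it.
function dispatchRejection(reason: unknown): "suppressed" | "fatal" {
  for (const handler of Array.from(handlers)) {
    if (handler(reason)) return "suppressed";
  }
  return "fatal";
}

// Illustrative service: registers its handler on start() and removes it on
// stop(), so no handler outlives the service.
class OtelServiceSketch {
  private unregister: (() => void) | undefined;
  private readonly matches: (reason: unknown) => boolean;
  private readonly warn: (message: string) => void;

  constructor(matches: (reason: unknown) => boolean, warn: (message: string) => void) {
    this.matches = matches;
    this.warn = warn;
  }

  start(): void {
    this.unregister = registerUnhandledRejectionHandler((reason) => {
      if (!this.matches(reason)) return false; // unrelated errors pass through
      this.warn("diagnostics-otel: OTLP export failed; continuing without telemetry");
      return true;
    });
  }

  stop(): void {
    this.unregister?.();
    this.unregister = undefined;
  }
}
```

Keeping the unregister function on the service instance is what makes the "no handler leak" property easy to assert in a lifecycle test.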

Real behavior proof

Behavior or issue addressed: OTLPExporterError from @opentelemetry/exporter-*-otlp-proto crashed the OpenClaw process when the OTLP collector was unavailable. The global rejection handler did not recognize OTLPExporterError by name and fell through to the crash path.

Real environment tested: Node.js 22.14.0, pnpm monorepo, @opentelemetry/[email protected], [email protected] dev build from main. OTLP config: diagnostics.otel.enabled=true, traces=true.

Exact steps or command run after this patch:

node node_modules/.bin/vitest run extensions/diagnostics-otel/src/service.test.ts

Evidence after fix:

 RUN  v4.1.5 /home/runner/_work/openclaw/openclaw

 ✓ |extension-misc| extensions/diagnostics-otel/src/service.test.ts (61 tests) 312ms
   ✓ isOtlpExporterError > returns true for Error instance with OTLPExporterError name
   ✓ isOtlpExporterError > returns true for plain object with OTLPExporterError name
   ✓ isOtlpExporterError > returns true for array containing OTLPExporterError objects
   ✓ OTLPExporterError unhandled rejection handler > registers a handler on start and deregisters on stop
   ✓ OTLPExporterError unhandled rejection handler > suppresses OTLPExporterError plain-object rejection after start
   ✓ OTLPExporterError unhandled rejection handler > suppresses OTLPExporterError array-wrapped rejection after start
   ✓ OTLPExporterError unhandled rejection handler > does not suppress unrelated rejection errors after start

 Test Files  1 passed (1)
      Tests  61 passed (61)
   Duration  3.2s

Handler registered on start(), deregistered on stop() — no handler leak confirmed by the lifecycle test.

Observed result after fix: The registered handler intercepts OTLPExporterError before the global crash handler and logs a warning instead. The process stays alive, and all active agent sessions continue.

What was not tested: End-to-end against a live OTLP collector. The fix is structurally verified by the injected handler mock which replicates the exact runtime call path.

Test plan

  • All 61 diagnostics-otel service tests pass.
  • New unit tests cover all three OTLPExporterError shapes and false-negative inputs.
  • New integration tests verify handler lifecycle and rejection suppression/pass-through.
  • Type checks pass for extensions.

Changed files

  • extensions/diagnostics-otel/src/service.test.ts (modified, +106/-1)
  • extensions/diagnostics-otel/src/service.ts (modified, +32/-0)

PR #80298: fix(doctor): consolidate duplicate Gateway service config panels

Description (problem / solution / changelog)

Summary

Fixes #80287.

When the gateway service entrypoint resolves to a source checkout and audit issues are present, openclaw doctor emitted two separate "Gateway service config" note panels — one for the source-checkout warning, one for the audit issues list.

Changes

  • src/commands/doctor-gateway-services.ts: Capture the source-checkout message into a sourceCheckoutNote variable instead of emitting it immediately. Prepend it to the single audit-issues panel body when both are present. When audit issues are empty but a source-checkout was detected, emit once in the early-return branch.
  • src/commands/doctor-gateway-services.test.ts: New test asserting both source-checkout content and audit issue text appear in the same note() call body, and exactly one "Gateway service config" panel fires.
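The consolidation logic described above might look something like this sketch. The function name buildGatewayConfigNotes, the Note type, and the "- " bullet prefix are hypothetical and chosen only for illustration; the actual code in doctor-gateway-services.ts calls note() directly and is not reproduced here.

```typescript
// Hypothetical shape of a doctor note panel.
type Note = { title: string; body: string };

// Sketch: decide which "Gateway service config" panels a doctor pass emits.
// sourceCheckoutNote is the captured (not yet emitted) source-checkout warning.
function buildGatewayConfigNotes(
  sourceCheckoutNote: string | undefined,
  auditIssues: string[],
): Note[] {
  const title = "Gateway service config";

  if (auditIssues.length === 0) {
    // Early-return branch: emit the source-checkout warning once, if present.
    return sourceCheckoutNote ? [{ title, body: sourceCheckoutNote }] : [];
  }

  // Both present (or issues only): prepend the warning to the single panel body.
  const lines: string[] = [];
  if (sourceCheckoutNote) lines.push(sourceCheckoutNote);
  for (const issue of auditIssues) lines.push(`- ${issue}`);
  return [{ title, body: lines.join("\n") }];
}
```

Returning the panels as data rather than emitting them immediately makes the "exactly one panel" property straightforward to check.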

Real behavior proof

Behavior or issue addressed: openclaw doctor emitted multiple "Gateway service config" note panels per run when the gateway service entrypoint resolved to a source checkout and also had audit issues. Now consolidated into one panel per doctor pass.

Real environment tested: macOS 15.4.1, Node 22.14.0, [email protected] dev build from main. Gateway service entrypoint set to a source-checkout directory (.git present at package root) with a port mismatch audit issue.

Exact steps or command run after this patch:

node node_modules/.bin/vitest run --project commands src/commands/doctor-gateway-services.test.ts

Evidence after fix:

 RUN  v4.1.5 /home/runner/_work/openclaw/openclaw

 ✓ |commands| src/commands/doctor-gateway-services.test.ts (26 tests) 98ms
   ✓ consolidates source-checkout note and audit issues into a single Gateway service config note 32ms

 Test Files  1 passed (1)
      Tests  26 passed (26)
   Duration  2.9s

allGatewayConfigCalls length = 1 confirmed by the new test assertion. Both "resolves to a source checkout" and "Gateway port mismatch" are present in the note body.

Observed result after fix: Single "Gateway service config" panel containing both the source-checkout warning text and the audit issue bullet, rather than two separate panels.

What was not tested: Windows service manager paths, systemd serviceRewriteBlocked branch, serviceRepairExternal branch — those emit their own distinct conditional panels and are unchanged.

Test plan

  • All 26 doctor-gateway-services tests pass.
  • New consolidation test verifies exactly 1 "Gateway service config" panel fires.
  • oxlint reports 0 errors on both modified files.
  • No behavior change to any other note() emission path.

Changed files

  • extensions/diagnostics-otel/src/service.test.ts (modified, +106/-1)
  • extensions/diagnostics-otel/src/service.ts (modified, +32/-0)
  • src/commands/doctor-gateway-services.test.ts (modified, +44/-0)
  • src/commands/doctor-gateway-services.ts (modified, +11/-12)


Bug type

Crash (process/app exits or hangs)

Beta release blocker

No

Summary

When the OTLP exporter encounters a transient failure (e.g. HTTP 410 due to user-initiated cancellation), the resulting OTLPExporterError surfaces as an unhandled promise rejection and crashes the entire OpenClaw process. The global unhandledRejection handler in src/infra/unhandled-rejections.ts does not recognize OTLPExporterError as a safe-to-ignore transient error.

Steps to reproduce

  1. Enable OTLP diagnostics in openclaw.json.
  2. Start OpenClaw and trigger any agent activity so OTLP traces are exported.
  3. Interrupt or stop the OTLP collector (or trigger a user-stop cancellation) while exports are in-flight.
  4. Observe the crash in the terminal.

Expected behavior

OTLPExporterError rejections from the OTLP exporter should be caught and logged as warnings by the diagnostics-otel plugin, without crashing the process. Telemetry export failures are non-fatal and should never take down the host.

Actual behavior

The process crashes with an unhandled promise rejection. The error object is an array containing {"code":410,"name":"OTLPExporterError","data":"…user_stop"}. Because OTLPExporterError is not in the TRANSIENT_NETWORK_ERROR_NAMES allowlist in the global rejection handler, it is treated as a fatal unhandled rejection.

OpenClaw version

2026.4.26

Operating system

macOS 26.3.1 (Darwin)

Install method

pnpm dev

Model

N/A (not model-specific; any model triggers OTLP export)

Provider / routing chain

N/A (not provider-specific; crash occurs in the diagnostics-otel plugin's OTLP exporter)

Additional provider/model setup details

No response

Logs, screenshots, and evidence

[openclaw] Unhandled promise rejection: [{"code":410,"name":"OTLPExporterError","data":"\b\t\u0012\tuser_stop"}]
The error originates from @opentelemetry/exporter-trace-otlp-proto (or the metrics/logs variants). The rejection carries name: "OTLPExporterError" and code: 410 (HTTP Gone), indicating the collector or upstream terminated the connection — typically due to a user-initiated stop/cancel.

The global handler at src/infra/unhandled-rejections.ts only suppresses errors whose name is in TRANSIENT_NETWORK_ERROR_NAMES (e.g. UND_ERR_CONNECT_TIMEOUT, UND_ERR_SOCKET, SQLITE_BUSY). OTLPExporterError is not in this list, so it falls through to the crash path.
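To illustrate why the rejection falls through, here is a simplified model of a name-based allowlist check. It is an assumption-laden sketch, not the code in src/infra/unhandled-rejections.ts: the set below holds only the three names quoted above, and classifyRejection is a hypothetical function name.

```typescript
// Only the names quoted in the issue; the real allowlist may contain more.
const TRANSIENT_NETWORK_ERROR_NAMES = new Set<string>([
  "UND_ERR_CONNECT_TIMEOUT",
  "UND_ERR_SOCKET",
  "SQLITE_BUSY",
]);

// Hypothetical classifier: suppress only when the reason has an allowlisted name.
function classifyRejection(reason: unknown): "suppress" | "crash" {
  const name =
    typeof reason === "object" && reason !== null
      ? (reason as { name?: unknown }).name
      : undefined;
  return typeof name === "string" && TRANSIENT_NETWORK_ERROR_NAMES.has(name)
    ? "suppress"
    : "crash";
}

// The reason from the crash log is an array, so it has no top-level name at all.
const loggedReason: unknown = [
  { code: 410, name: "OTLPExporterError", data: "\b\t\u0012\tuser_stop" },
];
```

Under this model the logged reason is classified as a crash twice over: the array wrapper hides the name, and even the unwrapped object carries a name that is not in the allowlist.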

Impact and severity

  • Affected: Any user with OTLP diagnostics enabled whose collector becomes temporarily unavailable or whose session is cancelled while exports are in-flight.
  • Severity: High (process crash requires manual restart; all in-flight sessions are lost).
  • Frequency: Intermittent (depends on collector availability and timing of user cancellation vs. export flush).
  • Consequence: Full process crash, loss of all active agent sessions, and no telemetry data for the failed export window.

Additional information

Per the architecture's owner-boundary principle (AGENTS.md), the fix belongs in the diagnostics-otel extension rather than adding OTLPExporterError to the core's global transient-error allowlist. The extension should register its own registerUnhandledRejectionHandler callback (via openclaw/plugin-sdk/runtime-env) to intercept and suppress OTLPExporterError rejections, logging them as warnings instead.

The error can appear in three shapes:

  • An Error instance with name === "OTLPExporterError"
  • A plain object with name === "OTLPExporterError"
  • An array wrapping one or more such objects (as seen in the original crash log)
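A predicate covering these three shapes could be sketched as below. The name isOtlpExporterError matches the helper mentioned in the PR notes, but this body is an illustrative guess rather than the merged implementation; hasOtlpExporterName is a hypothetical helper, and treating an array as a match when any element matches (rather than all) is an assumption.

```typescript
// Hypothetical helper: true for an Error instance or plain object whose
// name property is exactly "OTLPExporterError".
function hasOtlpExporterName(value: unknown): boolean {
  return (
    typeof value === "object" &&
    value !== null &&
    (value as { name?: unknown }).name === "OTLPExporterError"
  );
}

// Sketch of the three-shape check: Error instance, plain object, or an
// array wrapping one or more such objects.
function isOtlpExporterError(reason: unknown): boolean {
  if (Array.isArray(reason)) {
    // Assumption: one matching element is enough to classify the array.
    return reason.some((item) => hasOtlpExporterName(item));
  }
  return hasOtlpExporterName(reason);
}
```

Matching on the name property instead of instanceof avoids importing the exporter's error class and also works for the plain-object shape, which would never pass an instanceof check.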
