openclaw: Correlate Slack/channel message diagnostics into a single trace

Summary

OpenClaw's diagnostics/logging docs describe request-level trace correlation across gateway handling, diagnostic events, agent runs, model usage, and model calls. In a Slack-agent run on OpenClaw 2026.5.28, the relevant diagnostics arrived in Tempo as separate root traces instead of one correlated trace waterfall.

This makes it hard to answer the basic lifecycle question: "How long did one inbound Slack message take from receipt to reply, and where was the time spent?"

Expected behavior

docs/logging.md describes trace correlation as follows:

  • Gateway HTTP requests and WebSocket frames establish an internal request trace scope.
  • Logs and diagnostic events in that async scope inherit the request trace if no explicit context is provided.
  • Agent-run and model-call traces should be children of the active request trace.
  • Local logs, diagnostic snapshots, OTel spans, and provider trace headers should be joinable by traceId.

For one inbound Slack message, I would therefore expect a single trace containing spans/events such as:

  • openclaw.message.processed
  • openclaw.harness.run
  • openclaw.model.usage
  • openclaw.model.call
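
The scope inheritance that docs/logging.md describes can be sketched with Node's `AsyncLocalStorage`. This is only an illustration of the mechanism: `TraceContext`, `runInRequestScope`, and `emitDiagnostic` are hypothetical names, not OpenClaw's actual API.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomBytes } from "node:crypto";

// Hypothetical shape; OpenClaw's real trace context type may differ.
interface TraceContext {
  traceId: string; // 32 hex characters
  spanId: string; // 16 hex characters
  parentSpanId?: string;
}

interface DiagnosticEvent extends Partial<TraceContext> {
  name: string;
}

const scope = new AsyncLocalStorage<TraceContext>();

const newTraceId = (): string => randomBytes(16).toString("hex");
const newSpanId = (): string => randomBytes(8).toString("hex");

// Establish a request scope: everything called inside `fn` sees the same context.
function runInRequestScope<T>(fn: () => T): T {
  return scope.run({ traceId: newTraceId(), spanId: newSpanId() }, fn);
}

// Events inherit the active scope unless an explicit context is passed.
function emitDiagnostic(name: string, explicit?: TraceContext): DiagnosticEvent {
  const ctx = explicit ?? scope.getStore();
  return { name, ...ctx };
}
```

Two events emitted inside one `runInRequestScope` call share a `traceId`, while an event emitted outside any scope carries none. The second case is what the observed behavior below looks like from the exporter's point of view.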

Observed behavior

Environment, sanitized:

  • OpenClaw version: 2026.5.28
  • Build/revision observed: e932160
  • Runtime shape: OpenClaw agent with Slack channel enabled and @openclaw/diagnostics-otel enabled
  • OTel exporter: OTLP/HTTP to Tempo
  • OTel settings: traces enabled, metrics enabled, logs disabled, sample rate 1.0

After sending one Slack message and receiving a reply, Tempo accepted new spans and the expected OpenClaw span names were present. However, the lifecycle spans appeared as separate root traces rather than one parented trace.

Observed spans from the same Slack interaction:

| Span name | Duration | Key attributes |
| --- | --- | --- |
| openclaw.message.processed | 5.009s | openclaw.channel=slack, openclaw.outcome=completed |
| openclaw.model.usage | 4.441s | openclaw.channel=slack, openclaw.provider=openai-codex, openclaw.model=gpt-5.5, token counts present |
| openclaw.harness.run | 4.212s | openclaw.harness.id=codex, openclaw.harness.plugin=codex, openclaw.outcome=completed |
| openclaw.model.call | 2.837s | openclaw.api=openai-codex-responses, openclaw.transport=stdio, openclaw.model=gpt-5.5 |
| openclaw.message.processed | 6ms | openclaw.channel=slack, openclaw.outcome=skipped, openclaw.reason=duplicate |

Additional records:

  • The completed openclaw.message.processed span began at approximately 2026-05-31T18:24:21.587-04:00 and lasted about 5.009s.
  • The gateway delivered the Slack reply at approximately 2026-05-31T18:24:26-04:00, matching the message-processing span duration.
  • openclaw.model.usage began within the message-processing window and lasted about 4.441s.
  • openclaw.harness.run began within the message-processing window and lasted about 4.212s.
  • openclaw.model.call began within the harness/model window and lasted about 2.837s.
  • Tempo search tags included the expected values for openclaw.channel=slack, openclaw.provider, openclaw.model, openclaw.harness.*, gen_ai.*, and the span names above.

The data is internally consistent as one Slack turn, but trace parentage/correlation is missing.
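
To make the symptom concrete, here is a small sketch that groups exported spans by `traceId`, roughly the way a trace backend lists them. The `ExportedSpan` shape is an assumption, and the trace IDs are placeholders because the real ones are not included in this report.

```typescript
// Assumed minimal span shape for illustration.
interface ExportedSpan {
  name: string;
  traceId: string;
  parentSpanId?: string;
}

// Group span names by traceId.
function groupByTrace(spans: ExportedSpan[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const span of spans) {
    const names = groups.get(span.traceId) ?? [];
    names.push(span.name);
    groups.set(span.traceId, names);
  }
  return groups;
}

// Placeholder trace IDs standing in for the four completed lifecycle spans.
const observed: ExportedSpan[] = [
  { name: "openclaw.message.processed", traceId: "trace-a" },
  { name: "openclaw.model.usage", traceId: "trace-b" },
  { name: "openclaw.harness.run", traceId: "trace-c" },
  { name: "openclaw.model.call", traceId: "trace-d" },
];

console.log(groupByTrace(observed).size); // 4: one trace per span instead of one shared trace
```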

Why this matters

Without trace correlation, operators can see isolated spans but cannot reliably inspect one message lifecycle as a single waterfall from receipt through dispatch, harness/model execution, and reply delivery.

Likely area to investigate

This may be specific to Slack/channel message ingestion rather than HTTP request handling. Slack socket-mode callbacks and other long-lived channel callbacks may not naturally run inside the same gateway HTTP/WebSocket request trace scope described in docs/logging.md.

Potential implementation shape:

  • Create or preserve a per-inbound-message trace context at message.received / dispatch start.
  • Run message dispatch, session turn creation, harness execution, model usage, model calls, and reply delivery inside that active diagnostic trace context.
  • Ensure emitted diagnostics events carry traceId, spanId, and parentSpanId consistently.
  • Verify the OTel diagnostics plugin preserves parentage when converting diagnostic events to spans.
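
A minimal sketch of that shape, assuming an `AsyncLocalStorage`-style context and using an `EventEmitter` as a stand-in for a long-lived channel connection. `withSpan`, `recorded`, and the span nesting are hypothetical and not taken from the OpenClaw source; the real hierarchy (for example where openclaw.model.usage sits) is a design decision.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomBytes } from "node:crypto";
import { EventEmitter } from "node:events";

// Hypothetical types and helpers for illustration; not OpenClaw's actual API.
interface TraceContext {
  traceId: string;
  spanId: string;
  parentSpanId?: string;
}
interface SpanRecord extends TraceContext {
  name: string;
}

const active = new AsyncLocalStorage<TraceContext>();
const recorded: SpanRecord[] = [];
const hex = (bytes: number): string => randomBytes(bytes).toString("hex");

// Run `fn` as a span: a child of the active span if one exists, otherwise a new root.
function withSpan<T>(name: string, fn: () => T): T {
  const parent = active.getStore();
  const ctx: TraceContext = {
    traceId: parent?.traceId ?? hex(16),
    spanId: hex(8),
    parentSpanId: parent?.spanId,
  };
  recorded.push({ name, ...ctx });
  return active.run(ctx, fn);
}

// Stand-in for a long-lived channel connection such as a socket-mode client.
const channel = new EventEmitter();

channel.on("message", () => {
  // No request scope exists in this callback, so the root is created at message receipt.
  withSpan("openclaw.message.processed", () => {
    withSpan("openclaw.harness.run", () => {
      withSpan("openclaw.model.call", () => {
        /* provider call would go here */
      });
    });
  });
});

channel.emit("message");
```

After the emit, `recorded` holds three spans with one shared `traceId`, and each nested span's `parentSpanId` points at the span that enclosed it. The same helper should also work when `fn` is async, since `AsyncLocalStorage` carries the store across awaits started inside `run`.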

Potential files/areas:

  • src/infra/diagnostic-events.ts
  • src/infra/diagnostic-trace-context.ts
  • gateway HTTP/WebSocket request scope setup
  • Slack/channel monitor or dispatch code
  • auto-reply/session turn creation path
  • embedded agent runner model diagnostics
  • diagnostics OTel exporter span conversion

Success criteria

  • A single inbound Slack message produces one OTel trace containing the message lifecycle, harness, model usage, and model call spans.
  • openclaw.message.processed, openclaw.harness.run, openclaw.model.usage, and openclaw.model.call share the same traceId.
  • Child spans have meaningful parentSpanId relationships instead of appearing as separate roots.
  • The duplicate/skipped message event is either clearly parented to the same inbound-message trace or intentionally documented as a separate trace.
  • File logs emitted during the same async lifecycle include the same top-level traceId where diagnostic trace context is available.
  • A regression test covers the channel/Slack-style inbound message path and asserts trace correlation across lifecycle and model diagnostics.
  • Live verification with @openclaw/diagnostics-otel and Tempo shows one trace waterfall for one Slack reply lifecycle.
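
The correlation criteria above could be asserted in a regression test along these lines. This is a sketch under assumptions: `ExportedSpan` and `checkSingleTrace` are hypothetical, and how the test captures spans from the diagnostics pipeline is left open.

```typescript
// Assumed minimal span shape for illustration.
interface ExportedSpan {
  name: string;
  traceId: string;
  spanId: string;
  parentSpanId?: string;
}

// Returns a list of problems; an empty list means the spans form one parented trace.
function checkSingleTrace(spans: ExportedSpan[], requiredNames: string[]): string[] {
  const problems: string[] = [];

  const traceIds = new Set(spans.map((s) => s.traceId));
  if (traceIds.size !== 1) problems.push(`expected 1 traceId, found ${traceIds.size}`);

  const roots = spans.filter((s) => s.parentSpanId === undefined);
  if (roots.length !== 1) problems.push(`expected 1 root span, found ${roots.length}`);

  // Every non-root span must point at a span that is actually in the set.
  const spanIds = new Set(spans.map((s) => s.spanId));
  for (const s of spans) {
    if (s.parentSpanId !== undefined && !spanIds.has(s.parentSpanId)) {
      problems.push(`${s.name} has an unknown parentSpanId`);
    }
  }

  for (const name of requiredNames) {
    if (!spans.some((s) => s.name === name)) problems.push(`missing span ${name}`);
  }
  return problems;
}
```

A test for the Slack-style inbound path would collect the spans emitted for one message and expect `checkSingleTrace(spans, [...])` to return an empty list for the four lifecycle span names.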
