openclaw - 💡(How to fix) Fix observability: emit per-MCP-server sub-spans inside bundle-tools so cold-start cost is attributable [1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#81595Fetched 2026-05-14 03:30:36
View on GitHub
Comments
1
Participants
2
Timeline
2
Reactions
2
Author
Timeline (top)
commented ×1cross-referenced ×1

The [trace:embedded-run] prep stages line and the timeline diagnostic (openclaw.diagnostics.v1 envelope) both expose bundle-tools as a single aggregate duration. On hosts with multiple stdio MCP servers configured, this stage routinely accounts for the dominant share of per-turn prep time — but it's a black box: there's no way to attribute the cost to a specific MCP server, plugin tool registry, or sub-step.

This makes regressions and duplicate-child situations (e.g. #75621) much harder to diagnose than they need to be, and makes proposed fixes like #80131 (tool bundling memoization) harder to verify after the fact.

Error Message

{"span":"mcp.tools_list","server":"desktop-automation","durationMs":11420,"toolCount":6,"warn":"slow"},

Root Cause

The [trace:embedded-run] prep stages line and the timeline diagnostic (openclaw.diagnostics.v1 envelope) both expose bundle-tools as a single aggregate duration. On hosts with multiple stdio MCP servers configured, this stage routinely accounts for the dominant share of per-turn prep time — but it's a black box: there's no way to attribute the cost to a specific MCP server, plugin tool registry, or sub-step.

This makes regressions and duplicate-child situations (e.g. #75621) much harder to diagnose than they need to be, and makes proposed fixes like #80131 (tool bundling memoization) harder to verify after the fact.

Fix Action

Fix / Workaround

Happy to test patches against this workload — I already have a CSV parser of the prep-stages line and a baseline corpus across two days of bimodal traffic.

Code Example

[trace:embedded-run] prep stages: phase=stream-ready totalMs=21441
  workspace-sandbox:1ms, skills:0ms, core-plugin-tools:320ms,
  bootstrap-context:27ms,
  bundle-tools:20931ms,            ← black box
  system-prompt:5ms, session-resource-loader:137ms,
  agent-session:0ms, stream-setup:20ms

---

{"span":"bundle-tools","durationMs":20931,"children":[
  {"span":"mcp.tools_list","server":"filesystem","durationMs":3120,"toolCount":11},
  {"span":"mcp.tools_list","server":"github","durationMs":1840,"toolCount":29},
  {"span":"mcp.tools_list","server":"desktop-automation","durationMs":11420,"toolCount":6,"warn":"slow"},
  {"span":"mcp.tools_list","server":"graphify-orchestrator","durationMs":1505,"toolCount":7},
  {"span":"mcp.tools_list","server":"graphify-trader","durationMs":1490,"toolCount":7},
  {"span":"mcp.tools_list","server":"graphify-openclaw","durationMs":1456,"toolCount":7},
  {"span":"core-plugin-tools.assemble","durationMs":80}
]}

---

[trace:embedded-run] bundle-tools detail: filesystem:3120ms(11) github:1840ms(29) desktop-automation:11420ms(6,slow) graphify-orchestrator:1505ms(7) graphify-trader:1490ms(7) graphify-openclaw:1456ms(7) core-plugin-tools.assemble:80ms
RAW_BUFFERClick to expand / collapse

Summary

The [trace:embedded-run] prep stages line and the timeline diagnostic (openclaw.diagnostics.v1 envelope) both expose bundle-tools as a single aggregate duration. On hosts with multiple stdio MCP servers configured, this stage routinely accounts for the dominant share of per-turn prep time — but it's a black box: there's no way to attribute the cost to a specific MCP server, plugin tool registry, or sub-step.

This makes regressions and duplicate-child situations (e.g. #75621) much harder to diagnose than they need to be, and makes proposed fixes like #80131 (tool bundling memoization) harder to verify after the fact.

Concrete observation

On a Linux/WSL host running OpenClaw 2026.5.7 with 6 stdio MCP servers configured (filesystem, github, desktop-automation via WSL→cmd.exe, and 3× graphify.serve), bundle-tools shows a bimodal distribution:

  • Fast mode: ~8 s (event loop quiet)
  • Slow mode: ~17–20 s (event loop contended by concurrent worker subagents)
  • Outliers >50 s (one timeout at the 30 s mark on filesystem MCP)
[trace:embedded-run] prep stages: phase=stream-ready totalMs=21441
  workspace-sandbox:1ms, skills:0ms, core-plugin-tools:320ms,
  bootstrap-context:27ms,
  bundle-tools:20931ms,            ← black box
  system-prompt:5ms, session-resource-loader:137ms,
  agent-session:0ms, stream-setup:20ms

All evidence points at sequential tools/list cold-start across the MCP children (re-spawned per prep cycle), with one slow child dominating, but there is no log signal that names which child is slow. Diagnosis required ad-hoc ps/strace/timing experiments.

Proposal

Emit nested sub-spans inside bundle-tools to the timeline diagnostic (openclaw.diagnostics.v1), one per MCP server / tool source. Also include a compact summary line in the existing aggregate [trace:embedded-run] prep stages log when diagnostics.flags includes timeline or a new bundle-tools.detail flag.

Suggested shape (JSONL timeline):

{"span":"bundle-tools","durationMs":20931,"children":[
  {"span":"mcp.tools_list","server":"filesystem","durationMs":3120,"toolCount":11},
  {"span":"mcp.tools_list","server":"github","durationMs":1840,"toolCount":29},
  {"span":"mcp.tools_list","server":"desktop-automation","durationMs":11420,"toolCount":6,"warn":"slow"},
  {"span":"mcp.tools_list","server":"graphify-orchestrator","durationMs":1505,"toolCount":7},
  {"span":"mcp.tools_list","server":"graphify-trader","durationMs":1490,"toolCount":7},
  {"span":"mcp.tools_list","server":"graphify-openclaw","durationMs":1456,"toolCount":7},
  {"span":"core-plugin-tools.assemble","durationMs":80}
]}

Compact one-liner (for the existing log channel):

[trace:embedded-run] bundle-tools detail: filesystem:3120ms(11) github:1840ms(29) desktop-automation:11420ms(6,slow) graphify-orchestrator:1505ms(7) graphify-trader:1490ms(7) graphify-openclaw:1456ms(7) core-plugin-tools.assemble:80ms

Either format is enough to identify the slowest server at a glance.

Why this matters now

Two open performance issues would directly benefit:

  • #80131 (per-request auth and tool bundling dominate TTFT) — proposes memoizing bundle-tools. With sub-spans we could quantify the actual saving per MCP server and prove the fix lands.
  • #75621 (duplicate stdio MCP children) — currently diagnosed by inspecting ps after the fact. Per-server timing would surface the duplicate (two children responding to the same (server, config) key) as duplicated sub-spans or anomalous totals.

It also helps users with many MCP servers decide which servers are pulling weight vs being dead-weight cold-start cost.

Backwards compatibility

  • New sub-spans should only emit when diagnostics.flags includes timeline (already gated) or a new dedicated flag — no change to default behavior.
  • The aggregate bundle-tools value in the existing [trace:embedded-run] prep stages line stays as-is, so downstream parsers keep working.

Environment

  • OpenClaw 2026.5.7 (eeef486)
  • Node v22.22.2
  • WSL2 / Linux 6.6.87.2-microsoft-standard-WSL2
  • 6 stdio MCP servers configured (filesystem, github, desktop-automation via cmd.exe, 3× graphify.serve)
  • Channels: Discord, Telegram
  • Gateway up ~52 h, ~300 prep traces/day, no regression — the cost is structural

Offer

Happy to test patches against this workload — I already have a CSV parser of the prep-stages line and a baseline corpus across two days of bimodal traffic.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix observability: emit per-MCP-server sub-spans inside bundle-tools so cold-start cost is attributable [1 comments, 2 participants]