openclaw - 💡(How to fix) Fix Agent runtime per-turn startup overhead is ~17s on CPU-only systems, dominating end-to-end latency for fast cloud models [1 participants]

openclaw2026-04-08 20:44:43

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

openclaw/openclaw#63357•Fetched 2026-04-09 07:54:51

View on GitHub

Comments

Participants

Timeline

Reactions

Author

tmote

Participants

tmote

Timeline (top)

cross-referenced ×4

On a CPU-only host, every agent turn pays a ~17 second per-turn initialization cost between the channel adapter receiving the inbound message and the user message being written to the session JSONL. This happens BEFORE the LLM is called. With a fast cloud model like Claude Haiku 4.5 (whose own gateway-side LLM call measures ~1-2s), this overhead dominates end-to-end latency to the point where Teams users see ~12s minimum response times for trivial messages.

Root Cause

Fix Action

Fix / Workaround

Configure an agent with the minimal config above (or any agent on a CPU-only host).
Send a Teams "Hi" message via the configured msteams binding.

Observe in /tmp/openclaw/openclaw-<date>.log:

2026-04-08T19:49:04  [msteams]  received message
2026-04-08T19:49:04  [msteams]  dispatching to agent
...                  (~17 seconds of silence)

Observe in ~/.openclaw/agents/router/sessions/<session>.jsonl:

"timestamp": "2026-04-08T19:49:21.522Z"   # user message written
"timestamp": "2026-04-08T19:49:23.215Z"   # assistant reply written (LLM took 1.7s)

Observe [msteams] dispatch complete at 19:49:24.

Total dispatch time: ~20 seconds. LLM call: ~2 seconds. Per-turn startup overhead: ~17 seconds.

Not memorySearch: disabling memorySearch.enabled on the agent saved ~10s (22s → 12s) but a ~10s residual remained. Even with memorySearch fully off, the per-turn overhead is still ~17s before LLM call.
Not MCP servers: removed all MCP servers (microsoft-azure, microsoft-mssql, microsoft-office365) — they were each adding ~30s of bundle-mcp timeout per call. After removing them, the residual ~17s overhead remained.
Not tool schemas: tools.profile=minimal + explicit deny of 23 tools (only sessions_spawn allowed) — verified via tcpdump that the structured tools list passed to the LLM is small. Per-turn overhead unchanged.
Not bootstrap files: trimmed the agent's workspace bootstrap from 5668 chars (~1417 tokens) to 1447 chars (~362 tokens). Per-turn overhead unchanged.
Not humanDelay / blockStreamingDefault / typingMode: set humanDelay.mode=off, blockStreamingDefault=off, typingMode=instant. Per-turn overhead unchanged.
Not the LLM call itself: Direct LiteLLM curl to claude-haiku-4-5 returns in 0.6-2s. Direct curl to ollama for the local Qwen wrapper returns in ~2s warm. The LLM is fast.
Not BFS/cloudflared inbound transit: tcpdump on loopback :3978 shows the request reaching the gateway within ~1s of the user hitting send in Teams. The gap is INSIDE the gateway after received message → dispatching to agent, before any LLM call.
Not CLI cold start: tested via the gateway path (no --local, no CLI invocation per call). The per-turn overhead is observed via real Teams traffic going through cloudflared → msteams provider → gateway.

Code Example

{
  "id": "router",
  "name": "router",
  "model": "litellm/claude-haiku-4-5",
  "memorySearch": { "enabled": false },
  "skills": [],
  "tools": {
    "profile": "minimal",
    "alsoAllow": ["sessions_spawn"],
    "deny": ["read","edit","write","exec","process","canvas","nodes","cron","message","tts","gateway","agents_list","sessions_list","sessions_history","sessions_send","sessions_yield","subagents","session_status","web_search","web_fetch","image","pdf","browser"]
  },
  "subagents": {
    "allowAgents": ["pleres","family","code","research"],
    "requireAgentId": true
  }
}

---

2026-04-08T19:49:04  [msteams]  received message
   2026-04-08T19:49:04  [msteams]  dispatching to agent
   ...                  (~17 seconds of silence)

---

"timestamp": "2026-04-08T19:49:21.522Z"   # user message written
   "timestamp": "2026-04-08T19:49:23.215Z"   # assistant reply written (LLM took 1.7s)

RAW_BUFFERClick to expand / collapse

Bug: Agent runtime per-turn startup overhead is ~17s on CPU-only systems, dominating end-to-end latency for fast cloud models

Summary

Environment

OpenClaw 2026.4.5 (3e72c03)
Ubuntu 24.04.4 LTS, Node v24.14.1 via nvm
Hardware: 8 vCPU Intel Xeon Gold 6130 @ 2.10 GHz (2017 Skylake, AVX-512), no GPU, 16 GB RAM, no swap
Gateway runs as system-systemd unit (User=pleresadmin, LoadCredentialEncrypted= for secrets)
LiteLLM proxy on 127.0.0.1:4000 fronting Anthropic + other providers
Cloudflare Tunnel inbound for msteams

The router agent (where this manifests)

A router agent configured to triage inbound messages and delegate via sessions_spawn:

{
  "id": "router",
  "name": "router",
  "model": "litellm/claude-haiku-4-5",
  "memorySearch": { "enabled": false },
  "skills": [],
  "tools": {
    "profile": "minimal",
    "alsoAllow": ["sessions_spawn"],
    "deny": ["read","edit","write","exec","process","canvas","nodes","cron","message","tts","gateway","agents_list","sessions_list","sessions_history","sessions_send","sessions_yield","subagents","session_status","web_search","web_fetch","image","pdf","browser"]
  },
  "subagents": {
    "allowAgents": ["pleres","family","code","research"],
    "requireAgentId": true
  }
}

Steps to reproduce

Configure an agent with the minimal config above (or any agent on a CPU-only host).
Send a Teams "Hi" message via the configured msteams binding.

Observe in /tmp/openclaw/openclaw-<date>.log:

2026-04-08T19:49:04  [msteams]  received message
2026-04-08T19:49:04  [msteams]  dispatching to agent
...                  (~17 seconds of silence)

Observe in ~/.openclaw/agents/router/sessions/<session>.jsonl:

"timestamp": "2026-04-08T19:49:21.522Z"   # user message written
"timestamp": "2026-04-08T19:49:23.215Z"   # assistant reply written (LLM took 1.7s)

Observe [msteams] dispatch complete at 19:49:24.

Total dispatch time: ~20 seconds. LLM call: ~2 seconds. Per-turn startup overhead: ~17 seconds.

What we ruled out

We isolated this carefully across many iterations:

Not memorySearch: disabling memorySearch.enabled on the agent saved ~10s (22s → 12s) but a ~10s residual remained. Even with memorySearch fully off, the per-turn overhead is still ~17s before LLM call.
Not MCP servers: removed all MCP servers (microsoft-azure, microsoft-mssql, microsoft-office365) — they were each adding ~30s of bundle-mcp timeout per call. After removing them, the residual ~17s overhead remained.
Not tool schemas: tools.profile=minimal + explicit deny of 23 tools (only sessions_spawn allowed) — verified via tcpdump that the structured tools list passed to the LLM is small. Per-turn overhead unchanged.
Not bootstrap files: trimmed the agent's workspace bootstrap from 5668 chars (~1417 tokens) to 1447 chars (~362 tokens). Per-turn overhead unchanged.
Not humanDelay / blockStreamingDefault / typingMode: set humanDelay.mode=off, blockStreamingDefault=off, typingMode=instant. Per-turn overhead unchanged.
Not the LLM call itself: Direct LiteLLM curl to claude-haiku-4-5 returns in 0.6-2s. Direct curl to ollama for the local Qwen wrapper returns in ~2s warm. The LLM is fast.
Not BFS/cloudflared inbound transit: tcpdump on loopback :3978 shows the request reaching the gateway within ~1s of the user hitting send in Teams. The gap is INSIDE the gateway after received message → dispatching to agent, before any LLM call.
Not CLI cold start: tested via the gateway path (no --local, no CLI invocation per call). The per-turn overhead is observed via real Teams traffic going through cloudflared → msteams provider → gateway.

What seems to be happening in the 17s

We don't have visibility past the dispatching to agent log line at the default log level. The default log shows nothing during the gap between dispatch start and dispatch completion. Suspected sources (in priority order):

System prompt / agent context build per turn — re-reading workspace bootstrap files, computing tool schemas, building the structured system message
Subagent capability registration for sessions_spawn
Provider client warmup / token refresh
Some other synchronous initialization step

A debug-level log of the agent run pipeline (or a [agent.run] initialized in <ms>ms line bracketing the per-turn setup vs the LLM call) would make this trivial to diagnose for users.

Severity

Medium-high for CPU-only deployments. The 17s per-turn overhead caps the achievable end-to-end latency at ~20s for any message, regardless of how fast the LLM is. This makes OpenClaw effectively unusable for interactive personal assistant use cases on CPU-only hardware (even with cloud models). It also makes the value of fast/cheap cloud models like Haiku 4.5 invisible to operators on CPU hosts.

Workarounds attempted (none worked)

All the tunables listed in "What we ruled out" above
Restarting the gateway between calls — the overhead is per-turn, not first-call only
Using a streaming-disabled model entry (streaming: false) — no effect on the pre-LLM overhead

Suggested next actions

Add structured timing logs at info level for the per-turn agent run pipeline:
- [agent.run] context built in <ms>ms
- [agent.run] tool schemas in <ms>ms
- [agent.run] llm call started
- [agent.run] llm call completed in <ms>ms This would let users diagnose without filing a vague issue like this one.
Profile a single agent turn on a CPU-only host to find the actual hot spot
Cache per-turn invariants (system prompt template, tool schemas) so they're built once at agent registration time, not on every turn

Hardware note

This is a 2017-era Skylake CPU with no GPU. Faster CPUs (Sapphire Rapids, Genoa) would likely amortize this cost differently, but the issue is still that per-turn work scales with CPU clock instead of being effectively constant. For CPU-bound operators, 17s/turn is a ceiling that no model selection can fix.

Where we landed

We pivoted our router from local Qwen 2.5 7B (which was even slower on this CPU) to cloud Claude Haiku 4.5 expecting sub-3s end-to-end latency. We achieved sub-2s LLM time but the OpenClaw runtime adds 17s on top, giving us ~12s end-to-end perceived latency. This is acceptable but well below what's possible if the per-turn overhead were addressed.

extent analysis

TL;DR

The most likely fix for the 17-second per-turn startup overhead in OpenClaw on CPU-only systems is to add structured timing logs and profile a single agent turn to identify the actual hot spot, then cache per-turn invariants to reduce the overhead.

Guidance

Add structured timing logs at the info level for the per-turn agent run pipeline to diagnose the issue without filing a vague issue.
Profile a single agent turn on a CPU-only host to find the actual hot spot and identify the source of the 17-second overhead.
Cache per-turn invariants, such as system prompt templates and tool schemas, so they're built once at agent registration time, not on every turn, to reduce the overhead.
Consider upgrading to a faster CPU, such as Sapphire Rapids or Genoa, to amortize the cost of per-turn work, although this may not completely eliminate the issue.

Example

No specific code snippet is provided, but an example of how to add structured timing logs could be:

{
  "logLevel": "info",
  "agentRunLogs": [
    "[agent.run] context built in <ms>ms",
    "[agent.run] tool schemas in <ms>ms",
    "[agent.run] llm call started",
    "[agent.run] llm call completed in <ms>ms"
  ]
}

Notes

The issue is specific to CPU-only systems, and the 17-second overhead is a significant bottleneck for interactive personal assistant use cases. The suggested next actions are designed to help diagnose and address the issue, but may not completely eliminate the overhead.

Recommendation

Apply the suggested workarounds, including adding structured timing logs and caching per-turn invariants, to reduce the per-turn overhead and improve the overall performance of OpenClaw on CPU-only systems.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #prompt template #dependency conflict #environment setup #docker error

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

openclaw - 💡(How to fix) Fix Agent runtime per-turn startup overhead is ~17s on CPU-only systems, dominating end-to-end latency for fast cloud models [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

Code Example

Bug: Agent runtime per-turn startup overhead is ~17s on CPU-only systems, dominating end-to-end latency for fast cloud models

Summary

Environment

The router agent (where this manifests)

Steps to reproduce

What we ruled out

What seems to be happening in the 17s

Severity

Workarounds attempted (none worked)

Suggested next actions

Hardware note

Where we landed

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

openclaw - 💡(How to fix) Fix Agent runtime per-turn startup overhead is ~17s on CPU-only systems, dominating end-to-end latency for fast cloud models [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

Code Example

Bug: Agent runtime per-turn startup overhead is ~17s on CPU-only systems, dominating end-to-end latency for fast cloud models

Summary

Environment

The router agent (where this manifests)

Steps to reproduce

What we ruled out

What seems to be happening in the 17s

Severity

Workarounds attempted (none worked)

Suggested next actions

Hardware note

Where we landed

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING