openclaw - ✅ (Solved) Fix [Bug]: Severe chat latency (30–90s) on Docker VPS while direct OpenAI completes in <1s (OpenClaw 2026.4.26) [1 pull request, 3 comments, 4 participants]

GitHub stats
openclaw/openclaw#73428 • Fetched 2026-04-29 06:20:02
Comments: 3 · Participants: 4 · Timeline events: 10 · Reactions: 0
Timeline (top): cross-referenced ×5, commented ×3, labeled ×1, subscribed ×1

Every chat turn through OpenClaw WebChat takes 30–90 seconds end-to-end on a 2 vCPU / 4 GB Docker VPS, while the same gpt-4o-mini call from inside the same container via curl returns in 0.83–1.73 seconds.


PR fix notes

PR #73540: fix(gateway): resolve tools.effective cold misses synchronously

Description (problem / solution / changelog)

Summary

  • Problem: tools.effective cold cache misses used the same deferred setImmediate refresh path as stale background refreshes.
  • Why it matters: Control UI/WebChat can hit tools.effective on an interactive path; cold misses should start resolving immediately, and stale refreshes should not depend solely on the immediate queue.
  • What changed: cold misses now resolve synchronously through the shared cache/update helper; stale-cache requests still return stale data immediately but their background refresh has a bounded timer fallback; refresh logs redact session identifiers and sanitize errors.
  • What did NOT change (scope boundary): no Gateway startup prewarm. Local E2E showed blocking startup prewarm moved ~5s of first-request work into Gateway readiness, so this PR avoids startup-speed regression.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Related #73428
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: cold effective tool inventory cache misses deferred inventory resolution through setImmediate, adding an avoidable event-loop turn before the expensive work could begin.
  • Missing detection / guardrail: tests covered cache hits/coalescing/stale refreshes, but not cold-miss synchronicity or delayed-setImmediate fallback behavior.
  • Contributing context (if known): first inventory computation can still be expensive; that underlying cost should be profiled separately rather than moved into startup.
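
The deferral described in this root cause can be illustrated with a minimal sketch. The resolver below is a hypothetical stand-in for the inventory computation (it is not the real resolveEffectiveToolInventory), and the variable names are assumptions made for illustration only. It shows that work queued through setImmediate has not started when the scheduling call returns, while a direct call has already run.

```typescript
// Hypothetical stand-in for the expensive inventory computation; it only
// counts how often it has been invoked.
function makeResolver() {
  let calls = 0;
  return {
    resolve: (): string[] => {
      calls += 1;
      return ["tool-a", "tool-b"];
    },
    callCount: (): number => calls,
  };
}

// Deferred cold miss: the resolver is only queued. It cannot start until the
// current synchronous code has finished and the event loop reaches the
// setImmediate (check) phase.
const deferred = makeResolver();
const deferredResult = new Promise<string[]>((done) => {
  setImmediate(() => done(deferred.resolve()));
});
const callsRightAfterDeferred = deferred.callCount();

// Synchronous cold miss: the resolver has already run when the call returns.
const eager = makeResolver();
const eagerResult = eager.resolve();
const callsRightAfterSync = eager.callCount();

console.log({ callsRightAfterDeferred, callsRightAfterSync, tools: eagerResult.length });
deferredResult.then((tools) => console.log("deferred resolver finished later:", tools.length));
```

Here callsRightAfterDeferred is 0 and callsRightAfterSync is 1. On a busy event loop, the gap before the deferred resolver starts is whatever other work happens to be queued ahead of it.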

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/gateway/server-methods/tools-effective.test.ts
  • Scenario the test should lock in:
    • cold cache misses call resolveEffectiveToolInventory synchronously
    • repeated requests still hit the fresh cache
    • stale cache responses still return immediately
    • stale background refresh falls back when setImmediate is delayed
  • Why this is the smallest reliable guardrail: it isolates cache/scheduler semantics without full Gateway startup or provider runtime noise.
  • Existing test that already covers this (if any): existing cache hit/coalescing/stale refresh tests remain in place.
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

Cold tools.effective cache misses begin resolving immediately instead of waiting for setImmediate. This avoids an extra deferral but does not claim to eliminate the underlying first inventory computation cost.

Diagram (if applicable)

Before:
cold tools.effective -> setImmediate -> resolve inventory -> cache -> response
stale tools.effective -> stale response + setImmediate refresh

After:
cold tools.effective -> resolve inventory -> cache -> response
stale tools.effective -> stale response + setImmediate refresh (timer fallback)
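
The "After" flow can be sketched as a small cache. This is a rough illustration under stated assumptions, not the code in tools-effective.ts: createEffectiveToolsCache, ttlMs, fallbackMs, and the injectable now clock are all hypothetical names introduced here. A cold miss resolves synchronously through a shared update step, while a stale hit returns the old value and schedules one refresh through setImmediate with a bounded setTimeout fallback.

```typescript
type Inventory = string[];

interface CacheOptions {
  ttlMs: number; // assumed freshness window
  fallbackMs: number; // assumed bound on how long a stale refresh may wait
  now?: () => number; // injectable clock, useful in tests
}

// Hypothetical helper, not the real Gateway implementation.
function createEffectiveToolsCache(
  resolveInventory: (sessionKey: string) => Inventory,
  options: CacheOptions,
) {
  const now = options.now ?? Date.now;
  const entries = new Map<string, { value: Inventory; storedAt: number }>();
  const refreshing = new Set<string>();

  // Shared cache/update step used by both cold misses and refreshes.
  function update(sessionKey: string): Inventory {
    const value = resolveInventory(sessionKey);
    entries.set(sessionKey, { value, storedAt: now() });
    return value;
  }

  function scheduleRefresh(sessionKey: string): void {
    if (refreshing.has(sessionKey)) return; // coalesce concurrent refreshes
    refreshing.add(sessionKey);
    let ran = false;
    const run = (): void => {
      if (ran) return; // whichever scheduler fires first wins
      ran = true;
      clearImmediate(immediate);
      clearTimeout(timer);
      try {
        update(sessionKey);
      } catch {
        // Keep serving the stale value; real code would log a sanitized error.
      } finally {
        refreshing.delete(sessionKey);
      }
    };
    const immediate = setImmediate(run);
    const timer = setTimeout(run, options.fallbackMs);
  }

  return {
    get(sessionKey: string): Inventory {
      const entry = entries.get(sessionKey);
      if (!entry) return update(sessionKey); // cold miss: resolve right now
      if (now() - entry.storedAt > options.ttlMs) scheduleRefresh(sessionKey);
      return entry.value; // fresh hit, or stale value returned immediately
    },
  };
}
```

A caller would construct the cache once and call get("agent:main:main") per request. The first call pays the full resolution cost inline, which matches the PR's note that the underlying first-computation cost is not removed.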

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: macOS local dev host
  • Runtime/container: Node/pnpm workspace
  • Model/provider: N/A for targeted cache tests
  • Integration/channel (if any): Gateway Control UI/WebChat tools.effective path
  • Relevant config (redacted): local isolated Gateway fixture used for timing checks; plugins restricted to OpenAI + memory-core for parity with #73428 investigation.

Steps

  1. Seed an isolated Gateway state with agent:main:main.
  2. Start a real foreground Gateway with startup tracing.
  3. Connect over WebSocket as an operator/backend client.
  4. Call tools.effective repeatedly and compare first vs cached timings.
  5. Run targeted unit tests and changed gate.

Expected

  • No sidecars.tools-effective-prewarm startup trace.
  • Cold miss starts resolving synchronously.
  • Repeated requests hit cache.
  • Stale refresh fallback still runs when setImmediate is delayed.

Actual

  • No startup prewarm path remains in the diff.
  • Targeted tests, changed gate, and build pass.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Validation:

pnpm test src/gateway/server-methods/tools-effective.test.ts -- --reporter=verbose
# 1 file passed, 15 tests passed

pnpm check:changed
# passed

pnpm build
# passed

Local E2E comparison notes from the discarded startup-prewarm prototype:

origin/main (3 runs): first tools.effective p50 ≈ 5.85s, no startup prewarm
blocking startup-prewarm prototype: first tools.effective p50 ≈ 369ms, but sidecars.tools-effective-prewarm p50 ≈ 5.3s and Gateway ready regressed similarly
current PR: startup prewarm removed; no startup-speed regression from this change

Human Verification (required)

What I personally verified (not just CI), and how:

  • Verified scenarios:
    • targeted Gateway tests for cold miss, cache hit, stale refresh, delayed-setImmediate fallback
    • changed-gate typecheck/lint/import-cycle/runtime-loader guards
    • production build after production-source changes
    • local real-Gateway timing before removing the blocking startup-prewarm prototype
  • Edge cases checked:
    • unknown/invalid session handling remains rejected
    • admin-scoped callers still pass senderIsOwner=true
    • stale cached value is returned immediately while refresh happens later
    • delayed setImmediate fallback updates the cache
  • What you did not verify:
    • live 2 vCPU / 4 GB Docker VPS reproduction with WebChat and OpenAI

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: this does not remove the underlying cost of first inventory computation.
    • Mitigation: this PR avoids startup regression and narrows the fix to safe scheduler/cache semantics; deeper inventory laziness should be handled separately with profiling.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/gateway/server-methods/tools-effective.test.ts (modified, +79/-0)
  • src/gateway/server-methods/tools-effective.ts (modified, +86/-34)

Bug type

Crash (process/app exits or hangs)

Beta release blocker

No

Summary

Every chat turn through OpenClaw WebChat takes 30–90 seconds end-to-end on a 2 vCPU / 4 GB Docker VPS, while the same gpt-4o-mini call from inside the same container via curl returns in 0.83–1.73 seconds.

Steps to reproduce

  1. Deploy ghcr.io/openclaw/openclaw:2026.4.26 on Ubuntu 24.04 VPS (2 vCPU / 4 GB) via Docker Compose.
  2. Configure with OpenAI provider and gpt-4o-mini as primary model. Disable bonjour, browser, phone-control, talk-voice, acpx in plugins.entries. Leave openai and memory-core enabled.
  3. Open Control UI, wait for connection (chat.history and models.list complete in ~1.3s after Caddy WebSocket fix).
  4. Send the message: "What is 7+7?"
  5. Measure time from Send to full reply.
  6. From inside the same container, run curl directly to https://api.openai.com/v1/chat/completions with the same prompt and same model.
  7. Compare both timings.
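
For readers who prefer a script over curl for step 6, the direct baseline could be timed roughly as follows. This is a sketch under assumptions: the helper names (buildChatRequest, timeMs, directBaseline) are hypothetical, the API key is expected to be passed in by the caller, and a Node version with a global fetch is assumed. The endpoint, model, and prompt are the ones named in the steps above.

```typescript
// Assumption: the API key is supplied by the caller (for example from an
// OPENAI_API_KEY environment variable); it is never hard-coded.
function buildChatRequest(apiKey: string, prompt: string) {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: prompt }],
    }),
  };
}

// Times any async operation and returns the elapsed wall-clock milliseconds.
async function timeMs(operation: () => Promise<unknown>): Promise<number> {
  const startedAt = Date.now();
  await operation();
  return Date.now() - startedAt;
}

// Runs the direct baseline a few times; it is not invoked automatically.
async function directBaseline(apiKey: string, runs = 3): Promise<number[]> {
  const timings: number[] = [];
  for (let i = 0; i < runs; i += 1) {
    timings.push(
      await timeMs(async () => {
        const response = await fetch(
          "https://api.openai.com/v1/chat/completions",
          buildChatRequest(apiKey, "What is 7+7?"),
        );
        await response.text(); // wait for the full body, as curl does
      }),
    );
  }
  return timings;
}
```

Calling directBaseline(key) from inside the container yields one timing per run in milliseconds, which can then be set against the stopwatch timing of the WebChat turn from step 5.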

Expected behavior

Direct OpenAI baseline from inside the same container, same model, same prompt:

  • Run 1: 1.73s HTTP:200
  • Run 2: 0.83s HTTP:200
  • Run 3: 0.87s HTTP:200
  • Median: ~0.87s

Expected: chat turn through OpenClaw should complete within a small multiple of the underlying model call (a few seconds of orchestration overhead is reasonable).
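
The median quoted for the baseline, and the "small multiple" comparison, can be checked with a few lines of arithmetic. The helper names below are hypothetical; the three samples are the direct runs reported above.

```typescript
// Median of a list of timing samples (in seconds).
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  if (sorted.length % 2 === 1) return sorted[mid]!;
  return (sorted[mid - 1]! + sorted[mid]!) / 2;
}

// Hypothetical helper: how many times slower a chat turn is than the baseline.
function slowdown(turnSeconds: number, baselineSeconds: number): number {
  return turnSeconds / baselineSeconds;
}

const baselineRuns = [1.73, 0.83, 0.87]; // the three direct runs
const baselineMedian = median(baselineRuns);

console.log(baselineMedian); // 0.87
console.log(slowdown(34, baselineMedian).toFixed(1));
```

With a 0.87s median, even the fastest reported chat turn (34s) is about 39 times the baseline, well beyond a few seconds of orchestration overhead.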

Actual behavior

Measured chat turns through OpenClaw WebChat (Control UI v2026.4.26):

  • "2+2": A=immediate, B=34s
  • "5+5": A=immediate, B=36s
  • "7+7": A=immediate, B>60s, sometimes never completes
  • "hi": A=immediate, B=36s
  • "who are you?": A=immediate, B=47s

Where A = time to first response chunk, B = time to full reply.
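
A and B can also be captured programmatically when the reply is available as a stream of chunks. The sketch below assumes the reply is exposed as an AsyncIterable of strings, which is an assumption about the client and not something the report states; measureTurn and its injectable clock are hypothetical names.

```typescript
interface TurnTiming {
  firstChunkMs: number | null; // "A": time to first response chunk
  fullReplyMs: number; // "B": time to full reply
  text: string;
}

// Consumes a stream of reply chunks and records when the first chunk arrived
// and when the stream finished, both relative to the start of the call.
async function measureTurn(
  chunks: AsyncIterable<string>,
  now: () => number = Date.now,
): Promise<TurnTiming> {
  const sentAt = now();
  let firstChunkMs: number | null = null;
  let text = "";
  for await (const chunk of chunks) {
    if (firstChunkMs === null) firstChunkMs = now() - sentAt;
    text += chunk;
  }
  return { firstChunkMs, fullReplyMs: now() - sentAt, text };
}
```

A large gap between firstChunkMs and fullReplyMs, as in the measurements above, points at time spent after the first chunk is emitted and not at connection setup.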

Recurring diagnostic in gateway logs during these turns:

[diagnostic] stuck session: sessionId=unknown sessionKey=agent:main:main state=processing age=216s queueDepth=1
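
When collecting many of these diagnostics, it can help to turn each line into structured fields. The parser below is a minimal sketch that assumes the key=value layout seen in the log line quoted above; parseStuckSession is a hypothetical helper, not part of OpenClaw.

```typescript
// Extracts the key=value pairs that follow the "stuck session" marker.
// Returns null for lines that are not stuck-session diagnostics.
function parseStuckSession(line: string): Record<string, string> | null {
  const marker = "[diagnostic] stuck session:";
  const index = line.indexOf(marker);
  if (index === -1) return null;
  const fields: Record<string, string> = {};
  for (const pair of line.slice(index + marker.length).trim().split(/\s+/)) {
    const eq = pair.indexOf("=");
    if (eq > 0) fields[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return fields;
}

const sample =
  "[diagnostic] stuck session: sessionId=unknown sessionKey=agent:main:main state=processing age=216s queueDepth=1";
const parsed = parseStuckSession(sample);
const ageSeconds = parsed ? Number.parseInt(parsed["age"] ?? "", 10) : NaN;
console.log(parsed, ageSeconds);
```

Tracking ageSeconds and queueDepth over time makes it easier to see whether one session stays stuck or whether new turns keep queuing behind it.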

During the wait, the gateway main process state via /proc:

  • state: S (sleeping)
  • wchan: ep_poll
  • syscall: 281 (epoll_pwait)
  • I/O counter delta over 5s: rchar=0, wchar=0, read_bytes=0, write_bytes=0

So the gateway is not CPU-bound, not disk-bound, and not in a busy loop: it appears to be sleeping in epoll_pwait while the chat turn is reported as "processing" with queueDepth=1.
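
The I/O counter check can be repeated with a short script that samples /proc/<pid>/io twice and subtracts. This is a sketch under assumptions: it is Linux only, needs permission to read the gateway's /proc entry, and the helper names (parseProcIo, ioDelta, sampleIo) are hypothetical.

```typescript
import { readFileSync } from "node:fs";

type IoCounters = Record<string, number>;

// Parses the "key: value" lines of /proc/<pid>/io into numeric counters.
function parseProcIo(text: string): IoCounters {
  const counters: IoCounters = {};
  for (const line of text.split("\n")) {
    const match = /^(\w+):\s*(\d+)$/.exec(line.trim());
    if (match) counters[match[1]!] = Number(match[2]);
  }
  return counters;
}

// Per-counter difference between two samples.
function ioDelta(before: IoCounters, after: IoCounters): IoCounters {
  const delta: IoCounters = {};
  for (const key of Object.keys(after)) {
    delta[key] = (after[key] ?? 0) - (before[key] ?? 0);
  }
  return delta;
}

// Reads the current counters for a process; not invoked automatically.
function sampleIo(pid: number): IoCounters {
  return parseProcIo(readFileSync(`/proc/${pid}/io`, "utf8"));
}
```

Taking one sampleIo(pid) reading, waiting five seconds, taking a second, and passing both to ioDelta reproduces the measurement above. An all-zero delta for rchar, wchar, read_bytes, and write_bytes during a stuck turn matches the reported behavior.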

OpenClaw version

2026.4.26

Operating system

Ubuntu 24.04.3 LTS, kernel 6.8.0-110-generic

Install method

docker (ghcr.io/openclaw/openclaw image via Docker Compose v5.1.3 / Docker 29.4.0)

Model

openai/gpt-4o-mini (also tested openai/gpt-4.1-mini; no measurable improvement)

Provider / routing chain

openclaw -> openai (direct, https://api.openai.com/v1) No proxies, no routers, no LiteLLM, no OpenRouter.

Additional provider/model setup details

Representative chat-turn log timeline:

00:10:49 webchat connected
00:10:51 commands.list ✓ 1054ms
00:10:52 device.pair.list ✓ 1753ms
00:10:52 node.list ✓ 1763ms
00:12:02 node.list ✓ 27161ms
00:13:33 chat.history ✓ 163489ms (before Caddy WS fix)
00:13:33 models.list ✓ 163478ms
00:15:36 [diagnostic] stuck session: sessionKey=agent:main:main state=processing age=216s queueDepth=1

After Caddy WebSocket transport tuning (read_timeout 1h, flush_interval -1):

  • chat.history: 163s -> 1.3s
  • models.list: 163s -> 1.3s

Agent-turn latency was unchanged (still 30–90s).

Direct OpenAI baseline from inside container:

  • Run 1: 1.73s HTTP:200
  • Run 2: 0.83s HTTP:200
  • Run 3: 0.87s HTTP:200

Process state during a stuck turn (cat /proc/<gateway-pid>/{wchan,status,syscall,stack,io}):

  • state: S (sleeping)
  • wchan: ep_poll
  • syscall: 281 (epoll_pwait)
  • I/O counters over 5s: 0 byte delta on rchar/wchar/read_bytes/write_bytes

Open file descriptors during turn: runs.sqlite{,-wal,-shm}, gateway lock, eventfds, eventpolls. No suspicious activity.

Things attempted that did NOT resolve latency (only the first two affected behavior visibly):

  1. Disabled Bonjour via plugins.entries.bonjour.enabled=false (reduced startup noise, did not affect chat latency)
  2. Caddy WebSocket transport tuning (fixed UI bootstrap from 163s -> 1.3s, did not affect chat latency)
  3. Cleared plugin-runtime-deps cache (843MB -> 0, regenerated to 634MB on next start, no effect on chat latency)
  4. Blocked OpenRouter / LiteLLM via extra_hosts (pricing fetches fail in <50ms now, no effect on chat)
  5. Upgraded 2026.4.25 -> 2026.4.26 (fixed UI history-loss bug, did not fix chat latency)
  6. Tested via direct SSH tunnel bypassing Caddy entirely (same latency, so Caddy is not the cause)
  7. Disabled all bundled plugins except openai and memory-core
  8. Set mem_limit=1.5G, healthcheck start_period=200s
  9. Multiple full restarts and fresh state attempts
  10. Switched gpt-4o-mini -> gpt-4.1-mini (no measurable change)
  11. Simplified workspace bootstrap files (AGENTS.md 7850 -> 251 bytes, HEARTBEAT.md emptied): no effect on latency, but the agent lost its personality (it read BOOTSTRAP.md and ran the first-run dialogue), so this was reverted.

Cold-start time: 130–150s consistently. Acceptable; not the issue.

Resource state during the issue:

  • Memory: 3.8 GiB total, 2.7 GiB available
  • Swap: 2.0 GiB total, ~100 MiB used
  • CPU idle: ~95%
  • Container memory: ~600 MB
  • Disk: 32% used of 77 GB

Logs, screenshots, and evidence

Impact and severity

Affected: Single-user self-hosted setup (personal AI assistant on small Docker VPS). Likely affects any small-VPS Docker deployment based on observed pattern.

Severity: High for interactive chat use case. System is technically functional (messages eventually arrive) but unusable as a real-time assistant given 30–90s per turn.

Frequency: Every single chat turn, including trivial single-token replies. 100% reproducible.

Consequence: OpenClaw cannot serve as an interactive chat companion in this configuration. Direct OpenAI usage and a co-hosted n8n agent on the same VPS both perform fast, so the VPS itself is not the bottleneck.

Additional information

Both 2026.4.25 and 2026.4.26 exhibit the same latency. 4.26 fixed the unrelated UI history-loss bug (chat history disappearing on reload; confirmed fixed).

Open questions for maintainers:

  1. What tracing or debug envs would help pinpoint where time is spent inside an agent turn?
  2. Is there a way to disable memory-core entirely on a fresh install? It's default-enabled and not in the usual config flow.
  3. Could OPENCLAW_DISABLE_PERSISTED_PLUGIN_REGISTRY=1 (mentioned in release notes as a deprecated break-glass) help here?
  4. Are there documented tunables for the agent runtime queue / session manager?

Happy to provide more logs, run tracing patches, or test fixes.

extent analysis

TL;DR

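No fix has been confirmed on the affected VPS. PR #73540 removes a setImmediate deferral on cold tools.effective cache misses, but its author did not verify it against the 2 vCPU / 4 GB Docker reproduction. The remaining leads, all taken from the reporter's open questions, are to trace where time is spent in the agent runtime queue and session manager, to try disabling the memory-core plugin, and to try the deprecated OPENCLAW_DISABLE_PERSISTED_PLUGIN_REGISTRY environment variable.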

Guidance

  • Investigate the agent runtime queue and session manager to identify where time is spent during an agent turn, potentially using tracing or debug environments.
  • Attempt to disable the memory-core plugin entirely, if possible, to see if it affects the latency.
  • Consider using the OPENCLAW_DISABLE_PERSISTED_PLUGIN_REGISTRY environment variable, despite being deprecated, to see if it improves performance.
  • Review the OpenClaw documentation for tunables related to the agent runtime queue and session manager to optimize their configuration.

Example

No specific code snippet is provided as the issue seems to be related to configuration and plugin management rather than code-level changes.

Notes

The provided information suggests that the issue is not related to the VPS resources or the OpenAI model, as direct OpenAI usage and other applications on the same VPS perform well. The focus should be on optimizing the OpenClaw configuration and plugin management.

Recommendation

Apply a workaround by attempting to disable the memory-core plugin or using the OPENCLAW_DISABLE_PERSISTED_PLUGIN_REGISTRY environment variable to see if it improves the performance, as these are the most direct leads from the provided information.
