openclaw - ✅(Solved) Fix [Bug]: Severe chat latency (30–90s) on Docker VPS while direct OpenAI completes in <1s — OpenClaw 2026.4.26 [1 pull requests, 3 comments, 4 participants]

Dimaoggg · 2026-04-28T08:41:41Z


openclaw/openclaw#73428•Fetched 2026-04-29 06:20:02


Every chat turn through OpenClaw WebChat takes 30–90 seconds end-to-end on a 2 vCPU / 4 GB Docker VPS, while the same gpt-4o-mini call from inside the same container via curl returns in 0.83–1.73 seconds.


PR fix notes

PR #73540: fix(gateway): resolve tools.effective cold misses synchronously

Repository: openclaw/openclaw
Author: amknight
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/73540

Description (problem / solution / changelog)

Summary

Problem: tools.effective cold cache misses used the same deferred setImmediate refresh path as stale background refreshes.
Why it matters: Control UI/WebChat can hit tools.effective on an interactive path; cold misses should start resolving immediately, and stale refreshes should not depend solely on the immediate queue.
What changed: cold misses now resolve synchronously through the shared cache/update helper; stale-cache requests still return stale data immediately but their background refresh has a bounded timer fallback; refresh logs redact session identifiers and sanitize errors.
What did NOT change (scope boundary): no Gateway startup prewarm. Local E2E showed blocking startup prewarm moved ~5s of first-request work into Gateway readiness, so this PR avoids startup-speed regression.

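Change Type (select all)

- [x] Bug fix
- [ ] Feature
- [ ] Refactor required for the fix
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra

Scope (select all touched areas)

- [x] Gateway / orchestration
- [x] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [ ] Integrations
- [ ] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra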

Linked Issue/PR

Related #73428
This PR fixes a bug or regression

Root Cause (if applicable)

Root cause: cold effective tool inventory cache misses deferred inventory resolution through setImmediate, adding an avoidable event-loop turn before the expensive work could begin.
Missing detection / guardrail: tests covered cache hits/coalescing/stale refreshes, but not cold-miss synchronicity or delayed-setImmediate fallback behavior.
Contributing context (if known): first inventory computation can still be expensive; that underlying cost should be profiled separately rather than moved into startup.

Regression Test Plan (if applicable)

Coverage level that should have caught this:
- [x] Unit test
- [ ] Seam / integration test
- [ ] End-to-end test
- [ ] Existing coverage already sufficient
Target test or file: src/gateway/server-methods/tools-effective.test.ts
Scenario the test should lock in:
- cold cache misses call resolveEffectiveToolInventory synchronously
- repeated requests still hit the fresh cache
- stale cache responses still return immediately
- stale background refresh falls back when setImmediate is delayed
Why this is the smallest reliable guardrail: it isolates cache/scheduler semantics without full Gateway startup or provider runtime noise.
Existing test that already covers this (if any): existing cache hit/coalescing/stale refresh tests remain in place.
If no new test is added, why not: N/A
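
The first scenario in the plan (cold misses call the resolver synchronously) can be checked without timers. This is a minimal sketch, assuming a hypothetical `invoke` wrapper around the code under test; none of these names come from the OpenClaw test file.

```typescript
// Hypothetical helper: reports whether `invoke` ran the resolver before returning.
function calledSynchronously(invoke: (resolver: () => void) => void): boolean {
  let called = false;
  invoke(() => {
    called = true;
  });
  // Read the flag before yielding to the event loop: a call deferred through
  // setImmediate has not happened yet at this point.
  return called;
}

const direct = calledSynchronously((resolver) => resolver());
const deferred = calledSynchronously((resolver) => {
  setImmediate(resolver);
});
console.log({ direct, deferred }); // { direct: true, deferred: false }
```

The same flag-before-yield check is what distinguishes the "before" and "after" cold-miss paths described in this PR.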

User-visible / Behavior Changes

Cold tools.effective cache misses begin resolving immediately instead of waiting for setImmediate. This avoids an extra deferral but does not claim to eliminate the underlying first inventory computation cost.

Diagram (if applicable)

Before:
cold tools.effective -> setImmediate -> resolve inventory -> cache -> response
stale tools.effective -> stale response + setImmediate refresh

After:
cold tools.effective -> resolve inventory -> cache -> response
stale tools.effective -> stale response + setImmediate refresh (timer fallback)
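
The "After" flow can be sketched as a small cache. This is an illustration under stated assumptions, not the code in `tools-effective.ts`: `createInventoryCache`, `ttlMs`, and `fallbackMs` are hypothetical names, and the resolver is treated as a plain synchronous function.

```typescript
type Entry<T> = { value: T; storedAt: number };

// Hypothetical sketch of the cache semantics described in the PR.
function createInventoryCache<T>(
  resolve: (key: string) => T, // stand-in for the expensive inventory resolver
  ttlMs: number,               // age after which an entry counts as stale
  fallbackMs: number,          // upper bound on how long a stale refresh may wait
  now: () => number = Date.now,
) {
  const entries = new Map<string, Entry<T>>();
  const refreshing = new Set<string>();

  const store = (key: string): T => {
    const value = resolve(key);
    entries.set(key, { value, storedAt: now() });
    return value;
  };

  const scheduleRefresh = (key: string): void => {
    if (refreshing.has(key)) return; // coalesce concurrent refreshes
    refreshing.add(key);
    let done = false;
    const run = (): void => {
      if (done) return; // whichever of immediate/timer fires first wins
      done = true;
      clearImmediate(immediate);
      clearTimeout(timer);
      try {
        store(key);
      } catch {
        // Keep serving the stale value; real code would log a sanitized error.
      } finally {
        refreshing.delete(key);
      }
    };
    const immediate = setImmediate(run);
    const timer = setTimeout(run, fallbackMs); // fallback if the immediate queue is delayed
  };

  return {
    get(key: string): T {
      const hit = entries.get(key);
      if (hit === undefined) return store(key); // cold miss: resolve in the caller's turn
      if (now() - hit.storedAt > ttlMs) scheduleRefresh(key); // stale: answer now, refresh later
      return hit.value;
    },
  };
}
```

Injecting `now` keeps the staleness check testable without real waiting. The cold-miss branch still pays the full resolver cost, which matches the PR's caveat that only the extra deferral is removed.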

Security Impact (required)

New permissions/capabilities? (Yes/No) No
Secrets/tokens handling changed? (Yes/No) No
New/changed network calls? (Yes/No) No
Command/tool execution surface changed? (Yes/No) No
Data access scope changed? (Yes/No) No
If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

OS: macOS local dev host
Runtime/container: Node/pnpm workspace
Model/provider: N/A for targeted cache tests
Integration/channel (if any): Gateway Control UI/WebChat tools.effective path
Relevant config (redacted): local isolated Gateway fixture used for timing checks; plugins restricted to OpenAI + memory-core for parity with #73428 investigation.

Steps

1. Seed an isolated Gateway state with agent:main:main.
2. Start a real foreground Gateway with startup tracing.
3. Connect over WebSocket as an operator/backend client.
4. Call tools.effective repeatedly and compare first vs cached timings.
5. Run targeted unit tests and changed gate.
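
Step 4 amounts to timing the same call several times and separating the first sample from the rest. Below is a minimal harness sketch; `call` is a hypothetical stand-in for one `tools.effective` round trip, since the Gateway's WebSocket client API is not shown in this report.

```typescript
// Hypothetical harness: `call` stands in for one tools.effective round trip.
async function timeCalls(call: () => Promise<unknown>, runs: number): Promise<number[]> {
  const durationsMs: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await call();
    durationsMs.push(performance.now() - start);
  }
  return durationsMs;
}

// Splits the first (cold) sample from the remaining (expected cached) samples.
function firstVsCached(durationsMs: number[]): { firstMs: number; cachedMaxMs: number } {
  const [firstMs = NaN, ...rest] = durationsMs;
  return { firstMs, cachedMaxMs: rest.length > 0 ? Math.max(...rest) : NaN };
}
```

Reporting the slowest cached sample rather than the mean makes a single cache miss among the repeats visible.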

Expected

No sidecars.tools-effective-prewarm startup trace.
Cold miss starts resolving synchronously.
Repeated requests hit cache.
Stale refresh fallback still runs when setImmediate is delayed.

Actual

No startup prewarm path remains in the diff.
Targeted tests, changed gate, and build pass.

Evidence

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Validation:

pnpm test src/gateway/server-methods/tools-effective.test.ts -- --reporter=verbose
# 1 file passed, 15 tests passed

pnpm check:changed
# passed

pnpm build
# passed

Local E2E comparison notes from the discarded startup-prewarm prototype:

origin/main (3 runs): first tools.effective p50 ≈ 5.85s, no startup prewarm
blocking startup-prewarm prototype: first tools.effective p50 ≈ 369ms, but sidecars.tools-effective-prewarm p50 ≈ 5.3s and Gateway ready regressed similarly
current PR: startup prewarm removed; no startup-speed regression from this change

Human Verification (required)

What I personally verified (not just CI), and how:

Verified scenarios:
- targeted Gateway tests for cold miss, cache hit, stale refresh, delayed-setImmediate fallback
- changed-gate typecheck/lint/import-cycle/runtime-loader guards
- production build after production-source changes
- local real-Gateway timing before removing the blocking startup-prewarm prototype
Edge cases checked:
- unknown/invalid session handling remains rejected
- admin-scoped callers still pass senderIsOwner=true
- stale cached value is returned immediately while refresh happens later
- delayed setImmediate fallback updates the cache
What you did not verify:
- live 2 vCPU / 4 GB Docker VPS reproduction with WebChat and OpenAI

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? (Yes/No) Yes
Config/env changes? (Yes/No) No
Migration needed? (Yes/No) No
If yes, exact upgrade steps: N/A

Risks and Mitigations

Risk: this does not remove the underlying cost of first inventory computation.
- Mitigation: this PR avoids startup regression and narrows the fix to safe scheduler/cache semantics; deeper inventory laziness should be handled separately with profiling.

Changed files

CHANGELOG.md (modified, +1/-0)
src/gateway/server-methods/tools-effective.test.ts (modified, +79/-0)
src/gateway/server-methods/tools-effective.ts (modified, +86/-34)


Bug type

Crash (process/app exits or hangs)

Beta release blocker

Summary

Steps to reproduce

1. Deploy ghcr.io/openclaw/openclaw:2026.4.26 on Ubuntu 24.04 VPS (2 vCPU / 4 GB) via Docker Compose.
2. Configure with OpenAI provider and gpt-4o-mini as primary model. Disable bonjour, browser, phone-control, talk-voice, acpx in plugins.entries. Leave openai and memory-core enabled.
3. Open Control UI, wait for connection (chat.history and models.list complete in ~1.3s after Caddy WebSocket fix).
4. Send the message: "What is 7+7?"
5. Measure time from Send to full reply.
6. From inside the same container, run curl directly to https://api.openai.com/v1/chat/completions with the same prompt and same model.
7. Compare both timings.
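
The direct baseline in step 6 was taken with curl. An equivalent measurement can be sketched in TypeScript with Node's built-in `fetch`; the function names here are illustrative, and the API key is assumed to be supplied by the caller.

```typescript
const OPENAI_URL = "https://api.openai.com/v1/chat/completions";

// Builds the request options for a single-message chat completion.
function buildChatRequest(apiKey: string, model: string, prompt: string) {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
  };
}

// Times one direct call, including download of the full response body.
async function timeDirectCall(apiKey: string, model: string, prompt: string) {
  const start = performance.now();
  const res = await fetch(OPENAI_URL, buildChatRequest(apiKey, model, prompt));
  await res.text();
  return { status: res.status, seconds: (performance.now() - start) / 1000 };
}
```

A call such as `timeDirectCall(process.env.OPENAI_API_KEY ?? "", "gpt-4o-mini", "What is 7+7?")`, run inside the container, would give a number comparable to the curl runs reported below.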

Expected behavior

Direct OpenAI baseline from inside the same container, same model, same prompt:

Run 1: 1.73s HTTP:200
Run 2: 0.83s HTTP:200
Run 3: 0.87s HTTP:200
Median: ~0.87s

Expected: chat turn through OpenClaw should complete within a small multiple of the underlying model call (a few seconds of orchestration overhead is reasonable).

Actual behavior

Measured chat turns through OpenClaw WebChat (Control UI v2026.4.26):

"2+2" A=immediate, B=34s
"5+5" A=immediate, B=36s
"7+7" A=immediate, B>60s, sometimes never completes
"hi" A=immediate, B=36s
"who are you?" A=immediate, B=47s

Where A = time to first response chunk, B = time to full reply.
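
To put the gap in terms of the "small multiple" expectation, the figures above can be reduced to a ratio. This sketch uses only the numbers reported in this issue; the "7+7" run is left out because it has no completed B value.

```typescript
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

const baselineSeconds = median([1.73, 0.83, 0.87]); // direct OpenAI runs
const turnSeconds = [34, 36, 36, 47]; // completed B timings from the list above
const multiples = turnSeconds.map((s) => Math.round(s / baselineSeconds));
console.log(baselineSeconds, multiples); // 0.87 [ 39, 41, 41, 54 ]
```

Each completed turn is therefore roughly 39 to 54 times the median direct call.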

Recurring diagnostic in gateway logs during these turns:

[diagnostic] stuck session: sessionId=unknown sessionKey=agent:main:main state=processing age=216s queueDepth=1

During the wait, the gateway main process state via /proc:

state: S (sleeping)
wchan: ep_poll
syscall: 281 (epoll_pwait)
I/O counter delta over 5s: rchar=0, wchar=0, read_bytes=0, write_bytes=0

So the gateway is not CPU-bound, not disk-bound, and not in a busy loop; it appears to be sleeping in epoll_pwait while the chat turn is reported as "processing" with queueDepth=1.
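
The 5-second I/O delta check can be scripted rather than read by eye. Here is a sketch assuming the standard Linux `/proc/<pid>/io` layout of `name: value` lines; the helper names are illustrative.

```typescript
import { readFileSync } from "node:fs";

type IoCounters = Record<string, number>;

// Parses "rchar: 123"-style lines as found in /proc/<pid>/io.
function parseProcIo(text: string): IoCounters {
  const counters: IoCounters = {};
  for (const line of text.split("\n")) {
    const match = /^(\w+):\s+(\d+)$/.exec(line.trim());
    if (match) counters[match[1]] = Number(match[2]);
  }
  return counters;
}

// Per-counter difference between two snapshots.
function ioDelta(before: IoCounters, after: IoCounters): IoCounters {
  const delta: IoCounters = {};
  for (const key of Object.keys(after)) delta[key] = after[key] - (before[key] ?? 0);
  return delta;
}

// Reads a live snapshot; needs permission to read the target process's io file.
function readProcIo(pid: number): IoCounters {
  return parseProcIo(readFileSync(`/proc/${pid}/io`, "utf8"));
}
```

Taking `readProcIo(pid)` twice, five seconds apart, and passing both snapshots to `ioDelta` reproduces the all-zero delta reported above when the process is idle in the kernel.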

OpenClaw version

2026.4.26

Operating system

Ubuntu 24.04.3 LTS, kernel 6.8.0-110-generic

Install method

docker (ghcr.io/openclaw/openclaw image via Docker Compose v5.1.3 / Docker 29.4.0)

Model

openai/gpt-4o-mini (also tested openai/gpt-4.1-mini — no measurable improvement)

Provider / routing chain

openclaw -> openai (direct, https://api.openai.com/v1)
No proxies, no routers, no LiteLLM, no OpenRouter.

Additional provider/model setup details

Representative chat-turn log timeline:

00:10:49 webchat connected
00:10:51 commands.list ✓ 1054ms
00:10:52 device.pair.list ✓ 1753ms
00:10:52 node.list ✓ 1763ms
00:12:02 node.list ✓ 27161ms
00:13:33 chat.history ✓ 163489ms (before Caddy WS fix)
00:13:33 models.list ✓ 163478ms
00:15:36 [diagnostic] stuck session: sessionKey=agent:main:main state=processing age=216s queueDepth=1
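
Slow calls in a timeline like this can be picked out mechanically. The sketch below assumes entries shaped like `HH:MM:SS method ✓ NNNms`, as in the excerpt above; the 10-second threshold is an arbitrary choice for illustration.

```typescript
type TimelineEntry = { time: string; method: string; ms: number };

// Matches entries shaped like "00:10:51 commands.list ✓ 1054ms".
function parseTimeline(log: string): TimelineEntry[] {
  const pattern = /(\d{2}:\d{2}:\d{2}) ([\w.]+) ✓ (\d+)ms/g;
  const entries: TimelineEntry[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(log)) !== null) {
    entries.push({ time: match[1], method: match[2], ms: Number(match[3]) });
  }
  return entries;
}

const sample =
  "00:10:51 commands.list ✓ 1054ms 00:10:52 node.list ✓ 1763ms " +
  "00:12:02 node.list ✓ 27161ms 00:13:33 chat.history ✓ 163489ms";
const slow = parseTimeline(sample).filter((entry) => entry.ms > 10000);
console.log(slow.map((entry) => entry.method)); // [ 'node.list', 'chat.history' ]
```

Lines without the `✓ NNNms` shape, such as the `[diagnostic]` entry, are skipped by the pattern.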

After Caddy WebSocket transport tuning (read_timeout 1h, flush_interval -1):
chat.history: 163s -> 1.3s
models.list: 163s -> 1.3s
But agent-turn latency unchanged (still 30-90s).

Direct OpenAI baseline from inside container:
Run 1: 1.73s HTTP:200
Run 2: 0.83s HTTP:200
Run 3: 0.87s HTTP:200

Process state during a stuck turn (cat /proc/<gateway-pid>/{wchan,status,syscall,stack,io}):
state: S (sleeping)
wchan: ep_poll
syscall 281 (epoll_pwait)
I/O counters over 5s: 0 byte delta on rchar/wchar/read_bytes/write_bytes

Open file descriptors during turn: runs.sqlite{,-wal,-shm}, gateway lock, eventfds, eventpolls. No suspicious activity.

Things attempted that did NOT resolve latency (only the first two affected behavior visibly):

Disabled Bonjour via plugins.entries.bonjour.enabled=false (reduced startup noise, did not affect chat latency)
Caddy WebSocket transport tuning (fixed UI bootstrap from 163s -> 1.3s, did not affect chat latency)
Cleared plugin-runtime-deps cache (843MB -> 0, regenerated to 634MB on next start, no effect on chat latency)
Blocked OpenRouter / LiteLLM via extra_hosts (pricing fetches fail in <50ms now, no effect on chat)
Upgraded 2026.4.25 -> 2026.4.26 (fixed UI history-loss bug, did not fix chat latency)
Tested via direct SSH tunnel bypassing Caddy entirely (same latency, so Caddy is not the cause)
Disabled all bundled plugins except openai and memory-core
Set mem_limit=1.5G, healthcheck start_period=200s
Multiple full restarts and fresh state attempts
Switched gpt-4o-mini -> gpt-4.1-mini (no measurable change)
Workspace bootstrap files simplified (AGENTS.md 7850 -> 251 bytes, HEARTBEAT.md emptied) — no effect on latency, but agent lost personality (read BOOTSTRAP.md and ran first-run dialogue), so reverted.

Cold-start time: 130–150s consistently. Acceptable; not the issue.

Resource state during the issue:
Memory: 3.8 GiB total, 2.7 GiB available
Swap: 2.0 GiB total, ~100 MiB used
CPU idle: ~95%
Container memory: ~600 MB
Disk: 32% used of 77 GB

Logs, screenshots, and evidence

Impact and severity

Affected: Single-user self-hosted setup (personal AI assistant on small Docker VPS). Likely affects any small-VPS Docker deployment based on observed pattern.

Severity: High for interactive chat use case. System is technically functional (messages eventually arrive) but unusable as a real-time assistant given 30–90s per turn.

Frequency: Every single chat turn, including trivial single-token replies. 100% reproducible.

Consequence: OpenClaw cannot serve as an interactive chat companion in this configuration. Direct OpenAI usage and a co-hosted n8n agent on the same VPS both perform fast, so the VPS itself is not the bottleneck.

Additional information

Both 2026.4.25 and 2026.4.26 exhibit the same latency. 4.26 fixed the unrelated UI history-loss bug (chat history disappearing on reload — confirmed fixed).

Open questions for maintainers:

What tracing or debug envs would help pinpoint where time is spent inside an agent turn?
Is there a way to disable memory-core entirely on a fresh install? It's default-enabled and not in the usual config flow.
Could OPENCLAW_DISABLE_PERSISTED_PLUGIN_REGISTRY=1 (mentioned in release notes as a deprecated break-glass) help here?
Are there documented tunables for the agent runtime queue / session manager?

Happy to provide more logs, run tracing patches, or test fixes.

extent analysis

TL;DR

No root cause for the WebChat latency is confirmed in the issue. The remaining leads are to trace where time is spent in the agent runtime queue and session manager, and to test whether disabling the memory-core plugin or setting OPENCLAW_DISABLE_PERSISTED_PLUGIN_REGISTRY=1 changes the latency.

Guidance

Investigate the agent runtime queue and session manager to identify where time is spent during an agent turn, potentially using tracing or debug environments.
Attempt to disable the memory-core plugin entirely, if possible, to see if it affects the latency.
Consider using the OPENCLAW_DISABLE_PERSISTED_PLUGIN_REGISTRY environment variable, despite being deprecated, to see if it improves performance.
Review the OpenClaw documentation for tunables related to the agent runtime queue and session manager to optimize their configuration.

Example

No specific code snippet is provided as the issue seems to be related to configuration and plugin management rather than code-level changes.

Notes

The provided information suggests that the issue is not related to the VPS resources or the OpenAI model, as direct OpenAI usage and other applications on the same VPS perform well. The focus should be on optimizing the OpenClaw configuration and plugin management.

Recommendation

As a diagnostic step, try disabling the memory-core plugin or setting the OPENCLAW_DISABLE_PERSISTED_PLUGIN_REGISTRY environment variable, and measure whether either changes per-turn latency. Both come from the reporter's open questions and are untested leads, not confirmed workarounds.
