openclaw - 💡(How to fix) Fix [Bug]: OpenClaw 5.22: ~10 sec per-call inference overhead in infer model run (both --gateway and --local) vs ~1.3 sec direct provider call

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Every model invocation through OpenClaw carries ~10–13 seconds of overhead independent of execution mode or model choice. With agents making 3+ model calls per response, this produces 30–40 second response times for trivial prompts (e.g. "Hi") on a fully-resourced Hostinger VPS template (ghcr.io/hostinger/hvps-openclaw). A direct curl to the same upstream provider with the same model returns in 1.3 sec, confirming the bottleneck is in OpenClaw's inference path, not network, provider, or model.

Root Cause

Every model invocation through OpenClaw carries ~10–13 seconds of overhead independent of execution mode or model choice. With agents making 3+ model calls per response, this produces 30–40 second response times for trivial prompts (e.g. "Hi") on a fully-resourced Hostinger VPS template (ghcr.io/hostinger/hvps-openclaw). A direct curl to the same upstream provider with the same model returns in 1.3 sec, confirming the bottleneck is in OpenClaw's inference path, not network, provider, or model.

Fix Action

Fix / Workaround

Partial mitigation by clearing two plugin warnings (brave plugin ownership fix + oxylabs duplicate install rename) brought --gateway timing down to 10.86 sec, indicating that plugin registry validation is running per-invocation rather than at startup. The remaining ~9.6 sec of overhead per call is unexplained by user-visible config and persists after openclaw doctor --fix shows zero warnings.

Production AI assistant deployed for business operations. Affects all model invocations on the affected version (not a single-feature break). Workarounds available (model swap to faster models, clearing plugin warnings) reduce but do not eliminate the overhead. Not data-loss or security, so not Critical.

Code Example

$ time curl -sS -o /tmp/resp.json -w "\nHTTP %{http_code}\nDNS lookup: %{time_namelookup}s\nConnect: %{time_connect}s\nTLS handshake: %{time_appconnect}s\nFirst byte: %{time_starttransfer}s\nTotal: %{time_total}s\n" -X POST https://api.anthropic.com/v1/messages -H "x-api-key: $KEY" -H "anthropic-version: 2023-06-01" -H "content-type: application/json" -d '{"model":"claude-sonnet-4-6","max_tokens":10,"messages":[{"role":"user","content":"Hi"}]}'

HTTP 200
DNS lookup: 0.001315s
Connect: 0.008815s
TLS handshake: 0.029779s
First byte: 1.324463s
Total: 1.325297s
real    0m1.340s

$ time docker exec openclaw-b4ba-openclaw-1 openclaw infer model run --prompt "Hi, respond with just OK" --gateway
model.run via gateway
provider: anthropic
model: claude-sonnet-4-6
outputs: 1
OK
real    0m14.476s

$ time docker exec openclaw-b4ba-openclaw-1 openclaw infer model run --prompt "Hi, respond with just OK" --local
model.run via local
provider: anthropic
model: claude-sonnet-4-6
outputs: 1
OK
real    0m15.995s

$ docker exec -u root openclaw-b4ba-openclaw-1 chown -R root:root /data/.openclaw/npm/node_modules/@openclaw/brave-plugin
$ docker exec -u root openclaw-b4ba-openclaw-1 mv /data/.openclaw/extensions/oxylabs-ai-studio-openclaw /data/.openclaw/extensions/oxylabs-ai-studio-openclaw.disabled

$ time docker exec openclaw-b4ba-openclaw-1 openclaw infer model run --prompt "Hi, respond with just OK" --gateway
model.run via gateway
provider: anthropic
model: claude-sonnet-4-6
outputs: 1
OK
real    0m10.859s

2026-05-29T14:27:55.380-04:00 [diagnostic] liveness warning: reasons=event_loop_delay interval=39s eventLoopDelayP99Ms=21.2 eventLoopDelayMaxMs=11014.2 eventLoopUtilization=0.517 cpuCoreRatio=0.528 active=1 waiting=0 queued=2 recentPhases=post-attach.update-check:1ms,sidecars.restart-sentinel:52ms,post-attach.update-sentinel:46ms,sidecars.session-locks:92ms,post-ready.maintenance:359ms,sidecars.model-prewarm:5689ms work=[active=agent:main:main(processing/embedded_run,q=1,age=15s last=embedded_run:started) queued=agent:pat:main(idle,q=2,age=86423s last=embedded_run:ended)]

Config warnings ────────────────────────────────────────────────────────────╮
│                                                                              │
- plugins: plugin brave: blocked plugin candidate: suspicious               │
ownership (/data/.openclaw/npm/node_modules/@openclaw/brave-plugin,│    uid=1000, expected uid=0 or root)- plugins: plugin oxylabs-ai-studio-openclaw: duplicate plugin id           │
│    detected; global plugin will be overridden by global plugin               │
    (/data/.openclaw/npm/node_modules/oxylabs-ai-studio-openclaw/dist/index.  
│  js)│                                                                              │
├──────────────────────────────────────────────────────────────────────────────╯
RAW_BUFFERClick to expand / collapse

Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

Every model invocation through OpenClaw carries ~10–13 seconds of overhead independent of execution mode or model choice. With agents making 3+ model calls per response, this produces 30–40 second response times for trivial prompts (e.g. "Hi") on a fully-resourced Hostinger VPS template (ghcr.io/hostinger/hvps-openclaw). A direct curl to the same upstream provider with the same model returns in 1.3 sec, confirming the bottleneck is in OpenClaw's inference path, not network, provider, or model.

Steps to reproduce

1. In a running OpenClaw container (this example is on the Hostinger VPS template, OpenClaw 2026.5.22),

with a valid ANTHROPIC_API_KEY available, time a direct curl to Anthropic for a trivial prompt:

time curl -sS -o /tmp/resp.json -w "Total: %{time_total}s\n"
-X POST https://api.anthropic.com/v1/messages
-H "x-api-key: $ANTHROPIC_API_KEY"
-H "anthropic-version: 2023-06-01"
-H "content-type: application/json"
-d '{"model":"claude-sonnet-4-6","max_tokens":10,"messages":[{"role":"user","content":"Hi"}]}'

2. Then time the equivalent call via OpenClaw, both modes:

time openclaw infer model run --prompt "Hi, respond with just OK" --gateway time openclaw infer model run --prompt "Hi, respond with just OK" --local

Expected behavior

A single model invocation through OpenClaw should add minimal latency on top of the upstream provider call — typically in the range of 1–2 seconds of overhead, accounting for auth resolution, request marshalling, response handling, and gateway communication. For a claude-sonnet-4-6 prompt that returns in 1.3 seconds via direct curl to Anthropic, the equivalent call via openclaw infer model run should complete in approximately 2–3 seconds end-to-end, regardless of execution mode (--gateway or --local). This matches the bot's pre-regression behavior: agent responses to trivial Telegram messages completed in under 10 seconds on this same VPS template prior to the auto-updates that advanced OpenClaw from 2026.5.12 → 2026.5.22.

Actual behavior

A single model invocation through OpenClaw on 2026.5.22 takes 14–16 seconds for the same trivial prompt, representing approximately 10–13 seconds of unaccounted overhead per call.

The overhead is present in both --gateway and --local execution modes, ruling out the gateway daemon as the cause:

ModeTimeDirect curl to api.anthropic.com1.33 secopenclaw infer model run --gateway14.48 secopenclaw infer model run --local15.99 sec

Partial mitigation by clearing two plugin warnings (brave plugin ownership fix + oxylabs duplicate install rename) brought --gateway timing down to 10.86 sec, indicating that plugin registry validation is running per-invocation rather than at startup. The remaining ~9.6 sec of overhead per call is unexplained by user-visible config and persists after openclaw doctor --fix shows zero warnings.

Real-world impact: agents in this deployment make 3–4 model calls per response (planning, memory, response generation, optional tool use). At ~10.9 sec per call, this produces 40-second end-to-end response times for trivial Telegram messages like "Hi" — making the bot unusable for the production workload it was deployed for.

OpenClaw version

2026.5.22 (a374c3a)

Operating system

Ubuntu 24.04 LTS (Hostinger VPS, Docker host) Container runs Debian-based Linux (per ghcr.io/hostinger/hvps-openclaw image)

Install method

Hostinger VPS template (managed). Docker image: ghcr.io/hostinger/hvps-openclaw Container name: openclaw-b4ba-openclaw-1 Auto-updates via internal openclaw update cron, last update advanced version from 2026.5.12 → 2026.5.22 without user intervention.

Model

anthropic/claude-sonnet-4-6 (alias "Claude Sonnet 4.6")

Provider / routing chain

Direct: OpenClaw gateway → Anthropic API (api.anthropic.com) No OpenRouter / no proxy / no intermediate model router on the path being timed. Auth: ANTHROPIC_API_KEY environment variable, resolved at request time.

Additional provider/model setup details

  • Config primary set to anthropic/claude-sonnet-4-6 in agents.defaults.model.primary (set by an update; pre-update value was openrouter/anthropic/claude-haiku-4-5).
  • openrouter:default auth profile also configured in /data/.openclaw/agents/main/agent/auth-profiles.json and used separately for embeddings via custom memorySearch.remote.baseUrl=https://openrouter.ai/api/v1/.
  • No custom models.providers.<id> definitions for Anthropic — using OpenClaw's default anthropic adapter.
  • No prompt caching, no streaming-specific flags, no thinking budget set on the test prompts.
  • The same 13-second per-call overhead is reproducible with the direct Anthropic provider via infer model run even when the agent loop and Telegram channel are entirely uninvolved — so this is not a Telegram-layer or agent-orchestration issue, it's in the inference path.

Logs, screenshots, and evidence

$ time curl -sS -o /tmp/resp.json -w "\nHTTP %{http_code}\nDNS lookup: %{time_namelookup}s\nConnect: %{time_connect}s\nTLS handshake: %{time_appconnect}s\nFirst byte: %{time_starttransfer}s\nTotal: %{time_total}s\n" -X POST https://api.anthropic.com/v1/messages -H "x-api-key: $KEY" -H "anthropic-version: 2023-06-01" -H "content-type: application/json" -d '{"model":"claude-sonnet-4-6","max_tokens":10,"messages":[{"role":"user","content":"Hi"}]}'

HTTP 200
DNS lookup: 0.001315s
Connect: 0.008815s
TLS handshake: 0.029779s
First byte: 1.324463s
Total: 1.325297s
real    0m1.340s

$ time docker exec openclaw-b4ba-openclaw-1 openclaw infer model run --prompt "Hi, respond with just OK" --gateway
model.run via gateway
provider: anthropic
model: claude-sonnet-4-6
outputs: 1
OK
real    0m14.476s

$ time docker exec openclaw-b4ba-openclaw-1 openclaw infer model run --prompt "Hi, respond with just OK" --local
model.run via local
provider: anthropic
model: claude-sonnet-4-6
outputs: 1
OK
real    0m15.995s

$ docker exec -u root openclaw-b4ba-openclaw-1 chown -R root:root /data/.openclaw/npm/node_modules/@openclaw/brave-plugin
$ docker exec -u root openclaw-b4ba-openclaw-1 mv /data/.openclaw/extensions/oxylabs-ai-studio-openclaw /data/.openclaw/extensions/oxylabs-ai-studio-openclaw.disabled

$ time docker exec openclaw-b4ba-openclaw-1 openclaw infer model run --prompt "Hi, respond with just OK" --gateway
model.run via gateway
provider: anthropic
model: claude-sonnet-4-6
outputs: 1
OK
real    0m10.859s

2026-05-29T14:27:55.380-04:00 [diagnostic] liveness warning: reasons=event_loop_delay interval=39s eventLoopDelayP99Ms=21.2 eventLoopDelayMaxMs=11014.2 eventLoopUtilization=0.517 cpuCoreRatio=0.528 active=1 waiting=0 queued=2 recentPhases=post-attach.update-check:1ms,sidecars.restart-sentinel:52ms,post-attach.update-sentinel:46ms,sidecars.session-locks:92ms,post-ready.maintenance:359ms,sidecars.model-prewarm:5689ms work=[active=agent:main:main(processing/embedded_run,q=1,age=15s last=embedded_run:started) queued=agent:pat:main(idle,q=2,age=86423s last=embedded_run:ended)]

◇  Config warnings ────────────────────────────────────────────────────────────╮
│                                                                              │
│  - plugins: plugin brave: blocked plugin candidate: suspicious               │
│    ownership (/data/.openclaw/npm/node_modules/@openclaw/brave-plugin,       │
uid=1000, expected uid=0 or root)│  - plugins: plugin oxylabs-ai-studio-openclaw: duplicate plugin id│    detected; global plugin will be overridden by global plugin               │
(/data/.openclaw/npm/node_modules/oxylabs-ai-studio-openclaw/dist/index.  │
│  js)│                                                                              │
├──────────────────────────────────────────────────────────────────────────────╯

Impact and severity

No response

Additional information

Production AI assistant deployed for business operations. Affects all model invocations on the affected version (not a single-feature break). Workarounds available (model swap to faster models, clearing plugin warnings) reduce but do not eliminate the overhead. Not data-loss or security, so not Critical.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

A single model invocation through OpenClaw should add minimal latency on top of the upstream provider call — typically in the range of 1–2 seconds of overhead, accounting for auth resolution, request marshalling, response handling, and gateway communication. For a claude-sonnet-4-6 prompt that returns in 1.3 seconds via direct curl to Anthropic, the equivalent call via openclaw infer model run should complete in approximately 2–3 seconds end-to-end, regardless of execution mode (--gateway or --local). This matches the bot's pre-regression behavior: agent responses to trivial Telegram messages completed in under 10 seconds on this same VPS template prior to the auto-updates that advanced OpenClaw from 2026.5.12 → 2026.5.22.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix [Bug]: OpenClaw 5.22: ~10 sec per-call inference overhead in infer model run (both --gateway and --local) vs ~1.3 sec direct provider call