openclaw - 💡(How to fix) Fix Prompt-assembly overhead scales linearly with tool-schema size; trivial turns pay full cost

openclaw · 2026-05-12 14:38:33


Every agent turn in openclaw appears to prepend the full system prompt and tool schema to the model input, regardless of whether the turn actually needs tools. The reporter of NVIDIA/NemoClaw#2598 measured prompt_eval_count=18,355 tokens for a bare "say hello" prompt on Ollama. Because per-turn latency grows roughly linearly with the number of prompt tokens evaluated, this overhead dominates total latency on any backend. It is simply most visible on fast local inference: on DGX Spark with nemotron-3-nano:30b, a no-op turn takes P50 = 10 s and max = 17 s.

This is a hardware-agnostic structural issue in prompt assembly, not a backend or deployment bug.

Root Cause


Reproduction

Controlled prompt-length sweep on macOS with Ollama and llama3.2:1b, using the same user prompt ("say hello") on the same hardware and varying only the amount of fake tool-schema text injected into the prompt:

Configuration	Prompt tokens	Total latency	Slowdown vs. bare
Bare prompt	27	212 ms	1×
+ ~2k schema tokens	2,433	2,980 ms	14×
+ ~10k schema tokens	12,033	22,189 ms	104×
+ ~18k schema tokens (matches observed openclaw)	21,633	36,581 ms	172×

Latency grows roughly linearly with prompt size. The ~18k-schema row (21,633 prompt tokens) is comparable to the 18,355-token prompt_eval_count that the NemoClaw reporter measured against a real openclaw turn on Brev. This supports the schema being the dominant contributor.
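The sweep script itself is not attached here. A minimal sketch of an equivalent measurement might look like the following. It assumes a local Ollama server on its default port (11434) and the non-streaming /api/generate endpoint, which reports prompt_eval_count along with durations in nanoseconds. The padding generator (make_fake_schema), the nonce line and the padding sizes are hypothetical stand-ins, not the reporter's script.

```python
import json
import time
import urllib.request
import uuid

# Assumptions: local Ollama on its default port, non-streaming /api/generate.
# Model and user prompt mirror the sweep described above.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3.2:1b"
USER_PROMPT = "say hello"


def make_fake_schema(n_tools: int) -> str:
    """Hypothetical padding: n_tools dummy JSON tool definitions."""
    tools = [
        {
            "name": f"fake_tool_{i}",
            "description": "Placeholder tool used only to pad the prompt. " * 4,
            "parameters": {
                "type": "object",
                "properties": {"arg": {"type": "string", "description": "unused"}},
            },
        }
        for i in range(n_tools)
    ]
    return ("Available tools:\n" + json.dumps(tools, indent=2)) if tools else ""


def build_prompt(n_tools: int) -> str:
    """Prepend the padding to the fixed user prompt, as per-turn schema injection would."""
    # A random first line stops consecutive prompts from sharing a cacheable prefix.
    parts = [f"[run {uuid.uuid4().hex}]", make_fake_schema(n_tools), f"User: {USER_PROMPT}"]
    return "\n\n".join(p for p in parts if p)


def summarize(resp: dict, wall_ms: float) -> dict:
    """Pick out the fields the table reports; Ollama reports durations in nanoseconds."""
    return {
        "prompt_tokens": resp.get("prompt_eval_count", 0),
        "prompt_eval_ms": resp.get("prompt_eval_duration", 0) / 1e6,
        "total_ms": resp.get("total_duration", 0) / 1e6,
        "wall_ms": wall_ms,
    }


def run_once(prompt: str) -> dict:
    """Send one non-streaming generate request and time it end to end."""
    body = json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as r:
        resp = json.loads(r.read())
    return summarize(resp, (time.perf_counter() - start) * 1000)


def sweep(sizes=(0, 20, 100, 180)) -> list:
    """One request per padding size; sizes are illustrative, not the original sweep's."""
    return [{"n_tools": n, **run_once(build_prompt(n))} for n in sizes]
```

With the server running and the model pulled, `for row in sweep(): print(row)` prints one row per configuration. The first request may also include model load time (Ollama reports it separately as load_duration), so a warm-up call before the sweep keeps that out of the first row. Ollama can reuse a cached prompt prefix between requests, which may lower prompt_eval_count on repeats. The random first line guards against that, at the cost of a few extra tokens in every row, including the bare one.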

Mechanism

prompt_eval cost is effectively O(prompt_tokens) at these prompt sizes on every backend tested.
openclaw's per-turn input is dominated by a static tool schema + system message that is sent regardless of whether the user's turn could invoke tools.
Trivial turns ("say hello", short clarifications, acknowledgments) therefore pay full schema cost for zero functional benefit.
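The first point can be checked against the sweep table in Reproduction. The sketch below fits latency_ms ≈ intercept + slope × prompt_tokens by ordinary least squares on the four reported rows. It also prints each row's slowdown and the share of the prompt that is padding. It describes only llama3.2:1b on the reporter's Mac; the per-token rate is not expected to transfer to other hardware or models.

```python
# (prompt tokens, total latency in ms) for the four rows of the sweep table.
SWEEP = [(27, 212), (2_433, 2_980), (12_033, 22_189), (21_633, 36_581)]


def linear_fit(points: list[tuple[int, int]]) -> tuple[float, float, float]:
    """Ordinary least squares for latency_ms ≈ intercept + slope * prompt_tokens.

    Returns (slope in ms/token, intercept in ms, coefficient of determination r^2).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy * sxy / (sxx * syy)


slope, intercept, r2 = linear_fit(SWEEP)
bare_tokens, bare_ms = SWEEP[0]
for tokens, ms in SWEEP:
    # Slowdown relative to the bare prompt, and the share of the prompt that is padding.
    share = (tokens - bare_tokens) / tokens
    print(f"{tokens:>6} tok  {ms:>6} ms  {ms / bare_ms:6.1f}x  padding {share:6.1%}")
print(f"slope {slope:.2f} ms/token, intercept {intercept:.0f} ms, r^2 {r2:.3f}")
```

On these four rows the fit gives a slope of about 1.73 ms per prompt token with r² ≈ 0.995, consistent with the near-linear trend. The intercept comes out slightly negative, so it should not be read as a fixed per-turn cost.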

What I checked

The prompt-assembly path lives in openclaw's shipped dist/. Relevant artifacts that contribute to the per-turn prefix:

bash-tools.descriptions
openclaw-tools.runtime
system-message
prompts

Together these produce the ~18k-token prefix observed in the wild. NemoClaw pins openclaw 2026.4.24 and ships the upstream image unmodified; it does not own this code path, which is why I'm filing here rather than there.
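To see which of the four artifacts dominates the prefix, one option is a leave-one-out comparison. For each component, count tokens for the full prefix and for the prefix without that component. The difference is what the component costs. This removes fixed overhead such as the chat template, which would otherwise be counted again if each component were measured alone. The sketch assumes the text each artifact contributes has already been extracted (how to do that from dist/ is not shown). It defaults to a rough 4-characters-per-token heuristic rather than the model's real tokenizer.

```python
from typing import Callable


def rough_token_count(text: str) -> int:
    """Approximation: about 4 characters per token for English prose and JSON."""
    return (len(text) + 3) // 4


def leave_one_out(
    components: dict[str, str], count: Callable[[str], int] = rough_token_count
) -> dict[str, int]:
    """Tokens attributable to each component: count(all) minus count(all but that one)."""
    full = count("\n".join(components.values()))
    return {
        name: full - count("\n".join(t for n, t in components.items() if n != name))
        for name in components
    }
```

Pass a dict keyed by the artifact names above (system-message, bash-tools.descriptions, openclaw-tools.runtime, prompts) with the text each one contributes. For exact numbers, replace rough_token_count with a function that sends the text to the backend and returns its prompt_eval_count.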

Suggested directions (pick whichever fits the architecture)

Schema-skip for trivial turns — heuristic or classifier that omits the tool schema when the turn is clearly conversational. Highest impact, requires a fallback path if the model decides mid-turn it needs a tool.
Lazy / on-demand tool registration — send a minimal tool index up front, expand to full schema only when the model signals tool use.
Lite-mode flag — explicit --no-tools or lite: true session mode for known-conversational contexts (greetings, status checks, agent-to-agent handoffs). Lowest-risk; opts in rather than out.
Schema compression — tighten descriptions / dedupe across bash-tools and openclaw-tools registries. Bounded gain but no behavior change.
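The first three directions can be combined in a single assembly step. The sketch below is hypothetical throughout. Tool, Session, the greeting regex and the NEED_TOOLS / "REQUEST_TOOL: <name>" reply signals are illustrative names, not openclaw APIs, and the signals assume the system message tells the model how to use them. Lite mode drops the schema outright. A turn that looks trivial also skips it unless the model has already asked for tools. Every other turn gets a one-line-per-tool index, and full schemas are sent only for tools the model requested.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    summary: str      # one-line description used in the lazy index
    full_schema: str  # full JSON schema text, sent only once the tool is requested


@dataclass
class Session:
    lite: bool = False                               # hypothetical explicit no-tools mode
    show_index: bool = False                         # set when a skipped turn asks for tools
    expanded: set[str] = field(default_factory=set)  # tools the model has asked for


# Hypothetical schema-skip heuristic: short greetings and acknowledgements only.
_CONVERSATIONAL = re.compile(
    r"^\s*(?:hi|hello|hey|say hello|thanks|thank you|ok|okay)[\s!.?]*$", re.IGNORECASE
)
_NEED_TOOLS = "NEED_TOOLS"                          # hypothetical "show me the index" signal
_TOOL_REQUEST = re.compile(r"REQUEST_TOOL:\s*(\S+)")  # hypothetical "expand this tool" signal


def is_trivial(user_msg: str) -> bool:
    return _CONVERSATIONAL.match(user_msg) is not None


def assemble_prompt(system_msg: str, tools: list[Tool], session: Session, user_msg: str) -> str:
    parts = [system_msg]
    wants_tools = session.show_index or bool(session.expanded)
    skip_schema = session.lite or (is_trivial(user_msg) and not wants_tools)
    if not skip_schema:
        # Lazy registration: names plus one-line summaries for every tool...
        index = "\n".join(f"- {t.name}: {t.summary}" for t in tools)
        parts.append("Tools (reply REQUEST_TOOL: <name> to get a schema):\n" + index)
        # ...and full schemas only for tools the model has already requested.
        parts.extend(t.full_schema for t in tools if t.name in session.expanded)
    parts.append(f"User: {user_msg}")
    return "\n\n".join(parts)


def handle_reply(reply: str, tools: list[Tool], session: Session) -> bool:
    """Return True if the turn should be re-assembled and re-run with more tool context."""
    if session.lite:
        return False  # lite mode is an explicit opt-out, so never escalate
    if _NEED_TOOLS in reply and not session.show_index:
        session.show_index = True  # schema was skipped but the model wants tools
        return True
    match = _TOOL_REQUEST.search(reply)
    if match is None:
        return False
    name = match.group(1)
    if name not in {t.name for t in tools} or name in session.expanded:
        return False  # unknown tool, or already expanded (avoids retry loops)
    session.expanded.add(name)
    return True
```

The regex is deliberately narrow, so misclassification errs toward sending the index. A false skip costs one extra round trip through handle_reply, which is the fallback path the schema-skip direction calls for.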

Context

Downstream report with end-to-end timings on DGX Spark: NVIDIA/NemoClaw#2598
Cross-platform mirror on Brev/NIM: NVIDIA/NemoClaw#2600 (P50 9s, P99 128s — same root cause, different backend tail behavior)
Feature request asking for the same fix: NVIDIA/NemoClaw#3261 ("Lightweight message interceptor hooks for system prompt bypass" — calls out +10k tokens of overhead)
Token count 18,355 for "say hello" measured by the NemoClaw reporter on Brev with nemotron-3-nano:30b
My sweep above was on different hardware and a different model, deliberately, to isolate the prompt-length mechanism from anything Spark- or Nemotron-specific.

Happy to share the sweep script or rerun against a specific openclaw version if useful.
