openclaw - Context budget: framework + tool schemas add ~10K tokens to every prompt

GitHub stats
openclaw/openclaw#62196 · Fetched 2026-04-08 03:07:48
Comments: 0 · Participants: 1 · Timeline: 1 (closed ×1) · Reactions: 0


Summary

Framework instructions and tool JSON schemas add roughly 10K tokens of fixed overhead to every API call, regardless of conversation complexity. For a typical 31K-token prompt, that is about one-third of the input spend, and the cost is the same whether the user sends "hello" or a complex multi-step request.

Breakdown

Full token accounting from a production Slack conversation (Claude Sonnet 4.6, 200K context):

| Component | Tokens | % of Total | Controllable? |
|---|---|---|---|
| User workspace files (SOUL.md, AGENTS.md, TOOLS.md, etc.) | ~10,300 | 33% | Yes |
| Tool JSON schemas (~29 tools) | ~7,250 | 23% | No |
| Fresh memory injection (30 memories + 9 conv turns) | ~6,165 | 20% | Yes (plugin config) |
| Compressed conversation history (15 exchanges) | ~3,569 | 11% | Yes (context engine plugin) |
| Framework text (safety, tooling, messaging, etc.) | ~2,323 | 7% | No |
| Skills prompt (12 skills) | ~1,479 | 5% | Yes |
| Current user prompt | ~115 | <1% | N/A |
| Envelope/formatting overhead | ~356 | 1% | No |
| Total | ~31,560 | | |

Tool schemas and framework text alone account for ~9,573 tokens (30%). Adding the envelope overhead brings the non-controllable total to ~9,929 tokens (31%). This is baked into every request.
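The accounting above can be re-derived with a few lines of JavaScript. This is only a sketch over the figures in the table: the `controllable` flags mirror the "Controllable?" column, and the variable names are illustrative. The rows sum to 31,557, which the table rounds to ~31,560.

```javascript
// Token counts copied from the breakdown table; `controllable` mirrors its last column.
const components = [
  { name: "User workspace files", tokens: 10300, controllable: true },
  { name: "Tool JSON schemas", tokens: 7250, controllable: false },
  { name: "Fresh memory injection", tokens: 6165, controllable: true },
  { name: "Compressed conversation history", tokens: 3569, controllable: true },
  { name: "Framework text", tokens: 2323, controllable: false },
  { name: "Skills prompt", tokens: 1479, controllable: true },
  { name: "Current user prompt", tokens: 115, controllable: null }, // N/A in the table
  { name: "Envelope/formatting overhead", tokens: 356, controllable: false },
];

const sumTokens = (rows) => rows.reduce((sum, row) => sum + row.tokens, 0);

const total = sumTokens(components);
// Every row marked "No": tool schemas + framework text + envelope.
const fixed = sumTokens(components.filter((c) => c.controllable === false));
// The same figure without the envelope row.
const fixedWithoutEnvelope = sumTokens(
  components.filter((c) => c.controllable === false && !c.name.startsWith("Envelope"))
);

console.log({ total, fixed, fixedWithoutEnvelope });
console.log(`fixed share: ${((fixed / total) * 100).toFixed(1)}%`);
```

Running it shows that 9,573 corresponds to tool schemas plus framework text, while including the envelope row gives 9,929.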

Cost Impact

At Anthropic's Sonnet 4.6 pricing ($3/M input tokens):

  • ~10K overhead tokens are billed on every message, for every agent
  • For an agent handling 200 messages/day across channels: 200 × ~10K = ~2M overhead tokens, or ~$6/day just in framework overhead
  • The overhead is amplified by prompt caching misses (cache write is 1.25x input rate)
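The arithmetic behind these figures can be sketched as follows. The price, message volume, and 1.25x cache-write multiplier are the ones quoted above; the function name is illustrative, and treating every request as a cache write is an assumed upper bound, not a measured rate.

```javascript
// Figures quoted in the issue.
const INPUT_PRICE_PER_MTOK_USD = 3;   // Sonnet 4.6 input price per million tokens
const CACHE_WRITE_MULTIPLIER = 1.25;  // cache writes cost 1.25x the input rate

// Daily cost in USD of a fixed per-message token overhead.
function dailyOverheadCost(overheadTokens, messagesPerDay, pricePerMTok) {
  return (overheadTokens * messagesPerDay * pricePerMTok) / 1_000_000;
}

const baseCost = dailyOverheadCost(10_000, 200, INPUT_PRICE_PER_MTOK_USD);
// Assumed upper bound: every request misses the cache and pays the write rate.
const allCacheWrites = baseCost * CACHE_WRITE_MULTIPLIER;

console.log(`base: $${baseCost}/day, all cache writes: $${allCacheWrites}/day`);
```

Under these assumptions the overhead costs $6/day per agent at the plain input rate, and up to $7.50/day if every request paid the cache-write rate.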

Requests

1. promptMode as a user-facing config option

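The code already supports PromptMode: "full" | "minimal" | "none" in buildAgentSystemPrompt(). However, resolvePromptModeForSession() picks the mode from the session key alone. In the excerpt below it only ever returns "full" or "minimal", so "none" cannot be selected through configuration: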

// pi-embedded-BaSvmUpW.js:173561
function resolvePromptModeForSession(sessionKey) {
    if (!sessionKey) return "full";
    return isSubagentSessionKey(sessionKey) || isCronSessionKey(sessionKey) ? "minimal" : "full";
}

Request: expose promptMode as a per-agent config option:

{
  "agents": {
    "defaults": {
      "promptMode": "none"
    }
  }
}

This lets advanced users who have comprehensive SOUL.md files opt out of the framework sections entirely. The "none" path already works — it returns "You are a personal assistant running inside OpenClaw." and lets the user's own files handle everything.
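One way the request could be wired in is sketched below. This is not OpenClaw's implementation: the `config` parameter, the `VALID_PROMPT_MODES` set, and the bodies of `isSubagentSessionKey` and `isCronSessionKey` are hypothetical stand-ins, since the issue shows neither the real helpers nor how configuration is loaded.

```javascript
const VALID_PROMPT_MODES = new Set(["full", "minimal", "none"]);

// Hypothetical stand-ins: the real helpers are not shown in the issue.
const isSubagentSessionKey = (key) => key.includes(":subagent:");
const isCronSessionKey = (key) => key.includes(":cron:");

// Sketch: prefer a configured promptMode, otherwise keep the current hardcoded logic.
function resolvePromptModeForSession(sessionKey, config = {}) {
  const configured = config?.agents?.defaults?.promptMode;
  if (VALID_PROMPT_MODES.has(configured)) return configured;

  // Unchanged fallback from the excerpt in the issue.
  if (!sessionKey) return "full";
  return isSubagentSessionKey(sessionKey) || isCronSessionKey(sessionKey)
    ? "minimal"
    : "full";
}

const config = { agents: { defaults: { promptMode: "none" } } };
console.log(resolvePromptModeForSession("agent:main:slack", config)); // configured mode wins
console.log(resolvePromptModeForSession("agent:main:cron:nightly"));  // falls back to session-key logic
```

In this sketch a configured value overrides the session-key heuristic for every session, including subagent and cron sessions. Whether those should keep "minimal" regardless of configuration is a design decision the issue leaves open. Unknown values are ignored so that a typo in the config falls back to today's behaviour.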

2. Tool schema optimization

29 tools × ~250 tokens each = ~7,250 tokens of JSON Schema sent on every request. Options:

  • Lazy tool loading: Only include tool schemas for tools the agent has actually used in the current session, plus a "discover more tools" meta-tool
  • Schema compression: Tool descriptions are verbose. Many optional parameters could use shorter descriptions or omit descriptions for self-evident fields
  • Tool profiles with schema impact visibility: Show token cost per tool in openclaw tools list so users can make informed decisions about which tools to enable
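The lazy-loading and cost-visibility ideas above might look roughly like this. Everything here is hypothetical: the tool names, the `discover_tools` meta-tool, and the four-characters-per-token estimate are illustrative assumptions, not OpenClaw APIs or a real tokenizer.

```javascript
// Rough heuristic only: about 4 characters per token. A real tokenizer will differ.
const estimateTokens = (schema) => Math.ceil(JSON.stringify(schema).length / 4);

// Hypothetical tool definitions, for illustration only.
const allTools = [
  { name: "read_file", description: "Read a file from the workspace.", parameters: { path: { type: "string" } } },
  { name: "web_search", description: "Search the web and return the top results with titles and snippets.", parameters: { query: { type: "string" }, limit: { type: "number" } } },
  { name: "send_message", description: "Send a message to a channel.", parameters: { channel: { type: "string" }, text: { type: "string" } } },
];
const discoverTool = { name: "discover_tools", description: "List tools that are not loaded yet.", parameters: {} };

// Lazy loading: send only schemas for tools already used this session, plus the meta-tool.
function selectToolSchemas(tools, usedToolNames, metaTool) {
  return [...tools.filter((tool) => usedToolNames.has(tool.name)), metaTool];
}

// Cost visibility: estimated tokens per tool, largest first.
function schemaCostReport(tools) {
  return tools
    .map((tool) => ({ name: tool.name, tokens: estimateTokens(tool) }))
    .sort((a, b) => b.tokens - a.tokens);
}

const selected = selectToolSchemas(allTools, new Set(["read_file"]), discoverTool);
console.log(selected.map((tool) => tool.name));
console.log(schemaCostReport(allTools));
```

The trade-off is an extra round trip whenever the model needs a tool that has not been loaded yet, which is why the proposal pairs lazy loading with a discovery meta-tool.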

3. Framework text audit

The 2,323 tokens of framework text include sections that could be trimmed or made conditional:

  • ## Reply Tags (76 tokens) — only needed for channels that support reply tags
  • ## Silent Replies (detailed formatting rules) — could be a single line
  • ## OpenClaw CLI Quick Reference — rarely needed in conversation
  • ## OpenClaw Self-Update — only needed when user asks for updates
  • ## Heartbeats (detailed rules) — only needed for heartbeat runs
  • ## Documentation (links) — static, could be in SOUL.md instead

A "lean" prompt mode that keeps only Safety + Tooling + Workspace + Runtime would save ~800-1,000 tokens.
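Making sections conditional, as suggested above, could be sketched like this. The section titles come from the issue, but the `core` and `when` fields and the context flags (`channelSupportsReplyTags`, `isHeartbeatRun`, `userAskedForUpdate`) are hypothetical names introduced for illustration.

```javascript
// Section titles are from the issue; `core`, `when`, and the context flags are hypothetical.
const frameworkSections = [
  { title: "Safety", core: true },
  { title: "Tooling", core: true },
  { title: "Workspace", core: true },
  { title: "Runtime", core: true },
  { title: "Reply Tags", when: (ctx) => ctx.channelSupportsReplyTags },
  { title: "Heartbeats", when: (ctx) => ctx.isHeartbeatRun },
  { title: "OpenClaw Self-Update", when: (ctx) => ctx.userAskedForUpdate },
];

// Keep core sections always; include the rest only when their condition holds.
function selectSections(ctx) {
  return frameworkSections
    .filter((section) => section.core || Boolean(section.when(ctx)))
    .map((section) => section.title);
}

console.log(selectSections({}));                       // lean: core sections only
console.log(selectSections({ isHeartbeatRun: true })); // core sections plus Heartbeats
```

With an empty context this yields exactly the Safety, Tooling, Workspace, and Runtime set proposed for the "lean" mode.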

Environment

  • OpenClaw version: 2026.2.22-2
  • Platform: macOS (darwin arm64)
  • Primary model: Claude Sonnet 4.6 (200K context)
  • Channels: Slack (socket mode), OpenWebUI, direct API
  • Context engine: custom plugin (nightshift-context) with reranker-based compression
  • Memory: memory-lancedb with auto-recall (30 memories) + conversation memory (9 turns)

extent analysis

TL;DR

Exposing promptMode as a user-facing config option and optimizing tool schemas would reduce the ~9,573 tokens of tool schemas and framework text that are currently sent with every request.

Guidance

  • Expose promptMode as a per-agent config option to allow advanced users to opt out of framework sections, potentially saving ~2,323 tokens per request.
  • Optimize tool schemas by implementing lazy tool loading, schema compression, or providing tool profiles with schema impact visibility to reduce the ~7,250 tokens of JSON Schema sent on every request.
  • Conduct a framework text audit to trim or make conditional unnecessary sections, potentially saving ~800-1,000 tokens per request.
  • Consider implementing a "lean" prompt mode that keeps only essential sections to reduce overhead.

Example

{
  "agents": {
    "defaults": {
      "promptMode": "none"
    }
  }
}

This example shows the requested promptMode setting under agents.defaults, which would opt agents out of the framework sections. The option is a proposal: the mode is currently hardcoded.

Notes

The effectiveness of these suggestions may vary depending on the specific use case and configuration. Additionally, the implementation of these suggestions may require further modifications to the codebase.

Recommendation

Because the prompt mode is hardcoded, this requires a change in OpenClaw rather than a user-side workaround: expose promptMode as a user-facing config option and optimize tool schemas to reduce the non-controllable overhead. Both changes target the overhead directly without significant changes to the underlying framework.
