openclaw - Context budget: framework + tool schemas add ~10K tokens to every prompt

Root Cause

Framework instructions and tool JSON schemas add ~10K+ tokens of fixed overhead to every single API call, regardless of conversation complexity. For a typical 31K token prompt, this is one-third of total spend — and it's the same cost whether the user sends "hello" or a complex multi-step request.

Summary

Breakdown

Full token accounting from a production Slack conversation (Claude Sonnet 4.6, 200K context):

| Component | Tokens | % of Total | Controllable? |
|---|---|---|---|
| User workspace files (SOUL.md, AGENTS.md, TOOLS.md, etc.) | ~10,300 | 33% | Yes |
| Tool JSON schemas (~29 tools) | ~7,250 | 23% | No |
| Fresh memory injection (30 memories + 9 conv turns) | ~6,165 | 20% | Yes (plugin config) |
| Compressed conversation history (15 exchanges) | ~3,569 | 11% | Yes (context engine plugin) |
| Framework text (safety, tooling, messaging, etc.) | ~2,323 | 7% | No |
| Skills prompt (12 skills) | ~1,479 | 5% | Yes |
| Current user prompt | ~115 | <1% | N/A |
| Envelope/formatting overhead | ~356 | 1% | No |
| Total | ~31,560 | | |

The non-controllable overhead is ~9,573 tokens (30%) from tool schemas and framework text alone; counting the ~356-token envelope as well brings it to ~9,930 (31%). This is baked into every request.
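
The accounting can be re-derived from the table with a few lines of JavaScript. This is only a sketch: the array transcribes the approximate counts reported above, and the field names (`tokens`, `controllable`) are illustrative, not part of OpenClaw.

```javascript
// Token counts copied from the breakdown table (all approximate).
const components = [
  { name: "User workspace files", tokens: 10300, controllable: true },
  { name: "Tool JSON schemas", tokens: 7250, controllable: false },
  { name: "Fresh memory injection", tokens: 6165, controllable: true },
  { name: "Compressed conversation history", tokens: 3569, controllable: true },
  { name: "Framework text", tokens: 2323, controllable: false },
  { name: "Skills prompt", tokens: 1479, controllable: true },
  { name: "Current user prompt", tokens: 115, controllable: null }, // N/A
  { name: "Envelope/formatting overhead", tokens: 356, controllable: false },
];

const sum = (rows) => rows.reduce((acc, row) => acc + row.tokens, 0);

const total = sum(components);
// Everything the table marks "No": tool schemas + framework text + envelope.
const fixedOverhead = sum(components.filter((row) => row.controllable === false));
// Tool schemas + framework text only (the ~9,573 figure).
const schemasPlusFramework = 7250 + 2323;
const fixedShare = Math.round((fixedOverhead / total) * 100);

console.log({ total, fixedOverhead, schemasPlusFramework, fixedShare });
// { total: 31557, fixedOverhead: 9929, schemasPlusFramework: 9573, fixedShare: 31 }
```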

Cost Impact

At Anthropic's Sonnet 4.6 pricing ($3/M input tokens):

- ~10K overhead × every message × every agent adds up fast
- For an agent handling 200 messages/day across channels: ~$6/day just in framework overhead
- The overhead is amplified by prompt caching misses (cache write is 1.25x input rate)
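
The ~$6/day figure follows directly from the numbers above. A minimal sketch of the arithmetic, assuming a flat 10,000 tokens of overhead per message; the 1.25x case is a worst-case illustration in which every request pays the cache-write rate, which is an assumption rather than a measurement.

```javascript
const INPUT_PRICE_PER_MTOK = 3.0; // USD per million input tokens, as quoted above
const CACHE_WRITE_MULTIPLIER = 1.25; // cache-write rate relative to base input

// Daily cost (USD) of a fixed per-message token overhead.
function dailyOverheadCost(overheadTokens, messagesPerDay, multiplier = 1) {
  const tokensPerDay = overheadTokens * messagesPerDay;
  return (tokensPerDay / 1_000_000) * INPUT_PRICE_PER_MTOK * multiplier;
}

const base = dailyOverheadCost(10_000, 200);
const allCacheWrites = dailyOverheadCost(10_000, 200, CACHE_WRITE_MULTIPLIER);

console.log(base.toFixed(2)); // "6.00"
console.log(allCacheWrites.toFixed(2)); // "7.50"
```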

Requests

1. `promptMode` as a user-facing config option

The code already supports `PromptMode: "full" | "minimal" | "none"` in `buildAgentSystemPrompt()`, but the mode is chosen by `resolvePromptModeForSession()`, which is hardcoded and only ever returns "full" or "minimal":

// pi-embedded-BaSvmUpW.js:173561
function resolvePromptModeForSession(sessionKey) {
    if (!sessionKey) return "full";
    return isSubagentSessionKey(sessionKey) || isCronSessionKey(sessionKey) ? "minimal" : "full";
}

Request: expose promptMode as a per-agent config option:

{
  "agents": {
    "defaults": {
      "promptMode": "none"
    }
  }
}

This lets advanced users who have comprehensive SOUL.md files opt out of the framework sections entirely. The "none" path already works — it returns "You are a personal assistant running inside OpenClaw." and lets the user's own files handle everything.
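
One way the resolver could honor such a setting is sketched below. This is not the project's implementation: the `config` parameter, the validation set, and the two key-check stubs are assumptions made so the example runs on its own (the real `isSubagentSessionKey` and `isCronSessionKey` are not shown in this issue).

```javascript
// Hypothetical stand-ins for the real key checks.
const isSubagentSessionKey = (key) => key.includes(":subagent:");
const isCronSessionKey = (key) => key.includes(":cron:");

const VALID_PROMPT_MODES = new Set(["full", "minimal", "none"]);

// Same logic as the current function, plus an optional config override
// that replaces the hardcoded "full" default.
function resolvePromptModeForSession(sessionKey, config = {}) {
  const configured = config?.agents?.defaults?.promptMode;
  const fallback = VALID_PROMPT_MODES.has(configured) ? configured : "full";
  if (!sessionKey) return fallback;
  return isSubagentSessionKey(sessionKey) || isCronSessionKey(sessionKey)
    ? "minimal"
    : fallback;
}

const config = { agents: { defaults: { promptMode: "none" } } };
console.log(resolvePromptModeForSession("agent:main:slack", config)); // "none"
console.log(resolvePromptModeForSession("agent:main:slack")); // "full"
```

In this sketch subagent and cron sessions keep their existing "minimal" behavior regardless of the configured default; whether the override should also apply to them is a design decision left open here. The session key strings are made up for the example.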

2. Tool schema optimization

29 tools × ~250 tokens each = ~7,250 tokens of JSON Schema sent on every request. Options:

- Lazy tool loading: Only include tool schemas for tools the agent has actually used in the current session, plus a "discover more tools" meta-tool
- Schema compression: Tool descriptions are verbose. Many optional parameters could use shorter descriptions or omit descriptions for self-evident fields
- Tool profiles with schema impact visibility: Show token cost per tool in `openclaw tools list` so users can make informed decisions about which tools to enable
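
The first and third options can be illustrated together. The sketch below is hypothetical throughout: the tool objects, the `discover_tools` meta-tool name, and the 4-characters-per-token estimate are assumptions for illustration, not OpenClaw APIs or a real tokenizer.

```javascript
// Rough size heuristic (about 4 characters per token); not a real tokenizer.
const estimateTokens = (schema) => Math.ceil(JSON.stringify(schema).length / 4);

// Lazy loading: keep only tools already used this session, plus the meta-tool.
function selectToolSchemas(allTools, usedToolNames, metaToolName = "discover_tools") {
  return allTools.filter(
    (tool) => tool.name === metaToolName || usedToolNames.has(tool.name)
  );
}

// Cost visibility: estimated tokens per tool, largest first.
function schemaCostReport(tools) {
  return tools
    .map((tool) => ({ name: tool.name, tokens: estimateTokens(tool) }))
    .sort((a, b) => b.tokens - a.tokens);
}

// Made-up tool definitions for the example.
const allTools = [
  { name: "discover_tools", description: "List tools that are not loaded yet." },
  { name: "read_file", description: "Read a file from the workspace.", parameters: { path: "string" } },
  { name: "web_search", description: "Search the web.", parameters: { query: "string" } },
];

const active = selectToolSchemas(allTools, new Set(["read_file"]));
console.log(active.map((tool) => tool.name)); // [ 'discover_tools', 'read_file' ]
console.log(schemaCostReport(allTools));
```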

3. Framework text audit

The 2,323 tokens of framework text include sections that could be trimmed or made conditional:

- `## Reply Tags` (76 tokens): only needed for channels that support reply tags
- `## Silent Replies` (detailed formatting rules): could be a single line
- `## OpenClaw CLI Quick Reference`: rarely needed in conversation
- `## OpenClaw Self-Update`: only needed when user asks for updates
- `## Heartbeats` (detailed rules): only needed for heartbeat runs
- `## Documentation` (links): static, could be in SOUL.md instead

A "lean" prompt mode that keeps only Safety + Tooling + Workspace + Runtime would save ~800-1,000 tokens.
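
Conditional inclusion could look roughly like the following. The section titles are taken from the list above, but the `when` predicates and the `ctx` fields (`channelSupportsReplyTags`, `isHeartbeatRun`, `userAskedAboutUpdates`) are invented for this sketch and do not correspond to known OpenClaw internals.

```javascript
// Sections a "lean" mode would always keep.
const CORE_SECTIONS = ["Safety", "Tooling", "Workspace", "Runtime"];

// Sections included only when the (hypothetical) context calls for them.
const OPTIONAL_SECTIONS = [
  { title: "Reply Tags", when: (ctx) => ctx.channelSupportsReplyTags },
  { title: "Heartbeats", when: (ctx) => ctx.isHeartbeatRun },
  { title: "OpenClaw Self-Update", when: (ctx) => ctx.userAskedAboutUpdates },
];

function selectSections(ctx = {}) {
  const optional = OPTIONAL_SECTIONS
    .filter((section) => section.when(ctx) === true)
    .map((section) => section.title);
  return [...CORE_SECTIONS, ...optional];
}

console.log(selectSections());
// [ 'Safety', 'Tooling', 'Workspace', 'Runtime' ]
console.log(selectSections({ isHeartbeatRun: true }));
// [ 'Safety', 'Tooling', 'Workspace', 'Runtime', 'Heartbeats' ]
```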

Environment

OpenClaw version: 2026.2.22-2
Platform: macOS (darwin arm64)
Primary model: Claude Sonnet 4.6 (200K context)
Channels: Slack (socket mode), OpenWebUI, direct API
Context engine: custom plugin (nightshift-context) with reranker-based compression
Memory: memory-lancedb with auto-recall (30 memories) + conversation memory (9 turns)

extent analysis

TL;DR

Exposing promptMode as a user-facing config option and optimizing tool schemas can help reduce the non-controllable overhead of ~9,573 tokens per request.

Guidance

Expose promptMode as a per-agent config option to allow advanced users to opt out of framework sections, potentially saving ~2,323 tokens per request.
Optimize tool schemas by implementing lazy tool loading, schema compression, or providing tool profiles with schema impact visibility to reduce the ~7,250 tokens of JSON Schema sent on every request.
Conduct a framework text audit to trim or make conditional unnecessary sections, potentially saving ~800-1,000 tokens per request.
Consider implementing a "lean" prompt mode that keeps only essential sections to reduce overhead.

Example

{
  "agents": {
    "defaults": {
      "promptMode": "none"
    }
  }
}

This example shows how to configure the promptMode as a per-agent config option to opt out of framework sections.

Notes

The effectiveness of these suggestions may vary depending on the specific use case and configuration. Additionally, the implementation of these suggestions may require further modifications to the codebase.

Recommendation

Expose promptMode as a user-facing config option and optimize tool schemas to reduce the non-controllable overhead. Both changes target that overhead directly without requiring significant changes to the underlying framework.
