openclaw - ✅(Solved) Fix [Bug]: Default `bootstrapMaxChars=20000` + verbose auto-generated bootstrap content degrades tool dispatch on small/mid models [2 pull requests, 1 comments, 2 participants]

openclaw/openclaw#75189 · Fetched 2026-05-01 05:37:08
Comments: 1 · Participants: 2 · Timeline events: 15 · Reactions: 2
Timeline (top): cross-referenced ×7, mentioned ×3, subscribed ×3, commented ×1

OpenClaw's default agents.defaults.bootstrapMaxChars: 20000, combined with the verbose auto-generated workspace bootstrap files (~13 KB across AGENTS.md, SOUL.md, TOOLS.md, IDENTITY.md, USER.md, HEARTBEAT.md, BOOTSTRAP.md), produces a system prompt of ~27K chars once framework-level guidance is added. On small/mid models (≤14B params) this exceeds the prompt length beyond which tool-use instructions are deprioritized, so the agent hallucinates tool calls (plausible-sounding text describing tool use) instead of invoking the structured tools.

Root Cause

Affected: every OpenClaw user running a small/mid local model (the most common consumer-GPU configuration) who relies on tool dispatch. Cloud-API users on capable models (Sonnet 4.6, Opus 4.7, GPT-5.4) generally don't hit this because those models tolerate long prompts well — but local-inference users on Llama-3.1 / Qwen3 / Hermes / Mistral / similar 7B–14B models do. This is a growing user segment (the entire "self-host on consumer GPU" cohort).

Fix / Workaround

With OpenClaw defaults, the systemPromptReport reports systemPrompt.chars: 27345 for any agent run. Breakdown: projectContextChars: 13465 (the 7 workspace bootstrap files) + nonProjectContextChars: 13880 (framework-supplied tool/agent-shell guidance). On Hermes-3-Llama-3.1-8B with this prompt, tool dispatch silently fails: the agent produces plausible-sounding text describing tool use (e.g. "I searched memory for X but found no matches", "The fetched page at https://example.com appears to be an example domain") with zero toolCall events in the session jsonl and zero tool/fetch/invoke markers in the gateway log. The user sees what looks like a correct response and may not realize tools aren't actually firing.

Setting agents.defaults.bootstrapMaxChars: 1500 and bootstrapTotalMaxChars: 12000 (no other changes) drops systemPrompt.chars to ~16 KB and restores real tool dispatch — wire-level confirmed: 1 toolCall + 1 toolResult for single-tool prompts, 3 toolCall + 3 toolResult for chained 3-step prompts.
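
The workaround above amounts to overriding two config keys. A minimal Python sketch, assuming the config is a nested JSON-like object and that bootstrapTotalMaxChars sits under agents.defaults next to bootstrapMaxChars (the issue spells out the full path only for the latter). The helper name set_nested is hypothetical and not part of OpenClaw:

```python
import json


def set_nested(config: dict, dotted_key: str, value) -> dict:
    """Set config[a][b][c] = value for a dotted key 'a.b.c', creating levels as needed."""
    node = config
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value
    return config


# Values the issue reports as restoring tool dispatch on Hermes-3-Llama-3.1-8B.
config = {}
set_nested(config, "agents.defaults.bootstrapMaxChars", 1500)
# Assumption: this key lives alongside bootstrapMaxChars under agents.defaults.
set_nested(config, "agents.defaults.bootstrapTotalMaxChars", 12000)

print(json.dumps(config, indent=2))
```

The printed object is only a fragment to merge into an existing config; where that config file lives is not stated in the issue.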

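For comparison, with the default config the gateway log shows no tool dispatch markers during the agent run:

$ tail -c 6321 /tmp/openclaw/openclaw-2026-04-30.log | grep -iE 'tool_call|tool.*name|web_fetch|fetch.*url|invoke|dispatch'
(empty; no tool dispatch markers)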

PR fix notes

PR #75212: feat(agents): add minimal bootstrap tier

Description (problem / solution / changelog)

Summary

Describe the problem and fix in 2–5 bullets:

  • Problem: Default workspace bootstrap context can stay large for constrained loopback local models, and existing doctor checks only warned near configured truncation limits.
  • Why it matters: Small/local tool-calling models can degrade under prompt pressure even when bootstrap files are technically under the global character budgets.
  • What changed: Added agents.defaults.bootstrapTier with standard default and opt-in minimal, plus a doctor warning for loopback local models with large bootstrap context.
  • What did NOT change (scope boundary): Did not lower global bootstrap budgets, change default behavior, rewrite templates, alter provider/tool-call parsing, or add per-agent bootstrap profiles.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #
  • Related #75189
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: Bootstrap file injection had only size-budget controls, not a file-selection profile for constrained local models.
  • Missing detection / guardrail: Doctor warned on truncation/near-limit only, so large but under-budget bootstrap context on loopback local models was not surfaced.
  • Contributing context (if known): localModelLean reduced heavyweight tools, but did not reduce always-injected workspace bootstrap files.

Regression Test Plan (if applicable)

For bug fixes or regressions, name the smallest reliable test coverage that should catch this. Otherwise write N/A.

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
    • Target test or file: src/agents/workspace.test.ts, src/agents/bootstrap-files.test.ts, src/commands/doctor-bootstrap-size.test.ts
    • Scenario the test should lock in: standard preserves existing bootstrap behavior, minimal filters ordinary-session bootstrap files, and doctor warns only for large loopback local-model bootstrap context.
    • Why this is the smallest reliable guardrail: The issue is deterministic at config/filtering and doctor-analysis seams; no live model call is required to prove the new control and warning.
    • Existing test that already covers this (if any): Existing workspace tests covered standard subagent/cron filtering and full ordinary-session injection.
    • If no new test is added, why not: N/A

User-visible / Behavior Changes

  • New config: agents.defaults.bootstrapTier
    • standard default: preserves current behavior.
    • minimal: injects only AGENTS.md, TOOLS.md, IDENTITY.md, USER.md, and BOOTSTRAP.md for ordinary sessions.
  • Doctor can now warn when a loopback local model has large bootstrap context and recommend bootstrapTier: "minimal".
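
The tier behavior described above can be sketched as a simple file filter. This is an illustrative Python model, not the PR's TypeScript implementation: the function name select_bootstrap_files is hypothetical, the file sizes are the rawChars values from the issue's systemPromptReport, and only ordinary sessions are modeled (subagent/cron filtering is left out).

```python
# rawChars per file, taken from the issue's default-config systemPromptReport.
RAW_CHARS = {
    "AGENTS.md": 7809,
    "SOUL.md": 1738,
    "TOOLS.md": 850,
    "IDENTITY.md": 633,
    "USER.md": 474,
    "HEARTBEAT.md": 192,
    "BOOTSTRAP.md": 1450,
}

# Files the PR description says the "minimal" tier injects for ordinary sessions.
MINIMAL_FILES = {"AGENTS.md", "TOOLS.md", "IDENTITY.md", "USER.md", "BOOTSTRAP.md"}


def select_bootstrap_files(files, tier=None):
    """Return the files to inject; a missing or unknown tier behaves like 'standard'."""
    if tier == "minimal":
        return [name for name in files if name in MINIMAL_FILES]
    return list(files)


standard_total = sum(RAW_CHARS[n] for n in select_bootstrap_files(RAW_CHARS, "standard"))
minimal_total = sum(RAW_CHARS[n] for n in select_bootstrap_files(RAW_CHARS, "minimal"))
print(standard_total, minimal_total, standard_total - minimal_total)
```

With these sizes, minimal drops only SOUL.md and HEARTBEAT.md, so most of the bootstrap weight (AGENTS.md) remains unless it is also trimmed or capped.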

Diagram (if applicable)

For UI changes or non-trivial logic flows, include a small ASCII diagram reviewers can scan quickly. Otherwise write N/A.

  Before:
  [loopback local model + large bootstrap] -> [under global budget] -> [no doctor guidance]

  After:
  [loopback local model + large bootstrap] -> [doctor warning] -> [operator can set bootstrapTier=minimal]

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: macOS local dev machine
  • Runtime/container: Node 22 / local repo checkout
  • Model/provider: No live model used; config-level loopback provider covered by tests
  • Integration/channel (if any): N/A
  • Relevant config (redacted): agents.defaults.bootstrapTier, loopback models.providers.<id>.baseUrl

Steps

  1. Configure a loopback local model as the primary model.
  2. Use a workspace with large bootstrap context under the global truncation limits.
  3. Run doctor/bootstrap-size analysis.

Expected

  • Doctor recommends agents.defaults.bootstrapTier: "minimal" for loopback local-model prompt pressure.
  • standard keeps existing bootstrap behavior.
  • minimal reduces ordinary-session bootstrap injection.
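
The expected doctor behavior can be sketched as a predicate over three inputs. This is a hedged Python illustration, not the code in src/commands/doctor-bootstrap-size.ts: the function names are hypothetical, and the LARGE_BOOTSTRAP_CHARS threshold is an invented placeholder because the PR description does not state the real cutoff.

```python
from urllib.parse import urlparse

LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}
# Hypothetical cutoff; the PR description does not give the actual value.
LARGE_BOOTSTRAP_CHARS = 10_000


def is_loopback(base_url: str) -> bool:
    """True when the provider baseUrl points at the local machine."""
    return urlparse(base_url).hostname in LOOPBACK_HOSTS


def should_recommend_minimal(base_url: str, bootstrap_chars: int, tier: str = "standard") -> bool:
    """Warn only for loopback providers with large bootstrap context not already on 'minimal'."""
    return (
        is_loopback(base_url)
        and tier != "minimal"
        and bootstrap_chars >= LARGE_BOOTSTRAP_CHARS
    )


# The vLLM endpoint from the issue, with the reported projectContextChars.
print(should_recommend_minimal("http://127.0.0.1:8002/v1", 13465))
# A hosted provider with the same context size stays silent (URL is a placeholder).
print(should_recommend_minimal("https://api.example.com/v1", 13465))
```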

Actual

  • Implemented and covered by targeted tests.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
    • bootstrapTier schema accepts minimal and standard, rejects unknown values.
    • standard preserves ordinary-session bootstrap behavior.
    • minimal filters ordinary-session bootstrap files to the expected set.
    • BOOTSTRAP.md prompt-prefix helper behavior remains covered.
    • Doctor warns for large loopback local-model bootstrap context and stays silent for hosted models or minimal.
  • Edge cases checked:
    • Missing bootstrapTier resolves to standard.
    • Unknown runtime tier value falls back to standard.
    • Hosted provider with large bootstrap context does not get the local-model warning.
  • What you did not verify:
    • Live vLLM/Hermes reproduction from #75189.
    • Blacksmith/Testbox pnpm check:changed was not run.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? Yes
  • Migration needed? No
  • If yes, exact upgrade steps: Optional only. Constrained local-model users can set agents.defaults.bootstrapTier to "minimal".

Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write None.

  • Risk: Minimal bootstrap may omit persona, heartbeat, or memory context some users expect.
    • Mitigation: standard remains the default; minimal is opt-in and doctor only recommends it for loopback local-model prompt pressure.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • docs/.generated/config-baseline.sha256 (modified, +2/-2)
  • docs/concepts/experimental-features.md (modified, +4/-0)
  • docs/concepts/system-prompt.md (modified, +5/-0)
  • docs/gateway/config-agents.md (modified, +16/-0)
  • docs/gateway/doctor.md (modified, +1/-1)
  • docs/gateway/local-models.md (modified, +5/-1)
  • src/agents/bootstrap-files.test.ts (modified, +26/-0)
  • src/agents/bootstrap-files.ts (modified, +6/-1)
  • src/agents/pi-embedded-helpers.buildbootstrapcontextfiles.test.ts (modified, +23/-0)
  • src/agents/pi-embedded-helpers.ts (modified, +2/-0)
  • src/agents/pi-embedded-helpers/bootstrap.ts (modified, +6/-0)
  • src/agents/workspace.test.ts (modified, +29/-0)
  • src/agents/workspace.ts (modified, +15/-2)
  • src/commands/doctor-bootstrap-size.test.ts (modified, +94/-0)
  • src/commands/doctor-bootstrap-size.ts (modified, +101/-0)
  • src/config/schema.base.generated.ts (modified, +20/-0)
  • src/config/schema.help.ts (modified, +2/-0)
  • src/config/schema.labels.ts (modified, +1/-0)
  • src/config/types.agent-defaults.ts (modified, +4/-2)
  • src/config/zod-schema.agent-defaults.test.ts (modified, +11/-0)
  • src/config/zod-schema.agent-defaults.ts (modified, +2/-3)

PR #75248: fix(agents): reorder workspace AGENTS.md template to put load-bearing rules first

Description (problem / solution / changelog)

Bug being fixed

Closes #75187

The auto-generated docs/reference/templates/AGENTS.md (used by the workspace bootstrap to seed ~/.openclaw/workspace/AGENTS.md) ordered content with personality/onboarding guidance at the top and the load-bearing ## Red Lines, ## External vs Internal, and ## Tools guidance at the bottom.

When a user lowers agents.defaults.bootstrapMaxChars (typical for small/mid local models — Hermes-3 8B, Qwen3 8B — to fit a tight context budget), bootstrap-budget head-truncates the file. With the old order, that stripped exactly the safety + tool-dispatch rules the model needed, while preserving the less operationally-critical Memory/Group Chats/Heartbeats sections. The reporter's vLLM repro showed 0 structured tool_call events vs. 1 successful structured tool call after manually rewriting AGENTS.md to put tool-use guidance at the top — same model, same parser, same bootstrapMaxChars, content order was the only difference.

Fix

Reorder docs/reference/templates/AGENTS.md so the section sequence is:

  1. First Run
  2. Session Startup
  3. Red Lines (was #4)
  4. External vs Internal (was #5)
  5. Tools (was #7)
  6. Memory (was #3)
  7. Group Chats (was #6)
  8. Heartbeats (was #8)
  9. Make It Yours

Section content is unchanged byte-for-byte; only the H2 ordering moves. This is Path #1 ("Quickest win — reorder the auto-generated AGENTS.md template content") in the issue's recommended resolution order.

This aligns the seeded workspace template with the existing post-compaction priority contract: agents.defaults.compactionAgentsMdReinjectionSections already names Session Startup and Red Lines as the priority sections to re-inject after compaction. Putting them at the top of the seeded file means head-truncation now matches that same priority instead of fighting it.
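
To see why ordering matters under head-truncation, the following Python sketch compares which sections survive a character cap in the old and new orders. The old order is reconstructed from the "was #N" notes above; the per-section size and the cap are hypothetical round numbers chosen only to make the effect visible, and sections_kept is an illustrative helper, not OpenClaw code.

```python
OLD_ORDER = [
    "First Run", "Session Startup", "Memory", "Red Lines",
    "External vs Internal", "Group Chats", "Tools", "Heartbeats", "Make It Yours",
]
NEW_ORDER = [
    "First Run", "Session Startup", "Red Lines", "External vs Internal",
    "Tools", "Memory", "Group Chats", "Heartbeats", "Make It Yours",
]

# Hypothetical: pretend every section is 800 chars and the cap is 4000.
SIZES = {name: 800 for name in OLD_ORDER}
MAX_CHARS = 4000


def sections_kept(order, sizes, max_chars):
    """Sections that fit entirely within the first max_chars of the file."""
    kept, used = [], 0
    for name in order:
        used += sizes[name]
        if used > max_chars:
            break
        kept.append(name)
    return kept


print("old:", sections_kept(OLD_ORDER, SIZES, MAX_CHARS))
print("new:", sections_kept(NEW_ORDER, SIZES, MAX_CHARS))
```

Under these assumed sizes the old order keeps Memory but cuts Tools, while the new order keeps Red Lines, External vs Internal, and Tools together.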

Why this is the best fix

  • Smallest blast radius: a docs-only template content change. No runtime, schema, or budget logic touched.
  • Existing users: their already-customized AGENTS.md files are not rewritten; this only affects newly-seeded workspaces (and openclaw doctor --fix --regenerate-bootstrap-files flows when applicable).
  • Doesn't preempt larger work: orthogonal to #75189 (verbose default content) and #22438 / #22439 (tiered bootstrap loading); paths #2 and #3 in the issue's recommended order remain valid future work on top of this base fix.
  • Aligns with existing contract: matches the post-compaction reinjection priority in agents.defaults.compactionAgentsMdReinjectionSections.

Test plan

  • pnpm test src/agents/workspace-templates.test.ts — 4 new regression tests pass (Red Lines / External vs Internal / Tools all assert ahead of Memory + Group Chats; First Run / Session Startup stay at the top).
  • pnpm test src/agents/workspace.test.ts src/agents/system-prompt-stability.test.ts — 25 + 4 existing tests pass.
  • pnpm exec oxfmt --check — clean.
  • pnpm tsgo:core + pnpm tsgo:core:test — clean.
  • Lint (pnpm lint:core) failure on oxlint config is pre-existing on origin/main (Rule 'no-underscore-dangle' not found in plugin 'eslint'), unrelated to this PR.
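
The ordering assertions in the test plan can be approximated with a small heading check. This Python sketch is only an analogue of the idea; the PR's actual tests are TypeScript in src/agents/workspace-templates.test.ts, and the template text below is a stand-in that contains headings only.

```python
import re

# Stand-in for the reordered template: headings only, bodies omitted.
TEMPLATE = """\
## First Run
## Session Startup
## Red Lines
## External vs Internal
## Tools
## Memory
## Group Chats
## Heartbeats
## Make It Yours
"""


def h2_headings(markdown: str) -> list:
    """Return H2 heading titles in document order (deeper headings are ignored)."""
    return re.findall(r"^## (.+?)\s*$", markdown, flags=re.MULTILINE)


def appears_before(headings, first, second) -> bool:
    return headings.index(first) < headings.index(second)


headings = h2_headings(TEMPLATE)
for critical in ("Red Lines", "External vs Internal", "Tools"):
    for later in ("Memory", "Group Chats"):
        assert appears_before(headings, critical, later), (critical, later)
print(headings[:2])
```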

https://github.com/openclaw/openclaw/issues/75187

Changed files

  • docs/reference/templates/AGENTS.md (modified, +17/-33)
  • src/agents/pi-embedded-helpers.buildbootstrapcontextfiles.test.ts (modified, +36/-0)
  • src/agents/workspace-templates.test.ts (modified, +50/-0)

Code Example

openclaw agent --session-id repro -m "Fetch https://example.com and summarize the page in one sentence." --json

---

Default-config systemPromptReport (excerpt):

{
  "systemPrompt": {
    "chars": 27345,
    "projectContextChars": 13465,
    "nonProjectContextChars": 13880
  },
  "injectedWorkspaceFiles": [
    {"name": "AGENTS.md",   "rawChars": 7809, "injectedChars": 7809, "truncated": false},
    {"name": "SOUL.md",     "rawChars": 1738, "injectedChars": 1738, "truncated": false},
    {"name": "TOOLS.md",    "rawChars":  850, "injectedChars":  850, "truncated": false},
    {"name": "IDENTITY.md", "rawChars":  633, "injectedChars":  633, "truncated": false},
    {"name": "USER.md",     "rawChars":  474, "injectedChars":  474, "truncated": false},
    {"name": "HEARTBEAT.md","rawChars":  192, "injectedChars":  192, "truncated": false},
    {"name": "BOOTSTRAP.md","rawChars": 1450, "injectedChars": 1450, "truncated": false}
  ]
}
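
The numbers in this report can be cross-checked with a few lines of Python. The sketch below copies the values verbatim from the excerpt; the interpretation of the gap between projectContextChars and the per-file total as per-file framing overhead is an assumption, since the report does not say what accounts for it.

```python
report = {
    "systemPrompt": {
        "chars": 27345,
        "projectContextChars": 13465,
        "nonProjectContextChars": 13880,
    },
    "injectedWorkspaceFiles": [
        {"name": "AGENTS.md", "rawChars": 7809, "injectedChars": 7809, "truncated": False},
        {"name": "SOUL.md", "rawChars": 1738, "injectedChars": 1738, "truncated": False},
        {"name": "TOOLS.md", "rawChars": 850, "injectedChars": 850, "truncated": False},
        {"name": "IDENTITY.md", "rawChars": 633, "injectedChars": 633, "truncated": False},
        {"name": "USER.md", "rawChars": 474, "injectedChars": 474, "truncated": False},
        {"name": "HEARTBEAT.md", "rawChars": 192, "injectedChars": 192, "truncated": False},
        {"name": "BOOTSTRAP.md", "rawChars": 1450, "injectedChars": 1450, "truncated": False},
    ],
}

prompt = report["systemPrompt"]
files = report["injectedWorkspaceFiles"]

# The two context buckets add up to the reported total.
assert prompt["projectContextChars"] + prompt["nonProjectContextChars"] == prompt["chars"]

injected_total = sum(f["injectedChars"] for f in files)
# Assumption: the remainder is framing around each injected file (headers, separators).
framing_gap = prompt["projectContextChars"] - injected_total
largest = max(files, key=lambda f: f["injectedChars"])

print(injected_total, framing_gap, largest["name"])
```

AGENTS.md alone is roughly 59% of the injected file content, which is why both PRs focus on it.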


Default config + tool-warranted prompt — gateway log delta during the agent run:

$ tail -c 6321 /tmp/openclaw/openclaw-2026-04-30.log | grep -iE 'tool_call|tool.*name|web_fetch|fetch.*url|invoke|dispatch'
(empty — no tool dispatch markers)


Default config — session jsonl event sequence:

L5  message  role=user
L6  message  role=assistant  ctypes=['text']  text="The fetched page at https://example.com appears to be an example domain..."
                                              ← hallucinated; example.com is famously a placeholder, model knew this without fetching


After lowering `bootstrapMaxChars` to 1500 (no other changes) — same prompt, same model, same vLLM, same parser:

L5  message  role=user
L6  message  role=assistant  ctypes=['toolCall']
L7  message  role=toolResult  ctypes=['text']  text='{"url":"https://example.com","status":200,"contentType":"text/html",...}'
L8  message  role=assistant  ctypes=['text']  text="The fetched page at https://example.com is a security notice..."
                                              ← real summary of real fetched HTML body
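
Because the failure is silent, it helps to check a session log mechanically rather than reading the reply. The Python sketch below counts content types per role. The JSONL record shape used here (a "message" record with "role" and a "content" list of typed parts) is an assumption inferred from the role=/ctypes= summary above, not a documented schema, and both helper names are hypothetical.

```python
import json
from collections import Counter


def count_content_types(jsonl_lines):
    """Count (role, content-part type) pairs across message records."""
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("type") != "message":
            continue
        message = record.get("message", {})
        content = message.get("content", [])
        if not isinstance(content, list):
            continue  # skip records whose content is not a list of typed parts
        for part in content:
            counts[(message.get("role"), part.get("type"))] += 1
    return counts


def tools_fired(jsonl_lines) -> bool:
    """True if the assistant emitted at least one structured toolCall."""
    return count_content_types(jsonl_lines)[("assistant", "toolCall")] > 0


def _msg(role, ctype):
    return json.dumps({"type": "message", "message": {"role": role, "content": [{"type": ctype}]}})


# Synthetic logs mirroring the two event sequences shown above.
hallucinated = [_msg("user", "text"), _msg("assistant", "text")]
real = [_msg("user", "text"), _msg("assistant", "toolCall"),
        _msg("toolResult", "text"), _msg("assistant", "text")]

print(tools_fired(hallucinated), tools_fired(real))
```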


Public-domain validation (research showing prompt length above ~2,000 words degrades tool-following on small/mid models):
- "Writing System Prompts for AI Agents: Best Practices for 2026" (Runyard): "Prompts over 2,000 words tend to produce agents that follow early instructions well and ignore later ones."
- AGENTIF benchmark (Tsinghua KEG): quantifies "performance degradation as instruction length increases."
- BFCL v3 leaderboard 2026: Qwen3 8B at F1 0.933, Hermes-3 mid-tier; performance is sensitive to prompt shape.

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Steps to reproduce

  1. Install OpenClaw 2026.4.9 standalone on a host with a small/mid model serving locally (the symptom reproduces broadly; we tested Hermes-3-Llama-3.1-8B on vLLM with --tool-call-parser hermes).
  2. Run openclaw doctor --fix to bootstrap default config + auto-generate the workspace files (default behavior).
  3. Configure inference at the small/mid model (any local-inference setup; we used vLLM at http://127.0.0.1:8002/v1).
  4. Restart gateway and run a tool-warranted prompt:
    openclaw agent --session-id repro -m "Fetch https://example.com and summarize the page in one sentence." --json
  5. Inspect result.meta.systemPromptReport.systemPrompt.chars and the session jsonl at ~/.openclaw/agents/main/sessions/<sessionId>.jsonl for toolCall events.
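
Step 5 can be scripted. The Python sketch below reads the prompt size from the agent's JSON output and builds the session log path. It assumes the --json output is a single object with a top-level "result" key, as the dotted path in step 5 suggests; the helper names and the sample payload are illustrative only.

```python
import json
from pathlib import Path


def system_prompt_chars(agent_json: str) -> int:
    """Extract result.meta.systemPromptReport.systemPrompt.chars from --json output."""
    payload = json.loads(agent_json)
    return payload["result"]["meta"]["systemPromptReport"]["systemPrompt"]["chars"]


def session_log_path(session_id: str, home: Path = None) -> Path:
    """Path to the session jsonl, following the location given in step 5."""
    base = home if home is not None else Path.home()
    return base / ".openclaw" / "agents" / "main" / "sessions" / f"{session_id}.jsonl"


# Sample payload shaped after the path in step 5 (assumed structure, real value).
sample_output = json.dumps(
    {"result": {"meta": {"systemPromptReport": {"systemPrompt": {"chars": 27345}}}}}
)

print(system_prompt_chars(sample_output))
print(session_log_path("repro"))
```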

Expected behavior

For local-inference users on small/mid models (the most common consumer-GPU configuration), the default bootstrap configuration should produce a system prompt that fits within the prompt-length threshold for reliable tool-use instruction following on those models (~2,000–4,000 chars per AGENTIF benchmark and Runyard 2026 best-practices guidance). Either:

  • (a) Lower the default bootstrapMaxChars to ~1,500–2,000, OR
  • (b) Trim the auto-generated bootstrap content (especially AGENTS.md at 7,809 chars) to be substantially shorter while preserving load-bearing instructions, OR
  • (c) Add a bootstrapTier config knob (per open issue #22438 / PR #22439) defaulting to minimal for new installs and surfacing standard | full as opt-in for users on capable models, OR
  • (d) Have openclaw doctor emit a warning when the configured primary model is small/mid AND the system prompt exceeds the recommended threshold, with a one-command fix.

OpenClaw version

2026.4.9 (build 0512059)

Operating system

Ubuntu 24.04 LTS aarch64 (Linux 6.17.0-1014-nvidia)

Install method

npm global (npm install -g openclaw@2026.4.9), Node v22.22.2 via nvm

Model

NousResearch/Hermes-3-Llama-3.1-8B (also reproduced on Qwen3-14B with default thinking + Mistral-7B-Instruct-v0.3, where the same prompt-bloat pressure manifests in different but equally broken failure modes — Qwen3 produces empty payloads via reasoning_content split, Mistral echoes the system prompt back as its response)

Provider / routing chain

openclaw (standalone host gateway) → vLLM (http://127.0.0.1:8002/v1) → small/mid local model

Additional provider/model setup details

Standalone OpenClaw on a host (no NemoClaw sandbox). vLLM 0.19.1 Docker container at :8002 with --enable-auto-tool-choice --tool-call-parser hermes --gpu-memory-utilization 0.20 --max-model-len 32768. ~14 GB GPU resident. Gateway in local mode, primary model inference/hermes-3-llama-3.1-8b.

Impact and severity

Severity: medium-high. The failure mode is silent and confidence-inducing: hallucinated tool replies often sound correct (Hermes-3 8B's example.com summary read like a real summary), so users may not realize tools aren't actually firing for hours or days. Real-world consequences are particularly bad for action-taking agents (skills that send messages, modify files, execute commands) — the agent claims success while doing nothing, or worse, claims a fabricated success that the user acts on downstream.

Frequency: deterministic on default config + small/mid model.

Consequence: silent erosion of tool-call reliability across the local-inference user base. Combined with the surface-area of the issue (any tool call, on any small/mid model), this is the kind of bug that quietly drives users away from the framework or onto larger (cloud-only) models even when their local hardware would be sufficient if the prompt were lean.

Additional information

Related issues:

  • #42084 (closed COMPLETED 2026-04-24) "Agent silently fails to reply when workspace bootstrap file exceeds bootstrapMaxChars limit" — adjacent but distinct failure mode. #42084 was the zero-payload silent fail when truncation broke the message sequence (orphaned-user-turn dropped). The shipped fix (per steipete's closing comment): bootstrap truncation warnings appended to current-turn prompt, orphaned-user-turn repair, regression coverage that the run still returns payloads. Our case is different: with the default bootstrapMaxChars: 20000, no truncation occurs (each file is under the cap individually); the system prompt is just naturally large at ~27K chars, and the resulting tool-dispatch degradation (model produces hallucinated tool-use plausibility text instead of structured toolCall events) is not addressed by #42084's fix. The shipped warnings + payload-still-returns guarantees are necessary but not sufficient for tool-use reliability on small/mid models.
  • #22438 (open) "feat: Tiered bootstrap file loading for progressive context control" — directly addresses this with bootstrapTier: minimal | standard | full. PR #22439 is the implementation candidate; landing it would resolve this issue if minimal becomes the default for new installs (or if doctor selects an appropriate default based on detected model capability).
  • #41304 (open) "Agent refuses to invoke write/action tools, hallucinates success" — same failure shape, kinthaiofficial 2026-04-28 root-cause comment is verbatim aligned with what we observed.
  • #66060 (open) "active-memory: bloated prompt — full agent system prompt included in memory search context" — adjacent; compounds the issue when memory_search fires inside an already-bloated session.
  • #69966 (open) "Feature request: per-agent bootstrap profile support" — adjacent feature request for tier-like control at the per-agent level rather than gateway-wide.
  • #08 in this filing batch — "Auto-generated AGENTS.md puts load-bearing tool-use rules at the bottom" — orthogonal but same root cause (prompt construction); fixing both together gives the cleanest user experience.

Suggested fixes (any one materially helps; combining them is best):

  1. Lower default bootstrapMaxChars to ~1,500–2,000. Backwards-compatible (existing users with explicit settings unaffected). One-line change to the schema default.
  2. Trim the auto-generated AGENTS.md content to ~1,500 chars, focused on tool-use rules + red-lines. (Synergizes with #08.)
  3. Land PR #22439 (#22438) with bootstrapTier: minimal as the default for new installs.
  4. Add a doctor warning when systemPrompt.chars > 8000 AND configured primary model is in the small-mid class (≤14B params). Suggest the trim + link to the bootstrap-tier docs.
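
Suggested fix 4 reduces to a two-condition rule. A minimal Python sketch using the thresholds stated in that item (more than 8000 chars, at most 14B params); the function name is hypothetical, and how the parameter count would be detected for a configured model is left open by the issue.

```python
PROMPT_CHARS_THRESHOLD = 8000   # from suggested fix 4
SMALL_MID_MAX_PARAMS_B = 14     # "small-mid class (≤14B params)"


def needs_prompt_warning(system_prompt_chars: int, model_params_billion: float) -> bool:
    """True when a small/mid model is paired with a system prompt above the threshold."""
    return (
        system_prompt_chars > PROMPT_CHARS_THRESHOLD
        and model_params_billion <= SMALL_MID_MAX_PARAMS_B
    )


# Default config on Hermes-3-Llama-3.1-8B, as reported in the issue.
print(needs_prompt_warning(27345, 8))
# The same prompt on a much larger model (parameter count is a placeholder).
print(needs_prompt_warning(27345, 70))
```

Note that the ~16 KB prompt reported after the workaround would still exceed the 8000-character threshold as proposed, so the rule would keep firing on that configuration.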

extent analysis

TL;DR

Lower the default bootstrapMaxChars to ~1,500–2,000, or implement a bootstrapTier config, so that the system prompt stays short enough for small/mid models to dispatch tools reliably.

Guidance

  • Verify the system prompt length by checking result.meta.systemPromptReport.systemPrompt.chars and adjust the bootstrapMaxChars setting accordingly.
  • Consider trimming the auto-generated AGENTS.md content to preserve essential tool-use rules while reducing the overall prompt length.
  • Implementing a bootstrapTier config, as proposed in PR #22439, can provide a more flexible solution for managing prompt length and tool dispatch reliability.
  • Adding a warning in openclaw doctor when the system prompt exceeds the recommended threshold can help users identify and address potential issues.

Example

No specific code example is provided, as the solution involves adjusting configuration settings or implementing a new feature.

Notes

The issue is specific to small/mid models and may not affect users with more capable models. Combining multiple suggested fixes may provide the best results.

Recommendation

As a workaround, lower agents.defaults.bootstrapMaxChars to ~1,500–2,000 in your own config. This is simple and backwards-compatible, and it mitigates the issue until a more comprehensive fix is implemented.
