openclaw - ✅(Solved) Fix: Subagent relay turns force full bootstrap re-injection, invalidating the prompt cache [1 pull request, 1 participant]

GitHub stats: openclaw/openclaw#64346 (fetched 2026-04-11 06:15:20)
Comments: 0 · Participants: 1 · Timeline: 1 (cross-referenced ×1) · Reactions: 0

Root Cause

File: dist/pi-embedded-CNTNdlGw.js Lines: 31273–31274

const isContinuationTurn = resolveContextInjectionMode(params.config) === "continuation-skip"
    && params.bootstrapContextRunKind !== "heartbeat"
    && await hasCompletedBootstrapTurn(params.sessionFile);

const shouldRecordCompletedBootstrapTurn = !isContinuationTurn
    && params.bootstrapContextMode !== "lightweight"
    && params.bootstrapContextRunKind !== "heartbeat";

The relay turn that processes a subagent's task_completion event is invoked with bootstrapContextRunKind: "default". There is no "relay" case, and no code path that detects "this incoming event is a subagent completion that should inherit the parent session's existing context rather than re-inject bootstrap."

As a result:

  1. isContinuationTurn evaluates to false even though the relay follows an already-completed bootstrap turn: the continuation-skip workaround only applies when the config opts in, and the subagent event is treated as a new "user message" that requires a full bootstrap
  2. shouldRecordCompletedBootstrapTurn evaluates to true
  3. An openclaw:bootstrap-context:full custom event fires
  4. The context engine rebuilds the full bootstrap prefix for the relay turn, invalidating the prompt cache
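The two conditions can be exercised in isolation by replacing the helper calls with plain values. This is a minimal sketch, not the bundled code: evaluateBootstrapGate is a hypothetical name, the contextInjectionMode field stands in for resolveContextInjectionMode(params.config), the boolean hasCompletedBootstrapTurn stands in for the async lookup in the session file, and "unset" / "full" are placeholders for "anything other than continuation-skip" and "anything other than lightweight". The second case mirrors the missing-marker condition that PR #64468 later identified.

```javascript
// Minimal model of the two conditions quoted above (dist/pi-embedded-CNTNdlGw.js).
// Hypothetical stand-ins: contextInjectionMode replaces
// resolveContextInjectionMode(params.config); hasCompletedBootstrapTurn is a
// plain boolean instead of an async read of the session file.
function evaluateBootstrapGate(params) {
  const isContinuationTurn =
    params.contextInjectionMode === "continuation-skip" &&
    params.bootstrapContextRunKind !== "heartbeat" &&
    params.hasCompletedBootstrapTurn;

  const shouldRecordCompletedBootstrapTurn =
    !isContinuationTurn &&
    params.bootstrapContextMode !== "lightweight" &&
    params.bootstrapContextRunKind !== "heartbeat";

  return { isContinuationTurn, shouldRecordCompletedBootstrapTurn };
}

// Relay turn, default config: the run kind is "default" because no "relay" kind exists.
const relayDefaultConfig = evaluateBootstrapGate({
  contextInjectionMode: "unset", // placeholder: anything but "continuation-skip"
  bootstrapContextRunKind: "default",
  bootstrapContextMode: "full", // placeholder: anything but "lightweight"
  hasCompletedBootstrapTurn: true,
});

// Relay turn with continuation-skip enabled, but no completed-bootstrap marker in history.
const relaySkipNoMarker = evaluateBootstrapGate({
  contextInjectionMode: "continuation-skip",
  bootstrapContextRunKind: "default",
  bootstrapContextMode: "full",
  hasCompletedBootstrapTurn: false,
});

console.log(relayDefaultConfig.isContinuationTurn, relayDefaultConfig.shouldRecordCompletedBootstrapTurn); // false true
console.log(relaySkipNoMarker.isContinuationTurn, relaySkipNoMarker.shouldRecordCompletedBootstrapTurn); // false true
```

Both cases land on the full-bootstrap path described in steps 1 and 2 above; in this model only continuation-skip plus a recorded marker flips isContinuationTurn to true.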

Confirmation in session trace (/path/to/agent/main/sessions/<id>.jsonl):

{"type":"custom","customType":"openclaw.cache-ttl",...}
{"type":"custom","customType":"openclaw:bootstrap-context:full",...}

These two events fire in sequence right before every relay turn, not on any other turn type.
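The cache-ttl / bootstrap-context:full pairing can be checked mechanically by scanning a session trace. This is a sketch under assumptions: it treats the trace as JSON Lines and reads only the type and customType fields shown in the issue (the other fields are elided there), and the {"type":"message"} lines in the sample are placeholders, not the real record format.

```javascript
// Sketch: scan a session trace (JSONL, one JSON object per line) for
// openclaw:bootstrap-context:full entries and report whether each one comes
// directly after an openclaw.cache-ttl entry.
function findBootstrapReinjections(jsonlText) {
  const entries = jsonlText
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));

  // True when the entry exists and is a custom event of the given customType.
  const isCustom = (entry, customType) =>
    Boolean(entry) && entry.type === "custom" && entry.customType === customType;

  const hits = [];
  entries.forEach((entry, index) => {
    if (isCustom(entry, "openclaw:bootstrap-context:full")) {
      hits.push({ index, afterCacheTtl: isCustom(entries[index - 1], "openclaw.cache-ttl") });
    }
  });
  return hits;
}

// Hypothetical trace: only the two custom entries come from the issue; the
// {"type":"message"} lines are placeholders for the surrounding turn records.
const sampleTrace = [
  '{"type":"message"}',
  '{"type":"custom","customType":"openclaw.cache-ttl"}',
  '{"type":"custom","customType":"openclaw:bootstrap-context:full"}',
  '{"type":"message"}',
].join("\n");

const reinjections = findBootstrapReinjections(sampleTrace);
console.log(reinjections.length, reinjections[0].afterCacheTtl); // 1 true
```

Against a real session file, the input would come from something like Node's fs.readFileSync(path, "utf8"). Each hit with afterCacheTtl: true marks a turn that paid the full re-injection.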

Fix Action

Fix / Workaround

PR fix notes

PR #64468: fix(agents): persist bootstrap marker after clean sessions_yield

Description (problem / solution / changelog)

Summary

  • Problem: clean sessions_yield turns skipped persisting openclaw:bootstrap-context:full, so continuation-skip could miss bootstrap completion and trigger full reinjection on follow-up relay/user turns.
  • Why it matters: unnecessary prompt-cache invalidation and avoidable token/cost overhead on subagent relay flows.
  • What changed: extracted marker persistence gating into shouldPersistCompletedBootstrapTurn(...) and allowed clean yield exits while keeping prompt/abort/compaction safety guards.
  • What did NOT change (scope boundary): no new run kinds, no relay transport/protocol changes, no compaction policy changes.
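The gate change can be sketched as a "before" and an "after" predicate. Only the helper name shouldPersistCompletedBootstrapTurn and the yieldAborted flag appear in the PR notes. The attempt object, its other fields (promptError, hardAborted, compactionTimedOut, compactionOccurred), and the inclusion of shouldRecordCompletedBootstrapTurn are hypothetical stand-ins for the guards the PR lists, and the real signature may differ.

```javascript
// Hypothetical sketch of the marker gate described in PR #64468.
// Field names other than yieldAborted are assumptions, not the real signature.

// Before: the yield flag blocked the marker, even for a clean sessions_yield.
function shouldPersistBefore(attempt) {
  return (
    attempt.shouldRecordCompletedBootstrapTurn &&
    !attempt.promptError &&
    !attempt.hardAborted &&
    !attempt.yieldAborted &&
    !attempt.compactionTimedOut &&
    !attempt.compactionOccurred
  );
}

// After: a clean yield no longer blocks; the prompt/abort/compaction guards remain.
function shouldPersistCompletedBootstrapTurn(attempt) {
  return (
    attempt.shouldRecordCompletedBootstrapTurn &&
    !attempt.promptError &&
    !attempt.hardAborted &&
    !attempt.compactionTimedOut &&
    !attempt.compactionOccurred
  );
}

const cleanYield = {
  shouldRecordCompletedBootstrapTurn: true,
  promptError: false,
  hardAborted: false,
  yieldAborted: true, // the run stopped because of an intentional sessions_yield handoff
  compactionTimedOut: false,
  compactionOccurred: false,
};

console.log(shouldPersistBefore(cleanYield), shouldPersistCompletedBootstrapTurn(cleanYield)); // false true
console.log(shouldPersistCompletedBootstrapTurn({ ...cleanYield, compactionOccurred: true })); // false
```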

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #64346
  • Related #64347
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: marker persistence was blocked by yieldAborted, even when the abort was an intentional, clean sessions_yield handoff.
  • Missing detection / guardrail: no explicit unit guard around marker persistence semantics for clean yield exits.
  • Contributing context (if known): continuation-skip relies on completed-marker presence in session history.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/agents/pi-embedded-runner/run/attempt.spawn-workspace.bootstrap-marker.test.ts
  • Scenario the test should lock in: marker persists for clean sessions_yield exits and remains blocked for prompt/abort/compaction failure modes.
  • Why this is the smallest reliable guardrail: the regression is in the marker gate decision itself.
  • Existing test that already covers this (if any): none.
  • If no new test is added, why not: N/A

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation: N/A

Human Verification (required)

  • Verified scenarios:
    • pnpm test src/agents/pi-embedded-runner/run/attempt.spawn-workspace.bootstrap-marker.test.ts src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-injection.test.ts
    • New marker-gate assertions pass.
    • Existing context-injection tests still pass.
  • Edge cases checked:
    • prompt error
    • aborted run
    • compaction timeout
    • compaction occurred during attempt
  • What you did not verify:
    • live end-to-end relay flow against a real provider session.

Risks and Mitigations

  • Risk:
    • Marker could persist on a yield exit that is not continuation-safe.
    • Mitigation:
      • Existing blocks are retained for prompt errors, hard aborts, compaction timeout, and compaction-occurred attempts.
      • Added focused unit coverage for the gate contract.

Changed files

  • src/agents/pi-embedded-runner/run/attempt.spawn-workspace.bootstrap-marker.test.ts (added, +72/-0)
  • src/agents/pi-embedded-runner/run/attempt.thread-helpers.ts (modified, +16/-0)
  • src/agents/pi-embedded-runner/run/attempt.ts (modified, +8/-11)

Subagent relay turns pay full cold-cache bootstrap re-injection tax

OpenClaw version: 2026.4.8 (9ece252)
Severity: Cost / performance (per-delegation tax, ~$0.10–0.15 on Sonnet)
Impact: Every sessions_spawn → subagent → relay cycle rewrites the entire cached prefix on the parent agent, regardless of whether the underlying workspace bootstrap changed.

Symptom

When a parent agent (e.g. main/Reeves) delegates to a subagent via sessions_spawn, yields its turn, and later receives the subagent's task-completion event, the follow-up relay turn shows:

cacheRead:  0
cacheWrite: ~25,000–40,000 tokens

The cache prefix has been fully rewritten, even when:

  • The relay follows the delegation by only ~30–60 seconds
  • Anthropic's ephemeral cache (5-min default, 1-hour with cacheRetention: "long") has not expired
  • The workspace bootstrap files on disk have not changed

A working example of the cost profile:

Turn | What | cacheRead | cacheWrite | Cost (Sonnet 4.5/4.6)
Parent 1 | Receive user msg, call sessions_spawn, yield | 39,137 | 490 | $0.016
Parent 2 | Receive task_completion event, relay to user | 0 | 37,145 | $0.144

The second turn's cost is dominated by the cold cache rewrite of ~37K tokens of bootstrap + history that did not change between the two turns.
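The cache columns of the table can be priced directly. This sketch assumes Anthropic's published Claude Sonnet 4.5 rates of $0.30 per million tokens for cache reads and $3.75 per million for 5-minute cache writes; the rates are an assumption here and should be checked against current pricing. The table's cost column also covers uncached input and output tokens that are not listed, so these cache-only figures come out slightly lower.

```javascript
// Assumed per-million-token rates (Sonnet 4.5 list prices, 5-minute cache TTL).
const RATES_PER_MTOK = { cacheRead: 0.3, cacheWrite5m: 3.75 };

// Cost in USD of the cache traffic for one turn.
function cacheCostUsd({ cacheRead, cacheWrite }) {
  return (cacheRead * RATES_PER_MTOK.cacheRead + cacheWrite * RATES_PER_MTOK.cacheWrite5m) / 1e6;
}

const parent1 = cacheCostUsd({ cacheRead: 39137, cacheWrite: 490 }); // spawn + yield turn
const parent2 = cacheCostUsd({ cacheRead: 0, cacheWrite: 37145 }); // relay turn

console.log(parent1.toFixed(4), parent2.toFixed(4)); // 0.0136 0.1393
```

Under these rates, the cache rewrite alone accounts for roughly $0.139 of the $0.144 relay turn.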

Partial workaround (what I've tried)

Setting agents.defaults.contextInjection: "continuation-skip" does reduce the emission frequency of openclaw:bootstrap-context:full — it no longer fires on every CLI call within the same session — but the relay turn itself still pays the cache write because the subagent completion message is treated as new content requiring the full bootstrap prefix, not as a continuation.

Tested result with continuation-skip enabled:

  • Inter-call bootstrap re-emission: eliminated ✅
  • Relay turn cache rewrite: still ~23K tokens cacheWrite, cacheRead=0

Proposed fix

Introduce a dedicated bootstrapContextRunKind: "relay" (or "subagent_return") value and exempt it from full bootstrap re-injection the same way "heartbeat" is exempted:

// dist/pi-embedded-CNTNdlGw.js:31273
const isContinuationTurn = 
    (resolveContextInjectionMode(params.config) === "continuation-skip" 
     || params.bootstrapContextRunKind === "relay")
    && params.bootstrapContextRunKind !== "heartbeat"
    && await hasCompletedBootstrapTurn(params.sessionFile);

// dist/pi-embedded-CNTNdlGw.js:31274
const shouldRecordCompletedBootstrapTurn = !isContinuationTurn
    && params.bootstrapContextMode !== "lightweight"
    && params.bootstrapContextRunKind !== "heartbeat"
    && params.bootstrapContextRunKind !== "relay";

And the server-side subagent completion handler (likely in dist/server.impl-WjqjRArz.js) should set bootstrapContextRunKind: "relay" when invoking the embedded Pi runner to process a task_completion event targeted at a parent session.
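A sketch of how the handler side and the proposed conditions above might fit together. selectBootstrapRunKind and the event shape ({ type, targetsParentSession }) are hypothetical, not OpenClaw's API; proposedBootstrapGate reproduces the proposed conditions with the two helper calls replaced by plain values, as placeholders.

```javascript
// Hypothetical: how a completion handler might pick the run kind.
// The event shape { type, targetsParentSession } is an assumption.
function selectBootstrapRunKind(event) {
  if (event.type === "heartbeat") return "heartbeat";
  if (event.type === "task_completion" && event.targetsParentSession) return "relay";
  return "default";
}

// The proposed conditions, with resolveContextInjectionMode(...) and
// hasCompletedBootstrapTurn(...) replaced by plain values.
function proposedBootstrapGate(params) {
  const isContinuationTurn =
    (params.contextInjectionMode === "continuation-skip" ||
      params.bootstrapContextRunKind === "relay") &&
    params.bootstrapContextRunKind !== "heartbeat" &&
    params.hasCompletedBootstrapTurn;

  const shouldRecordCompletedBootstrapTurn =
    !isContinuationTurn &&
    params.bootstrapContextMode !== "lightweight" &&
    params.bootstrapContextRunKind !== "heartbeat" &&
    params.bootstrapContextRunKind !== "relay";

  return { isContinuationTurn, shouldRecordCompletedBootstrapTurn };
}

const relayKind = selectBootstrapRunKind({ type: "task_completion", targetsParentSession: true });
const relayGate = proposedBootstrapGate({
  contextInjectionMode: "unset", // placeholder: the config has not opted in
  bootstrapContextRunKind: relayKind,
  bootstrapContextMode: "full", // placeholder: anything but "lightweight"
  hasCompletedBootstrapTurn: true,
});
console.log(relayKind, relayGate.isContinuationTurn, relayGate.shouldRecordCompletedBootstrapTurn); // relay true false
```

In this model the relay kind only skips re-injection when hasCompletedBootstrapTurn finds a marker. If the marker was never persisted after the clean sessions_yield (the root cause described in PR #64468), a relay turn would still not be treated as a continuation.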

Estimated saving

Per delegation, on Sonnet 4.6:

  • Current: ~$0.14 structural tax per relay turn (37K token cache rewrite)
  • After fix: ~$0.01–0.02 (cache hit on unchanged prefix + small cache extension for the relay content)
  • Savings: ~$0.12 per delegation, or ~87% of the relay turn cost
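The post-fix estimate can be sanity-checked by pricing the same 37,145-token prefix as a cache read instead of a cache write. The rates are the same assumption as before (Sonnet 4.5 list prices, 5-minute TTL), and the small extra write for the new relay content is ignored.

```javascript
const CACHE_READ_PER_MTOK = 0.3; // assumed cache-hit rate
const CACHE_WRITE_PER_MTOK = 3.75; // assumed 5-minute cache-write rate

const prefixTokens = 37145; // relay-turn cacheWrite from the cost table

const coldCost = (prefixTokens * CACHE_WRITE_PER_MTOK) / 1e6; // today: prefix rewritten
const warmCost = (prefixTokens * CACHE_READ_PER_MTOK) / 1e6; // after fix: prefix read from cache
const savedFraction = 1 - warmCost / coldCost;

console.log(coldCost.toFixed(4), warmCost.toFixed(4), savedFraction.toFixed(2)); // 0.1393 0.0111 0.92
```

The warm prefix cost lands inside the ~$0.01–0.02 estimate. The prefix portion alone drops by about 92%; the ~87% figure above refers to the whole relay turn, which also pays for uncached input and output.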

For any fleet doing meaningful parent→subagent orchestration (Chief of Staff → CTO → technician patterns, model comparison protocols, etc.), this compounds fast.

Related

  • Subagent session keys are hardcoded to crypto.randomUUID() (dist/pi-embedded-CNTNdlGw.js:13901), which means every delegation also pays a cold-start bootstrap on the subagent side — see separate issue 02-subagent-session-reuse.md.
  • Together these two issues make parent → subagent delegation ~10× more expensive than it structurally needs to be.

Extent analysis

TL;DR

Introduce a dedicated bootstrapContextRunKind: "relay" value to exempt relay turns from full bootstrap re-injection, reducing the cache rewrite cost.

Guidance

  • Identify the server-side subagent completion handler (likely in dist/server.impl-WjqjRArz.js) and modify it to set bootstrapContextRunKind: "relay" when invoking the embedded Pi runner to process a task_completion event targeted at a parent session.
  • Update the isContinuationTurn and shouldRecordCompletedBootstrapTurn logic in dist/pi-embedded-CNTNdlGw.js to account for the new bootstrapContextRunKind: "relay" value.
  • Verify the fix by checking the cache read and write metrics in the session trace (/path/to/agent/main/sessions/<id>.jsonl) and confirming that the openclaw:bootstrap-context:full event is no longer fired on relay turns.
  • Monitor the cost savings by comparing the relay turn costs before and after the fix, expecting a reduction of ~$0.12 per delegation.

Notes

The proposed fix assumes that the bootstrapContextRunKind value is correctly set and propagated throughout the system. Additional debugging may be necessary to ensure that the fix is correctly implemented and functioning as expected.

Recommendation

Apply the proposed fix by introducing the bootstrapContextRunKind: "relay" value and updating the relevant logic; it is expected to reduce the relay-turn cache rewrite and provide significant cost savings.
