openclaw - ✅(Solved) Fix sessions_spawn returns modelApplied:true while actually running a stale resumed model [1 pull requests, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#63221Fetched 2026-04-09 07:56:44
View on GitHub
Comments
0
Participants
1
Timeline
1
Reactions
0
Participants
Timeline (top)
cross-referenced ×1

When spawning a subagent with an explicit model argument, the gateway appears to resume an existing session entry for that agent (matching by agentId) instead of creating a fresh session. The resumed session can have a stale model cached in its persisted state — meaning the subagent runs on the wrong model while the spawn response cheerfully reports modelApplied: true.

Two distinct problems:

  1. Routing bug: Explicit model argument is not honored when a stale session entry exists.
  2. Reporting bug: modelApplied: true is set based on whether the gateway intended to apply a model, not on whether the actual inference used the requested one.

Root Cause

When spawning a subagent with an explicit model argument, the gateway appears to resume an existing session entry for that agent (matching by agentId) instead of creating a fresh session. The resumed session can have a stale model cached in its persisted state — meaning the subagent runs on the wrong model while the spawn response cheerfully reports modelApplied: true.

Two distinct problems:

  1. Routing bug: Explicit model argument is not honored when a stale session entry exists.
  2. Reporting bug: modelApplied: true is set based on whether the gateway intended to apply a model, not on whether the actual inference used the requested one.

Fix Action

Fix / Workaround

We had to introduce a mandatory hard rule: every subagent spawn must be verified post-completion via subagents list before its output can be trusted. This is a workaround, not a fix.

Workaround we are using

  1. Mandatory subagents list verification after every spawn
  2. A 5-minute cron that scans sessions.json files and clears stale authProfileOverride / providerOverride / modelOverride from auto-source entries older than 10 min (reduces — but does not eliminate — the chance of resuming a sticky-stale session)

PR fix notes

PR #64508: fix(agents): persist subagent spawn model to override fields

Description (problem / solution / changelog)

Summary

  • Problem: When spawning a subagent with a model override (via sessions_spawn or subagents.model config), the child session runs on the default model instead of the requested model, despite modelApplied: true in the spawn response.
  • Why it matters: Subagent model routing is broken. Users are billed for expensive models (Opus) when they configured cheaper ones (Sonnet/Gemma). The modelApplied: true response is misleading.
  • What changed: persistInitialChildSessionRuntimeModel() now writes to modelOverride/providerOverride/modelOverrideSource instead of model/modelProvider. Secondary fix: live-switch path uses agent-resolved defaults instead of global constants.
  • What did NOT change (scope boundary): agentCommand startup logic is unchanged - it already reads override fields correctly.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #48271
  • Closes #43768
  • Related #57306, #62755, #62562, #63221
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: persistInitialChildSessionRuntimeModel() writes to runtime fields (model, modelProvider), but agentCommandInternal() reads from override fields (modelOverride, providerOverride) for startup model selection.
  • Missing detection / guardrail: Tests expected runtime field persistence, which locked in the buggy behavior.
  • Contributing context: The distinction between runtime fields ("what actually ran") and override fields ("what should run") was not enforced at the spawn boundary.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/agents/subagent-spawn.model-override.test.ts (new), updated assertions in subagent-spawn.model-session.test.ts
  • Scenario the test should lock in: Spawn persistence writes override fields, and first-run model selection reads those override fields.
  • Why this is the smallest reliable guardrail: The bug is a field-contract mismatch between spawn and startup; unit tests at both boundaries are the minimal coverage.

AI disclosure

  • AI-assisted (Claude Opus 4.5 via Claude Code)
  • Tested (build + check + unit tests pass)
  • I understand what the code does

Co-Authored-By: Claude [email protected]

Changed files

  • src/agents/pi-embedded-runner/run.ts (modified, +11/-3)
  • src/agents/subagent-spawn.model-override.test.ts (added, +157/-0)
  • src/agents/subagent-spawn.model-session.test.ts (modified, +5/-4)
  • src/agents/subagent-spawn.test-helpers.ts (modified, +15/-3)
  • src/agents/subagent-spawn.test.ts (modified, +3/-2)
  • src/agents/subagent-spawn.ts (modified, +3/-2)

Code Example

{
  "agent:systems_engineer:...": {
    "model": "glm-5.1",
    "modelProvider": "zai",
    "updatedAt": 1775500000000
  }
}

---

// MCP / RPC call
sessions_spawn({
  "agentId": "systems_engineer",
  "model": "openai-codex/gpt-5.4",
  "task": "..."
})

---

{
  "sessionId": "...",
  "modelApplied": true
}

---

model: glm-5.1NOT the requested gpt-5.4
provider: zai
RAW_BUFFERClick to expand / collapse

Upstream Bug 2 — sessions_spawn returns modelApplied: true while actually running a stale resumed model

Repo: github.com/openclaw/openclaw Suggested labels: bug, gateway, sessions, correctness OpenClaw version: 2026.4.5 Severity: High — silent model substitution with no visible signal to the orchestrator


Title

sessions_spawn with explicit model argument silently resumes a stale session and runs on the cached model, but the spawn response still reports modelApplied: true

Summary

When spawning a subagent with an explicit model argument, the gateway appears to resume an existing session entry for that agent (matching by agentId) instead of creating a fresh session. The resumed session can have a stale model cached in its persisted state — meaning the subagent runs on the wrong model while the spawn response cheerfully reports modelApplied: true.

Two distinct problems:

  1. Routing bug: Explicit model argument is not honored when a stale session entry exists.
  2. Reporting bug: modelApplied: true is set based on whether the gateway intended to apply a model, not on whether the actual inference used the requested one.

Environment

Reproduction

Setup

A session entry already exists in the target agent's sessions.json with a stale model cached (this happens naturally after 24h of operation, especially after a fallback cascade incident).

Example stale entry in ~/.openclaw/agents/systems_engineer/sessions/sessions.json:

{
  "agent:systems_engineer:...": {
    "model": "glm-5.1",
    "modelProvider": "zai",
    "updatedAt": 1775500000000
  }
}

Trigger

// MCP / RPC call
sessions_spawn({
  "agentId": "systems_engineer",
  "model": "openai-codex/gpt-5.4",
  "task": "..."
})

Spawn response (the lie)

{
  "sessionId": "...",
  "modelApplied": true
}

Actual behavior (the truth)

Run subagents list ~30 seconds later. The completed entry shows:

model: glm-5.1     ← NOT the requested gpt-5.4
provider: zai

The subagent ran for ~3 minutes on the wrong (weaker) model, with no warning to the orchestrator.

Expected behavior

Either:

  • (A) sessions_spawn with an explicit model argument should always create a fresh session with that model, never resuming a stale one. Or:
  • (B) If session-resume is intentional for context-continuity reasons, then:
    • The spawn response must include actualModel: "<provider>/<model>" reflecting the resumed session's effective model
    • modelApplied must be false if actualModel !== requestedModel
    • There should be a forceFreshSession: true parameter callers can pass to opt out of resume

Actual behavior

  • Stale session is silently resumed
  • modelApplied: true is returned regardless
  • The orchestrator has no programmatic way to detect the mismatch — it has to manually re-query subagents list after every spawn and compare

Impact

This bug compounded a 24-hour fallback cascade incident on 2026-04-06/07. The orchestrator spawned a Systems Engineer recon subagent with explicit model: openai-codex/gpt-5.4, received modelApplied: true, and only discovered post-completion (via subagents list) that the recon had actually run on zai/glm-5.1 — degrading the analysis quality silently.

We had to introduce a mandatory hard rule: every subagent spawn must be verified post-completion via subagents list before its output can be trusted. This is a workaround, not a fix.

Open questions

  1. Is session-resume on sessions_spawn intentional? If yes, please document when it triggers (matching key? matching agentId? both?).
  2. What is the contract of modelApplied? Is it "the requested model was applied" or "we applied some model"? The current behavior makes it useless as a correctness signal.
  3. Is there an existing way to force a fresh session on spawn? We could not find one in the CLI / RPC docs.

Suggested fix

  • Add actualModel field to sessions_spawn response (always populated)
  • Make modelApplied reflect effective application, not intended
  • Add forceFreshSession: boolean parameter to sessions_spawn (default behavior TBD by maintainers)
  • Document the resume-vs-fresh decision logic clearly

Workaround we are using

  1. Mandatory subagents list verification after every spawn
  2. A 5-minute cron that scans sessions.json files and clears stale authProfileOverride / providerOverride / modelOverride from auto-source entries older than 10 min (reduces — but does not eliminate — the chance of resuming a sticky-stale session)

Related

  • Sister issue: pi-agent-core lifecycle race (Agent listener invoked outside active run) — filed separately.
  • Full incident report available: ~/.openclaw/workspace/output/post-restart-fallback-cascade-incident-report.md

extent analysis

TL;DR

The most likely fix involves modifying the sessions_spawn response to include an actualModel field and updating the modelApplied logic to reflect the effective model application.

Guidance

  • Review the sessions_spawn implementation to determine why it resumes a stale session instead of creating a fresh one when an explicit model argument is provided.
  • Consider adding a forceFreshSession parameter to sessions_spawn to allow callers to opt out of session resumption.
  • Update the modelApplied logic to reflect the actual model used, rather than the intended one.
  • Document the session resumption decision logic and the contract of modelApplied to avoid similar issues in the future.

Example

No code snippet is provided, as the issue does not contain sufficient information to create a specific example.

Notes

The provided workaround, which involves verifying the subagents list after every spawn, can help mitigate the issue but may not be scalable or efficient. A more permanent fix is needed to address the root cause of the problem.

Recommendation

Apply the suggested fix, which includes adding an actualModel field to the sessions_spawn response, updating the modelApplied logic, and adding a forceFreshSession parameter. This will provide a more accurate and reliable way to determine the effective model used by the subagent.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

Either:

  • (A) sessions_spawn with an explicit model argument should always create a fresh session with that model, never resuming a stale one. Or:
  • (B) If session-resume is intentional for context-continuity reasons, then:
    • The spawn response must include actualModel: "<provider>/<model>" reflecting the resumed session's effective model
    • modelApplied must be false if actualModel !== requestedModel
    • There should be a forceFreshSession: true parameter callers can pass to opt out of resume

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING