openclaw - ✅(Solved) Fix [Bug]: Subagent loses original task when model fallback triggers, receives "Continue where you left off" instead [4 pull requests, 2 comments, 3 participants]

GitHub stats
openclaw/openclaw#55581 (fetched 2026-04-08 01:37:40)
Comments: 2 · Participants: 3 · Timeline events: 11 · Reactions: 0
Timeline (top): cross-referenced ×4, commented ×2, labeled ×2, closed ×1

When a subagent's default model call fails (e.g., missing API key), the model fallback mechanism correctly switches to the fallback model. However, the original task message is replaced with a recovery message "Continue where you left off. The previous model attempt failed or timed out.", causing the subagent to lose its original task entirely.

Error Message

[ERROR] lane task error: lane=subagent durationMs=2 error="FailoverError: No API key found for provider "zai". Auth store: /Users/cangmang/.openclaw/agents/main/agent/auth-profiles.json"

[WARN] model_fallback_decision: candidate_failed, reason="auth", status=401, requestedProvider="zai", requestedModel="glm-5-turbo", nextCandidateProvider="deepseek", nextCandidateModel="deepseek-chat"

[ERROR] lane task error: lane=session:agent:main:subagent:xxx durationMs=4 error="FailoverError: No API key found for provider "zai"."

Root Cause

When the subagent's first model call fails, OpenClaw's model fallback correctly switches to the backup model. However, when reconstructing the conversation context for the fallback model, the system replaces the original task message with a recovery prompt ("Continue where you left off..."). This causes the subagent to lose its original task.

Fix Action

Temporary Workaround

  1. Ensure the default model has a valid API key configured
  2. Or specify a working model explicitly when spawning subagents: `model: "deepseek/deepseek-chat"`

PR fix notes

PR #55632: fix(agents): preserve original task prompt on model fallback for new sessions

Description (problem / solution / changelog)

Summary

Fixes #55581 — Preserves the original task prompt during model fallback for new sessions (like subagent spawns) instead of replacing it with a generic recovery message.

Root Cause

  1. When a model fallback occurs, resolveFallbackRetryPrompt replaces the original task prompt with "Continue where you left off. The previous model attempt failed or timed out." to prevent duplicate user messages in the session history.
  2. However, for new sessions (such as those created by sessions_spawn for subagents), the SessionManager only flushes the initial user message to the transcript file after the first assistant response arrives.
  3. If the primary model fails early (e.g., due to an authentication error or prompt rejection), the original user message is never persisted to the session history.
  4. Consequently, the fallback model receives only the generic recovery message without any prior context, causing the subagent to lose its original task entirely.
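
The failure sequence in steps 2 to 4 can be modeled in a few lines. This is a toy sketch, not OpenClaw's SessionManager. The class `ToySessionManager` and its methods are hypothetical. They only mimic the rule described above: the initial user message reaches the transcript once the first assistant response arrives.

```typescript
// Toy model of the deferred flush described in steps 2-4 (all names hypothetical).
type Turn = { role: "user" | "assistant"; content: string };

class ToySessionManager {
  readonly transcript: Turn[] = [];
  private pendingUser: string | null = null;

  // The initial user message is buffered instead of being written immediately.
  beginTurn(userMessage: string): void {
    this.pendingUser = userMessage;
  }

  // Only a successful assistant reply flushes the buffered user message.
  recordAssistant(reply: string): void {
    if (this.pendingUser !== null) {
      this.transcript.push({ role: "user", content: this.pendingUser });
      this.pendingUser = null;
    }
    this.transcript.push({ role: "assistant", content: reply });
  }

  // An attempt that fails before replying leaves the buffered message unpersisted.
  abortTurn(): void {
    this.pendingUser = null;
  }
}

const RECOVERY = "Continue where you left off. The previous model attempt failed or timed out.";

const session = new ToySessionManager();
session.beginTurn("Send the message 'Hello'");
session.abortTurn(); // primary model fails (e.g. 401) before any reply

// The retry sends only the recovery prompt into an empty transcript,
// so nothing tells the fallback model what the task was.
session.beginTurn(RECOVERY);
session.recordAssistant("(fallback model reply)");

console.log(session.transcript.map((t) => `${t.role}: ${t.content}`));
```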

Changes

src/agents/command/attempt-execution.ts

  • Added a sessionHasHistory parameter to resolveFallbackRetryPrompt and runAgentAttempt.
  • Updated resolveFallbackRetryPrompt to return the original body instead of the recovery message when sessionHasHistory is false, ensuring the fallback model receives the actual task.

src/agents/agent-command.ts

  • Passed !isNewSession as the sessionHasHistory argument when calling runAgentAttempt within the fallback loop.

src/agents/command/attempt-execution.test.ts

  • Added unit tests for resolveFallbackRetryPrompt to verify prompt preservation logic for both new and existing sessions across initial attempts and fallback retries.
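
A minimal sketch of the selection logic described in the Changes list. The parameter `sessionHasHistory` and the `!isNewSession` argument come from the PR text. The exact signature of OpenClaw's `resolveFallbackRetryPrompt` is an assumption. `isFallbackRetry` is a hypothetical flag standing in for however the real code distinguishes the first attempt from a retry.

```typescript
// Sketch of the PR #55632 logic (signature assumed, not copied from the PR).
const FALLBACK_RETRY_PROMPT =
  "Continue where you left off. The previous model attempt failed or timed out.";

interface RetryPromptParams {
  body: string; // the original task prompt
  isFallbackRetry: boolean; // false on the first attempt (hypothetical flag)
  sessionHasHistory: boolean; // the PR passes !isNewSession here
}

function resolveFallbackRetryPrompt(p: RetryPromptParams): string {
  // First attempt: always send the real task.
  if (!p.isFallbackRetry) return p.body;
  // Retry on a new session: the transcript is still empty,
  // so the fallback model needs the original body.
  if (!p.sessionHasHistory) return p.body;
  // Retry on an existing session: avoid a duplicate user message.
  return FALLBACK_RETRY_PROMPT;
}

// Usage inside a fallback loop, mirroring `!isNewSession` in agent-command.ts.
const isNewSession = true;
const prompt = resolveFallbackRetryPrompt({
  body: "Send the message 'Hello'",
  isFallbackRetry: true,
  sessionHasHistory: !isNewSession,
});
console.log(prompt);
```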

Test Results

  • Unit tests added and passing for resolveFallbackRetryPrompt.
  • Verified that fallback retries on new sessions correctly preserve the original prompt.

Changed files

  • src/agents/agent-command.ts (modified, +3/-1)
  • src/agents/command/attempt-execution.test.ts (added, +159/-0)
  • src/agents/command/attempt-execution.ts (modified, +73/-0)

PR #55660: fix(agents): preserve subagent task across model fallback retry

Description (problem / solution / changelog)

Summary

  • Problem: subagent runs could lose the original task when the first model attempt failed and the fallback retried with the generic prompt "Continue where you left off...".
  • Why it matters: first-turn subagent fallbacks could waste tokens on a meaningless recovery prompt instead of executing the requested task.
  • What changed: fallback retry prompt selection now checks whether the current run already persisted the same user turn; if not, it reuses the original task body. Added a focused regression test for both branches.
  • What did NOT change (scope boundary): no fallback candidate selection, auth failover policy, or broader subagent orchestration behavior changed.
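
PR #55660 uses a different signal than #55632. Instead of asking whether the session is new, it checks whether the current run already persisted the same user turn. The sketch below makes two assumptions, neither taken from the PR's actual code. It assumes the transcript is a list of `{ role, content }` entries. It treats "already persisted" as "the most recent user entry equals the original body."

```typescript
// Sketch of the PR #55660 approach (transcript shape and matching rule are assumed).
type TranscriptEntry = { role: "user" | "assistant" | "system"; content: string };

const RESUME_PROMPT =
  "Continue where you left off. The previous model attempt failed or timed out.";

// True when the most recent user entry is the turn this run is retrying.
function hasPersistedUserTurn(transcript: readonly TranscriptEntry[], body: string): boolean {
  for (let i = transcript.length - 1; i >= 0; i--) {
    if (transcript[i].role === "user") return transcript[i].content === body;
  }
  return false;
}

function selectRetryPrompt(transcript: readonly TranscriptEntry[], body: string): string {
  // Resume only if the task is already in the transcript; otherwise resend it.
  return hasPersistedUserTurn(transcript, body) ? RESUME_PROMPT : body;
}

const task = "Send the message 'Hello'";
console.log(selectRetryPrompt([], task)); // first-turn failure: empty transcript
console.log(selectRetryPrompt([{ role: "user", content: task }], task)); // turn already persisted
```

In this sketch, the check looks at the transcript itself rather than a session-level flag. So it also resends the task for an existing session whose newest turn was never written. The trade-off is that it depends on how turns are matched.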

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #55581
  • Related #
  • This PR fixes a bug or regression

Root Cause / Regression History (if applicable)

  • Root cause: fallback retry prompt resolution unconditionally replaced the original body with a generic resume prompt on every retry, even when the failed attempt had not yet persisted the user turn.
  • Missing detection / guardrail: there was no check for whether the current run had already written the in-flight user message into the session transcript.
  • Prior context (git blame, prior PR, issue, or refactor if known): issue report in #55581; local fix was derived from the current fallback retry path rather than a specific prior refactor.
  • Why this regressed now: the retry path assumed an existing persisted turn and treated all retries like “resume from interrupted conversation”, which is wrong for first-turn subagent failures.
  • If unknown, what was ruled out: ruled out model fallback candidate selection itself; the failure was in prompt replacement before the retried run.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/agents/command/attempt-execution.test.ts
  • Scenario the test should lock in: fallback retry keeps the original task when the current run has not yet persisted it, and uses the resume prompt only when the same user turn is already present in the transcript.
  • Why this is the smallest reliable guardrail: the bug is isolated to prompt selection at the retry boundary and does not require a full model-fallback integration harness to reproduce.
  • Existing test that already covers this (if any): none found.
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

  • Subagents now keep the original task on fallback retry when the first failed attempt did not persist the user turn yet.
  • Resume-style retry text is still used for retries that genuinely continue an already-persisted turn.

Diagram (if applicable)

Before:
[user spawns subagent] -> [first model attempt fails before turn is persisted] -> [retry uses generic resume prompt] -> [original task lost]

After:
[user spawns subagent] -> [first model attempt fails before turn is persisted] -> [retry reuses original task] -> [subagent executes intended request]

## Changed files

- `src/agents/agent-command.ts` (modified, +1/-0)
- `src/agents/command/attempt-execution.test.ts` (added, +91/-0)
- `src/agents/command/attempt-execution.ts` (modified, +39/-1)


---

# PR #55670: fix: discord OOM, OpenAI tool adjacency, subagent fallback, auth cache

- Repository: openclaw/openclaw
- Author: ayushozha
- State: open | merged: False
- Link: https://github.com/openclaw/openclaw/pull/55670

## Description (problem / solution / changelog)

## Summary

Fixes four independent high-impact bugs:

- **#55606** — Discord health-monitor triggers excessive stale-socket reconnects on quiet servers, leaking memory until OOM. Removed blanket `lastEventAt` inflation from debug messages and added a quiet-server guard using `lastConnectedAt` so connected channels with no real events aren't treated as stale.

- **#55544** — Session reset prompt (`/new`, `/reset`) breaks tool_call adjacency for custom OpenAI-compatible providers (HTTP 400). Enabled `validateAnthropicTurns` for all `openai-completions` providers so orphaned tool_calls from prior sessions are stripped before sending.

- **#55581** — Subagent loses its original task when model fallback triggers, receiving a generic "Continue where you left off" recovery message instead. Now preserves the original body when `isNewSession` is true since the transcript is empty and the fallback model needs the actual task.

- **#55562** — Gateway in-memory auth cache overwrites per-agent API keys on disk during usage/cooldown updates. Added runtime cache write-through after locked saves so subsequent reads via `ensureAuthProfileStore()` return fresh data.
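
The #55562 fix is an instance of write-through caching. After the locked save to disk succeeds, the same data is written into the in-memory runtime cache, so the next read does not return a stale snapshot. The sketch below makes these assumptions:

- A `Map` stands in for the real file and lock.
- `AuthProfileCache`, `read` and `saveLocked` are hypothetical names.

Only `ensureAuthProfileStore()` is named in the PR.

```typescript
// Write-through cache sketch. A Map stands in for auth-profiles.json on disk;
// AuthProfileCache, read() and saveLocked() are hypothetical names.
type AuthProfiles = Record<string, { apiKey: string; cooldownUntil?: number }>;

class AuthProfileCache {
  private cache: AuthProfiles | null = null;
  private readonly disk: Map<string, string>;
  private readonly file: string;

  constructor(disk: Map<string, string>, file: string) {
    this.disk = disk;
    this.file = file;
  }

  // Read path: serve the cached snapshot, loading it from "disk" on a miss.
  read(): AuthProfiles {
    if (this.cache === null) {
      this.cache = JSON.parse(this.disk.get(this.file) ?? "{}") as AuthProfiles;
    }
    return this.cache;
  }

  // Write path: persist first, then refresh the cache with the same data.
  saveLocked(next: AuthProfiles): void {
    // A real store would hold a file lock around this write.
    const serialized = JSON.stringify(next);
    this.disk.set(this.file, serialized);
    this.cache = JSON.parse(serialized) as AuthProfiles; // write-through: cache matches disk
  }
}

// Usage: a key update followed by a cooldown update built from read().
const disk = new Map<string, string>();
const store = new AuthProfileCache(disk, "auth-profiles.json");
store.read(); // warm the cache while the file is still empty
store.saveLocked({ deepseek: { apiKey: "sk-new" } });
const current = store.read();
store.saveLocked({ ...current, deepseek: { ...current.deepseek, cooldownUntil: 123 } });
console.log(disk.get("auth-profiles.json"));
```

Suppose the last line of `saveLocked` were removed. The second `read()` would then return the empty snapshot cached earlier. The cooldown update would write a profile without `apiKey` back to disk, which is the kind of overwrite the issue describes.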

## Test plan

- [x] `pnpm vitest run src/gateway/channel-health-policy.test.ts` — 16 tests pass (2 new)
- [x] `pnpm vitest run src/agents/transcript-policy.policy.test.ts` — 4 tests pass (2 new)
- [x] `pnpm vitest run src/agents/auth-profiles.runtime-snapshot-save.test.ts` — 2 tests pass (1 new)
- [x] `pnpm tsc --noEmit` — clean

## Changed files

- `extensions/discord/src/monitor/provider.lifecycle.reconnect.ts` (modified, +4/-1)
- `src/agents/agent-command.ts` (modified, +1/-0)
- `src/agents/auth-profiles.runtime-snapshot-save.test.ts` (modified, +55/-0)
- `src/agents/auth-profiles/store.ts` (modified, +7/-0)
- `src/agents/command/attempt-execution.ts` (modified, +7/-1)
- `src/agents/pi-embedded-helpers.validate-turns.test.ts` (modified, +46/-0)
- `src/agents/pi-embedded-helpers/turns.ts` (modified, +92/-19)
- `src/agents/transcript-policy.policy.test.ts` (modified, +18/-0)
- `src/agents/transcript-policy.test.ts` (modified, +5/-2)
- `src/agents/transcript-policy.ts` (modified, +6/-1)
- `src/gateway/channel-health-policy.test.ts` (modified, +42/-0)
- `src/gateway/channel-health-policy.ts` (modified, +12/-0)


---

# PR #56471: agents: tighten ACP delegation prompt guards

- Repository: openclaw/openclaw
- Author: Mintalix
- State: closed | merged: False
- Link: https://github.com/openclaw/openclaw/pull/56471

## Description (problem / solution / changelog)

## Summary

- Problem: the ACP delegation prompt path had a few correctness and safety gaps after the structured prompt work landed for delegated target agents such as `codex` or `kimi`.
- Why it matters: policy-denied ACP turns should fail fast, delegated target agents should not receive unsafe or mis-cited memory content, and ACP sessions should cache their skills snapshot instead of rebuilding it every turn.
- What changed: moved ACP prompt construction to after ACP policy checks, persisted ACP `skillsSnapshot` back to the session entry, guarded memory-file reads with workspace boundary checks, added `memory.md` fallback, fixed memory line-number citation drift caused by trimming before line splitting, and removed redundant snippet re-slicing.
- What did NOT change (scope boundary): no changes to ACP runtime policy semantics, no changes to transcript persistence semantics, and no changes outside the ACP delegation prompt / ACP routing test surface.
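
The citation-drift bug comes from trimming before splitting. Once leading blank lines are removed, line 1 of the trimmed text is no longer line 1 of the file. Below is a minimal sketch contrasting the two orders, assuming citations are 1-based line numbers into the original file. `findCitedLine` and `findCitedLineAfterTrim` are hypothetical helpers, not the PR's functions.

```typescript
// Correct order: split the untrimmed content, so indices match the file's lines.
function findCitedLine(fileContent: string, needle: string): number | null {
  const lines = fileContent.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    // Trim each line only for matching; the index still refers to the original file.
    if (lines[i].trim().includes(needle)) return i + 1; // 1-based
  }
  return null;
}

// Buggy order: trimming first shifts every line number by the count of leading blank lines.
function findCitedLineAfterTrim(fileContent: string, needle: string): number | null {
  const lines = fileContent.trim().split(/\r?\n/);
  const idx = lines.findIndex((l) => l.includes(needle));
  return idx === -1 ? null : idx + 1;
}

const memory = "\n\n# Notes\n- prefers pnpm\n";
console.log(findCitedLine(memory, "prefers pnpm"), findCitedLineAfterTrim(memory, "prefers pnpm"));
```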

## Change Type

- [x] Bug fix
- [ ] Feature
- [x] Refactor required for the fix
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra

## Scope (select all touched areas)

- [x] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [x] Memory / storage
- [ ] Integrations
- [ ] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra

## Linked Issue/PR

- Closes #
- Related #55581
- [x] This PR fixes a bug or regression

## Root Cause / Regression History

- Root cause: ACP delegation prompt construction happened too early in the command flow, memory-file reads bypassed the workspace boundary-safe read path, root memory discovery only checked `MEMORY.md`, and line citations were computed from trimmed content rather than the original file content.
- Missing detection / guardrail: ACP routing tests covered prompt structure, but did not yet lock in prompt-build timing or ACP skills-snapshot persistence; memory handling also lacked dedicated ACP-specific guardrail coverage.
- Prior context: the earlier ACP delegation prompt change introduced richer context for delegated target agents, but left a few follow-up correctness, performance, and safety edges in the new code path.
- Why this regressed now: these issues were introduced as follow-up gaps in the new ACP delegation prompt path rather than by unrelated older code.
- If unknown, what was ruled out: the failures are not caused by transcript persistence itself and are not tied to ACP manager dispatch behavior once the turn is allowed to run.

## Regression Test Plan

- Coverage level that should have caught this:
  - [ ] Unit test
  - [x] Seam / integration test
  - [ ] End-to-end test
  - [ ] Existing coverage already sufficient
- Target test or file: `src/commands/agent.acp.test.ts`
- Scenario the test should lock in: ACP policy-denied turns must not build the delegation prompt before failing, and successful ACP turns must persist `skillsSnapshot` to the session store.
- Why this is the smallest reliable guardrail: it exercises the ACP command-routing seam end-to-end without needing a real remote delegated agent.
- Existing test that already covers this: updated ACP runtime routing assertions in `src/commands/agent.acp.test.ts`
- If no new test is added, why not: N/A

## User-visible / Behavior Changes

- ACP sessions that delegate work to target agents such as `codex` or `kimi` now avoid building the delegation prompt when ACP policy rejects the turn.
- ACP delegated prompt memory context now respects `memory.md` fallback and safer workspace-boundary reads.
- ACP sessions now persist `skillsSnapshot` after prompt construction so later turns can reuse it.

## Diagram

    Before:
    [ACP ready session] -> [build delegation prompt + read memory/bootstrap] -> [policy reject or run]

    After:
    [ACP ready session] -> [policy check] -> [build delegation prompt safely] -> [persist skills snapshot] -> [run]
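
The "After" ordering can be sketched as a single function, kept synchronous for brevity. All names here (`runAcpTurn`, `AcpDeps`, `SessionEntry`) are hypothetical, and the real ACP path is asynchronous. The sketch only shows the three points from the diagram:

- The policy check comes first.
- The skills snapshot is reused from the session entry.
- The prompt is built only for allowed turns.

```typescript
// Ordering sketch for the ACP delegation path (all names hypothetical).
interface SessionEntry {
  skillsSnapshot?: string[];
}

interface AcpDeps {
  isAllowed(sessionKey: string): boolean;
  buildSkillsSnapshot(): string[];
  buildDelegationPrompt(task: string, skills: string[]): string;
  run(prompt: string): string;
}

function runAcpTurn(deps: AcpDeps, sessionKey: string, entry: SessionEntry, task: string): string {
  // 1. Policy check first: a denied turn never reads memory or builds a prompt.
  if (!deps.isAllowed(sessionKey)) {
    throw new Error(`ACP policy denied the turn for ${sessionKey}`);
  }
  // 2. Reuse the persisted skills snapshot, building and saving it only when absent.
  const skills = entry.skillsSnapshot ?? deps.buildSkillsSnapshot();
  entry.skillsSnapshot = skills;
  // 3. Build the delegation prompt only after the turn is known to be allowed.
  return deps.run(deps.buildDelegationPrompt(task, skills));
}
```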

## Security Impact

- New permissions/capabilities? No
- Secrets/tokens handling changed? No
- New/changed network calls? No
- Command/tool execution surface changed? No
- Data access scope changed? Yes
- If any Yes, explain risk + mitigation:
  - ACP delegated prompt memory reads now explicitly use workspace boundary validation, which reduces the chance of symlink or out-of-root file content being sent to delegated target agents.
  - Mitigation: reads now go through the guarded workspace-boundary file path and skip invalid candidates instead of reading raw paths directly.
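
A boundary-checked read like the one described can be sketched with Node's `fs` and `path` modules. The approach:

- Resolve the real path, which follows symlinks.
- Reject anything that does not stay under the workspace root.
- Skip invalid candidates instead of failing.

The candidate order `MEMORY.md` then `memory.md` follows the PR text. The helper names and the skip-on-error policy are assumptions, not the PR's code. Checking and then reading is still racy if the file is swapped in between, and this sketch does not address that.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// True if `target` is the root itself or lies inside it (no ".." escape, no other drive).
function isInsideRoot(root: string, target: string): boolean {
  const rel = path.relative(root, target);
  return !path.isAbsolute(rel) && rel.split(path.sep)[0] !== "..";
}

// Read the first memory file that exists and resolves (after symlinks) inside the workspace.
function readWorkspaceMemory(workspaceDir: string): string | null {
  const root = fs.realpathSync(workspaceDir);
  for (const name of ["MEMORY.md", "memory.md"]) {
    const candidate = path.join(root, name);
    let real: string;
    try {
      real = fs.realpathSync(candidate); // follows symlinks; throws if missing
    } catch {
      continue; // missing or unreadable: skip this candidate
    }
    if (!isInsideRoot(root, real)) continue; // symlink points outside the workspace
    if (!fs.statSync(real).isFile()) continue;
    return fs.readFileSync(real, "utf8");
  }
  return null;
}
```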

## Repro + Verification

### Environment

- OS: `Darwin 25.2.0 arm64`
- Runtime/container: `Node v24.12.0`
- Model/provider: ACP test harness / mocked ACP manager
- Integration/channel: ACP session routing
- Relevant config: default ACP test fixture config

### Steps

1. Run `pnpm test -- src/commands/agent.acp.test.ts`
2. Trigger ACP routing for an allowed session and for a policy-denied ACP session in the test harness.
3. Inspect the mocked ACP manager call arguments and persisted session store data.

### Expected

- Policy-denied ACP turns fail before prompt construction.
- Successful ACP turns persist `skillsSnapshot`.
- Memory context handling uses correct fallback and safer file reads.

### Actual

- `pnpm test -- src/commands/agent.acp.test.ts` passes with the new assertions.
- `pnpm build` still fails on unchanged baseline `sourceInfo` type errors in files outside this PR.

## Evidence

- [x] Failing test/log before + passing after
- [ ] Trace/log snippets
- [ ] Screenshot/recording
- [ ] Perf numbers (if relevant)

## Human Verification

- Verified scenarios: ACP routing tests pass; policy-denied ACP turns do not build the prompt first; successful ACP turns persist `skillsSnapshot`.
- Edge cases checked: `memory.md` fallback path, memory-citation line handling after leading blank lines, and guarded memory reads in the ACP prompt path.
- What you did **not** verify: full `pnpm test`; clean `pnpm build`, because latest `origin/main` currently fails in unchanged `sourceInfo`-related files outside this PR.

## Review Conversations

- [ ] I replied to or resolved every bot review conversation I addressed in this PR.
- [ ] I left unresolved only the conversations that still need reviewer or maintainer judgment.

## Compatibility / Migration

- Backward compatible? Yes
- Config/env changes? No
- Migration needed? No
- If yes, exact upgrade steps:

## Risks and Mitigations

- Risk:
  - ACP delegation now persists `skillsSnapshot` earlier in the ACP path, so future ACP turns depend more directly on the saved snapshot shape.
  - Mitigation:
    - It reuses the existing session-entry snapshot field and persistence path already used by the non-ACP command flow.

## Changed files

- `src/agents/acp-delegation-prompt.ts` (added, +398/-0)
- `src/agents/agent-command.ts` (modified, +40/-6)
- `src/commands/agent.acp.test.ts` (modified, +82/-4)


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

When a subagent's default model call fails (e.g., missing API key), the model fallback mechanism correctly switches to the fallback model. However, the original task message is replaced with a recovery message "Continue where you left off. The previous model attempt failed or timed out.", causing the subagent to lose its original task entirely.

Steps to reproduce

  1. Configure default model as zai/glm-5-turbo without an API key
  2. Configure fallback model as deepseek/deepseek-chat (with valid API key)
  3. Create a subagent via sessions_spawn with a task, e.g., "Send the message 'Hello'" (original: "发送一句'你好'")
  4. Check the subagent's session history

Expected behavior

The subagent should receive the original task message even after model fallback. The fallback should be transparent — the fallback model should receive the same original task.

Actual behavior

The subagent's first user message in session history is:

Continue where you left off. The previous model attempt failed or timed out.

The original task is completely lost. The subagent then spends tokens trying to "recover" from a non-existent previous session.

OpenClaw version

2026.3.13 (61d171a)

Operating system

macOS Darwin 25.3.0 (arm64)

Install method

npm install -g openclaw

Model

zai/glm-5-turbo (primary), deepseek/deepseek-chat (fallback)

Provider / routing chain

zai (auth failed) → fallback → deepseek

Additional provider/model setup details

"model": {
  "primary": "zai/glm-5-turbo",
  "fallbacks": [
    "deepseek/deepseek-chat",
    "deepseek/deepseek-reasoner"
  ]
},
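
The config above and the log fields in this report together suggest how the fallback chain is formed. The logs show requestedProvider="zai", requestedModel="glm-5-turbo", nextCandidateProvider="deepseek" and nextCandidateModel="deepseek-chat". So the chain appears to be the primary followed by the fallbacks, each split into a provider and a model. Here is a minimal sketch under that assumption. `candidateOrder` and `toCandidate` are hypothetical helpers, not OpenClaw functions, and splitting on the first "/" is an assumed rule.

```typescript
// Sketch: turn the issue's "model" config into an ordered list of provider/model candidates.
interface ModelConfig {
  primary: string;
  fallbacks?: string[];
}

interface Candidate {
  provider: string;
  model: string;
}

// "deepseek/deepseek-chat" -> { provider: "deepseek", model: "deepseek-chat" }
function toCandidate(ref: string): Candidate {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`Expected "provider/model", got "${ref}"`);
  }
  return { provider: ref.slice(0, slash), model: ref.slice(slash + 1) };
}

// Primary first, then fallbacks in the configured order.
function candidateOrder(cfg: ModelConfig): Candidate[] {
  return [cfg.primary, ...(cfg.fallbacks ?? [])].map((ref) => toCandidate(ref));
}

const order = candidateOrder({
  primary: "zai/glm-5-turbo",
  fallbacks: ["deepseek/deepseek-chat", "deepseek/deepseek-reasoner"],
});
console.log(order);
```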

Logs, screenshots, and evidence

[ERROR] lane task error: lane=subagent durationMs=2
error="FailoverError: No API key found for provider "zai".
Auth store: /Users/cangmang/.openclaw/agents/main/agent/auth-profiles.json"

[WARN] model_fallback_decision:
candidate_failed, reason="auth", status=401,
requestedProvider="zai", requestedModel="glm-5-turbo",
nextCandidateProvider="deepseek", nextCandidateModel="deepseek-chat"

[ERROR] lane task error: lane=session:agent:main:subagent:xxx durationMs=4
error="FailoverError: No API key found for provider "zai"."

Impact and severity

  • Subagent loses original task, executes meaningless operations
  • Wastes tokens and execution time
  • Users may mistakenly believe the task was completed if they don't inspect subagent output

Additional information

Environment

  • OpenClaw Version: 2026.3.13 (61d171a)
  • OS: macOS Darwin 25.3.0 (arm64)
  • Node.js: v22.22.1

Root Cause Analysis

When the subagent's first model call fails, OpenClaw's model fallback correctly switches to the backup model. However, when reconstructing the conversation context for the fallback model, the system replaces the original task message with a recovery prompt ("Continue where you left off..."). This causes the subagent to lose its original task.

Suggested Fix

In the model fallback flow for subagents, preserve and pass the original task message to the fallback model instead of replacing it with a recovery message. The fallback should be transparent to the subagent.

Temporary Workaround

  1. Ensure the default model has a valid API key configured
  2. Or specify a working model explicitly when spawning subagents: `model: "deepseek/deepseek-chat"`

extent analysis

Fix Plan

To preserve the original task message during model fallback for subagents, follow these steps:

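  1. Modify the model fallback logic: In the OpenClaw codebase, locate the function that chooses the fallback retry prompt. According to the PRs above, this is resolveFallbackRetryPrompt in src/agents/command/attempt-execution.ts. Find where the recovery message is generated.
  2. Pass the original task message: Make the original task message available to that function. Also pass a signal of whether the session transcript already contains it, such as the sessionHasHistory parameter from PR #55632. This might involve modifying the function parameters or the data structure used to store conversation context.
  3. Update the conversation context: If the transcript does not yet contain the original turn, send the original task message to the fallback model instead of the recovery prompt. Keep the recovery prompt only for retries whose turn is already persisted, so the user message is not duplicated in the session history.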

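Example Code Snippet (hypothetical):

// Before (recovery message generation)
const RECOVERY_PROMPT = "Continue where you left off. The previous model attempt failed or timed out.";
const fallbackContext = { message: RECOVERY_PROMPT };

// After (preserve original task message); "Before" and "After" are alternatives, not one scope.
// getOriginalTaskMessage() is a hypothetical helper you would implement.
const originalTaskMessage = getOriginalTaskMessage();
const fallbackContext = { message: originalTaskMessage };

Additional Code Changes:

// In the model fallback function (hypothetical names, not OpenClaw's actual API)
interface FallbackModel {
  process(context: { message: string }): Promise<string>;
}

function handleModelFallback(
  originalTaskMessage: string,
  transcriptHasTask: boolean, // true if the original turn is already in session history
  fallbackModel: FallbackModel,
): Promise<string> {
  // Resend the task unless it is already persisted; otherwise the fallback model never sees it.
  const message = transcriptHasTask ? RECOVERY_PROMPT : originalTaskMessage;
  return fallbackModel.process({ message });
}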

Verification

To verify the fix, follow these steps:

  1. Configure the default model without an API key.
  2. Configure a fallback model with a valid API key.
  3. Create a subagent with a task.
  4. Check the subagent's session history to ensure the original task message is preserved.

Extra Tips

  • Ensure that the getOriginalTaskMessage() function is implemented correctly to retrieve the original task message.
  • Test the fix with different model configurations and error scenarios to ensure the fallback logic works as expected.
  • Consider adding logging or debugging statements to verify that the original task message is being passed correctly to the fallback model.
