openclaw - 💡(How to fix) Fix [Bug]: cli-runner reports bare alias as agentMeta.model; persisted to sessions.json; primary candidate not alias-resolved → cascading 404 + auth profile cooldown [2 comments, 3 participants]

GitHub stats
openclaw/openclaw#73657 · Fetched 2026-04-29 06:16:50
Comments: 2 · Participants: 3 · Timeline: 3 (commented ×2, closed ×1) · Reactions: 0

For agents using a CLI backend (e.g. claude-cli/sonnet), every successful turn poisons the persisted session entry with a bare model alias ('sonnet', 'haiku', 'opus'). On the next request the bare alias is sent literally to api.anthropic.com → HTTP 404 model_not_found. A single 404 puts the auth profile into cooldown, blocking subsequent retries on Anthropic. Replies fail or take 20–30 s as the failover walks through the chain.

The bug is the interaction of three places in the codebase:

  1. src/agents/cli-runner.ts:298 writes the bare alias to agentMeta.model:
    const modelId = (params.model ?? "default").trim() || "default";   // line 77 → bare alias
    ...
    agentMeta: { ..., model: modelId, ... }                            // line 298
  2. src/auto-reply/reply/session-usage.ts:54 (persistSessionUsageUpdate) saves that alias to sessions.json:
    model: params.modelUsed ?? entry.model
  3. src/agents/model-fallback.ts:172 adds the primary candidate to the failover list without alias resolution, so the bare alias is sent on the wire:
    addCandidate({ provider, model }, false);   // not alias-resolved
    while the fallbacks correctly go through resolveModelRefFromString (lines 188–197).

The asymmetry in (3) is the root cause; (1) is what feeds it; (2) is what makes it persistent across requests.
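
The asymmetry can be illustrated with a small self-contained model. This is only a sketch: `ModelRef`, `AliasIndex`, `resolveRef`, and `buildCandidates` are hypothetical stand-ins, not OpenClaw's actual code, and the alias table reuses the `sonnet` alias from the reproduction steps.

```typescript
// Hypothetical stand-ins; none of these names are OpenClaw's real API.
type ModelRef = { provider: string; model: string };
type AliasIndex = Map<string, ModelRef>;

const aliasIndex: AliasIndex = new Map([
  ["sonnet", { provider: "anthropic", model: "claude-sonnet-4-20250514" }],
]);

// Resolve "provider/model" or a bare name, mapping known aliases to canonical refs.
function resolveRef(raw: string, defaultProvider: string, index: AliasIndex): ModelRef {
  const slash = raw.indexOf("/");
  const provider = slash === -1 ? defaultProvider : raw.slice(0, slash);
  const model = slash === -1 ? raw : raw.slice(slash + 1);
  return index.get(model) ?? { provider, model };
}

// resolvePrimary=false mimics the reported behaviour: the primary is added verbatim,
// while every fallback string goes through alias resolution.
function buildCandidates(primary: ModelRef, fallbacks: string[], resolvePrimary: boolean): string[] {
  const first = resolvePrimary
    ? resolveRef(`${primary.provider}/${primary.model}`, primary.provider, aliasIndex)
    : primary;
  const rest = fallbacks.map((raw) => resolveRef(raw, primary.provider, aliasIndex));
  return [first, ...rest].map((ref) => `${ref.provider}/${ref.model}`);
}

// The persisted session entry carries the bare alias as its model.
const persisted: ModelRef = { provider: "anthropic", model: "sonnet" };
const buggy = buildCandidates(persisted, ["sonnet"], false);
const fixed = buildCandidates(persisted, ["sonnet"], true);
```

In this model, `buggy[0]` is the literal `anthropic/sonnet` that triggers the 404, whereas `fixed[0]` is the canonical ID because the primary takes the same resolution path as the fallbacks.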

Code word: lobster-biscuit

Error Message

error=HTTP 404 not_found_error: model: sonnet rawError=404 {"type":"error","error":{"type":"not_found_error","message":"model: sonnet"}}

Steps to reproduce

  1. Configure agents.list[main].model.primary = "claude-cli/sonnet" (the published Pattern B CLI shell-out).
  2. Have agents.defaults.models declare aliases, e.g. "anthropic/claude-sonnet-4-20250514": { "alias": "sonnet" }.
  3. Send a message → agent replies successfully.
  4. Inspect the session entry in ~/.openclaw/agents/<id>/sessions/sessions.json: model is now 'sonnet', modelProvider is 'claude-cli'.
  5. Send a follow-up on the same session.
  6. The follower path / failover path resolves provider to anthropic (canonical for claude-cli) and model sonnet literally → primary candidate becomes anthropic/sonnet → 404 → auth profile cooldown.
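
Step 4 can be checked programmatically. The sketch below assumes `sessions.json` is a flat object keyed by session ID, with `model` and `modelProvider` fields as shown in the report. The `SessionEntry` shape, the `findBareAliasEntries` helper, and the second sample key are illustrative assumptions.

```typescript
// Hypothetical shapes: only `model` and `modelProvider` are taken from the report.
type SessionEntry = { model?: string; modelProvider?: string; [key: string]: unknown };
type SessionStore = Record<string, SessionEntry>;

// Return the keys of all entries whose `model` is a known bare alias.
function findBareAliasEntries(store: SessionStore, aliases: ReadonlySet<string>): string[] {
  return Object.entries(store)
    .filter(([, entry]) => typeof entry.model === "string" && aliases.has(entry.model))
    .map(([key]) => key);
}

// Sample data: the first entry mirrors the poisoned state, the second is a healthy entry.
const sample: SessionStore = {
  "agent:main:slack:channel:CXXXXXXX": { modelProvider: "claude-cli", model: "sonnet" },
  "agent:main:slack:channel:CYYYYYYY": { modelProvider: "anthropic", model: "claude-sonnet-4-20250514" },
};

const poisoned = findBareAliasEntries(sample, new Set(["sonnet", "haiku", "opus"]));
```

Here `poisoned` contains only the first key; any non-empty result after a successful reply indicates the session has been seeded for a later 404.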

Expected

agentMeta.model should be the canonical full model ID, or the primary candidate in resolveFallbackCandidates should go through resolveModelRefFromString like the fallbacks already do. Either fix prevents the cascade.

Actual

Cascading 404s → auth profile cooldown → "Embedded agent failed before reply: All models failed" → silent reply drop.

Environment

  • OpenClaw version: 2026.4.26 (be8c246)
  • Node: v25.2.1
  • OS: macOS Darwin 25.3.0 (Apple Silicon)
  • Install method: `npm install -g openclaw`
  • Channel where observed: Slack (Socket Mode), `main` agent on `claude-cli/sonnet`

Logs / evidence

Persisted session entry after one successful Slack reply (the failure seed):

```
"agent:main:slack:channel:CXXXXXXX": {
  ...
  "modelProvider": "claude-cli",
  "model": "sonnet",      ← bare alias persisted by session-usage.ts:54
  ...
}
```

Subsequent request:

```
[agent/embedded] embedded run agent end: runId=85a1c67a... isError=true model=sonnet provider=anthropic error=HTTP 404 not_found_error: model: sonnet rawError=404 {"type":"error","error":{"type":"not_found_error","message":"model: sonnet"}}

[agent/embedded] auth profile failure state updated: runId=85a1c67a... profile=sha256:05de... provider=anthropic reason=model_not_found window=cooldown reused=false

[model-fallback/decision] decision=candidate_failed requested=anthropic/sonnet candidate=anthropic/sonnet reason=model_not_found providerErrorType=not_found_error next=anthropic/haiku detail=model: sonnet

Embedded agent failed before reply: All models failed (1): anthropic/sonnet: Provider anthropic is in cooldown (all profiles unavailable) (model_not_found)
```

Adjacent issue: an auth profile being moved to cooldown on `model_not_found` (a config bug, not an auth failure) compounds the impact — once the alias 404 fires once, the profile is locked out for the cooldown window. Worth a separate filing if not already known.
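
One way to express the distinction is to classify failure reasons before touching the profile state. The following is a minimal sketch under stated assumptions: only `model_not_found` appears in the logs above, while the other reason names and the `shouldCooldownProfile` helper are hypothetical.

```typescript
// Hypothetical failure reasons; only "model_not_found" is taken from the logs.
type FailureReason = "model_not_found" | "auth" | "rate_limit" | "billing";

// Config errors point at the request, not the credential,
// so cooling down the auth profile cannot fix them.
const CONFIG_REASONS: ReadonlySet<FailureReason> = new Set<FailureReason>(["model_not_found"]);

// Decide whether a failure should put the auth profile into cooldown.
function shouldCooldownProfile(reason: FailureReason): boolean {
  return !CONFIG_REASONS.has(reason);
}
```

With a check like this, an alias 404 would still fail the candidate but would leave the profile available for the next, correctly named model.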

Impact

High. Every successful CLI-backend reply seeds a future 404 on the same session, making Slack/iMessage replies feel unreliable. Auth profile cooldown widens the blast radius to also block legitimate same-provider retries.

Workarounds

  1. Local rewrite watchdog — a 60s LaunchAgent reads `sessions.json`, resolves bare-alias `model` fields against the alias map, atomic-rewrites to canonical IDs. Preserves session continuity. Low-cost, fully external to OpenClaw.
  2. Wipe alias entries — bare deletion works but loses Slack thread context, `cliSessionIds`, etc. Not durable (re-infects on next reply).
  3. Drop `alias` declarations from `agents.defaults.models` — untested; may break other paths that legitimately read aliases.
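
The first workaround could look roughly like the sketch below. It assumes `sessions.json` is a flat object keyed by session ID and that the alias map values are already in whatever canonical form the `model` field expects. `canonicalizeSessions` and `rewriteSessionsFile` are hypothetical names, and scheduling (the 60s LaunchAgent) is left out. The report does not say whether `modelProvider` should also be rewritten, so the sketch leaves it untouched.

```typescript
import { readFileSync, renameSync, writeFileSync } from "node:fs";

// Hypothetical shapes: only the `model` field is taken from the report.
type SessionEntry = { model?: string; [key: string]: unknown };
type SessionStore = Record<string, SessionEntry>;

// Return a copy with bare aliases replaced, plus the number of entries changed.
function canonicalizeSessions(
  store: SessionStore,
  aliasToCanonical: ReadonlyMap<string, string>,
): { store: SessionStore; changed: number } {
  let changed = 0;
  const next: SessionStore = {};
  for (const [key, entry] of Object.entries(store)) {
    const canonical = entry.model !== undefined ? aliasToCanonical.get(entry.model) : undefined;
    if (canonical !== undefined) changed += 1;
    // All other fields (thread context, CLI session IDs, ...) are preserved as-is.
    next[key] = canonical !== undefined ? { ...entry, model: canonical } : entry;
  }
  return { store: next, changed };
}

// Write to a sibling temp file, then rename over the original,
// so readers never observe a partially written file.
function rewriteSessionsFile(path: string, aliasToCanonical: ReadonlyMap<string, string>): number {
  const parsed = JSON.parse(readFileSync(path, "utf8")) as SessionStore;
  const { store, changed } = canonicalizeSessions(parsed, aliasToCanonical);
  if (changed > 0) {
    const tmp = `${path}.tmp-${process.pid}`;
    writeFileSync(tmp, JSON.stringify(store, null, 2));
    renameSync(tmp, path);
  }
  return changed;
}
```

The rename keeps each write atomic on the same filesystem, but it does not coordinate with OpenClaw's own writes to the file, so an update made between the read and the rename could be lost.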

Suggested fix

In `src/agents/model-fallback.ts:172`, replace the bare `addCandidate({ provider, model }, false)` with a call through ``resolveModelRefFromString({ raw: `${provider}/${model}`, defaultProvider, aliasIndex })``, mirroring the fallback path at lines 188–197. This makes the primary candidate symmetric with fallbacks and self-heals the persisted-alias case.

A complementary fix in `cli-runner.ts:298` to report the canonical model ID in `agentMeta.model` (using `normalizedModel` plus the alias index) would also stop the persistence side at the source — but the model-fallback fix alone is sufficient.
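
The complementary fix amounts to resolving the alias before the model ID is reported. A minimal sketch, assuming an alias index that maps alias to canonical model ID; `canonicalModelId` and this `AliasIndex` shape are hypothetical and may not match OpenClaw's `normalizedModel` handling.

```typescript
// Hypothetical alias index: alias -> canonical model ID.
type AliasIndex = ReadonlyMap<string, string>;

// Normalize the raw model the same way as the reported line 77,
// then swap a known alias for its canonical ID.
function canonicalModelId(rawModel: string | undefined, aliasIndex: AliasIndex): string {
  const trimmed = (rawModel ?? "default").trim() || "default";
  return aliasIndex.get(trimmed) ?? trimmed;
}

const cliAliasIndex: AliasIndex = new Map([["sonnet", "claude-sonnet-4-20250514"]]);

// What the runner would report instead of the bare alias.
const agentMeta = { model: canonicalModelId("sonnet", cliAliasIndex) };
```

Unknown names pass through unchanged, so models that were never aliased are reported exactly as before.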

extent analysis

TL;DR

The most likely fix is to resolve the primary candidate in model-fallback.ts through resolveModelRefFromString, as the fallbacks already are. This makes the primary symmetric with the fallbacks, so a bare alias that is already persisted in sessions.json is resolved before it is sent on the wire.

Guidance

  • Identify the root cause: The asymmetry in model-fallback.ts where the primary candidate is added without alias resolution, causing the bare alias to be sent on the wire.
  • Verify the issue: Check the sessions.json file for bare aliases in the model field and observe the 404 errors and auth profile cooldown.
  • Apply the suggested fix: Replace the addCandidate call in model-fallback.ts:172 with a call to resolveModelRefFromString to resolve the primary candidate model.
  • Consider a complementary fix: Modify cli-runner.ts:298 to report the canonical model ID in agentMeta.model to prevent the persistence of bare aliases.

Example

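// In src/agents/model-fallback.ts:172
// Sketch only: assumes resolveModelRefFromString returns { ref } or null,
// as the fallback path (lines 188–197) suggests; verify against the source.
const resolvedPrimary = resolveModelRefFromString({
  raw: `${provider}/${model}`,
  defaultProvider,
  aliasIndex,
});
// Keep the unresolved pair if resolution fails, preserving the old behaviour.
addCandidate(resolvedPrimary?.ref ?? { provider, model }, false);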

Notes

The provided fix should resolve the issue, but it's essential to test and verify the changes to ensure they don't introduce new problems. Additionally, the adjacent issue of auth profile cooldown on model_not_found should be addressed separately.

Recommendation

Apply the suggested fix in model-fallback.ts: resolve the primary candidate through resolveModelRefFromString. According to the report, this change alone is sufficient to stop the cascade, because a persisted bare alias is then resolved before the request is sent.
