openclaw: Cron sessions inherit persisted model override instead of honoring payload.model, causing thundering herd under overload

GitHub stats
openclaw/openclaw#58513 (fetched 2026-04-08 02:01:30)
Comments: 0
Participants: 1
Timeline events: 2 (closed ×1, locked ×1)
Reactions: 0

Summary

Cron jobs that specify model: openai/gpt-5.4-nano in their payload are not using that model when they execute in isolated sessions. Instead, resolvePersistedLiveSelection() forces the session back to anthropic/claude-sonnet-4-6, overriding the cron's declared model intent. Under Anthropic overload conditions, this causes every cron to amplify the overload instead of gracefully degrading.

Root Cause

The scheduler fires multiple crons simultaneously (observed: 24 UUIDs in one burst). Each cron session goes through resolvePersistedLiveSelection(), which promotes the persisted Sonnet selection over payload.model. This means:

  1. Crons configured for nano/codex still hit Anthropic Sonnet
  2. A burst of 24 simultaneous crons = 24 simultaneous Sonnet requests under overload
  3. Each gets a 503, retries, and amplifies the cascade
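To make the amplification concrete, the arithmetic can be sketched as follows. The burst size of 24 comes from the issue; the retry count is an assumption for illustration only, since the issue does not say how many times each cron retries after a 503. upstreamRequestCount is a hypothetical helper, not an openclaw function.

```typescript
// Hypothetical helper: counts the requests one cron burst sends to a single provider.
// burstSize is taken from the issue (24 crons observed); retriesPerCron is an assumed value.
function upstreamRequestCount(burstSize: number, retriesPerCron: number): number {
  // Each cron sends one initial request plus one request per retry.
  return burstSize * (1 + retriesPerCron);
}

const observedBurst = 24;
const assumedRetries = 3; // assumption, not taken from the issue

console.log(upstreamRequestCount(observedBurst, assumedRetries)); // 96
```

Under that assumption, a provider that is already overloaded receives 96 requests from a single tick, none of which were meant for it in the first place.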

Impact

Observed 2026-03-31: SLBE nightly optimizer (configured model: openai/gpt-5.4-nano) failed at 22:00 PT with model_fallback_decision: candidate_failed because the effective model was Sonnet, not nano.

Expected Behavior

  1. Isolated cron sessions should honor payload.model as the effective model
  2. resolvePersistedLiveSelection() should not apply to ephemeral/isolated cron sessions
  3. The scheduler should add jitter to cron bursts (stagger simultaneous crons by 2-5s each)
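The first two expectations amount to a precedence rule, which might look roughly like the sketch below. The types, field names, and the resolveEffectiveModel function are all hypothetical; the issue does not show the real openclaw session shape or the signature of resolvePersistedLiveSelection(), so this only illustrates the intended ordering.

```typescript
// Hypothetical shapes: the real openclaw session and payload types are not shown in the issue.
type SessionRuntime = "isolated" | "persistent";

interface CronSessionContext {
  runtime: SessionRuntime;
  payloadModel?: string;   // model declared in the cron payload
  persistedModel?: string; // model selection restored from persisted state
  defaultModel: string;    // fallback when nothing else is set
}

// Hypothetical resolver illustrating the expected precedence.
function resolveEffectiveModel(ctx: CronSessionContext): string {
  if (ctx.runtime === "isolated") {
    // Isolated sessions are ephemeral: the persisted selection is never consulted,
    // so payload.model is authoritative.
    return ctx.payloadModel ?? ctx.defaultModel;
  }
  // Assumed behavior for non-isolated sessions: the persisted selection still wins.
  return ctx.persistedModel ?? ctx.payloadModel ?? ctx.defaultModel;
}

const effective = resolveEffectiveModel({
  runtime: "isolated",
  payloadModel: "openai/gpt-5.4-nano",
  persistedModel: "anthropic/claude-sonnet-4-6",
  defaultModel: "anthropic/claude-sonnet-4-6",
});
console.log(effective); // openai/gpt-5.4-nano
```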

Proposed Fix

  1. Skip resolvePersistedLiveSelection() for sessions with runtime: isolated, since isolated sessions are ephemeral and have no meaningful persisted state to restore
  2. Add scheduler jitter: when N crons are due at the same tick, spread them across a configurable window (default: 5s)
  3. Honor payload.model as authoritative for isolated sessions
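The scheduler jitter in item 2 could be sketched as below. This is a minimal illustration, not the openclaw scheduler: DueCron, ScheduledStart, and spreadAcrossWindow are hypothetical names, and the slot-based strategy (one slot per cron, random offset inside the slot) is just one way to spread a burst across the window. The injectable random parameter is there only to make the behavior testable.

```typescript
// Hypothetical shapes: the real scheduler types are not shown in the issue.
interface DueCron {
  id: string;
}

interface ScheduledStart {
  id: string;
  delayMs: number; // how long to wait before starting this cron
}

// Spreads crons that are due at the same tick across a window (default 5s, per the proposal).
// Each cron gets its own slot of windowMs / N and a random offset inside that slot,
// so starts are staggered and every delay stays below windowMs.
function spreadAcrossWindow(
  due: DueCron[],
  windowMs: number = 5000,
  random: () => number = Math.random,
): ScheduledStart[] {
  if (due.length <= 1) {
    // Nothing to stagger for zero or one cron.
    return due.map((cron) => ({ id: cron.id, delayMs: 0 }));
  }
  const slotMs = windowMs / due.length;
  return due.map((cron, index) => ({
    id: cron.id,
    delayMs: Math.floor(index * slotMs + random() * slotMs),
  }));
}

// Example: a burst of 24 crons, as observed in the issue.
const burst: DueCron[] = Array.from({ length: 24 }, (_, i) => ({ id: `cron-${i}` }));
const plan = spreadAcrossWindow(burst);
console.log(plan.length); // 24
```

A caller would then start each cron with something like setTimeout(() => run(start.id), start.delayMs), where run stands in for whatever the scheduler uses to launch a session.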

Related

#24378 #32533

extent analysis

TL;DR

To fix the issue, skip resolvePersistedLiveSelection() for isolated sessions and add scheduler jitter to prevent simultaneous cron executions.

Guidance

  • Locate resolvePersistedLiveSelection() and make it check for runtime: isolated, skipping persisted state restoration for those sessions.
  • Implement scheduler jitter by introducing a delay between simultaneous cron executions, spreading them across a configurable time window (e.g., 5 seconds).
  • Verify that payload.model is honored as the authoritative model for isolated sessions by checking the effective model used during cron execution.
  • Test the proposed fix under Anthropic overload conditions to ensure that the cron jobs gracefully degrade instead of amplifying the overload.
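The verification step in the third bullet can be sketched as a comparison between the declared and effective model of each run. The CronRunRecord shape and findModelMismatches are hypothetical; how openclaw actually records the effective model of a cron run is not shown in the issue, so the records would have to be built from whatever logs or run metadata are available.

```typescript
// Hypothetical record: one entry per executed cron run.
interface CronRunRecord {
  cronId: string;
  declaredModel: string;  // payload.model from the cron definition
  effectiveModel: string; // model the session actually used
}

// Returns every run whose effective model differs from the model its payload declared.
function findModelMismatches(runs: CronRunRecord[]): CronRunRecord[] {
  return runs.filter((run) => run.declaredModel !== run.effectiveModel);
}

// Example data mirroring the failure described in the issue (cron ids are made up).
const runs: CronRunRecord[] = [
  {
    cronId: "nightly-optimizer",
    declaredModel: "openai/gpt-5.4-nano",
    effectiveModel: "anthropic/claude-sonnet-4-6",
  },
  {
    cronId: "other-job",
    declaredModel: "openai/gpt-5.4-nano",
    effectiveModel: "openai/gpt-5.4-nano",
  },
];

console.log(findModelMismatches(runs).map((run) => run.cronId)); // [ 'nightly-optimizer' ]
```

After the fix, the same check over isolated cron runs should return an empty list.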

Example

No explicit code example is provided, as the issue lacks specific implementation details. However, the fix involves modifying the resolvePersistedLiveSelection() function and introducing scheduler jitter.

Notes

The proposed fix assumes that the resolvePersistedLiveSelection() function and the scheduler are modifiable. If these components are external or third-party, alternative solutions may be necessary.

Recommendation

Apply the proposed fix: skip resolvePersistedLiveSelection() for isolated sessions and add scheduler jitter. This addresses the root cause directly and keeps cron bursts from amplifying Anthropic overload conditions.
