openclaw - 💡(How to fix) Fix [Bug]: Live Session Switch State Inconsistency After Gateway Restart [1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#60466Fetched 2026-04-08 02:50:51
View on GitHub
Comments
1
Participants
2
Timeline
5
Reactions
0
Timeline (top)
labeled ×2closed ×1commented ×1locked ×1

When Gateway restarts, the Live Session's model switch state is not persisted, causing the same session to repeatedly trigger the complete model fallback chain. This leads to cascading failures, rate limiting, and unnecessary latency.

Root Cause

Root Cause Analysis

Fix Action

Fix / Workaround

Workarounds (Current)

Code Example

kimi/kimi2.5 (primary) 
  → zhipu/glm-5.1 (fallback #1) 
  → siliconflow/DeepSeek-R1 (fallback #2)
  → kimi/kimi-for-coding (fallback #3)
  → local/qwen2.5:14b (fallback #4)

---

2026-04-02T12:55:37.233+08:00 [gateway] restart mode: full process restart
...
2026-04-02T12:55:40.884+08:00 [agent/embedded] live session model switch detected: zhipu/glm-5.1 -> kimi/kimi2.5
2026-04-02T12:55:40.906+08:00 [agent/embedded] live session model switch detected: siliconflow/DeepSeek-R1 -> kimi/kimi2.5
2026-04-02T12:55:40.946+08:00 [agent/embedded] live session model switch detected: kimi/kimi-for-coding -> kimi/kimi2.5
2026-04-02T12:55:40.965+08:00 [agent/embedded] live session model switch detected: local/qwen2.5:14b -> kimi/kimi2.5

---

2026-04-02T12:57:27.636+08:00 [gateway] restart mode: full process restart
...
2026-04-02T12:57:31.355+08:00 [agent/embedded] live session model switch detected: zhipu/glm-5.1 -> kimi/kimi2.5
2026-04-02T12:57:31.381+08:00 [agent/embedded] live session model switch detected: siliconflow/DeepSeek-V3-0324 -> kimi/kimi2.5
2026-04-02T12:57:31.425+08:00 [agent/embedded] live session model switch detected: kimi/kimi-for-coding -> kimi/kimi2.5
2026-04-02T12:57:31.447+08:00 [agent/embedded] live session model switch detected: local/qwen2.5:14b -> kimi/kimi2.5

---

2026-04-02T12:59:32.268+08:00 [agent/embedded] live session model switch detected: zhipu/glm-5.1 -> kimi/kimi2.5
2026-04-02T12:59:32.290+08:00 [agent/embedded] live session model switch detected: siliconflow/DeepSeek-V3.2 -> kimi/kimi2.5
2026-04-02T12:59:32.331+08:00 [agent/embedded] live session model switch detected: kimi/kimi-for-coding -> kimi/kimi2.5
2026-04-02T12:59:32.356+08:00 [agent/embedded] live session model switch detected: local/qwen2.5:14b -> kimi/kimi2.5

---

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "kimi-coding/kimi2.5",
        "fallbacks": [
          "zhipu/glm-5.1",
          "siliconflow/deepseek-ai/DeepSeek-V3.2",
          "local/qwen2.5:14b"
        ]
      }
    }
  }
}

---

{
  "sessionId": "17a309d8-...",
  "currentModel": "zhipu/glm-5.1",
  "attemptedModels": ["kimi-coding/kimi2.5", "zhipu/glm-5.1"],
  "modelSwitchedAt": "2026-04-02T12:55:40.884Z"
}

---

.openclaw/cache/session-models.json
{
  "17a309d8-bee5-4752-8775-5791d41df367": {
    "model": "zhipu/glm-5.1",
    "updatedAt": "2026-04-02T12:55:40.884Z"
  }
}

---

# Session transcript: 17a309d8-....jsonl
{"type": "model_switch", "from": "kimi/kimi2.5", "to": "zhipu/glm-5.1", "timestamp": "..."}

---

2026-04-02T12:55:37.233+08:00 [gateway] restart mode: full process restart (spawned pid 22218)
2026-04-02T12:55:38.377+08:00 [canvas] host mounted at http://127.0.0.1:18789/__openclaw__/canvas/
2026-04-02T12:55:38.405+08:00 [heartbeat] started
2026-04-02T12:55:38.406+08:00 [health-monitor] started (interval: 300s, startup-grace: 60s, channel-connect-grace: 120s)
2026-04-02T12:55:38.415+08:00 [gateway] agent model: kimi/kimi2.5
2026-04-02T12:55:38.415+08:00 [gateway] listening on ws://127.0.0.1:18789, ws://[::1]:18789 (PID 22218)
2026-04-02T12:55:38.786+08:00 [agents/model-providers] [xai-auth] bootstrap config fallback: no config-backed key found
2026-04-02T12:55:40.341+08:00 [telegram] [default] starting provider (@jcppa_openclaw_bot)
2026-04-02T12:55:40.641+08:00 [agents/model-providers] [xai-auth] bootstrap config fallback: no config-backed key found
2026-04-02T12:55:40.884+08:00 [agent/embedded] live session model switch detected before attempt for 17a309d8-bee5-4752-8775-5791d41df367: zhipu/glm-5.1 -> kimi/kimi2.5
2026-04-02T12:55:40.889+08:00 [model-fallback/decision] model fallback decision: decision=candidate_failed requested=zhipu/glm-5.1 candidate=zhipu/glm-5.1 reason=unknown next=siliconflow/deepseek-ai/DeepSeek-R1
2026-04-02T12:55:40.906+08:00 [agent/embedded] live session model switch detected before attempt for 17a309d8-bee5-4752-8775-5791d41df367: siliconflow/deepseek-ai/DeepSeek-R1 -> kimi/kimi2.5
2026-04-02T12:55:40.907+08:00 [model-fallback/decision] model fallback decision: decision=candidate_failed requested=zhipu/glm-5.1 candidate=siliconflow/deepseek-ai/DeepSeek-R1 reason=unknown next=kimi/kimi-for-coding
2026-04-02T12:55:40.946+08:00 [agent/embedded] live session model switch detected before attempt for 17a309d8-bee5-4752-8775-5791d41df367: kimi/kimi-for-coding -> kimi/kimi2.5
2026-04-02T12:55:40.947+08:00 [model-fallback/decision] model fallback decision: decision=candidate_failed requested=zhipu/glm-5.1 candidate=kimi/kimi-for-coding reason=unknown next=local/qwen2.5:14b
2026-04-02T12:55:40.965+08:00 [agent/embedded] live session model switch detected before attempt for 17a309d8-bee5-4752-8775-5791d41df367: local/qwen2.5:14b -> kimi/kimi2.5

---

2026-04-03T10:19:04.800+08:00 [agent/embedded] live session model switch detected before attempt for 674cdb94-f203-44b1-bb08-49d5ae7e93ee: zhipu/glm-5.1 -> kimi/kimi2.5
2026-04-03T10:19:04.817+08:00 [agent/embedded] live session model switch detected before attempt for 674cdb94-f203-44b1-bb08-49d5ae7e93ee: siliconflow/deepseek-ai/DeepSeek-V3.2 -> kimi/kimi2.5
2026-04-03T10:19:04.856+08:00 [agent/embedded] live session model switch detected before attempt for 674cdb94-f203-44b1-bb08-49d5ae7e93ee: kimi/kimi-for-coding -> kimi/kimi2.5
2026-04-03T10:19:04.872+08:00 [agent/embedded] live session model switch detected before attempt for 674cdb94-f203-44b1-bb08-49d5ae7e93ee: local/qwen2.5:14b -> kimi/kimi2.5

# Same pattern repeats every 30 minutes for this session:
2026-04-03T10:49:04.831+08:00 [agent/embedded] live session model switch detected: zhipu/glm-5.1 -> kimi/kimi2.5
2026-04-03T10:49:04.855+08:00 [agent/embedded] live session model switch detected: siliconflow/deepseek-ai/DeepSeek-V3.2 -> kimi/kimi2.5
2026-04-03T10:49:04.903+08:00 [agent/embedded] live session model switch detected: kimi/kimi-for-coding -> kimi/kimi2.5
2026-04-03T10:49:04.920+08:00 [agent/embedded] live session model switch detected: local/qwen2.5:14b -> kimi/kimi2.5

2026-04-03T11:19:04.832+08:00 [agent/embedded] live session model switch detected: zhipu/glm-5.1 -> kimi/kimi2.5
2026-04-03T11:19:04.848+08:00 [agent/embedded] live session model switch detected: siliconflow/deepseek-ai/DeepSeek-V3.2 -> kimi/kimi2.5
...
# Pattern continues for hours

---

{
  "models": {
    "providers": {
      "zhipu": { "models": [{"id": "glm-5.1"}, {"id": "GLM-5-Turbo"}] },
      "siliconflow": { "models": [{"id": "deepseek-ai/DeepSeek-V3.2"}] },
      "kimi-coding": { "models": [{"id": "kimi2.5"}, {"id": "kimi-for-coding"}] },
      "local": { "models": [{"id": "qwen2.5:14b"}] }
    }
  }
}

---

{
  "sessionId": "674cdb94-f203-44b1-bb08-49d5ae7e93ee",
  "status": "running",
  "model": "kimi2.5",  // <-- Always reports primary after restart
  "actualModelBeforeRestart": "zhipu/glm-5.1"  // <-- Lost
}

### Logs, screenshots, and evidence
RAW_BUFFERClick to expand / collapse

Bug type

Crash (process/app exits or hangs)

Beta release blocker

No

Summary

When Gateway restarts, the Live Session's model switch state is not persisted, causing the same session to repeatedly trigger the complete model fallback chain. This leads to cascading failures, rate limiting, and unnecessary latency.

Steps to reproduce

  1. Start a session with primary model kimi-coding/kimi2.5
  2. Trigger a model fallback (e.g., due to rate limit) → switches to zhipu/glm-5.1
  3. Gateway restarts (manual or crash recovery)
  4. Same session makes another request
  5. Bug: Session re-triggers the complete fallback chain from the beginning instead of resuming with the last working model

Expected behavior

After Gateway restart, the session should:

  • Resume with the last successfully used model (zhipu/glm-5.1)
  • Not re-attempt already-failed models in the fallback chain
  • Maintain model state persistence across process restarts

Actual behavior

The session repeats the entire fallback chain on every Gateway restart:

kimi/kimi2.5 (primary) 
  → zhipu/glm-5.1 (fallback #1) 
  → siliconflow/DeepSeek-R1 (fallback #2)
  → kimi/kimi-for-coding (fallback #3)
  → local/qwen2.5:14b (fallback #4)

This cycle repeats after each Gateway restart for the same session.

OpenClaw version

2026.3.28

Operating system

macOS Darwin 25.3.0 (arm64)

Install method

No response

Model

kimi-coding/kimi2.5

Provider / routing chain

OpenClaw Gateway → kimi-coding (api.kimi.com) → kimi2.5

Additional provider/model setup details

Evidence

Log Analysis (April 2, 2026)

Session ID: 17a309d8-bee5-4752-8775-5791d41df367

First Gateway Restart (12:55:37)

2026-04-02T12:55:37.233+08:00 [gateway] restart mode: full process restart
...
2026-04-02T12:55:40.884+08:00 [agent/embedded] live session model switch detected: zhipu/glm-5.1 -> kimi/kimi2.5
2026-04-02T12:55:40.906+08:00 [agent/embedded] live session model switch detected: siliconflow/DeepSeek-R1 -> kimi/kimi2.5
2026-04-02T12:55:40.946+08:00 [agent/embedded] live session model switch detected: kimi/kimi-for-coding -> kimi/kimi2.5
2026-04-02T12:55:40.965+08:00 [agent/embedded] live session model switch detected: local/qwen2.5:14b -> kimi/kimi2.5

Second Gateway Restart (12:57:27 - 2 minutes later)

2026-04-02T12:57:27.636+08:00 [gateway] restart mode: full process restart
...
2026-04-02T12:57:31.355+08:00 [agent/embedded] live session model switch detected: zhipu/glm-5.1 -> kimi/kimi2.5
2026-04-02T12:57:31.381+08:00 [agent/embedded] live session model switch detected: siliconflow/DeepSeek-V3-0324 -> kimi/kimi2.5
2026-04-02T12:57:31.425+08:00 [agent/embedded] live session model switch detected: kimi/kimi-for-coding -> kimi/kimi2.5
2026-04-02T12:57:31.447+08:00 [agent/embedded] live session model switch detected: local/qwen2.5:14b -> kimi/kimi2.5

Third Cycle (12:59:32)

2026-04-02T12:59:32.268+08:00 [agent/embedded] live session model switch detected: zhipu/glm-5.1 -> kimi/kimi2.5
2026-04-02T12:59:32.290+08:00 [agent/embedded] live session model switch detected: siliconflow/DeepSeek-V3.2 -> kimi/kimi2.5
2026-04-02T12:59:32.331+08:00 [agent/embedded] live session model switch detected: kimi/kimi-for-coding -> kimi/kimi2.5
2026-04-02T12:59:32.356+08:00 [agent/embedded] live session model switch detected: local/qwen2.5:14b -> kimi/kimi2.5

Impact Statistics

  • 307 model switch events in a single day (April 2)
  • 25 rate limit / timeout errors attributed to this cascading behavior
  • Multiple Gateway restarts triggered by memory pressure from accumulated session states

Root Cause Analysis

Current Behavior

  1. Model switch state exists only in memory
  2. On Gateway restart, session resumes with configured primary model
  3. Session has no knowledge of previously-failed models
  4. Complete fallback chain re-executes from the beginning

Configuration Context

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "kimi-coding/kimi2.5",
        "fallbacks": [
          "zhipu/glm-5.1",
          "siliconflow/deepseek-ai/DeepSeek-V3.2",
          "local/qwen2.5:14b"
        ]
      }
    }
  }
}

Proposed Fix

Option 1: Persist Current Model to Session Metadata (Recommended)

Store the currently active model in the session's persistent metadata:

{
  "sessionId": "17a309d8-...",
  "currentModel": "zhipu/glm-5.1",
  "attemptedModels": ["kimi-coding/kimi2.5", "zhipu/glm-5.1"],
  "modelSwitchedAt": "2026-04-02T12:55:40.884Z"
}

On Gateway restart:

  1. Read currentModel from session metadata
  2. Resume with that model instead of primary
  3. Skip already-attempted models in the fallback chain

Option 2: Add Session-Level Model State Cache

Maintain a persistent cache file mapping session IDs to their current model:

.openclaw/cache/session-models.json
{
  "17a309d8-bee5-4752-8775-5791d41df367": {
    "model": "zhipu/glm-5.1",
    "updatedAt": "2026-04-02T12:55:40.884Z"
  }
}

Option 3: Write Model Switch to Transcript Header

Add model information to the session transcript header on each switch:

# Session transcript: 17a309d8-....jsonl
{"type": "model_switch", "from": "kimi/kimi2.5", "to": "zhipu/glm-5.1", "timestamp": "..."}

Workarounds (Current)

  1. Avoid same-provider adjacent models in fallback chain - Reduces switch frequency but doesn't solve the state loss issue
  2. Increase Gateway memory limits - Delays restarts but doesn't prevent state loss
  3. Use shorter fallback chains - Reduces cascade impact

Related Issues

  • INC-2026-0402-001: Gateway memory pressure leading to frequent restarts
  • Rate limiting from kimi-coding provider due to repeated fallback attempts

Attachments

Full Log Excerpt (Session 17a309d8-bee5-4752-8775-5791d41df367)

2026-04-02T12:55:37.233+08:00 [gateway] restart mode: full process restart (spawned pid 22218)
2026-04-02T12:55:38.377+08:00 [canvas] host mounted at http://127.0.0.1:18789/__openclaw__/canvas/
2026-04-02T12:55:38.405+08:00 [heartbeat] started
2026-04-02T12:55:38.406+08:00 [health-monitor] started (interval: 300s, startup-grace: 60s, channel-connect-grace: 120s)
2026-04-02T12:55:38.415+08:00 [gateway] agent model: kimi/kimi2.5
2026-04-02T12:55:38.415+08:00 [gateway] listening on ws://127.0.0.1:18789, ws://[::1]:18789 (PID 22218)
2026-04-02T12:55:38.786+08:00 [agents/model-providers] [xai-auth] bootstrap config fallback: no config-backed key found
2026-04-02T12:55:40.341+08:00 [telegram] [default] starting provider (@jcppa_openclaw_bot)
2026-04-02T12:55:40.641+08:00 [agents/model-providers] [xai-auth] bootstrap config fallback: no config-backed key found
2026-04-02T12:55:40.884+08:00 [agent/embedded] live session model switch detected before attempt for 17a309d8-bee5-4752-8775-5791d41df367: zhipu/glm-5.1 -> kimi/kimi2.5
2026-04-02T12:55:40.889+08:00 [model-fallback/decision] model fallback decision: decision=candidate_failed requested=zhipu/glm-5.1 candidate=zhipu/glm-5.1 reason=unknown next=siliconflow/deepseek-ai/DeepSeek-R1
2026-04-02T12:55:40.906+08:00 [agent/embedded] live session model switch detected before attempt for 17a309d8-bee5-4752-8775-5791d41df367: siliconflow/deepseek-ai/DeepSeek-R1 -> kimi/kimi2.5
2026-04-02T12:55:40.907+08:00 [model-fallback/decision] model fallback decision: decision=candidate_failed requested=zhipu/glm-5.1 candidate=siliconflow/deepseek-ai/DeepSeek-R1 reason=unknown next=kimi/kimi-for-coding
2026-04-02T12:55:40.946+08:00 [agent/embedded] live session model switch detected before attempt for 17a309d8-bee5-4752-8775-5791d41df367: kimi/kimi-for-coding -> kimi/kimi2.5
2026-04-02T12:55:40.947+08:00 [model-fallback/decision] model fallback decision: decision=candidate_failed requested=zhipu/glm-5.1 candidate=kimi/kimi-for-coding reason=unknown next=local/qwen2.5:14b
2026-04-02T12:55:40.965+08:00 [agent/embedded] live session model switch detected before attempt for 17a309d8-bee5-4752-8775-5791d41df367: local/qwen2.5:14b -> kimi/kimi2.5

Another Affected Session (674cdb94-f203-44b1-bb08-49d5ae7e93ee)

2026-04-03T10:19:04.800+08:00 [agent/embedded] live session model switch detected before attempt for 674cdb94-f203-44b1-bb08-49d5ae7e93ee: zhipu/glm-5.1 -> kimi/kimi2.5
2026-04-03T10:19:04.817+08:00 [agent/embedded] live session model switch detected before attempt for 674cdb94-f203-44b1-bb08-49d5ae7e93ee: siliconflow/deepseek-ai/DeepSeek-V3.2 -> kimi/kimi2.5
2026-04-03T10:19:04.856+08:00 [agent/embedded] live session model switch detected before attempt for 674cdb94-f203-44b1-bb08-49d5ae7e93ee: kimi/kimi-for-coding -> kimi/kimi2.5
2026-04-03T10:19:04.872+08:00 [agent/embedded] live session model switch detected before attempt for 674cdb94-f203-44b1-bb08-49d5ae7e93ee: local/qwen2.5:14b -> kimi/kimi2.5

# Same pattern repeats every 30 minutes for this session:
2026-04-03T10:49:04.831+08:00 [agent/embedded] live session model switch detected: zhipu/glm-5.1 -> kimi/kimi2.5
2026-04-03T10:49:04.855+08:00 [agent/embedded] live session model switch detected: siliconflow/deepseek-ai/DeepSeek-V3.2 -> kimi/kimi2.5
2026-04-03T10:49:04.903+08:00 [agent/embedded] live session model switch detected: kimi/kimi-for-coding -> kimi/kimi2.5
2026-04-03T10:49:04.920+08:00 [agent/embedded] live session model switch detected: local/qwen2.5:14b -> kimi/kimi2.5

2026-04-03T11:19:04.832+08:00 [agent/embedded] live session model switch detected: zhipu/glm-5.1 -> kimi/kimi2.5
2026-04-03T11:19:04.848+08:00 [agent/embedded] live session model switch detected: siliconflow/deepseek-ai/DeepSeek-V3.2 -> kimi/kimi2.5
...
# Pattern continues for hours

Additional Context

Model Fallback Chain Configuration

{
  "models": {
    "providers": {
      "zhipu": { "models": [{"id": "glm-5.1"}, {"id": "GLM-5-Turbo"}] },
      "siliconflow": { "models": [{"id": "deepseek-ai/DeepSeek-V3.2"}] },
      "kimi-coding": { "models": [{"id": "kimi2.5"}, {"id": "kimi-for-coding"}] },
      "local": { "models": [{"id": "qwen2.5:14b"}] }
    }
  }
}

Session Status After Restart

{
  "sessionId": "674cdb94-f203-44b1-bb08-49d5ae7e93ee",
  "status": "running",
  "model": "kimi2.5",  // <-- Always reports primary after restart
  "actualModelBeforeRestart": "zhipu/glm-5.1"  // <-- Lost
}

### Logs, screenshots, and evidence

```shell

Impact and severity

No response

Additional information

No response

extent analysis

TL;DR

To fix the issue of Live Session's model switch state not being persisted across Gateway restarts, implement a mechanism to store and retrieve the current model for each session, such as storing it in the session's persistent metadata.

Guidance

  1. Identify the root cause: The model switch state is not persisted because it exists only in memory and is lost upon Gateway restart.
  2. Choose a persistence method: Decide on a method to store the current model for each session, such as using session metadata, a session-level model state cache, or writing model switch information to a transcript header.
  3. Implement persistence: Modify the Gateway to store the current model for each session using the chosen method, ensuring that this information is retained across restarts.
  4. Update session resumption: Adjust the session resumption logic to read the stored current model and resume with that model instead of the primary model, skipping already attempted models in the fallback chain.

Example

For example, if using session metadata, the Gateway could store the current model as follows:

{
  "sessionId": "17a309d8-bee5-4752-8775-5791d41df367",
  "currentModel": "zhipu/glm-5.1",
  "attemptedModels": ["kimi-coding/kimi2.5", "zhipu/glm-5.1"],
  "modelSwitchedAt": "2026-04-02T12:55:40.884Z"
}

And then resume the session with the stored current model upon restart.

Notes

  • The choice of persistence method may depend on the specific requirements and constraints of the system, such as performance, scalability, and data consistency.
  • Ensuring data consistency and handling potential errors or conflicts when storing and retrieving the current model are crucial for a reliable solution.

Recommendation

Apply the Persist Current Model to Session Metadata workaround, as it directly addresses the root cause of the issue by ensuring that the current model for each session is stored and can be retrieved after a Gateway restart, thus preventing the repetition of the entire fallback chain.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

After Gateway restart, the session should:

  • Resume with the last successfully used model (zhipu/glm-5.1)
  • Not re-attempt already-failed models in the fallback chain
  • Maintain model state persistence across process restarts

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING