hermes - ✅(Solved) Fix [Bug]: Model switching corrupts session/request state with Hindsight enabled, causing retry loop and IndexError [1 pull requests, 1 comments, 2 participants]

hermes, 2026-05-14 00:50:16

GitHub stats

NousResearch/hermes-agent#25325•Fetched 2026-05-14 03:47:15

Author

ashanzzz

Participants

alt-glitch

ashanzzz

Timeline (top)

labeled ×4, referenced ×2, commented ×1, cross-referenced ×1

Error Message

⚠️  API call failed (attempt 6/100): IndexError
   🔌 Provider: custom  Model: gpt-5.4
   🌐 Endpoint: local AIAPI aggregator
   📝 Error: list index out of range
⏳ Retrying in 69.0s (attempt 6/100)...

Fix Action

Fix proposed (PR open, not merged at fetch time)

Proposed fix PR: fix: reset model-switch recovery state and include follow-up cleanup (https://github.com/NousResearch/hermes-agent/pull/25343)

PR fix notes

PR #25343: fix: reset model-switch recovery state and include follow-up cleanup

Repository: NousResearch/hermes-agent
Author: NeroNarada
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/25343

Description (problem / solution / changelog)

Summary

Reset turn-scoped recovery state in AIAgent.switch_model to prevent stale retry counters/flags and pending request state from leaking into the first turn after a provider/model change.
Include follow-up auth and test updates in this PR:
- hermes_cli/copilot_auth.py
- hermes_cli/models.py
- tests/hermes_cli/test_api_key_providers.py
- tests/hermes_cli/test_copilot_auth.py
- tests/hermes_cli/test_copilot_catalog_oauth_fallback.py
Added PR handoff notes in .plans/2026-05-14-pr-cleanup.md for reviewer context.
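
The idea in the summary, resetting turn-scoped recovery state when the model changes, can be sketched as follows. This is a minimal illustration and not the code from the PR: `SketchAgent`, `TurnRecoveryState`, and every field name are hypothetical stand-ins, since the actual attributes touched in `AIAgent.switch_model` are not shown on this page.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class TurnRecoveryState:
    """Hypothetical container for state that should only live for one turn."""
    retry_count: int = 0
    fallback_attempted: bool = False
    pending_request: Optional[Dict[str, Any]] = None


@dataclass
class SketchAgent:
    """Illustrative stand-in; NOT the real AIAgent from run_agent.py."""
    provider: str
    model: str
    recovery: TurnRecoveryState = field(default_factory=TurnRecoveryState)

    def switch_model(self, provider: str, model: str) -> None:
        self.provider = provider
        self.model = model
        # Replace the whole object instead of clearing fields one by one,
        # so a field added later cannot be forgotten during the reset.
        self.recovery = TurnRecoveryState()


agent = SketchAgent(provider="custom:aiapi", model="gpt-5.4")
agent.recovery.retry_count = 6  # simulate a failed turn
agent.switch_model("custom:aiapi", "another-test-model")
```

Grouping the per-turn fields in one object is a design choice of this sketch; it makes "reset everything turn-scoped" a single assignment.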

Issue

Fixes #25325.

Testing

Added regression test: test_switch_model_clears_stale_recovery_state_for_next_turn.
No full test suite run in this step.

Changed files

.plans/2026-05-14-pr-cleanup.md (added, +23/-0)
hermes_cli/copilot_auth.py (modified, +1/-71)
hermes_cli/models.py (modified, +2/-3)
run_agent.py (modified, +21/-0)
tests/hermes_cli/test_api_key_providers.py (modified, +24/-52)
tests/hermes_cli/test_copilot_auth.py (modified, +1/-24)
tests/hermes_cli/test_copilot_catalog_oauth_fallback.py (modified, +2/-3)
tests/run_agent/test_switch_model_fallback_prune.py (modified, +49/-0)

Bug Description

After switching models inside an existing HermesAgent session, the session becomes unusable.

gpt-5.4 works normally before switching models. My local OpenAI-compatible AIAPI aggregation endpoint has also been tested separately and can chat with the models normally, so this does not appear to be an endpoint connectivity issue.

The issue starts after switching from gpt-5.4 to another test model. After that, HermesAgent begins failing with a retry loop. Switching back to gpt-5.4 does not recover the session.

I am using Hindsight as memory.

The visible error is:

⚠️  API call failed (attempt 6/100): IndexError
   🔌 Provider: custom  Model: gpt-5.4
   🌐 Endpoint: local AIAPI aggregator
   📝 Error: list index out of range
⏳ Retrying in 69.0s (attempt 6/100)...

Relevant environment:

HermesAgent version: Hermes Agent v0.13.0 (2026.5.7)
Project: /opt/hermes
Python: 3.13.5
OpenAI SDK: 2.33.0
Runtime: Docker container
Working directory: /opt/hermes
Provider: custom:aiapi
Endpoint: local OpenAI-compatible AIAPI aggregation service
Default model: gpt-5.4
Memory: Hindsight
Interface: CLI / gateway

Relevant config output:

◆ Model
  Model:        {'provider': 'custom:aiapi', 'default': 'gpt-5.4'}
  Reasoning:    off
  Model:        gpt-5.4
  Provider:     custom:aiapi

One suspicious detail is that config says reasoning is off, but the actual request payload still appears to contain reasoning-related fields:

"reasoning": {
  "effort": "xhigh",
  "summary": "auto"
},
"include": [
  "reasoning.encrypted_content"
],
"tool_choice": "auto",
"parallel_tool_calls": true

The request also contains Hindsight memory tools:

hindsight_retain
hindsight_recall
hindsight_reflect

This looks like a model-switching/session-state/request-construction issue rather than a raw model or endpoint failure.

Steps to Reproduce

1. Run HermesAgent v0.13.0 in Docker.
2. Configure a custom OpenAI-compatible provider pointing to a local AIAPI aggregation service:
   ```
   Provider: custom:aiapi
   Endpoint: local AIAPI aggregation service
   Default model: gpt-5.4
   ```
3. Enable/use Hindsight memory.
4. Start a Hermes session using gpt-5.4.
5. Confirm that gpt-5.4 can chat normally.
6. Switch from gpt-5.4 to another test model inside the existing session.
7. Send a normal message.
8. HermesAgent starts failing/retrying.
9. Switch back to gpt-5.4.
10. Send another normal message.
11. HermesAgent still fails and retries instead of recovering.

Expected Behavior

After switching models, HermesAgent should either:

- continue the conversation normally with the selected model,
- clear/sanitize model-specific session state before sending the next request, or
- return a clear compatibility/configuration error.

Switching back to a previously working model such as gpt-5.4 should recover the session if the endpoint and model are working.

HermesAgent should not retry the same failing request up to 100 times with only:

IndexError: list index out of range
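
One way to express the expectation above is a retry wrapper that re-raises local errors such as `IndexError` at once and backs off only on failures that might be transient. This is a sketch under assumptions, not Hermes's retry code: `call_with_retry`, `NON_RETRYABLE`, and the default limits are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: these indicate a local construction/parsing problem, so
# sending the identical request again cannot succeed.
NON_RETRYABLE = (IndexError, KeyError, TypeError, ValueError)


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with exponential backoff only on retryable errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except NON_RETRYABLE:
            raise  # fail fast so the real error is visible immediately
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    # Only reached when max_attempts < 1.
    raise RuntimeError("max_attempts must be at least 1")
```

With this shape, the `IndexError` from the report would surface after one attempt instead of being retried up to 100 times.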

Actual Behavior

HermesAgent enters a retry loop:

⚠️  API call failed (attempt 6/100): IndexError
   🔌 Provider: custom  Model: gpt-5.4
   🌐 Endpoint: local AIAPI aggregator
   📝 Error: list index out of range
⏳ Retrying in 69.0s (attempt 6/100)...

After interrupting:

⚡ Interrupting agent... (press Ctrl+C again to force exit)
⚡ Interrupt detected during retry wait, aborting.

⚕ gpt-5.4 │ 0/1.1M │ [░░░░░░░░░░] 0% │ 3m │ ⏱ 2m 52s

After force exit:

Resume this session with:
  hermes --resume 20260514_083429_b52751

Session:        20260514_083429_b52751
Duration:       3m 20s
Messages:       1 (1 user, 0 tool calls)

There are also unrelated Telegram gateway warnings in the logs, such as:

telegram.error.NetworkError: httpx.ConnectError
[Telegram] Telegram polling reconnect failed: httpx.ConnectError

Those appear separate from the agent API failure. The model failure happens in the agent API call path and shows:

API call failed
Provider: custom
Model: gpt-5.4
Error: list index out of range

Affected Component

Agent Core (conversation loop, context compression, memory)

Messaging Platform (if gateway-related)

Debug Report

Report       https://paste.rs/3b5kB
  agent.log    https://paste.rs/gTHkw
  gateway.log  https://paste.rs/9xqUA

Operating System

Ubuntu 24.04 in Docker

Python Version

3.13.5

Hermes Version

Hermes Agent v0.13.0 (2026.5.7)

Additional Logs / Traceback (optional)

Additional visible runtime error:


⚠️  API call failed (attempt 6/100): IndexError
   🔌 Provider: custom  Model: gpt-5.4
   🌐 Endpoint: local AIAPI aggregator
   📝 Error: list index out of range
⏳ Retrying in 69.0s (attempt 6/100)...


The local AIAPI aggregation endpoint was tested separately and works normally. `gpt-5.4` also works before switching models. The failure starts after switching from `gpt-5.4` to another test model, and switching back to `gpt-5.4` does not recover the session.

Root Cause Analysis (optional)

I have not identified the exact source line.

My current hypothesis is that HermesAgent keeps stale model-specific session/request state after model switching. This may interact with Hindsight memory tools and reasoning/encrypted reasoning fields.

One suspicious detail is that config reports:

Reasoning: off

but the actual request payload appears to include:

"reasoning": {
  "effort": "xhigh",
  "summary": "auto"
},
"include": [
  "reasoning.encrypted_content"
]

So the issue may be related to stale reasoning state, request construction, or response parsing after a model switch.
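
If stale reasoning fields are indeed leaking into the payload, one possible guard is to strip them just before the request is sent whenever reasoning is configured off. The sketch below assumes the payload is a plain dict with the `reasoning` and `include` keys shown above; the function name `strip_reasoning_fields` is hypothetical and is not part of Hermes.

```python
from typing import Any, Dict


def strip_reasoning_fields(payload: Dict[str, Any], reasoning_enabled: bool) -> Dict[str, Any]:
    """Return a copy of payload without reasoning-related fields when reasoning is off."""
    if reasoning_enabled:
        return dict(payload)
    # Drop the top-level "reasoning" block.
    cleaned = {k: v for k, v in payload.items() if k != "reasoning"}
    # Keep only "include" entries that are unrelated to reasoning.
    include = [item for item in cleaned.get("include", []) if not item.startswith("reasoning.")]
    if include:
        cleaned["include"] = include
    else:
        cleaned.pop("include", None)
    return cleaned


payload = {
    "reasoning": {"effort": "xhigh", "summary": "auto"},
    "include": ["reasoning.encrypted_content"],
    "tool_choice": "auto",
    "parallel_tool_calls": True,
}
sanitized = strip_reasoning_fields(payload, reasoning_enabled=False)
```

The function returns a new dict rather than mutating its input, so the caller can still log the original payload for debugging.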

Proposed Fix (optional)

Possible areas to check:

- Clear or sanitize model-specific session state when switching models.
- Avoid sending reasoning / include: ["reasoning.encrypted_content"] when config says reasoning is off.
- Avoid carrying stale encrypted reasoning / thinking state across model switches.
- Surface the raw provider response or malformed response instead of only IndexError: list index out of range.
- Avoid retrying the same malformed request up to 100 times.
- Check whether Hindsight memory tools interact badly with model switching and custom OpenAI-compatible providers.
- Make switching back to a previously working model recover the session, or clearly explain that the existing session must be reset.
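
For the point about surfacing the raw provider response, a small guard around list indexing may help. The failing line has not been identified, so this is only a sketch of one plausible cause: indexing an empty list in the response. `first_item`, `MalformedProviderResponse`, and the `choices` field name in the example are assumptions for illustration.

```python
from typing import Any, Mapping


class MalformedProviderResponse(RuntimeError):
    """Hypothetical error type that carries the raw response for diagnosis."""


def first_item(response: Mapping[str, Any], field: str) -> Any:
    """Return response[field][0], or raise a descriptive error instead of IndexError."""
    items = response.get(field)
    if not isinstance(items, list) or not items:
        raise MalformedProviderResponse(
            f"provider response has no usable {field!r} list; raw response: {response!r}"
        )
    return items[0]


try:
    first_item({"id": "resp_1", "choices": []}, "choices")
except MalformedProviderResponse as exc:
    message = str(exc)  # includes the raw response for the log
```

An error like this names the missing field and shows what the provider actually returned, which is more actionable than `list index out of range`.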

Are you willing to submit a PR for this?

I'd like to fix this myself and submit a PR
