One of these should happen reliably: 1. The turn continues until response/tool failure is delivered to the user; or 2. A failed auxiliary `session_search` returns a bounded tool error to the model and the agent can continue without search summaries; or 3. The gateway marks the run failed/cancelled visibly instead of leaving the user with no response while `active_agents=0`.

hermes - ✅(Solved) Fix Gateway turn can silently stop after session_search timeout / initial tools while active_agents=0 [2 pull requests, 1 comments, 1 participants]

hermes · 2026-05-10 08:12:57


NousResearch/hermes-agent#23081•Fetched 2026-05-11 03:31:25


Author

dudaefj

Participants

dudaefj

Timeline (top)

labeled ×4, cross-referenced ×2, commented ×1

A Telegram gateway turn can appear to stall/stop after session_search auxiliary timeouts or after initial tool calls. The gateway stays healthy (running, Telegram connected), but the active chat has no response, active_agents returns to 0, and no tool/model subprocess is running.

Observed locally on Hermes Agent v0.13.0 (2026.5.7), commit 498bfc7, running from source on macOS.

Error Message

WARNING agent.auxiliary_client: Auxiliary session_search: connection error on auto and no fallback available (tried: openrouter, nous, local/custom, api-key)
WARNING root: Session summarization failed after 3 attempts: Codex auxiliary Responses stream exceeded 30.0s total timeout
Traceback (most recent call last):
  File ".../tools/session_search_tool.py", line 228, in _summarize_session
    response = await async_call_llm(
  File ".../agent/auxiliary_client.py", line 4018, in async_call_llm
    await client.chat.completions.create(**kwargs), task)
  File ".../agent/auxiliary_client.py", line 853, in create
    return await asyncio.to_thread(self._sync.create, **kwargs)
  File ".../asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
TimeoutError: Codex auxiliary Responses stream exceeded 30.0s total timeout


PR fix notes

PR #2: fix(gateway): surface incomplete tool turns

Repository: quocanh261997/hermes-agent
Author: quocanh261997
State: closed | merged: False
Link: https://github.com/quocanh261997/hermes-agent/pull/2

Description (problem / solution / changelog)

What does this PR do?

Fixes a gateway silent-stop class where a turn can become inactive after tool activity without delivering a final user-visible response. The gateway now turns incomplete, non-interrupted tool-tail results into an explicit failure message so the user is not left with active_agents=0 and no reply.

It also makes session_search summarization bridge timeouts degrade to raw transcript previews instead of returning a failed tool result. That keeps recall best-effort and lets the main agent continue when auxiliary summarization is slow or unavailable.
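The degradation described above could look roughly like the following. This is a minimal sketch, not the code from the PR: `summarize_or_preview`, the `summarize` callable, the result keys, and the 200-character preview length are hypothetical names and values chosen for illustration. Only the 30-second budget echoes the timeout seen in the log.

```python
import asyncio
from typing import Awaitable, Callable


async def summarize_or_preview(
    transcript: str,
    summarize: Callable[[str], Awaitable[str]],  # hypothetical auxiliary summarizer
    timeout_s: float = 30.0,
    preview_chars: int = 200,  # assumed preview length
) -> dict:
    """Return a summary, or a raw preview if summarization is slow or unreachable."""
    try:
        summary = await asyncio.wait_for(summarize(transcript), timeout=timeout_s)
        return {"success": True, "degraded": False, "summary": summary}
    except (asyncio.TimeoutError, TimeoutError, ConnectionError) as exc:
        # Best-effort recall: still a successful tool result, just without a summary.
        return {
            "success": True,
            "degraded": True,
            "preview": transcript[:preview_chars],
            "reason": type(exc).__name__,
        }
```

With this shape the main agent receives a usable result either way and can decide whether the raw preview is enough to continue.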

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
🔒 Security fix
📝 Documentation update
✅ Tests (adding or improving test coverage)
♻️ Refactor (no behavior change)
🎯 New skill (bundled or hub)

Changes Made

gateway/run.py: normalize incomplete, non-interrupted tool-tail turns into a visible gateway response.
tools/session_search_tool.py: return successful degraded raw-preview results when summarization bridge execution times out.
tests/gateway/test_duplicate_reply_suppression.py: cover incomplete tool-tail response normalization and preserve interrupted stop flow behavior.
tests/tools/test_session_search.py: cover session_search timeout degradation to raw previews.
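To make the gateway/run.py item more concrete, here is one way such a normalization step might be written. It is a sketch under stated assumptions rather than the PR's implementation: the function name `visible_reply`, the message-dict layout (`role`, `tool_calls`), and the notice text are all hypothetical.

```python
from typing import Optional

# Hypothetical wording; the real gateway message is not shown in the source.
INCOMPLETE_TURN_NOTICE = (
    "The run stopped after tool activity without producing a final response. "
    "Please retry."
)


def visible_reply(
    final_text: Optional[str], messages: list[dict], interrupted: bool
) -> Optional[str]:
    """Pick the text the user should see when a turn ends."""
    if final_text and final_text.strip():
        return final_text  # normal completion
    if interrupted:
        return None  # leave the user-initiated stop flow untouched
    last = messages[-1] if messages else {}
    # "Tool tail": the transcript ends on a tool result or an unanswered tool call.
    ended_on_tools = last.get("role") == "tool" or bool(last.get("tool_calls"))
    return INCOMPLETE_TURN_NOTICE if ended_on_tools else None
```

Keeping the interrupted case separate mirrors the test description above, which covers the normalization while preserving the interrupted stop flow.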

How to Test

Run focused gateway/session-search regression tests:

scripts/run_tests.sh tests/gateway/test_duplicate_reply_suppression.py tests/tools/test_session_search.py

Compile changed Python files:

python3 -m py_compile gateway/run.py tools/session_search_tool.py tests/gateway/test_duplicate_reply_suppression.py tests/tools/test_session_search.py

Check whitespace and Windows footguns for this diff:

git diff --check
python3 scripts/check-windows-footguns.py --diff origin/main

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature (no unrelated commits)
I've run pytest tests/ -q and all tests pass
I've added tests for my changes (required for bug fixes, strongly encouraged for features)
I've tested on my platform: macOS Darwin arm64

Documentation & Housekeeping

I've updated relevant documentation (README, docs/, docstrings) — or N/A
I've updated cli-config.yaml.example if I added/changed config keys — or N/A
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
I've updated tool descriptions/schemas if I changed tool behavior — or N/A

For New Skills

N/A

Screenshots / Logs

Focused validation passed:

scripts/run_tests.sh tests/gateway/test_duplicate_reply_suppression.py tests/tools/test_session_search.py
63 passed

Additional checks passed:

python3 -m py_compile gateway/run.py tools/session_search_tool.py tests/gateway/test_duplicate_reply_suppression.py tests/tools/test_session_search.py
git diff --check
python3 scripts/check-windows-footguns.py --diff origin/main

Changed files

gateway/run.py (modified, +27/-0)
tests/gateway/test_duplicate_reply_suppression.py (modified, +46/-0)
tests/tools/test_session_search.py (modified, +43/-0)
tools/session_search_tool.py (modified, +17/-7)

PR #23214: fix(gateway): surface incomplete tool turns

Repository: NousResearch/hermes-agent
Author: quocanh261997
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/23214



Summary

Observed locally on Hermes Agent v0.13.0 (2026.5.7), commit 498bfc7, running from source on macOS.

Environment

Hermes Agent: v0.13.0 (2026.5.7)
Commit: 498bfc7
OS: macOS Darwin arm64
Gateway mode: hermes gateway run --replace --accept-hooks under LaunchAgent
Platform: Telegram polling
Model/provider: openai-codex/gpt-5.5
Relevant config:
- agent.max_turns: 1000
- model.context_length: 272000
- compression.threshold: 0.65

What happened

After a gateway restart, a short sanity message worked:

2026-05-10 05:09:53 inbound msg='Olá'
2026-05-10 05:09:57 response ready ... time=3.8s api_calls=1

Then a larger task was sent:

2026-05-10 05:10:26 inbound msg='Quero que você corrija/finalize/teste/valide o plugin+engine+documentação do con...'

After that, the gateway remained healthy but the turn did not continue to completion:

gateway_state=running
active_agents=0
telegram=connected
session_id=20260510_045801_bc737063
updated_at=2026-05-10T05:10:26.576363
last_prompt_tokens=18435

No relevant codex, acpx, execute_code, terminal, or tool subprocess was still running.

Earlier related symptom in same session

Before the restart, asking for context triggered session_search repeatedly and it timed out:

WARNING agent.auxiliary_client: Auxiliary session_search: connection error on auto and no fallback available (tried: openrouter, nous, local/custom, api-key)
WARNING root: Session summarization failed after 3 attempts: Codex auxiliary Responses stream exceeded 30.0s total timeout
Traceback (most recent call last):
  File ".../tools/session_search_tool.py", line 228, in _summarize_session
    response = await async_call_llm(
  File ".../agent/auxiliary_client.py", line 4018, in async_call_llm
    await client.chat.completions.create(**kwargs), task)
  File ".../agent/auxiliary_client.py", line 853, in create
    return await asyncio.to_thread(self._sync.create, **kwargs)
  File ".../asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
TimeoutError: Codex auxiliary Responses stream exceeded 30.0s total timeout

There was also a visible warning after cancelling/stopping a previous turn:

WARNING gateway.platforms.base: [Telegram] Cancelled task for agent:main:telegram:dm:378478304 did not exit within 5s; unblocking dispatch and letting the task unwind in the background

Session transcript evidence

The JSON session store showed that the agent began the larger task, loaded skills, created a todo, and started reading files/searching. Then it stopped without completing/reporting, while active_agents was already 0:

assistant: Vou fazer isso de ponta a ponta sem restart inicialmente...
tool_calls: todo(...)
tool: todo -> 7 tasks
tool_calls: skill_view(...), search_files(...)
tool: skill_view context-replace-live-debugging.md
tool: skill_view context-replace-customization.md
tool: search_files(...)
tool_calls: read_file(...), read_file(...), search_files(...), search_files(...)

The compact JSONL transcript lagged behind the richer session_*.json store: the .jsonl only contained the earlier short exchange and did not include the larger task's tool sequence, while session_*.json did. This may or may not be related.

Expected behavior

One of these should happen reliably:

1. The turn continues until response/tool failure is delivered to the user; or
2. A failed auxiliary session_search returns a bounded tool error to the model and the agent can continue without search summaries; or
3. The gateway marks the run failed/cancelled visibly instead of leaving the user with no response while active_agents=0.
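The second option, a bounded tool error, might be sketched as below. The names `call_tool_bounded` and `MAX_ERROR_CHARS`, the JSON payload shape, and the 500-character cap are assumptions made for illustration, not Hermes APIs.

```python
import asyncio
import json

MAX_ERROR_CHARS = 500  # assumed cap so a traceback cannot flood the model context


async def call_tool_bounded(tool, args: dict, timeout_s: float = 30.0) -> str:
    """Run an async tool and always hand the model a small JSON payload."""
    try:
        result = await asyncio.wait_for(tool(**args), timeout=timeout_s)
        return json.dumps({"ok": True, "result": result})
    except Exception as exc:
        # asyncio.CancelledError derives from BaseException (Python 3.8+),
        # so genuine cancellation still propagates past this handler.
        detail = f"{type(exc).__name__}: {exc}"[:MAX_ERROR_CHARS]
        return json.dumps({"ok": False, "error": detail})
```

Because the failure comes back as an ordinary tool result, the model can see that search summaries are unavailable and carry on with the rest of the task.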

Actual behavior

The user sees no final answer. Operational state suggests no work is running:

active_agents=0
no tool/model subprocess
gateway and Telegram healthy
session updated at inbound time, then no visible completion

Possible root cause area

Likely around one or more of:

tools/session_search_tool.py handling of auxiliary timeout/fallback
cancellation/unwind path after Cancelled task ... did not exit within 5s
gateway turn lifecycle marking run inactive before a terminal user-visible response/failure is persisted
mismatch between session_*.json and .jsonl persistence when a turn stops mid-tool sequence
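One general asyncio behaviour may be worth keeping in mind when looking at the cancellation path, since the traceback shows the auxiliary call going through asyncio.to_thread: cancelling the awaiting task does not stop the worker thread, which runs to completion in the background. The snippet below only demonstrates that standard-library behaviour. It is not a confirmed diagnosis of this issue, and `blocking_call` is a hypothetical stand-in for a synchronous client call.

```python
import asyncio
import threading
import time


def blocking_call(done: threading.Event) -> None:
    """Stand-in for a synchronous network call running in a worker thread."""
    time.sleep(0.5)
    done.set()


async def main() -> tuple:
    done = threading.Event()
    task = asyncio.create_task(asyncio.to_thread(blocking_call, done))
    await asyncio.sleep(0.05)  # let the worker thread start
    task.cancel()
    cancelled = False
    try:
        await task
    except asyncio.CancelledError:
        cancelled = True
    still_running = not done.is_set()  # thread has not finished at this point
    await asyncio.sleep(1.0)  # give the thread time to finish on its own
    return cancelled, still_running, done.is_set()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The awaiting coroutine sees the cancellation promptly, yet the thread's work continues, so any lifecycle bookkeeping tied to the task alone can diverge from what is still executing.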

Privacy

Logs above are redacted to omit message bodies beyond short non-sensitive excerpts and local user identifiers.


