hermes: Post-tool-call nudge fires before prefill for structured-reasoning thinking models (e.g. qwen3-vl-8b-thinking via OpenRouter)

hermes · 2026-05-29 14:16:17

The post-tool-call empty-response nudge branch fires before the prefill branch for thinking models that emit reasoning through the structured OpenRouter reasoning / reasoning_details fields (e.g. qwen/qwen3-vl-8b-thinking on the Alibaba/Parasail providers). The nudge guard checks only _has_inline_thinking (<think> tags in content), not the structured reasoning fields. As a result, every tool-using turn for these models triggers "⚠️ Model returned empty after tool calls — nudging to continue", costs an extra LLM round-trip (~3-5 s and ~400 tokens), and never reaches the intended "↻ Thinking-only response — prefilling to continue" path.

Fix Action

Status: Fixed by PR "fix(agent): route structured-reasoning empties to prefill, not nudge" (https://github.com/NousResearch/hermes-agent/pull/34812)

Affected versions

v2026.5.16 — run_agent.py:15444-15497
v2026.5.29 — agent/conversation_loop.py:3997-4050 (refactored, same logic)

Reproducer (no Hermes needed)

Direct OpenRouter call against the same model + provider + reasoning settings Hermes uses:

curl -s "https://openrouter.ai/api/v1/chat/completions" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-vl-8b-thinking",
    "provider": {"order": ["alibaba", "parasail"], "allow_fallbacks": false},
    "reasoning": {"enabled": true, "effort": "medium"},
    "stream": true,
    "messages": [
      {"role": "user", "content": "what is the weather in Casablanca?"},
      {"role": "assistant", "content": "", "tool_calls": [{"id": "call_1", "type": "function", "function": {"name": "web_search", "arguments": "{\"query\":\"weather Casablanca\"}"}}]},
      {"role": "tool", "tool_call_id": "call_1", "content": "Casablanca 22 C sunny."}
    ],
    "tools": [{"type": "function", "function": {"name": "web_search", "description": "Search", "parameters": {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]}}}]
  }'

Streaming deltas arrive with delta.reasoning: "Okay, ..." and delta.content: ""; later chunks carry the final answer in delta.content. The Hermes streaming code (run_agent.py:8294) correctly accumulates the reasoning into assistant_message.reasoning_content, but the ordering of the empty-response checks in the conversation loop prevents the prefill branch from running.
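To make the reasoning-only shape concrete, here is a minimal sketch of folding such deltas into one message. It is not the Hermes streaming code: accumulate_stream is a hypothetical helper, and the chunks are hand-written to match the delta shape described above rather than captured from OpenRouter.

```python
from types import SimpleNamespace


def accumulate_stream(chunks):
    """Fold streamed chat-completion chunks into one assistant-like message.

    Hypothetical helper for illustration; field names on the result mirror
    the ones mentioned in this report (content, reasoning_content).
    """
    reasoning_parts, content_parts = [], []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            # Structured reasoning arrives separately from content,
            # and content may be an empty string on the same chunk.
            if delta.get("reasoning"):
                reasoning_parts.append(delta["reasoning"])
            if delta.get("content"):
                content_parts.append(delta["content"])
    return SimpleNamespace(
        content="".join(content_parts),
        reasoning_content="".join(reasoning_parts) or None,
    )


# Hand-written chunks (assumed shape), representing a reasoning-only reply.
reasoning_only = [
    {"choices": [{"delta": {"reasoning": "Okay, ", "content": ""}}]},
    {"choices": [{"delta": {"reasoning": "the tool says 22 C.", "content": ""}}]},
]
msg = accumulate_stream(reasoning_only)
print(repr(msg.content), repr(msg.reasoning_content))
```

A message like msg has empty content but non-empty reasoning_content, which is exactly the case an inline-only <think> check cannot see.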

Root cause

In agent/conversation_loop.py (or run_agent.py at v2026.5.16), the nudge branch fires before the prefill branch and its guard list is incomplete:

# Fires first
if (
    _prior_was_tool
    and not getattr(agent, "_post_tool_empty_retried", False)
    and not _has_inline_thinking  # <-- only guards against inline <think>, not structured reasoning
):
    # NUDGE — emits ⚠️ warning, adds synthetic (empty) assistant + nudge user, continues
    ...
    continue

# Unreachable for OpenRouter-style structured reasoning models
_has_structured = bool(
    getattr(assistant_message, "reasoning", None)
    or getattr(assistant_message, "reasoning_content", None)
    or getattr(assistant_message, "reasoning_details", None)
    or _has_inline_thinking
)
if _has_structured and agent._thinking_prefill_retries < 2:
    # PREFILL — never gets here when nudge fires first
    ...

The comment on the nudge guard ("thinking model still working — let prefill handle") already states the intent. The check just needs to be widened to cover the structured-reasoning case too.

Suggested fix

Hoist the _has_structured calculation above the nudge branch and include it in the guard:

_has_structured = bool(
    getattr(assistant_message, "reasoning", None)
    or getattr(assistant_message, "reasoning_content", None)
    or getattr(assistant_message, "reasoning_details", None)
    or _has_inline_thinking
)
if (
    _prior_was_tool
    and not getattr(agent, "_post_tool_empty_retried", False)
    and not _has_structured  # widened from _has_inline_thinking
):
    # NUDGE
    ...
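The effect of the widened guard can be checked in isolation. The sketch below is a simplified stand-in for the two branches, not the actual conversation loop: route_empty_response, has_reasoning, the widened flag, and the "other" fallthrough are all hypothetical names introduced for illustration.

```python
from types import SimpleNamespace


def has_reasoning(msg, has_inline_thinking=False):
    """True if the message carries reasoning in any of the checked forms."""
    return bool(
        getattr(msg, "reasoning", None)
        or getattr(msg, "reasoning_content", None)
        or getattr(msg, "reasoning_details", None)
        or has_inline_thinking
    )


def route_empty_response(msg, prior_was_tool, already_nudged,
                         has_inline_thinking, prefill_retries, widened):
    """Return which branch an empty response would take.

    widened=False models the original guard (inline <think> only);
    widened=True models the suggested guard (any reasoning form).
    """
    guard = has_reasoning(msg, has_inline_thinking) if widened else has_inline_thinking
    if prior_was_tool and not already_nudged and not guard:
        return "nudge"
    if has_reasoning(msg, has_inline_thinking) and prefill_retries < 2:
        return "prefill"
    return "other"  # placeholder for whatever the real loop does next


# Reasoning-only reply with structured reasoning and no inline <think> tags.
structured = SimpleNamespace(content="", reasoning="Okay, ...")
print(route_empty_response(structured, True, False, False, 0, widened=False))  # nudge
print(route_empty_response(structured, True, False, False, 0, widened=True))   # prefill

# A truly empty reply should still be nudged after the change.
empty = SimpleNamespace(content="")
print(route_empty_response(empty, True, False, False, 0, widened=True))        # nudge
```

The third case matters as a regression check: widening the guard should only reroute replies that carry reasoning, leaving the nudge in place for replies with nothing at all.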

Observed evidence

Hermes v2026.5.16 against qwen/qwen3-vl-8b-thinking via OpenRouter, with providers pinned to [alibaba, parasail] and reasoning_effort: medium, in a Telegram session asking "what's the weather in Imouzzer Kandar?":

12:32:45 INFO API call #2: ... in=16346 out=360 total=16706 latency=3.9s  (tool call)
12:32:46 INFO tool web_search completed (1.29s, 2164 chars)
12:32:48 INFO API call #3: ... in=17514 out=36  total=17550 latency=1.2s  (reasoning-only)
12:32:48 INFO Empty response after tool calls — nudging model to continue processing
12:32:52 INFO API call #4: ... in=17225 out=404 total=17629 latency=4.5s  (final answer after nudge)

API call #3 returned the structured reasoning field per the OpenRouter spec; the prefill branch would have picked it up if it had been reached.
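For anyone wanting to size the post-nudge call from their own logs, the figures can be pulled out mechanically. This is a rough sketch that assumes the log line layout shown above (the "..." portion is elided in the report and is simply skipped); the regular expression is an illustration, not a Hermes utility.

```python
import re

# The three API-call lines from the log excerpt above, copied verbatim.
LOG = """\
12:32:45 INFO API call #2: ... in=16346 out=360 total=16706 latency=3.9s  (tool call)
12:32:48 INFO API call #3: ... in=17514 out=36  total=17550 latency=1.2s  (reasoning-only)
12:32:52 INFO API call #4: ... in=17225 out=404 total=17629 latency=4.5s  (final answer after nudge)
"""

PATTERN = re.compile(
    r"API call #(\d+): .*? in=(\d+) out=(\d+)\s+total=(\d+) latency=([\d.]+)s"
)

calls = {
    int(m[1]): {
        "in": int(m[2]),
        "out": int(m[3]),
        "total": int(m[4]),
        "latency": float(m[5]),
    }
    for m in PATTERN.finditer(LOG)
}

# Sanity check: prompt + completion tokens should equal the logged total.
for number, call in calls.items():
    assert call["in"] + call["out"] == call["total"], number

post_nudge = calls[4]
print(f"post-nudge call: {post_nudge['out']} completion tokens, "
      f"{post_nudge['latency']} s latency")
```

Call #4 accounts for the roughly 400 completion tokens and the 3-5 s of latency quoted in this report.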

Impact

Visible: the "⚠️ Model returned empty after tool calls — nudging to continue" warning appears on every tool turn in Telegram, Discord, etc.
Hidden: roughly one extra OpenRouter request per tool turn (~3-5 s latency, ~400 extra billable completion tokens).
The agent self-recovers, so functional behavior is preserved.

Happy to PR if useful — looks like a one-line guard change.
