hermes - ✅(Solved) Fix [Bug]: Anthropic empty-content + end_turn responses falsely flagged as invalid, triggering 3 retries and false failures [2 pull requests, 1 participant]

GitHub stats
NousResearch/hermes-agent#14158 (fetched 2026-04-23 07:46:33)
Comments: 0
Participants: 1
Timeline events: 8
Reactions: 0
Timeline (top): labeled ×4, cross-referenced ×3, closed ×1

Error Message

  1. Run fails: failed=True, error="Invalid API response after 3 retries: response time 23.0s"

Root Cause

  1. User: "Add 96 dollar uber"
  2. Assistant turn: patch + patch + text "Done. $96 Uber added. Dave's at $863.47." + memory tool_use (finish_reason=tool_calls)
  3. memory tool_result returns success (~2178 chars)
  4. Next API call → content: [], stop_reason: "end_turn", ~23s response time (extended thinking + nothing to add)
  5. AnthropicTransport.validate_response returns False because content_blocks is empty
  6. Agent retries 3x, each API call returns the same empty end_turn response
  7. Run fails: failed=True, error="Invalid API response after 3 retries: response time 23.0s"
  8. User sees contradictory messages: first the success text, then a red failure message

Fix / Workaround

In AnthropicTransport.validate_response (agent/transports/anthropic.py), treat empty content as valid only when stop_reason == "end_turn". Empty content with any other stop_reason (tool_use, max_tokens, refusal, or missing) is still rejected. The change is implemented in PR #14159, described below.

PR fix notes

PR #14159: fix(agent): accept empty content with stop_reason=end_turn as valid anthropic response

Description (problem / solution / changelog)

What does this PR do?

Fixes a false positive in AnthropicTransport.validate_response that rejects content=[] with stop_reason="end_turn" — a legitimate Anthropic API response meaning "I have nothing more to add." When this shape is rejected, the invalid-response retry path fires 3× and the run fails with "Invalid API response after 3 retries" even though the user-facing work was already completed on the prior turn.

The fix is one conditional: when content_blocks is empty, accept iff stop_reason == "end_turn". Empty content with any other stop_reason (tool_use, max_tokens, refusal, missing) is still rejected — the existing behaviour for those cases is preserved.

No downstream changes are required. normalize_anthropic_response already iterates response.content (empty loop is a no-op: content=None, tool_calls=None) and stop_reason_map already maps end_turn → "stop".

Related Issue

Fixes #14158

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • agent/transports/anthropic.py: validate_response now returns True when content_blocks is empty and stop_reason == "end_turn"; all other empty-content cases still return False.
  • tests/agent/transports/test_transport.py: two new unit tests — test_validate_response_empty_content_with_end_turn_is_valid (the fix) and test_validate_response_empty_content_with_tool_use_is_invalid (regression guard). The existing test_validate_response_empty_content (no stop_reason) continues to pass, confirming the default-invalid path is preserved.
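
The three tests named above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the repository's test file: the module-level validate_response is a standalone copy of the patched logic (the real one is a method on AnthropicTransport), and SimpleNamespace stands in for the Anthropic SDK response object, which is an assumption made for the example.

```python
from types import SimpleNamespace
from typing import Any


def validate_response(response: Any) -> bool:
    """Standalone copy of the patched check (the real one is a method)."""
    if response is None:
        return False
    content_blocks = getattr(response, "content", None)
    if not isinstance(content_blocks, list):
        return False
    if not content_blocks:
        # Empty content is only acceptable as an explicit end of turn.
        return getattr(response, "stop_reason", None) == "end_turn"
    return True


def test_validate_response_empty_content_with_end_turn_is_valid():
    response = SimpleNamespace(content=[], stop_reason="end_turn")
    assert validate_response(response) is True


def test_validate_response_empty_content_with_tool_use_is_invalid():
    response = SimpleNamespace(content=[], stop_reason="tool_use")
    assert validate_response(response) is False


def test_validate_response_empty_content():
    # No stop_reason attribute at all: getattr falls back to None.
    response = SimpleNamespace(content=[])
    assert validate_response(response) is False
```

Keeping the no-stop_reason test alongside the two new ones is what shows that the default for empty content remains "invalid".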

How to Test

source .venv/bin/activate
scripts/run_tests.sh tests/agent/transports/test_transport.py -q

Expected: 22 passed. I also ran the related Anthropic-adjacent suites as a regression check:

scripts/run_tests.sh tests/agent/test_anthropic_adapter.py tests/agent/test_anthropic_normalize_v2.py tests/run_agent/test_anthropic_error_handling.py -q

Expected: 158 passed.

Scope note vs. #14155

Complementary to (not replacing) #14155. #14155 handles the gateway-level UX seam so that genuine failures — ones that actually should surface — read coherently after a visible interim reply. This PR removes the most common false trigger of that seam by preventing the validator from marking valid end_turn responses as invalid in the first place. Both are worth merging.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass — only targeted suites were run (see "How to Test"); the full suite was not run
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS (Darwin 25.4.0)

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — N/A; docstring on validate_response explains the new branch
  • I've updated cli-config.yaml.example if I added/changed config keys — N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — pure attribute inspection, platform-agnostic
  • I've updated tool descriptions/schemas if I changed tool behavior — N/A

Screenshots / Logs

▶ running pytest with 4 workers, hermetic env, in /…/hermes-agent
  (TZ=UTC LANG=C.UTF-8 PYTHONHASHSEED=0; all credential env vars unset)
......................                                                   [100%]
22 passed, 4 warnings in 0.68s

Broader regression suites:

........................................................................ [ 45%]
........................................................................ [ 91%]
..............                                                           [100%]
158 passed, 4 warnings in 4.40s

Changed files

  • agent/transports/anthropic.py (modified, +8/-2)
  • tests/agent/transports/test_transport.py (modified, +8/-0)

PR #14155: fix(gateway): add failure context after visible interim replies

Description (problem / solution / changelog)

What does this PR do?

Narrow, messaging-only fix for the contradictory UX where the gateway shows a visible interim/streamed assistant message that looks like success, and then the run later fails validation — so the user sees an apparent "success" followed by an invalid-final-response failure with no acknowledgement that the earlier visible message was only intermediate.

When the run's final_response is empty and failed is true and an interim/streamed reply has already reached the user (already_sent=True), the final failure text is now prefixed with:

Note: the earlier message was an intermediate update, not a completed final answer.

The two messages then read coherently as "that was intermediate; here's the real status" instead of "it said it worked, then said it failed." Retry / validation logic is untouched; no adapter changes; no change to when messages are sent.

Related Issue

Fixes #14154

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • gateway/run.py: added pure helper _prefix_failure_with_interim_context(response, *, failed, interim_visible) and _INTERIM_FAILURE_PREFIX constant; called from _handle_message_with_agent immediately after the existing block that synthesises a failure string for failed && empty final_response. Idempotent (never double-prefixes).
  • tests/gateway/test_run_progress_topics.py: 4 unit tests on the helper — failure-after-visible-interim is prefixed, failure-without-interim stays plain, prefixing is idempotent, non-failure responses are never prefixed.

How to Test

source .venv/bin/activate
scripts/run_tests.sh tests/gateway/test_run_progress_topics.py -q

Expected output (just ran locally on macOS): 27 passed, 1 warning in 11.50s.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass — only the targeted file tests/gateway/test_run_progress_topics.py was run (27 passed); I have not run the full suite
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS (Darwin 25.4.0)

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — N/A (messaging-only change, helper is self-descriptive)
  • I've updated cli-config.yaml.example if I added/changed config keys — N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — pure string formatting, platform-agnostic
  • I've updated tool descriptions/schemas if I changed tool behavior — N/A

Screenshots / Logs

Targeted test run:

▶ running pytest with 4 workers, hermetic env, in /…/hermes-agent
  (TZ=UTC LANG=C.UTF-8 PYTHONHASHSEED=0; all credential env vars unset)
...........................                                              [100%]
27 passed, 1 warning in 11.50s

Changed files

  • gateway/run.py (modified, +24/-0)
  • tests/gateway/test_run_progress_topics.py (modified, +58/-0)

Code Example

Invalid API response after 3 retries: response time 23.0s

---

def validate_response(self, response: Any) -> bool:
    if response is None:
        return False
    content_blocks = getattr(response, "content", None)
    if not isinstance(content_blocks, list):
        return False
    if not content_blocks:
        return False   # ← false positive when stop_reason == "end_turn"
    return True

---

if not content_blocks:
    return getattr(response, "stop_reason", None) == "end_turn"

Bug Description

AnthropicTransport.validate_response in agent/transports/anthropic.py treats any response with content == [] as invalid. Per Anthropic's API spec, content: [] with stop_reason: "end_turn" is a legitimate "I have nothing more to add" response — the model's canonical way of signalling end-of-turn after a prior assistant turn already delivered the user-facing text alongside a trivial tool call (memory write, logging, etc.).

The validator rejects the response, the agent's retry path fires, each retry legitimately returns the same empty content with end_turn, and after the retry budget is exhausted the run is marked failed=True with "Invalid API response after 3 retries." — even though the actual user-facing work was already completed and delivered.

Steps to Reproduce

Trigger pattern: an assistant turn that emits its final user-facing text alongside a trivial tool call, followed by the tool result, followed by a model response that has nothing more to say.

Real-world repro (Discord session 20260420_113639_77768db5):

  1. User: "Add 96 dollar uber"
  2. Assistant turn: patch + patch + text "Done. $96 Uber added. Dave's at $863.47." + memory tool_use (finish_reason=tool_calls)
  3. memory tool_result returns success (~2178 chars)
  4. Next API call → content: [], stop_reason: "end_turn", ~23s response time (extended thinking + nothing to add)
  5. AnthropicTransport.validate_response returns False because content_blocks is empty
  6. Agent retries 3x, each API call returns the same empty end_turn response
  7. Run fails: failed=True, error="Invalid API response after 3 retries: response time 23.0s"
  8. User sees contradictory messages: first the success text, then a red failure message
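
Steps 4 to 7 can be reproduced without network access using a stub that always returns an empty end_turn response. This is a simplified sketch: run_turn is a hypothetical stand-in for the retry path in run_agent.py, the retry budget is modelled as three attempts in total, and SimpleNamespace replaces the real SDK response object.

```python
from types import SimpleNamespace
from typing import Any, Callable


def validate_before_fix(response: Any) -> bool:
    """The pre-fix check: any empty content list is rejected."""
    content_blocks = getattr(response, "content", None)
    return isinstance(content_blocks, list) and bool(content_blocks)


def validate_after_fix(response: Any) -> bool:
    """The proposed check: empty content is valid only with end_turn."""
    content_blocks = getattr(response, "content", None)
    if not isinstance(content_blocks, list):
        return False
    if not content_blocks:
        return getattr(response, "stop_reason", None) == "end_turn"
    return True


def run_turn(validate: Callable[[Any], bool], max_retries: int = 3) -> dict:
    """Hypothetical retry loop; the stub API always ends the turn empty."""
    calls = 0
    for _ in range(max_retries):
        calls += 1
        response = SimpleNamespace(content=[], stop_reason="end_turn")
        if validate(response):
            return {"failed": False, "error": None, "api_calls": calls}
    error = f"Invalid API response after {max_retries} retries"
    return {"failed": True, "error": error, "api_calls": calls}
```

With validate_before_fix the loop exhausts its budget and reports a failure; with validate_after_fix the first response is accepted and no retry is made.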

Expected Behavior

content == [] with stop_reason == "end_turn" should validate as a legitimate end-of-turn response. The downstream stop-reason map (agent/anthropic_adapter.py:1587–1595) already maps end_turn → "stop", and normalize_anthropic_response already iterates over response.content (empty loop is a no-op, content=None, tool_calls=None, finish_reason="stop") — so no additional downstream changes are needed.
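
The "empty loop is a no-op" claim can be illustrated with a simplified normalizer. This is not the code in agent/anthropic_adapter.py: the normalize function, the block attribute names, the non-end_turn map entries, and the fallback to "stop" are all assumptions made for the sketch. Only the end_turn → "stop" mapping and the resulting content=None, tool_calls=None shape are taken from the issue.

```python
from types import SimpleNamespace
from typing import Any

# Only the end_turn entry is confirmed by the issue; the others are assumed.
STOP_REASON_MAP = {"end_turn": "stop", "tool_use": "tool_calls", "max_tokens": "length"}


def normalize(response: Any) -> dict:
    """Hypothetical, simplified stand-in for normalize_anthropic_response."""
    text_parts, tool_calls = [], []
    for block in response.content:  # zero iterations when content == []
        if block.type == "text":
            text_parts.append(block.text)
        elif block.type == "tool_use":
            tool_calls.append({"id": block.id, "name": block.name, "input": block.input})
    return {
        "content": "".join(text_parts) or None,
        "tool_calls": tool_calls or None,
        # Assumption: unknown stop reasons fall back to "stop".
        "finish_reason": STOP_REASON_MAP.get(response.stop_reason, "stop"),
    }
```

An empty content list skips the loop body entirely, so both accumulators stay empty and are reported as None, which matches the shape described above.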

Actual Behavior

validate_response returns False for empty content regardless of stop_reason. The invalid-response retry path in run_agent.py (~line 9400 in the anthropic_messages branch of validation) fires three times, then the run is reported as failed.

Affected Component

Agent Core (conversation loop, context compression, memory)

Messaging Platform (if gateway-related)

N/A (CLI only)

Affects any platform — first observed on Discord.

Operating System

Reported on Discord; reporter OS not captured.

Python Version

Not captured.

Hermes Version

Reproducible on current main.

Relevant Logs / Traceback

Invalid API response after 3 retries: response time 23.0s

Indicative validator call site (agent/transports/anthropic.py:89):

def validate_response(self, response: Any) -> bool:
    if response is None:
        return False
    content_blocks = getattr(response, "content", None)
    if not isinstance(content_blocks, list):
        return False
    if not content_blocks:
        return False   # ← false positive when stop_reason == "end_turn"
    return True

Root Cause Analysis (optional)

The validator has no awareness of stop_reason, so it can't distinguish a malformed empty response from a legitimate "nothing more to add." Anthropic's API explicitly allows content: [] when the model has genuinely completed the turn, and the normalizer is already safe for that shape (empty-content loop is a no-op; finish_reason maps cleanly to "stop"). Only the validator is wrong.

Proposed Fix (optional)

In AnthropicTransport.validate_response, treat empty content as valid when stop_reason == "end_turn":

if not content_blocks:
    return getattr(response, "stop_reason", None) == "end_turn"

Everything else stays identical. Empty content with tool_use, max_tokens, refusal, or no stop_reason continues to be rejected.

Scope note vs. #14154 / #14155

#14155 addresses the gateway-level UX seam where the user sees a contradictory "success then failure" message pair. That fix is still useful for genuine failures, but it does not prevent this false positive from firing in the first place. This issue is the upstream root cause — the two are complementary.

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

extent analysis

TL;DR

The AnthropicTransport.validate_response method should be updated to treat empty content as valid when stop_reason equals "end_turn".

Guidance

  • Review the validate_response method in agent/transports/anthropic.py to understand the current validation logic.
  • Update the method to check for stop_reason when content_blocks is empty, as proposed in the issue.
  • Verify that the updated method correctly handles both legitimate "end-of-turn" responses and malformed empty responses.
  • Test the changes to ensure they do not introduce any regressions or affect other parts of the system.

Example

The proposed fix can be implemented by modifying the validate_response method as follows:

def validate_response(self, response: Any) -> bool:
    if response is None:
        return False
    content_blocks = getattr(response, "content", None)
    if not isinstance(content_blocks, list):
        return False
    if not content_blocks:
        return getattr(response, "stop_reason", None) == "end_turn"
    return True

Notes

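Because the check uses getattr(response, "stop_reason", None), a response with no stop_reason attribute compares as None, fails the == "end_turn" test, and is still rejected. No extra error handling is needed for that case; the existing test_validate_response_empty_content test covers it.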

Recommendation

Apply the proposed fix by updating validate_response to accept empty content when stop_reason equals "end_turn". This removes the false positive; the targeted transport suite (22 tests) and the Anthropic-adjacent suites (158 tests) passed in the PR, although the full test suite was not run.
