hermes - 💡(How to fix) Fix [Bug]: API server returns output-truncation failure as successful assistant message [5 pull requests]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Error Message

In /v1/chat/completions, avoid mapping result["error"] to normal assistant content when result["failed"] or result["partial"] is true. Preserve the failure state in a structured way so clients can handle it correctly.

  • a structured error response; "error": "Response truncated due to output length limit" final_response = result.get("error", "(No response generated)") That loses the partial/error/completed=false semantics. Downstream clients cannot distinguish a real assistant answer from an agent failure, so they display raw internal failure text to end users.

Additional Logs / Traceback (optional)

Root Cause

Root Cause Analysis (optional)

Fix Action

Fixed

Code Example

Response truncated due to output length limit

---

Response truncated due to output length limit

---

Response remained truncated after 3 continuation attempts

---

choices[0].message.content

---

From source inspection, the agentic loop already detects truncation.

In `run_agent.py`, `AIAgent.run_conversation()` handles `finish_reason == "length"` and attempts continuation. If recovery still fails, it returns a structured result like:


{
  "final_response": None,
  "completed": False,
  "partial": True,
  "error": "Response truncated due to output length limit"
}


However, in `gateway/platforms/api_server.py`, the non-streaming `/v1/chat/completions` handler maps this into normal assistant content:


final_response = result.get("final_response", "")
if not final_response:
    final_response = result.get("error", "(No response generated)")


That loses the `partial/error/completed=false` semantics. Downstream clients cannot distinguish a real assistant answer from an agent failure, so they display raw internal failure text to end users.

---
RAW_BUFFERClick to expand / collapse

Bug Description

Issue

Hermes agentic loop can detect output truncation and attempt continuation. However, when truncation recovery ultimately fails, the API server currently flattens the failure into a normal /v1/chat/completions response.

Example assistant content:

Response truncated due to output length limit

The response is still returned as a successful chat completion, so API clients treat it as a valid assistant answer instead of a failed or partial run.

Why this matters

Integrations using /v1/chat/completions cannot distinguish a real assistant answer from a failed truncated run. This causes clients to display raw internal failure text to end users.

Suggested fix

In /v1/chat/completions, avoid mapping result["error"] to normal assistant content when result["failed"] or result["partial"] is true. Preserve the failure state in a structured way so clients can handle it correctly.

Steps to Reproduce

  1. Start Hermes API server.
  2. Send a /v1/chat/completions request with a prompt that is likely to exceed the model output limit, for example a very long generation or a complex agentic task requiring a large final answer.
  3. Let the agent run until Hermes detects output truncation and continuation recovery fails.
  4. Observe the API response.

Example observed assistant content:

Response truncated due to output length limit

or:

Response remained truncated after 3 continuation attempts

Expected Behavior

The API should preserve the internal failed/partial state instead of flattening it into normal assistant content.

For example, return one of:

  • a structured error response;
  • a Hermes-specific metadata field/header;
  • partial=true, completed=false, and a machine-readable error_code;
  • or a non-2xx response when the result is not a valid assistant answer.

Actual Behavior

The API returns a successful chat completion response. The truncation failure is placed in:

choices[0].message.content

and the client receives it as if it were a normal assistant answer.

Affected Component

Gateway (Telegram/Discord/Slack/WhatsApp)

Messaging Platform (if gateway-related)

No response

Debug Report

From source inspection, the agentic loop already detects truncation.

In `run_agent.py`, `AIAgent.run_conversation()` handles `finish_reason == "length"` and attempts continuation. If recovery still fails, it returns a structured result like:


{
  "final_response": None,
  "completed": False,
  "partial": True,
  "error": "Response truncated due to output length limit"
}


However, in `gateway/platforms/api_server.py`, the non-streaming `/v1/chat/completions` handler maps this into normal assistant content:


final_response = result.get("final_response", "")
if not final_response:
    final_response = result.get("error", "(No response generated)")


That loses the `partial/error/completed=false` semantics. Downstream clients cannot distinguish a real assistant answer from an agent failure, so they display raw internal failure text to end users.

Operating System

Ubuntu

Python Version

No response

Hermes Version

No response

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

No response

Proposed Fix (optional)

No response

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING