hermes - 💡(How to fix) Fix [Bug]: API server returns output-truncation failure as successful assistant message [5 pull requests]

StepCodex · 2026-05-09T10:50:13Z

[hermes] Bug Description Issue Hermes agentic loop can detect output truncation and attempt continuation. However, when truncation recovery ultimately fails, t… ## Fixed - Fixed by PR: fix(api_server): return 500 on failed/truncated runs instead of flattening to assistant content (https://github.com/NousResearch/hermes-agent/pull/22500) - Fixed by PR: fix(gateway): return OpenAI-style errors for failed agent runs in API server (https://github.com/NousResearch/hermes-agent/pull/22501) - Fixed by PR: fix(api-server): preserve truncation/failure semantics in chat completions (#22496) (https://github.com/NousResearch/hermes-agent/pull/22517) - Fixed by PR: fix(api): set finish_reason and X-Hermes-* headers for failed/partial runs (https://github.com/NousResearch/hermes-agent/pull/22672) - Fixed by PR: fix(api-server): emit length/error finish_reason for truncation/failure (https://github.com/NousResearch/hermes-agent/pull/22775) ### Bug Description **Issue** Hermes agentic loop can detect output truncation and attempt continuation. However, when truncation recovery ultimately fails, the API server currently flattens the failure into a normal `/v1/chat/completions` response. Example assistant content: ```text Response truncated due to output length limit ``` The response is still returned as a successful chat completion, so API clients treat it as a valid assistant answer instead of a failed or partial run. **Why this matters** Integrations using `/v1/chat/completions` cannot distinguish a real assistant answer from a failed truncated run. This causes clients to display raw internal failure text to end users. **Suggested fix** In `/v1/chat/completions`, avoid mapping `result["error"]` to normal assistant content when `result["failed"]` or `result["partial"]` is true. Preserve the failure state in a structured way so clients can handle it correctly. ### Steps to Reproduce 1. Start Hermes API server. 2. Send a `/v1/chat/completions` request with a prompt that is likely to exceed the model output limit, for example a very long generation or a complex agentic task requiring a large final answer. 3. Let the agent run until Hermes detects output truncation and continuation recovery fails. 4. Observe the API response. Example observed assistant content: ```text Response truncated due to output length limit ``` or: ```text Response remained truncated after 3 continuation attempts ``` ### Expected Behavior The API should preserve the internal failed/partial state instead of flattening it into normal assistant content. For example, return one of: - a structured error response; - a Hermes-specific metadata field/header; - `partial=true`, `completed=false`, and a machine-readable `error_code`; - or a non-2xx response when the result is not a valid assistant answer. ### Actual Behavior The API returns a successful chat completion response. The truncation failure is placed in: ```json choices[0].message.content ``` and the client receives it as if it were a normal assistant answer. ### Affected Component Gateway (Telegram/Discord/Slack/WhatsApp) ### Messaging Platform (if gateway-related) _No response_ ### Debug Report ```shell From source inspection, the agentic loop already detects truncation. In `run_agent.py`, `AIAgent.run_conversation()` handles `finish_reason == "length"` and attempts continuation. If recovery still fails, it returns a structured result like: { "final_response": None, "completed": False, "partial": True, "error": "Response truncated due to output length limit" } However, in `gateway/platforms/api_server.py`, the non-streaming `/v1/chat/completions` handler maps this into normal assistant content: final_response = result.get("final_response", "") if not final_response: final_response = result.get("error", "(No response generated)") That loses the `partial/error/completed=false` semantics. Downstream clients cannot distinguish a real assistant answer from an agent failure, so they display raw internal failure text to end users. ``` ### Operating System Ubuntu ### Python Version _No response_ ### Hermes Version _No response_ ### Additional Logs / Traceback (optional) ```shell ``` ### Root Cause Analysis (optional) _No response_ ### Proposed Fix (optional) _No response_ ### Are you willing to submit a PR for this? - [ ] I'd like to fix this myself and submit a PR

Error Message

In /v1/chat/completions, avoid mapping result["error"] to normal assistant content when result["failed"] or result["partial"] is true. Preserve the failure state in a structured way so clients can handle it correctly.

a structured error response; "error": "Response truncated due to output length limit" final_response = result.get("error", "(No response generated)") That loses the partial/error/completed=false semantics. Downstream clients cannot distinguish a real assistant answer from an agent failure, so they display raw internal failure text to end users.

Additional Logs / Traceback (optional)

Fix Action

Fixed

Fixed by PR: fix(api_server): return 500 on failed/truncated runs instead of flattening to assistant content (https://github.com/NousResearch/hermes-agent/pull/22500)
Fixed by PR: fix(gateway): return OpenAI-style errors for failed agent runs in API server (https://github.com/NousResearch/hermes-agent/pull/22501)
Fixed by PR: fix(api-server): preserve truncation/failure semantics in chat completions (#22496) (https://github.com/NousResearch/hermes-agent/pull/22517)
Fixed by PR: fix(api): set finish_reason and X-Hermes-* headers for failed/partial runs (https://github.com/NousResearch/hermes-agent/pull/22672)
Fixed by PR: fix(api-server): emit length/error finish_reason for truncation/failure (https://github.com/NousResearch/hermes-agent/pull/22775)

Code Example

Response truncated due to output length limit

---

Response truncated due to output length limit

---

Response remained truncated after 3 continuation attempts

---

choices[0].message.content

---

From source inspection, the agentic loop already detects truncation.

In `run_agent.py`, `AIAgent.run_conversation()` handles `finish_reason == "length"` and attempts continuation. If recovery still fails, it returns a structured result like:


{
  "final_response": None,
  "completed": False,
  "partial": True,
  "error": "Response truncated due to output length limit"
}


However, in `gateway/platforms/api_server.py`, the non-streaming `/v1/chat/completions` handler maps this into normal assistant content:


final_response = result.get("final_response", "")
if not final_response:
    final_response = result.get("error", "(No response generated)")


That loses the `partial/error/completed=false` semantics. Downstream clients cannot distinguish a real assistant answer from an agent failure, so they display raw internal failure text to end users.

---

Bug Description

Issue

Hermes agentic loop can detect output truncation and attempt continuation. However, when truncation recovery ultimately fails, the API server currently flattens the failure into a normal /v1/chat/completions response.

Example assistant content:

Response truncated due to output length limit

The response is still returned as a successful chat completion, so API clients treat it as a valid assistant answer instead of a failed or partial run.

Why this matters

Integrations using /v1/chat/completions cannot distinguish a real assistant answer from a failed truncated run. This causes clients to display raw internal failure text to end users.

Suggested fix

Steps to Reproduce

Start Hermes API server.
Send a /v1/chat/completions request with a prompt that is likely to exceed the model output limit, for example a very long generation or a complex agentic task requiring a large final answer.
Let the agent run until Hermes detects output truncation and continuation recovery fails.
Observe the API response.

Example observed assistant content:

Response truncated due to output length limit

or:

Response remained truncated after 3 continuation attempts

Expected Behavior

The API should preserve the internal failed/partial state instead of flattening it into normal assistant content.

For example, return one of:

a structured error response;
a Hermes-specific metadata field/header;
partial=true, completed=false, and a machine-readable error_code;
or a non-2xx response when the result is not a valid assistant answer.

Actual Behavior

The API returns a successful chat completion response. The truncation failure is placed in:

choices[0].message.content

and the client receives it as if it were a normal assistant answer.

Affected Component

Gateway (Telegram/Discord/Slack/WhatsApp)

Messaging Platform (if gateway-related)

No response

Debug Report

From source inspection, the agentic loop already detects truncation.

In `run_agent.py`, `AIAgent.run_conversation()` handles `finish_reason == "length"` and attempts continuation. If recovery still fails, it returns a structured result like:


{
  "final_response": None,
  "completed": False,
  "partial": True,
  "error": "Response truncated due to output length limit"
}


However, in `gateway/platforms/api_server.py`, the non-streaming `/v1/chat/completions` handler maps this into normal assistant content:


final_response = result.get("final_response", "")
if not final_response:
    final_response = result.get("error", "(No response generated)")


That loses the `partial/error/completed=false` semantics. Downstream clients cannot distinguish a real assistant answer from an agent failure, so they display raw internal failure text to end users.

Operating System

Ubuntu

Python Version

No response

Hermes Version

No response

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

No response

Proposed Fix (optional)

No response

Are you willing to submit a PR for this?

I'd like to fix this myself and submit a PR

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix [Bug]: API server returns output-truncation failure as successful assistant message [5 pull requests]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Additional Logs / Traceback (optional)

Root Cause

Root Cause Analysis (optional)

Fix Action

Fixed

Code Example

Bug Description

Steps to Reproduce

Expected Behavior

Actual Behavior

Affected Component

Messaging Platform (if gateway-related)

Debug Report

Operating System

Python Version

Hermes Version

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

Proposed Fix (optional)

Are you willing to submit a PR for this?

Still need to ship something?

TRENDING

hermes - 💡(How to fix) Fix [Bug]: API server returns output-truncation failure as successful assistant message [5 pull requests]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Additional Logs / Traceback (optional)

Root Cause

Root Cause Analysis (optional)

Fix Action

Fixed

Code Example

Bug Description

Steps to Reproduce

Expected Behavior

Actual Behavior

Affected Component

Messaging Platform (if gateway-related)

Debug Report

Operating System

Python Version

Hermes Version

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

Proposed Fix (optional)

Are you willing to submit a PR for this?

Still need to ship something?

RELATED_DISCOVERY

TRENDING