hermes - 💡(How to fix) Fix API server /v1/responses drops agent reasoning from output_items

hermes, 2026-05-08 03:21:15

Problem

The api_server platform's /v1/responses endpoint emits function_call, function_call_output, and message output items on both the SSE streaming and batch paths, but it never emits the agent's reasoning as a reasoning output item. This happens even though run_agent.py already produces reasoning text and fires reasoning_callback during streaming.

This breaks downstream consumers that persist output[] from the Responses API and expect the OpenAI Responses spec shape {type:'reasoning', summary, content}. CLI/Telegram/Slack get a preview via _emit_reasoning_preview, but Responses-API clients lose the chain-of-thought entirely.

Reproduction

1. Run hermes-agent exposing /v1/responses.
2. Use a model/profile that emits reasoning (e.g. a thinking-enabled OpenAI/Anthropic model).
3. POST to /v1/responses with stream:true (or stream:false).
4. Inspect response.completed.output[] (or the GET /v1/responses/{id} snapshot).

Observed: output[] contains only function_call, function_call_output, message. No reasoning item.

Expected: A reasoning item (per OpenAI Responses spec) carrying summary[].summary_text and content[].reasoning_text, ordered before the final message.
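
The observed and expected shapes above can be checked mechanically. The following is a minimal sketch, not part of the reported patch: the helper names and the OBSERVED/EXPECTED sample payloads are hypothetical and heavily trimmed, and the item layout assumes the summary_text / reasoning_text parts named in the Expected line.

```python
from typing import Any

# Illustrative payloads only (assumption): real output items carry more fields.
OBSERVED = [
    {"type": "function_call", "name": "search"},
    {"type": "function_call_output", "output": "..."},
    {"type": "message", "content": [{"type": "output_text", "text": "Done."}]},
]

EXPECTED = [
    {"type": "function_call", "name": "search"},
    {"type": "function_call_output", "output": "..."},
    {
        "type": "reasoning",
        "summary": [{"type": "summary_text", "text": "Looked it up."}],
        "content": [{"type": "reasoning_text", "text": "Looked it up."}],
    },
    {"type": "message", "content": [{"type": "output_text", "text": "Done."}]},
]


def reasoning_items(output: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return every output item whose type is 'reasoning'."""
    return [item for item in output if item.get("type") == "reasoning"]


def reasoning_precedes_message(output: list[dict[str, Any]]) -> bool:
    """True if a reasoning item exists and appears before the final message."""
    types = [item.get("type") for item in output]
    if "reasoning" not in types or "message" not in types:
        return False
    last_message = len(types) - 1 - types[::-1].index("message")
    return types.index("reasoning") < last_message
```

Running both helpers against response.completed.output[] from a live server should distinguish the buggy behavior (no reasoning item) from the expected one.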

Root cause

In gateway/platforms/api_server.py:

_create_agent (api_server.py:830) builds AIAgent without passing reasoning_callback.
_dispatch (api_server.py:1695) routes only the __tool_started__ / __tool_completed__ tags and plain text deltas; it has no __reasoning__ branch.
_extract_output_items (api_server.py:2563, batch path) emits only function_call / function_call_output / message.

AIAgent._fire_reasoning_delta exists and is invoked during streaming in run_agent.py, but its target callback is never installed in the api_server platform.
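
To make the failure mode concrete, here is a small stand-in, assuming the agent simply skips the call when no callback is installed. StubAgent is hypothetical and models only the callback plumbing; the real AIAgent in run_agent.py may guard the call differently.

```python
from typing import Callable, Optional


class StubAgent:
    """Hypothetical stand-in for AIAgent; only the callback wiring is modelled."""

    def __init__(self, reasoning_callback: Optional[Callable[[str], None]] = None):
        self.reasoning_callback = reasoning_callback

    def _fire_reasoning_delta(self, text: str) -> None:
        # Assumed behaviour: with no callback installed the delta is silently dropped.
        if self.reasoning_callback is not None:
            self.reasoning_callback(text)


received: list[str] = []

# Mirrors the api_server platform today: no callback, so nothing is captured.
StubAgent()._fire_reasoning_delta("thinking...")

# Mirrors a platform that installs the callback: the delta reaches the caller.
StubAgent(reasoning_callback=received.append)._fire_reasoning_delta("thinking...")
```

Only the second call leaves anything in received, which matches the report that the reasoning text exists but has nowhere to go.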

Proposed fix

Wire reasoning_callback end-to-end:

_create_agent / _run_agent accept a reasoning_callback kwarg and forward it to AIAgent.
_handle_responses installs an _on_reasoning callback that pushes ("__reasoning__", text) onto the SSE queue.
_dispatch buffers reasoning deltas in reasoning_parts.
A new _flush_reasoning() helper emits a single response.output_item.added/done pair (type:'reasoning', with summary + content text parts) before the assistant message item closes, preserving the canonical order function_calls → reasoning → message.
The incomplete-snapshot path (early disconnect) also appends the buffered reasoning, so GET /v1/responses/{id} still surfaces thinking.
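
The streaming part of this wiring might look roughly like the sketch below. It is a simplified, self-contained approximation rather than the actual patch: dispatch_stream, demo, the None end-of-stream sentinel, and the trimmed event dicts are assumptions made for illustration, while the "__reasoning__" tag, reasoning_parts, and _flush_reasoning come from the list above.

```python
import queue
from typing import Any

REASONING_TAG = "__reasoning__"


def dispatch_stream(events: queue.Queue) -> list[dict[str, Any]]:
    """Drain the queue and return SSE-style event dicts in emission order."""
    emitted: list[dict[str, Any]] = []
    reasoning_parts: list[str] = []
    message_parts: list[str] = []

    def _flush_reasoning() -> None:
        # Emit one added/done pair for all buffered reasoning, then reset the buffer.
        if not reasoning_parts:
            return
        text = "".join(reasoning_parts)
        reasoning_parts.clear()
        item = {
            "type": "reasoning",
            "summary": [{"type": "summary_text", "text": text}],
            "content": [{"type": "reasoning_text", "text": text}],
        }
        emitted.append({"type": "response.output_item.added", "item": item})
        emitted.append({"type": "response.output_item.done", "item": item})

    while True:
        event = events.get()
        if event is None:  # assumed sentinel marking the end of the stream
            break
        if isinstance(event, tuple) and event[0] == REASONING_TAG:
            reasoning_parts.append(event[1])  # buffer, do not emit yet
        else:
            message_parts.append(event)  # plain text delta

    # Reasoning is flushed before the message item closes.
    _flush_reasoning()
    message = {
        "type": "message",
        "content": [{"type": "output_text", "text": "".join(message_parts)}],
    }
    emitted.append({"type": "response.output_item.done", "item": message})
    return emitted


def demo() -> list[dict[str, Any]]:
    """Feed two reasoning deltas and one text delta through the dispatcher."""
    q: queue.Queue = queue.Queue()

    def _on_reasoning(text: str) -> None:
        q.put((REASONING_TAG, text))

    _on_reasoning("Check the ")
    _on_reasoning("docs first.")
    q.put("Here is the answer.")
    q.put(None)
    return dispatch_stream(q)
```

Buffering the deltas and flushing once keeps the output to a single reasoning item per turn, at the cost of not streaming reasoning text incrementally; the early-disconnect path would need the same flush before the snapshot is stored.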

Happy to send a PR (already have a working patch + 2 new tests in tests/gateway/test_api_server.py, 140 passed).
