hermes - ✅(Solved) Fix [Bug]: Responses SSE: `response.completed` event exceeds 128KB line limit, breaking Open WebUI for long sessions [1 pull requests, 1 participants]

hermes · 2026-04-30 17:06:17

NousResearch/hermes-agent#18021 • Fetched 2026-05-01 05:54:17

Author

efreykongcn

Participants

efreykongcn

When using the Responses API (POST /v1/responses) with Open WebUI, complex agent sessions with many tool calls cause a LineTooLong error because the final response.completed SSE event exceeds Python's HTTP line limit of 131,072 bytes.
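To get a feel for how quickly a session reaches the limit, the sketch below serializes a response.completed-style event as a single SSE data line and measures it. The event layout, the field names, and the output sizes are assumptions for illustration, not the exact Hermes payload.

```python
import json

# 131,072 bytes (128 KiB), the limit quoted in the error message.
LINE_LIMIT = 2 * 65536


def completed_line(tool_outputs):
    """Serialize a response.completed-style event as one SSE data line.

    The item shape used here is an assumption made for this sketch.
    """
    items = [
        {"type": "function_call_output", "call_id": f"call_{i}", "output": text}
        for i, text in enumerate(tool_outputs)
    ]
    event = {"type": "response.completed", "response": {"output": items}}
    return ("data: " + json.dumps(event) + "\n").encode("utf-8")


# Ten tool calls returning about 15 KB each (hypothetical sizes).
line = completed_line(["x" * 15_000] * 10)
print(len(line), len(line) > LINE_LIMIT)
```

With these sizes no single tool output is large, yet the combined line is already over the limit.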

Workaround

Use Chat Completions API mode for long-running tasks (no monolithic final event). Use Responses mode for short interactive sessions where tool count stays low.

PR fix notes

PR #18034: fix(gateway): compact responses terminal tool outputs

Repository: NousResearch/hermes-agent
Author: liuhao1024
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/18034

Description (problem / solution / changelog)

Summary

Compact streamed Responses API terminal payloads so response.completed does not repeat large tool outputs.
Keep the full stored response snapshot intact for GET /v1/responses/{id} / chaining.
Add a regression test covering large streamed tool outputs staying below the 128 KiB client line limit.

Root cause

POST /v1/responses streaming accumulated every emitted tool output in emitted_items, then reused that full list in the terminal response.completed SSE event. Long sessions with multiple large tool results could therefore create a single data: line larger than Python/http.client's 131072-byte line limit, even though those tool outputs had already been streamed earlier.
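The accumulation pattern described here can be sketched roughly as follows. This is a simplified stand-in, not the code in gateway/platforms/api_server.py: the function names, the item fields, and the use of response.output_item.added to carry each output are assumptions.

```python
import json

LINE_LIMIT = 131072


def sse_line(event):
    """Encode one event as a single SSE data line."""
    return ("data: " + json.dumps(event) + "\n").encode("utf-8")


def stream_session(tool_outputs):
    """Yield SSE lines, reusing every emitted item in the terminal event."""
    emitted_items = []
    for i, text in enumerate(tool_outputs):
        item = {"type": "function_call_output", "call_id": f"call_{i}", "output": text}
        emitted_items.append(item)
        # Each output goes out on its own line while the session runs.
        yield sse_line({"type": "response.output_item.added", "item": item})
    # The terminal event repeats the full list on one line.
    yield sse_line({"type": "response.completed", "response": {"output": emitted_items}})


sizes = [len(line) for line in stream_session(["x" * 70_000, "y" * 70_000])]
print(sizes)
```

Each progressive line stays under the limit on its own; only the terminal line, which carries both outputs again, crosses it.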

Fix

Before writing terminal response.completed / response.failed events, compact function_call_output items to an omission marker. The persisted response snapshot remains unmodified, so retrieval and previous_response_id behavior still retain the full data.
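A minimal sketch of this kind of compaction is shown below. The helper name, the marker text, and the item fields are hypothetical; the actual change lives in the PR diff.

```python
import json

LINE_LIMIT = 131072
OMITTED = "[output omitted; already streamed]"  # hypothetical marker text


def compact_for_terminal(items):
    """Return a new list in which function_call_output payloads are replaced.

    Items are copied before modification, so the list passed in (standing in
    for the stored response snapshot) is left untouched.
    """
    compacted = []
    for item in items:
        if item.get("type") == "function_call_output":
            item = {**item, "output": OMITTED}
        compacted.append(item)
    return compacted


stored = [
    {"type": "function_call", "call_id": "call_0", "name": "read_file", "arguments": "{}"},
    {"type": "function_call_output", "call_id": "call_0", "output": "x" * 70_000},
]
terminal = compact_for_terminal(stored)
line = "data: " + json.dumps({"type": "response.completed", "response": {"output": terminal}})

print(len(line.encode("utf-8")) < LINE_LIMIT)  # terminal line is small
print(len(stored[1]["output"]))                # snapshot still holds the full output
```

Copying each item rather than editing it in place is what keeps retrieval and previous_response_id chaining working from the full data.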

Regression coverage

The new test simulates two 70KB tool outputs and asserts:

the terminal response.completed SSE line is below 131072 bytes;
the large output is not repeated in the terminal payload;
the full output was still streamed and remains present in the stored response snapshot.
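The three assertions could look roughly like the sketch below. It runs against a small stand-in for the gateway rather than the real test fixtures, so fake_stream, the marker text, and the event shapes are all assumptions.

```python
import json

LINE_LIMIT = 131072
MARKER = "[omitted]"  # hypothetical omission marker


def fake_stream(outputs):
    """Stand-in for the gateway: returns (sse_lines, stored_snapshot)."""
    stored, lines = [], []
    for i, text in enumerate(outputs):
        item = {"type": "function_call_output", "call_id": f"call_{i}", "output": text}
        stored.append(item)
        lines.append("data: " + json.dumps({"type": "response.output_item.added", "item": item}))
    # Terminal event carries compacted copies, not the stored items themselves.
    compact = [{**it, "output": MARKER} for it in stored]
    lines.append("data: " + json.dumps({"type": "response.completed", "response": {"output": compact}}))
    return lines, stored


def test_completed_omits_large_outputs():
    big = ["a" * 70_000, "b" * 70_000]
    lines, stored = fake_stream(big)
    terminal = lines[-1]
    # 1. The terminal line is below the client line limit.
    assert len(terminal.encode("utf-8")) < LINE_LIMIT
    # 2. The large outputs are not repeated in the terminal payload.
    assert big[0] not in terminal and big[1] not in terminal
    # 3. The full outputs were streamed earlier and remain in the snapshot.
    assert any(big[0] in line for line in lines[:-1])
    assert [it["output"] for it in stored] == big


test_completed_omits_large_outputs()
```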

Testing

scripts/run_tests.sh tests/gateway/test_api_server.py::TestResponsesStreaming::test_response_completed_omits_repeated_large_tool_outputs -q (RED before fix, GREEN after fix)
scripts/run_tests.sh tests/gateway/test_api_server.py::TestResponsesStreaming -q
scripts/run_tests.sh tests/gateway/test_api_server.py -q

Closes #18021

Changed files

gateway/platforms/api_server.py (modified, +30/-5)
tests/gateway/test_api_server.py (modified, +64/-0)

Bug Description

Description

Why it's a Hermes issue (not Open WebUI)

The 128KB limit comes from Python stdlib http.client._MAXLINE * 2 — not configurable
Open WebUI correctly renders the progressive SSE events (response.output_item.added, response.output_text.delta)
The full tool outputs were already transmitted during execution via those progressive events
The response.completed event only needs summaries/references, not the full payload repeated
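To illustrate the point that the content has already arrived by the time response.completed is sent, here is a small sketch of a client rebuilding the text from the progressive events alone. It assumes each response.output_text.delta event carries its text in a "delta" field, and the sample stream is made up.

```python
import json


def collect_text(sse_lines):
    """Join the text carried by response.output_text.delta events."""
    parts = []
    for raw in sse_lines:
        if not raw.startswith("data: "):
            continue  # skip blank separators and any non-data lines
        event = json.loads(raw[len("data: "):])
        if event.get("type") == "response.output_text.delta":
            parts.append(event.get("delta", ""))
    return "".join(parts)


stream = [
    'data: {"type": "response.output_item.added", "item": {"type": "message"}}',
    "",
    'data: {"type": "response.output_text.delta", "delta": "Hello, "}',
    'data: {"type": "response.output_text.delta", "delta": "world"}',
    'data: {"type": "response.completed", "response": {"id": "resp_example"}}',
]
print(collect_text(stream))  # Hello, world
```

The terminal event is not consulted at all here, which is why a compact response.completed does not cost the client any content.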

Environment

hermes-agent: nousresearch/hermes-agent:v2026.4.23 (Docker)
Open WebUI: ghcr.io/open-webui/open-webui:main
Model: DeepSeek V4 Pro via custom provider
Setup: Two Docker containers on same network, Responses API mode

Steps to Reproduce

Run hermes-agent with API_SERVER_ENABLED=true, API_SERVER_HOST=0.0.0.0
Connect Open WebUI via Responses API type
Send a multi-step task that triggers ~10+ tool calls with non-trivial outputs (e.g., data gathering, file reading)
Open WebUI receives a partial stream, then raises: Error processing chat payload: 400, message: Got more than 131072 bytes when reading: b'data: {"type": "response.completed", ...

Expected Behavior

Open WebUI shows the complete response content, however long the session was.

Actual Behavior

400, message: Got more than 131072 bytes when reading: b'data: {"type": "response.completed", "response": {"id": "resp_65173fe0af9d46cbbf46cba37acc", "object...'.
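The failure mode can be reproduced in isolation with a reader that enforces a per-line cap. This is a simulation only: the LineTooLong class and read_line helper below are stand-ins written for this sketch, not the client library's own code, and the padded payload is invented.

```python
import io

LINE_LIMIT = 131072


class LineTooLong(ValueError):
    """Stand-in for the client's error; not the real exception class."""


def read_line(stream, limit=LINE_LIMIT):
    """Read one line, refusing any line longer than `limit` bytes."""
    # Ask for one byte more than allowed so an oversized line is detectable.
    line = stream.readline(limit + 1)
    if len(line) > limit:
        raise LineTooLong(f"Got more than {limit} bytes when reading: {line[:40]!r}...")
    return line


small = io.BytesIO(b'data: {"type": "response.output_text.delta"}\n')
large = io.BytesIO(b'data: {"type": "response.completed", "pad": "' + b"x" * 140_000 + b'"}\n')

print(read_line(small))
try:
    read_line(large)
except LineTooLong as exc:
    print(exc)
```

The small progressive event is read normally, while the oversized terminal line is rejected before its JSON is ever parsed, matching the partial-stream-then-error behavior reported above.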

Affected Component

Gateway (Telegram/Discord/Slack/WhatsApp)

Messaging Platform (if gateway-related)

No response

Debug Report

============================================================
FULL gateway.log
============================================================

--- hermes dump ---
version:          0.11.0 (2026.4.23) [(unknown)]
os:               Linux 6.19.13-orbstack-gbd1dc07b8cf4 aarch64
python:           3.13.5
openai_sdk:       2.32.0
profile:          default
hermes_home:      ~/.
model:            deepseek-v4-pro
provider:         custom
terminal:         local

api_keys:
  openrouter           not set
  openai               not set
  anthropic            not set
  anthropic_token      not set
  nous                 not set
  google/gemini        not set
  gemini               not set
  glm/zai              not set
  zai                  not set
  kimi                 not set
  minimax              not set
  deepseek             set
  dashscope            not set
  huggingface          not set
  nvidia               not set
  ai_gateway           not set
  opencode_zen         not set
  opencode_go          not set
  kilocode             not set
  firecrawl            not set
  tavily               not set
  browserbase          not set
  fal                  not set
  elevenlabs           not set
  github               not set

features:
  toolsets:           hermes-cli
  mcp_servers:        0
  memory_provider:    holographic
  gateway:            running (docker (foreground), pid 7)
  platforms:          telegram
  cron_jobs:          6 active / 6 total
  skills:             106

config_overrides:
  display.streaming: True
--- end dump ---

Operating System

docker

Python Version

No response

Hermes Version

Hermes Agent v0.11.0 (2026.4.23)

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

In gateway/platforms/api_server.py, _write_sse_responses() accumulates all emitted_items (every function_call + function_call_output with their full arguments and full result text) and packs them into a single response.completed SSE data line. For sessions with many tools or large outputs, this easily exceeds the 128KB single-line limit imposed by Python's http.client.

Proposed Fix (optional)

In _write_sse_responses(), when building final_items for the response.completed event, truncate function_call_output text to a reasonable limit (e.g., first 256 characters + "... [truncated]"). The complete output was already streamed to the client during the session.

Are you willing to submit a PR for this?

I'd like to fix this myself and submit a PR

extent analysis

TL;DR

Truncating the function_call_output text in the response.completed SSE event can help prevent the LineTooLong error.

Guidance

Identify the _write_sse_responses() function in gateway/platforms/api_server.py as the source of the issue.
Consider truncating function_call_output text to a reasonable limit (e.g., first 256 characters + "... [truncated]") to prevent exceeding the 128KB single-line limit.
Verify that the complete output was already streamed to the client during the session, making truncation a viable workaround.
Test the workaround with sessions that previously triggered the LineTooLong error to ensure it resolves the issue.

Example

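def _write_sse_responses(self, emitted_items):
    # ...
    final_items = []
    for item in emitted_items:
        # Assumed item shape: {"type": "function_call_output", "output": "<text>"}
        output = item.get("output")
        if item.get("type") == "function_call_output" and isinstance(output, str) and len(output) > 256:
            # Copy before truncating so the stored snapshot keeps the full output
            item = {**item, "output": output[:256] + "... [truncated]"}
        final_items.append(item)
    # ...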

Notes

This workaround assumes that the complete output was already streamed to the client during the session, and truncating the function_call_output text will not significantly impact the user experience.

Recommendation

Apply the workaround by truncating the function_call_output text in the response.completed SSE event, as it is a reasonable solution to prevent the LineTooLong error without requiring significant changes to the underlying architecture.
