hermes - ✅(Solved) Fix [Bug]: Responses SSE: `response.completed` event exceeds 128KB line limit, breaking Open WebUI for long sessions [1 pull requests, 1 participants]

GitHub stats: NousResearch/hermes-agent#18021 (fetched 2026-05-01 05:54:17)
Comments: 0 · Participants: 1 · Timeline events: 4 · Reactions: 0
Timeline (top): labeled ×3, cross-referenced ×1

When using the Responses API (POST /v1/responses) with Open WebUI, complex agent sessions with many tool calls cause a LineTooLong error because the final response.completed SSE event exceeds Python's HTTP line limit of 131,072 bytes.

Error Message

Open WebUI receives a partial stream, then raises:

Error processing chat payload: 400, message: Got more than 131072 bytes when reading: b'data: {"type": "response.completed", ...


Root Cause

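In gateway/platforms/api_server.py, _write_sse_responses() accumulates every emitted function_call and function_call_output item (with full arguments and full result text) in emitted_items and repeats that whole list in the terminal response.completed event. Because that event is written as a single data: line, sessions with many or large tool outputs produce a line longer than 131,072 bytes, even though those outputs were already streamed earlier.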

Fix Action

Workaround

Use Chat Completions API mode for long-running tasks (no monolithic final event). Use Responses mode for short interactive sessions where tool count stays low.

PR fix notes

PR #18034: fix(gateway): compact responses terminal tool outputs

Description (problem / solution / changelog)

Summary

  • Compact streamed Responses API terminal payloads so response.completed does not repeat large tool outputs.
  • Keep the full stored response snapshot intact for GET /v1/responses/{id} / chaining.
  • Add a regression test covering large streamed tool outputs staying below the 128 KiB client line limit.

Root cause

POST /v1/responses streaming accumulated every emitted tool output in emitted_items, then reused that full list in the terminal response.completed SSE event. Long sessions with multiple large tool results could therefore create a single data: line larger than Python/http.client's 131072-byte line limit, even though those tool outputs had already been streamed earlier.
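A quick size check shows why two large tool results are enough. This is a sketch that assumes a simplified terminal payload shape and ASCII tool output; terminal_sse_line is a hypothetical helper, and the real payload carries additional fields (such as the response id), so real lines are somewhat larger. One 70,000-character output fits under the cap, but two contribute about 140,000 bytes on their own, which already exceeds 131,072.

```python
import json

LINE_LIMIT = 131_072  # bytes, the per-line cap cited in the issue


def terminal_sse_line(output_items: list[dict]) -> bytes:
    """Serialize a simplified response.completed event as a single SSE data: line.

    The event shape is an assumption for illustration, not the Hermes payload.
    """
    event = {"type": "response.completed", "response": {"object": "response", "output": output_items}}
    return b"data: " + json.dumps(event).encode("utf-8") + b"\n"


# Hypothetical session: two tool results of 70,000 characters each.
items = [
    {"type": "function_call_output", "call_id": f"call_{i}", "output": "x" * 70_000}
    for i in range(2)
]
line = terminal_sse_line(items)
print(len(line) > LINE_LIMIT)  # True: the two outputs alone add up to ~140,000 bytes
```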

Fix

Before writing terminal response.completed / response.failed events, compact function_call_output items to an omission marker. The persisted response snapshot remains unmodified, so retrieval and previous_response_id behavior still retain the full data.
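The compaction step can be sketched as follows, assuming emitted items follow the Responses API item shape {"type": "function_call_output", "call_id": ..., "output": ...}. compact_terminal_items, OMISSION_MARKER, and the read_file tool name are hypothetical; this is not the code from PR #18034.

```python
import json

# Hypothetical marker text; the wording used in PR #18034 is not shown here.
OMISSION_MARKER = "[output omitted from terminal event; already streamed]"


def compact_terminal_items(items: list[dict]) -> list[dict]:
    """Return the item list for a terminal response.completed / response.failed event.

    function_call_output items get their output replaced by OMISSION_MARKER;
    every other item is passed through unchanged. The input list, which stands
    in for the stored response snapshot, is never mutated.
    """
    compacted = []
    for item in items:
        if item.get("type") == "function_call_output":
            # New dict, so the snapshot's copy keeps the full output.
            item = {**item, "output": OMISSION_MARKER}
        compacted.append(item)
    return compacted


stored = [
    {"type": "function_call", "call_id": "call_1", "name": "read_file", "arguments": "{}"},
    {"type": "function_call_output", "call_id": "call_1", "output": "x" * 70_000},
]
terminal = compact_terminal_items(stored)
event = {"type": "response.completed", "response": {"output": terminal}}
print(len(json.dumps(event)))    # well under 1 KB instead of ~70 KB
print(len(stored[1]["output"]))  # 70000: the snapshot still holds the full output
```

Keeping call_id on the compacted item lets a client match the placeholder to the output it already received through the progressive events, while GET /v1/responses/{id} and previous_response_id chaining read from the untouched snapshot.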

Regression coverage

The new test simulates two 70KB tool outputs and asserts:

  • the terminal response.completed SSE line is below 131072 bytes;
  • the large output is not repeated in the terminal payload;
  • the full output was still streamed and remains present in the stored response snapshot.
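The three assertions above can be expressed as a check over raw SSE lines. This is a sketch, not the test from PR #18034: check_terminal_event and sse are hypothetical helpers, each event is assumed to be one `data: {...}` line written with json.dumps defaults, and the response.output_item.done event name is an assumption.

```python
import json

LINE_LIMIT = 131_072  # bytes


def check_terminal_event(sse_lines: list[bytes], stored_items: list[dict], large_output: str) -> None:
    """Assert the three regression properties on raw SSE lines (sketch).

    Assumes every data: line carries a JSON object, so the output appears in
    its JSON-escaped form.
    """
    needle = json.dumps(large_output)[1:-1].encode()  # escaped text, quotes stripped
    terminal = [
        line for line in sse_lines
        if line.startswith(b"data: ")
        and json.loads(line[len(b"data: "):]).get("type") == "response.completed"
    ]
    assert len(terminal) == 1, "expected exactly one response.completed line"
    assert len(terminal[0]) < LINE_LIMIT, "terminal line exceeds the client limit"
    assert needle not in terminal[0], "large output repeated in terminal payload"
    assert any(needle in line for line in sse_lines if line is not terminal[0]), "output never streamed"
    assert any(item.get("output") == large_output for item in stored_items), "snapshot lost the output"


def sse(event: dict) -> bytes:
    """Encode one event as a single SSE data: line."""
    return b"data: " + json.dumps(event).encode() + b"\n"


big = "x" * 70_000
item = {"type": "function_call_output", "call_id": "call_1", "output": big}
stream = [
    sse({"type": "response.output_item.done", "item": item}),
    sse({"type": "response.completed", "response": {"output": [{**item, "output": "[omitted]"}]}}),
]
check_terminal_event(stream, [item], big)
print("all three properties hold")
```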

Testing

  • scripts/run_tests.sh tests/gateway/test_api_server.py::TestResponsesStreaming::test_response_completed_omits_repeated_large_tool_outputs -q (RED before fix, GREEN after fix)
  • scripts/run_tests.sh tests/gateway/test_api_server.py::TestResponsesStreaming -q
  • scripts/run_tests.sh tests/gateway/test_api_server.py -q

Closes #18021

Changed files

  • gateway/platforms/api_server.py (modified, +30/-5)
  • tests/gateway/test_api_server.py (modified, +64/-0)


---
Raw issue body

Bug Description

Description

When using the Responses API (POST /v1/responses) with Open WebUI, complex agent sessions with many tool calls cause a LineTooLong error because the final response.completed SSE event exceeds Python's HTTP line limit of 131,072 bytes.

Why it's a Hermes issue (not Open WebUI)

  • The 128KB limit comes from Python stdlib http.client._MAXLINE * 2 (65,536 × 2 = 131,072 bytes) and is not configurable
  • Open WebUI correctly renders the progressive SSE events (response.output_item.added, response.output_text.delta)
  • The full tool outputs were already transmitted during execution via those progressive events
  • The response.completed event only needs summaries/references, not the full payload repeated
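These points hinge on the client enforcing a fixed per-line cap. As an illustration only (this is not Open WebUI's or http.client's actual code), the sketch below shows a reader that caps lines at 131,072 bytes using the readline(limit + 1) idiom; LINE_LIMIT, LineTooLongError, and read_lines_capped are hypothetical names.

```python
import io

# 131,072 bytes; the issue attributes this to http.client._MAXLINE (65,536) * 2.
LINE_LIMIT = 131_072


class LineTooLongError(Exception):
    """Raised when a single line exceeds the reader's limit (illustrative)."""


def read_lines_capped(stream: io.BufferedIOBase, limit: int = LINE_LIMIT):
    """Yield lines from a byte stream, rejecting any line longer than `limit` bytes.

    Reading one byte past the cap is how the reader tells "exactly at the
    limit" apart from "over the limit".
    """
    while True:
        line = stream.readline(limit + 1)
        if not line:  # end of stream
            return
        if len(line) > limit:
            raise LineTooLongError(f"Got more than {limit} bytes when reading: {line[:40]!r}...")
        yield line


small = io.BytesIO(b'data: {"type": "response.output_text.delta"}\n')
print(len(list(read_lines_capped(small))))  # 1

huge = io.BytesIO(b'data: {"type": "response.completed", "output": "' + b"x" * LINE_LIMIT + b'"}\n')
try:
    list(read_lines_capped(huge))
except LineTooLongError as exc:
    print("rejected:", str(exc)[:35])
```

Because the cap applies per line rather than per stream, the many progressive events pass even when their total size is large; only the single oversized terminal line is rejected, which is why the fix has to shrink that one event.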

Environment

  • hermes-agent: nousresearch/hermes-agent:v2026.4.23 (Docker)
  • Open WebUI: ghcr.io/open-webui/open-webui:main
  • Model: DeepSeek V4 Pro via custom provider
  • Setup: Two Docker containers on same network, Responses API mode

Workaround

Use Chat Completions API mode for long-running tasks (no monolithic final event). Use Responses mode for short interactive sessions where tool count stays low.

Steps to Reproduce

  1. Run hermes-agent with API_SERVER_ENABLED=true, API_SERVER_HOST=0.0.0.0
  2. Connect Open WebUI via Responses API type
  3. Send a multi-step task that triggers ~10+ tool calls with non-trivial outputs (e.g., data gathering, file reading)
  4. Open WebUI receives a partial stream, then raises: Error processing chat payload: 400, message: Got more than 131072 bytes when reading: b'data: {"type": "response.completed", ...

Expected Behavior

Show the whole response content.

Actual Behavior

400, message: Got more than 131072 bytes when reading: b'data: {"type": "response.completed", "response": {"id": "resp_65173fe0af9d46cbbf46cba37acc", "object...'.


Affected Component

Gateway (Telegram/Discord/Slack/WhatsApp)

Messaging Platform (if gateway-related)

No response

Debug Report

============================================================
FULL gateway.log
============================================================

--- hermes dump ---
version:          0.11.0 (2026.4.23) [(unknown)]
os:               Linux 6.19.13-orbstack-gbd1dc07b8cf4 aarch64
python:           3.13.5
openai_sdk:       2.32.0
profile:          default
hermes_home:      ~/.
model:            deepseek-v4-pro
provider:         custom
terminal:         local

api_keys:
  openrouter           not set
  openai               not set
  anthropic            not set
  anthropic_token      not set
  nous                 not set
  google/gemini        not set
  gemini               not set
  glm/zai              not set
  zai                  not set
  kimi                 not set
  minimax              not set
  deepseek             set
  dashscope            not set
  huggingface          not set
  nvidia               not set
  ai_gateway           not set
  opencode_zen         not set
  opencode_go          not set
  kilocode             not set
  firecrawl            not set
  tavily               not set
  browserbase          not set
  fal                  not set
  elevenlabs           not set
  github               not set

features:
  toolsets:           hermes-cli
  mcp_servers:        0
  memory_provider:    holographic
  gateway:            running (docker (foreground), pid 7)
  platforms:          telegram
  cron_jobs:          6 active / 6 total
  skills:             106

config_overrides:
  display.streaming: True
--- end dump ---

Operating System

docker

Python Version

No response

Hermes Version

Hermes Agent v0.11.0 (2026.4.23)

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

In gateway/platforms/api_server.py, _write_sse_responses() accumulates all emitted_items (every function_call + function_call_output with their full arguments and full result text) and packs them into a single response.completed SSE data line. For sessions with many tools or large outputs, this easily exceeds the 128KB single-line limit imposed by Python's http.client.

Proposed Fix (optional)

In _write_sse_responses(), when building final_items for the response.completed event, truncate function_call_output text to a reasonable limit (e.g., first 256 characters + "... [truncated]"). The complete output was already streamed to the client during the session.

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

extent analysis

TL;DR

Truncating the function_call_output text in the response.completed SSE event can help prevent the LineTooLong error.

Guidance

  • Identify the _write_sse_responses() function in gateway/platforms/api_server.py as the source of the issue.
  • Consider truncating function_call_output text to a reasonable limit (e.g., first 256 characters + "... [truncated]") to prevent exceeding the 128KB single-line limit.
  • Verify that the complete output was already streamed to the client during the session, making truncation a viable workaround.
  • Test the workaround with sessions that previously triggered the LineTooLong error to ensure it resolves the issue.

Example

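def _write_sse_responses(self, emitted_items):
    # ...
    final_items = []
    for item in emitted_items:
        # Only tool results carry large text. Assumes the Responses API item shape
        # {"type": "function_call_output", "call_id": ..., "output": "..."}.
        output = item.get("output") if item.get("type") == "function_call_output" else None
        if isinstance(output, str) and len(output) > 256:
            # Build a new dict instead of mutating item, so the stored snapshot keeps the full text.
            item = {**item, "output": output[:256] + "... [truncated]"}
        final_items.append(item)
    # ...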

Notes

This workaround assumes that the complete output was already streamed to the client during the session, and that truncating the function_call_output text in the terminal event will not significantly impact the user experience.

Recommendation

Apply the workaround by truncating the function_call_output text in the response.completed SSE event, as it is a reasonable solution to prevent the LineTooLong error without requiring significant changes to the underlying architecture.
