hermes - ✅(Solved) Fix [Bug]: Streaming /v1/responses cannot be recovered after client disconnect because partial/in-progress responses are not persisted [2 pull requests, 2 comments, 2 participants]

GitHub stats
NousResearch/hermes-agent#15026 · Fetched 2026-04-25 06:24:58
Comments: 2 · Participants: 2 · Timeline: 11 · Reactions: 0
Timeline (top): labeled ×3, commented ×2, cross-referenced ×2, mentioned ×2

Fix Action

Fixed

PR fix notes

PR #15171: fix(api-server): persist response snapshot on client disconnect when store=True

Description (problem / solution / changelog)

Summary

  • persist an initial in_progress response snapshot as soon as the streaming /v1/responses path emits response.created
  • update stored snapshots on terminal completed and failed outcomes before writing the final SSE event
  • write an incomplete snapshot when the SSE client disconnects before completion so stored responses remain retrievable

Root Cause

The streaming /v1/responses implementation only wrote to ResponseStore on the clean response.completed path. If the client disconnected mid-stream, the handler interrupted and cancelled the agent task without persisting the response state, so GET /v1/responses/{id} returned 404 even after a valid response.created event had already exposed the response ID.

Impact

Clients using store=true can now recover streamed responses after disconnects because the response ID always has at least an initial stored record, and disconnects update that record to an incomplete snapshot instead of dropping it entirely.
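
The snapshot lifecycle this PR describes can be sketched roughly as follows. This is a minimal illustration and not the code from gateway/platforms/api_server.py: SnapshotStore, ClientDisconnected, run_stream, and the send callback are hypothetical names, and the real ResponseStore interface is not shown on this page.

```python
from typing import Callable, Dict, Iterable, List


class SnapshotStore:
    """Hypothetical in-memory stand-in for ResponseStore."""

    def __init__(self) -> None:
        self._items: Dict[str, dict] = {}

    def put(self, response_id: str, status: str, output: List[str]) -> None:
        # Copy the output so later appends do not mutate the stored snapshot.
        self._items[response_id] = {
            "id": response_id,
            "object": "response",
            "status": status,
            "output": list(output),
        }

    def get(self, response_id: str) -> dict:
        return self._items[response_id]


class ClientDisconnected(Exception):
    """Raised by the (hypothetical) transport when the SSE client goes away."""


def run_stream(
    store: SnapshotStore,
    response_id: str,
    chunks: Iterable[str],
    send: Callable[[str], None],
) -> str:
    output: List[str] = []
    # Persist an initial snapshot so the id is retrievable from the start.
    store.put(response_id, "in_progress", output)
    try:
        send("response.created")
        for chunk in chunks:
            output.append(chunk)
            send("response.output_text.delta")
    except ClientDisconnected:
        # Disconnect before completion: keep what was produced so far.
        store.put(response_id, "incomplete", output)
        return "incomplete"
    except Exception:
        store.put(response_id, "failed", output)
        final_event = "response.failed"
    else:
        store.put(response_id, "completed", output)
        final_event = "response.completed"
    try:
        # The terminal snapshot is written before the final SSE event.
        send(final_event)
    except ClientDisconnected:
        pass  # the stored snapshot is already terminal
    return store.get(response_id)["status"]
```

With this ordering, a lookup by response ID finds a record at every point after the stream starts, whatever happens to the connection.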

Validation

  • scripts/run_tests.sh tests/gateway/test_api_server.py -q

Changed files

  • gateway/platforms/api_server.py (modified, +74/-22)

PR #15492: fix(api-server): Implement background response run recovery

Description (problem / solution / changelog)

What does this PR do?

This PR implements true server-side background execution for streaming POST /v1/responses requests.

Previously, a stream=true Responses API request was tightly coupled to the SSE client connection. If the client disconnected, the agent execution could be interrupted and the response could not be reliably recovered. This PR decouples response execution from the SSE subscriber lifecycle so the server continues running the response in the background and allows clients to recover state through GET /v1/responses/{response_id}.
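
One way to picture the decoupling is a run object that owns the work and treats SSE connections as detachable subscribers. The sketch below assumes an asyncio-based server; ResponseRunSketch and its methods are hypothetical and are not the PR's actual ResponseRun implementation.

```python
import asyncio
from typing import List, Set


class ResponseRunSketch:
    """Hypothetical stand-in for a background response run."""

    def __init__(self, response_id: str) -> None:
        self.response_id = response_id
        self.status = "in_progress"
        self.output: List[str] = []
        self._subscribers: Set[asyncio.Queue] = set()

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers.add(queue)
        return queue

    def detach(self, queue: asyncio.Queue) -> None:
        # A client disconnect only removes the subscriber; the run keeps going.
        self._subscribers.discard(queue)

    async def execute(self, chunks: List[str]) -> None:
        for chunk in chunks:
            await asyncio.sleep(0)  # stand-in for real agent work
            self.output.append(chunk)
            for queue in self._subscribers:
                queue.put_nowait(chunk)
        self.status = "completed"


async def demo() -> ResponseRunSketch:
    run = ResponseRunSketch("resp_demo")
    queue = run.subscribe()
    task = asyncio.create_task(run.execute(["a", "b", "c"]))
    await queue.get()   # the client sees the first chunk...
    run.detach(queue)   # ...and then disconnects
    await task          # the run still finishes in the background
    return run
```

Calling asyncio.run(demo()) returns a run whose state can still be read after the subscriber has gone, which is the property a later GET by response ID relies on.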

Current limitations

This PR does not attempt to make stream and store fully orthogonal across all /v1/responses modes.

The issue addressed here is specific to stream=true: durable streaming responses should continue running after SSE disconnect and should be recoverable by response ID. This PR therefore unifies the two streaming modes around ResponseRun:

  • stream=true + store=true: durable/background/recoverable
  • stream=true + store=false: ephemeral/connection-owned, cancelled on disconnect

The existing stream=false synchronous path is left unchanged.

Making all four combinations fully orthogonal would require moving non-streaming execution onto ResponseRun as well. That is a larger lifecycle refactor touching cancellation semantics, conversation active-state tracking, idempotency, error handling, tests, and client expectations. Since that is outside the scope of the reported background streaming issue, it is deferred to a future PR.

Related Issue

Fixes #15026

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • Added ResponseRun / ResponseRunManager to own background /v1/responses execution lifecycle.
  • Changed streaming /v1/responses behavior so SSE client disconnects only detach the subscriber and do not cancel the underlying agent run.
  • Added active response recovery through GET /v1/responses/{response_id}.
  • Required store=true for stream=true Responses API requests.
    • stream=true + store=false now returns 400 store_required.
  • Added streaming Idempotency-Key support.
    • Matching retries can reattach to active runs or replay stored responses.
    • Conflicting retries return 409 idempotency_key_conflict.
  • Added bounded queues and concurrent active response run limits.
  • Added response.snapshot SSE event for active-run reattach/idempotent retry.
  • Added SSE keepalive comments for response streams.
  • Added CORS headers for stored streaming response replay.
  • Updated previous_response_id and conversation chaining so only completed responses are accepted.
  • Reworked conversation tracking to separate:
    • latest_completed_response_id
    • active_response_id
  • Added migration support from the legacy conversations.response_id schema.
  • Ensured DELETE and LRU eviction clear conversation references and idempotency mappings.
  • Added shutdown/delete cancellation handling for active background response runs.
  • Removed the legacy private _write_sse_responses(...) path in favor of ResponseRun.
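
Two of the items above concern SSE framing: the response.snapshot event and keepalive comments. As a rough sketch of the wire format only, the helpers below build a named SSE event and a comment line. In the SSE format a line starting with a colon is a comment that clients ignore. The function names and the snapshot payload shape are assumptions, since the PR text does not show them.

```python
import json


def sse_event(event: str, data: dict) -> str:
    """Format one SSE message with an event name and a JSON data line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def sse_keepalive() -> str:
    """Format an SSE comment; it carries no event and is ignored by clients."""
    return ": keepalive\n\n"


# Hypothetical payload: the real response.snapshot body is not documented here.
snapshot_frame = sse_event(
    "response.snapshot",
    {"type": "response.snapshot", "response": {"id": "resp_xxx", "status": "in_progress"}},
)
```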

How to Test


1. Run the targeted test suite

~/.hermes/hermes-agent/venv/bin/python -m pytest -o addopts= tests/gateway/test_api_server.py tests/gateway/test_sse_agent_cancel.py -q

Expected result:

140 passed

2. Verify stream=true + store=false works as ephemeral streaming

Run:

curl -N http://127.0.0.1:8642/v1/responses \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hermes-agent",
    "input": "Reply with exactly: ephemeral-ok",
    "stream": true,
    "store": false
  }'

Expected result:

  • HTTP 200
  • SSE stream includes response.created
  • SSE stream includes response.completed
  • stream ends with: data: [DONE]

Copy the response.id from the stream, then run:

curl -i http://127.0.0.1:8642/v1/responses/<response_id> \
  -H "Authorization: Bearer <API_KEY>"

Expected result:

  • HTTP 404
  • because store=false responses are ephemeral and not recoverable

3. Verify stream=true + store=false rejects Idempotency-Key

Run:

curl -i http://127.0.0.1:8642/v1/responses \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: ephemeral-test-key" \
  -d '{
    "model": "hermes-agent",
    "input": "hello",
    "stream": true,
    "store": false
  }'

Expected result:

  • HTTP 400
  • response error code: "idempotency_requires_store"

This confirms that idempotent streaming replay/reattach is only available for durable store=true streams.


4. Verify stream=true + store=true survives client disconnect

Start a durable streaming response:

curl -N http://127.0.0.1:8642/v1/responses \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hermes-agent",
    "input": "Take a little time, then reply with exactly: background-recovered-ok",
    "stream": true,
    "store": true
  }'

After receiving the first response.created event, stop the client with Ctrl+C.

Expected server behavior:

  • SSE subscriber disconnects
  • agent run continues in the background
  • run is not interrupted just because the client disconnected

Then recover the response:

curl http://127.0.0.1:8642/v1/responses/<response_id> \
  -H "Authorization: Bearer <API_KEY>"

Expected result:

  • while still running, response may show status: "in_progress"
  • after completion, response shows "status": "completed"

5. Verify durable streaming idempotency retry

Start a durable streaming request with an idempotency key:

curl -N http://127.0.0.1:8642/v1/responses \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: durable-stream-test-1" \
  -d '{
    "model": "hermes-agent",
    "input": "Reply with exactly: durable-idempotency-ok",
    "stream": true,
    "store": true
  }'

Retry the exact same request with the same Idempotency-Key:

curl -N http://127.0.0.1:8642/v1/responses \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: durable-stream-test-1" \
  -d '{
    "model": "hermes-agent",
    "input": "Reply with exactly: durable-idempotency-ok",
    "stream": true,
    "store": true
  }'

Expected result:

  • retry returns the same response.id
  • if the original run is still active, retry starts with event: response.snapshot
  • if the original run already completed, retry streams/replays the stored response
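
The retry rules here, together with the conflict case in the next step, amount to keying each request by its Idempotency-Key and comparing a fingerprint of the body. The sketch below is one possible way to do that. IdempotencyIndex, fingerprint, and the use of SHA-256 over canonical JSON are assumptions, not details taken from the PR.

```python
import hashlib
import json
from typing import Dict, Tuple


def fingerprint(body: dict) -> str:
    """Hash a canonical JSON form so key order does not affect the result."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class IdempotencyIndex:
    """Hypothetical map of Idempotency-Key to (body fingerprint, response id)."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, str]] = {}

    def resolve(self, key: str, body: dict, new_response_id: str) -> Tuple[int, str]:
        digest = fingerprint(body)
        existing = self._entries.get(key)
        if existing is None:
            # First use of this key: remember it and start a new response.
            self._entries[key] = (digest, new_response_id)
            return 200, new_response_id
        stored_digest, response_id = existing
        if stored_digest != digest:
            # Same key, different body: reject as a conflict.
            return 409, "idempotency_key_conflict"
        # Matching retry: reuse the original response id.
        return 200, response_id
```

A matching retry gets the original response ID back, after which the server can reattach to the active run or replay the stored response.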

6. Verify idempotency conflict handling

Reuse the same idempotency key with a different request body:

curl -i http://127.0.0.1:8642/v1/responses \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: durable-stream-test-1" \
  -d '{
    "model": "hermes-agent",
    "input": "This is a different request body.",
    "stream": true,
    "store": true
  }'

Expected result:

  • HTTP 409
  • response error code: "idempotency_key_conflict"

7. Verify conversation behavior for store=false streaming

First create a stored checkpoint in a conversation:

curl http://127.0.0.1:8642/v1/responses \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hermes-agent",
    "input": "Remember the codeword pineapple. Reply exactly: checkpoint-ok",
    "conversation": "pr-test-conversation",
    "store": true
  }'

Then send an ephemeral streaming response in the same conversation:

curl -N http://127.0.0.1:8642/v1/responses \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hermes-agent",
    "input": "This is a temporary streamed turn. Reply exactly: ephemeral-conversation-ok",
    "conversation": "pr-test-conversation",
    "stream": true,
    "store": false
  }'

Expected result:

  • request succeeds
  • it may read existing conversation history
  • it does not update latest_completed_response_id
  • it does not set active_response_id
  • it is not recoverable via GET /v1/responses/{response_id}

Then send another stored request in the same conversation:

curl http://127.0.0.1:8642/v1/responses \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hermes-agent",
    "input": "What codeword did I ask you to remember? Reply with only the codeword.",
    "conversation": "pr-test-conversation",
    "store": true
  }'

Expected result:

  • request succeeds
  • conversation continues from the last stored checkpoint, not from the ephemeral streamed turn

8. Verify active conversation protection

Start a durable background stream in a conversation:

curl -N http://127.0.0.1:8642/v1/responses \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hermes-agent",
    "input": "Take a little time before answering.",
    "conversation": "active-conversation-test",
    "stream": true,
    "store": true
  }'

While it is still running, send another request to the same conversation:

curl -i http://127.0.0.1:8642/v1/responses \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hermes-agent",
    "input": "Follow-up while previous response is active.",
    "conversation": "active-conversation-test",
    "store": true
  }'

Expected result:

  • HTTP 409
  • response error code: "conversation_response_not_completed"

This confirms that conversation ordering is protected while a durable response is active.
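
The ordering protection can be pictured as a small state record per conversation that tracks the two IDs named in the change list. The sketch below is illustrative only: ConversationState, ConversationBusy, and the begin/complete methods are hypothetical, and only the two field names and the error code come from the PR description.

```python
from dataclasses import dataclass
from typing import Optional


class ConversationBusy(Exception):
    """Maps to the 409 error shown above."""

    code = "conversation_response_not_completed"


@dataclass
class ConversationState:
    latest_completed_response_id: Optional[str] = None
    active_response_id: Optional[str] = None

    def begin(self, response_id: str) -> Optional[str]:
        """Start a durable response and return the checkpoint to chain from."""
        if self.active_response_id is not None:
            raise ConversationBusy(self.active_response_id)
        self.active_response_id = response_id
        return self.latest_completed_response_id

    def complete(self, response_id: str) -> None:
        """Promote the active response to the latest completed checkpoint."""
        if self.active_response_id == response_id:
            self.active_response_id = None
            self.latest_completed_response_id = response_id
```

Keeping the two IDs separate means a new turn always chains from a completed response, never from one that is still running.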

Checklist


Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: Ubuntu 26.04

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Changed files

  • gateway/platforms/api_server.py (modified, +856/-553)
  • tests/gateway/test_api_server.py (modified, +559/-98)


Bug Description

When using the OpenAI-compatible POST /v1/responses endpoint with:

{
  "stream": true,
  "store": true
}

the server currently only persists the response to ResponseStore after the streaming run completes successfully.

If the client disconnects mid-stream — for example app killed, network interruption, foreground/background transition, or SSE connection dropped — Hermes interrupts and cancels the agent task. The partially generated response is not stored, and subsequent:

GET /v1/responses/{response_id}

returns 404 Response not found.

This makes it impossible for clients to implement ChatGPT/Claude-like recovery after stream interruption, even though the client already received a response.created event with a valid response_id.

Steps to Reproduce

  1. Start a streaming response:

curl http://127.0.0.1:8642/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "hermes-agent",
    "input": "Do something that takes a while or calls tools.",
    "stream": true,
    "store": true
  }'

  2. Wait until the client receives:

event: response.created
data: {"type":"response.created","response":{"id":"resp_xxx", ...}}

  3. Optionally wait until some text deltas or tool events arrive:

response.output_text.delta
response.output_item.added
response.output_item.done

  4. Disconnect the client before response.completed.

  5. Try to retrieve the response:

curl http://127.0.0.1:8642/v1/responses/resp_xxx \
  -H "Authorization: Bearer $API_KEY"
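
The reproduction depends on the client capturing the response ID from the response.created event before it disconnects. A minimal sketch of that extraction is shown below. It assumes the event and data lines have the shape shown in the steps above, and response_id_from_sse is a hypothetical helper name.

```python
import json
from typing import Iterable, Optional


def response_id_from_sse(lines: Iterable[str]) -> Optional[str]:
    """Return the id carried by the first response.created event, if any."""
    event = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:") and event == "response.created":
            payload = json.loads(line[len("data:"):].strip())
            return payload["response"]["id"]
        elif line == "":
            # A blank line ends the current SSE message.
            event = None
    return None
```

A client would store the returned ID durably (for example on disk) so that it survives an app restart and can be used for the later GET.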

Expected Behavior

When store=true, after the server emits response.created, the response should be queryable via:

GET /v1/responses/{response_id}

At minimum, the store should contain a recoverable snapshot with a meaningful status:

{
  "id": "resp_xxx",
  "object": "response",
  "status": "in_progress",
  "output": []
}

During streaming, the stored response should be updated as output arrives:

  • response.output_text.delta
  • response.output_text.done
  • response.output_item.added
  • response.output_item.done
  • response.failed
  • response.completed

If the client disconnects before completion, the server should continue running in background.

For store=true, do not cancel the agent task on SSE client disconnect. Let the response continue to completion and persist the final response to ResponseStore.

This gives ChatGPT/Claude-like recovery:

  1. client disconnects
  2. server continues generation
  3. client relaunches
  4. client calls GET /v1/responses/{response_id}
  5. final response is available once completed
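
On the client side, the recovery flow above reduces to polling the retrieve endpoint until the response reaches a terminal status. The sketch below assumes that completed, failed, and incomplete are the terminal statuses. wait_for_response and http_fetch are hypothetical helpers, and the base URL and API key are placeholders supplied by the caller.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumed terminal statuses; queued and in_progress mean "keep polling".
TERMINAL_STATUSES = {"completed", "failed", "incomplete"}


def http_fetch(base_url: str, response_id: str, api_key: str) -> dict:
    """GET /v1/responses/{response_id} and decode the JSON body."""
    request = urllib.request.Request(
        f"{base_url}/v1/responses/{response_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def wait_for_response(
    fetch: Callable[[], dict],
    interval: float = 1.0,
    max_attempts: int = 60,
) -> dict:
    """Call fetch() until the response is terminal or attempts run out."""
    for _ in range(max_attempts):
        response = fetch()
        if response.get("status") in TERMINAL_STATUSES:
            return response
        time.sleep(interval)
    raise TimeoutError("response did not reach a terminal status")
```

For example, a relaunched client might call wait_for_response(lambda: http_fetch("http://127.0.0.1:8642", saved_id, api_key)), where saved_id is the ID it recorded from response.created.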

Actual Behavior

GET /v1/responses/{response_id} returns:

{
  "error": {
    "message": "Response not found: resp_xxx",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}

The client cannot recover:

  • partial assistant text
  • completed tool calls
  • tool outputs
  • final assistant message

The response id previously emitted by response.created is therefore not useful for recovery unless the stream reaches response.completed.

Affected Component

Gateway (Telegram/Discord/Slack/WhatsApp)

Messaging Platform (if gateway-related)

No response

Debug Report

Report       https://paste.rs/veGtu
  agent.log    https://paste.rs/3eRBA
  gateway.log  https://paste.rs/BJmsi

Operating System

Ubuntu 26.04

Python Version

3.11.15

Hermes Version

0.11.0

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

No response

Proposed Fix (optional)

For store=true, decouple the agent task from the SSE connection:

  • SSE connection is only a live transport
  • response execution continues independently
  • ResponseStore is the source of truth
  • GET /v1/responses/{response_id} can return:
    • queued
    • in_progress
    • completed
    • failed
    • incomplete

Client disconnect should not necessarily cancel response execution if the user requested store=true.

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

extent analysis

TL;DR

Decoupling the agent task from the SSE connection and persisting the response to ResponseStore as soon as response.created is emitted can fix the issue.

Guidance

  • Modify the POST /v1/responses endpoint to store the response in ResponseStore immediately after emitting response.created, rather than only after the stream completes successfully.
  • Update the agent task to continue running in the background even if the client disconnects, allowing the response to complete and be persisted to ResponseStore.
  • Implement a mechanism to update the stored response as output arrives during streaming, including response.output_text.delta, response.output_text.done, response.output_item.added, response.output_item.done, response.failed, and response.completed events.
  • Ensure that GET /v1/responses/{response_id} returns the current status of the response, such as queued, in_progress, completed, failed, or incomplete, even if the client disconnected before completion.

Example

# Sketch of storing a snapshot in ResponseStore after response.created.
# response_store is a plain dict standing in for ResponseStore (assumption).
response_store = {}

def handle_response_created(response_id):
    # Store the response with its initial status
    response_store[response_id] = {"id": response_id, "status": "in_progress", "output": []}

# Sketch of updating the stored snapshot during streaming.
# event is assumed to be a dict with a "type" key and, for deltas, a "delta" key.
def handle_output_event(response_id, event):
    # Retrieve the current snapshot
    response = response_store[response_id]
    # Update the snapshot based on the event type
    if event["type"] == "response.output_text.delta":
        response["output"].append(event["delta"])
    elif event["type"] in ("response.completed", "response.failed"):
        response["status"] = event["type"].removeprefix("response.")
    # Store the updated snapshot
    response_store[response_id] = response

Notes

The proposed fix requires changes to the POST /v1/responses endpoint, the agent task, and the ResponseStore implementation. Additionally, the `GET /v1/responses
