hermes - ✅(Solved) Fix [Bug]: Streaming /v1/responses cannot be recovered after client disconnect because partial/in-progress responses are not persisted [2 pull requests, 2 comments, 2 participants]

hermes · 2026-04-24 09:31:13

GitHub stats

NousResearch/hermes-agent#15026 • Fetched 2026-04-25 06:24:58

Author

zhboner

Participants

briandevans

zhboner

Timeline (top)

labeled ×3, commented ×2, cross-referenced ×2, mentioned ×2

Fix Action

Fixed

Fixed by PR: fix(api-server): persist response snapshot on client disconnect when store=True (https://github.com/NousResearch/hermes-agent/pull/15171)

PR fix notes

PR #15171: fix(api-server): persist response snapshot on client disconnect when store=True

Repository: NousResearch/hermes-agent
Author: UgwujaGeorge
State: closed | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/15171

Description (problem / solution / changelog)

Summary

persist an initial in_progress response snapshot as soon as the streaming /v1/responses path emits response.created
update stored snapshots on terminal completed and failed outcomes before writing the final SSE event
write an incomplete snapshot when the SSE client disconnects before completion so stored responses remain retrievable

Root Cause

The streaming /v1/responses implementation only wrote to ResponseStore on the clean response.completed path. If the client disconnected mid-stream, the handler interrupted and cancelled the agent task without persisting the response state, so GET /v1/responses/{id} returned 404 even after a valid response.created event had already exposed the response ID.
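
The three persistence points in the summary can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `InMemoryResponseStore`, `stream_with_snapshots`, the `output_text` field, and the `disconnect_after` parameter are all hypothetical names introduced here to simulate a client dropping mid-stream.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InMemoryResponseStore:
    """Hypothetical stand-in for ResponseStore; not the Hermes class."""
    items: dict = field(default_factory=dict)

    def put(self, response_id: str, snapshot: dict) -> None:
        self.items[response_id] = dict(snapshot)

    def get(self, response_id: str) -> Optional[dict]:
        return self.items.get(response_id)


def stream_with_snapshots(store, response_id, deltas, disconnect_after=None):
    """Consume text deltas while keeping a stored snapshot for the response ID.

    disconnect_after simulates the SSE client dropping after that many deltas.
    """
    text = ""
    # Persist as soon as response.created would be emitted.
    store.put(response_id, {"id": response_id, "status": "in_progress", "output_text": text})
    for index, delta in enumerate(deltas):
        if disconnect_after is not None and index >= disconnect_after:
            # Client went away: keep what was generated, marked incomplete.
            store.put(response_id, {"id": response_id, "status": "incomplete", "output_text": text})
            return
        text += delta
    # The terminal snapshot is written before the final SSE event would be sent.
    store.put(response_id, {"id": response_id, "status": "completed", "output_text": text})
```

With this shape, a lookup by response ID can never miss once `response.created` has gone out, which is the property the 404 in the original report violated.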

Impact

Clients using store=true can now recover streamed responses after disconnects because the response ID always has at least an initial stored record, and disconnects update that record to an incomplete snapshot instead of dropping it entirely.

Validation

scripts/run_tests.sh tests/gateway/test_api_server.py -q

Changed files

gateway/platforms/api_server.py (modified, +74/-22)

PR #15492: fix(api-server): Implement background response run recovery

Repository: NousResearch/hermes-agent
Author: zhboner
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/15492

Description (problem / solution / changelog)

What does this PR do?

This PR implements true server-side background execution for streaming POST /v1/responses requests.

Previously, a stream=true Responses API request was tightly coupled to the SSE client connection. If the client disconnected, the agent execution could be interrupted and the response could not be reliably recovered. This PR decouples response execution from the SSE subscriber lifecycle so the server continues running the response in the background and allows clients to recover state through GET /v1/responses/{response_id}.

Current limitations

This PR does not attempt to make stream and store fully orthogonal across all /v1/responses modes.

The issue addressed here is specific to stream=true: durable streaming responses should continue running after SSE disconnect and should be recoverable by response ID. This PR therefore unifies the two streaming modes around ResponseRun:

stream=true + store=true: durable/background/recoverable
stream=true + store=false: ephemeral/connection-owned, cancelled on disconnect

The existing stream=false synchronous path is left unchanged.

Making all four combinations fully orthogonal would require moving non-streaming execution onto ResponseRun as well. That is a larger lifecycle refactor touching cancellation semantics, conversation active-state tracking, idempotency, error handling, tests, and client expectations. Since that is outside the scope of the reported background streaming issue, it is deferred to a future PR.
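
To make the decoupling concrete, here is a small asyncio sketch of a run whose task outlives its subscriber. The class borrows the name `ResponseRun` from the PR, but every method and attribute below (`start`, `attach`, `detach`, `subscribers`, `text`) is an assumption made for illustration; the real class in `api_server.py` may look quite different.

```python
import asyncio


class ResponseRun:
    """Illustrative only: owns the agent task independently of any SSE subscriber."""

    def __init__(self, response_id):
        self.response_id = response_id
        self.status = "in_progress"
        self.text = ""
        self.subscribers = set()
        self.task = None

    def start(self, chunks):
        # The task belongs to the run, not to the HTTP request handler.
        self.task = asyncio.create_task(self._execute(chunks))

    async def _execute(self, chunks):
        for chunk in chunks:
            await asyncio.sleep(0)  # stand-in for real agent work
            self.text += chunk
            for queue in list(self.subscribers):
                queue.put_nowait(chunk)
        self.status = "completed"

    def attach(self):
        queue = asyncio.Queue()
        self.subscribers.add(queue)
        return queue

    def detach(self, queue):
        # A disconnect only removes the subscriber; the task keeps running.
        self.subscribers.discard(queue)


async def demo():
    run = ResponseRun("resp_demo")
    subscriber = run.attach()
    run.start(["back", "ground", "-ok"])
    await subscriber.get()   # the client sees the first chunk...
    run.detach(subscriber)   # ...then disconnects
    await run.task           # the run still finishes
    return run
```

The design choice worth noting is that nothing in `detach` touches `self.task`; cancellation would have to be requested explicitly, for example on shutdown or delete.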

Related Issue

Fixes #15026

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
🔒 Security fix
📝 Documentation update
✅ Tests (adding or improving test coverage)
♻️ Refactor (no behavior change)
🎯 New skill (bundled or hub)

Changes Made

Added ResponseRun / ResponseRunManager to own background /v1/responses execution lifecycle.
Changed streaming /v1/responses behavior so SSE client disconnects only detach the subscriber and do not cancel the underlying agent run.
Added active response recovery through GET /v1/responses/{response_id}.
Required store=true for stream=true Responses API requests.
- stream=true + store=false now returns 400 store_required.
Added streaming Idempotency-Key support.
- Matching retries can reattach to active runs or replay stored responses.
- Conflicting retries return 409 idempotency_key_conflict.
Added bounded queues and concurrent active response run limits.
Added response.snapshot SSE event for active-run reattach/idempotent retry.
Added SSE keepalive comments for response streams.
Added CORS headers for stored streaming response replay.
Updated previous_response_id and conversation chaining so only completed responses are accepted.
Reworked conversation tracking to separate:
- latest_completed_response_id
- active_response_id
Added migration support from the legacy conversations.response_id schema.
Ensured DELETE and LRU eviction clear conversation references and idempotency mappings.
Added shutdown/delete cancellation handling for active background response runs.
Removed the legacy private _write_sse_responses(...) path in favor of ResponseRun.
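
For readers unfamiliar with the wire format behind the `response.snapshot` event and the keepalive comments, the sketch below shows how such frames are typically serialized. The SSE framing itself (an `event:` line, a `data:` line, a blank line, and `:`-prefixed comment lines) follows the standard; the payload shape for `response.snapshot` is an assumption modeled on the `response.created` example in the issue, and the helper names are hypothetical.

```python
import json


def sse_event(event_type, payload):
    """Serialize one Server-Sent Events frame."""
    data = json.dumps({"type": event_type, **payload}, separators=(",", ":"))
    return f"event: {event_type}\ndata: {data}\n\n"


def sse_keepalive():
    """A line starting with ':' is an SSE comment; clients ignore it."""
    return ": keepalive\n\n"


def reattach_frames(snapshot):
    """Frames a reattaching subscriber might receive first (assumed shape)."""
    return [sse_event("response.snapshot", {"response": snapshot}), sse_keepalive()]
```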

How to Test

1. Run the targeted test suite

~/.hermes/hermes-agent/venv/bin/python -m pytest -o addopts= tests/gateway/test_api_server.py tests/gateway/test_sse_agent_cancel.py -q

Expected result:

140 passed

2. Verify `stream=true + store=false` works as ephemeral streaming

Run:

curl -N http://127.0.0.1:8642/v1/responses \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hermes-agent",
    "input": "Reply with exactly: ephemeral-ok",
    "stream": true,
    "store": false
  }'

Expected result:

HTTP 200
SSE stream includes response.created
SSE stream includes response.completed
stream ends with:

data: [DONE]

Copy the response.id from the stream, then run:

curl -i http://127.0.0.1:8642/v1/responses/<response_id> \
  -H "Authorization: Bearer <API_KEY>"

Expected result:

HTTP 404
because store=false responses are ephemeral and not recoverable

3. Verify `stream=true + store=false` rejects `Idempotency-Key`

Run:

curl -i http://127.0.0.1:8642/v1/responses \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: ephemeral-test-key" \
  -d '{
    "model": "hermes-agent",
    "input": "hello",
    "stream": true,
    "store": false
  }'

Expected result:

HTTP 400
response error code:

"idempotency_requires_store"

This confirms that idempotent streaming replay/reattach is only available for durable store=true streams.

4. Verify `stream=true + store=true` survives client disconnect

Start a durable streaming response:

curl -N http://127.0.0.1:8642/v1/responses \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hermes-agent",
    "input": "Take a little time, then reply with exactly: background-recovered-ok",
    "stream": true,
    "store": true
  }'

After receiving the first response.created event, stop the client with Ctrl+C.

Expected server behavior:

SSE subscriber disconnects
agent run continues in the background
run is not interrupted just because the client disconnected

Then recover the response:

curl http://127.0.0.1:8642/v1/responses/<response_id> \
  -H "Authorization: Bearer <API_KEY>"

Expected result:

while still running, response may show status: "in_progress"
after completion, response shows:

"status": "completed"

5. Verify durable streaming idempotency retry

Start a durable streaming request with an idempotency key:

curl -N http://127.0.0.1:8642/v1/responses \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: durable-stream-test-1" \
  -d '{
    "model": "hermes-agent",
    "input": "Reply with exactly: durable-idempotency-ok",
    "stream": true,
    "store": true
  }'

Retry the exact same request with the same Idempotency-Key:

curl -N http://127.0.0.1:8642/v1/responses \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: durable-stream-test-1" \
  -d '{
    "model": "hermes-agent",
    "input": "Reply with exactly: durable-idempotency-ok",
    "stream": true,
    "store": true
  }'

Expected result:

retry returns the same response.id
if the original run is still active, retry starts with:

event: response.snapshot

if the original run already completed, retry streams/replays the stored response
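
The idempotency rules described in this PR (matching retries resolve to the same response, conflicting retries return 409 `idempotency_key_conflict`, and `store=false` is rejected with `idempotency_requires_store`) could be modeled along these lines. `IdempotencyIndex`, its SHA-256 body fingerprint, and the choice to treat only an explicit `"store": false` as ephemeral are assumptions for illustration, not details taken from the PR.

```python
import hashlib
import json


class IdempotencyIndex:
    """Hypothetical map of Idempotency-Key -> (body fingerprint, response_id)."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def fingerprint(body):
        # Canonical JSON so key order in the request body does not matter.
        canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def resolve(self, key, body, new_response_id):
        """Return (http_status, response_id_or_error_code)."""
        if body.get("store") is False:
            return 400, "idempotency_requires_store"
        digest = self.fingerprint(body)
        existing = self._entries.get(key)
        if existing is None:
            self._entries[key] = (digest, new_response_id)
            return 200, new_response_id
        stored_digest, stored_id = existing
        if stored_digest != digest:
            return 409, "idempotency_key_conflict"
        # Same key, same body: reattach to or replay the original response.
        return 200, stored_id
```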

6. Verify idempotency conflict handling

Reuse the same idempotency key with a different request body:

curl -i http://127.0.0.1:8642/v1/responses \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: durable-stream-test-1" \
  -d '{
    "model": "hermes-agent",
    "input": "This is a different request body.",
    "stream": true,
    "store": true
  }'

Expected result:

HTTP 409
response error code:

"idempotency_key_conflict"

7. Verify conversation behavior for `store=false` streaming

First create a stored checkpoint in a conversation:

curl http://127.0.0.1:8642/v1/responses \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hermes-agent",
    "input": "Remember the codeword pineapple. Reply exactly: checkpoint-ok",
    "conversation": "pr-test-conversation",
    "store": true
  }'

Then send an ephemeral streaming response in the same conversation:

curl -N http://127.0.0.1:8642/v1/responses \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hermes-agent",
    "input": "This is a temporary streamed turn. Reply exactly: ephemeral-conversation-ok",
    "conversation": "pr-test-conversation",
    "stream": true,
    "store": false
  }'

Expected result:

request succeeds
it may read existing conversation history
it does not update latest_completed_response_id
it does not set active_response_id
it is not recoverable via GET /v1/responses/{response_id}

Then send another stored request in the same conversation:

curl http://127.0.0.1:8642/v1/responses \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hermes-agent",
    "input": "What codeword did I ask you to remember? Reply with only the codeword.",
    "conversation": "pr-test-conversation",
    "store": true
  }'

Expected result:

request succeeds
conversation continues from the last stored checkpoint, not from the ephemeral streamed turn

8. Verify active conversation protection

Start a durable background stream in a conversation:

curl -N http://127.0.0.1:8642/v1/responses \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hermes-agent",
    "input": "Take a little time before answering.",
    "conversation": "active-conversation-test",
    "stream": true,
    "store": true
  }'

While it is still running, send another request to the same conversation:

curl -i http://127.0.0.1:8642/v1/responses \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hermes-agent",
    "input": "Follow-up while previous response is active.",
    "conversation": "active-conversation-test",
    "store": true
  }'

Expected result:

HTTP 409
response error code:

"conversation_response_not_completed"

This confirms that conversation ordering is protected while a durable response is active.
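
The two-pointer conversation tracking exercised in steps 7 and 8 can be sketched as below. The attribute names `latest_completed_response_id` and `active_response_id` come from the PR; the `ConversationState` class, its `begin`/`finish` methods, and the decision to reject every new request while a durable run is active are assumptions made here.

```python
class ConversationState:
    """Illustrative tracker with the two pointers named in the PR."""

    def __init__(self):
        self.latest_completed_response_id = None
        self.active_response_id = None

    def begin(self, response_id, store=True):
        """Return (http_status, error_code_or_None) for a new request."""
        if self.active_response_id is not None:
            return 409, "conversation_response_not_completed"
        if store:
            # Only durable runs claim the conversation.
            self.active_response_id = response_id
        return 200, None

    def finish(self, response_id, status):
        if self.active_response_id != response_id:
            return  # ephemeral or unknown run: nothing to update
        self.active_response_id = None
        if status == "completed":
            # Only completed responses become the chaining checkpoint.
            self.latest_completed_response_id = response_id
```

Keeping the checkpoint separate from the active pointer is what lets an ephemeral turn read history without moving the point the next stored request continues from.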

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature (no unrelated commits)
I've run pytest tests/ -q and all tests pass
I've added tests for my changes (required for bug fixes, strongly encouraged for features)
I've tested on my platform: Ubuntu 26.04

Documentation & Housekeeping

I've updated relevant documentation (README, docs/, docstrings) — or N/A
I've updated cli-config.yaml.example if I added/changed config keys — or N/A
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Changed files

gateway/platforms/api_server.py (modified, +856/-553)
tests/gateway/test_api_server.py (modified, +559/-98)

Bug Description

When using the OpenAI-compatible POST /v1/responses endpoint with:

{
  "stream": true,
  "store": true
}

the server currently only persists the response to ResponseStore after the streaming run completes successfully.

If the client disconnects mid-stream — for example app killed, network interruption, foreground/background transition, or SSE connection dropped — Hermes interrupts and cancels the agent task. The partially generated response is not stored, and subsequent:

GET /v1/responses/{response_id}

returns 404 Response not found.

This makes it impossible for clients to implement ChatGPT/Claude-like recovery after stream interruption, even though the client already received a response.created event with a valid response_id.

Steps to Reproduce

Start a streaming response:

curl http://127.0.0.1:8642/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "hermes-agent",
    "input": "Do something that takes a while or calls tools.",
    "stream": true,
    "store": true
  }'

Wait until the client receives:

event: response.created
data: {"type":"response.created","response":{"id":"resp_xxx", ...}}

Optionally wait until some text deltas or tool events arrive:

response.output_text.delta
response.output_item.added
response.output_item.done

Disconnect the client before response.completed.
Try to retrieve the response:

curl http://127.0.0.1:8642/v1/responses/resp_xxx \
  -H "Authorization: Bearer $API_KEY"
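
A client reproducing these steps needs to capture the response ID from the first event before disconnecting. The following is a minimal sketch of an SSE line parser that does so; `iter_sse_events` and `first_response_id` are hypothetical helpers, and the parser handles only the `event:`, `data:`, and comment lines that appear in this issue.

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, data_string) pairs from an iterable of SSE text lines."""
    event_name, data_parts = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield event_name or "message", "\n".join(data_parts)
            event_name, data_parts = None, []
        elif line.startswith(":"):
            continue  # comment or keepalive
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data_parts.append(value[1:] if value.startswith(" ") else value)
    if data_parts:
        yield event_name or "message", "\n".join(data_parts)


def first_response_id(lines):
    """Return the ID carried by the first response.created event, if any."""
    for name, data in iter_sse_events(lines):
        if name == "response.created":
            return json.loads(data)["response"]["id"]
    return None
```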

Expected Behavior

When store=true, after the server emits response.created, the response should be queryable via:

GET /v1/responses/{response_id}

At minimum, the store should contain a recoverable snapshot with a meaningful status:

{
  "id": "resp_xxx",
  "object": "response",
  "status": "in_progress",
  "output": []
}

During streaming, the stored response should be updated as output arrives:

response.output_text.delta
response.output_text.done
response.output_item.added
response.output_item.done
response.failed
response.completed

If the client disconnects before completion, the server should continue running in background.

For store=true, do not cancel the agent task on SSE client disconnect. Let the response continue to completion and persist the final response to ResponseStore.

This gives ChatGPT/Claude-like recovery:

client disconnects
server continues generation
client relaunches
client calls GET /v1/responses/{response_id}
final response is available once completed
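
On the client side, the recovery flow above amounts to polling the retrieval endpoint until the response settles. The sketch below assumes that `completed`, `failed`, and `incomplete` are the terminal statuses and that a 404 should be treated as "not available yet"; `wait_for_response` and `http_fetch` are hypothetical helpers, and the fetch function is injected so the loop can be exercised without a running server.

```python
import json
import time
import urllib.error
import urllib.request

# Assumption: these three statuses mean the run will not change any further.
TERMINAL_STATUSES = {"completed", "failed", "incomplete"}


def wait_for_response(fetch, response_id, interval=1.0, max_attempts=30, sleep=time.sleep):
    """Poll until the response reaches a terminal status or attempts run out.

    fetch(response_id) returns the decoded JSON body of
    GET /v1/responses/{response_id}, or None on HTTP 404.
    """
    last = None
    for _ in range(max_attempts):
        last = fetch(response_id)
        if last is not None and last.get("status") in TERMINAL_STATUSES:
            return last
        sleep(interval)
    return last


def http_fetch(base_url, api_key):
    """Build a fetch function backed by urllib (not exercised here)."""
    def fetch(response_id):
        request = urllib.request.Request(
            f"{base_url}/v1/responses/{response_id}",
            headers={"Authorization": f"Bearer {api_key}"},
        )
        try:
            with urllib.request.urlopen(request) as reply:
                return json.load(reply)
        except urllib.error.HTTPError as error:
            if error.code == 404:
                return None
            raise
    return fetch
```

A relaunching client would call `wait_for_response(http_fetch("http://127.0.0.1:8642", api_key), response_id)` with the ID it saved from `response.created`.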

Actual Behavior

GET /v1/responses/{response_id} returns:

{
  "error": {
    "message": "Response not found: resp_xxx",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}

The client cannot recover:

partial assistant text
completed tool calls
tool outputs
final assistant message

The response id previously emitted by response.created is therefore not useful for recovery unless the stream reaches response.completed.

Affected Component

Gateway (Telegram/Discord/Slack/WhatsApp)

Messaging Platform (if gateway-related)

No response

Debug Report

Report       https://paste.rs/veGtu
  agent.log    https://paste.rs/3eRBA
  gateway.log  https://paste.rs/BJmsi

Operating System

Ubuntu 26.04

Python Version

3.11.15

Hermes Version

0.11.0

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

No response

Proposed Fix (optional)

For store=true, decouple the agent task from the SSE connection:

SSE connection is only a live transport
response execution continues independently
ResponseStore is the source of truth
GET /v1/responses/{response_id} can return:
- queued
- in_progress
- completed
- failed
- incomplete

Client disconnect should not necessarily cancel response execution if the user requested store=true.
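
The five statuses listed above suggest a small lifecycle. One possible way to encode it is shown below; the status values come from the issue, but the transition table is an assumption, since the issue does not say which moves are legal, and `ResponseStatus`, `advance`, and `is_terminal` are hypothetical names.

```python
from enum import Enum


class ResponseStatus(str, Enum):
    QUEUED = "queued"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"
    INCOMPLETE = "incomplete"


# Assumed transitions: the issue names the statuses, not the edges between them.
ALLOWED_TRANSITIONS = {
    ResponseStatus.QUEUED: {ResponseStatus.IN_PROGRESS, ResponseStatus.FAILED},
    ResponseStatus.IN_PROGRESS: {
        ResponseStatus.COMPLETED,
        ResponseStatus.FAILED,
        ResponseStatus.INCOMPLETE,
    },
    ResponseStatus.COMPLETED: set(),
    ResponseStatus.FAILED: set(),
    ResponseStatus.INCOMPLETE: set(),
}


def is_terminal(status):
    """A status with no outgoing transitions is terminal."""
    return not ALLOWED_TRANSITIONS[status]


def advance(current, target):
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```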

Are you willing to submit a PR for this?

I'd like to fix this myself and submit a PR

extent analysis

TL;DR

Decoupling the agent task from the SSE connection and persisting the response to ResponseStore as soon as response.created is emitted can fix the issue.

Guidance

Modify the POST /v1/responses endpoint to store the response in ResponseStore immediately after emitting response.created, rather than only when the stream reaches response.completed.
Update the agent task to continue running in the background even if the client disconnects, allowing the response to complete and be persisted to ResponseStore.
Implement a mechanism to update the stored response as output arrives during streaming, including response.output_text.delta, response.output_text.done, response.output_item.added, response.output_item.done, response.failed, and response.completed events.
Ensure that GET /v1/responses/{response_id} returns the current status of the response, such as queued, in_progress, completed, failed, or incomplete, even if the client disconnected before completion.

Example

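# Sketch: persist the response in ResponseStore as soon as response.created is emitted.
# ResponseStore below is a minimal in-memory stand-in, not the Hermes class.
class ResponseStore:
    def __init__(self):
        self._items = {}

    def store(self, response_id, response):
        self._items[response_id] = response

    def get(self, response_id):
        return self._items.get(response_id)

response_store = ResponseStore()

def handle_response_created(response_id):
    # Store the response with its initial status
    response_store.store(response_id, {"id": response_id, "status": "in_progress", "output": []})

# Sketch: update the stored response while streaming
def handle_output_event(response_id, event):
    # Retrieve the current response; skip events for unknown IDs
    response = response_store.get(response_id)
    if response is None:
        return
    # Update the response based on the event type (event exposes .type and .data)
    if event.type == "response.output_text.delta":
        response["output"].append(event.data)
    # ...
    # Store the updated response
    response_store.store(response_id, response)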

Notes

The proposed fix requires changes to the POST /v1/responses endpoint, the agent task, and the ResponseStore implementation. Additionally, the `GET /v1/responses
