hermes - 💡(How to fix) Fix Gateway adds ~5s of fixed latency per message via stream_task wait_for when no stream consumer is registered [3 comments, 2 participants]

hermes · 2026-05-03 01:44:34

GitHub stats

NousResearch/hermes-agent#19045•Fetched 2026-05-03 04:52:42

Author

augustin-ship-it

Participants

alt-glitch

augustin-ship-it

Timeline (top)

commented ×3, labeled ×3, closed ×1, mentioned ×1

In non-streaming mode (display.streaming: false, or any platform that doesn't register a stream consumer), every message processed by the gateway pays an unnecessary ~5s tax on the response path.

Observed end-to-end latency on Telegram with streaming disabled + Haiku 4.5 via OpenRouter:

Direct OpenRouter curl: ~1.5s
Hermes gateway: 8.6–10s (consistent, regardless of payload size or model)
After fix: 1.5s (-83%)

Root Cause

gateway/run.py:13352:

# Wait for stream consumer to finish its final edit
if stream_task:
    try:
        await asyncio.wait_for(stream_task, timeout=5.0)
    except (asyncio.TimeoutError, asyncio.CancelledError):
        stream_task.cancel()
        try:
            await stream_task
        except asyncio.CancelledError:
            pass

stream_task is created at line ~12815 from _start_stream_consumer():

async def _start_stream_consumer():
    for _ in range(200):  # Up to 10s wait
        if stream_consumer_holder[0] is not None:
            await stream_consumer_holder[0].run()
            return
        await asyncio.sleep(0.05)

When no consumer is ever registered (the non-streaming path), the task does not finish early: it keeps polling for the full 10s (200 iterations × 0.05s). The cleanup code at 13352 therefore waits out the entire timeout=5.0 before cancelling it. For short messages, those 5s dominate the wall-clock response time.
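The interaction described above can be reproduced outside the gateway. The following is a minimal sketch, not the gateway's code: `start_stream_consumer` and `measure_cleanup` are hypothetical stand-ins, and the timings are scaled down (a 0.2s timeout against roughly 0.4s of polling) so it runs quickly while keeping the same 1:2 ratio as the 5s wait against 10s of polling.

```python
import asyncio
import time

# Hypothetical stand-in for the gateway's holder. Nothing ever fills slot 0,
# which mimics the non-streaming path.
stream_consumer_holder = [None]


async def start_stream_consumer(polls: int, interval: float) -> None:
    """Poll for a consumer, like _start_stream_consumer() but with tunable timings."""
    for _ in range(polls):
        if stream_consumer_holder[0] is not None:
            await stream_consumer_holder[0].run()
            return
        await asyncio.sleep(interval)


async def measure_cleanup(timeout: float) -> float:
    """Return how long the original cleanup pattern blocks, in seconds."""
    # The polling budget is about twice the timeout (200 polls of timeout/100 each).
    stream_task = asyncio.create_task(
        start_stream_consumer(polls=200, interval=timeout / 100)
    )
    started = time.perf_counter()
    try:
        await asyncio.wait_for(stream_task, timeout=timeout)
    except (asyncio.TimeoutError, asyncio.CancelledError):
        stream_task.cancel()
        try:
            await stream_task
        except asyncio.CancelledError:
            pass
    return time.perf_counter() - started


if __name__ == "__main__":
    elapsed = asyncio.run(measure_cleanup(timeout=0.2))
    print(f"cleanup blocked for {elapsed:.2f}s")
```

Because the polling task outlives the timeout, the measured time tracks the `timeout` argument rather than any useful work, which is the same shape as the fixed 5s observed in the gateway.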

py-spy profile confirms: 5s of post-run_conversation time is spent in _run_in_executor_with_context waiting on this task, while CPU is idle.

Fix

Skip the wait entirely when no consumer was ever registered — that's the non-streaming case:

if stream_task:
    if stream_consumer_holder[0] is None:
        # No consumer was registered — cancel immediately.
        stream_task.cancel()
        try:
            await stream_task
        except asyncio.CancelledError:
            pass
    else:
        try:
            await asyncio.wait_for(stream_task, timeout=5.0)
        except (asyncio.TimeoutError, asyncio.CancelledError):
            stream_task.cancel()
            try:
                await stream_task
            except asyncio.CancelledError:
                pass

Same pattern applies to the two other call sites: gateway/run.py:11559 (proxy mode) and gateway/run.py:13233 (queued-message branch).
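Since the same cleanup appears at three call sites, one option is to factor it into a shared helper. This is only a sketch under assumptions: `finish_stream_task` and its `consumer_registered` parameter are hypothetical names, not part of hermes-agent, and the caller would pass something like `stream_consumer_holder[0] is not None`.

```python
import asyncio
import time
from typing import Optional


async def finish_stream_task(
    stream_task: Optional[asyncio.Task],
    consumer_registered: bool,
    timeout: float = 5.0,
) -> None:
    """Wait for the stream task only if a consumer exists; otherwise cancel at once."""
    if stream_task is None:
        return
    if consumer_registered:
        try:
            # A registered consumer may still need to perform its final edit.
            await asyncio.wait_for(stream_task, timeout=timeout)
            return
        except (asyncio.TimeoutError, asyncio.CancelledError):
            pass  # fall through to the cancel-and-drain step below
    stream_task.cancel()
    try:
        await stream_task
    except asyncio.CancelledError:
        pass


async def demo() -> float:
    """Time the cleanup for a task that would otherwise stay pending for 10s."""
    idle_task = asyncio.create_task(asyncio.sleep(10))  # stand-in for the polling task
    started = time.perf_counter()
    await finish_stream_task(idle_task, consumer_registered=False)
    return time.perf_counter() - started


if __name__ == "__main__":
    print(f"cleanup took {asyncio.run(demo()):.3f}s")
```

With no consumer registered, the helper returns almost immediately; with one registered, it keeps the original bounded wait.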

Repro

streaming.enabled: false in config.yaml
Send any short Telegram message ("ping")
Time inbound→response: ~8s constant, regardless of model / tools / history size

Diagnostic logs around agent.run_conversation confirm:

run_conversation itself: 1.6s (matches OpenRouter direct curl)
run_sync executor wrapper: 1.6s
Time between executor task completion and gateway returning: 5.0s
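Per-phase timings like the ones above can be collected with a small context manager. This is an illustrative sketch only: `timed` is a hypothetical helper, and the `time.sleep` calls stand in for the real agent call and cleanup step rather than reproducing the issue's actual logging.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(label: str, sink: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of the wrapped block under `label`."""
    started = time.perf_counter()
    try:
        yield
    finally:
        sink[label] = time.perf_counter() - started


if __name__ == "__main__":
    durations: Dict[str, float] = {}
    with timed("run_conversation", durations):
        time.sleep(0.05)  # stand-in for the agent call
    with timed("post_run_cleanup", durations):
        time.sleep(0.01)  # stand-in for the stream_task wait
    for label, seconds in durations.items():
        print(f"{label}: {seconds:.3f}s")
```

Wrapping the executor call and the code after it in separate `timed` blocks is enough to show whether the delay sits inside the model call or in the cleanup that follows it.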

Side findings during diagnosis (separate, lesser issues)

agent.reasoning_effort: medium is the effective default for Haiku 4.5 even though reasoning_effort: none is set at the top level of config.yaml. The two keys aren't unified — only the nested one is read. Costs ~1.5s per message.
_POLL_INTERVAL = 5.0 in the executor poll loop (gateway/run.py:12965) is benign in steady state because asyncio.wait returns as soon as the task finishes, but reducing it gave a marginal improvement under contention.
Background memory/skill review (_run_review) correctly runs in a daemon thread after the user response — confirmed NOT in the critical path. Misleading initially because it appears in py-spy profiles overlapping the next user message.
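The second finding relies on `asyncio.wait` returning as soon as the awaited task completes instead of sleeping for the whole timeout. A small sketch of that behavior follows; `wait_with_poll` is a hypothetical loop written for illustration and is not the gateway's executor poll loop.

```python
import asyncio
import time


async def wait_with_poll(task: asyncio.Task, poll_interval: float) -> float:
    """Wait for `task` in poll_interval slices; return the seconds spent waiting."""
    started = time.perf_counter()
    while not task.done():
        # asyncio.wait returns when the task completes or the timeout expires,
        # whichever comes first.
        await asyncio.wait({task}, timeout=poll_interval)
    return time.perf_counter() - started


async def demo() -> float:
    quick_task = asyncio.create_task(asyncio.sleep(0.05))  # finishes well before 5s
    return await wait_with_poll(quick_task, poll_interval=5.0)


if __name__ == "__main__":
    print(f"waited {asyncio.run(demo()):.3f}s with a 5.0s poll interval")
```

The wait ends shortly after the task does, which is consistent with the 5.0s interval being benign in steady state.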

Environment

hermes-agent: v0.12.0
Provider: OpenRouter
Model: anthropic/claude-haiku-4.5
Platform: Telegram (polling)
streaming.enabled: false

extent analysis

TL;DR

The 5s latency issue in non-streaming mode can be fixed by skipping the wait for the stream consumer task when no consumer is registered.

Guidance

Identify if the issue is related to the non-streaming mode by checking the streaming.enabled configuration.
Verify if the stream_task is being created and if it's waiting for the full 5s timeout.
Apply the proposed fix by modifying the gateway/run.py file to cancel the stream_task immediately when no consumer is registered.
Check for other potential issues, such as the effective default of agent.reasoning_effort and the _POLL_INTERVAL value, which may also impact performance.

Notes

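The fix only changes behavior when no consumer was registered, which is the non-streaming path. The same wait pattern exists at gateway/run.py:11559 (proxy mode) and gateway/run.py:13233 (queued-message branch), so the delay can persist there until the same guard is applied.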

Recommendation

Apply the fix in gateway/run.py: skip the wait for the stream consumer task when no consumer is registered. This is the most direct solution to the identified issue.
