hermes - 💡(How to fix) Fix Gateway adds ~5s of fixed latency per message via stream_task wait_for when no stream consumer is registered [3 comments, 2 participants]

GitHub stats
NousResearch/hermes-agent#19045 (fetched 2026-05-03 04:52:42)
Comments: 3 · Participants: 2 · Timeline events: 9 · Reactions: 0
Timeline (top): commented ×3, labeled ×3, closed ×1, mentioned ×1

In non-streaming mode (display.streaming: false, or any platform that doesn't register a stream consumer), every message processed by the gateway pays an unnecessary ~5s tax on the response path.

Observed end-to-end latency on Telegram with streaming disabled + Haiku 4.5 via OpenRouter:

  • Direct OpenRouter curl: ~1.5s
  • Hermes gateway: 8.6–10s (consistent, regardless of payload size or model)
  • After fix: 1.5s (-83%)

Root Cause

gateway/run.py:13352:

# Wait for stream consumer to finish its final edit
if stream_task:
    try:
        await asyncio.wait_for(stream_task, timeout=5.0)
    except (asyncio.TimeoutError, asyncio.CancelledError):
        stream_task.cancel()
        try:
            await stream_task
        except asyncio.CancelledError:
            pass

stream_task is created at line ~12815 from _start_stream_consumer():

async def _start_stream_consumer():
    for _ in range(200):  # Up to 10s wait
        if stream_consumer_holder[0] is not None:
            await stream_consumer_holder[0].run()
            return
        await asyncio.sleep(0.05)

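When no consumer is ever registered (the non-streaming path), the task does not finish on its own until the polling loop is exhausted: 200 polls × 0.05s is roughly 10s. Because run_conversation returns after about 1.6s, the task is still polling when the cleanup code at 13352 runs, so wait_for blocks for the full timeout=5.0 before cancelling. That 5s is the dominant component of the wall-clock response time for short messages.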

py-spy profile confirms: 5s of post-run_conversation time is spent in _run_in_executor_with_context waiting on this task, while CPU is idle.
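
The interaction described above can be reproduced outside the gateway. The following is a minimal sketch, not the gateway's code: poll_for_consumer, cleanup_wait, measure and the scale parameter are illustrative names, and the timings are scaled down so the run finishes quickly.

```python
import asyncio
import time


async def poll_for_consumer(holder, polls=200, interval=0.05):
    """Stand-in for the polling loop: gives up only after polls * interval."""
    for _ in range(polls):
        if holder[0] is not None:
            await holder[0].run()
            return
        await asyncio.sleep(interval)


async def cleanup_wait(task, timeout):
    """Stand-in for the original cleanup: wait up to `timeout`, then cancel."""
    try:
        await asyncio.wait_for(task, timeout=timeout)
    except (asyncio.TimeoutError, asyncio.CancelledError):
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass


async def measure(scale=0.01):
    """Return how long the cleanup blocks when no consumer is registered."""
    holder = [None]  # nothing ever registers a consumer
    task = asyncio.create_task(
        poll_for_consumer(holder, interval=0.05 * scale)
    )
    await asyncio.sleep(1.6 * scale)  # placeholder for run_conversation
    start = time.perf_counter()
    await cleanup_wait(task, timeout=5.0 * scale)
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = asyncio.run(measure(scale=0.01))
    print(f"cleanup blocked for {elapsed:.3f}s (scaled timeout: 0.050s)")
```

With scale=1.0 the same script should block for about 5s in the cleanup step, which is the fixed cost reported in this issue.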

Fix

Skip the wait entirely when no consumer was ever registered — that's the non-streaming case:

if stream_task:
    if stream_consumer_holder[0] is None:
        # No consumer was registered — cancel immediately.
        stream_task.cancel()
        try:
            await stream_task
        except asyncio.CancelledError:
            pass
    else:
        try:
            await asyncio.wait_for(stream_task, timeout=5.0)
        except (asyncio.TimeoutError, asyncio.CancelledError):
            stream_task.cancel()
            try:
                await stream_task
            except asyncio.CancelledError:
                pass

Same pattern applies to the two other call sites: gateway/run.py:11559 (proxy mode) and gateway/run.py:13233 (queued-message branch).
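
Since the same cleanup appears at three call sites, one option is to factor it into a shared coroutine. The sketch below is hypothetical: finish_stream_task is not a function in hermes-agent, and it assumes stream_consumer_holder is the same one-element list used in the snippets above.

```python
import asyncio


async def finish_stream_task(stream_task, stream_consumer_holder, timeout=5.0):
    """Hypothetical helper wrapping the fix so each call site can reuse it."""
    if stream_task is None:
        return
    if stream_consumer_holder[0] is not None:
        # A consumer was registered: let it finish its final edit.
        try:
            await asyncio.wait_for(stream_task, timeout=timeout)
            return
        except (asyncio.TimeoutError, asyncio.CancelledError):
            pass
    # No consumer was registered, or the wait timed out: cancel and drain.
    stream_task.cancel()
    try:
        await stream_task
    except asyncio.CancelledError:
        pass
```

Each call site would then reduce to a single `await finish_stream_task(stream_task, stream_consumer_holder)`, which keeps the three branches from drifting apart.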

Repro

  1. Set streaming.enabled: false in config.yaml
  2. Send any short Telegram message ("ping")
  3. Time inbound→response: ~8s constant, regardless of model / tools / history size

Diagnostic logs around agent.run_conversation confirm:

  • run_conversation itself: 1.6s (matches OpenRouter direct curl)
  • run_sync executor wrapper: 1.6s
  • Time between executor task completion and gateway returning: 5.0s
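
Per-phase timings like the ones listed above can be collected with a small context manager. This is only a sketch of one way to do it: the timed helper, the "latency" logger name and the timings dict are illustrative and are not part of hermes-agent.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("latency")


@contextmanager
def timed(label, sink=None):
    """Log how long the wrapped block took; optionally record it in `sink`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if sink is not None:
            sink[label] = elapsed
        logger.info("%s took %.3fs", label, elapsed)


# Usage: wrap each phase and compare the recorded durations afterwards.
timings = {}
with timed("run_conversation", timings):
    time.sleep(0.01)  # placeholder for the real call
```

Wrapping the executor call and the whole gateway handler separately makes a gap like the 5.0s above show up as the difference between the two recorded durations.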

Side findings during diagnosis (separate, lesser issues)

  • agent.reasoning_effort: medium is the effective default for Haiku 4.5 even though reasoning_effort: none is set at the top level of config.yaml. The two keys aren't unified — only the nested one is read. Costs ~1.5s per message.
  • _POLL_INTERVAL = 5.0 in the executor poll loop (gateway/run.py:12965) is benign in steady state because asyncio.wait returns as soon as the task finishes, but reducing it gave a marginal improvement under contention.
  • Background memory/skill review (_run_review) correctly runs in a daemon thread after the user response — confirmed NOT in the critical path. Misleading initially because it appears in py-spy profiles overlapping the next user message.
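
The second point relies on asyncio.wait treating its timeout as an upper bound rather than a fixed sleep. The sketch below illustrates that behaviour; the actual poll loop at gateway/run.py:12965 is not shown in this issue, so wait_with_poll is an assumed shape, not the gateway's code.

```python
import asyncio
import time


async def wait_with_poll(task, poll_interval=5.0):
    """Wake at most every poll_interval seconds until the task is done."""
    while True:
        done, _pending = await asyncio.wait({task}, timeout=poll_interval)
        if done:
            return task.result()
        # A real loop could emit a heartbeat or check for cancellation here.


async def demo():
    task = asyncio.create_task(asyncio.sleep(0.05, result="ok"))
    start = time.perf_counter()
    result = await wait_with_poll(task, poll_interval=5.0)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    result, elapsed = asyncio.run(demo())
    print(f"result={result!r} after {elapsed:.3f}s despite a 5s poll interval")
```

The call returns shortly after the task completes, well before the 5s interval elapses, which is why the interval is harmless when the loop is otherwise idle.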

Environment

  • hermes-agent: v0.12.0
  • Provider: OpenRouter
  • Model: anthropic/claude-haiku-4.5
  • Platform: Telegram (polling)
  • streaming.enabled: false

extent analysis

TL;DR

The 5s latency issue in non-streaming mode can be fixed by skipping the wait for the stream consumer task when no consumer is registered.

Guidance

  • Identify if the issue is related to the non-streaming mode by checking the streaming.enabled configuration.
  • Verify if the stream_task is being created and if it's waiting for the full 5s timeout.
  • Apply the proposed fix by modifying the gateway/run.py file to cancel the stream_task immediately when no consumer is registered.
  • Check for other potential issues, such as the effective default of agent.reasoning_effort and the _POLL_INTERVAL value, which may also impact performance.

Notes

The fix only changes the non-streaming path. The same wait also appears at two other call sites, gateway/run.py:11559 (proxy mode) and gateway/run.py:13233 (queued-message branch), so the delay can persist there until those are patched as well.

Recommendation

Apply the fix in gateway/run.py so that the wait on the stream consumer task is skipped when no consumer is registered; this is the most direct remedy for the identified latency.
