hermes - ✅(Solved) fix: response=0 chars regression in gateway after lazy session creation commit [1 pull request, 1 participant]

GitHub stats
NousResearch/hermes-agent#18765 (fetched 2026-05-03 04:54:27)
Comments: 0
Participants: 1
Timeline: 5
Reactions: 0
Timeline (top): labeled ×4, cross-referenced ×1

Error Message

2026-05-01 21:01:05,565 INFO gateway.run: response ready: platform=weixin chat=o9cq807w... time=943.0s api_calls=2 response=0 chars
2026-05-02 09:13:05,677 INFO gateway.run: response ready: platform=weixin chat=o9cq807w... time=5.2s api_calls=0 response=0 chars
2026-05-02 11:18:43,925 INFO gateway.run: response ready: platform=weixin chat=o9cq807w... time=788.9s api_calls=10 response=0 chars
2026-05-02 16:14:12,097 INFO gateway.run: response ready: platform=weixin chat=o9cq807w... time=227.6s api_calls=15 response=0 chars

Root Cause

Suspected Root Cause

Fix Action

Fixed

PR fix notes

PR #18798: fix: prevent response=0 chars regression from silent _ensure_db_session failures (#18765)

Description (problem / solution / changelog)

Fix #18765: Prevent response=0 chars regression from silent _ensure_db_session failures

Problem

After commit c5b4c4816 (lazy session creation), _ensure_db_session() silently catches all exceptions and leaves _session_db_created=False. The conversation loop continues with an uninitialized session, completing API calls but discarding the final response — logging response=0 chars.

Fix

  • _ensure_db_session() now raises instead of silently swallowing exceptions
  • run_conversation() catches the exception and disables _session_db for the run (self._session_db = None), allowing the conversation to proceed without persistence
  • The _flush_messages_or_raise() caller already has a try/except that handles the raise
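The changes listed above can be sketched roughly as follows. This is a minimal illustration, not the code from PR #18798: the `Agent` class, the `FailingDB` stub, and its `create_session` method are hypothetical stand-ins, and only `_ensure_db_session`, `run_conversation`, `_session_db`, and `_session_db_created` are names taken from the description.

```python
import logging

logger = logging.getLogger(__name__)


class FailingDB:
    """Hypothetical session store whose row creation always fails."""

    def create_session(self, session_id):
        raise RuntimeError("database is locked")


class Agent:
    """Hypothetical stand-in for the agent class in run_agent.py."""

    def __init__(self, session_db, session_id="demo"):
        self._session_db = session_db
        self._session_db_created = False
        self.session_id = session_id

    def _ensure_db_session(self):
        # Nothing to do without a store or once the row exists.
        if self._session_db is None or self._session_db_created:
            return
        # No try/except here: a failure propagates to the caller.
        self._session_db.create_session(self.session_id)
        self._session_db_created = True

    def run_conversation(self, message):
        try:
            self._ensure_db_session()
        except Exception:
            # Disable persistence for this run, but keep answering.
            logger.warning(
                "session row creation failed; continuing without persistence",
                exc_info=True,
            )
            self._session_db = None
        # Placeholder for the real conversation loop.
        return f"echo: {message}"


agent = Agent(FailingDB())
reply = agent.run_conversation("hello")
```

In this sketch the failure is visible in the log and the run still produces a reply. The trade-off is that the run's messages are not persisted.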

Pattern

When a function silently swallows exceptions and leaves state half-initialized, every caller that depends on that state being initialized will silently produce wrong results. Prefer raising + explicit caller handling over silent catch + retry-later semantics.

Changed files

  • run_agent.py (modified, +34/-19)


Bug Description

After commit c5b4c4816 (fix: lazy session creation — defer DB row until first message (#18370)), the gateway agent occasionally returns response=0 chars — the agent completes a full run (many API calls, long elapsed time) but produces no output and sends nothing back to the user.

Symptoms

From gateway agent.log, entries like:

INFO gateway.run: response ready: platform=weixin chat=... time=943.0s api_calls=2 response=0 chars
INFO gateway.run: response ready: platform=weixin chat=... time=789.0s api_calls=10 response=0 chars
INFO gateway.run: response ready: platform=weixin chat=... time=227.6s api_calls=15 response=0 chars

The agent is clearly doing work (high api_calls count, long elapsed time) but returns nothing. This is a silent failure — no error is logged.
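Since no error is logged, one way to spot affected runs is to scan the log for `response ready` lines that report zero characters despite at least one API call. The helper below is a hypothetical sketch: the function name and the regular expression are assumptions inferred only from the log lines quoted in this issue.

```python
import re

# Pattern inferred from the log lines quoted in this issue (an assumption).
LINE_RE = re.compile(
    r"response ready: platform=(?P<platform>\S+) chat=(?P<chat>\S+) "
    r"time=(?P<time>[\d.]+)s api_calls=(?P<api_calls>\d+) "
    r"response=(?P<chars>\d+) chars"
)


def find_silent_failures(lines):
    """Return parsed entries with an empty response but at least one API call."""
    hits = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        if int(m["chars"]) == 0 and int(m["api_calls"]) > 0:
            hits.append(
                {
                    "platform": m["platform"],
                    "time_s": float(m["time"]),
                    "api_calls": int(m["api_calls"]),
                }
            )
    return hits


sample = [
    "2026-05-01 21:01:05,565 INFO gateway.run: response ready: "
    "platform=weixin chat=o9cq807w... time=943.0s api_calls=2 response=0 chars",
    "2026-05-02 09:13:05,677 INFO gateway.run: response ready: "
    "platform=weixin chat=o9cq807w... time=5.2s api_calls=0 response=0 chars",
]
hits = find_silent_failures(sample)
```

The `api_calls > 0` filter is a design choice: it drops runs that never reached the model, which may have a different cause than the regression described here.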

Frequency

Observed 4 times in ~24 hours before any upstream update on May 2. The issue predates the May 2 systemd unit update.

Suspected Root Cause

In run_agent.py, the _ensure_db_session() method added by c5b4c4816 raises an exception that gets caught and logged, but the agent continues running with _session_db_created = False. The next message to the same session will retry — but the current run may proceed with a partially-initialized session state, causing the final response to be discarded.

The old code used ensure_session() (idempotent, INSERT OR IGNORE) in _flush_messages_or_raise, which never failed silently. The new code relies on _ensure_db_session() called at the top of run_conversation(), but when session row creation fails, the exception is caught and logged — yet the conversation loop continues, potentially completing but then returning no response.
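The idempotency the old path relied on can be illustrated with SQLite's `INSERT OR IGNORE`. This sketch assumes a SQLite-backed store, which the syntax suggests but the issue does not state. The table layout and the body of `ensure_session` are made up for illustration and are not the Hermes schema.

```python
import sqlite3


def ensure_session(conn, session_id):
    """Create the session row if it is missing; repeat calls are no-ops."""
    conn.execute(
        "INSERT OR IGNORE INTO sessions (id) VALUES (?)",
        (session_id,),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY)")

ensure_session(conn, "abc")
ensure_session(conn, "abc")  # second call neither raises nor duplicates

row_count = conn.execute("SELECT COUNT(*) FROM sessions").fetchone()[0]
```

Because the call is safe to repeat, it can run before every flush without tracking a separate "created" flag.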

Environment

  • Platform: macOS (WeChat gateway)
  • Hermes: latest from main (commit f98b5d00a)
  • Python: 3.11
  • Config: gateway mode with WeChat adapter

Logs

2026-05-01 21:01:05,565 INFO gateway.run: response ready: platform=weixin chat=o9cq807w... time=943.0s api_calls=2 response=0 chars
2026-05-02 09:13:05,677 INFO gateway.run: response ready: platform=weixin chat=o9cq807w... time=5.2s api_calls=0 response=0 chars
2026-05-02 11:18:43,925 INFO gateway.run: response ready: platform=weixin chat=o9cq807w... time=788.9s api_calls=10 response=0 chars
2026-05-02 16:14:12,097 INFO gateway.run: response ready: platform=weixin chat=o9cq807w... time=227.6s api_calls=15 response=0 chars

Proposed Fix

The retry logic in _ensure_db_session() should not silently continue the conversation if session creation fails. Either:

  1. Make _ensure_db_session() raise instead of silently catching (and let the caller handle it), OR
  2. Fall back to the old ensure_session() call in _flush_messages_or_raise as a safety net, OR
  3. Add a success flag check at the end of run_conversation() and return an error response if the session was never created

Commit

  • c5b4c4816 — fix: lazy session creation — defer DB row until first message (#18370) — suspected culprit
  • f98b5d00a — fix: gateway systemd unit now retries indefinitely with backoff (#18639) — related (exposes the issue more due to restart behavior change)

extent analysis

TL;DR

The issue can be fixed by modifying the _ensure_db_session() method to either raise an exception or add a fallback to the old ensure_session() call to prevent silent failures.

Guidance

  • Review the _ensure_db_session() method and consider making it raise an exception instead of silently catching and logging errors.
  • Add a success flag check at the end of run_conversation() to return an error response if the session was never created.
  • Consider falling back to the old ensure_session() call in _flush_messages_or_raise as a safety net to prevent silent failures.
  • Verify the fix by checking the gateway agent logs for the presence of error responses or exceptions instead of silent failures.

Example

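import logging

logger = logging.getLogger(__name__)
_session_db_created = False

def _create_session_row():
    """Placeholder (hypothetical) for the session creation code."""

def _ensure_db_session():
    global _session_db_created
    try:
        _create_session_row()
        _session_db_created = True
    except Exception:
        logger.exception("session creation failed")  # log the error
        raise  # re-raise instead of silently swallowing it

def run_conversation():
    # ...
    if not _session_db_created:
        # return an error response if the session was never created
        return {"error": "session was never created"}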

Notes

The proposed fix assumes that the issue is caused by the silent failure in the _ensure_db_session() method. However, further investigation may be needed to confirm the root cause.

Recommendation

Apply a workaround by modifying the _ensure_db_session() method to raise an exception or add a fallback to the old ensure_session() call, as this will prevent silent failures and provide more visibility into the issue.
