hermes - ✅(Solved) Fix [Bug]: model_tools async bridge recreates loops in running-loop contexts [1 pull request, 1 comment, 2 participants]

chezzdev · 2026-04-27T13:39:50Z

GitHub stats

NousResearch/hermes-agent#16570 · Fetched 2026-04-28 06:52:26

Author

chezzdev

Participants

alt-glitch

chezzdev

Timeline (top)

labeled ×4 · commented ×1 · cross-referenced ×1 · referenced ×1

Root Cause

model_tools._run_async() already uses persistent loops for the main thread and worker threads, but the branch for callers inside an active asyncio loop still uses a throwaway thread with asyncio.run(). asyncio.run() creates and closes an event loop each time, which conflicts with cached async clients that retain loop-bound transports.
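The loop-per-call behavior described above can be observed with a small sketch. `run_in_throwaway_thread` is a hypothetical stand-in for the old running-loop branch, not the actual Hermes code; it only assumes the pattern named in the issue (a fresh thread plus `asyncio.run()` per call).

```python
import asyncio
import threading


async def current_loop():
    # Return the event loop this coroutine is actually running on.
    return asyncio.get_running_loop()


def run_in_throwaway_thread(coro):
    # Hypothetical stand-in for the old branch: a fresh thread and a
    # fresh asyncio.run() loop for every single call.
    result = {}

    def worker():
        result["value"] = asyncio.run(coro)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return result["value"]


# Three calls produce three different loops, and each one is already
# closed by the time the call returns.
loops = [run_in_throwaway_thread(current_loop()) for _ in range(3)]
distinct_loops = len(set(loops))
all_closed = all(loop.is_closed() for loop in loops)
```

Any object that captured one of these loops during the call (a transport, a connection pool) would now hold a reference to a closed loop, which is the hazard the issue describes for cached clients.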

Fix Action

Fix / Workaround

Tools (async tool dispatch / model_tools._run_async())
Gateway (long-lived async process)

PR fix notes

PR #16573: fix(model-tools): reuse persistent async bridge loop

Repository: NousResearch/hermes-agent
Author: chezzdev
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/16573

Description (problem / solution / changelog)

What does this PR do?

Fixes the model_tools._run_async() running-loop branch so gateway/async callers reuse a persistent bridge event loop instead of creating and closing a fresh asyncio.run() loop per call.

The previous behavior could strand cached AsyncOpenAI/httpx clients on dead loops in long-lived gateway processes, causing stale-loop cleanup hazards and descriptor churn. This mirrors the existing persistent-loop strategy already used for the main thread and worker threads, while adding explicit bridge-loop startup and shutdown handling.

Fixes #16570

Related Issue

Fixes #16570

Type of Change

[x] Bug fix (non-breaking change that fixes an issue)
[ ] New feature (non-breaking change that adds functionality)
[ ] Security fix
[ ] Documentation update
[x] Tests (adding or improving test coverage)
[ ] Refactor (no behavior change)
[ ] New skill (bundled or hub)

Changes Made

model_tools.py: add a dedicated persistent async bridge loop for callers already inside a running event loop.
model_tools.py: add startup failure handling so a dead bridge thread is not published as usable state.
model_tools.py: add shutdown_async_bridge_loop() to cancel pending bridge tasks, stop the bridge thread, and close the loop.
cli.py and gateway/run.py: call bridge-loop shutdown from existing process cleanup paths after cached auxiliary clients are closed.
tests/test_model_tools_async_bridge.py: cover running-loop branch reuse, startup failure cleanup, and shutdown behavior.
tests/cli/test_session_boundary_hooks.py and tests/gateway/test_gateway_shutdown.py: cover CLI/gateway cleanup integration.
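The startup and shutdown handling listed above might look roughly like the following. This is a minimal sketch under stated assumptions, not the PR's implementation: the `BridgeLoop` class, its method names, and the 5-second timeouts are all hypothetical, and only standard `asyncio` and `threading` calls are used.

```python
import asyncio
import threading


class BridgeLoop:
    """Hypothetical persistent bridge loop; names are illustrative only."""

    def __init__(self):
        self._loop = None
        self._thread = None
        self._lock = threading.Lock()

    def _ensure_started(self):
        with self._lock:
            if self._loop is not None:
                return self._loop
            loop = asyncio.new_event_loop()
            ready = threading.Event()

            def runner():
                asyncio.set_event_loop(loop)
                loop.call_soon(ready.set)  # fires once the loop is running
                loop.run_forever()

            thread = threading.Thread(target=runner, name="async-bridge", daemon=True)
            thread.start()
            if not ready.wait(timeout=5.0):
                # Startup failed: clean up instead of publishing a dead bridge.
                if thread.is_alive():
                    loop.call_soon_threadsafe(loop.stop)
                    thread.join(timeout=5.0)
                if not loop.is_running():
                    loop.close()
                raise RuntimeError("async bridge loop failed to start")
            self._loop, self._thread = loop, thread
            return loop

    def run(self, coro, timeout=None):
        # Submit to the shared loop and block the calling thread for the result.
        loop = self._ensure_started()
        return asyncio.run_coroutine_threadsafe(coro, loop).result(timeout)

    def shutdown(self):
        with self._lock:
            loop, thread = self._loop, self._thread
            self._loop = self._thread = None
        if loop is None:
            return

        async def cancel_pending():
            current = asyncio.current_task()
            pending = [t for t in asyncio.all_tasks() if t is not current]
            for task in pending:
                task.cancel()
            await asyncio.gather(*pending, return_exceptions=True)

        # Cancel leftover tasks, stop the loop, join the thread, then close.
        asyncio.run_coroutine_threadsafe(cancel_pending(), loop).result(timeout=5.0)
        loop.call_soon_threadsafe(loop.stop)
        thread.join(timeout=5.0)
        if not loop.is_running():
            loop.close()


async def which_loop():
    return asyncio.get_running_loop()


bridge = BridgeLoop()
first = bridge.run(which_loop())
second = bridge.run(which_loop())
same_loop = first is second
bridge.shutdown()
closed_after_shutdown = first.is_closed()
```

Calling `shutdown()` after cached clients are closed, as the PR does in the CLI and gateway cleanup paths, gives those clients a live loop to close on before the loop itself goes away.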

How to Test

1. `python -m py_compile model_tools.py cli.py gateway/run.py tests/test_model_tools_async_bridge.py tests/cli/test_session_boundary_hooks.py tests/gateway/test_gateway_shutdown.py`
2. `python -m pytest -o "addopts=" -n 4 --ignore=tests/integration --ignore=tests/e2e -m "not integration" tests/test_model_tools_async_bridge.py tests/test_model_tools.py tests/cli/test_session_boundary_hooks.py tests/gateway/test_gateway_shutdown.py -q`
3. `scripts/run_tests.sh tests/ -q --tb=short` was also run locally; it currently fails in areas unrelated to this change that also fail on current main. Representative failures were reproduced from a clean origin/main worktree, including tests/agent/test_anthropic_adapter.py::TestRunOauthSetupToken::test_returns_token_from_env_var, tests/gateway/test_discord_channel_controls.py::test_non_ignored_channel_processes_normally, and tests/run_agent/test_tool_arg_coercion.py::TestCoerceNumber::test_inf_stays_string_for_integer_only.

Checklist

Code

[x] I've read the Contributing Guide
[x] My commit messages follow Conventional Commits
[x] I searched for existing PRs to make sure this isn't a duplicate
[ ] I've run pytest tests/ -q and all tests pass
[x] I've added tests for my changes
[x] I've tested on my platform: macOS, Python 3.13.12

Documentation & Housekeeping

[x] I've updated relevant documentation -- N/A
[x] I've updated cli-config.yaml.example if I added/changed config keys -- N/A
[x] I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows -- N/A
[x] I've considered cross-platform impact (Windows, macOS) per the compatibility guide
[x] I've updated tool descriptions/schemas if I changed tool behavior -- N/A

Screenshots / Logs

Targeted validation:

50 passed, 3 warnings in 13.39s

Changed files

cli.py (modified, +5/-0)
gateway/run.py (modified, +11/-0)
model_tools.py (modified, +123/-14)
tests/cli/test_session_boundary_hooks.py (modified, +3/-5)
tests/gateway/test_gateway_shutdown.py (modified, +15/-0)
tests/test_model_tools_async_bridge.py (modified, +87/-38)


Bug Description

When model_tools._run_async() is called from a thread that already has a running asyncio loop, it bridges by spinning up a fresh worker thread and running the coroutine with asyncio.run() for that single call.

That creates a new event loop per async-context tool call. Cached async clients such as AsyncOpenAI/httpx can remain bound to those short-lived loops, leaving clients/transports tied to dead loops and causing descriptor/resource churn in long-lived gateway processes.

Steps to Reproduce

1. Run Hermes in a long-lived gateway or another async context.
2. Trigger an async tool path repeatedly, for example one that goes through async_call_llm().
3. Observe that _run_async() uses a fresh loop for each running-loop branch call instead of reusing a stable bridge loop.

Expected Behavior

Running-loop callers should submit coroutines to a persistent bridge loop so cached async clients remain bound to a live event loop across gateway turns. Shutdown paths should explicitly stop and close that bridge loop.

Actual Behavior

The running-loop branch uses per-call asyncio.run() in a disposable worker thread. Cached async clients can outlive the loop they were created on, causing stale-loop cleanup hazards and resource churn.

Affected Component

Tools (async tool dispatch / model_tools._run_async())
Gateway (long-lived async process)

Root Cause Analysis

Proposed Fix

Reuse one dedicated bridge loop for running-loop callers via asyncio.run_coroutine_threadsafe(), add startup failure handling, and stop/close the bridge loop from CLI/gateway cleanup paths.

extent analysis

TL;DR

Reuse a dedicated bridge loop for running-loop callers using asyncio.run_coroutine_threadsafe() to prevent cached async clients from being bound to short-lived loops.

Guidance

Identify the model_tools._run_async() function and modify it to reuse a persistent bridge loop for callers inside an active asyncio loop.
Use asyncio.run_coroutine_threadsafe() to submit coroutines to the bridge loop instead of creating a new loop with asyncio.run() for each call.
Implement startup failure handling for the bridge loop to ensure it is properly initialized and cleaned up.
Add a mechanism to stop and close the bridge loop from CLI/gateway cleanup paths to prevent resource churn.

Example

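import asyncio
import threading

# Create a dedicated bridge loop and keep it running on a daemon thread
bridge_loop = asyncio.new_event_loop()
threading.Thread(target=bridge_loop.run_forever, name="async-bridge", daemon=True).start()

# Modify _run_async() to reuse the bridge loop
def _run_async(coroutine):
    # Submit the coroutine to the bridge loop and block for its result
    future = asyncio.run_coroutine_threadsafe(coroutine, bridge_loop)
    return future.result()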

Notes

The proposed fix assumes that the model_tools._run_async() function is the primary entry point for async tool calls. Additional modifications may be necessary to ensure that the bridge loop is properly initialized and cleaned up.

Recommendation

Apply the workaround by reusing a dedicated bridge loop for running-loop callers using asyncio.run_coroutine_threadsafe(), as this approach addresses the root cause of the issue and prevents cached async clients from being bound to short-lived loops.
