hermes - ✅(Solved) Fix [Bug]: hermes_tools RPC client mismatches responses under concurrent tool calls from execute_code [4 pull requests, 2 comments, 2 participants]

NousResearch/hermes-agent#17770 (fetched 2026-05-01 05:56:00)

When a script run via execute_code calls sandbox tools (terminal, read_file, …) from multiple threads — e.g. ThreadPoolExecutor — each thread frequently receives another thread's response. The individual responses are well-formed and complete; they just get delivered to the wrong caller. This silently corrupts any concurrent tool-calling pattern inside execute_code.


PR fix notes

PR #17771: fix(tools): serialize concurrent hermes_tools RPC calls from execute_code

Description (problem / solution / changelog)

Fixes #17770.

Problem

When execute_code scripts call sandbox tools (terminal, read_file, …) from multiple threads — e.g. ThreadPoolExecutor — each thread often receives another thread's response. Responses are individually well-formed; they just get delivered to the wrong caller. See #17770 for the minimal repro and detailed analysis.

Root cause in tools/code_execution_tool.py:

  • UDS transport (local backend): _sock is a shared module-level connection, the newline-framed protocol has no request-id, the server handles requests serially in FIFO order, and _call() has no lock around sendall + recv. Concurrent callers race on recv() and get cross-matched.
  • File transport (remote backends): _seq += 1 is a non-atomic read-modify-write, so two threads can allocate the same seq and clobber each other's request/response files.

Fix

Smallest correct fix: wrap the send+recv round-trip (UDS) and the seq allocation (file) in a threading.Lock. No protocol change, no server change.

A request-id protocol would allow genuine in-flight concurrency later, but server dispatch is single-connection serial anyway so serializing the client gives up no real throughput today.

Tests

tests/tools/test_code_execution.py:

  • test_uds_transport_serializes_concurrent_calls — asserts _call_lock is present and used in the generated UDS transport source.
  • test_file_transport_serializes_seq_allocation — asserts _seq_lock is present and used in the generated file transport source.
  • test_concurrent_tool_calls_match_responses — end-to-end: runs a sandboxed ThreadPoolExecutor of 10 terminal() calls against a slow mock dispatcher and asserts every caller sees its own tag.

Verified that test_concurrent_tool_calls_match_responses fails 10/10 mismatched without the fix (matching the real-world repro from the issue) and passes with it.

Full tests/tools/test_code_execution.py (67 tests) + tests/tools/test_code_execution_modes.py (36 tests) pass locally.

Backward compatibility

None broken. The lock only affects concurrent callers inside a single execute_code run (which were already getting wrong answers). Single-threaded use is unchanged. No protocol change; the RPC server side is untouched.

Changed files

  • tests/tools/test_code_execution.py (modified, +76/-2)
  • tools/code_execution_tool.py (modified, +27/-15)

PR #17872: fix(tools): serialize concurrent hermes_tools RPC calls from execute_code

Description (problem / solution / changelog)

Summary

When a script run via execute_code calls hermes_tools functions (terminal, read_file, etc.) from multiple threads (e.g. ThreadPoolExecutor), each thread frequently receives another thread's response.

Root Cause

UDS transport (local backend): All threads share a single socket (_sock). Concurrent send+recv interleaves.

File transport (remote backends): The global _seq counter has a race condition — two threads can get the same seq number.

Fix

Transport | Problem | Fix
UDS | Shared socket, no serialization | threading.Lock() around send+recv
File | _seq race condition | threading.Lock() around _seq increment

Fixes #17770

Changed files

  • tools/code_execution_tool.py (modified, +17/-14)

PR #17894: fix(tools): serialize concurrent hermes_tools RPC calls from execute_code (#17770)

Description (problem / solution / changelog)

Salvages #17771 by @Heltman onto current main. Closes #17770. Also supersedes @vominh1919's #17872 (same fix, submitted 4h later — both contributors credited).

Problem

Inside execute_code, concurrent tool calls from multiple threads (ThreadPoolExecutor, asyncio.to_thread, etc.) silently receive each other's responses. Responses are individually intact; they just get delivered to the wrong caller.

Root cause in tools/code_execution_tool.py:

  • UDS transport (local backend) — _sock is a shared module-level connection, the newline-framed protocol has no request-id, the server handles requests serially in FIFO order, and _call() has no lock around sendall + recv. Concurrent callers race on recv() and get cross-matched.
  • File transport (remote backends) — _seq += 1 is a non-atomic read-modify-write, so two threads can allocate the same seq and clobber each other's request/response files.

Fix (author: @Heltman, 2 files, +103/-17)

Smallest correct fix: wrap send+recv round-trip (UDS) and seq allocation (file) in a threading.Lock. No protocol change, no server change.

Validation

scripts/run_tests.sh tests/tools/test_code_execution.py tests/tools/test_code_execution_modes.py
103 passed in 33.25s

New regression tests:

  • test_uds_transport_serializes_concurrent_calls — asserts _call_lock is present in generated UDS source
  • test_file_transport_serializes_seq_allocation — asserts _seq_lock is present in generated file source
  • test_concurrent_tool_calls_match_responses — end-to-end: runs a sandboxed ThreadPoolExecutor of 10 terminal() calls with a slow mock dispatcher and asserts every caller sees its own tag (fails 10/10 without the fix).

Backward compatibility

None broken. Single-threaded use is unchanged. The lock only affects concurrent callers inside one execute_code run — which were getting wrong answers without it. Server side is untouched.

Authorship preserved for @Heltman via plain cherry-pick. Thanks also to @vominh1919 who independently identified and fixed the same issue in #17872.

Changed files

  • tests/tools/test_code_execution.py (modified, +76/-2)
  • tools/code_execution_tool.py (modified, +27/-15)

PR #17902: fix(tools): serialize concurrent hermes_tools RPC calls from execute_code

Description (problem / solution / changelog)

Serializes per-thread RPC calls from execute_code scripts so each thread actually receives its own response.

Fixes #17770. Salvaged from @Heltman's #17771 (original issue reporter). Supersedes #17872 by @vominh1919 (same intent, but broken — referenced an undeclared _seq_lock and missing threading import in the file-transport header; NameError on first call).

Root cause

  • UDS transport (local backend): _sock is a shared module-level global. The RPC server handles one message at a time with no request-id, so concurrent _call() invocations interleaved their send+recv steps and each thread could read a different thread's response.
  • File transport (remote backends): _seq += 1 is a non-atomic read-modify-write; two threads could allocate the same sequence number and clobber each other's request/response files.

Fix

Transport | Change
UDS | _call_lock = threading.Lock(), wraps send+recv round-trip
File | _seq_lock = threading.Lock(), snapshots seq into a local so filename and payload agree
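
The file-transport row can be sketched as follows. This is a minimal illustration, not the code from the PR: write_request, the req_NNNNNN.json naming, and the payload layout are assumptions made for the example. It shows the counter being incremented and copied into a local while the lock is held, so the filename and the payload are built from the same value.

```python
import json
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

_seq = 0
_seq_lock = threading.Lock()


def write_request(rpc_dir, tool_name, args):
    """Write one request file (hypothetical layout) using a uniquely allocated seq."""
    global _seq
    with _seq_lock:
        _seq += 1
        seq = _seq  # local snapshot; reading _seq again later could see another thread's value
    path = Path(rpc_dir) / f"req_{seq:06d}.json"  # hypothetical file naming
    path.write_text(json.dumps({"seq": seq, "tool": tool_name, "args": args}))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as rpc_dir:
        with ThreadPoolExecutor(max_workers=8) as pool:
            paths = list(pool.map(lambda i: write_request(rpc_dir, "terminal", {"i": i}), range(50)))
        print(len(set(paths)), "distinct request files")
```

Only the allocation is inside the lock; the file write happens outside it, so threads still write their own files in parallel.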

Validation

  • test_concurrent_tool_calls_match_responses (E2E through real execute_code + RPC server, mock sleeps to force overlap): before MISMATCH 10/10, after OK 10/10
  • tests/tools/test_code_execution.py: 67/67 pass

Regression guards included: the two string-presence tests catch header drift, the full E2E test catches the actual race.
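
A header-drift guard of this kind could look roughly like the sketch below. It is an assumption-laden illustration rather than the PR's test: SAMPLE_HEADER is a stand-in for the generated transport source, and round_trip_is_locked is a hypothetical helper. It uses the ast module instead of plain substring matching, checking that both the sendall and recv calls sit inside a with _call_lock: block.

```python
import ast

# Stand-in for the generated UDS transport source (not the real header).
SAMPLE_HEADER = '''
import json, threading
_call_lock = threading.Lock()

def _call(tool_name, args):
    with _call_lock:
        conn = _connect()
        conn.sendall(json.dumps({"tool": tool_name, "args": args}).encode() + b"\\n")
        return conn.recv(65536)
'''


def round_trip_is_locked(source, lock_name="_call_lock"):
    """True if some `with <lock_name>:` block contains both .sendall() and .recv() calls."""
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.With):
            continue
        holds_lock = any(
            isinstance(item.context_expr, ast.Name) and item.context_expr.id == lock_name
            for item in node.items
        )
        if not holds_lock:
            continue
        # Collect every method name called anywhere inside this with-block.
        called = {
            n.func.attr
            for n in ast.walk(node)
            if isinstance(n, ast.Call) and isinstance(n.func, ast.Attribute)
        }
        if {"sendall", "recv"} <= called:
            return True
    return False


if __name__ == "__main__":
    print(round_trip_is_locked(SAMPLE_HEADER))
```

A structural check like this only catches the lock being dropped from the source; the end-to-end test is still what exercises the race itself.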

Co-authored-by: Heltman [email protected]

Changed files

  • scripts/release.py (modified, +1/-0)
  • tests/tools/test_code_execution.py (modified, +76/-2)
  • tools/code_execution_tool.py (modified, +27/-15)


Summary

When a script run via execute_code calls sandbox tools (terminal, read_file, …) from multiple threads — e.g. ThreadPoolExecutor — each thread frequently receives another thread's response. The individual responses are well-formed and complete; they just get delivered to the wrong caller. This silently corrupts any concurrent tool-calling pattern inside execute_code.

Environment

  • hermes-agent main at 828d3a320 (also reproduces on older mains with the same _call() code)
  • macOS / Python 3.11 (UDS / local backend). The same bug class exists on the file-based transport used by remote backends.
  • tools/code_execution_tool.py: _UDS_TRANSPORT_HEADER._call and _FILE_TRANSPORT_HEADER._call.

Minimal repro

Run this as an execute_code script (or paste into any ThreadPool-driven use of hermes_tools):

from hermes_tools import terminal
import concurrent.futures

def run(i):
    res = terminal(f"echo START-{i}; sleep 0.3; echo END-{i}", timeout=10)
    return i, res.get("output", "").strip()

with concurrent.futures.ThreadPoolExecutor(max_workers=10) as ex:
    results = list(ex.map(run, range(10)))

mismatches = 0
for i, out in results:
    ok = (f"START-{i}" in out) and (f"END-{i}" in out)
    flag = "OK" if ok else "MISMATCH"
    if not ok: mismatches += 1
    print(f"i={i} {flag}: {out!r}")
print(f"\nmismatches: {mismatches}/10")

Observed

i=0 MISMATCH: 'START-9\nEND-9'
i=1 MISMATCH: 'START-7\nEND-7'
i=2 MISMATCH: 'START-6\nEND-6'
i=3 MISMATCH: 'START-5\nEND-5'
i=4 MISMATCH: 'START-8\nEND-8'
i=5 MISMATCH: 'START-4\nEND-4'
i=6 MISMATCH: 'START-3\nEND-3'
i=7 MISMATCH: 'START-1\nEND-1'
i=8 MISMATCH: 'START-0\nEND-0'
i=9 MISMATCH: 'START-2\nEND-2'

mismatches: 10/10

Every START-X/END-X pair is intact (streams don't interleave), but pairs are delivered to the wrong caller.

Expected

Each run(i) sees START-i/END-i in its own output. mismatches: 0/10.

Root cause

tools/code_execution_tool.py — the UDS transport (_UDS_TRANSPORT_HEADER) used on the local backend:

_sock = None

def _connect():
    global _sock
    if _sock is None:
        _sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        _sock.connect(os.environ["HERMES_RPC_SOCKET"])
        _sock.settimeout(300)
    return _sock

def _call(tool_name, args):
    conn = _connect()
    request = json.dumps({"tool": tool_name, "args": args}) + "\n"
    conn.sendall(request.encode())
    buf = b""
    while True:
        chunk = conn.recv(65536)
        ...
  • _sock is a shared module-level connection.
  • The newline-framed RPC protocol has no request-id.
  • Server-side _rpc_server_loop accepts a single connection and handles requests serially (one response per request, in arrival order).
  • There is no lock around sendall + recv.

With concurrent callers, multiple threads sendall() their requests, the server responds in FIFO order, and each client thread races to recv() whatever newline-terminated blob arrives next on the shared socket. Even if each individual socket call behaves atomically under the GIL, the sendall/recv sequence as a whole does not, so responses get matched to the wrong caller. There is no interleaved byte corruption (responses are whole), just wrong-addressee delivery, which is exactly what the repro above shows.
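
The interleaving described above can be replayed deterministically in a small stand-alone model. This is a sketch under stated assumptions, not hermes code: serve is a toy echo server standing in for _rpc_server_loop, SharedClient stands in for the module-level _sock, and all names are hypothetical. fifo_mixup performs both sends before either read, as two racing threads might, and locked_round_trips shows the same shared socket behaving correctly once send+recv is held under one lock.

```python
import json
import socket
import threading
from concurrent.futures import ThreadPoolExecutor


def serve(conn):
    """Toy server: one JSON request per line, answered strictly in arrival order."""
    with conn, conn.makefile("rb") as reader:
        for line in reader:
            tag = json.loads(line)["tag"]
            conn.sendall((json.dumps({"echo": tag}) + "\n").encode())


class SharedClient:
    """One socket shared by every caller, like a module-level connection."""

    def __init__(self):
        self.sock, server_side = socket.socketpair()
        self.reader = self.sock.makefile("rb")
        self.lock = threading.Lock()
        self.server = threading.Thread(target=serve, args=(server_side,), daemon=True)
        self.server.start()

    def send(self, tag):
        self.sock.sendall((json.dumps({"tag": tag}) + "\n").encode())

    def recv(self):
        return json.loads(self.reader.readline())["echo"]

    def call_locked(self, tag):
        # Hold the lock across send + recv so the reply read here answers this request.
        with self.lock:
            self.send(tag)
            return self.recv()

    def close(self):
        self.reader.close()
        self.sock.close()
        self.server.join(timeout=5)


def fifo_mixup():
    """Replay the bad interleaving by hand: both sends happen before either read."""
    client = SharedClient()
    try:
        client.send("caller-0")
        client.send("caller-1")
        # Suppose caller-1's thread wins the race to read first.
        got_by_1 = client.recv()
        got_by_0 = client.recv()
        return {"caller-1": got_by_1, "caller-0": got_by_0}
    finally:
        client.close()


def locked_round_trips(n=10):
    """Run n concurrent callers through the locked path; returns (caller, reply) pairs."""
    client = SharedClient()
    try:
        with ThreadPoolExecutor(max_workers=n) as pool:
            return list(pool.map(lambda i: (i, client.call_locked(i)), range(n)))
    finally:
        client.close()


if __name__ == "__main__":
    print(fifo_mixup())
    print(locked_round_trips())
```

In fifo_mixup each reply is a complete line, yet each caller ends up holding the other's reply; with call_locked every pair has matching caller and reply.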

The file transport (_FILE_TRANSPORT_HEADER._call, used on remote backends) has a sibling bug: _seq += 1 is a non-atomic read-modify-write, so concurrent threads can allocate the same sequence number and clobber each other's request/response files.

Impact

Anything inside execute_code that does concurrent tool calls silently gets wrong data — ThreadPoolExecutor/asyncio.to_thread/threading.Thread over terminal(), read_file(), search_files(), web_search(), etc. Because responses are individually well-formed, the bug doesn't raise; it just returns the wrong values. Easy to hit in:

  • Parallel crawls/fan-outs across regions or shards (how I found it — parallel argus engine trino clusters -r <region> for 13 regions came back completely cross-matched).
  • Concurrent file reads, parallel searches, N-wide API fan-out patterns — all of which are encouraged by the execute_code tool description ("batch processing / loop over N items / parallel requests").

Fix

Smallest correct fix: wrap the send+recv round-trip (UDS) and the seq allocation (file) in a threading.Lock. No protocol change, no server change.

A longer-term fix would add a request-id to the protocol so concurrent round-trips could genuinely be in flight (server and both _calls would need updating), but for now serialization preserves correctness at negligible cost — tool dispatch in the parent already happens on a single connection anyway.
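
For comparison, the longer-term request-id design could look roughly like the sketch below. Nothing here is taken from hermes: the id and result field names, IdClient, and toy_server are all assumptions made for illustration. Each request carries an id, a single reader thread routes replies to the waiting caller by that id, and the stand-in server deliberately answers out of order to show that matching no longer depends on arrival order.

```python
import itertools
import json
import socket
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class IdClient:
    """Client that tags each request with an id and routes replies by that id."""

    def __init__(self, sock):
        self._sock = sock
        self._ids = itertools.count(1)
        self._pending = {}                   # request id -> Future awaiting the reply
        self._state_lock = threading.Lock()  # guards _ids and _pending
        self._send_lock = threading.Lock()   # keeps each request line contiguous on the wire
        threading.Thread(target=self._read_loop, daemon=True).start()

    def _read_loop(self):
        # Single reader: the only thread that ever reads from the socket.
        with self._sock.makefile("rb") as reader:
            for line in reader:
                reply = json.loads(line)
                with self._state_lock:
                    waiter = self._pending.pop(reply["id"], None)
                if waiter is not None:
                    waiter.set_result(reply["result"])

    def call(self, tool_name, args, timeout=10):
        waiter = Future()
        with self._state_lock:
            req_id = next(self._ids)
            self._pending[req_id] = waiter
        line = json.dumps({"id": req_id, "tool": tool_name, "args": args}) + "\n"
        with self._send_lock:
            self._sock.sendall(line.encode())
        return waiter.result(timeout=timeout)


def toy_server(conn):
    """Stand-in server that handles requests concurrently, so replies can arrive out of order."""
    write_lock = threading.Lock()

    def handle(req):
        time.sleep(req["args"]["delay"])
        reply = json.dumps({"id": req["id"], "result": req["args"]["tag"]}) + "\n"
        with write_lock:
            conn.sendall(reply.encode())

    with conn.makefile("rb") as reader:
        for line in reader:
            threading.Thread(target=handle, args=(json.loads(line),), daemon=True).start()


def fan_out(n=8):
    """Issue n concurrent calls over one socket; returns (caller, reply) pairs."""
    client_sock, server_sock = socket.socketpair()
    threading.Thread(target=toy_server, args=(server_sock,), daemon=True).start()
    client = IdClient(client_sock)

    def one(i):
        # Earlier callers ask for a longer delay, so replies tend to come back in reverse order.
        return i, client.call("echo", {"tag": i, "delay": 0.02 * (n - i)})

    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(one, range(n)))


if __name__ == "__main__":
    print(fan_out())
```

This only pays off if the server can also work on several requests at once; against a serial server the lock-based fix gives the same throughput with far less code.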

PR (with regression tests, including one that fails without the fix): #NNN (replace with link after opening).

Regression tests

The PR adds three tests in tests/tools/test_code_execution.py:

  • test_uds_transport_serializes_concurrent_calls — asserts _call_lock present and used in generated UDS transport.
  • test_file_transport_serializes_seq_allocation — asserts _seq_lock present and used in generated file transport.
  • test_concurrent_tool_calls_match_responses — runs a sandboxed ThreadPoolExecutor of 10 terminal() calls against a slow mock dispatcher and asserts every caller sees its own tag. Fails 10/10 without the fix; passes with it.

extent analysis

TL;DR

The most likely fix is to wrap the send+recv round-trip in a threading.Lock to serialize concurrent requests and prevent response mismatches.

Guidance

  • Identify the shared module-level connection (_sock) and the _call function in tools/code_execution_tool.py as the root cause of the issue.
  • Add a threading.Lock around the sendall and recv operations in the _call function to prevent concurrent access and response mismatches.
  • Consider adding a request-id to the protocol for a longer-term fix, allowing concurrent round-trips and updating the server and _call functions accordingly.
  • Review the provided regression tests in tests/tools/test_code_execution.py to ensure the fix is correctly implemented and verified.

Example

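import json
import threading

# One lock per process: only one thread at a time may be inside a round-trip.
_call_lock = threading.Lock()

def _call(tool_name, args):
    # Hold the lock across send and receive so the response read below
    # belongs to the request this thread just sent.
    with _call_lock:
        conn = _connect()  # shared module-level socket, defined as in the issue
        request = json.dumps({"tool": tool_name, "args": args}) + "\n"
        conn.sendall(request.encode())
        buf = b""
        while True:
            chunk = conn.recv(65536)
            # ... (rest of the read loop is elided in the source)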

Notes

The lock serializes round-trips on the shared connection, so concurrent callers wait for one another. The issue describes this cost as negligible because tool dispatch in the parent already happens serially on a single connection. A request-id based protocol would let round-trips overlap, but it would require changes to the server and to both _call implementations.

Recommendation

Apply the workaround by adding a threading.Lock around the sendall and recv operations in the _call function, as this is the smallest correct fix that preserves correctness at negligible cost.
