hermes - 💡(How to fix) Fix [Bug]: gateway progress_callback has unguarded shared state (last_progress_msg / repeat_count) called from concurrent ThreadPoolExecutor workers

hermes · 2026-05-12 23:24:49

progress_callback in gateway/run.py (~line 14248) reads and writes two shared closure variables — last_progress_msg[0] and repeat_count[0] — without any synchronization. The callback is passed as tool_progress_callback to the agent, which calls it from concurrent ThreadPoolExecutor worker threads inside _execute_tool_calls_concurrent (run_agent.py:10951).
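The sketch below shows how one callback ends up running on several threads at once. It is a minimal stand-in for the calling pattern, not the run_agent.py code: `fake_execute_tool_calls_concurrent`, `run_tool`, `recording_callback` and the tool names are hypothetical. A `threading.Barrier` forces every tool call to be in flight at the same time, so each call is guaranteed to invoke the callback from a different pool thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def fake_execute_tool_calls_concurrent(tool_names, tool_progress_callback):
    """Hypothetical stand-in for _execute_tool_calls_concurrent."""
    # No worker can pass the barrier until all of them have reached it,
    # so every tool call necessarily runs on its own pool thread.
    all_started = threading.Barrier(len(tool_names))

    def run_tool(name):
        tool_progress_callback("tool.started", tool_name=name)  # runs on a worker thread
        all_started.wait()
        return f"{name}: ok"

    with ThreadPoolExecutor(max_workers=len(tool_names)) as pool:
        return list(pool.map(run_tool, tool_names))

caller_threads = set()
record_lock = threading.Lock()

def recording_callback(event_type, tool_name=None, preview=None):
    # Record which thread delivered each progress event.
    with record_lock:
        caller_threads.add(threading.current_thread().name)

results = fake_execute_tool_calls_concurrent(["bash", "read_file", "web_search"], recording_callback)
print(len(caller_threads))  # 3: each event arrived on a different worker thread
```

Any closure state the callback touches is therefore shared by all of those threads.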

Root Cause

# gateway/run.py ~line 14223
last_progress_msg = [None]   # shared mutable state
repeat_count = [0]           # shared mutable state

def progress_callback(event_type, tool_name=None, preview=None, ...):
    ...
    # Called from multiple concurrent worker threads — NO lock
    if msg == last_progress_msg[0]:       # ← RACE: read
        repeat_count[0] += 1              # ← RACE: non-atomic read-modify-write
        progress_queue.put(("__dedup__", msg, repeat_count[0]))
        return
    last_progress_msg[0] = msg            # ← RACE: write
    repeat_count[0] = 0                   # ← RACE: write
    progress_queue.put(msg)
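The "non-atomic read-modify-write" annotation means that `repeat_count[0] += 1` is a load, an add, and a store, and a thread switch can fall between them. This deterministic sketch writes the increment out as its load and store halves and uses a `threading.Barrier` (not present in the real code) to force the worst-case interleaving. In production the same interleaving happens only occasionally, which is why the bug is intermittent.

```python
import threading

repeat_count = [0]
# Hypothetical barrier: makes both threads load before either one stores.
both_loaded = threading.Barrier(2)

def racy_increment():
    current = repeat_count[0]      # load  (first half of "+= 1")
    both_loaded.wait()             # the other thread now loads the same value
    repeat_count[0] = current + 1  # store (second half of "+= 1")

threads = [threading.Thread(target=racy_increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(repeat_count[0])  # 1, not 2: one increment was lost
```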

Two workers executing concurrently can:

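Both read last_progress_msg[0] = None before either writes → if their messages are identical, both are emitted as "new" instead of the second being collapsed into a (×N) repeat
Interleave on repeat_count[0] += 1 → lost increments, so the (×N) counter is wrong (+= on a list element is a separate load, add, and store, and a thread switch can fall between them)
Thread A sets last_progress_msg[0] = msg_A, Thread B immediately overwrites it with msg_B before A's next call checks it → A's repeat is not collapsed (a lock makes each check-and-update atomic but cannot prevent this interleaving, because all workers share a single "last message" slot)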

last_tool[0] (line 14306, used in "new" mode) has the same issue.
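The first interleaving can be reproduced deterministically with a minimal sketch. The state names mirror the closure in gateway/run.py, but the message text is hypothetical and the comparison is split into a separate read so that a `threading.Barrier` (not in the real code) can hold both workers between the read and the write.

```python
import queue
import threading

# Hypothetical stand-ins for the closure state in gateway/run.py.
last_progress_msg = [None]
repeat_count = [0]
progress_queue = queue.Queue()

# Barrier that holds both workers after the read and before the write.
both_checked = threading.Barrier(2)

def unguarded_dedup(msg):
    seen = last_progress_msg[0]            # RACE: read
    both_checked.wait()                    # both threads have now read None
    if msg == seen:
        repeat_count[0] += 1
        progress_queue.put(("__dedup__", msg, repeat_count[0]))
        return
    last_progress_msg[0] = msg             # RACE: write
    repeat_count[0] = 0
    progress_queue.put(msg)

workers = [threading.Thread(target=unguarded_dedup, args=("bash: ls",)) for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()

emitted = []
while not progress_queue.empty():
    emitted.append(progress_queue.get_nowait())
print(emitted)  # ['bash: ls', 'bash: ls']: the repeat was never collapsed
```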

Fix

Add a threading.Lock() created alongside last_progress_msg and repeat_count, and hold it for the entire dedup check-and-update block:

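import threading  # module-level import, if not already present

last_progress_msg = [None]
repeat_count = [0]
_dedup_lock = threading.Lock()

def progress_callback(...):
    ...
    with _dedup_lock:
        if msg == last_progress_msg[0]:
            repeat_count[0] += 1
            progress_queue.put(("__dedup__", msg, repeat_count[0]))
            return  # leaving the with-block releases the lock
        last_progress_msg[0] = msg
        repeat_count[0] = 0
        # Enqueue while still holding the lock, as the dedup branch already
        # does, so queue order always matches the dedup state.
        progress_queue.put(msg)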

The lock covers the entire read-check-write sequence, making the dedup atomic regardless of how many workers call it simultaneously.
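A self-contained sketch of the locked dedup follows, assuming a simplified closure: the factory `make_progress_callback` and its single `msg` argument are hypothetical (the real callback builds `msg` from `event_type`, `tool_name` and `preview`). It enqueues the new message inside the lock too, so queue entries appear in the same order as the dedup state changes. When eight workers send the same message at once, the result is always one "new" entry followed by repeats counting 1 through 7.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

def make_progress_callback(progress_queue):
    """Hypothetical factory mirroring the closure in gateway/run.py."""
    last_progress_msg = [None]
    repeat_count = [0]
    dedup_lock = threading.Lock()

    def progress_callback(msg):
        # One critical section covers the check, the update, and the enqueue.
        with dedup_lock:
            if msg == last_progress_msg[0]:
                repeat_count[0] += 1
                progress_queue.put(("__dedup__", msg, repeat_count[0]))
                return
            last_progress_msg[0] = msg
            repeat_count[0] = 0
            progress_queue.put(msg)

    return progress_callback

q = queue.Queue()
callback = make_progress_callback(q)
with ThreadPoolExecutor(max_workers=8) as pool:  # exiting waits for all calls
    for _ in range(8):
        pool.submit(callback, "bash: ls")

entries = []
while not q.empty():
    entries.append(q.get_nowait())
print(entries[0])                              # bash: ls
print([count for _, _, count in entries[1:]])  # [1, 2, 3, 4, 5, 6, 7]
```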

Affected paths

gateway/run.py ~line 14223–14356 (progress_callback closure)
run_agent.py ~line 10951 (_execute_tool_calls_concurrent with ThreadPoolExecutor)

Steps to reproduce

Run a request that triggers parallel tool execution (e.g., parallel bash calls) and observe the (×N) counter showing incorrect values or duplicate "new" progress lines that should have been collapsed.
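Because the failure depends on thread scheduling, a single manual run may look fine. Below is a stress-harness sketch using the simplified single-message callback shape. The names `unguarded_callback`, `guarded_callback` and `count_anomalies` are hypothetical. The harness lowers the interpreter switch interval with `sys.setswitchinterval` so threads switch more often, sends identical messages from a `ThreadPoolExecutor`, and counts trials whose output is not exactly one "new" line plus repeats 1 through N-1. The unguarded count varies between runs and may be zero on some interpreters. The guarded count should always be zero.

```python
import queue
import sys
import threading
from concurrent.futures import ThreadPoolExecutor

def unguarded_callback(state, q, msg):
    # Same dedup logic as the current gateway code: no lock.
    if msg == state["last"]:
        state["count"] += 1
        q.put(("__dedup__", msg, state["count"]))
        return
    state["last"] = msg
    state["count"] = 0
    q.put(msg)

def guarded_callback(state, q, msg):
    # Proposed fix: check, update and enqueue under one lock.
    with state["lock"]:
        unguarded_callback(state, q, msg)

def count_anomalies(callback, trials=100, workers=8):
    """Count trials whose output is not one new line plus repeats 1..workers-1."""
    anomalies = 0
    for _ in range(trials):
        state = {"last": None, "count": 0, "lock": threading.Lock()}
        q = queue.Queue()
        with ThreadPoolExecutor(max_workers=workers) as pool:
            for _ in range(workers):
                pool.submit(callback, state, q, "bash: ls")
        entries = []
        while not q.empty():
            entries.append(q.get_nowait())
        plain = [e for e in entries if not isinstance(e, tuple)]
        counts = sorted(e[2] for e in entries if isinstance(e, tuple))
        if plain != ["bash: ls"] or counts != list(range(1, workers)):
            anomalies += 1
    return anomalies

old_interval = sys.getswitchinterval()
sys.setswitchinterval(1e-6)  # ask the interpreter to switch threads very often
try:
    print("unguarded anomalies:", count_anomalies(unguarded_callback))
    print("guarded anomalies:  ", count_anomalies(guarded_callback))
finally:
    sys.setswitchinterval(old_interval)
```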

Environment

hermes-agent main branch
Any multi-tool parallel execution via gateway (Slack, Telegram, Discord, etc.)
tool_progress enabled in config
