[Bug]: multi-round subagent runs become very slow because child agents inherit oversized toolsets, delegation waits on stuck children, and session persistence grows too aggressively

ryanmind · 2026-04-17T05:43:32Z


Description

After several rounds of work using multiple sub-agents and multiple skills, Hermes becomes noticeably slow.

I traced this to a combination of issues in the current implementation:

1. Child agents inherit an oversized default toolset, including `skills_*` tools, which inflates every child system prompt and can trigger repeated skills indexing/scanning work.
2. `delegate_task()` tries to avoid hanging on stuck children, but the surrounding `ThreadPoolExecutor` lifecycle can still block the parent on shutdown.
3. Long-running sessions accumulate very large compaction summaries, so later turns become increasingly heavy.
4. Session persistence is expensive under heavy delegation: child sessions are stored in `state.db`, and session JSON logs are rewritten frequently.

This is especially visible in real workloads that use several sub-agents across multiple rounds rather than one short isolated delegation.

Environment

- OS: macOS
- Hermes home: `~/.hermes`
- Hermes repo: `NousResearch/hermes-agent`
- Model in my repro: `gpt-5.4` via custom Codex-compatible endpoint
- Config excerpts:
  - `compression.enabled: true`
  - `compression.threshold: 0.5`
  - `compression.target_ratio: 0.2`
  - `compression.protect_last_n: 20`
  - `delegation.max_iterations: 50`

Reproduction

1. Run Hermes in a real multi-round task, not just one-off delegation.
2. Use multiple sub-agents repeatedly across several rounds.
3. Invoke skills during the workflow or keep the skills toolset available.
4. Continue the same session until context compaction and several child sessions have accumulated.
5. Observe latency growth in later rounds.

What I found

1. Child agents inherit too many tools by default

Relevant file:

tools/delegate_tool.py

The default child-agent path derives toolsets from the parent's loaded tools and only strips a small blocked set. In practice, child agents can inherit toolsets such as:

- `skills`
- `session_search`
- `todo`
- `vision`
- `tts`
- `browser`

Even when config already declares:

delegation:
  default_toolsets:
    - terminal
    - file
    - web

that config is not used as the effective default in the main inheritance path.

Impact:

- child system prompts get much larger than necessary
- child startup cost increases
- skills-related scanning/index work is more likely to happen on child runs
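A minimal sketch of the resolution order this finding argues for: an explicit request wins, then `delegation.default_toolsets` from config, then a lightweight fallback. The function name `resolve_child_toolsets`, its parameters, and the rule that a child never receives a toolset the parent lacks are assumptions made for illustration; none of this is the actual code in `tools/delegate_tool.py`.

```python
from typing import Iterable, List, Optional

# Hypothetical lightweight fallback, mirroring the config excerpt above.
LIGHTWEIGHT_FALLBACK = ("terminal", "file", "web")


def resolve_child_toolsets(
    requested: Optional[Iterable[str]],
    configured_default: Optional[Iterable[str]],
    parent_toolsets: Iterable[str],
    blocked: Iterable[str] = (),
) -> List[str]:
    """Pick toolsets for a child agent without inheriting everything.

    Priority: explicit request > configured default > lightweight fallback.
    Assumption: a child may only use toolsets the parent has loaded and
    that are not in the blocked set.
    """
    requested = list(requested or [])
    configured_default = list(configured_default or [])

    if requested:
        chosen = requested
    elif configured_default:
        chosen = configured_default
    else:
        chosen = list(LIGHTWEIGHT_FALLBACK)

    available = set(parent_toolsets) - set(blocked)
    # Preserve the order of `chosen`; drop anything unavailable.
    return [name for name in chosen if name in available]
```

With a parent that has `terminal`, `file`, `web`, `skills`, and `browser` loaded, a call with no explicit request and the config default above yields only `terminal`, `file`, and `web`; `skills` reaches the child only when it is requested by name.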

2. `delegate_task()` can still block on slow/stuck children

Relevant file:

tools/delegate_tool.py

The code comments say the parent should avoid waiting forever on a stuck child, but the implementation still uses a `with ThreadPoolExecutor(...)` pattern.

Even if the internal loop stops waiting, leaving the executor context can still perform a blocking shutdown (`wait=True` behavior at exit), which can stall the parent until slow children finish.

Impact:

- one stuck child can still freeze or heavily delay the parent
- this gets worse when running multiple child agents in parallel
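The blocking behavior can be reproduced in isolation with the standard library. The sketch below is not Hermes code; `slow_child` and both `parent_*` functions are hypothetical stand-ins. It compares a parent that waits with a timeout inside a `with ThreadPoolExecutor(...)` block against one that manages the executor manually and calls `shutdown(wait=False, cancel_futures=True)`.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def slow_child(delay: float) -> str:
    """Stand-in for a child agent that takes `delay` seconds."""
    time.sleep(delay)
    return "done"


def parent_with_context_manager(delay: float, timeout: float) -> float:
    """Wait with a timeout, but inside a `with` block."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_child, delay)
        wait([future], timeout=timeout)  # stops waiting after `timeout`
    # Leaving the block called shutdown(wait=True), so execution only
    # reaches this line after the child has finished.
    return time.monotonic() - start


def parent_with_manual_shutdown(delay: float, timeout: float) -> float:
    """Wait with a timeout, then release the parent immediately."""
    start = time.monotonic()
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(slow_child, delay)
        wait([future], timeout=timeout)
    finally:
        # Requires Python 3.9+ for cancel_futures. Queued futures are
        # cancelled; a child that is already running keeps running.
        pool.shutdown(wait=False, cancel_futures=True)
    return time.monotonic() - start


if __name__ == "__main__":
    print(f"with-block parent:      {parent_with_context_manager(0.4, 0.05):.2f}s")
    print(f"manual-shutdown parent: {parent_with_manual_shutdown(0.4, 0.05):.2f}s")
```

The first parent returns only after the full child delay, while the second returns shortly after the timeout. Note that `shutdown(wait=False)` does not kill a running worker thread, so a truly stuck child still needs its own cooperative timeout or interrupt.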

3. multi-round context compaction creates very large summaries

Relevant files:

run_agent.py
agent/context_compressor.py

In my Hermes home, many user messages are actually compaction handoff summaries. They grow very large, and every later turn must carry them forward.

Observed local evidence:

- `state.db`: ~31 MB
- `sessions/`: ~76 MB
- total sessions: 190
- child sessions: 159
- total messages: 4783
- child messages: 4160
- child sessions average `system_prompt` length: ~28835 chars
- parent sessions average `system_prompt` length: ~22262 chars
- 34 compaction-summary user messages, average length ~28962 chars, max observed > 66000 chars

Impact:

- later turns become progressively more expensive
- delegation + compaction together amplify slowdown
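To put the observed figures in proportion, the short calculation below derives a few ratios from them. It uses only the numbers reported above; since the averages are approximate, the results are approximate too.

```python
# Figures copied from the "Observed local evidence" list above.
total_sessions, child_sessions = 190, 159
total_messages, child_messages = 4783, 4160
child_prompt_chars, parent_prompt_chars = 28835, 22262
summary_count, summary_avg_chars = 34, 28962

# Share of stored sessions and messages that belong to child agents.
child_session_share = child_sessions / total_sessions
child_message_share = child_messages / total_messages

# How much larger the average child system prompt is than the parent's.
prompt_ratio = child_prompt_chars / parent_prompt_chars

# Approximate total characters held in compaction-summary messages.
summary_total_chars = summary_count * summary_avg_chars

print(f"child share of sessions: {child_session_share:.1%}")
print(f"child share of messages: {child_message_share:.1%}")
print(f"child/parent system prompt ratio: {prompt_ratio:.2f}")
print(f"total compaction summary chars: ~{summary_total_chars:,}")
```

Children account for roughly 84% of sessions and 87% of messages, their system prompts average about 1.3 times the parent's, and the compaction summaries alone add up to close to one million characters.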

4. persistence cost is too high under delegation-heavy workloads

Relevant files:

run_agent.py
hermes_state.py

Observed behavior:

- child sessions are persisted into `state.db`
- session JSON logs under `~/.hermes/sessions/` are rewritten frequently
- WAL growth / write contention can add visible lag

Impact:

- multi-round delegated workflows create a lot of I/O and DB churn
- performance degrades over time instead of staying roughly flat
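One way to see why frequent full rewrites hurt is to compare them with an append-only log. The sketch below is illustrative only: the file layout, the `{"messages": [...]}` shape, and all three function names are assumptions, not the format Hermes actually uses under `~/.hermes/sessions/`.

```python
import json
from pathlib import Path


def rewrite_full_log(path: Path, messages: list) -> int:
    """Rewrite the whole session file; returns bytes written."""
    payload = json.dumps({"messages": messages})
    path.write_text(payload, encoding="utf-8")
    return len(payload.encode("utf-8"))


def append_message(path: Path, message: dict) -> int:
    """Append one message as a JSON line; returns bytes written."""
    line = json.dumps(message) + "\n"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(line)
    return len(line.encode("utf-8"))


def load_appended(path: Path) -> list:
    """Read back a JSON-lines session log."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Rewriting after every message makes total bytes written grow quadratically with session length, because message `n` causes all `n` messages to be serialized again. Appending keeps the cost of each update proportional to the size of the new message.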

Suggested fixes

P0

1. Make `delegation.default_toolsets` the true default for child agents.
   - Default should be lightweight, e.g. `terminal`/`file`/`web`.
   - Do not implicitly inherit `skills`, `session_search`, `todo`, `vision`, `tts`, or `browser` unless explicitly requested.
2. Fix delegation shutdown semantics.
   - Do not rely on `with ThreadPoolExecutor(...)` for interruptible child waiting.
   - On interrupt/timeout, cancel futures where possible and call `shutdown(wait=False, cancel_futures=True)`.
   - Add a hard per-child timeout such as `delegation.child_timeout_seconds`.
3. Reduce child persistence overhead.
   - Consider `persist_session=False` for subagents by default, or only persist child summaries/metrics instead of full child transcripts.
   - Avoid rewriting full session JSON on every incremental update.

P1

1. Add caching/indexing for `skills_list()` / `skill_view()` similar to the existing skills prompt snapshot cache.
2. Revisit compaction defaults for long-running delegated sessions.
   - lower threshold
   - smaller tail protection
   - more aggressive target ratio for heavy multi-round sessions
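As a rough illustration of the caching idea for skill listings, the sketch below reuses a previous scan while the files in a skills directory are unchanged. Everything here is hypothetical: the `*.md` layout, the `cached_skills_list` name, and the `scan` callback are assumptions, and the real `skills_list()` / `skill_view()` signatures are not reproduced.

```python
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# Cache keyed by directory: (signature at scan time, scan result).
_SKILLS_CACHE: Dict[Path, Tuple[tuple, List[str]]] = {}


def _directory_signature(skills_dir: Path) -> tuple:
    """Cheap fingerprint: name, mtime, and size of each skill file.

    Assumption: skills are `*.md` files directly inside `skills_dir`.
    """
    entries = []
    for path in skills_dir.glob("*.md"):
        stat = path.stat()
        entries.append((path.name, stat.st_mtime_ns, stat.st_size))
    return tuple(sorted(entries))


def cached_skills_list(
    skills_dir: Path, scan: Callable[[Path], List[str]]
) -> List[str]:
    """Return the cached scan result unless the directory changed."""
    signature = _directory_signature(skills_dir)
    hit = _SKILLS_CACHE.get(skills_dir)
    if hit is not None and hit[0] == signature:
        return hit[1]
    result = scan(skills_dir)  # the expensive parse/index step
    _SKILLS_CACHE[skills_dir] = (signature, result)
    return result
```

The fingerprint still costs one `stat` call per file, but that is typically far cheaper than re-reading and parsing every skill on each child run.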

Why this matters

Hermes encourages long-running, self-directed, multi-agent workflows. In that usage mode, performance should remain stable across rounds. Right now, the system gets heavier as it succeeds at using its own advanced features.

Willing to submit a PR

Yes, if maintainers agree on the preferred fix direction.

extent analysis

TL;DR

Implementing the suggested fixes, particularly making delegation.default_toolsets the true default for child agents, fixing delegation shutdown semantics, and reducing child persistence overhead, should alleviate the performance issues in Hermes.

Guidance

- Review and adjust the `delegation.default_toolsets` configuration so that it is used as the effective default for child agents, preventing the inheritance of unnecessary tools.
- Modify the `delegate_task()` function to avoid blocking on slow or stuck children by implementing interruptible waiting and adding a hard per-child timeout.
- Consider disabling session persistence for subagents by default, or persisting only child summaries/metrics, and avoid rewriting the full session JSON on every update.
- Investigate tuning compaction defaults for long-running delegated sessions, such as lowering the threshold, reducing tail protection, and using a more aggressive target ratio.

Example

# Example of setting delegation.default_toolsets
delegation:
  default_toolsets:
    - terminal
    - file
    - web

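# Example of fixing delegation shutdown semantics
# Avoid `with`: its exit calls shutdown(wait=True) and blocks on stuck children.
from concurrent.futures import ThreadPoolExecutor
executor = ThreadPoolExecutor(max_workers=4)  # illustrative worker count
try:
    ...  # submit child tasks and wait on them with a timeout
finally:
    executor.shutdown(wait=False, cancel_futures=True)  # cancel_futures needs Python 3.9+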

Notes

The provided suggestions and examples are based on the information given in the issue and may require further adjustments based on the specific implementation details of Hermes. It's also important to test these changes thoroughly to ensure they do not introduce any regressions.

Recommendation

Apply the suggested fixes, starting with making delegation.default_toolsets the true default for child agents and fixing delegation shutdown semantics, as these changes are likely to have the most significant impact on performance.
