hermes - ✅(Solved) Fix macOS gateway eventually hits [Errno 24] Too many open files and needs restart (redacted) [1 pull requests, 2 comments, 2 participants]

GitHub stats
NousResearch/hermes-agent#14210 (fetched 2026-04-23 07:46:04)
Comments: 2
Participants: 2
Timeline events: 7
Reactions: 0
Timeline (top): labeled ×4, commented ×2, cross-referenced ×1

Hermes gateway on macOS hit OSError: [Errno 24] Too many open files and eventually became unable to process Telegram messages, cron jobs, .env loads, dynamic imports, and outbound LLM/API requests. Restarting the launch agent temporarily recovers the service, but the failure suggests a file-descriptor leak or repeated resource retention under normal runtime load.

Error Message

OSError: [Errno 24] Too many open files: '<hermes-agent>/agent'
  File ".../gateway/run.py", line 2920, in _handle_message_with_agent
  File ".../gateway/run.py", line 7179, in _run_agent
  File ".../gateway/run.py", line 6718, in run_sync
  File ".../run_agent.py", line 757, in __init__
  File "<frozen importlib._bootstrap_external>", line 1662, in _fill_cache

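Root Cause

Not yet identified. The report points to two candidates: SQLite handles on state.db and response_store.db being created repeatedly without timely close or reuse, and leaked sockets lingering in CLOSE_WAIT. Once descriptors are exhausted, every subsystem that opens a file or socket fails at the same time, so the impact is broad, not isolated.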

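Fix Action

Fix / Workaround

Restarting the launch agent recovers the service temporarily. The linked PR does not fix this issue; its notes defer #14210 to a separate patch. The reporter has offered to provide more logs or test a diagnostic patch.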

PR fix notes

PR #3: fix: preserve tool-turn context in chat/completions (#14270)

Description (problem / solution / changelog)

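Conclusion

Fixed the issue where tool-call context was lost on the next turn in Open WebUI-style clients (#14270), after reproducing it with tests on the chat/completions path.

Cause

  • The /v1/chat/completions parser included only user/assistant messages in conversation_history and dropped the tool role.
  • It also unconditionally treated the last message as user_message, so input extraction was unstable when some clients sent trailing tool/assistant turns.
  • When an assistant tool-call turn arrived as content: "" + tool_calls, the hint that a tool had been called on the previous turn was lost entirely.

Changes

  1. Include tool role messages in conversation_history
  2. When an assistant turn has empty content + tool_calls, preserve a tool-name-based marker in history
    • e.g. [assistant issued tool calls: read_file]
  3. Extract user_message from "the last valid user turn" instead of "the last message"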
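
The three changes above can be illustrated with a minimal sketch. This is not the code from the PR: `build_history` is a hypothetical name, message content is assumed to be a plain string, and keeping turns that trail the last user turn in the history is a choice made here for illustration.

```python
from typing import Any


def build_history(messages: list[dict[str, Any]]) -> tuple[list[dict[str, str]], str]:
    """Split OpenAI-style chat messages into (conversation_history, user_message).

    Hypothetical helper; assumes every `content` is a string or None.
    """
    # Change 3: pick the last user turn that has non-empty text,
    # not simply the last message in the list.
    last_user = None
    for i, msg in enumerate(messages):
        if msg.get("role") == "user" and (msg.get("content") or "").strip():
            last_user = i

    history: list[dict[str, str]] = []
    for i, msg in enumerate(messages):
        if i == last_user:
            continue  # returned separately as the current input
        role = msg.get("role")
        content = msg.get("content") or ""
        # Change 2: an assistant turn with empty content plus tool_calls
        # is replaced by a marker naming the tools that were called.
        if role == "assistant" and not content and msg.get("tool_calls"):
            names = [c.get("function", {}).get("name", "?") for c in msg["tool_calls"]]
            content = f"[assistant issued tool calls: {', '.join(names)}]"
        # Change 1: tool messages are kept alongside user/assistant ones.
        if role in ("user", "assistant", "tool") and content:
            history.append({"role": role, "content": content})

    user_message = messages[last_user]["content"] if last_user is not None else ""
    return history, user_message
```

With a trailing tool message after the final user turn, the sketch still returns that user turn as `user_message` and keeps the tool output in the history.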

Tests

  • test_tool_messages_and_assistant_tool_calls_are_preserved
  • test_last_user_message_selected_when_trailing_tool_message_exists
  • Existing test_conversation_history_passed included as a regression check

Run:

  • scripts/run_tests.sh tests/gateway/test_api_server.py::TestChatCompletionsEndpoint

Result:

  • 17 passed

Related issues

  • Closes #14270
  • #14238 and #14210 will follow in separate patches (they are race/FD-leak issues with a larger scope)

Changed files

  • agent/error_classifier.py (modified, +24/-13)
  • gateway/platforms/api_server.py (modified, +37/-5)
  • run_agent.py (modified, +1/-1)
  • tests/agent/test_error_classifier.py (modified, +23/-0)
  • tests/gateway/test_api_server.py (modified, +69/-0)
  • tests/run_agent/test_run_agent.py (modified, +20/-0)
  • tests/tools/test_registry.py (modified, +24/-0)
  • tests/tools/test_todo_tool.py (modified, +23/-0)
  • tools/registry.py (modified, +43/-1)
  • tools/todo_tool.py (modified, +16/-2)

Original issue text

Summary

Hermes gateway on macOS hit OSError: [Errno 24] Too many open files and eventually became unable to process Telegram messages, cron jobs, .env loads, dynamic imports, and outbound LLM/API requests. Restarting the launch agent temporarily recovers the service, but the failure suggests a file-descriptor leak or repeated resource retention under normal runtime load.

Environment

  • OS: macOS (Apple Silicon)
  • Runtime: launchd LaunchAgent
  • Hermes command:
    • <venv>/bin/python -m hermes_cli.main gateway run --replace
  • Hermes home:
    • ~/.hermes
  • Repo:
    • NousResearch/hermes-agent

Symptoms

After running for a while, Hermes starts failing broadly with [Errno 24] Too many open files, including:

  • Telegram handling failures for inbound DM sessions
  • Cron scheduler failures opening temp files and .env
  • gh CLI helper/tool invocations failing with the same error
  • OpenAI/httpx connection errors caused by FD exhaustion
  • Python import machinery failing to scan the agent/ package directory

Representative failing paths observed:

  • ~/.hermes/.env
  • ~/.hermes/cron/*.tmp
  • ~/.hermes/.channel_directory_*.tmp
  • <hermes-agent>/agent

Representative stack traces

Gateway / import failure

OSError: [Errno 24] Too many open files: '<hermes-agent>/agent'
  File ".../gateway/run.py", line 2920, in _handle_message_with_agent
  File ".../gateway/run.py", line 7179, in _run_agent
  File ".../gateway/run.py", line 6718, in run_sync
  File ".../run_agent.py", line 757, in __init__
  File "<frozen importlib._bootstrap_external>", line 1662, in _fill_cache

Cron / dotenv failure

OSError: [Errno 24] Too many open files: '~/.hermes/.env'
  File ".../cron/scheduler.py", line 559, in run_job
  File ".../site-packages/dotenv/main.py", line 63, in _get_stream

OpenAI/httpx failure under FD exhaustion

openai.APIConnectionError: Connection error.
httpx.ConnectError: [Errno 24] Too many open files
  File ".../tools/session_search_tool.py", line 155, in _summarize_session
  File ".../agent/auxiliary_client.py", line 2289, in async_call_llm

Additional observations

At the time of failure, the Hermes process had a high FD count and many repeated opens around SQLite-related files:

  • ~/.hermes/state.db
  • ~/.hermes/state.db-wal
  • ~/.hermes/response_store.db
  • ~/.hermes/response_store.db-wal

There were also socket entries such as:

  • 127.0.0.1:<ephemeral> -> 127.0.0.1:7897 (CLOSE_WAIT)

This may indicate one or both of:

  1. Repeated DB handle creation without timely close/reuse
  2. Network/client/socket leakage (e.g. lingering CLOSE_WAIT connections)

Recovery

A full restart of the launch agent recovers Hermes immediately. After restart, the new Hermes process came up healthy with a low FD count (~42 open files), which supports the theory that the process accumulates descriptors over time rather than starting high.

Why this matters

Once this state is reached, Hermes effectively degrades across multiple subsystems at once:

  • messaging
  • cron jobs
  • session summarization
  • tool execution
  • import/loading logic

So the impact is broad, not isolated.

Request

Please help investigate potential file descriptor leaks in the gateway runtime, especially around:

  • agent/auxiliary_client.py
  • tools/session_search_tool.py
  • cron dotenv loading
  • repeated SQLite handle reuse (response_store.db, state.db)
  • lingering network connections / CLOSE_WAIT
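
To narrow down which of these areas is responsible, it may help to know whether regular files (such as the SQLite databases) or sockets dominate the open descriptors. The following is a rough diagnostic sketch using only the standard library; `classify_open_fds` is a hypothetical helper, and it assumes `/dev/fd` is available, as it is on macOS and Linux.

```python
import collections
import os
import stat


def classify_open_fds() -> collections.Counter:
    """Count this process's open descriptors by kind."""
    kinds: collections.Counter = collections.Counter()
    for name in os.listdir("/dev/fd"):
        try:
            mode = os.fstat(int(name)).st_mode
        except (OSError, ValueError):
            # Typically the descriptor used for the directory listing itself,
            # which is already closed by the time it is inspected.
            continue
        if stat.S_ISSOCK(mode):
            kinds["socket"] += 1
        elif stat.S_ISREG(mode):
            kinds["file"] += 1
        elif stat.S_ISFIFO(mode):
            kinds["pipe"] += 1
        elif stat.S_ISDIR(mode):
            kinds["dir"] += 1
        elif stat.S_ISCHR(mode):
            kinds["char device"] += 1
        else:
            kinds["other"] += 1
    return kinds
```

Logging this counter periodically from inside the gateway would show which category grows over time, which is more direct than inferring it from a single snapshot taken at failure.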

If useful, I can provide more logs or test a diagnostic patch.

extent analysis

TL;DR

The Hermes gateway on macOS appears to leak file descriptors over time until it hits "Too many open files" and fails across messaging, cron, imports, and outbound requests. The two suspected sources are SQLite handles created repeatedly without timely close or reuse, and network sockets left in CLOSE_WAIT. Neither has been confirmed.

Guidance

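  • Audit agent/auxiliary_client.py and tools/session_search_tool.py for HTTP clients or responses that are created per call and never closed; the httpx failure in the stack traces passes through both files.
  • Audit the code that opens state.db and response_store.db for connections created per request without close or reuse.
  • Review the cron dotenv loading path to confirm files are closed after use. The .env failure itself is most likely a symptom of exhaustion, not the source.
  • Check for connections stuck in CLOSE_WAIT (observed toward 127.0.0.1:7897). That state means the peer has closed the connection but the local process has not yet closed its socket.
  • Consider logging the process's open FD count against its limit so that growth is visible before the "Too many open files" error occurs.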

Example

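import os
import resource

# Count the file descriptors currently open in this process.
# /dev/fd is available on macOS and Linux.
def get_fd_count():
    return len(os.listdir("/dev/fd"))

# Return the (soft, hard) limits on open file descriptors.
def get_fd_limit():
    return resource.getrlimit(resource.RLIMIT_NOFILE)

# Close a raw file descriptor. Prefer context managers for files,
# sockets, and DB connections so this is rarely needed directly.
def close_fd(fd):
    os.close(fd)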

Notes

The stack traces show where descriptor exhaustion surfaces, not where descriptors are leaked, so further investigation is needed to pinpoint the cause. The CLOSE_WAIT connections and the repeated opens of state.db and response_store.db are the strongest leads so far.

Recommendation

Until the root cause is identified and fixed, monitor the process's open FD count against its limit and restart the launch agent before the limit is reached. Closing lingering connections explicitly in the suspected code paths may reduce how quickly descriptors accumulate.
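
A limit check of this kind could look like the sketch below. The function names and the 80% threshold are assumptions chosen for illustration, not part of Hermes; the check only detects pressure and does not fix the leak.

```python
import os
import resource


def fd_usage() -> tuple[int, int]:
    """Return (open descriptor count, soft limit) for this process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # /dev/fd is available on macOS and Linux.
    return len(os.listdir("/dev/fd")), soft


def fd_pressure(threshold: float = 0.8) -> bool:
    """True when open descriptors reach `threshold` of the soft limit."""
    used, soft = fd_usage()
    if soft == resource.RLIM_INFINITY:
        return False  # no finite limit to compare against
    return used >= threshold * soft
```

A periodic task could call `fd_pressure()` and log a warning or trigger a controlled restart when it returns True, which gives earlier notice than waiting for the first OSError.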
