hermes - 💡(How to fix) Fix Code-quality, reliability & test findings (external full-codebase review)

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Error Message

  • Silent error swallowing — broad except Exception: pass / DEBUG-only logging hides production failures, e.g. hermes_cli/env_loader.py:184, agent/conversation_loop.py (plugin-hook blocks), tests/conftest.py:372,800,823. Log at DEBUG+ with context.
  • Leaked aiohttp.ClientSession on error pathhermes_cli/proxy/server.py:162-176; session not closed when _open_upstream errors after creation. Use async with / try-finally.
  • console.warn left in productionweb/src/pages/ChatPage.tsx:345,374,376,388 (clipboard paths).
  • ~1,000 assertion-free "smoke" tests — functions that only verify "no exception raised" (no assert / pytest.raises / mock-assert / caplog). Add output/state assertions so the suite actually verifies behavior.
RAW_BUFFERClick to expand / collapse

Code-quality, reliability & test findings (external full-codebase review)

Hi team 👋 — I ran a structured, full-codebase review of hermes-agent (graph-guided, per-layer) and wanted to share the non-security findings as a triage checklist.

Scope note: This issue intentionally contains only maintainability / reliability / performance / test-quality items. Per SECURITY.md, any security-sensitive findings are excluded here and would be routed through the private advisory channel instead — none are included below.

Caveats: Findings come from an automated multi-agent review at commit 2b768535c; reviewers prioritized the highest-complexity / highest-fan-in files per area, so this is high-signal rather than exhaustive. Line numbers are approximate and items are unverified — please triage. Several may be intentional design choices.


Maintainability

  • hermes_cli/main.py (~14k LOC, 175 top-level fns, fan-in 66) — split into packages (update.py, tui_launch.py, model_flows/, session.py); keep cmd_* as thin call-throughs.
  • hermes_cli/auth.py (~7.6k LOC, fan-in 87) — extract into auth/ package (store.py, providers/, credential_pool.py, resolve.py) with a re-export shim.
  • run_agent.py AIAgent (~3.8k LOC / ~185 methods) — extract cohesive responsibilities.
  • hermes_cli/config.py:147import yaml mid-file; move to the top-level import block (PEP 8 ordering).
  • Missing return-type annotations on many public functions (tools/file_tools.py, hermes_cli/main.py / auth.py, several plugins) — blocks effective mypy/pyright.
  • Undocumented magic numbers — e.g. agent/context_compressor.py _CHARS_PER_TOKEN=4, _IMAGE_TOKEN_ESTIMATE=1600.

Reliability / correctness

  • Unbounded scheduler thread poolcron/scheduler.py:1933 ThreadPoolExecutor(max_workers=None); a surge of due jobs (e.g. after an outage) can exhaust FDs/memory. Default to a conservative cap.
  • save_jobs() not under the jobs-file lockcron/jobs.py:433-449 is called from CRUD paths without _jobs_file_lock (only mark_job_run/advance_next_run hold it) → jobs.json corruption race with mid-tick writes.
  • Per-job os.environ mutation across parallel jobscron/scheduler.py:1335,1397 (HERMES_CRON_SESSION, TERMINAL_CWD) mutate the shared process env while parallel jobs run → cross-job bleed. Set the session flag once at startup; pass cwd via the per-job env copy.
  • Timezone cache never invalidated + missing APIhermes_time.py:84-88 caches the tz once for the process; the reset_cache() referenced in the module docstring isn't defined (AttributeError for doc-followers). Implement it / add a TTL.
  • Job-failure path skips output filecron/scheduler.py:1903-1906 records the failure in jobs.json but writes no save_job_output, so failed jobs have no audit artifact.
  • assert used as a runtime guardplugins/google_meet/realtime/openai_client.py:224,230; stripped under python -O, after which _ws is None silently calls .send()/.recv() on None. Use explicit raise.
  • Data races on module-level mutable state from daemon threads (no locks) — plugins/memory/honcho/session.py:270-346 (get_or_create TOCTOU clobbers a concurrent session), plugins/memory/mem0/__init__.py:180-197 (circuit-breaker counters), plugins/disk-cleanup/disk_cleanup.py:160-193 (track() read-modify-write).
  • File-queue without locking / non-atomic rewriteplugins/google_meet/.../openai_client.py:261-287 reads+rewrites say_queue.jsonl with no lock and a full overwrite → dropped/corrupted entries on concurrent access or crash. Use a lock + temp-file rename.
  • Silent error swallowing — broad except Exception: pass / DEBUG-only logging hides production failures, e.g. hermes_cli/env_loader.py:184, agent/conversation_loop.py (plugin-hook blocks), tests/conftest.py:372,800,823. Log at DEBUG+ with context.
  • Optional[callable] is not a valid type hintplugins/disk-cleanup/disk_cleanup.py:343 should be Optional[Callable[[Dict[str, Any]], bool]] (silently Optional[Any] today).
  • Unbounded in-memory caches (no TTL/LRU) → memory growth in long-running gateways, e.g. webhook idempotency map gateway/platforms/webhook.py:498-512 and other module-level dict caches.
  • Leaked aiohttp.ClientSession on error pathhermes_cli/proxy/server.py:162-176; session not closed when _open_upstream errors after creation. Use async with / try-finally.

Performance

  • Per-request aiohttp.ClientSessionhermes_cli/proxy/server.py:162 creates a new session (and connection pool) per proxied request; reuse one per adapter.
  • SDK install triggered on every isinstanceplugins/web/firecrawl/provider.py:103 __instancecheck__ calls a lazy install path; cache the result / drop the hook.
  • Eager f-string logging on hot paths — e.g. agent/tool_executor.py:312 logging.debug(f"..."); use %-style lazy args.

Frontend (dashboard web/ + ui-tui/) — tsc --noEmit passes; eslint reports 22 errors

  • Component created during renderweb/src/plugins/PluginPage.tsx:17-29; the plugin subtree remounts and loses state on every parent render. Stabilize the component identity (useRef).
  • 14× synchronous setState inside effects (React Compiler warnings) — cascading renders under React 19 concurrent; sites incl. ConfigPage.tsx, SessionsPage.tsx, ModelsPage.tsx, OAuthProvidersCard.tsx, Toast.tsx, LogsPage.tsx. Derive state outside effects or move to callbacks.
  • List keys by indexweb/src/pages/SessionsPage.tsx:252 (key={i}); breaks diffing on prepend/splice in streaming lists. Use a stable id.
  • console.warn left in productionweb/src/pages/ChatPage.tsx:345,374,376,388 (clipboard paths).
  • exhaustive-deps suppressions hiding stale closuresthemes/context.tsx:383, ChatSidebar.tsx:83, OAuthLoginModal.tsx:63; prefer functional updates.

Tests

  • ~1,000 assertion-free "smoke" tests — functions that only verify "no exception raised" (no assert / pytest.raises / mock-assert / caplog). Add output/state assertions so the suite actually verifies behavior.
  • Real time.sleep in timing-dependent tests → CI flakiness — e.g. tests/gateway/test_run_progress_topics.py (~3.15s/run), tests/test_hermes_state.py. No freezegun/time-machine is used anywhere; adopt a fake clock + event-driven sync, and mark slow/integration tests so they can be gated.
  • Real local-server/port-binding tests outside e2e/tests/hermes_cli/test_proxy.py, test_auth_xai_oauth_provider.py; tag @pytest.mark.integration and exclude from the default run.

Config hygiene (minor)

  • .env.example ships some uncommented active defaults (e.g. BROWSERBASE_PROXIES=true) — cp .env.example .env silently enables opt-in/privacy-affecting behavior. Comment these out or annotate clearly.

Happy to open focused PRs for any of these (the monolith splits and the test-assertion backfill are the highest-leverage). Generated from a graph-guided review using the understand-anything knowledge graph.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING