hermes - 💡(How to fix) Fix [Important] Hermescheck community audit: static architecture scan + runtime smoke [1 comment, 1 participant]

huangrichao2020 · 2026-04-25T08:44:10Z



GitHub stats

NousResearch/hermes-agent#15568•Fetched 2026-04-26 05:26:34


I ran a community hermescheck audit against NousResearch/hermes-agent with both static architecture scans and a live runtime smoke test.

This is an open-source architecture quality audit rather than a vulnerability report. Hermescheck is an architecture-health scanner, so some findings are risk signals, cleanup opportunities, or documentation/contract gaps that need maintainer judgment.

Error Message

hermes chat ... --provider deepseek ... error: argument --provider: invalid choice: 'deepseek'

Root Cause

Important caveat: the runtime smoke used an operational Hermes checkout at 8fdc8bf0d on branch codex/reasoning-replay-state because that environment already had provider credentials and dependencies. The static findings above are from upstream main at e5647d7. The runtime finding below should be verified against current upstream main.



Summary

I ran a community hermescheck audit against NousResearch/hermes-agent with both static architecture scans and a live runtime smoke test.

Target

Repository: NousResearch/hermes-agent
Static scan commit: e5647d7 (docs: consolidate dashboard themes and plugins into Extending the Dashboard (#15530))
Scan date: 2026-04-25
Scanner: hermescheck 0.1.0
Static reports were schema-validated after generation.
Full sanitized artifacts: https://gist.github.com/huangrichao2020/f079988b42c6f49970320231f613c938

Static Scan Results

| Profile | Overall | Critical | High | Medium | Low | Total |
| --- | --- | ---: | ---: | ---: | ---: | ---: |
| Personal development | critical | 1 | 6 | 46 | 70 | 123 |
| Enterprise production | critical | 35 | 18 | 70 | 0 | 123 |

Architecture maturity result:

Era: AI age / 人工智能时代
Score: 100/100
Detected strengths: methodology layer, agent runtime, tool/syscall boundary, fact memory, skill memory, context compaction, semantic paging, page-fault recovery, impression cues, scheduler/workers, fair scheduling, capability table, semantic VFS, traces/evals, stateful recovery, environment-as-state, LLM CLI workers, task envelope, CLI prompt contract.

Top Static Findings

1. Hardcoded secret-like fixture detected

Severity: critical in both profiles
Layer: secrets_management
Evidence: tests/agent/test_redact.py:70
Symptom: test fixture includes a token-shaped string: MY_SECRET_TOKEN="supersecretvalue123456789"
Suggested handling: if this is intentional test data, consider renaming it to an obviously fake sentinel or adding scanner/test allowlisting so automated scanners do not treat it as a real credential.
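A minimal sketch of the "obviously fake sentinel" option follows. The `redact` helper and its regex are hypothetical stand-ins written for this example, not the hermes-agent implementation; only the `MY_SECRET_TOKEN` name comes from the report. The fixture value is assembled from fragments and clearly labeled as fake. Whether a given scanner still flags it depends on that scanner's rules, so explicit allowlisting may still be needed.

```python
import re

# Hypothetical stand-in for a redaction helper; the real hermes-agent
# redactor is not shown in the report and may work differently.
_SECRET_ASSIGNMENT = re.compile(
    r"(?P<key>[A-Z_]*(?:TOKEN|SECRET|KEY)[A-Z_]*)="
    r"(?P<quote>[\"']?)(?P<value>[^\"'\s]+)(?P=quote)"
)


def redact(text: str) -> str:
    """Mask the value of any NAME=value pair whose name looks secret-like."""
    return _SECRET_ASSIGNMENT.sub(
        lambda m: f"{m.group('key')}={m.group('quote')}***{m.group('quote')}",
        text,
    )


# Obviously fake sentinel built from fragments (assumed naming convention).
FAKE_VALUE = "FAKE-" + "not-a-real-credential-" + "0000"
fixture_line = f'MY_SECRET_TOKEN="{FAKE_VALUE}"'
redacted_line = redact(fixture_line)
print(redacted_line)
```

The test still exercises the redaction path, but a human reader or triage script can tell at a glance that the value is not a credential.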

2. Internal orchestration sprawl

Severity: high in personal profile
Layer: orchestration
Symptom: Hermescheck found 16418 orchestration markers across delegation, planning, recovery, routing, and scheduling categories.
Representative evidence: .plans/openai-api-server.md:219, .plans/streaming-support.md:87, .github/ISSUE_TEMPLATE/setup_help.yml, AGENTS.md
Suggested handling: document one canonical main loop and clarify ownership for planning, routing, retries, recovery, and delegation.

3. Completion closure gap

Severity: high in personal profile
Layer: completion_closure
Representative evidence: AGENTS.md:82, AGENTS.md:664, CONTRIBUTING.md:86
Symptom: file creation and index update signals exist, but reusable memory closure is not always explicit.
Suggested handling: define a completion contract such as file created -> index updated -> impression/anchor/pointer registered -> acceptance evidence.
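The arrow chain in that suggestion could be made checkable with a small record type. This is an illustrative sketch only: the class, the step names, and the example paths are assumptions for this example, not hermes-agent or hermescheck APIs.

```python
from dataclasses import dataclass, field

# Step names mirror the chain in the suggestion; they are illustrative only.
STEPS = (
    "file_created",
    "index_updated",
    "pointer_registered",
    "acceptance_evidence",
)


@dataclass
class CompletionRecord:
    task: str
    evidence: dict[str, str] = field(default_factory=dict)

    def mark(self, step: str, proof: str) -> None:
        """Record proof (a path, ID, or log line) that a step happened."""
        if step not in STEPS:
            raise ValueError(f"unknown step: {step}")
        self.evidence[step] = proof

    def missing_steps(self) -> list[str]:
        """Return the steps that have no recorded proof yet, in order."""
        return [step for step in STEPS if step not in self.evidence]

    def is_closed(self) -> bool:
        return not self.missing_steps()


record = CompletionRecord("write release notes")
record.mark("file_created", "docs/notes.md")   # hypothetical path
record.mark("index_updated", "docs/index.md")  # hypothetical path
print(record.missing_steps())
```

A task counts as closed only when every step has proof attached, which makes the "not always explicit" gap visible as a non-empty `missing_steps()` list.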

4. Memory freshness / generation confusion

Severity: high in personal profile
Layer: memory_freshness
Representative evidence: acp_adapter/session.py, gateway/session.py, plugins/memory/*
Symptom: 121 memory-like surfaces across 9 categories, with overlapping stems such as init, memory, plugin, readme, and session.
Suggested handling: document the authoritative current memory surface and mark archives/snapshots/summaries as secondary.

5. Role-play handoff orchestration

Severity: high in personal profile
Layer: orchestration
Representative evidence: RELEASE_v0.9.0.md:30, .plans/streaming-support.md:543, RELEASE_v0.2.0.md:177
Symptom: 130 role markers across builder, manager, researcher, reviewer categories, plus 1732 serial handoff markers.
Suggested handling: keep one owner for the user intent and reserve subagents for bounded evidence gathering or isolated execution.

6. Startup surface sprawl

Severity: high in personal profile
Layer: startup
Representative evidence: docker/entrypoint.sh, docker-compose.yml, gateway/platforms/whatsapp.py
Symptom: 42 startup-like files and 171 launcher/wrapper sites.
Suggested handling: clearly separate canonical local development startup, service startup, and optional integration launchers.

7. Runtime surface sprawl

Severity: high in personal profile
Layer: runtime_architecture
Representative evidence: RELEASE_v0.11.0.md:38, RELEASE_v0.11.0.md:317, .plans/streaming-support.md:590
Symptom: runtime markers span agent stack, ops, queue jobs, storage, UI, and web API surfaces.
Suggested handling: document the primary runtime path and explicitly classify optional runtime services.

8. LLM CLI worker contract incomplete

Severity: medium in personal profile
Layer: llm_cli_workers
Representative evidence: RELEASE_v0.11.0.md:18, RELEASE_v0.11.0.md:56, RELEASE_v0.3.0.md:233
Symptom: Hermescheck found 210 external LLM CLI worker markers, 0 task-envelope markers, 15 CLI prompt-contract markers, 2064 result-capture markers, and 3403 process-control markers.
Suggested handling: define a stable worker contract: Task JSON file, natural-language prompt or task-file reference to stdin, timeout/concurrency limits, stdout/stderr/exit capture, and merge back into the main agent context/observability path.
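The worker contract described above can be sketched with the standard library alone. Everything named here (`run_worker`, `WorkerResult`, the task fields) is hypothetical, and a small Python one-liner stands in for a real LLM CLI. The sketch writes the Task JSON to a file, sends a natural-language prompt that references that file on stdin, enforces a timeout, and captures stdout, stderr, and the exit code.

```python
import json
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class WorkerResult:
    exit_code: int
    stdout: str
    stderr: str
    timed_out: bool


def _as_text(data) -> str:
    """Normalize partial output, which may be None or bytes after a timeout."""
    if data is None:
        return ""
    return data.decode("utf-8", "replace") if isinstance(data, bytes) else data


def run_worker(argv, task: dict, timeout_s: float = 30.0) -> WorkerResult:
    with tempfile.TemporaryDirectory() as tmp:
        task_path = Path(tmp) / "task.json"
        task_path.write_text(json.dumps(task), encoding="utf-8")
        # Reference the task file in prose instead of piping raw JSON to stdin.
        prompt = f"Complete the task described in {task_path}."
        try:
            proc = subprocess.run(
                argv, input=prompt, capture_output=True, text=True, timeout=timeout_s
            )
        except subprocess.TimeoutExpired as exc:
            return WorkerResult(-1, _as_text(exc.stdout), _as_text(exc.stderr), True)
        return WorkerResult(proc.returncode, proc.stdout, proc.stderr, False)


# Stand-in worker: echoes its stdin in upper case.
fake_worker = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
result = run_worker(fake_worker, {"goal": "summarize AGENTS.md"})
print(result.exit_code, result.timed_out)
```

Returning one structured result per worker gives the main agent a single place to merge output back into its context and observability path; concurrency limits would sit in whatever pool calls `run_worker`.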

9. Duplicated skill / SOP artifacts

Severity: medium in personal profile
Layer: skill_system
Representative evidence: hermes_cli/skills_hub.py, tools/skills_hub.py, optional-skills/*
Symptom: 256 overlapping skill-like files across 27 duplicate groups.
Suggested handling: pick canonical skill/SOP locations and keep history in Git instead of parallel near-duplicates.

10. Hidden or secondary LLM calls

Severity: medium in personal profile
Layer: llm_routing
Representative evidence: agent/auxiliary_client.py:3038, agent/auxiliary_client.py:3258, agent/auxiliary_client.py:3300, tools/mixture_of_agents_tool.py
Suggested handling: if these paths are intentional, document their routing and observability contracts so extra model calls are predictable.

Enterprise Profile Notes

The enterprise profile promotes unsafe-code-execution patterns more aggressively. Several hits may be intentional tests, docs, comments, or red-team skill internals rather than production RCE paths. The full sanitized enterprise report is included in the artifact gist so maintainers can triage them in context.

Representative examples include:

environments/benchmarks/terminalbench_2/terminalbench2_env.py:284
environments/tool_call_parsers/qwen3_coder_parser.py:49
skills/red-teaming/godmode/scripts/auto_jailbreak.py
skills/red-teaming/godmode/scripts/load_godmode.py
tests/agent/test_bedrock_adapter.py
tests/tools/test_approval.py

Dynamic Runtime Smoke

I also ran a live runtime smoke in an isolated Hermes home with a small public-safe AGENTS.md memory fixture.

Doctor summary

Python 3.11.15 and Hermes virtual environment were active.
Required packages were present.
DeepSeek connectivity check passed.
Built-in memory provider was active.
Optional integrations such as Docker, agent-browser, browser-cdp, image generation, some web-search credentials, and Skills Hub were not available in the isolated home.

Runtime finding: provider registration mismatch for DeepSeek

Direct CLI provider selection failed:

hermes chat ... --provider deepseek ...
error: argument --provider: invalid choice: 'deepseek'

Exit code: 2.

The same model worked when the provider was supplied through config and the command only passed -m deepseek-v4-flash. That suggests deepseek is supported by config/status/runtime paths but is missing from the hermes chat --provider argparse choices.
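The failure mode is consistent with how `argparse` handles a `choices` list that has drifted from the set of providers the runtime accepts. The sketch below reproduces that behavior in isolation; the registry contents, `build_parser`, and `try_parse` are assumptions for illustration and do not reflect the actual hermes-agent code. Deriving `choices` from a single registry is one way to keep the CLI and the config path in sync.

```python
import argparse

# Hypothetical provider registry; the real hermes-agent registry and its
# entries are not shown in the report.
PROVIDER_REGISTRY = {
    "example-provider": "https://example.invalid",
    "deepseek": "https://example.invalid",
}


def build_parser(provider_names) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="hermes")
    subparsers = parser.add_subparsers(dest="command", required=True)
    chat = subparsers.add_parser("chat")
    # Choices come from the caller, so one registry can feed every code path.
    chat.add_argument("--provider", choices=sorted(provider_names))
    chat.add_argument("-m", "--model")
    return parser


def try_parse(parser, argv):
    """Return (exit_code, namespace); argparse exits with code 2 on bad input."""
    try:
        return 0, parser.parse_args(argv)
    except SystemExit as exc:
        return exc.code, None


argv = ["chat", "--provider", "deepseek", "-m", "deepseek-v4-flash"]
# A hand-maintained list that lacks "deepseek" rejects the flag.
stale_code, _ = try_parse(build_parser(["example-provider"]), argv)
# A list derived from the registry accepts it.
fresh_code, fresh_args = try_parse(build_parser(PROVIDER_REGISTRY), argv)
print(stale_code, fresh_code, fresh_args.provider)
```

The stale parser exits with code 2 and an `invalid choice: 'deepseek'` message, matching the observed error, while the registry-derived parser accepts the same command line.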

Runtime smoke passed through config provider

{
  "status": "pass",
  "remembered_stateful_agent": "combine transcript replay with durable environment state",
  "remembered_context_budget": "stable prompt prefixes, explicit cache markers, compaction, page-fault style retrieval",
  "remembered_cli_worker": "natural-language task prompt or referenced Task JSON path, not raw JSON dumped blindly to stdin",
  "tool_runtime_observation": {
    "workspace_file": "AGENTS.md (571 bytes)",
    "available_tools": ["terminal", "process"],
    "secrets": false
  }
}

Reproduction

git clone --depth 1 https://github.com/NousResearch/hermes-agent.git /tmp/hermescheck-hermes-agent
cd /path/to/hermescheck
uv run python -m hermescheck audit /tmp/hermescheck-hermes-agent \
  --profile personal \
  -o personal-audit.json \
  -r personal-audit.md \
  --sarif personal-audit.sarif.json
uv run python -m hermescheck audit /tmp/hermescheck-hermes-agent \
  --profile enterprise \
  -o enterprise-audit.json \
  -r enterprise-audit.md \
  --sarif enterprise-audit.sarif.json
uv run python -m hermescheck validate personal-audit.json
uv run python -m hermescheck validate enterprise-audit.json

Suggested Next Actions

Either treat the token-shaped test fixture as an intentional fake and allowlist it explicitly, or rename it to avoid credential-scanner false positives.
Verify whether deepseek should be a first-class value in hermes chat --provider choices.
Add or document contracts for stateful recovery, memory freshness, LLM CLI worker spawning, and completion closure.
Use the SARIF artifacts from the gist for code-scanning style triage if useful.
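For a quick triage pass over the SARIF artifacts, a short script can count results per level and rule. The sketch relies only on the standard SARIF layout (`runs[].results[]` with `level` and `ruleId`). The sample data and rule IDs are made up for illustration, and the exact contents of the hermescheck SARIF files are an assumption.

```python
from collections import Counter


def summarize_sarif(sarif: dict) -> Counter:
    """Count SARIF results per (level, ruleId) across all runs."""
    counts = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # A result without an explicit level is treated as "warning" here.
            level = result.get("level", "warning")
            rule_id = result.get("ruleId", "<no ruleId>")
            counts[(level, rule_id)] += 1
    return counts


# Made-up sample; rule IDs are illustrative, not hermescheck output.
sample = {
    "version": "2.1.0",
    "runs": [
        {
            "results": [
                {"ruleId": "example.secret", "level": "error"},
                {"ruleId": "example.sprawl", "level": "warning"},
                {"ruleId": "example.sprawl"},
            ]
        }
    ],
}
summary = summarize_sarif(sample)
for (level, rule_id), count in summary.most_common():
    print(f"{level:8} {rule_id:20} {count}")
```

To run it against a real artifact, load the file with `json.load` (for example `personal-audit.sarif.json` from the reproduction steps) and pass the resulting dict to `summarize_sarif`.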

Thanks for building Hermes. I am sharing this as a community ecosystem audit and would be happy to rerun Hermescheck after fixes or provide smaller focused follow-up reports.

extent analysis

TL;DR

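Two findings have concrete fixes: rename or allowlist the token-shaped test fixture at tests/agent/test_redact.py:70, and check why deepseek is rejected by hermes chat --provider even though it works through config. The remaining high-severity findings (orchestration sprawl, memory freshness, completion closure) and the LLM CLI worker finding are mainly addressed by documenting contracts.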

Guidance

Verify if the token-shaped test fixture in tests/agent/test_redact.py:70 is intentional and consider renaming it or adding scanner allowlisting to avoid false positives.
Investigate why deepseek is not a valid choice for hermes chat --provider despite being supported through config and runtime paths.
Document contracts for stateful recovery, memory freshness, LLM CLI worker spawning, and completion closure to address high-severity findings.
Review the SARIF artifacts from the gist for code-scanning style triage and prioritize fixes based on severity and impact.

Example

No specific code snippet is provided, but the issue suggests updating the hermes chat --provider argparse choices to include deepseek as a valid option.

Notes

The provided information is based on a community architecture quality audit, and some findings may require maintainer judgment to determine the best course of action. The audit reports are available in the gist for further review and triage.

Recommendation

Start with the critical finding (the token-shaped fixture) and the --provider deepseek failure, after verifying the latter against current upstream main. Then work through the high-severity findings, most of which call for documentation or explicit contracts rather than code changes.
