hermes: [Important] Hermescheck community audit: static architecture scan + runtime smoke (1 comment, 1 participant)

GitHub stats
NousResearch/hermes-agent#15568 (fetched 2026-04-26 05:26:34)
Comments: 1
Participants: 1
Timeline: 4
Reactions: 0
Timeline (top): labeled ×2, commented ×1, renamed ×1

Summary

I ran a community hermescheck audit against NousResearch/hermes-agent with both static architecture scans and a live runtime smoke test.

This is an open-source architecture quality audit rather than a vulnerability report. Hermescheck is an architecture-health scanner, so some findings are risk signals, cleanup opportunities, or documentation/contract gaps that need maintainer judgment.

Target

NousResearch/hermes-agent; the static scans ran against upstream main at e5647d7.

Static Scan Results

Profile                Overall   Critical  High  Medium  Low  Total
Personal development   critical  1         6     46      70   123
Enterprise production  critical  35        18    70      0    123

Architecture maturity result:

  • Era: AI age
  • Score: 100/100
  • Detected strengths: methodology layer, agent runtime, tool/syscall boundary, fact memory, skill memory, context compaction, semantic paging, page-fault recovery, impression cues, scheduler/workers, fair scheduling, capability table, semantic VFS, traces/evals, stateful recovery, environment-as-state, LLM CLI workers, task envelope, CLI prompt contract.

Top Static Findings

1. Hardcoded secret-like fixture detected

  • Severity: critical in both profiles
  • Layer: secrets_management
  • Evidence: tests/agent/test_redact.py:70
  • Symptom: test fixture includes a token-shaped string: MY_SECRET_TOKEN="supersecretvalue123456789"
  • Suggested handling: if this is intentional test data, consider renaming it to an obviously fake sentinel or adding scanner/test allowlisting so automated scanners do not treat it as a real credential.
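
The suggested handling can be sketched as a small check that separates self-describing fake values from unmarked token-shaped strings. This is only an illustration, not Hermes or Hermescheck code: the names FAKE_SENTINEL_PREFIX, is_obviously_fake, and needs_review, and the token-shape regex, are assumptions introduced here. Whether the redaction test still passes with a renamed fixture would need to be checked against the real test.

```python
import re

# Hypothetical convention: every fake credential in test data starts with this prefix.
FAKE_SENTINEL_PREFIX = "FAKE-NOT-A-SECRET-"

# Rough token-shape heuristic (assumption): 20+ letters, digits, '_' or '-'.
TOKEN_SHAPED = re.compile(r"^[A-Za-z0-9_\-]{20,}$")


def is_obviously_fake(value: str) -> bool:
    """Return True when a fixture value carries the fake-sentinel prefix."""
    return value.startswith(FAKE_SENTINEL_PREFIX)


def needs_review(value: str) -> bool:
    """Flag token-shaped values that are not marked as fake."""
    return bool(TOKEN_SHAPED.match(value)) and not is_obviously_fake(value)


# The fixture value quoted in the finding is token-shaped and unmarked.
original_fixture = "supersecretvalue123456789"
# A renamed fixture stays token-shaped but describes itself as fake.
renamed_fixture = FAKE_SENTINEL_PREFIX + "0123456789abcdef"
```

Here needs_review(original_fixture) is true and needs_review(renamed_fixture) is false, which is the distinction an allowlist or a renamed sentinel would give a scanner.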

2. Internal orchestration sprawl

  • Severity: high in personal profile
  • Layer: orchestration
  • Symptom: Hermescheck found 16418 orchestration markers across delegation, planning, recovery, routing, and scheduling categories.
  • Representative evidence: .plans/openai-api-server.md:219, .plans/streaming-support.md:87, .github/ISSUE_TEMPLATE/setup_help.yml, AGENTS.md
  • Suggested handling: document one canonical main loop and clarify ownership for planning, routing, retries, recovery, and delegation.

3. Completion closure gap

  • Severity: high in personal profile
  • Layer: completion_closure
  • Representative evidence: AGENTS.md:82, AGENTS.md:664, CONTRIBUTING.md:86
  • Symptom: file creation and index update signals exist, but reusable memory closure is not always explicit.
  • Suggested handling: define a completion contract such as file created -> index updated -> impression/anchor/pointer registered -> acceptance evidence.
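
A completion contract like the one suggested above could be made checkable with a small record type. The sketch below is hypothetical: CompletionRecord and its field names are invented for illustration and do not correspond to anything in hermes-agent.

```python
from dataclasses import dataclass, field


@dataclass
class CompletionRecord:
    """Tracks the four closure steps named in the suggested contract (hypothetical type)."""

    file_created: bool = False
    index_updated: bool = False
    pointer_registered: bool = False  # impression/anchor/pointer
    acceptance_evidence: list[str] = field(default_factory=list)

    def missing_steps(self) -> list[str]:
        """List closure steps that have not happened yet, in contract order."""
        missing = []
        if not self.file_created:
            missing.append("file created")
        if not self.index_updated:
            missing.append("index updated")
        if not self.pointer_registered:
            missing.append("impression/anchor/pointer registered")
        if not self.acceptance_evidence:
            missing.append("acceptance evidence")
        return missing

    def is_closed(self) -> bool:
        """A task counts as closed only when no step is missing."""
        return not self.missing_steps()


# A task that wrote a file and updated the index but never registered a pointer.
partial = CompletionRecord(file_created=True, index_updated=True)
```

Calling partial.missing_steps() reports the two unfinished steps, which makes the closure gap explicit rather than implied.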

4. Memory freshness / generation confusion

  • Severity: high in personal profile
  • Layer: memory_freshness
  • Representative evidence: acp_adapter/session.py, gateway/session.py, plugins/memory/*
  • Symptom: 121 memory-like surfaces across 9 categories, with overlapping stems such as init, memory, plugin, readme, and session.
  • Suggested handling: document the authoritative current memory surface and mark archives/snapshots/summaries as secondary.

5. Role-play handoff orchestration

  • Severity: high in personal profile
  • Layer: orchestration
  • Representative evidence: RELEASE_v0.9.0.md:30, .plans/streaming-support.md:543, RELEASE_v0.2.0.md:177
  • Symptom: 130 role markers across builder, manager, researcher, reviewer categories, plus 1732 serial handoff markers.
  • Suggested handling: keep one owner for the user intent and reserve subagents for bounded evidence gathering or isolated execution.

6. Startup surface sprawl

  • Severity: high in personal profile
  • Layer: startup
  • Representative evidence: docker/entrypoint.sh, docker-compose.yml, gateway/platforms/whatsapp.py
  • Symptom: 42 startup-like files and 171 launcher/wrapper sites.
  • Suggested handling: clearly separate canonical local development startup, service startup, and optional integration launchers.

7. Runtime surface sprawl

  • Severity: high in personal profile
  • Layer: runtime_architecture
  • Representative evidence: RELEASE_v0.11.0.md:38, RELEASE_v0.11.0.md:317, .plans/streaming-support.md:590
  • Symptom: runtime markers span agent stack, ops, queue jobs, storage, UI, and web API surfaces.
  • Suggested handling: document the primary runtime path and explicitly classify optional runtime services.

8. LLM CLI worker contract incomplete

  • Severity: medium in personal profile
  • Layer: llm_cli_workers
  • Representative evidence: RELEASE_v0.11.0.md:18, RELEASE_v0.11.0.md:56, RELEASE_v0.3.0.md:233
  • Symptom: Hermescheck found 210 external LLM CLI worker markers, 0 task-envelope markers, 15 CLI prompt-contract markers, 2064 result-capture markers, and 3403 process-control markers.
  • Suggested handling: define a stable worker contract: Task JSON file, natural-language prompt or task-file reference to stdin, timeout/concurrency limits, stdout/stderr/exit capture, and merge back into the main agent context/observability path.
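
The worker contract described above might look roughly like the following. This is a sketch under stated assumptions, not the Hermes implementation: run_cli_worker, the result dictionary shape, and the stand-in worker command are all hypothetical, and concurrency limits and merging results back into the main agent context are left out.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def run_cli_worker(argv: list[str], task: dict, prompt: str, timeout_s: float = 30.0) -> dict:
    """Run one worker process: task file on disk, prompt plus file path on stdin."""
    with tempfile.TemporaryDirectory() as tmp:
        # The task envelope lives in a file; raw JSON is never dumped to stdin.
        task_path = Path(tmp) / "task.json"
        task_path.write_text(json.dumps(task), encoding="utf-8")
        stdin_text = f"{prompt}\nTask file: {task_path}\n"
        try:
            proc = subprocess.run(
                argv, input=stdin_text, capture_output=True, text=True, timeout=timeout_s
            )
        except subprocess.TimeoutExpired:
            return {"status": "timeout", "exit_code": None, "stdout": "", "stderr": ""}
    # Capture stdout, stderr, and exit code so the caller can log or merge them.
    return {
        "status": "ok" if proc.returncode == 0 else "failed",
        "exit_code": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


# Stand-in worker (assumption): a Python one-liner that echoes the first stdin line.
demo_worker = [sys.executable, "-c", "import sys; print(sys.stdin.readline().strip())"]
result = run_cli_worker(demo_worker, {"id": "demo"}, "Summarize the task file.")
```

The temporary directory is removed once the worker exits, so a real contract would need to decide how long the task file must outlive the process.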

9. Duplicated skill / SOP artifacts

  • Severity: medium in personal profile
  • Layer: skill_system
  • Representative evidence: hermes_cli/skills_hub.py, tools/skills_hub.py, optional-skills/*
  • Symptom: 256 overlapping skill-like files across 27 duplicate groups.
  • Suggested handling: pick canonical skill/SOP locations and keep history in Git instead of parallel near-duplicates.

10. Hidden or secondary LLM calls

  • Severity: medium in personal profile
  • Layer: llm_routing
  • Representative evidence: agent/auxiliary_client.py:3038, agent/auxiliary_client.py:3258, agent/auxiliary_client.py:3300, tools/mixture_of_agents_tool.py
  • Suggested handling: if these paths are intentional, document their routing and observability contracts so extra model calls are predictable.

Enterprise Profile Notes

The enterprise profile promotes unsafe-code-execution patterns more aggressively. Several hits may be intentional tests, docs, comments, or red-team skill internals rather than production RCE paths. The full sanitized enterprise report is included in the artifact gist so maintainers can triage them in context.

Representative examples include:

  • environments/benchmarks/terminalbench_2/terminalbench2_env.py:284
  • environments/tool_call_parsers/qwen3_coder_parser.py:49
  • skills/red-teaming/godmode/scripts/auto_jailbreak.py
  • skills/red-teaming/godmode/scripts/load_godmode.py
  • tests/agent/test_bedrock_adapter.py
  • tests/tools/test_approval.py

Dynamic Runtime Smoke

I also ran a live runtime smoke in an isolated Hermes home with a small public-safe AGENTS.md memory fixture.

Important caveat: the runtime smoke used an operational Hermes checkout at 8fdc8bf0d on branch codex/reasoning-replay-state because that environment already had provider credentials and dependencies. The static findings above are from upstream main at e5647d7. The runtime finding below should be verified against current upstream main.

Doctor summary

  • Python 3.11.15 and Hermes virtual environment were active.
  • Required packages were present.
  • DeepSeek connectivity check passed.
  • Built-in memory provider was active.
  • Optional integrations such as Docker, agent-browser, browser-cdp, image generation, some web-search credentials, and Skills Hub were not available in the isolated home.

Runtime finding: provider registration mismatch for DeepSeek

Direct CLI provider selection failed:

hermes chat ... --provider deepseek ...
error: argument --provider: invalid choice: 'deepseek'

Exit code: 2.

The same model worked when the provider was supplied through config and the command only passed -m deepseek-v4-flash. That suggests deepseek is supported by config/status/runtime paths but is missing from the hermes chat --provider argparse choices.
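
The failure mode described here is easy to reproduce with argparse alone. The sketch below does not use the Hermes CLI: PROVIDER_REGISTRY, build_parser, and the demo-chat program name are hypothetical stand-ins, and the real provider list is not shown in the issue. It only illustrates how a hand-maintained choices list can drift from a runtime registry, and how deriving choices from the registry avoids that.

```python
import argparse

# Hypothetical provider registry; the real Hermes provider list is not shown in the issue.
PROVIDER_REGISTRY = {"openai": object(), "deepseek": object()}


def build_parser(choices) -> argparse.ArgumentParser:
    """Build a parser whose --provider flag accepts only the given choices."""
    parser = argparse.ArgumentParser(prog="demo-chat")
    parser.add_argument("--provider", choices=sorted(choices))
    return parser


def parse_exit_code(parser: argparse.ArgumentParser, argv: list[str]) -> int:
    """Return 0 on a successful parse, otherwise argparse's exit code."""
    try:
        parser.parse_args(argv)
    except SystemExit as exc:
        return int(exc.code)
    return 0


# A hand-maintained list that lags behind the registry rejects 'deepseek'.
stale_parser = build_parser(["openai"])
# Deriving choices from the registry keeps the CLI and runtime in sync.
synced_parser = build_parser(PROVIDER_REGISTRY)

stale_code = parse_exit_code(stale_parser, ["--provider", "deepseek"])
synced_code = parse_exit_code(synced_parser, ["--provider", "deepseek"])
```

With the stale list, argparse reports an invalid choice and exits with code 2, matching the exit code observed in the smoke test; with registry-derived choices the same arguments parse cleanly.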

Runtime smoke passed through config provider

{
  "status": "pass",
  "remembered_stateful_agent": "combine transcript replay with durable environment state",
  "remembered_context_budget": "stable prompt prefixes, explicit cache markers, compaction, page-fault style retrieval",
  "remembered_cli_worker": "natural-language task prompt or referenced Task JSON path, not raw JSON dumped blindly to stdin",
  "tool_runtime_observation": {
    "workspace_file": "AGENTS.md (571 bytes)",
    "available_tools": ["terminal", "process"],
    "secrets": false
  }
}

Reproduction

git clone --depth 1 https://github.com/NousResearch/hermes-agent.git /tmp/hermescheck-hermes-agent
cd /path/to/hermescheck
uv run python -m hermescheck audit /tmp/hermescheck-hermes-agent \
  --profile personal \
  -o personal-audit.json \
  -r personal-audit.md \
  --sarif personal-audit.sarif.json
uv run python -m hermescheck audit /tmp/hermescheck-hermes-agent \
  --profile enterprise \
  -o enterprise-audit.json \
  -r enterprise-audit.md \
  --sarif enterprise-audit.sarif.json
uv run python -m hermescheck validate personal-audit.json
uv run python -m hermescheck validate enterprise-audit.json

Suggested Next Actions

  1. Treat the token-shaped test fixture as either an intentional fake with explicit allowlisting or rename it to avoid credential-scanner false positives.
  2. Verify whether deepseek should be a first-class value in hermes chat --provider choices.
  3. Add or document contracts for stateful recovery, memory freshness, LLM CLI worker spawning, and completion closure.
  4. Use the SARIF artifacts from the gist for code-scanning style triage if useful.
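
For the SARIF triage step, a first pass could simply count results per level. The sketch below assumes the Hermescheck artifacts follow the standard SARIF 2.1.0 layout (a top-level runs array, each run holding results with an optional level); that has not been verified against the actual files. The helper names and the inline sample document are illustrative only.

```python
import json
from collections import Counter
from pathlib import Path


def load_sarif(path: str) -> dict:
    """Read a SARIF file from disk into a dictionary."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def count_sarif_levels(sarif: dict) -> Counter:
    """Count results per SARIF level across all runs."""
    counts: Counter = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Simplification: a missing level is counted as "warning". The spec
            # first consults the rule's default configuration, which is skipped here.
            counts[result.get("level", "warning")] += 1
    return counts


# Tiny made-up document standing in for a real audit artifact.
sample = {
    "version": "2.1.0",
    "runs": [
        {
            "results": [
                {"ruleId": "demo-rule-a", "level": "error"},
                {"ruleId": "demo-rule-b", "level": "warning"},
                {"ruleId": "demo-rule-c"},
            ]
        }
    ],
}
sample_counts = count_sarif_levels(sample)
```

Against the real artifacts this would be called as count_sarif_levels(load_sarif("personal-audit.sarif.json")), using the file name from the reproduction commands.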

Thanks for building Hermes. I am sharing this as a community ecosystem audit and would be happy to rerun Hermescheck after fixes or provide smaller focused follow-up reports.

extent analysis

TL;DR

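The audit surfaces three actionable items: a token-shaped test fixture flagged as critical (tests/agent/test_redact.py:70), a likely omission of deepseek from the hermes chat --provider argparse choices, and missing documented contracts for stateful recovery, memory freshness, LLM CLI worker spawning, and completion closure. The remaining findings, including orchestration sprawl, are risk signals that need maintainer judgment.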

Guidance

  • Verify if the token-shaped test fixture in tests/agent/test_redact.py:70 is intentional and consider renaming it or adding scanner allowlisting to avoid false positives.
  • Investigate why deepseek is not a valid choice for hermes chat --provider despite being supported through config and runtime paths.
  • Document contracts for stateful recovery, memory freshness, LLM CLI worker spawning, and completion closure to address high-severity findings.
  • Review the SARIF artifacts from the gist for code-scanning style triage and prioritize fixes based on severity and impact.

Example

No specific code snippet is provided, but the issue suggests updating the hermes chat --provider argparse choices to include deepseek as a valid option.

Notes

The provided information is based on a community architecture quality audit, and some findings may require maintainer judgment to determine the best course of action. The audit reports are available in the gist for further review and triage.

Recommendation

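Start with the critical fixture finding and the DeepSeek --provider mismatch, since both are narrow and verifiable, then work through the high-severity documentation and contract gaps. The runtime finding should be confirmed against current upstream main first, because the smoke test ran on a different checkout.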
