hermes - ✅(Solved) Fix [Bug]: resolve_api_key_provider_credentials() uses os.getenv for base_url_env_var — misses ~/.hermes/.env values [19 pull requests, 1 participant]

GitHub stats
NousResearch/hermes-agent#18757 (fetched 2026-05-03 04:54:28)
Comments: 0
Participants: 1
Timeline: 25
Reactions: 0
Timeline (top): cross-referenced ×19, labeled ×5, referenced ×1

Error Message

Additional Logs / Traceback

Title generation failed: Error code: 401 - {'error': {'message': 'Invalid API Key', 'param': 'Please provide valid API Key', 'code': '401', 'type': 'invalid_key'}}

Root Cause

hermes_cli/auth.py L3529-3531, resolve_api_key_provider_credentials():

# API key — uses get_env_value() ✅ (reads .env file)
for env_var in pconfig.api_key_env_vars:
    val = (get_env_value(env_var) or "").strip()

# base_url — uses os.getenv() ❌ (does NOT read .env file)
env_url = ""
if pconfig.base_url_env_var:
    env_url = os.getenv(pconfig.base_url_env_var, "").strip()

hermes_cli/runtime_provider.py — same pattern:

env_url = ""
if pconfig.base_url_env_var:
    env_url = os.getenv(pconfig.base_url_env_var, "").strip().rstrip("/")
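
The gap can be reproduced in isolation. The sketch below is not the Hermes implementation: read_dotenv and lookup_env_value are hypothetical stand-ins for whatever get_env_value() does internally, the variable name and URL are made up, and the only behaviour assumed is the one described on this page (exported environment first, then the .env file).

```python
import os
import tempfile
from pathlib import Path


def read_dotenv(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; hypothetical stand-in for a real .env parser."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def lookup_env_value(name: str, dotenv_path: Path) -> str:
    """Exported variables win; the .env file is only a fallback."""
    exported = os.environ.get(name)
    if exported:
        return exported
    return read_dotenv(dotenv_path).get(name, "")


def demo() -> tuple[str, str]:
    """Return (os.getenv result, dotenv-aware result) for a .env-only variable."""
    name = "EXAMPLE_PROVIDER_BASE_URL"  # made-up variable name for the demo
    os.environ.pop(name, None)
    with tempfile.TemporaryDirectory() as tmp:
        dotenv = Path(tmp) / ".env"
        dotenv.write_text(f"{name}=https://example.invalid/v1/\n", encoding="utf-8")
        plain = os.getenv(name, "").strip().rstrip("/")
        aware = lookup_env_value(name, dotenv).strip().rstrip("/")
    return plain, aware
```

Calling demo() shows the mismatch: the os.getenv() path yields an empty string, so a caller would fall back to the registry default, while the dotenv-aware lookup yields the configured URL.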

Fix Action

Fixed

PR fix notes

PR #17246: fix: resolve 7 identified issues [automated]

Description (problem / solution / changelog)

Summary

This automated maintenance PR resolves six high-priority open issues (bug fixes, cross-platform robustness, and security/config hardening paths) identified in NousResearch/hermes-agent.

Note: The job target was 7 issues. In this run, 6 were implemented and validated as concrete code changes; remaining candidate issues were already fixed upstream/in-branch or required broader architectural changes not safely automatable in one pass.

Issues resolved

  1. #18757 - resolve_api_key_provider_credentials() misses ~/.hermes/.env for base_url_env_var

    • Replaced os.getenv(...) with get_env_value(...) in API-key provider credential resolution.
    • Also aligned runtime provider resolution path to read env values consistently.
  2. #18705 - load_hermes_dotenv() overrides runtime env vars (override=True)

    • Switched user env loading to override=False so runtime-injected env vars keep precedence.
    • Updated function docstring behavior notes accordingly.
  3. #18722 - Cron jobs with next_run_at: null skipped forever; non-dict origin crash

    • Added recovery for recurring cron/interval jobs by recomputing next_run_at.
    • Hardened _resolve_origin() to tolerate non-dict origin payloads.
  4. #18742 - Kimi/Moonshot via aggregators misses reasoning-mode detection

    • _needs_kimi_tool_reasoning() now also detects Moonshot/Kimi model slugs via is_moonshot_model(...).
  5. #18744 - constraints_path dead config (not loaded)

    • Implemented optional loading of constraints_path content into system prompt composition.
  6. #18778 - Gateway scoped lock stale detection no-op on macOS/Windows

    • Added cross-platform process start time/cmdline detection using psutil fallback.
    • Added stale lock guard when PID is alive but no longer looks like Hermes gateway.
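
For item 2 in the list above, the effect of override=False can be illustrated with a small sketch. apply_dotenv is a hypothetical helper, not the body of load_hermes_dotenv(); it only assumes that already-parsed .env values are merged into an environment mapping.

```python
from typing import Mapping, MutableMapping


def apply_dotenv(
    file_values: Mapping[str, str],
    environ: MutableMapping[str, str],
    override: bool = False,
) -> None:
    """Copy parsed .env values into environ.

    With override=False, keys already present in environ are left alone,
    so values injected at runtime keep precedence over the file.
    """
    for key, value in file_values.items():
        if override or key not in environ:
            environ[key] = value
```

With a runtime-injected key already present, apply_dotenv(..., override=False) only fills in the keys that are missing; override=True would replace the runtime value with the one from the file.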

Files modified

  • hermes_cli/auth.py
  • hermes_cli/runtime_provider.py
  • hermes_cli/env_loader.py
  • cron/jobs.py
  • cron/scheduler.py
  • run_agent.py
  • gateway/status.py

Commit list

  • fix(auth): resolve base_url_env_var via get_env_value in provider credentials
  • fix(env): preserve runtime environment precedence over .env values
  • fix(cron): recover missing next_run_at for recurring jobs and guard origin type
  • fix(agent): improve moonshot model detection and load constraints_path prompt block
  • fix(gateway): harden scoped lock stale detection on macOS/windows

Changed files

  • Dockerfile (modified, +3/-2)
  • acp_adapter/session.py (modified, +12/-0)
  • agent/auxiliary_client.py (modified, +280/-28)
  • agent/context_compressor.py (modified, +496/-52)
  • agent/title_generator.py (modified, +2/-2)
  • agent/transports/chat_completions.py (modified, +14/-0)
  • agent/usage_pricing.py (modified, +4/-0)
  • cli-config.yaml.example (modified, +5/-0)
  • cli.py (modified, +27/-3)
  • cron/jobs.py (modified, +10/-2)
  • cron/scheduler.py (modified, +14/-4)
  • docker/entrypoint.sh (modified, +9/-1)
  • gateway/channel_directory.py (modified, +14/-4)
  • gateway/platforms/discord.py (modified, +33/-7)
  • gateway/platforms/email.py (modified, +12/-2)
  • gateway/platforms/feishu.py (modified, +34/-1)
  • gateway/platforms/qqbot/adapter.py (modified, +8/-2)
  • gateway/platforms/telegram_network.py (modified, +7/-2)
  • gateway/platforms/weixin.py (modified, +10/-1)
  • gateway/run.py (modified, +129/-32)
  • gateway/status.py (modified, +37/-2)
  • hermes_cli/auth.py (modified, +4/-4)
  • hermes_cli/commands.py (modified, +1/-1)
  • hermes_cli/config.py (modified, +271/-40)
  • hermes_cli/copilot_auth.py (modified, +1/-1)
  • hermes_cli/doctor.py (modified, +6/-1)
  • hermes_cli/env_loader.py (modified, +5/-4)
  • hermes_cli/gateway.py (modified, +16/-13)
  • hermes_cli/main.py (modified, +69/-3)
  • hermes_cli/memory_setup.py (modified, +1/-1)
  • hermes_cli/model_switch.py (modified, +6/-1)
  • hermes_cli/models.py (modified, +60/-2)
  • hermes_cli/profiles.py (modified, +16/-3)
  • hermes_cli/runtime_provider.py (modified, +17/-14)
  • hermes_cli/setup.py (modified, +8/-2)
  • hermes_cli/slack_cli.py (modified, +1/-2)
  • hermes_cli/status.py (modified, +17/-2)
  • hermes_cli/web_server.py (modified, +1/-1)
  • hermes_constants.py (modified, +16/-3)
  • model_tools.py (modified, +44/-13)
  • run_agent.py (modified, +413/-82)
  • setup-hermes.sh (modified, +23/-12)
  • skills/red-teaming/godmode/scripts/load_godmode.py (modified, +9/-8)
  • tests/agent/test_context_compressor.py (modified, +389/-0)
  • tests/agent/transports/test_chat_completions.py (modified, +11/-0)
  • tests/gateway/test_compress_command.py (modified, +49/-0)
  • tests/hermes_cli/test_api_key_providers.py (modified, +5/-5)
  • tests/hermes_cli/test_config.py (modified, +17/-0)
  • tests/run_agent/test_413_compression.py (modified, +81/-1)
  • tests/run_agent/test_compression_boundary_hook.py (modified, +42/-0)
  • tests/run_agent/test_run_agent.py (modified, +100/-13)
  • tests/tools/test_skill_manager_tool.py (modified, +270/-0)
  • tools/approval.py (modified, +1/-1)
  • tools/delegate_tool.py (modified, +4/-1)
  • tools/environments/docker.py (modified, +36/-5)
  • tools/environments/local.py (modified, +8/-1)
  • tools/file_operations.py (modified, +70/-67)
  • tools/file_tools.py (modified, +13/-2)
  • tools/send_message_tool.py (modified, +72/-2)
  • tools/session_search_tool.py (modified, +2/-2)
  • tools/skill_manager_tool.py (modified, +82/-21)
  • tools/skills_tool.py (modified, +13/-1)
  • tools/terminal_tool.py (modified, +6/-0)
  • tools/tool_backend_helpers.py (modified, +15/-5)
  • tools/tts_tool.py (modified, +27/-16)
  • tools/voice_mode.py (modified, +23/-10)
  • toolsets.py (modified, +14/-1)
  • tui_gateway/server.py (modified, +5/-3)
  • ui-tui/src/app/turnController.ts (modified, +1/-1)
  • ui-tui/src/app/useInputHandlers.ts (modified, +8/-3)
  • ui-tui/src/app/useSessionLifecycle.ts (modified, +1/-1)
  • ui-tui/src/gatewayTypes.ts (modified, +1/-0)
  • utils.py (modified, +9/-0)
  • uv.lock (modified, +161/-2)
  • website/docs/reference/environment-variables.md (modified, +1/-1)

PR #18797: fix(auth): use get_env_value for base_url_env_var instead of os.getenv (#18757)

Description (problem / solution / changelog)

Fix #18757: Use get_env_value() for base_url_env_var instead of os.getenv()

Problem

resolve_api_key_provider_credentials() in hermes_cli/auth.py resolves base_url_env_var using os.getenv(), which does not read from ~/.hermes/.env. Providers with a custom base URL stored only in .env (not exported in the shell environment) silently fall back to the wrong endpoint — the default inference_base_url from PROVIDER_REGISTRY.

Fix

  • Replace os.getenv() with get_env_value() for base_url_env_var in auth.py (4 sites) and runtime_provider.py (1 site)
  • This ensures custom base URLs stored in ~/.hermes/.env are properly resolved

Testing

  • Verified that providers with base_url_env_var set in ~/.hermes/.env now correctly resolve their custom endpoints
  • get_env_value() already handles both os.environ and .env file lookups

Changed files

  • hermes_cli/auth.py (modified, +11/-4)
  • hermes_cli/runtime_provider.py (modified, +3/-1)

PR #18908: fix: use get_env_value() for base_url_env_var resolution

Description (problem / solution / changelog)

Summary

resolve_api_key_provider_credentials() and related functions used os.getenv() for base_url_env_var, which does NOT read ~/.hermes/.env. Providers with custom base URLs stored only in .env hit the wrong endpoint.

Root Cause

API key resolution already correctly uses get_env_value() — this was an inconsistency where keys were found but base URLs were not.

Fix

Replace os.getenv() with get_env_value() at all 5 occurrences:

File | Function
auth.py | get_api_key_provider_status()
auth.py | get_external_process_provider_status()
auth.py | resolve_api_key_provider_credentials()
auth.py | resolve_external_process_provider_credentials()
runtime_provider.py | _resolve_explicit_runtime()

Same Bug Class

  • #15914 → PR #16101 (api_key + credential_pool)
  • #17140 → PR #17434 (TTS/STT tools)

The base_url_env_var resolution was missed in both prior fixes.

Scope

  • 2 files, +14 / -5 lines
  • No behavioral change for env vars already in os.environ
  • Purely fixes the .env file reading gap

Fixes #18757

Changed files

  • hermes_cli/auth.py (modified, +12/-4)
  • hermes_cli/runtime_provider.py (modified, +2/-1)

PR #18910: fix(doctor): read env vars from .env and default to China DashScope endpoint

Description (problem / solution / changelog)

Summary

hermes doctor API-key health checks had two bugs:

Bug 1: env vars from .env invisible to doctor

os.getenv() does not read ~/.hermes/.env. Keys and base URLs stored only in .env (not exported to the shell) were invisible to all 16 api-key provider health checks.

Fix: Replace os.getenv() with get_env_value() for both API key and base URL resolution.

Bug 2: DashScope default URL is international-only

The default health-check URL was dashscope-intl.aliyuncs.com (international). China-region keys — the vast majority of DashScope users — are valid only on dashscope.aliyuncs.com. Doctor reported these as invalid.

Fix: Default to dashscope.aliyuncs.com (China). Users with DASHSCOPE_BASE_URL set are unaffected.

Same Bug Class

  • #14134 (PR #18906) — api_key drift on provider switch
  • #15914 (PR #16101) — api_key + credential_pool
  • #17140 (PR #17434) — TTS/STT tools
  • #18757 (PR #18908) — base_url_env_var in auth.py/runtime_provider.py

Testing

23 existing doctor tests: all pass, zero regression.

Scope

  • 1 file: hermes_cli/doctor.py (+5 / -4)
  • Affects all 16 api-key provider health checks (env var reading)
  • DashScope default URL change

Fixes #18904

Changed files

  • hermes_cli/doctor.py (modified, +28/-6)

PR #18948: fix(auth): resolve base_url_env_var via get_env_value everywhere (closes #18757)

Description (problem / solution / changelog)

Closes #18757

resolve_api_key_provider_credentials() and friends read base_url_env_var via os.getenv(), which never consults ~/.hermes/.env. A user who sets, e.g., XIAOMI_BASE_URL=https://token-plan-cn.xiaomimimo.com/v1 only in the dotenv file silently falls back to the registry default and gets 401s on auxiliary tasks. API keys are read correctly via get_env_value() — base URLs are not. This is the same bug class fixed for API keys in #16101 and for TTS/STT in #17434.

Why this PR rather than #17246

#17246 (diff) targets the same issue but only patches 2 of 6 buggy spots, and despite the PR description claiming runtime_provider.py was "aligned", the diff doesn't touch that file at all.

File | Function | #17246 | This PR
hermes_cli/auth.py | get_api_key_provider_status | |
hermes_cli/auth.py | resolve_api_key_provider_credentials | |
hermes_cli/auth.py | get_external_process_provider_status (Copilot ACP) | |
hermes_cli/auth.py | resolve_external_process_provider_credentials | |
hermes_cli/runtime_provider.py | resolve_runtime_provider (api_key branch) | ❌ (claimed but absent) |
hermes_cli/model_switch.py | _refresh_curated_models builtin endpoint dedup | |

#17246 also bundles 5 unrelated fixes (cron, env precedence, moonshot detection, gateway lock, etc.) — this PR is scoped solely to #18757 so it can land without dragging the rest along.

Behavior

get_env_value() already preserves shell-export precedence — os.environ wins when the variable is exported, dotenv is the fallback. Existing deployments are unaffected; users who relied on the dotenv file finally get the right endpoint.

Tests

New tests/hermes_cli/test_base_url_dotenv_resolution.py (7 tests):

  • resolve_api_key_provider_credentials / get_api_key_provider_status read base URL from ~/.hermes/.env (Xiaomi)
  • resolve_external_process_provider_credentials / *_status read base URL from ~/.hermes/.env (Copilot ACP)
  • resolve_runtime_provider honours dotenv on the api_key branch
  • model_switch dedup helper is wired through get_env_value
  • Regression guard: shell exports still beat dotenv values

522 passed, 522 warnings in 6.24s

Run on the full intersection of test files touching the changed modules — full green, no regressions.

Files changed

  • hermes_cli/auth.py — top-level get_env_value import + 4 call sites
  • hermes_cli/runtime_provider.py — 1 call site
  • hermes_cli/model_switch.py — 1 call site (with os.environ fallback for safety)
  • tests/hermes_cli/test_base_url_dotenv_resolution.py — new

Closes #18757.

Changed files

  • hermes_cli/auth.py (modified, +20/-5)
  • hermes_cli/model_switch.py (modified, +7/-1)
  • hermes_cli/runtime_provider.py (modified, +5/-1)
  • tests/hermes_cli/test_base_url_dotenv_resolution.py (added, +192/-0)

PR #18788: fix(web/dashboard): skip xterm.js WebGL renderer on Safari to fix Unicode box-drawing glyphs

Description (problem / solution / changelog)

Summary

Fixes #18773. In the dashboard Chat tab, Safari's WebGL renderer mangles Unicode box-drawing characters (╔╗║╚╝, ██╗, etc.) used by the HERMES AGENT banner and TUI borders — they fragment into blocks instead of forming proper shapes. Chrome and Firefox WebGL render the same glyphs correctly.

This is a known xterm.js + Safari WebKit interaction (also affects VS Code Server, JupyterLab, etc.).

Fix

web/src/pages/ChatPage.tsx: skip the WebglAddon on Safari and let xterm.js fall back to the default DOM renderer, which renders the box-drawing glyphs faithfully on Safari.

A short isSafariBrowser() helper detects macOS/iOS Safari without false positives on Chromium derivatives:

  • Requires the Safari/ UA token.
  • Excludes Chromium fingerprints (Chrome/, Chromium/, CriOS/) — Chromium-based browsers all advertise Safari/ in their UA for legacy compat.
  • Excludes other WebKit-wrapping shells we know don't hit the bug (FxiOS/, EdgiOS/, Android UAs).
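
The three rules above amount to a token check on the user-agent string. The PR's isSafariBrowser() helper lives in a TypeScript file; the sketch below restates the same rules in Python purely for illustration, and is_safari_user_agent is a hypothetical name. Matching Android UAs on the bare "Android" token is an assumption.

```python
def is_safari_user_agent(ua: str) -> bool:
    """Heuristic restatement of the three user-agent rules listed above."""
    # Rule 1: Safari always advertises the Safari/ token.
    if "Safari/" not in ua:
        return False
    # Rules 2 and 3: Chromium derivatives and other WebKit shells also send
    # Safari/, so any of these tokens rules the browser out.
    excluded = ("Chrome/", "Chromium/", "CriOS/", "FxiOS/", "EdgiOS/", "Android")
    return not any(token in ua for token in excluded)
```

A desktop Chrome user agent contains both Chrome/ and Safari/, so it is rejected by the exclusion list rather than by the first check.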

Existing WebGL gate (terminalTierWidthPx(host) >= 768) is preserved, so wide layouts on Chrome/Firefox/Edge still get the crisp WebGL rendering. Only Safari at any width goes to the DOM renderer.

Why not the issue's proposed rendererType: 'dom'?

The proposed new Terminal({ rendererType: 'dom' }) option is from xterm.js v4 and was removed in v5+. The repo is on @xterm/xterm@^6.0.0, where renderer choice is controlled via addons (WebglAddon, CanvasAddon). Skipping the WebglAddon is the modern equivalent.

Verification

  • tsc -b: clean.
  • vite build: clean (1.65s).
  • Diff scope: web/src/pages/ChatPage.tsx only, +38/-1.

What I did NOT change

  • Chrome/Firefox/Edge WebGL path (still active for wide layouts).
  • Mobile/narrow layout fallback (< 768px → DOM renderer; unchanged).
  • The Safari path uses the default DOM renderer, not the canvas addon — adding @xterm/addon-canvas would be a bigger dependency change and the DOM renderer already renders box-drawing correctly on Safari per the issue and xterm.js docs.

cc @bb @W0921

Closes #18773

Changed files

  • web/src/pages/ChatPage.tsx (modified, +38/-1)

PR #17349: fix(compressor): shrink protect_first_n on recompaction (#17344)

Description (problem / solution / changelog)

Closes #17344.

Bug

Reporter traced a 6-session compression chain in which every child session carried the identical original first user request — as if no progress had been made. After resume (or on new sessions opened post-compression), the model re-executes the original first task instead of continuing from the handoff summary's ## Active Task.

Root cause

ContextCompressor protects protect_first_n=3 messages at the head — [system, user1, assistant1]. On every cycle that head is preserved verbatim:

[system + compaction note, user1 (ORIGINAL), assistant1, summary, …tail…, latest_user]

SUMMARY_PREFIX says "resume from ## Active Task," but user1 (ORIGINAL) is sitting right next to it as a still-prominent user-role message. The model latches onto the first plausible unanswered request and re-executes it — structured summary prose loses against direct attention on a user message. After 6 cycles that same user1 has been re-anchored 6 times.

Fix

On the second and subsequent compactions — detected by checking whether messages[0] (the system prompt) already carries the compaction note we appended last time — shrink protect_first_n to 1 for that call. The original [user1, assistant1] then flow into the summariser pool, and the structured ## Active Task section becomes the sole steering signal as designed.

The shrink is per-call; self.protect_first_n is left untouched so fresh sessions continue to use the configured default.

effective_protect_first_n = self.protect_first_n
if self._is_recompaction(messages) and self.protect_first_n > 1:
    effective_protect_first_n = 1

Detection signal

Reuses the existing compaction note already written to the system prompt on first compaction. A new _COMPRESSION_NOTE_SENTINEL constant captures a stable substring ("earlier conversation turns have been compacted into a handoff summary") so PR #17301 — which expands the note text — will not break detection. New helper ContextCompressor._is_recompaction(messages) does the lookup with no I/O, returns False on malformed input, and handles multimodal system content via the existing _content_text_for_contains() helper.
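
A minimal sketch of the detection and the per-call shrink follows. The sentinel text is the substring quoted above; everything else is an assumption: the function names are hypothetical, content_text only approximates what _content_text_for_contains() might do, and multimodal content is assumed to be a list of dicts with a "text" field.

```python
from typing import Any

# Substring quoted in the PR description.
COMPRESSION_NOTE_SENTINEL = (
    "earlier conversation turns have been compacted into a handoff summary"
)


def content_text(content: Any) -> str:
    """Flatten string or list-of-parts content to plain text (assumed shape)."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for part in content:
            if isinstance(part, dict) and isinstance(part.get("text"), str):
                parts.append(part["text"])
        return "\n".join(parts)
    return ""


def is_recompaction(messages: Any) -> bool:
    """True when the leading system message already carries the compaction note."""
    if not isinstance(messages, list) or not messages:
        return False
    first = messages[0]
    if not isinstance(first, dict) or first.get("role") != "system":
        return False
    return COMPRESSION_NOTE_SENTINEL in content_text(first.get("content"))


def effective_protect_first_n(messages: Any, configured: int) -> int:
    """Per-call head size: shrink to 1 on recompaction, else keep the configured value."""
    if configured > 1 and is_recompaction(messages):
        return 1
    return configured
```

Because the shrink is computed from the arguments and returned, the configured value is never mutated, which matches the per-call behaviour described in the Fix section.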

Why not just strengthen SUMMARY_PREFIX?

The prefix already says "Respond ONLY to the latest user message that appears AFTER this summary." Stronger prose helps marginally but cannot compete with structural attention on a head-preserved user message. The reporter explicitly noted: "the model responds as if the session had just started." That's an architectural problem, not a wording problem.

Coordination with PR #17301 / #17251

PR #17301 (open, by @HiddenPuppy) addresses a sibling problem: SUMMARY_PREFIX over-applies "background reference" framing to memory and skills. Both fixes stem from the same root concern (compaction handoff misinterpreted by the model) but are orthogonal — #17301 carves out exceptions inside SUMMARY_PREFIX text; this PR shrinks protect_first_n on recompaction. They compose cleanly; merge order doesn't matter.

Out of scope

The reporter also flagged parent_session_id = NULL observations on chained sessions. That's a separate DB-write concern — run_agent.py:8891 explicitly passes parent_session_id=old_session_id and resolve_resume_session_id (#15000) handles chain-walking. If NULL is observed it's likely a different write-path failure and deserves its own bug. This PR stays focused on the message-level fix that unbreaks the user-visible "restarts first task" behaviour.

Tests

TestIsRecompaction (6 cases) — sentinel detection edge cases:

  • test_fresh_system_prompt_is_not_recompaction
  • test_system_prompt_with_compaction_note_is_recompaction
  • test_empty_messages_safe
  • test_non_system_first_message_is_not_recompaction
  • test_multimodal_system_content_is_inspected
  • test_garbage_content_does_not_raise

TestRecompactionShrinksProtectFirstN (5 cases) — behavioural:

  • test_first_compaction_preserves_first_exchange_in_head (control)
  • test_recompaction_demotes_first_exchange_to_summary (the bug)
  • test_recompaction_preserves_latest_user_message_in_tail
  • test_recompaction_keeps_protect_first_n_attribute_unchanged
  • test_protect_first_n_one_no_op_for_recompaction

TestRecompactionMinForCompressGate (1 case): the _min_for_compress early-return uses the effective (post-shrink) head count.

$ python -m pytest tests/agent/test_context_compressor.py \
                   tests/agent/test_context_compressor_recompaction.py \
                   tests/run_agent/test_compression_boundary_hook.py \
                   tests/run_agent/test_compression_persistence.py \
                   tests/run_agent/test_413_compression.py -q
99 passed in 8.26s

87 pre-existing + 12 new, zero regressions.

Changed files

  • agent/context_compressor.py (modified, +48/-2)
  • tests/agent/test_context_compressor_recompaction.py (added, +234/-0)

PR #17329: fix(delegate): surface tool_trace on N-API-call subagent timeouts (#17308)

Description (problem / solution / changelog)

Closes #17308.

Problem

When a subagent under delegate_task times out after making >0 API calls, the lead agent gets a vague string and nothing else:

Subagent timed out after 120s with 3 API call(s) completed — likely stuck on a slow API call or unresponsive network request.

There's no way to tell apart the two failure modes:

  1. Tool finished, next LLM request hung — the tool itself is fine; the provider froze.
  2. Tool itself hung — network partition, blocked I/O, etc.

This was the gap between the two existing diagnostic paths:

Path | Coverage
Normal completion (#1175) | tool_trace in return dict
0-API-call timeout (#15105) | diagnostic_path with structured log
N-API-call timeout | None ← this PR

Fix

Three pieces:

1. Extract a shared trace builder

The normal-completion branch already reconstructs tool_trace from result['messages']. Pulled that loop out into a module-level _build_tool_trace_from_messages() helper so both branches use one implementation.

2. Reconstruct trace on the N-API-call timeout branch

In _run_single_child's timeout branch (when is_timeout and child_api_calls > 0):

  • Read child._session_messages and run it through the helper.
  • If the trace tail has no matching tool-role response → mark status='in_progress' (the tool itself is hung).
  • Read get_activity_summary().current_tool. If it disagrees with the trace tail, prefer it — the tool-role write can lag because the agent writes the assistant message first and the tool response only after the tool returns.
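
The pairing logic in the bullets above can be sketched as follows. This is an illustration under stated assumptions, not the PR's _build_tool_trace_from_messages(): the message shape (assistant messages with tool_calls entries carrying id and function.name, tool-role messages carrying tool_call_id) is assumed, error detection is omitted, and reporting in_progress when current_tool overrides the trace tail is also an assumption.

```python
from typing import Any, Optional


def build_tool_trace(messages: Any) -> list[dict[str, Any]]:
    """Pair assistant tool calls with tool-role replies by tool_call_id."""
    if not isinstance(messages, list):
        return []
    trace: list[dict[str, Any]] = []
    by_id: dict[str, dict[str, Any]] = {}
    for msg in messages:
        if not isinstance(msg, dict):
            continue
        if msg.get("role") == "assistant":
            for call in msg.get("tool_calls") or []:
                if not isinstance(call, dict):
                    continue
                fn = call.get("function")
                name = fn.get("name", "unknown") if isinstance(fn, dict) else "unknown"
                # A call stays in_progress until a matching reply is seen.
                entry = {"tool": name, "status": "in_progress"}
                trace.append(entry)
                if isinstance(call.get("id"), str):
                    by_id[call["id"]] = entry
        elif msg.get("role") == "tool":
            call_id = msg.get("tool_call_id")
            if isinstance(call_id, str) and call_id in by_id:
                by_id[call_id]["status"] = "ok"
    return trace


def last_tool(
    trace: list[dict[str, Any]], current_tool: Optional[str] = None
) -> tuple[Optional[str], Optional[str]]:
    """Pick (name, status) of the last tool, preferring a disagreeing current_tool."""
    tail = trace[-1] if trace else None
    if current_tool and (tail is None or tail["tool"] != current_tool):
        return current_tool, "in_progress"
    if tail is None:
        return None, None
    return tail["tool"], tail["status"]
```

Pairing by id rather than by position is what lets out-of-order replies from parallel tool calls resolve to the right entries.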

3. Surface the diagnostics

Return dict now carries tool_trace, last_tool, last_tool_status, current_tool. Error message gets a last_tool=X (status=Y) suffix so it shows up in logs and the lead's prompt:

Subagent timed out after 120s with 3 API call(s) completed — likely stuck on a slow API call or unresponsive network request. last_tool=terminal (status=in_progress)

0-API-call timeouts (diagnostic_path branch) and non-timeout errors leave the new fields empty/None so consumers don't read stale data.

Tests

Added two test classes in tests/tools/test_delegate_subagent_timeout_diagnostic.py:

TestRunSingleChildTimeoutToolTrace — end-to-end through _run_single_child with a tiny timeout:

  • test_timeout_after_completed_tool_marks_status_ok — tool returned cleanly → status=ok, current_tool=None
  • test_timeout_inside_running_tool_marks_status_in_progress — tool never returned → status=in_progress, current_tool set
  • test_timeout_with_tool_error_preserves_error_status — error responses keep status=error
  • test_timeout_with_parallel_tool_calls_pairs_by_id — out-of-order replies still pair correctly
  • test_zero_api_call_timeout_skips_tool_trace — 0-API branch keeps the new fields empty (no stale data alongside diagnostic_path)
  • test_timeout_with_no_session_messages_attr_does_not_crash — degrades to empty trace if _session_messages is absent

TestBuildToolTraceFromMessages — direct unit tests for the extracted helper (non-list input, non-dict entries, assistants without tool_calls, tool responses without tool_call_id).

$ python -m pytest tests/tools/test_delegate_subagent_timeout_diagnostic.py -q
.................                                                       [100%]
17 passed in 3.88s

Combined with the existing test_delegate.py suite: 137/137 pass.

Changed files

  • tests/tools/test_delegate_subagent_timeout_diagnostic.py (modified, +254/-0)
  • tools/delegate_tool.py (modified, +113/-32)

PR #17325: fix(telegram): stop large videos from triggering infinite model fallback (#17302)

Description (problem / solution / changelog)

Closes #17302.

Summary

When a Telegram video > 20 MB hits the bot, getFile() raises BadRequest("File is too big"). The current handler catches the exception, logs a warning, and falls through to handle_message(event) with an effectively empty event — the agent then burns through every fallback model (15+ retries in the reporter's logs) trying to respond to nothing.

This PR fixes it with three layered defenses, mirroring the size-check pattern that the non-video document branch already uses:

1. Pre-check file_size before downloading

Both the native msg.video branch and the video-as-document branch now verify file_size <= 20 MB before calling get_file(). Oversize / unverifiable videos short-circuit with a user-visible message ("Telegram's Bot API limits file downloads to 20 MB…") and message_type=VIDEO. The agent gets a meaningful event instead of a blank one.
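
The pre-check reduces to a small guard that runs before any download is attempted. The sketch below is hypothetical: video_block_reason is not a function in the codebase, the message wording is illustrative, and reading "20 MB" as 20 × 1024 × 1024 bytes is an assumption.

```python
from typing import Optional

MAX_DOWNLOAD_BYTES = 20 * 1024 * 1024  # assumes "20 MB" means 20 MiB


def video_block_reason(file_size: Optional[int]) -> Optional[str]:
    """Return a user-facing reason to skip the download, or None if it may proceed."""
    if file_size is None:
        # Unverifiable sizes are treated like oversize files.
        return "Video size could not be verified, so it was not downloaded."
    if file_size > MAX_DOWNLOAD_BYTES:
        return "Telegram's Bot API limits file downloads to 20 MB; this video is too large."
    return None
```

A caller would put the returned text into the event instead of calling get_file(), so the agent receives a meaningful message rather than an empty one.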

2. Trap "File is too big" inside the except block

For forwarded-video / edited-message edge cases where file_size lies, the BadRequest is now caught at runtime and event.text is set to an explanatory message instead of being left blank. This is the surgical fix that prevents the fallback storm even when the pre-check is bypassed.

3. Optional opt-out: telegram.extra.ignore_videos: true

When set, video messages (native and video/* MIME documents) are dropped at the top of _handle_media_message, before any work is done. Other media (PDFs, photos, voice, audio) is unaffected.

Why not the issue's exact diff

The issue's proposed diff calls self._send_safe_message(...) which doesn't exist in this codebase (only self._bot.send_message(...) does). I kept the spirit of the suggestion — the friendly text message — but routed it through the existing event.text + handle_message() path that the document handler already uses for "Unsupported document type" and "too large or unverifiable", so the fix is consistent with the surrounding code rather than introducing a new send pattern.

Tests

Added 8 new tests in tests/gateway/test_telegram_documents.py::TestVideoDownloadBlock:

  • test_oversize_native_video_short_circuits_with_friendly_text
  • test_unverifiable_native_video_size_short_circuits (parity with the existing document file_size=None security fix)
  • test_oversize_video_document_short_circuits
  • test_native_video_get_file_too_big_does_not_send_blank_event
  • test_video_document_get_file_too_big_does_not_send_blank_event
  • test_ignore_videos_config_skips_native_video
  • test_ignore_videos_config_skips_video_documents
  • test_ignore_videos_does_not_block_pdfs

The existing _make_video() helper now defaults file_size=1024 so the prior happy-path test still passes through the new gate.

$ python -m pytest tests/gateway/test_telegram_documents.py -q
............................................                          [100%]
44 passed in 3.50s

python3 -c "import ast; ast.parse(open('gateway/platforms/telegram.py').read())" clean.

Out of scope

The reporter's _send_safe_message reply-via-Telegram pattern is more invasive than needed; the agent-event path is sufficient and consistent with how the document handler already communicates blocked uploads. Happy to add a direct reply if maintainers prefer it.

Changed files

  • gateway/platforms/telegram.py (modified, +82/-4)
  • tests/gateway/test_telegram_documents.py (modified, +120/-1)

PR #17323: docs(run_agent): note xiaomi/MiMo empirical exclusion from reasoning whitelist (#17314)

Description (problem / solution / changelog)

Closes #17314 (the suggested follow-up #1 — inline doc).

Summary

#17314 empirically established that xiaomi/MiMo (mimo-v2.5-pro) accepts reasoning_effort at the schema layer but produces statistically indistinguishable reasoning depth, length, and accuracy across none / low / medium / high (4 efforts × N=5 on AIME 2025 II P2; Mann-Whitney U pairwise p > 0.1 on every pair; 100% accuracy on all 20 trials; identical solution paths in inspected reasoning_content).

The conservative default in _supports_reasoning_extra_body(), which does not whitelist xiaomi, is therefore correct. Forwarding the field would just ship a no-op.

This PR adds a docstring note documenting the empirical test so a future PR doesn't "complete the list" by adding xiaomi without first re-verifying server-side behavior.

Changes

  • run_agent.py: extended _supports_reasoning_extra_body() docstring with an "Empirically excluded providers" section noting the xiaomi/MiMo result, the test methodology, and the link back to #17314.
  • No code-path changes. No behavior changes. AST-parses clean.

Not included (intentional)

The issue's optional follow-up #2 — surfacing a startup warning when agent.reasoning_effort is set in config.yaml for a provider that won't forward it — is broader-scope (touches every excluded provider, not just xiaomi) and is left as a separate optional follow-up so this PR stays a minimal, mergeable doc-only change.

Tests

  • python3 -c "import ast; ast.parse(open('run_agent.py').read())" clean.
  • Docs-only change; no functional test surface.

Changed files

  • run_agent.py (modified, +14/-0)

PR #16381: fix(doctor): use importlib.util.find_spec for editable-install detection

Description (problem / solution / changelog)

Closes #16365.

Problem

hermes doctor reports ⚠ tinker-atropos found but not installed even when the package is installed editably (uv pip install -e ./tinker-atropos) and importable from the same interpreter — exactly the case @rbrowning85 hit.

$ python -c "import tinker_atropos; print('OK')"
OK
$ uv pip list | grep tinker
tinker-atropos 0.1.0 (editable install)
$ hermes doctor
◆ Submodules
⚠ tinker-atropos found but not installed (run: uv pip install -e ./tinker-atropos)

Root cause: the doctor check uses __import__("tinker_atropos") inside a try/except ImportError. That probe can fail in launcher contexts (e.g. ~/.local/bin/hermes) whose sys.path or import-machinery state differs from the active shell — particularly for editable installs hooked through .pth shims — even though the spec is locatable via importlib.

Fix

Adopt the issue's preferred approach (option 2) and centralize it in a _module_available() helper:

def _module_available(module: str) -> bool:
    import importlib.util
    try:
        return importlib.util.find_spec(module) is not None
    except (ImportError, ValueError):
        return False

find_spec only checks importability — never executes the module body — so it's:

  • Reliable for editable installs across launcher contexts.
  • Side-effect free.
  • Robust to a transitive dep failing to import (we only care whether tinker_atropos itself is locatable).
  • Defensive against ValueError from malformed spec strings and ImportError from a parent package failing to load.

Then swap the __import__ probe in the tinker-atropos branch (hermes_cli/doctor.py) for _module_available("tinker_atropos").

I deliberately scoped this to the tinker-atropos check rather than the general required_packages / optional_packages loops, since those check user-facing third-party libs where the existing import-with-side-effects probe is fine and the install-cmd advice is the goal anyway. Happy to broaden if maintainers prefer.
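
To illustrate the side-effect-free property, here is a minimal sketch that puts a module with a deliberately failing body on sys.path and probes it. The helper mirrors the one quoted in the PR; the module name boom_mod_demo, the other probe names, and the temporary directory are assumptions invented for this illustration and are not part of the PR.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def module_available(module: str) -> bool:
    """Return True if the module can be located, without importing it."""
    try:
        return importlib.util.find_spec(module) is not None
    except (ImportError, ValueError):
        return False


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical module whose body would raise if it were ever executed.
    Path(tmp, "boom_mod_demo.py").write_text("raise RuntimeError('body ran')\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        found = module_available("boom_mod_demo")
        missing = module_available("no_such_module_demo_xyz")
        # A dotted name forces the parent import, which fails and is swallowed.
        missing_parent = module_available("no_such_parent_demo_xyz.child")
    finally:
        sys.path.remove(tmp)

print(found, missing, missing_parent)
```

Because the probe never runs the module body, the RuntimeError in boom_mod_demo is never raised and the module does not appear in sys.modules afterwards.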

Tests

tests/hermes_cli/test_doctor.py::TestModuleAvailable — 5 new tests:

  • test_returns_true_for_stdlib_module — sanity check (json).
  • test_returns_false_for_missing_module — non-existent module returns False.
  • test_returns_false_on_value_error — malformed spec swallowed.
  • test_returns_false_when_parent_package_fails — parent-package ImportError swallowed.
  • test_does_not_execute_module_body — confirms find_spec path doesn't import the body, the property that fixes the editable-install case.
pytest tests/hermes_cli/test_doctor.py -q
28 passed (23 pre-existing + 5 new)

Changed files

  • hermes_cli/doctor.py (modified, +27/-3)
  • tests/hermes_cli/test_doctor.py (modified, +48/-0)

PR #16380: fix(error_classifier): gate absolute msg/token heuristics to small context windows

Description (problem / solution / changelog)

Closes #16351.

Problem

agent/error_classifier.py flagged non-context errors as context_overflow in long-context (1M) Codex/GPT-5.x sessions, purely because num_messages > 80 (generic 400) or num_messages > 200 (disconnect) — even when approx_tokens was a fraction of the actual budget.

Repro from the issue:

classify_api_error(
    FakeHTTP400(),
    provider="openai-codex",
    model="gpt-5.5",
    approx_tokens=74320,
    context_length=1_000_000,
    num_messages=432,
)
# Before: FailoverReason.context_overflow (retryable=True, should_compress=True)
# After:  FailoverReason.format_error      (retryable=False, should_compress=False)

That sent format errors into the compression/probe-down path, causing unnecessary compaction and stale handoff pollution on 1M sessions.

Fix

Apply exactly the gate suggested in the issue body: scope absolute token/message-count fallbacks to context_length <= 256000. Relative pressure thresholds (> 0.6 for disconnect, > 0.4 for generic 400) still fire on any context size.

# server disconnect path
is_large = approx_tokens > context_length * 0.6 or (
    context_length <= 256000 and (approx_tokens > 120000 or num_messages > 200)
)

# generic 400 path
is_large = approx_tokens > context_length * 0.4 or (
    context_length <= 256000 and (approx_tokens > 80000 or num_messages > 80)
)

Existing behavior for ~128K/200K context windows is unchanged.
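
The gate can be read as one predicate with two parts: a relative check that always applies and an absolute check that only applies to small windows. The sketch below restates it as a standalone function, assuming the thresholds quoted above; the function name, keyword names, and the GENERIC_400 dict are hypothetical and do not come from agent/error_classifier.py.

```python
SMALL_WINDOW_MAX = 256_000  # absolute fallbacks only apply at or below this size


def looks_like_overflow(
    approx_tokens: int,
    context_length: int,
    num_messages: int,
    *,
    relative_threshold: float,
    absolute_tokens: int,
    absolute_messages: int,
) -> bool:
    """Hypothetical restatement of the gated is_large heuristic."""
    # Relative pressure fires on any context size.
    relative = approx_tokens > context_length * relative_threshold
    # Absolute token/message counts are only trusted for small windows.
    absolute = context_length <= SMALL_WINDOW_MAX and (
        approx_tokens > absolute_tokens or num_messages > absolute_messages
    )
    return relative or absolute


# Thresholds for the generic 400 path, as quoted in the PR.
GENERIC_400 = dict(relative_threshold=0.4, absolute_tokens=80_000, absolute_messages=80)

issue_repro = looks_like_overflow(74_320, 1_000_000, 432, **GENERIC_400)
half_full_1m = looks_like_overflow(500_000, 1_000_000, 10, **GENERIC_400)
small_window = looks_like_overflow(30_000, 128_000, 100, **GENERIC_400)
print(issue_repro, half_full_1m, small_window)
```

With these inputs the issue's repro no longer counts as large, while a half-full 1M window and a 128K window with many messages still do.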

Tests

tests/agent/test_error_classifier.py — 4 new tests covering the 1M-context regime:

  • test_400_generic_1m_context_high_message_count_not_overflow — exact repro from issue (74K tokens, 432 msgs, 1M ctx) → format_error.
  • test_400_generic_1m_context_relative_pressure_still_overflow — 500K tokens / 1M ctx still → context_overflow.
  • test_disconnect_1m_context_high_message_count_is_timeout — 150K tokens, 300 msgs, 1M ctx → timeout.
  • test_disconnect_1m_context_relative_pressure_still_overflow — 700K tokens / 1M ctx still → context_overflow.
pytest tests/agent/test_error_classifier.py -q
122 passed (118 pre-existing + 4 new)

Changed files

  • agent/error_classifier.py (modified, +6/-2)
  • tests/agent/test_error_classifier.py (modified, +62/-0)

PR #16373: fix(memory): return existing entry previews on zero-match in replace/remove

Description (problem / solution / changelog)

Closes #16266.

Problem

memory.replace / memory.remove zero-match returned a bare "No entry matched '...'" error with no context about what entries actually exist. LLMs that paraphrased instead of substring-matching would burn a full tool round trip every time, producing the consistent [error] → retry-success pattern @ahmadhawamdah documented.

Fix

Mirror the multi-match branch behavior on zero-match: return the same 80-char truncated previews under a new existing_entries field so the next LLM call can pick a correct old_text. Symmetric in remove().

Diff

tools/memory_tool.py — replace + remove zero-match returns:

previews = [e[:80] + ("..." if len(e) > 80 else "") for e in entries]
return {
    "success": False,
    "error": f"No entry matched '{old_text}'.",
    "existing_entries": previews,
}

Additive on the error response — no schema change for callers.
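
As a rough illustration of how a caller would see the new field, the following sketch wraps the quoted snippet in a small replace function. Everything except the zero-match return shape is an assumption: replace_entry, preview, PREVIEW_LIMIT, and the multi-match response keys are hypothetical stand-ins, not the actual tools/memory_tool.py code.

```python
from typing import Any, Dict, List

PREVIEW_LIMIT = 80  # truncation length quoted in the PR


def preview(entry: str) -> str:
    """Truncate an entry to PREVIEW_LIMIT characters, marking the cut with '...'."""
    return entry[:PREVIEW_LIMIT] + ("..." if len(entry) > PREVIEW_LIMIT else "")


def replace_entry(entries: List[str], old_text: str, new_text: str) -> Dict[str, Any]:
    """Hypothetical replace: substring match, with previews on a zero-match."""
    matches = [i for i, entry in enumerate(entries) if old_text in entry]
    if not matches:
        return {
            "success": False,
            "error": f"No entry matched '{old_text}'.",
            "existing_entries": [preview(e) for e in entries],
        }
    if len(matches) > 1:
        # Assumed shape; the PR only says this branch already returns
        # the same truncated previews.
        return {
            "success": False,
            "error": f"Multiple entries matched '{old_text}'.",
            "matches": [preview(entries[i]) for i in matches],
        }
    entries[matches[0]] = new_text
    return {"success": True}


memory = ["User prefers dark mode in every editor", "x" * 100]
result = replace_entry(memory, "likes dark themes", "User prefers light mode")
print(result["existing_entries"])
```

A paraphrased old_text now comes back with the stored entries attached, so the next call can copy an exact substring instead of guessing again.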

Tests

tests/tools/test_memory_tool.py — 2 new tests:

  • test_replace_no_match_returns_existing_previews — asserts existing_entries shape, truncation, ellipsis bound.
  • test_remove_no_match_returns_existing_previews — same for remove.
pytest tests/tools/test_memory_tool.py -q
35 passed (33 pre-existing + 2 new)

Changed files

  • tests/tools/test_memory_tool.py (modified, +19/-0)
  • tools/memory_tool.py (modified, +12/-2)

PR #14354: fix(ssh): forward skill-allowlisted env vars over SSH via SendEnv

Description (problem / solution / changelog)

Fixes #14091

The bug

tools/env_passthrough.py already builds an allowlist of env vars that should reach sandboxed environments — populated from skill required_environment_variables frontmatter and terminal.env_passthrough config. The local and code_execution backends consult is_env_passthrough() / get_all_passthrough(), but SSHEnvironment never did.

Result: when terminal_backend: ssh, every ssh ... bash -c subprocess inherits a stripped child environment that excludes the allowlist, and the remote bash session sees no skill-declared variables — even with AcceptEnv * configured on the remote sshd.

The reporter's diagnosis is correct: the SSH command was missing -o SendEnv=NAME.

Fix

In _build_ssh_command():

  • Append -o SendEnv=<NAME> for every var in get_all_passthrough() that is actually present in os.environ.
  • Names are sorted for deterministic command construction (so ControlMaster connection reuse stays stable).

In _run_bash():

  • -o SendEnv only forwards names; the OpenSSH client reads values from its own process environment. Pass them explicitly via a new _build_subprocess_env() so allowlisted vars are guaranteed to be present even if a future caller scrubs the parent env.

Failure to import or call get_all_passthrough() is non-fatal — SSH still works, just without forwarding (matches the existing best-effort posture in skills_tool.py).
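
The two halves of the fix can be sketched as a pair of pure functions, one producing the -o SendEnv=NAME flags and one producing the child environment. The function names, parameters, and sample variable names below are hypothetical; the real change lives in _build_ssh_command() and _build_subprocess_env(), whose exact signatures are not shown in the PR text.

```python
from __future__ import annotations

import os
from typing import Iterable, Mapping


def send_env_flags(allowlist: Iterable[str], environ: Mapping[str, str]) -> list[str]:
    """Build '-o SendEnv=NAME' pairs for allowlisted names that are actually set."""
    flags: list[str] = []
    for name in sorted(set(allowlist)):  # sorted for a deterministic command line
        if name in environ:
            flags.extend(["-o", f"SendEnv={name}"])
    return flags


def subprocess_env(allowlist: Iterable[str], environ: Mapping[str, str]) -> dict[str, str] | None:
    """Return an env dict carrying the forwarded values, or None to keep defaults."""
    forwarded = {name: environ[name] for name in allowlist if name in environ}
    if not forwarded:
        return None  # empty allowlist: leave the default child-env semantics alone
    return {**os.environ, **forwarded}


sample_env = {"A_TOKEN": "1", "B_TOKEN": "2"}
flags = send_env_flags({"B_TOKEN", "A_TOKEN", "UNSET_TOKEN"}, sample_env)
child_env = subprocess_env({"A_TOKEN"}, sample_env)
print(flags)
```

Returning None for an empty allowlist matches the stated goal of zero behavior change when nothing is registered, since subprocess calls treat env=None as "inherit the parent environment".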

Why this is the right layer

  • Same allowlist source as local / code_execution — single source of truth (get_all_passthrough()), so security guarantees stay consistent (Hermes provider credentials are still blocked from skill registration per GHSA-rhgp-j443-p4rf).
  • Doesn't bypass remote sshd policy — admins still need AcceptEnv on the remote (the issue notes AcceptEnv *); we just stop silently dropping the names client-side.
  • Zero behavior change when no skill/config registers anything — empty allowlist ⇒ no SendEnv flags, no env override.

Verification

uv run --frozen --python 3.11 --extra dev pytest -o addopts='' \
  tests/tools/test_ssh_environment.py \
  tests/tools/test_ssh_bulk_upload.py \
  tests/tools/test_sync_back_backends.py -q
52 passed, 11 skipped

7 new targeted tests in TestBuildSSHCommand:

  • test_no_send_env_when_no_passthrough_registered — zero behavior change when allowlist empty
  • test_send_env_added_for_registered_passthrough_var — the actual #14091 case
  • test_send_env_skips_unset_vars — allowlisted-but-unset vars don't leak as empty SendEnv lines
  • test_send_env_is_deterministic — sorted order for ControlMaster reuse stability
  • test_passthrough_failure_is_non_fatal — SSH keeps working if env_passthrough breaks
  • test_subprocess_env_includes_passthrough_values — values propagated to ssh client process env
  • test_subprocess_env_is_none_when_no_passthrough — don't override default child-env semantics unnecessarily

Notes

  • Minimal, focused diff. Uses the existing env_passthrough infrastructure rather than introducing a parallel mechanism.
  • Doesn't touch skills_tool.py's setup_needed reporting — that's accurate for the registration side; the bug was purely on the SSH consumer side.
  • No new dependencies.

Changed files

  • tests/tools/test_ssh_environment.py (modified, +83/-0)
  • tools/environments/ssh.py (modified, +47/-1)

PR #14332: fix(gateway): treat recycled PID with unreadable start_time as stale (#14176)

Description (problem / solution / changelog)

What does this PR do?

gateway/status.py::find_gateway_pids() iterates over the PIDs recorded in ~/.hermes/gateway.lock to decide whether the gateway is "still running". For each candidate it:

  1. Checks the PID is alive (os.kill(pid, 0)).
  2. Compares the recorded start_time against the live process's start_time to detect PID recycling.
  3. Falls back to _looks_like_gateway_process(pid) / _record_looks_like_gateway(record) heuristics.

When the recycled PID is owned by a different UID (typical on Linux when /proc/<pid>/stat is owned by another user, or under rootless container setups), _get_process_start_time returns None. The recorded-vs-live mismatch check then can't fire (current_start is None), and _looks_like_gateway_process can give a false positive on any long-lived python or hermes-related process the user happens to own. Result: the gateway thinks it's still running, refuses to start, and the user has to manually rm ~/.hermes/gateway.pid to recover.

Reporter (#14176) sees this in production with a systemd user service that restarts the gateway nightly: every few weeks the recorded PID turns out to have been recycled by a process owned by another user, the stale lock file is mistaken for a live gateway, and hermes gateway start fails with "Gateway already running".

Fix: be conservative. When the PID record carries a recorded_start but we can't read the candidate's current_start, skip the candidate (treat as stale) instead of falling through to the heuristic. Outside /proc-readable territory we don't have enough information to confirm this is the same gateway process, so prefer "no" over "maybe".

Related Issue

Fixes #14176

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

  • gateway/status.py (+10 / −0): in the find_gateway_pids() candidate loop, skip any PID whose recorded start_time exists but whose live start_time is unreadable. Same code path as the existing recorded-vs-live mismatch case, just covering the unreadable variant.
  • tests/gateway/test_status.py (+40 / −0): one new regression case under TestGatewayPidState, test_get_running_pid_treats_recycled_pid_with_unreadable_start_time_as_stale. Monkeypatches _get_process_start_time to return None and _looks_like_gateway_process to return True (the strongest stress for the false-positive path) and asserts the PID file is cleaned and get_running_pid() returns None.

Core diff:

         recorded_start = record.get("start_time")
         current_start = _get_process_start_time(pid)
         if recorded_start is not None and current_start is not None and current_start != recorded_start:
             continue
+        # If the PID record carries a recorded start_time but we can't read
+        # the current process's start_time, the PID may have been recycled by
+        # the OS to a process the current user can't introspect (typical on
+        # Linux when /proc/<pid>/stat is owned by another UID). The downstream
+        # _looks_like_gateway_process heuristic can give a false positive in
+        # that situation — e.g. another long-lived python process — leaving
+        # a stale PID file that blocks future starts. Be conservative and
+        # skip this candidate. See #14176.
+        if recorded_start is not None and current_start is None:
+            continue

         if _looks_like_gateway_process(pid) or _record_looks_like_gateway(record):
             return pid

How to Test

Reporter-style repro on a Linux host:

  1. Run the gateway, kill -9 the parent process to leave ~/.hermes/gateway.pid and ~/.hermes/gateway.lock populated with the dead PID.
  2. Start a long-lived python process under a different UID (e.g. another hermes daemon under another account) that the test user can see via ps but NOT via /proc/<pid>/stat. Note its PID.
  3. Edit the lock file to point at that recycled PID, keeping the original start_time field intact.
  4. Run hermes gateway start.

Before: refuses to start with "Gateway already running". After: detects the start_time mismatch is unverifiable, treats the entry as stale, cleans the lock file, and starts a fresh gateway.
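
The start_time comparison, including the new unreadable case, reduces to a small decision function. The sketch below is an assumption-labeled restatement of the diff, not the code in gateway/status.py; pid_record_is_stale and its parameter types are hypothetical.

```python
from typing import Optional


def pid_record_is_stale(recorded_start: Optional[float], current_start: Optional[float]) -> bool:
    """Return True when the candidate PID should be skipped as stale."""
    if recorded_start is None:
        # Nothing recorded to compare against; leave the decision to the
        # downstream process-name heuristics.
        return False
    if current_start is None:
        # New conservative branch: the live start_time is unreadable, so the
        # PID cannot be confirmed as the same process.
        return True
    # Pre-existing branch: a readable but different start_time means recycling.
    return current_start != recorded_start


same_process = pid_record_is_stale(1000.0, 1000.0)
recycled = pid_record_is_stale(1000.0, 2000.0)
unreadable = pid_record_is_stale(1000.0, None)
no_record = pid_record_is_stale(None, None)
print(same_process, recycled, unreadable, no_record)
```

Only the unreadable case changes behavior; records without a start_time still fall through to the existing heuristics.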

Automated regression suite:

pytest tests/gateway/test_status.py::TestGatewayPidState -q

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(gateway):)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix
  • I've run pytest tests/gateway/test_status.py::TestGatewayPidState -q and all tests pass (11/11)
  • I've added tests for my changes
  • I've tested on my platform: macOS 26.5 (arm64), Python 3.11.14 via uv

Documentation & Housekeeping

  • Documentation updates — N/A (internal helper, no user-visible API change beyond bug fix)
  • cli-config.yaml.example — N/A (no new config)
  • CONTRIBUTING.md / AGENTS.md — N/A
  • Cross-platform impact considered — change is conservative on every platform; the false positive fix matters most on Linux but doesn't regress macOS/Windows behavior
  • Tool descriptions/schemas — N/A

Not in scope

  • The reporter's bash script idea (a dedicated hermes gateway clean-pid command) — that's nice-to-have but a separate UX surface; the in-band fix here is the higher-impact change since it stops the bad state from forming.
  • Auditing gateway.target / Restart= semantics in the example systemd unit (the issue's secondary note) — that's a docs change for docs/deploy/ that deserves its own PR.
  • Hardening atexit-vs-SIGKILL paths so a kill -9 of the gateway doesn't leave a PID file behind — a real concern but out of scope for the reported bug, which is about PID-file interpretation, not creation.

Screenshots / Logs

Verification

$ python3 -m py_compile gateway/status.py tests/gateway/test_status.py
OK

$ uv run --no-project --with pytest --with pytest-xdist --with pyyaml \
       --with python-dotenv --with prompt_toolkit --with rich --with httpx \
       --with fastapi --with pydantic python -m pytest \
       tests/gateway/test_status.py::TestGatewayPidState -q
...........                                                              [100%]
11 passed in 0.51s

(11 = 10 existing + 1 new regression case, all green.)

Changed files

  • gateway/status.py (modified, +10/-0)
  • tests/gateway/test_status.py (modified, +40/-0)

PR #13957: fix(skills): raise system-prompt skill description limit to match runtime tool (#13944)

Description (problem / solution / changelog)

What does this PR do?

The skill index injected into the system prompt hard-truncated every skill description to 60 characters, while the runtime skills_list() tool (tools/skills_tool.py) allowed up to 1024. The LLM saw a vague prefix in the system prompt — where the routing decision is actually made — and only got the full description after deciding to call skills_list(). That's backwards: the trigger criteria need to be visible at system-prompt time so the model can decide whether to route to the skill.

Example before vs after:

Before:  "Complete guide to using and extending Hermes Agent — CLI ..."
After:   "Complete guide to using and extending Hermes Agent — CLI tooling, skill authoring, and gateway integration"

Fix: introduce SKILL_INDEX_MAX_DESCRIPTION_LENGTH = 1024 in agent/skill_utils.py and use it in extract_skill_description(). Descriptions under the limit are returned verbatim; over-limit ones are truncated to exactly SKILL_INDEX_MAX_DESCRIPTION_LENGTH with a trailing "..." included in the budget (same contract the runtime tool uses).
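
The truncation contract ("..." counted inside the budget) is easy to check in isolation. This is a minimal sketch with the limit passed as a parameter so the old and new limits can be compared side by side; truncate_description and the sample description are hypothetical and are not taken from agent/skill_utils.py.

```python
def truncate_description(desc: str, limit: int) -> str:
    """Return desc unchanged if it fits, else cut it so the result is exactly limit long."""
    if len(desc) > limit:
        return desc[: limit - 3] + "..."
    return desc


# Hypothetical skill description, longer than 60 but far below 1024 characters.
desc = "Route here when the user asks about CLI tooling, skill authoring, or gateway integration."

old_style = truncate_description(desc, 60)     # previous hardcoded limit
new_style = truncate_description(desc, 1024)   # limit introduced by the PR
over_limit = truncate_description("x" * 2000, 1024)

print(len(old_style), len(new_style), len(over_limit))
```

Under the old limit the routing criteria are cut off mid-sentence, while under the new one the description survives verbatim and only pathological inputs are shortened.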

Related Issue

Fixes #13944

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

  • agent/skill_utils.py (+19 / −4): new SKILL_INDEX_MAX_DESCRIPTION_LENGTH constant; extract_skill_description() uses it instead of the hardcoded 60; docstring updated to describe the contract + issue context.
  • tests/agent/test_extract_skill_description.py (+68, new file): 8 regression cases covering empty / short / boundary / long-below / at-new-limit / over-new-limit / strip() preservation, plus one that locks in equality with tools.skills_tool.MAX_DESCRIPTION_LENGTH so the two paths can't silently drift apart again.

Core diff:

-def extract_skill_description(frontmatter: Dict[str, Any]) -> str:
-    """Extract a truncated description from parsed frontmatter."""
+SKILL_INDEX_MAX_DESCRIPTION_LENGTH = 1024
+
+
+def extract_skill_description(frontmatter: Dict[str, Any]) -> str:
+    """Extract a (possibly truncated) description from parsed frontmatter.
+
+    Descriptions under ``SKILL_INDEX_MAX_DESCRIPTION_LENGTH`` are returned
+    verbatim. Longer ones are truncated to that length, with a trailing
+    ``"..."`` included in the budget.
+    """
     raw_desc = frontmatter.get("description", "")
     if not raw_desc:
         return ""
     desc = str(raw_desc).strip().strip("'\"")
-    if len(desc) > 60:
-        return desc[:57] + "..."
+    if len(desc) > SKILL_INDEX_MAX_DESCRIPTION_LENGTH:
+        return desc[: SKILL_INDEX_MAX_DESCRIPTION_LENGTH - 3] + "..."
     return desc

How to Test

Before:

from agent.skill_utils import extract_skill_description
desc = "Complete guide to using and extending Hermes Agent — CLI tooling, skill authoring, and gateway integration"
extract_skill_description({"description": desc})
# 'Complete guide to using and extending Hermes Agent — CLI ...'   # 60 chars

After: returns the full description verbatim (106 characters).

pytest tests/agent/test_extract_skill_description.py -q

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(skills):)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix
  • I've run pytest tests/agent/test_extract_skill_description.py -q and all tests pass
  • I've added tests for my changes (8 new regression cases)
  • I've tested on my platform: macOS 26.5 (arm64), Python 3.11.14 via uv

Documentation & Housekeeping

  • Documentation updates — N/A (internal constant; docstring updated in-place)
  • cli-config.yaml.example — N/A (no new config)
  • CONTRIBUTING.md / AGENTS.md — N/A
  • Cross-platform impact — string slicing, no OS-specific paths
  • Tool descriptions/schemas — N/A

Screenshots / Logs

Verification

$ uv run --with pytest --with pytest-xdist python -m pytest \
    tests/agent/test_extract_skill_description.py -v
........                                                                 [100%]
8 passed in 0.56s

Not in scope

The issue author's long-term suggestion (prompt_builder consuming skills_tool's metadata rather than maintaining a parallel parsing implementation) is out of scope for this PR — that's an architectural refactor that deserves its own proposal and review. This PR implements the reporter's minimum fix and adds a regression guard so the two paths can't silently drift apart again.

Changed files

  • agent/skill_utils.py (modified, +18/-3)
  • tests/agent/test_extract_skill_description.py (added, +65/-0)

PR #13937: fix(skills): honor platform_disabled config in gateway-built system prompts (#13851)

Description (problem / solution / changelog)

What does this PR do?

build_skills_system_prompt() resolves the active platform via HERMES_PLATFORM (os.environ) and HERMES_SESSION_PLATFORM (contextvar). When the gateway builds system prompts on an async task that doesn't inherit the session contextvars — the common path on the Signal gateway — both lookups return empty and skills.platform_disabled.<platform> is silently ignored. Every skill ships into the system prompt, which inflates the combined system-prompt+tools payload above the ~25K-char threshold where local LLMs (gemma4:26b, hermes3, qwen3:14b, mistral-small3.1) stop calling tools and just reply with text. Hermes Agent becomes effectively unusable with local models when many skills are installed.

The fix adds an explicit platform= parameter to build_skills_system_prompt() that is authoritative for both get_disabled_skill_names() resolution and the cache key. run_agent.AIAgent already carries self.platform from the gateway constructor (AIAgent(platform=platform_key, ...) in gateway/run.py:9700), so we thread it through the only caller site.

Backward compatible: platform=None retains the existing env/contextvar fallback chain so CLI, cron, and web-server callers are unchanged.
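
The resulting precedence (explicit argument, then process environment, then session contextvar) can be sketched as a pure function. The function name and the injected environ and session_lookup parameters are hypothetical, chosen so the example runs without the Hermes codebase; only the two variable names come from the PR.

```python
from typing import Callable, Mapping, Optional


def resolve_platform(
    explicit: Optional[str],
    environ: Mapping[str, str],
    session_lookup: Callable[[str], Optional[str]],
) -> str:
    """Return the platform hint, preferring the caller-supplied value."""
    return (
        explicit
        or environ.get("HERMES_PLATFORM")
        or session_lookup("HERMES_SESSION_PLATFORM")
        or ""
    )


def no_session(_name: str) -> Optional[str]:
    """Simulates an async task that did not inherit the session contextvars."""
    return None


gateway_call = resolve_platform("signal", {}, no_session)
explicit_wins = resolve_platform("signal", {"HERMES_PLATFORM": "telegram"}, no_session)
legacy_call = resolve_platform(None, {"HERMES_PLATFORM": "cli"}, no_session)
nothing_set = resolve_platform(None, {}, no_session)
print(gateway_call, explicit_wins, legacy_call, repr(nothing_set))
```

The gateway_call case is the one the PR targets: both implicit lookups come back empty, so only the explicit argument identifies the platform.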

Related Issue

Fixes #13851

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

  • agent/prompt_builder.py (+19 / −3): add platform kwarg to build_skills_system_prompt(); it takes priority over HERMES_PLATFORM / HERMES_SESSION_PLATFORM when resolving the platform hint that's passed into get_disabled_skill_names() and used as the cache-key dimension. Updated docstring to describe the gateway contextvar-propagation problem the new parameter solves.
  • run_agent.py (+1): pass self.platform when calling build_skills_system_prompt from the agent's system-prompt build path. AIAgent already stores the platform on self.platform (line 821) from the gateway constructor.
  • tests/agent/test_prompt_builder.py (+115): four new regression tests under TestBuildSkillsSystemPrompt.

Implementation diff (core change):

 def build_skills_system_prompt(
     available_tools: "set[str] | None" = None,
     available_toolsets: "set[str] | None" = None,
+    platform: "str | None" = None,
 ) -> str:
     ...
     _platform_hint = (
-        os.environ.get("HERMES_PLATFORM")
+        platform
+        or os.environ.get("HERMES_PLATFORM")
         or get_session_env("HERMES_SESSION_PLATFORM")
         or ""
     )
-    disabled = get_disabled_skill_names()
+    disabled = get_disabled_skill_names(platform=_platform_hint or None)

 # run_agent.py
             skills_prompt = build_skills_system_prompt(
                 available_tools=self.valid_tool_names,
                 available_toolsets=avail_toolsets,
+                platform=self.platform,
             )

How to Test

Reporter's scenario:

# config.yaml
skills:
  platform_disabled:
    signal:
      - apple-notes
      - apple-reminders
      # ... 107 skill names

hermes gateway run --replace
# Send a message via Signal.

Before: all 129 skills appear in the system-prompt index, combined payload ~53K chars, local LLMs don't emit tool calls. After: only the non-disabled skills appear, payload drops proportionally, local LLMs recover the ability to call tools.

Automated regression suite:

pytest tests/agent/test_prompt_builder.py::TestBuildSkillsSystemPrompt -q

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(skills):)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix
  • I've run the scoped tests (tests/agent/test_prompt_builder.py::TestBuildSkillsSystemPrompt) and all tests pass
  • I've added tests for my changes (4 new regression cases)
  • I've tested on my platform: macOS 26.5 (arm64), Python 3.11.14 via uv

Documentation & Housekeeping

  • I've updated relevant documentation — N/A (platform= is an internal kwarg on an internal helper; docstring updated in-line)
  • I've updated cli-config.yaml.example — N/A (no new config)
  • I've updated CONTRIBUTING.md / AGENTS.md — N/A
  • I've considered cross-platform impact — pure Python, no OS-specific code paths touched
  • I've updated tool descriptions/schemas — N/A

Not in scope

The issue also mentions platform_toolsets.signal being ignored (26 tools passed to the model instead of the configured 7). That's a separate architectural problem — platform_toolsets filtering lives in hermes_cli/tools_config.py and the gateway invocation path doesn't consult it the same way the skills path does. It belongs in its own PR with its own regression suite, and the skills fix here already delivers the 16K-char savings from the 107 disabled skills — often enough to get the payload back under the local-LLM threshold on its own.

Screenshots / Logs

Regression tests added

Test: scenario

  • test_explicit_platform_param_disables_skills_for_that_platform: build_skills_system_prompt(platform="signal") excludes skills listed under skills.platform_disabled.signal
  • test_explicit_platform_param_wins_over_env_vars: caller-provided platform= overrides the HERMES_PLATFORM env var
  • test_platform_none_falls_back_to_env: no platform= kwarg, so legacy env-based resolution still works (backward compat)
  • test_platform_in_cache_key_prevents_cross_platform_leak: back-to-back calls with different platforms return correctly filtered outputs (no cache collision)

Verification

$ python3 -m py_compile agent/prompt_builder.py run_agent.py tests/agent/test_prompt_builder.py
OK

$ uv run --with pytest --with pytest-xdist python -m pytest \
    tests/agent/test_prompt_builder.py::TestBuildSkillsSystemPrompt -q
..............                                                           [100%]
14 passed in 0.66s

(14 tests = 10 existing + 4 new regression cases, all green. No existing behavior regressed.)

Changed files

  • agent/prompt_builder.py (modified, +19/-3)
  • run_agent.py (modified, +1/-0)
  • tests/agent/test_prompt_builder.py (modified, +115/-0)

PR #18989: fix(doctor): use get_env_value for API keys and base URLs instead of os.getenv

Description (problem / solution / changelog)

Problem

hermes_cli/doctor.py uses os.getenv() to read API keys and base URL env vars, which only reads from os.environ. Values stored in ~/.hermes/.env (the standard location for API keys) are silently missed.

This causes:

  • API key blindspot: doctor reports keys as "not configured" when they exist only in ~/.hermes/.env
  • Base URL fallback: providers with custom BASE_URL in .env (like DASHSCOPE_BASE_URL) fall back to wrong registry defaults, causing false "invalid key" reports

Same pattern as #18757 (fixed in auth.py) — os.getenv() vs get_env_value() inconsistency.

Fix

Replace os.getenv() with get_env_value() (with os.getenv() fallback) at 3 call sites:

  1. Line 998: openrouter_key = os.getenv("OPENROUTER_API_KEY") → _get_env_value("OPENROUTER_API_KEY") or os.getenv("OPENROUTER_API_KEY")
  2. Line 1111: _key = os.getenv(_ev, "") in provider loop → _get_env_value(_ev) or os.getenv(_ev, "")
  3. Line 1123: _base = os.getenv(_base_env, "") for base URL → _get_env_value(_base_env) or os.getenv(_base_env, "")

The fallback to os.getenv() ensures values injected directly into the process environment (e.g., Docker, CI) still work.
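
The "file-aware lookup first, process environment second" pattern used at all three call sites can be sketched as a single helper. The name read_setting and its injected file_lookup and environ parameters are hypothetical, and the dict standing in for ~/.hermes/.env is made-up data; the real code calls _get_env_value() and os.getenv() directly.

```python
from typing import Callable, Mapping, Optional


def read_setting(
    name: str,
    file_lookup: Callable[[str], Optional[str]],
    environ: Mapping[str, str],
    default: str = "",
) -> str:
    """Prefer the file-aware lookup, then fall back to the process environment."""
    return file_lookup(name) or environ.get(name, default)


# Stand-in for values stored only in the .env file (hypothetical URL).
dotenv_values = {"DASHSCOPE_BASE_URL": "https://example.invalid/v1"}
# Stand-in for values injected by Docker or CI (hypothetical key).
process_env = {"OPENROUTER_API_KEY": "sk-from-ci"}

from_file = read_setting("DASHSCOPE_BASE_URL", dotenv_values.get, process_env)
from_process = read_setting("OPENROUTER_API_KEY", dotenv_values.get, process_env)
unset = read_setting("MISSING_DEMO_KEY", dotenv_values.get, process_env)
print(from_file, from_process, repr(unset))
```

Using `or` rather than an explicit None check means an empty string from the first lookup also falls through to the second, which matches the expressions quoted in the list above.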

Related

  • #18757 — same fix applied to auth.py
  • Skill docs note this as a known standing issue: doctor.py line 1122 (_base = os.getenv(_base_env, "")) still uses os.getenv

Changed files

  • hermes_cli/doctor.py (modified, +4/-3)

PR #19021: test(credential_pool): align with .env-first seed precedence

Description (problem / solution / changelog)

Summary

Fixes one Tests failure observed on main (and therefore propagating to every open PR):

FAILED tests/tools/test_credential_pool_env_fallback.py::TestCredentialPoolSeedsFromDotEnv::test_os_environ_still_wins_over_dotenv
  AssertionError: assert 'sk-dotenv-stale' == 'sk-env-fresh-xyz'

Reference run: 25250051126 on 5d3be898a.

Root cause

The test docstring says:

get_env_value checks os.environ first — verify seeding picks that up.

…which was true for _seed_from_env until commit 2ef1ad280 ("fix: prefer ~/.hermes/.env over os.environ when seeding credential pool", fixes #18254). That commit introduced a private _get_env_prefer_dotenv helper and deliberately flipped the precedence:

# Prefer ~/.hermes/.env over os.environ — the user's config file is the
# authoritative source for Hermes credentials. Stale env vars from parent
# processes (Codex CLI, test scripts, etc.) should not override deliberate
# changes to the .env file.
def _get_env_prefer_dotenv(key: str) -> str:
    env_file = load_env()
    val = env_file.get(key) or os.environ.get(key) or ""
    return val.strip()

The fix was correct (#18254 reported real users hitting silent 401s with stale auth.json caches), but the unit test was never updated to match.

Fix

Rename + rewrite the test to assert the current, deliberate behaviour: .env wins over os.environ, with a docstring explaining why (stale shell env vars from parent processes shadowing deliberate .env edits, leading to cached 401s in auth.json).

The cousin Auth* tests below already exercise os.environ-first semantics for _resolve_api_key_provider_secret (which still uses get_env_value), so the two precedence policies are both pinned now.
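
The two precedence policies differ only in the order of the lookups, which a side-by-side sketch makes concrete. Both functions take plain dicts instead of reading real files so the example is self-contained; prefer_environ is an assumption about how an os.environ-first lookup behaves, based solely on the test docstring quoted above, and the key name and values are made up.

```python
from typing import Mapping


def prefer_dotenv(key: str, dotenv: Mapping[str, str], environ: Mapping[str, str]) -> str:
    """.env-first policy, mirroring the _get_env_prefer_dotenv helper quoted above."""
    return (dotenv.get(key) or environ.get(key) or "").strip()


def prefer_environ(key: str, dotenv: Mapping[str, str], environ: Mapping[str, str]) -> str:
    """Assumed os.environ-first policy, as described by the old test docstring."""
    return (environ.get(key) or dotenv.get(key) or "").strip()


# Hypothetical situation: the user edited .env, but a parent process exported an old key.
dotenv = {"DEMO_API_KEY": "sk-edited-in-dotenv"}
environ = {"DEMO_API_KEY": "sk-stale-from-parent-shell"}

seeded = prefer_dotenv("DEMO_API_KEY", dotenv, environ)
resolved = prefer_environ("DEMO_API_KEY", dotenv, environ)
only_in_env = prefer_dotenv("OTHER_DEMO_KEY", {}, {"OTHER_DEMO_KEY": " sk-env-only "})
print(seeded, resolved, only_in_env)
```

When only one source defines the key, both policies agree; they diverge only when the two sources conflict, which is exactly the case the renamed test pins down.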

Validation

$ pytest tests/tools/test_credential_pool_env_fallback.py -q
9 passed in 1.59s

Refs

  • #18254 — bug report that motivated the precedence flip
  • 2ef1ad280 — the precedence flip commit
  • #18757 — cousin: os.getenv → get_env_value fix on auth.py base_url path

Scope

  • ✅ No production code change (test-only)
  • ✅ All 9 tests in the file pass
  • ✅ Test name + docstring now accurately describe the contract

Out of scope

The other ~10 main-CI failures — separate focused PRs (#18972, #18974, #18977, #18979 already up).

Changed files

  • tests/tools/test_credential_pool_env_fallback.py (modified, +17/-5)

Code Example

XIAOMI_BASE_URL=https://token-plan-cn.xiaomimimo.com/v1

---

auxiliary:
     title_generation:
       provider: xiaomi
       model: mimo-v2.5
       base_url: ''

---

Title generation failed: Error code: 401 - {'error': {'message': 'Invalid API Key', 'param': 'Please provide valid API Key', 'code': '401', 'type': 'invalid_key'}}

---

# API key — uses get_env_value()  (reads .env file)
for env_var in pconfig.api_key_env_vars:
    val = (get_env_value(env_var) or "").strip()

# base_url — uses os.getenv()  (does NOT read .env file)
env_url = ""
if pconfig.base_url_env_var:
    env_url = os.getenv(pconfig.base_url_env_var, "").strip()

---

env_url = ""
if pconfig.base_url_env_var:
    env_url = os.getenv(pconfig.base_url_env_var, "").strip().rstrip("/")

---

# auth.py: resolve_api_key_provider_credentials()
env_url = ""
if pconfig.base_url_env_var:
    env_url = (get_env_value(pconfig.base_url_env_var) or "").strip()

# runtime_provider.py — same change
env_url = ""
if pconfig.base_url_env_var:
    env_url = (get_env_value(pconfig.base_url_env_var) or "").strip().rstrip("/")

Bug Description

resolve_api_key_provider_credentials() in hermes_cli/auth.py resolves base_url_env_var using os.getenv(), which does not read from ~/.hermes/.env. Providers with a custom base URL stored only in .env (not exported in the shell) get the wrong endpoint — the default inference_base_url from PROVIDER_REGISTRY.

The same function correctly uses get_env_value() for API key resolution, creating an inconsistency: API keys are found, but base URLs are not.

The identical pattern also exists in hermes_cli/runtime_provider.py.

Steps to Reproduce

  1. Configure a provider with a custom base URL (e.g., Xiaomi with token-plan-cn.xiaomimimo.com instead of the default api.xiaomimimo.com)
  2. Set the base URL in ~/.hermes/.env only — do not export it in the shell:
    XIAOMI_BASE_URL=https://token-plan-cn.xiaomimimo.com/v1
  3. Set the same provider as an auxiliary task provider in config.yaml:
    auxiliary:
      title_generation:
        provider: xiaomi
        model: mimo-v2.5
        base_url: ''
  4. Start a new conversation via gateway (Telegram/Discord/CLI)
  5. Observe that the auxiliary task (e.g., auto-title) fails with 401

Expected Behavior

The auxiliary client should resolve XIAOMI_BASE_URL from ~/.hermes/.env via get_env_value() and use https://token-plan-cn.xiaomimimo.com/v1 — the same way it resolves XIAOMI_API_KEY.

Actual Behavior

The auxiliary client calls os.getenv("XIAOMI_BASE_URL"), which finds nothing because the variable is not in os.environ, and then falls back to pconfig.inference_base_url (https://api.xiaomimimo.com/v1). The request hits the wrong endpoint and returns 401.
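
A rough model of that fallback, assuming the resolver ends with something like `env_url or pconfig.inference_base_url` (the exact expression is not shown in this issue). `ProviderConfigSketch` and `pick_base_url` are hypothetical names, and plain dicts stand in for the process environment and for ~/.hermes/.env.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ProviderConfigSketch:
    """Hypothetical stand-in carrying only the two fields named in the issue."""
    base_url_env_var: str
    inference_base_url: str


def pick_base_url(
    pconfig: ProviderConfigSketch,
    lookup: Callable[[str], Optional[str]],
) -> str:
    """Use the env-provided URL if the lookup finds one, else the registry default."""
    env_url = ""
    if pconfig.base_url_env_var:
        env_url = (lookup(pconfig.base_url_env_var) or "").strip().rstrip("/")
    # Assumed fallback: an empty env_url silently selects the default endpoint.
    return env_url or pconfig.inference_base_url


xiaomi = ProviderConfigSketch("XIAOMI_BASE_URL", "https://api.xiaomimimo.com/v1")
process_env: Dict[str, str] = {}  # variable not exported in the shell
dotenv_values = {"XIAOMI_BASE_URL": "https://token-plan-cn.xiaomimimo.com/v1"}

# os.getenv-style lookup: sees only the process environment.
buggy = pick_base_url(xiaomi, process_env.get)
# get_env_value-style lookup: also consults the .env values.
fixed = pick_base_url(
    xiaomi, lambda name: process_env.get(name) or dotenv_values.get(name)
)
```

Because the fallback is silent, the only visible symptom is the 401 from the default endpoint, which rejects a key issued for the token-plan endpoint.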

Affected Component

  • Configuration (config.yaml, .env, hermes setup)
  • Agent Core (conversation loop, context compression, memory)

Debug Report

N/A — this is a code-level bug identifiable from source. No environment-specific info needed.

Operating System

Ubuntu 24.04 (ARM64)

Python Version

3.12

Hermes Version

v0.12.0 (2026.4.30) — persists on origin/main (98c98821f)

Additional Logs / Traceback

Title generation failed: Error code: 401 - {'error': {'message': 'Invalid API Key', 'param': 'Please provide valid API Key', 'code': '401', 'type': 'invalid_key'}}

Root Cause Analysis

hermes_cli/auth.py L3529-3531, resolve_api_key_provider_credentials():

# API key — uses get_env_value() ✅ (reads .env file)
for env_var in pconfig.api_key_env_vars:
    val = (get_env_value(env_var) or "").strip()

# base_url — uses os.getenv() ❌ (does NOT read .env file)
env_url = ""
if pconfig.base_url_env_var:
    env_url = os.getenv(pconfig.base_url_env_var, "").strip()

hermes_cli/runtime_provider.py — same pattern:

env_url = ""
if pconfig.base_url_env_var:
    env_url = os.getenv(pconfig.base_url_env_var, "").strip().rstrip("/")

Proposed Fix

Replace os.getenv with get_env_value in both locations:

# auth.py — resolve_api_key_provider_credentials()
env_url = ""
if pconfig.base_url_env_var:
    env_url = (get_env_value(pconfig.base_url_env_var) or "").strip()

# runtime_provider.py — same change
env_url = ""
if pconfig.base_url_env_var:
    env_url = (get_env_value(pconfig.base_url_env_var) or "").strip().rstrip("/")

Same Bug Class as #15914 and #17140

This is the same os.getenv() vs get_env_value() pattern fixed in:

  • PR #16101 (closed #15914) — API key + credential_pool
  • PR #17434 (closed #17140) — TTS/STT tools

The base_url_env_var resolution was missed in both fixes.

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

extent analysis

TL;DR

Replace os.getenv() with get_env_value() in hermes_cli/auth.py and hermes_cli/runtime_provider.py to fix the inconsistency in resolving base URLs from .env files.

Guidance

  • Identify the locations where os.getenv() is used for resolving base URLs in hermes_cli/auth.py and hermes_cli/runtime_provider.py.
  • Replace these instances with get_env_value() to ensure consistency with API key resolution.
  • Verify that the change resolves the issue by testing with a custom base URL set in .env and not exported in the shell.
  • Review previous fixes (PR #16101 and PR #17434) to understand the context and ensure the change is applied correctly.
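
One way to carry out the first step is to scan the source for remaining direct os.getenv() calls. The helper below is a minimal sketch using only the standard library `ast` module; `find_os_getenv_calls` is a hypothetical name, and the sample string simply reuses the snippet from this issue rather than the real file contents.

```python
import ast
from typing import List, Tuple


def find_os_getenv_calls(source: str) -> List[Tuple[int, str]]:
    """Return (line number, first-argument source) for each os.getenv(...) call."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match the literal attribute access `os.getenv`.
        if (
            isinstance(func, ast.Attribute)
            and func.attr == "getenv"
            and isinstance(func.value, ast.Name)
            and func.value.id == "os"
        ):
            arg = ast.unparse(node.args[0]) if node.args else ""
            hits.append((node.lineno, arg))
    return sorted(hits)


sample = '''\
import os
env_url = ""
if pconfig.base_url_env_var:
    env_url = os.getenv(pconfig.base_url_env_var, "").strip()
'''
hits = find_os_getenv_calls(sample)
```

To check the real files, pass in the text of hermes_cli/auth.py and hermes_cli/runtime_provider.py instead of `sample`. The scan only matches the literal `os.getenv(...)` form, so it would miss `from os import getenv` or `os.environ.get(...)`, and not every hit is necessarily a bug.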

Example

# auth.py — resolve_api_key_provider_credentials()
env_url = ""
if pconfig.base_url_env_var:
    env_url = (get_env_value(pconfig.base_url_env_var) or "").strip()

# runtime_provider.py — same change
env_url = ""
if pconfig.base_url_env_var:
    env_url = (get_env_value(pconfig.base_url_env_var) or "").strip().rstrip("/")

Notes

This fix assumes that get_env_value() is correctly implemented to read from .env files. If this is not the case, additional changes may be required.

Recommendation

Apply the fix by replacing os.getenv() with get_env_value() in both locations. This is not a workaround: it addresses the root cause directly.
