hermes - ✅(Solved) Fix [Bug]: resolve_api_key_provider_credentials() uses os.getenv for base_url_env_var — misses ~/.hermes/.env values [19 pull requests, 1 participant]

PositionZer0 · 2026-05-02T09:03:16Z




Issue: NousResearch/hermes-agent#18757 (fetched 2026-05-03 04:54:28)

Author: PositionZer0
Participants: PositionZer0
Timeline: cross-referenced ×19, labeled ×5, referenced ×1

Error Message

Additional Logs / Traceback

Title generation failed: Error code: 401 - {'error': {'message': 'Invalid API Key', 'param': 'Please provide valid API Key', 'code': '401', 'type': 'invalid_key'}}

Root Cause

hermes_cli/auth.py L3529-3531 — resolve_api_key_provider_credentials():

# API key — uses get_env_value() ✅ (reads .env file)
for env_var in pconfig.api_key_env_vars:
    val = (get_env_value(env_var) or "").strip()

# base_url — uses os.getenv() ❌ (does NOT read .env file)
env_url = ""
if pconfig.base_url_env_var:
    env_url = os.getenv(pconfig.base_url_env_var, "").strip()

hermes_cli/runtime_provider.py — same pattern:

env_url = ""
if pconfig.base_url_env_var:
    env_url = os.getenv(pconfig.base_url_env_var, "").strip().rstrip("/")
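The inconsistency is easier to see side by side. The sketch below is a hypothetical stand-in for `get_env_value()`, whose real implementation is not shown in this issue. It assumes that shell-exported variables win and that `~/.hermes/.env` is only a fallback, which is the precedence PR #18948 describes. `read_dotenv`, `get_env_value_sketch`, and `resolve_base_url` are illustrative names, not Hermes APIs. The parser handles only simple `KEY=VALUE` lines.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def read_dotenv(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines (hypothetical, deliberately minimal parser)."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_env_value_sketch(name: str, dotenv_path: Path) -> Optional[str]:
    """Hypothetical stand-in for get_env_value(): os.environ first, then the dotenv file."""
    if name in os.environ:
        return os.environ[name]
    return read_dotenv(dotenv_path).get(name)


def resolve_base_url(base_url_env_var: Optional[str], default_url: str, dotenv_path: Path) -> str:
    """Mirror the fixed lookup: an env/dotenv override wins, else the registry default."""
    env_url = ""
    if base_url_env_var:
        # os.getenv(base_url_env_var, "") here would ignore dotenv_path entirely (the bug).
        env_url = (get_env_value_sketch(base_url_env_var, dotenv_path) or "").strip().rstrip("/")
    return env_url or default_url
```

Suppose a base-URL variable is present only in the dotenv file. `os.getenv` returns `""`, so the buggy code falls back to the registry default, and the API key is then sent to the wrong endpoint. The sketch returns the dotenv value instead.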

Fix Action

Fixed

Fixed by PR: fix: resolve 7 identified issues [automated] (https://github.com/NousResearch/hermes-agent/pull/17246)
Fixed by PR: fix(auth): use get_env_value for base_url_env_var instead of os.getenv (#18757) (https://github.com/NousResearch/hermes-agent/pull/18797)
Fixed by PR: fix: use get_env_value() for base_url_env_var resolution (https://github.com/NousResearch/hermes-agent/pull/18908)
Fixed by PR: fix(doctor): read env vars from .env and default to China DashScope endpoint (https://github.com/NousResearch/hermes-agent/pull/18910)
Fixed by PR: fix(auth): resolve base_url_env_var via get_env_value everywhere (closes #18757) (https://github.com/NousResearch/hermes-agent/pull/18948)

PR fix notes

PR #17246: fix: resolve 7 identified issues [automated]

Repository: NousResearch/hermes-agent
Author: Sldark23
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/17246

Description (problem / solution / changelog)

Summary

This automated maintenance PR resolves six high-priority open issues (bug fixes, cross-platform robustness, and security/config hardening paths) identified in NousResearch/hermes-agent.

Note: The job target was 7 issues. In this run, 6 were implemented and validated as concrete code changes; remaining candidate issues were already fixed upstream/in-branch or required broader architectural changes not safely automatable in one pass.

Issues resolved

1. #18757 - resolve_api_key_provider_credentials() misses ~/.hermes/.env for base_url_env_var
   - Replaced os.getenv(...) with get_env_value(...) in API-key provider credential resolution.
   - Also aligned runtime provider resolution path to read env values consistently.
2. #18705 - load_hermes_dotenv() overrides runtime env vars (override=True)
   - Switched user env loading to override=False so runtime-injected env vars keep precedence.
   - Updated function docstring behavior notes accordingly.
3. #18722 - Cron jobs with next_run_at: null skipped forever; non-dict origin crash
   - Added recovery for recurring cron/interval jobs by recomputing next_run_at.
   - Hardened _resolve_origin() to tolerate non-dict origin payloads.
4. #18742 - Kimi/Moonshot via aggregators misses reasoning-mode detection
   - _needs_kimi_tool_reasoning() now also detects Moonshot/Kimi model slugs via is_moonshot_model(...).
5. #18744 - constraints_path dead config (not loaded)
   - Implemented optional loading of constraints_path content into system prompt composition.
6. #18778 - Gateway scoped lock stale detection no-op on macOS/Windows
   - Added cross-platform process start time/cmdline detection using psutil fallback.
   - Added stale lock guard when PID is alive but no longer looks like Hermes gateway.
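The precedence rule behind the #18705 fix can be shown without python-dotenv. The hypothetical `load_dotenv_values` helper below applies file values with `override=False` semantics: a value from the file is used only when the variable is not already set. It takes an explicit mapping, so it can be exercised without mutating `os.environ`. It illustrates the rule and is not `load_hermes_dotenv()` itself.

```python
from typing import Dict, MutableMapping


def load_dotenv_values(
    file_values: Dict[str, str],
    environ: MutableMapping[str, str],
    override: bool = False,
) -> Dict[str, str]:
    """Apply parsed .env values to environ and return the keys that were actually written."""
    applied: Dict[str, str] = {}
    for key, value in file_values.items():
        if not override and key in environ:
            # A runtime-injected value (e.g. from Docker or the shell) keeps precedence.
            continue
        environ[key] = value
        applied[key] = value
    return applied
```

With `override=True`, which was the old behaviour described in #18705, the file value would silently replace a value injected at runtime.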

Files modified

hermes_cli/auth.py
hermes_cli/runtime_provider.py
hermes_cli/env_loader.py
cron/jobs.py
cron/scheduler.py
run_agent.py
gateway/status.py

Commit list

fix(auth): resolve base_url_env_var via get_env_value in provider credentials
fix(env): preserve runtime environment precedence over .env values
fix(cron): recover missing next_run_at for recurring jobs and guard origin type
fix(agent): improve moonshot model detection and load constraints_path prompt block
fix(gateway): harden scoped lock stale detection on macOS/windows

Changed files

Dockerfile (modified, +3/-2)
acp_adapter/session.py (modified, +12/-0)
agent/auxiliary_client.py (modified, +280/-28)
agent/context_compressor.py (modified, +496/-52)
agent/title_generator.py (modified, +2/-2)
agent/transports/chat_completions.py (modified, +14/-0)
agent/usage_pricing.py (modified, +4/-0)
cli-config.yaml.example (modified, +5/-0)
cli.py (modified, +27/-3)
cron/jobs.py (modified, +10/-2)
cron/scheduler.py (modified, +14/-4)
docker/entrypoint.sh (modified, +9/-1)
gateway/channel_directory.py (modified, +14/-4)
gateway/platforms/discord.py (modified, +33/-7)
gateway/platforms/email.py (modified, +12/-2)
gateway/platforms/feishu.py (modified, +34/-1)
gateway/platforms/qqbot/adapter.py (modified, +8/-2)
gateway/platforms/telegram_network.py (modified, +7/-2)
gateway/platforms/weixin.py (modified, +10/-1)
gateway/run.py (modified, +129/-32)
gateway/status.py (modified, +37/-2)
hermes_cli/auth.py (modified, +4/-4)
hermes_cli/commands.py (modified, +1/-1)
hermes_cli/config.py (modified, +271/-40)
hermes_cli/copilot_auth.py (modified, +1/-1)
hermes_cli/doctor.py (modified, +6/-1)
hermes_cli/env_loader.py (modified, +5/-4)
hermes_cli/gateway.py (modified, +16/-13)
hermes_cli/main.py (modified, +69/-3)
hermes_cli/memory_setup.py (modified, +1/-1)
hermes_cli/model_switch.py (modified, +6/-1)
hermes_cli/models.py (modified, +60/-2)
hermes_cli/profiles.py (modified, +16/-3)
hermes_cli/runtime_provider.py (modified, +17/-14)
hermes_cli/setup.py (modified, +8/-2)
hermes_cli/slack_cli.py (modified, +1/-2)
hermes_cli/status.py (modified, +17/-2)
hermes_cli/web_server.py (modified, +1/-1)
hermes_constants.py (modified, +16/-3)
model_tools.py (modified, +44/-13)
run_agent.py (modified, +413/-82)
setup-hermes.sh (modified, +23/-12)
skills/red-teaming/godmode/scripts/load_godmode.py (modified, +9/-8)
tests/agent/test_context_compressor.py (modified, +389/-0)
tests/agent/transports/test_chat_completions.py (modified, +11/-0)
tests/gateway/test_compress_command.py (modified, +49/-0)
tests/hermes_cli/test_api_key_providers.py (modified, +5/-5)
tests/hermes_cli/test_config.py (modified, +17/-0)
tests/run_agent/test_413_compression.py (modified, +81/-1)
tests/run_agent/test_compression_boundary_hook.py (modified, +42/-0)
tests/run_agent/test_run_agent.py (modified, +100/-13)
tests/tools/test_skill_manager_tool.py (modified, +270/-0)
tools/approval.py (modified, +1/-1)
tools/delegate_tool.py (modified, +4/-1)
tools/environments/docker.py (modified, +36/-5)
tools/environments/local.py (modified, +8/-1)
tools/file_operations.py (modified, +70/-67)
tools/file_tools.py (modified, +13/-2)
tools/send_message_tool.py (modified, +72/-2)
tools/session_search_tool.py (modified, +2/-2)
tools/skill_manager_tool.py (modified, +82/-21)
tools/skills_tool.py (modified, +13/-1)
tools/terminal_tool.py (modified, +6/-0)
tools/tool_backend_helpers.py (modified, +15/-5)
tools/tts_tool.py (modified, +27/-16)
tools/voice_mode.py (modified, +23/-10)
toolsets.py (modified, +14/-1)
tui_gateway/server.py (modified, +5/-3)
ui-tui/src/app/turnController.ts (modified, +1/-1)
ui-tui/src/app/useInputHandlers.ts (modified, +8/-3)
ui-tui/src/app/useSessionLifecycle.ts (modified, +1/-1)
ui-tui/src/gatewayTypes.ts (modified, +1/-0)
utils.py (modified, +9/-0)
uv.lock (modified, +161/-2)
website/docs/reference/environment-variables.md (modified, +1/-1)

PR #18797: fix(auth): use get_env_value for base_url_env_var instead of os.getenv (#18757)

Repository: NousResearch/hermes-agent
Author: shellybotmoyer
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/18797

Description (problem / solution / changelog)

Fix #18757: Use `get_env_value()` for `base_url_env_var` instead of `os.getenv()`

Problem

resolve_api_key_provider_credentials() in hermes_cli/auth.py resolves base_url_env_var using os.getenv(), which does not read from ~/.hermes/.env. Providers with a custom base URL stored only in .env (not exported in the shell environment) silently fall back to the wrong endpoint — the default inference_base_url from PROVIDER_REGISTRY.

Fix

Replace os.getenv() with get_env_value() for base_url_env_var in auth.py (4 sites) and runtime_provider.py (1 site)
This ensures custom base URLs stored in ~/.hermes/.env are properly resolved

Testing

Verified that providers with base_url_env_var set in ~/.hermes/.env now correctly resolve their custom endpoints
get_env_value() already handles both os.environ and .env file lookups

Changed files

hermes_cli/auth.py (modified, +11/-4)
hermes_cli/runtime_provider.py (modified, +3/-1)

PR #18908: fix: use get_env_value() for base_url_env_var resolution

Repository: NousResearch/hermes-agent
Author: zons-zhaozhy
State: closed | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/18908

Description (problem / solution / changelog)

Summary

resolve_api_key_provider_credentials() and related functions used os.getenv() for base_url_env_var, which does NOT read ~/.hermes/.env. Providers with custom base URLs stored only in .env hit the wrong endpoint.

Root Cause

API key resolution already correctly uses get_env_value() — this was an inconsistency where keys were found but base URLs were not.

Fix

Replace os.getenv() with get_env_value() at all 5 occurrences:

File	Function
auth.py	`get_api_key_provider_status()`
auth.py	`get_external_process_provider_status()`
auth.py	`resolve_api_key_provider_credentials()`
auth.py	`resolve_external_process_provider_credentials()`
runtime_provider.py	`_resolve_explicit_runtime()`

Same Bug Class

#15914 → PR #16101 (api_key + credential_pool)
#17140 → PR #17434 (TTS/STT tools)

The base_url_env_var resolution was missed in both prior fixes.

Scope

2 files, +14 / -5 lines
No behavioral change for env vars already in os.environ
Purely fixes the .env file reading gap

Fixes #18757

Changed files

hermes_cli/auth.py (modified, +12/-4)
hermes_cli/runtime_provider.py (modified, +2/-1)

PR #18910: fix(doctor): read env vars from .env and default to China DashScope endpoint

Repository: NousResearch/hermes-agent
Author: zons-zhaozhy
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/18910

Description (problem / solution / changelog)

Summary

hermes doctor API-key health checks had two bugs:

Bug 1: env vars from .env invisible to doctor

os.getenv() does not read ~/.hermes/.env. Keys and base URLs stored only in .env (not exported to the shell) were invisible to all 16 api-key provider health checks.

Fix: Replace os.getenv() with get_env_value() for both API key and base URL resolution.

Bug 2: DashScope default URL is international-only

The default health-check URL was dashscope-intl.aliyuncs.com (international). China-region keys — the vast majority of DashScope users — are valid only on dashscope.aliyuncs.com. Doctor reported these as invalid.

Fix: Default to dashscope.aliyuncs.com (China). Users with DASHSCOPE_BASE_URL set are unaffected.
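The DashScope change reduces to a default-selection rule. Below is a minimal sketch. It assumes the health check derives its URL from a base host, and that `DASHSCOPE_BASE_URL` (named in the PR) overrides the default. `dashscope_health_base` is a hypothetical name. The exact paths that doctor probes are not given in the PR, so only hosts are used here.

```python
from typing import Mapping, Optional

CHINA_DASHSCOPE_BASE = "https://dashscope.aliyuncs.com"


def dashscope_health_base(env: Mapping[str, str], dotenv: Optional[Mapping[str, str]] = None) -> str:
    """Pick the base URL for the doctor check: explicit override first, else the China host."""
    dotenv = dotenv or {}
    # Assumed precedence, as with get_env_value(): process environment, then ~/.hermes/.env values.
    override = (env.get("DASHSCOPE_BASE_URL") or dotenv.get("DASHSCOPE_BASE_URL") or "").strip()
    return override.rstrip("/") if override else CHINA_DASHSCOPE_BASE
```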

Same Bug Class

#14134 (PR #18906) — api_key drift on provider switch
#15914 (PR #16101) — api_key + credential_pool
#17140 (PR #17434) — TTS/STT tools
#18757 (PR #18908) — base_url_env_var in auth.py/runtime_provider.py

Testing

23 existing doctor tests: all pass, zero regression.

Scope

1 file: hermes_cli/doctor.py (+5 / -4)
Affects all 16 api-key provider health checks (env var reading)
DashScope default URL change

Fixes #18904

Changed files

hermes_cli/doctor.py (modified, +28/-6)

PR #18948: fix(auth): resolve base_url_env_var via get_env_value everywhere (closes #18757)

Repository: NousResearch/hermes-agent
Author: Sanjays2402
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/18948

Description (problem / solution / changelog)

Closes #18757

resolve_api_key_provider_credentials() and friends read base_url_env_var via os.getenv(), which never consults ~/.hermes/.env. A user who sets, e.g., XIAOMI_BASE_URL=https://token-plan-cn.xiaomimimo.com/v1 only in the dotenv file silently falls back to the registry default and gets 401s on auxiliary tasks. API keys are read correctly via get_env_value() — base URLs are not. This is the same bug class fixed for API keys in #16101 and for TTS/STT in #17434.

Why this PR rather than #17246

#17246 targets the same issue but only patches 2 of 6 buggy spots, and despite the PR description claiming runtime_provider.py was "aligned", the diff doesn't touch that file at all.

File	Function	#17246	This PR
`hermes_cli/auth.py`	`get_api_key_provider_status`	✅	✅
`hermes_cli/auth.py`	`resolve_api_key_provider_credentials`	✅	✅
`hermes_cli/auth.py`	`get_external_process_provider_status` (Copilot ACP)	❌	✅
`hermes_cli/auth.py`	`resolve_external_process_provider_credentials`	❌	✅
`hermes_cli/runtime_provider.py`	`resolve_runtime_provider` (api_key branch)	❌ (claimed but absent)	✅
`hermes_cli/model_switch.py`	`_refresh_curated_models` builtin endpoint dedup	❌	✅

#17246 also bundles 5 unrelated fixes (cron, env precedence, moonshot detection, gateway lock, etc.) — this PR is scoped solely to #18757 so it can land without dragging the rest along.

Behavior

get_env_value() already preserves shell-export precedence — os.environ wins when the variable is exported, dotenv is the fallback. Existing deployments are unaffected; users who relied on the dotenv file finally get the right endpoint.

Tests

New tests/hermes_cli/test_base_url_dotenv_resolution.py (7 tests):

resolve_api_key_provider_credentials / get_api_key_provider_status read base URL from ~/.hermes/.env (Xiaomi)
resolve_external_process_provider_credentials / *_status read base URL from ~/.hermes/.env (Copilot ACP)
resolve_runtime_provider honours dotenv on the api_key branch
model_switch dedup helper is wired through get_env_value
Regression guard: shell exports still beat dotenv values

522 passed, 522 warnings in 6.24s

Run on the full intersection of test files touching the changed modules — full green, no regressions.

Files changed

hermes_cli/auth.py — top-level get_env_value import + 4 call sites
hermes_cli/runtime_provider.py — 1 call site
hermes_cli/model_switch.py — 1 call site (with os.environ fallback for safety)
tests/hermes_cli/test_base_url_dotenv_resolution.py — new

Closes #18757.

Changed files

hermes_cli/auth.py (modified, +20/-5)
hermes_cli/model_switch.py (modified, +7/-1)
hermes_cli/runtime_provider.py (modified, +5/-1)
tests/hermes_cli/test_base_url_dotenv_resolution.py (added, +192/-0)

PR #18788: fix(web/dashboard): skip xterm.js WebGL renderer on Safari to fix Unicode box-drawing glyphs

Repository: NousResearch/hermes-agent
Author: Sanjays2402
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/18788

Description (problem / solution / changelog)

Summary

Fixes #18773. In the dashboard Chat tab, Safari's WebGL renderer mangles Unicode box-drawing characters (╔╗║╚╝, ██╗, etc.) used by the HERMES AGENT banner and TUI borders — they fragment into blocks instead of forming proper shapes. Chrome and Firefox WebGL render the same glyphs correctly.

This is a known xterm.js + Safari WebKit interaction (also affects VS Code Server, JupyterLab, etc.).

Fix

web/src/pages/ChatPage.tsx: skip the WebglAddon on Safari and let xterm.js fall back to the default DOM renderer, which renders the box-drawing glyphs faithfully on Safari.

A short isSafariBrowser() helper detects macOS/iOS Safari without false positives on Chromium derivatives:

Requires the Safari/ UA token.
Excludes Chromium fingerprints (Chrome/, Chromium/, CriOS/) — Chromium-based browsers all advertise Safari/ in their UA for legacy compat.
Excludes other WebKit-wrapping shells we know don't hit the bug (FxiOS/, EdgiOS/, Android UAs).

Existing WebGL gate (terminalTierWidthPx(host) >= 768) is preserved, so wide layouts on Chrome/Firefox/Edge still get the crisp WebGL rendering. Only Safari at any width goes to the DOM renderer.
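The detection rules and the width gate fit in a few lines. What follows is a Python transliteration of the rules listed above, not the TypeScript `isSafariBrowser()` from `ChatPage.tsx`. The function names are hypothetical, and the Android exclusion is assumed to be a case-insensitive substring match.

```python
_CHROMIUM_TOKENS = ("Chrome/", "Chromium/", "CriOS/")
_OTHER_WEBKIT_SHELLS = ("FxiOS/", "EdgiOS/")


def is_safari_user_agent(ua: str) -> bool:
    """True for macOS/iOS Safari; False for Chromium derivatives and other WebKit shells."""
    if "Safari/" not in ua:
        return False
    if any(token in ua for token in _CHROMIUM_TOKENS):
        return False  # Chromium browsers also advertise Safari/ for legacy compatibility.
    if any(token in ua for token in _OTHER_WEBKIT_SHELLS):
        return False
    if "android" in ua.lower():
        return False
    return True


def should_use_webgl(terminal_width_px: int, ua: str) -> bool:
    """Keep the existing >= 768px gate, and send Safari to the DOM renderer at any width."""
    return terminal_width_px >= 768 and not is_safari_user_agent(ua)
```

Excluding known tokens is preferred over matching a positive "Safari" brand. Nearly every WebKit and Chromium browser includes `Safari/` in its UA string.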

Why not the issue's proposed `rendererType: 'dom'`?

The proposed new Terminal({ rendererType: 'dom' }) option is from xterm.js v4 and was removed in v5+. The repo is on @xterm/xterm@^6.0.0, where renderer choice is controlled via addons (WebglAddon, CanvasAddon). Skipping the WebglAddon is the modern equivalent.

Verification

tsc -b: clean.
vite build: clean (1.65s).
Diff scope: web/src/pages/ChatPage.tsx only, +38/-1.

What I did NOT change

Chrome/Firefox/Edge WebGL path (still active for wide layouts).
Mobile/narrow layout fallback (< 768px → DOM renderer; unchanged).
The Safari path uses the default DOM renderer, not the canvas addon — adding @xterm/addon-canvas would be a bigger dependency change and the DOM renderer already renders box-drawing correctly on Safari per the issue and xterm.js docs.

cc @bb @W0921

Closes #18773

Changed files

web/src/pages/ChatPage.tsx (modified, +38/-1)

PR #17349: fix(compressor): shrink protect_first_n on recompaction (#17344)

Repository: NousResearch/hermes-agent
Author: Sanjays2402
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/17349

Description (problem / solution / changelog)

Closes #17344.

Bug

The reporter traced a 6-session compression chain in which every child session carried the identical original first user request, as if no progress had been made. After resume (or on new sessions opened post-compression), the model re-executes the original first task instead of continuing from the handoff summary's ## Active Task.

Root cause

ContextCompressor protects protect_first_n=3 messages at the head — [system, user1, assistant1]. On every cycle that head is preserved verbatim:

[system + compaction note, user1 (ORIGINAL), assistant1, summary, …tail…, latest_user]

SUMMARY_PREFIX says "resume from ## Active Task," but user1 (ORIGINAL) is sitting right next to it as a still-prominent user-role message. The model latches onto the first plausible unanswered request and re-executes it — structured summary prose loses against direct attention on a user message. After 6 cycles that same user1 has been re-anchored 6 times.

Fix

On the second and subsequent compactions — detected by checking whether messages[0] (the system prompt) already carries the compaction note we appended last time — shrink protect_first_n to 1 for that call. The original [user1, assistant1] then flow into the summariser pool, and the structured ## Active Task section becomes the sole steering signal as designed.

The shrink is per-call; self.protect_first_n is left untouched so fresh sessions continue to use the configured default.

effective_protect_first_n = self.protect_first_n
if self._is_recompaction(messages) and self.protect_first_n > 1:
    effective_protect_first_n = 1

Detection signal

Reuses the existing compaction note already written to the system prompt on first compaction. A new _COMPRESSION_NOTE_SENTINEL constant captures a stable substring ("earlier conversation turns have been compacted into a handoff summary") so PR #17301 — which expands the note text — will not break detection. New helper ContextCompressor._is_recompaction(messages) does the lookup with no I/O, returns False on malformed input, and handles multimodal system content via the existing _content_text_for_contains() helper.
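Below is a standalone sketch of the detection and the per-call shrink. It assumes OpenAI-style message dicts, with multimodal content given as a list of `{"type": "text", "text": ...}` parts. `content_text`, `is_recompaction`, and `effective_protect_first_n` are hypothetical stand-ins for `_content_text_for_contains()` and `ContextCompressor._is_recompaction()`. Only the sentinel substring is taken from the PR.

```python
from typing import Any

COMPRESSION_NOTE_SENTINEL = "earlier conversation turns have been compacted into a handoff summary"


def content_text(content: Any) -> str:
    """Flatten string or multimodal-list content into searchable text; ignore anything else."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = [
            part["text"]
            for part in content
            if isinstance(part, dict) and isinstance(part.get("text"), str)
        ]
        return "\n".join(parts)
    return ""


def is_recompaction(messages: Any) -> bool:
    """True when messages[0] is a system prompt already carrying the compaction note."""
    if not isinstance(messages, list) or not messages:
        return False
    first = messages[0]
    if not isinstance(first, dict) or first.get("role") != "system":
        return False
    return COMPRESSION_NOTE_SENTINEL in content_text(first.get("content"))


def effective_protect_first_n(messages: Any, protect_first_n: int) -> int:
    """Per-call shrink; the configured value passed in is never modified."""
    if is_recompaction(messages) and protect_first_n > 1:
        return 1
    return protect_first_n
```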

Why not just strengthen `SUMMARY_PREFIX`?

The prefix already says "Respond ONLY to the latest user message that appears AFTER this summary." Stronger prose helps marginally but cannot compete with structural attention on a head-preserved user message. The reporter explicitly noted: "the model responds as if the session had just started." That's an architectural problem, not a wording problem.

Coordination with PR #17301 / #17251

PR #17301 (open, by @HiddenPuppy) addresses a sibling problem: SUMMARY_PREFIX over-applies "background reference" framing to memory and skills. Both fixes stem from the same root concern (compaction handoff misinterpreted by the model) but are orthogonal — #17301 carves out exceptions inside SUMMARY_PREFIX text; this PR shrinks protect_first_n on recompaction. They compose cleanly; merge order doesn't matter.

Out of scope

The reporter also flagged parent_session_id = NULL observations on chained sessions. That's a separate DB-write concern — run_agent.py:8891 explicitly passes parent_session_id=old_session_id and resolve_resume_session_id (#15000) handles chain-walking. If NULL is observed it's likely a different write-path failure and deserves its own bug. This PR stays focused on the message-level fix that unbreaks the user-visible "restarts first task" behaviour.

Tests

TestIsRecompaction (6 cases) — sentinel detection edge cases:

test_fresh_system_prompt_is_not_recompaction
test_system_prompt_with_compaction_note_is_recompaction
test_empty_messages_safe
test_non_system_first_message_is_not_recompaction
test_multimodal_system_content_is_inspected
test_garbage_content_does_not_raise

TestRecompactionShrinksProtectFirstN (5 cases) — behavioural:

test_first_compaction_preserves_first_exchange_in_head (control)
test_recompaction_demotes_first_exchange_to_summary (the bug)
test_recompaction_preserves_latest_user_message_in_tail
test_recompaction_keeps_protect_first_n_attribute_unchanged
test_protect_first_n_one_no_op_for_recompaction

TestRecompactionMinForCompressGate (1 case) — _min_for_compress early-return uses the effective (post-shrink) head count.

$ python -m pytest tests/agent/test_context_compressor.py \
                   tests/agent/test_context_compressor_recompaction.py \
                   tests/run_agent/test_compression_boundary_hook.py \
                   tests/run_agent/test_compression_persistence.py \
                   tests/run_agent/test_413_compression.py -q
99 passed in 8.26s

87 pre-existing + 12 new, zero regressions.

Changed files

agent/context_compressor.py (modified, +48/-2)
tests/agent/test_context_compressor_recompaction.py (added, +234/-0)

PR #17329: fix(delegate): surface tool_trace on N-API-call subagent timeouts (#17308)

Repository: NousResearch/hermes-agent
Author: Sanjays2402
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/17329

Description (problem / solution / changelog)

Closes #17308.

Problem

When a subagent under delegate_task times out after making >0 API calls, the lead agent gets a vague string and nothing else:

Subagent timed out after 120s with 3 API call(s) completed — likely stuck on a slow API call or unresponsive network request.

There's no way to tell apart the two failure modes:

Tool finished, next LLM request hung — the tool itself is fine; the provider froze.
Tool itself hung — network partition, blocked I/O, etc.

This was the gap between the two existing diagnostic paths:

Path	Coverage
Normal completion (#1175)	`tool_trace` in return dict
0-API-call timeout (#15105)	`diagnostic_path` with structured log
N-API-call timeout	None ← this PR

Fix

Three pieces:

1. Extract a shared trace builder

The normal-completion branch already reconstructs tool_trace from result['messages']. Pulled that loop out into a module-level _build_tool_trace_from_messages() helper so both branches use one implementation.

2. Reconstruct trace on the N-API-call timeout branch

In _run_single_child's timeout branch (when is_timeout and child_api_calls > 0):

Read child._session_messages and run it through the helper.
If the trace tail has no matching tool-role response → mark status='in_progress' (the tool itself is hung).
Read get_activity_summary().current_tool. If it disagrees with the trace tail, prefer it — the tool-role write can lag because the agent writes the assistant message first and the tool response only after the tool returns.

3. Surface the diagnostics

Return dict now carries tool_trace, last_tool, last_tool_status, current_tool. Error message gets a last_tool=X (status=Y) suffix so it shows up in logs and the lead's prompt:

Subagent timed out after 120s with 3 API call(s) completed — likely stuck on a slow API call or unresponsive network request. last_tool=terminal (status=in_progress)

0-API-call timeouts (diagnostic_path branch) and non-timeout errors leave the new fields empty/None so consumers don't read stale data.
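The pairing logic behind the shared trace builder can be sketched as follows. This is a hypothetical reimplementation, not the PR's `_build_tool_trace_from_messages()`. It assumes OpenAI-style messages: assistant turns carry `tool_calls`, each with an `id` and `function.name`, and tool turns carry `tool_call_id`. The PR does not describe how the real code classifies `status="error"`, so this sketch assumes a JSON object with an `error` key. It also treats a live `current_tool` as still running.

```python
import json
from typing import Any, Dict, List, Optional


def _looks_like_error(content: Any) -> bool:
    """Assumption: a tool error is a JSON object containing an "error" key."""
    if not isinstance(content, str):
        return False
    try:
        parsed = json.loads(content)
    except ValueError:
        return False
    return isinstance(parsed, dict) and "error" in parsed


def build_tool_trace(messages: Any) -> List[Dict[str, Any]]:
    """Pair tool calls with their responses by id; unanswered calls stay in_progress."""
    if not isinstance(messages, list):
        return []
    trace: List[Dict[str, Any]] = []
    by_id: Dict[str, Dict[str, Any]] = {}
    for msg in messages:
        if not isinstance(msg, dict):
            continue
        if msg.get("role") == "assistant":
            for call in msg.get("tool_calls") or []:
                if not isinstance(call, dict):
                    continue
                entry = {"tool": (call.get("function") or {}).get("name"), "status": "in_progress"}
                trace.append(entry)
                if call.get("id"):
                    by_id[call["id"]] = entry
        elif msg.get("role") == "tool":
            entry = by_id.get(msg.get("tool_call_id") or "")
            if entry is not None:
                entry["status"] = "error" if _looks_like_error(msg.get("content")) else "ok"
    return trace


def timeout_suffix(trace: List[Dict[str, Any]], current_tool: Optional[str]) -> str:
    """Build the 'last_tool=X (status=Y)' suffix, preferring a live current_tool that disagrees."""
    if not trace and not current_tool:
        return ""
    last = trace[-1] if trace else {"tool": None, "status": "in_progress"}
    if current_tool and current_tool != last["tool"]:
        return f"last_tool={current_tool} (status=in_progress)"
    return f"last_tool={last['tool']} (status={last['status']})"
```

Pairing by `tool_call_id`, rather than by position, keeps parallel calls correct when their replies arrive out of order.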

Tests

Added two test classes in tests/tools/test_delegate_subagent_timeout_diagnostic.py:

TestRunSingleChildTimeoutToolTrace — end-to-end through _run_single_child with a tiny timeout:

test_timeout_after_completed_tool_marks_status_ok — tool returned cleanly → status=ok, current_tool=None
test_timeout_inside_running_tool_marks_status_in_progress — tool never returned → status=in_progress, current_tool set
test_timeout_with_tool_error_preserves_error_status — error responses keep status=error
test_timeout_with_parallel_tool_calls_pairs_by_id — out-of-order replies still pair correctly
test_zero_api_call_timeout_skips_tool_trace — 0-API branch keeps the new fields empty (no stale data alongside diagnostic_path)
test_timeout_with_no_session_messages_attr_does_not_crash — degrades to empty trace if _session_messages is absent

TestBuildToolTraceFromMessages — direct unit tests for the extracted helper (non-list input, non-dict entries, assistants without tool_calls, tool responses without tool_call_id).

$ python -m pytest tests/tools/test_delegate_subagent_timeout_diagnostic.py -q
.................                                                       [100%]
17 passed in 3.88s

Combined with the existing test_delegate.py suite: 137/137 pass.

Changed files

tests/tools/test_delegate_subagent_timeout_diagnostic.py (modified, +254/-0)
tools/delegate_tool.py (modified, +113/-32)

PR #17325: fix(telegram): stop large videos from triggering infinite model fallback (#17302)

Repository: NousResearch/hermes-agent
Author: Sanjays2402
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/17325

Description (problem / solution / changelog)

Closes #17302.

Summary

When a Telegram video > 20 MB hits the bot, getFile() raises BadRequest("File is too big"). The current handler catches the exception, logs a warning, and falls through to handle_message(event) with an effectively empty event — the agent then burns through every fallback model (15+ retries in the reporter's logs) trying to respond to nothing.

This PR fixes it with three layered defenses, mirroring the size-check pattern that the non-video document branch already uses:

1. Pre-check `file_size` before downloading

Both the native msg.video branch and the video-as-document branch now verify file_size <= 20 MB before calling get_file(). Oversize / unverifiable videos short-circuit with a user-visible message ("Telegram's Bot API limits file downloads to 20 MB…") and message_type=VIDEO. The agent gets a meaningful event instead of a blank one.

2. Trap "File is too big" inside the `except` block

For forwarded-video / edited-message edge cases where the reported file_size is inaccurate, the BadRequest is now caught at runtime and event.text is set to an explanatory message instead of being left blank. This is the surgical fix that prevents the fallback storm even when the pre-check is bypassed.

3. Optional opt-out: `telegram.extra.ignore_videos: true`

When set, video messages (native and video/* MIME documents) are dropped at the top of _handle_media_message, before any work is done. Other media (PDFs, photos, voice, audio) are unaffected.

Why not the issue's exact diff

The issue's proposed diff calls self._send_safe_message(...) which doesn't exist in this codebase (only self._bot.send_message(...) does). I kept the spirit of the suggestion — the friendly text message — but routed it through the existing event.text + handle_message() path that the document handler already uses for "Unsupported document type" and "too large or unverifiable", so the fix is consistent with the surrounding code rather than introducing a new send pattern.

Tests

Added 8 new tests in tests/gateway/test_telegram_documents.py::TestVideoDownloadBlock:

test_oversize_native_video_short_circuits_with_friendly_text
test_unverifiable_native_video_size_short_circuits (parity with the existing document file_size=None security fix)
test_oversize_video_document_short_circuits
test_native_video_get_file_too_big_does_not_send_blank_event
test_video_document_get_file_too_big_does_not_send_blank_event
test_ignore_videos_config_skips_native_video
test_ignore_videos_config_skips_video_documents
test_ignore_videos_does_not_block_pdfs

The existing _make_video() helper now defaults file_size=1024 so the prior happy-path test still passes through the new gate.

$ python -m pytest tests/gateway/test_telegram_documents.py -q
............................................                          [100%]
44 passed in 3.50s

python3 -c "import ast; ast.parse(open('gateway/platforms/telegram.py').read())" clean.

Out of scope

The reporter's _send_safe_message reply-via-Telegram pattern is more invasive than needed; the agent-event path is sufficient and consistent with how the document handler already communicates blocked uploads. Happy to add a direct reply if maintainers prefer it.

Changed files

gateway/platforms/telegram.py (modified, +82/-4)
tests/gateway/test_telegram_documents.py (modified, +120/-1)

PR #17323: docs(run_agent): note xiaomi/MiMo empirical exclusion from reasoning whitelist (#17314)

Repository: NousResearch/hermes-agent
Author: Sanjays2402
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/17323

Description (problem / solution / changelog)

Closes #17314 (the suggested follow-up #1 — inline doc).

Summary

#17314 empirically established that xiaomi/MiMo (mimo-v2.5-pro) accepts reasoning_effort at the schema layer but produces statistically indistinguishable reasoning depth, length, and accuracy across none / low / medium / high (4 efforts × N=5 on AIME 2025 II P2; Mann-Whitney U pairwise p > 0.1 on every pair; 100% accuracy on all 20 trials; identical solution paths in inspected reasoning_content).

The conservative default in _supports_reasoning_extra_body() — not whitelisting xiaomi — is therefore correct. Forwarding the field would just ship a no-op.

This PR adds a docstring note documenting the empirical test so a future PR doesn't "complete the list" by adding xiaomi without first re-verifying server-side behavior.

Changes

run_agent.py: extended _supports_reasoning_extra_body() docstring with an "Empirically excluded providers" section noting the xiaomi/MiMo result, the test methodology, and the link back to #17314.
No code-path changes. No behavior changes. AST-parses clean.

Not included (intentional)

The issue's optional follow-up #2 — surfacing a startup warning when agent.reasoning_effort is set in config.yaml for a provider that won't forward it — is broader-scope (touches every excluded provider, not just xiaomi) and is left as a separate optional follow-up so this PR stays a minimal, mergeable doc-only change.

Tests

python3 -c "import ast; ast.parse(open('run_agent.py').read())" clean.
Docs-only change; no functional test surface.

Changed files

run_agent.py (modified, +14/-0)

PR #16381: fix(doctor): use importlib.util.find_spec for editable-install detection

Repository: NousResearch/hermes-agent
Author: Sanjays2402
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/16381

Description (problem / solution / changelog)

Closes #16365.

Problem

hermes doctor reports ⚠ tinker-atropos found but not installed even when the package is installed editably (uv pip install -e ./tinker-atropos) and importable from the same interpreter — exactly the case @rbrowning85 hit.

$ python -c "import tinker_atropos; print('OK')"
OK
$ uv pip list | grep tinker
tinker-atropos 0.1.0 (editable install)
$ hermes doctor
◆ Submodules
⚠ tinker-atropos found but not installed (run: uv pip install -e ./tinker-atropos)

Root cause: the doctor check uses __import__("tinker_atropos") inside a try/except ImportError. That probe can fail in launcher contexts (e.g. ~/.local/bin/hermes) whose sys.path or import-machinery state differs from the active shell — particularly for editable installs hooked through .pth shims — even though the spec is locatable via importlib.

Fix

Adopt the issue's preferred approach (option 2) and centralize it in a _module_available() helper:

def _module_available(module: str) -> bool:
    import importlib.util
    try:
        return importlib.util.find_spec(module) is not None
    except (ImportError, ValueError):
        return False

find_spec only locates the module's spec and never executes the target module's body (for a dotted name it does import the parent packages first), so it's:

Reliable for editable installs across launcher contexts.
Side-effect free.
Robust to a transitive dep failing to import (we only care whether tinker_atropos itself is locatable).
Defensive against ValueError from malformed spec strings and ImportError from a parent package failing to load.

Then swap the __import__ probe in the tinker-atropos branch (hermes_cli/doctor.py) for _module_available("tinker_atropos").

I deliberately scoped this to the tinker-atropos check rather than the general required_packages / optional_packages loops, since those check user-facing third-party libs where the existing import-with-side-effects probe is fine and the install-cmd advice is the goal anyway. Happy to broaden if maintainers prefer.

Tests

tests/hermes_cli/test_doctor.py::TestModuleAvailable — 5 new tests:

test_returns_true_for_stdlib_module — sanity check (json).
test_returns_false_for_missing_module — non-existent module returns False.
test_returns_false_on_value_error — malformed spec swallowed.
test_returns_false_when_parent_package_fails — parent-package ImportError swallowed.
test_does_not_execute_module_body — confirms find_spec path doesn't import the body, the property that fixes the editable-install case.

pytest tests/hermes_cli/test_doctor.py -q
28 passed (23 pre-existing + 5 new)

Changed files

hermes_cli/doctor.py (modified, +27/-3)
tests/hermes_cli/test_doctor.py (modified, +48/-0)

PR #16380: fix(error_classifier): gate absolute msg/token heuristics to small context windows

Repository: NousResearch/hermes-agent
Author: Sanjays2402
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/16380

Description (problem / solution / changelog)

Closes #16351.

Problem

agent/error_classifier.py flagged non-context errors as context_overflow in long-context (1M) Codex/GPT-5.x sessions, purely because num_messages > 80 (generic 400) or num_messages > 200 (disconnect) — even when approx_tokens was a fraction of the actual budget.

Repro from the issue:

classify_api_error(
    FakeHTTP400(),
    provider="openai-codex",
    model="gpt-5.5",
    approx_tokens=74320,
    context_length=1_000_000,
    num_messages=432,
)
# Before: FailoverReason.context_overflow (retryable=True, should_compress=True)
# After:  FailoverReason.format_error      (retryable=False, should_compress=False)

That sent format errors into the compression/probe-down path, causing unnecessary compaction and stale handoff pollution on 1M sessions.

Fix

Apply exactly the gate suggested in the issue body: scope absolute token/message-count fallbacks to context_length <= 256000. Relative pressure thresholds (> 0.6 for disconnect, > 0.4 for generic 400) still fire on any context size.

# server disconnect path
is_large = approx_tokens > context_length * 0.6 or (
    context_length <= 256000 and (approx_tokens > 120000 or num_messages > 200)
)

# generic 400 path
is_large = approx_tokens > context_length * 0.4 or (
    context_length <= 256000 and (approx_tokens > 80000 or num_messages > 80)
)

Existing behavior for ~128K/200K context windows is unchanged.
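The two gated expressions reduce to one rule, which is easy to check against the issue's numbers. In this sketch, the function name, the path labels ("disconnect", "generic_400") and the threshold table are hypothetical. The thresholds themselves are copied from the expressions above.

```python
SMALL_CONTEXT_MAX = 256_000

# (relative ratio, absolute token fallback, absolute message fallback) per path,
# taken from the two is_large expressions above.
_THRESHOLDS = {
    "disconnect": (0.6, 120_000, 200),
    "generic_400": (0.4, 80_000, 80),
}


def looks_like_context_pressure(
    path: str, approx_tokens: int, context_length: int, num_messages: int
) -> bool:
    ratio, abs_tokens, abs_messages = _THRESHOLDS[path]
    # Relative pressure fires at any window size.
    relative = approx_tokens > context_length * ratio
    # Absolute token/message counts are only trusted on small windows.
    absolute = context_length <= SMALL_CONTEXT_MAX and (
        approx_tokens > abs_tokens or num_messages > abs_messages
    )
    return relative or absolute
```

With the issue's repro (74,320 tokens, 432 messages, 1M context), neither branch fires. On a 128K window, 90 messages still trip the generic-400 fallback, so the behaviour for small windows is unchanged.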

Tests

tests/agent/test_error_classifier.py — 4 new tests covering the 1M-context regime:

test_400_generic_1m_context_high_message_count_not_overflow — exact repro from issue (74K tokens, 432 msgs, 1M ctx) → format_error.
test_400_generic_1m_context_relative_pressure_still_overflow — 500K tokens / 1M ctx still → context_overflow.
test_disconnect_1m_context_high_message_count_is_timeout — 150K tokens, 300 msgs, 1M ctx → timeout.
test_disconnect_1m_context_relative_pressure_still_overflow — 700K tokens / 1M ctx still → context_overflow.

pytest tests/agent/test_error_classifier.py -q
122 passed (118 pre-existing + 4 new)

Changed files

agent/error_classifier.py (modified, +6/-2)
tests/agent/test_error_classifier.py (modified, +62/-0)

PR #16373: fix(memory): return existing entry previews on zero-match in replace/remove

Repository: NousResearch/hermes-agent
Author: Sanjays2402
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/16373

Description (problem / solution / changelog)

Closes #16266.

Problem

memory.replace / memory.remove zero-match returned a bare "No entry matched '...'" error with no context about what entries actually exist. LLMs that paraphrased instead of substring-matching would burn a full tool round trip every time, producing the consistent [error] → retry-success pattern @ahmadhawamdah documented.

Fix

Mirror the multi-match branch behavior on zero-match: return the same 80-char truncated previews under a new existing_entries field so the next LLM call can pick a correct old_text. Symmetric in remove().

Diff

tools/memory_tool.py — replace + remove zero-match returns:

previews = [e[:80] + ("..." if len(e) > 80 else "") for e in entries]
return {
    "success": False,
    "error": f"No entry matched '{old_text}'.",
    "existing_entries": previews,
}

Additive on the error response — no schema change for callers.
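Wrapped as a standalone function, the preview logic gives callers a fixed bound. The name zero_match_error is hypothetical. Each preview is at most 80 characters plus the three-character ellipsis, so 83 characters in total.

```python
from typing import Dict, List, Union

PREVIEW_CHARS = 80


def zero_match_error(old_text: str, entries: List[str]) -> Dict[str, Union[bool, str, List[str]]]:
    # Same truncation as the multi-match branch: 80 chars, plus "..." when cut.
    previews = [e[:PREVIEW_CHARS] + ("..." if len(e) > PREVIEW_CHARS else "") for e in entries]
    return {
        "success": False,
        "error": f"No entry matched '{old_text}'.",
        "existing_entries": previews,
    }


result = zero_match_error("user likes tea", ["User prefers green tea in the morning", "x" * 200])
```

Short entries come back verbatim, so the model can often copy a correct old_text straight from the error response on its next call.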

Tests

tests/tools/test_memory_tool.py — 2 new tests:

test_replace_no_match_returns_existing_previews — asserts existing_entries shape, truncation, ellipsis bound.
test_remove_no_match_returns_existing_previews — same for remove.

pytest tests/tools/test_memory_tool.py -q
35 passed (33 pre-existing + 2 new)

Changed files

tests/tools/test_memory_tool.py (modified, +19/-0)
tools/memory_tool.py (modified, +12/-2)

PR #14354: fix(ssh): forward skill-allowlisted env vars over SSH via SendEnv

Repository: NousResearch/hermes-agent
Author: Sanjays2402
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/14354

Description (problem / solution / changelog)

Fixes #14091

The bug

tools/env_passthrough.py already builds an allowlist of env vars that should reach sandboxed environments — populated from skill required_environment_variables frontmatter and terminal.env_passthrough config. The local and code_execution backends consult is_env_passthrough() / get_all_passthrough(), but SSHEnvironment never did.

Result: when terminal_backend: ssh, every ssh ... bash -c subprocess inherits a stripped child environment that excludes the allowlist, and the remote bash session sees no skill-declared variables — even with AcceptEnv * configured on the remote sshd.

The reporter's diagnosis is correct: the SSH command was missing -o SendEnv=NAME.

Fix

In _build_ssh_command():

Append -o SendEnv=<NAME> for every var in get_all_passthrough() that is actually present in os.environ.
Names are sorted for deterministic command construction (so ControlMaster connection reuse stays stable).

In _run_bash():

-o SendEnv only forwards names; the OpenSSH client reads values from its own process environment. Pass them explicitly via a new _build_subprocess_env() so allowlisted vars are guaranteed to be present even if a future caller scrubs the parent env.

Failure to import or call get_all_passthrough() is non-fatal — SSH still works, just without forwarding (matches the existing best-effort posture in skills_tool.py).
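The two changes can be sketched as pure functions. The names send_env_flags and subprocess_env are hypothetical stand-ins for the logic inside _build_ssh_command() and _build_subprocess_env(). In the sketch, the allowlist and environment are passed in as arguments instead of coming from get_all_passthrough() and os.environ.

```python
from typing import Dict, Iterable, List, Mapping, Optional


def send_env_flags(passthrough: Iterable[str], environ: Mapping[str, str]) -> List[str]:
    """Build -o SendEnv=NAME pairs for allowlisted vars that are actually set."""
    flags: List[str] = []
    for name in sorted(set(passthrough)):  # sorted: identical argv on every call
        if name in environ:  # skip unset names rather than emit empty SendEnv lines
            flags += ["-o", f"SendEnv={name}"]
    return flags


def subprocess_env(
    passthrough: Iterable[str],
    source: Mapping[str, str],
    base: Optional[Mapping[str, str]] = None,
) -> Optional[Dict[str, str]]:
    """Explicit env for the ssh client, or None to keep default inheritance."""
    names = [n for n in passthrough if n in source]
    if not names:
        return None  # nothing to forward: don't override child-env semantics
    env = dict(source if base is None else base)
    for name in names:
        env[name] = source[name]  # ssh reads SendEnv values from its own process env
    return env
```

A caller would splice the flags into the argv, e.g. ["ssh", *send_env_flags(names, os.environ), host, "bash", "-c", cmd]. The base parameter models a scrubbed parent environment: the allowlisted values are still re-added. The remote sshd must still accept the names via AcceptEnv.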

Why this is the right layer

Same allowlist source as local / code_execution — single source of truth (get_all_passthrough()), so security guarantees stay consistent (Hermes provider credentials are still blocked from skill registration per GHSA-rhgp-j443-p4rf).
Doesn't bypass remote sshd policy — admins still need AcceptEnv on the remote (the issue notes AcceptEnv *); we just stop silently dropping the names client-side.
Zero behavior change when no skill/config registers anything — empty allowlist ⇒ no SendEnv flags, no env override.

Verification

uv run --frozen --python 3.11 --extra dev pytest -o addopts='' \
  tests/tools/test_ssh_environment.py \
  tests/tools/test_ssh_bulk_upload.py \
  tests/tools/test_sync_back_backends.py -q
52 passed, 11 skipped

7 new targeted tests in TestBuildSSHCommand:

test_no_send_env_when_no_passthrough_registered — zero behavior change when allowlist empty
test_send_env_added_for_registered_passthrough_var — the actual #14091 case
test_send_env_skips_unset_vars — allowlisted-but-unset vars don't leak as empty SendEnv lines
test_send_env_is_deterministic — sorted order for ControlMaster reuse stability
test_passthrough_failure_is_non_fatal — SSH keeps working if env_passthrough breaks
test_subprocess_env_includes_passthrough_values — values propagated to ssh client process env
test_subprocess_env_is_none_when_no_passthrough — don't override default child-env semantics unnecessarily

Notes

Minimal, focused diff. Uses the existing env_passthrough infrastructure rather than introducing a parallel mechanism.
Doesn't touch skills_tool.py's setup_needed reporting — that's accurate for the registration side; the bug was purely on the SSH consumer side.
No new dependencies.

Changed files

tests/tools/test_ssh_environment.py (modified, +83/-0)
tools/environments/ssh.py (modified, +47/-1)

PR #14332: fix(gateway): treat recycled PID with unreadable start_time as stale (#14176)

Repository: NousResearch/hermes-agent
Author: Sanjays2402
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/14332

Description (problem / solution / changelog)

What does this PR do?

gateway/status.py::find_gateway_pids() iterates over the PIDs recorded in ~/.hermes/gateway.lock to decide whether the gateway is "still running". For each candidate it:

Checks the PID is alive (os.kill(pid, 0)).
Compares the recorded start_time against the live process's start_time to detect PID recycling.
Falls back to _looks_like_gateway_process(pid) / _record_looks_like_gateway(record) heuristics.

When the recycled PID is owned by a different UID (typical on Linux when /proc/<pid>/stat is owned by another user, or under rootless container setups), _get_process_start_time returns None. The recorded-vs-live mismatch check then can't fire (current_start is None), and _looks_like_gateway_process can give a false positive on any long-lived python or hermes-related process the user happens to own. Result: the gateway thinks it's still running, refuses to start, and the user has to manually rm ~/.hermes/gateway.pid to recover.

Reporter (#14176) sees this in production with a systemd user service that restarts the gateway nightly. Every few weeks, the PID recorded in the lock file has been recycled by a foreign process, the lock file goes stale, and hermes gateway start fails with "Gateway already running".

Fix: be conservative. When the PID record carries a recorded_start but we can't read the candidate's current_start, skip the candidate (treat it as stale) instead of falling through to the heuristic. Without a readable start_time there isn't enough information to confirm this is the same gateway process, so prefer "no" over "maybe".
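After the liveness check, one iteration of the candidate loop condenses to the decision below. The function is a hypothetical simplification. It collapses _looks_like_gateway_process(pid) or _record_looks_like_gateway(record) into a single boolean, and it assumes os.kill(pid, 0) has already succeeded.

```python
from typing import Optional


def candidate_is_running_gateway(
    recorded_start: Optional[int],
    current_start: Optional[int],
    looks_like_gateway: bool,
) -> bool:
    """Hypothetical condensation of one iteration of the find_gateway_pids() loop."""
    if recorded_start is not None and current_start is not None and current_start != recorded_start:
        return False  # both start times known and different: the PID was recycled
    if recorded_start is not None and current_start is None:
        return False  # new rule: the recorded start_time can't be verified, treat as stale
    return looks_like_gateway  # heuristic only when start times agree or none was recorded
```

The new branch only removes a path to the heuristic. Records without a start_time still behave as before.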

Related Issue

Fixes #14176

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

gateway/status.py (+10 / −0): in the find_gateway_pids() candidate loop, skip any PID whose recorded start_time exists but whose live start_time is unreadable. Same code path as the existing recorded-vs-live mismatch case, just covering the unreadable variant.
tests/gateway/test_status.py (+40 / −0): one new regression case under TestGatewayPidState, test_get_running_pid_treats_recycled_pid_with_unreadable_start_time_as_stale. Monkeypatches _get_process_start_time to return None and _looks_like_gateway_process to return True (the strongest stress for the false-positive path) and asserts the PID file is cleaned and get_running_pid() returns None.

Core diff:

         recorded_start = record.get("start_time")
         current_start = _get_process_start_time(pid)
         if recorded_start is not None and current_start is not None and current_start != recorded_start:
             continue
+        # If the PID record carries a recorded start_time but we can't read
+        # the current process's start_time, the PID may have been recycled by
+        # the OS to a process the current user can't introspect (typical on
+        # Linux when /proc/<pid>/stat is owned by another UID). The downstream
+        # _looks_like_gateway_process heuristic can give a false positive in
+        # that situation — e.g. another long-lived python process — leaving
+        # a stale PID file that blocks future starts. Be conservative and
+        # skip this candidate. See #14176.
+        if recorded_start is not None and current_start is None:
+            continue

         if _looks_like_gateway_process(pid) or _record_looks_like_gateway(record):
             return pid

How to Test

Reporter-style repro on a Linux host:

Run the gateway, kill -9 the parent process to leave ~/.hermes/gateway.pid and ~/.hermes/gateway.lock populated with the dead PID.
Start a long-lived python process under a different UID (e.g. another hermes daemon under another account) that the test user can see via ps but NOT via /proc/<pid>/stat. Note its PID.
Edit the lock file to point at that recycled PID, keeping the original start_time field intact.
Run hermes gateway start.

Before: refuses to start with "Gateway already running". After: recognizes that the recorded start_time can't be verified against the live process, treats the entry as stale, cleans the lock file, and starts a fresh gateway.

Automated regression suite:

pytest tests/gateway/test_status.py::TestGatewayPidState -q

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(gateway):)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix
I've run pytest tests/gateway/test_status.py::TestGatewayPidState -q and all tests pass (11/11)
I've added tests for my changes
I've tested on my platform: macOS 26.5 (arm64), Python 3.11.14 via uv

Documentation & Housekeeping

Documentation updates — N/A (internal helper, no user-visible API change beyond bug fix)
cli-config.yaml.example — N/A (no new config)
CONTRIBUTING.md / AGENTS.md — N/A
Cross-platform impact considered — change is conservative on every platform; the false positive fix matters most on Linux but doesn't regress macOS/Windows behavior
Tool descriptions/schemas — N/A

Not in scope

The reporter's bash script idea (a dedicated hermes gateway clean-pid command) — that's nice-to-have but a separate UX surface; the in-band fix here is the higher-impact change since it stops the bad state from forming.
Auditing gateway.target / Restart= semantics in the example systemd unit (the issue's secondary note) — that's a docs change for docs/deploy/ that deserves its own PR.
Hardening atexit-vs-SIGKILL paths so a kill -9 of the gateway doesn't leave a PID file behind — a real concern but out of scope for the reported bug, which is about PID-file interpretation, not creation.

Screenshots / Logs

Verification

$ python3 -m py_compile gateway/status.py tests/gateway/test_status.py
OK

$ uv run --no-project --with pytest --with pytest-xdist --with pyyaml \
       --with python-dotenv --with prompt_toolkit --with rich --with httpx \
       --with fastapi --with pydantic python -m pytest \
       tests/gateway/test_status.py::TestGatewayPidState -q
...........                                                              [100%]
11 passed in 0.51s

(11 = 10 existing + 1 new regression case, all green.)

Changed files

gateway/status.py (modified, +10/-0)
tests/gateway/test_status.py (modified, +40/-0)

PR #13957: fix(skills): raise system-prompt skill description limit to match runtime tool (#13944)

Repository: NousResearch/hermes-agent
Author: Sanjays2402
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/13957

Description (problem / solution / changelog)

What does this PR do?

The skill index injected into the system prompt hard-truncated every skill description to 60 characters, while the runtime skills_list() tool (tools/skills_tool.py) allowed up to 1024. The LLM saw a vague prefix in the system prompt — where the routing decision is actually made — and only got the full description after deciding to call skills_list(). That's backwards: the trigger criteria need to be visible at system-prompt time so the model can decide whether to route to the skill.

Example before vs after:

Before:  "Complete guide to using and extending Hermes Agent — CLI ..."
After:   "Complete guide to using and extending Hermes Agent — CLI tooling, skill authoring, and gateway integration"

Fix: introduce SKILL_INDEX_MAX_DESCRIPTION_LENGTH = 1024 in agent/skill_utils.py and use it in extract_skill_description(). Descriptions under the limit are returned verbatim; over-limit ones are truncated to exactly SKILL_INDEX_MAX_DESCRIPTION_LENGTH with a trailing "..." included in the budget (same contract the runtime tool uses).

Related Issue

Fixes #13944

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

agent/skill_utils.py (+19 / −4): new SKILL_INDEX_MAX_DESCRIPTION_LENGTH constant; extract_skill_description() uses it instead of the hardcoded 60; docstring updated to describe the contract + issue context.
tests/agent/test_extract_skill_description.py (+68, new file): 8 regression cases covering empty / short / boundary / long-below / at-new-limit / over-new-limit / strip() preservation, plus one that locks in equality with tools.skills_tool.MAX_DESCRIPTION_LENGTH so the two paths can't silently drift apart again.

Core diff:

-def extract_skill_description(frontmatter: Dict[str, Any]) -> str:
-    """Extract a truncated description from parsed frontmatter."""
+SKILL_INDEX_MAX_DESCRIPTION_LENGTH = 1024
+
+
+def extract_skill_description(frontmatter: Dict[str, Any]) -> str:
+    """Extract a (possibly truncated) description from parsed frontmatter.
+
+    Descriptions under ``SKILL_INDEX_MAX_DESCRIPTION_LENGTH`` are returned
+    verbatim. Longer ones are truncated to that length, with a trailing
+    ``"..."`` included in the budget.
+    """
     raw_desc = frontmatter.get("description", "")
     if not raw_desc:
         return ""
     desc = str(raw_desc).strip().strip("'\"")
-    if len(desc) > 60:
-        return desc[:57] + "..."
+    if len(desc) > SKILL_INDEX_MAX_DESCRIPTION_LENGTH:
+        return desc[: SKILL_INDEX_MAX_DESCRIPTION_LENGTH - 3] + "..."
     return desc
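With the diff markers stripped and the typing imports added, the new side of the diff runs on its own. That makes the boundary contract easy to check: exactly 1024 characters pass through untouched, and a 1025-character description is cut to 1021 characters plus "...".

```python
from typing import Any, Dict

SKILL_INDEX_MAX_DESCRIPTION_LENGTH = 1024


def extract_skill_description(frontmatter: Dict[str, Any]) -> str:
    """Verbatim up to the limit; otherwise cut, with "..." counted inside the budget."""
    raw_desc = frontmatter.get("description", "")
    if not raw_desc:
        return ""
    # Strip surrounding whitespace, then stray YAML quote characters.
    desc = str(raw_desc).strip().strip("'\"")
    if len(desc) > SKILL_INDEX_MAX_DESCRIPTION_LENGTH:
        return desc[: SKILL_INDEX_MAX_DESCRIPTION_LENGTH - 3] + "..."
    return desc
```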

How to Test

Before:

from agent.skill_utils import extract_skill_description
desc = "Complete guide to using and extending Hermes Agent — CLI tooling, skill authoring, and gateway integration"
extract_skill_description({"description": desc})
# 'Complete guide to using and extending Hermes Agent — CLI ...'   # 60 chars

After: returns the full description verbatim (106 characters).

pytest tests/agent/test_extract_skill_description.py -q

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(skills):)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix
I've run pytest tests/agent/test_extract_skill_description.py -q and all tests pass
I've added tests for my changes (8 new regression cases)
I've tested on my platform: macOS 26.5 (arm64), Python 3.11.14 via uv

Documentation & Housekeeping

Documentation updates — N/A (internal constant; docstring updated in-place)
cli-config.yaml.example — N/A (no new config)
CONTRIBUTING.md / AGENTS.md — N/A
Cross-platform impact — string slicing, no OS-specific paths
Tool descriptions/schemas — N/A

Screenshots / Logs

Verification

$ uv run --with pytest --with pytest-xdist python -m pytest \
    tests/agent/test_extract_skill_description.py -v
........                                                                 [100%]
8 passed in 0.56s

Not in scope

The issue author's long-term suggestion (prompt_builder consuming skills_tool's metadata rather than maintaining a parallel parsing implementation) is out of scope for this PR — that's an architectural refactor that deserves its own proposal and review. This PR implements the reporter's minimum fix and adds a regression guard so the two paths can't silently drift apart again.

Changed files

agent/skill_utils.py (modified, +18/-3)
tests/agent/test_extract_skill_description.py (added, +65/-0)

PR #13937: fix(skills): honor platform_disabled config in gateway-built system prompts (#13851)

Repository: NousResearch/hermes-agent
Author: Sanjays2402
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/13937

Description (problem / solution / changelog)

What does this PR do?

build_skills_system_prompt() resolves the active platform via HERMES_PLATFORM (os.environ) and HERMES_SESSION_PLATFORM (contextvar). When the gateway builds system prompts on an async task that doesn't inherit the session contextvars — the common path on the Signal gateway — both lookups return empty and skills.platform_disabled.<platform> is silently ignored. Every skill ships into the system prompt, which inflates the combined system-prompt+tools payload above the ~25K-char threshold where local LLMs (gemma4:26b, hermes3, qwen3:14b, mistral-small3.1) stop calling tools and just reply with text. Hermes Agent becomes effectively unusable with local models when many skills are installed.

The fix adds an explicit platform= parameter to build_skills_system_prompt() that is authoritative for both get_disabled_skill_names() resolution and the cache key. run_agent.AIAgent already carries self.platform from the gateway constructor (AIAgent(platform=platform_key, ...) in gateway/run.py:9700), so we thread it through the only caller site.

Backward compatible: platform=None retains the existing env/contextvar fallback chain so CLI, cron, and web-server callers are unchanged.
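The following is a simplified model of the resolution order, showing why the platform has to be part of the cache key. build_skill_index, the cache layout and the newline-joined output are hypothetical. Only the precedence comes from the diff: explicit platform first, then HERMES_PLATFORM, then the session value.

```python
from typing import Dict, Iterable, List, Mapping, Optional, Tuple

_prompt_cache: Dict[Tuple[Tuple[str, ...], str], str] = {}


def resolve_platform_hint(
    platform: Optional[str], environ: Mapping[str, str], session_platform: Optional[str]
) -> str:
    # Explicit argument first, then HERMES_PLATFORM, then the session contextvar value.
    return platform or environ.get("HERMES_PLATFORM") or session_platform or ""


def build_skill_index(
    skills: Iterable[str],
    platform_disabled: Mapping[str, List[str]],
    platform: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
    session_platform: Optional[str] = None,
) -> str:
    hint = resolve_platform_hint(platform, environ or {}, session_platform)
    key = (tuple(sorted(skills)), hint)  # the platform is a cache-key dimension
    if key not in _prompt_cache:
        disabled = set(platform_disabled.get(hint, []))
        _prompt_cache[key] = "\n".join(s for s in key[0] if s not in disabled)
    return _prompt_cache[key]
```

Without the hint in the key, a Signal call followed by a CLI call with the same skills would return the Signal-filtered index from cache. Unlike a real implementation, this sketch's cache ignores later edits to platform_disabled.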

Related Issue

Fixes #13851

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

agent/prompt_builder.py (+19 / −3): add platform kwarg to build_skills_system_prompt(); it takes priority over HERMES_PLATFORM / HERMES_SESSION_PLATFORM when resolving the platform hint that's passed into get_disabled_skill_names() and used as the cache-key dimension. Updated docstring to describe the gateway contextvar-propagation problem the new parameter solves.
run_agent.py (+1): pass self.platform when calling build_skills_system_prompt from the agent's system-prompt build path. AIAgent already stores the platform on self.platform (line 821) from the gateway constructor.
tests/agent/test_prompt_builder.py (+115): four new regression tests under TestBuildSkillsSystemPrompt.

Implementation diff (core change):

 def build_skills_system_prompt(
     available_tools: "set[str] | None" = None,
     available_toolsets: "set[str] | None" = None,
+    platform: "str | None" = None,
 ) -> str:
     ...
     _platform_hint = (
-        os.environ.get("HERMES_PLATFORM")
+        platform
+        or os.environ.get("HERMES_PLATFORM")
         or get_session_env("HERMES_SESSION_PLATFORM")
         or ""
     )
-    disabled = get_disabled_skill_names()
+    disabled = get_disabled_skill_names(platform=_platform_hint or None)

 # run_agent.py
             skills_prompt = build_skills_system_prompt(
                 available_tools=self.valid_tool_names,
                 available_toolsets=avail_toolsets,
+                platform=self.platform,
             )

How to Test

Reporter's scenario:

# config.yaml
skills:
  platform_disabled:
    signal:
      - apple-notes
      - apple-reminders
      # ... 107 skill names

hermes gateway run --replace
# Send a message via Signal.

Before: all 129 skills appear in the system-prompt index, combined payload ~53K chars, local LLMs don't emit tool calls. After: only the non-disabled skills appear, payload drops proportionally, local LLMs recover the ability to call tools.

Automated regression suite:

pytest tests/agent/test_prompt_builder.py::TestBuildSkillsSystemPrompt -q

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(skills):)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix
I've run the scoped tests (tests/agent/test_prompt_builder.py::TestBuildSkillsSystemPrompt) and all tests pass
I've added tests for my changes (4 new regression cases)
I've tested on my platform: macOS 26.5 (arm64), Python 3.11.14 via uv

Documentation & Housekeeping

I've updated relevant documentation — N/A (platform= is an internal kwarg on an internal helper; docstring updated in-line)
I've updated cli-config.yaml.example — N/A (no new config)
I've updated CONTRIBUTING.md / AGENTS.md — N/A
I've considered cross-platform impact — pure Python, no OS-specific code paths touched
I've updated tool descriptions/schemas — N/A

Not in scope

The issue also mentions platform_toolsets.signal being ignored (26 tools passed to the model instead of the configured 7). That's a separate architectural problem — platform_toolsets filtering lives in hermes_cli/tools_config.py and the gateway invocation path doesn't consult it the same way the skills path does. It belongs in its own PR with its own regression suite, and the skills fix here already delivers the 16K-char savings from the 107 disabled skills — often enough to get the payload back under the local-LLM threshold on its own.

Screenshots / Logs

Regression tests added

Test	Scenario
`test_explicit_platform_param_disables_skills_for_that_platform`	`build_skills_system_prompt(platform="signal")` excludes skills listed under `skills.platform_disabled.signal`
`test_explicit_platform_param_wins_over_env_vars`	Caller-provided `platform=` overrides `HERMES_PLATFORM` env var
`test_platform_none_falls_back_to_env`	No `platform=` kwarg → legacy env-based resolution still works (backward compat)
`test_platform_in_cache_key_prevents_cross_platform_leak`	Back-to-back calls with different platforms return correctly filtered outputs (no cache collision)
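The four regression cases pin three behaviours. An explicit platform= argument selects which skills.platform_disabled list applies, and it beats HERMES_PLATFORM. The platform is also part of the cache key. A minimal sketch of that resolution order is below. It assumes a plain dict in place of the loaded config, and the names `PLATFORM_DISABLED`, `ALL_SKILLS`, `_skills_index` and `build_skills_prompt_sketch` are hypothetical; this is not the code in agent/prompt_builder.py.

```python
import os
from functools import lru_cache
from typing import Optional, Tuple

# Hypothetical stand-ins for the loaded config and the installed skill list.
PLATFORM_DISABLED = {"signal": ("web-research", "image-gen")}
ALL_SKILLS: Tuple[str, ...] = ("web-research", "image-gen", "notes")


def resolve_platform(platform: Optional[str]) -> str:
    """An explicit platform= argument wins; otherwise use HERMES_PLATFORM (legacy path)."""
    return platform or os.environ.get("HERMES_PLATFORM", "")


@lru_cache(maxsize=32)
def _skills_index(platform: str, skills: Tuple[str, ...]) -> str:
    # platform is part of the cache key, so an index built for "signal"
    # is never returned to a caller asking for a different platform.
    disabled = set(PLATFORM_DISABLED.get(platform, ()))
    return "\n".join(name for name in skills if name not in disabled)


def build_skills_prompt_sketch(platform: Optional[str] = None) -> str:
    return _skills_index(resolve_platform(platform), ALL_SKILLS)
```

If the cache were keyed on the skill list alone, back-to-back calls for "signal" and "telegram" would share one entry. That collision is the cross-platform leak the last test guards against.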

Verification

$ python3 -m py_compile agent/prompt_builder.py run_agent.py tests/agent/test_prompt_builder.py
OK

$ uv run --with pytest --with pytest-xdist python -m pytest \
    tests/agent/test_prompt_builder.py::TestBuildSkillsSystemPrompt -q
..............                                                           [100%]
14 passed in 0.66s

(14 tests = 10 existing + 4 new regression cases, all green. No existing behavior regressed.)

Changed files

agent/prompt_builder.py (modified, +19/-3)
run_agent.py (modified, +1/-0)
tests/agent/test_prompt_builder.py (modified, +115/-0)

PR #18989: fix(doctor): use get_env_value for API keys and base URLs instead of os.getenv

Repository: NousResearch/hermes-agent
Author: shellybotmoyer
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/18989

Description (problem / solution / changelog)

Problem

hermes_cli/doctor.py uses os.getenv() to read API keys and base URL env vars, which only reads from os.environ. Values stored in ~/.hermes/.env (the standard location for API keys) are silently missed.

This causes:

API key blindspot: doctor reports keys as "not configured" when they exist only in ~/.hermes/.env
Base URL fallback: providers with a custom BASE_URL in .env (like DASHSCOPE_BASE_URL) fall back to the registry default URL, causing false "invalid key" reports

Same pattern as #18757 (fixed in auth.py) — os.getenv() vs get_env_value() inconsistency.

Fix

Replace os.getenv() with get_env_value() (with os.getenv() fallback) at 3 call sites:

Line 998: openrouter_key = os.getenv("OPENROUTER_API_KEY") → _get_env_value("OPENROUTER_API_KEY") or os.getenv("OPENROUTER_API_KEY")
Line 1111: _key = os.getenv(_ev, "") in provider loop → _get_env_value(_ev) or os.getenv(_ev, "")
Line 1123: _base = os.getenv(_base_env, "") for base URL → _get_env_value(_base_env) or os.getenv(_base_env, "")

The fallback to os.getenv() ensures values injected directly into the process environment (e.g., Docker, CI) still work.
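Under the PR's fallback, a lookup tries the .env-aware helper first and the raw process environment second. A minimal sketch of the pattern follows. `read_dotenv` and `get_env_value_sketch` are hypothetical stand-ins for Hermes's own `get_env_value`, whose parsing rules and precedence may differ. Here the stand-in reads only the .env file, so the `os.getenv` fallback does visible work.

```python
import os
from pathlib import Path


def read_dotenv(path: Path) -> dict:
    """Hypothetical minimal KEY=VALUE parser; skips blank lines and # comments."""
    values = {}
    if not path.is_file():
        return values
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_env_value_sketch(key: str, dotenv_path: Path) -> str:
    # Stand-in for get_env_value(): consults only the .env file in this sketch.
    return read_dotenv(dotenv_path).get(key, "")


def doctor_lookup(key: str, dotenv_path: Path) -> str:
    # .env-aware lookup first, then the process environment, so values
    # injected by Docker or CI are still found.
    return get_env_value_sketch(key, dotenv_path) or os.getenv(key, "")
```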

#18757 — same fix applied to auth.py
Skill docs note this as a known standing issue: doctor.py line 1122 (_base = os.getenv(_base_env, "")) still uses os.getenv

Changed files

hermes_cli/doctor.py (modified, +4/-3)

PR #19021: test(credential_pool): align with .env-first seed precedence

Repository: NousResearch/hermes-agent
Author: Sanjays2402
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/19021

Description (problem / solution / changelog)

Summary

Fixes one failure in the Tests CI job on main (which therefore propagates to every open PR):

FAILED tests/tools/test_credential_pool_env_fallback.py::TestCredentialPoolSeedsFromDotEnv::test_os_environ_still_wins_over_dotenv
  AssertionError: assert 'sk-dotenv-stale' == 'sk-env-fresh-xyz'

Reference run: 25250051126 on 5d3be898a.

Root cause

The test docstring says:

get_env_value checks os.environ first — verify seeding picks that up.

…which was true for _seed_from_env until commit 2ef1ad280 ("fix: prefer ~/.hermes/.env over os.environ when seeding credential pool", fixes #18254). That commit introduced a private _get_env_prefer_dotenv helper and deliberately flipped the precedence:

```python
# Prefer ~/.hermes/.env over os.environ — the user's config file is the
# authoritative source for Hermes credentials. Stale env vars from parent
# processes (Codex CLI, test scripts, etc.) should not override deliberate
# changes to the .env file.
def _get_env_prefer_dotenv(key: str) -> str:
    env_file = load_env()
    val = env_file.get(key) or os.environ.get(key) or ""
    return val.strip()
```

The fix was correct (#18254 reported real users hitting silent 401s with stale auth.json caches), but the unit test was never updated to match.

Fix

Rename + rewrite the test to assert the current, deliberate behaviour: .env wins over os.environ, with a docstring explaining why (stale shell env vars from parent processes shadowing deliberate .env edits, leading to cached 401s in auth.json).

The cousin Auth* tests below already exercise os.environ-first semantics for _resolve_api_key_provider_secret (which still uses get_env_value), so the two precedence policies are both pinned now.
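The two policies differ only in which source is consulted first. The sketch below takes both sources as plain dicts, which is an assumption made for testability. `dotenv_first` mirrors the quoted `_get_env_prefer_dotenv`. `env_first` models the os.environ-first behaviour that the old test docstring attributes to get_env_value. The key name `EXAMPLE_API_KEY` is hypothetical; the two values come from the failing assertion.

```python
from typing import Mapping


def env_first(key: str, environ: Mapping[str, str], dotenv: Mapping[str, str]) -> str:
    """Process environment wins; .env is the fallback (get_env_value-style)."""
    return (environ.get(key) or dotenv.get(key) or "").strip()


def dotenv_first(key: str, environ: Mapping[str, str], dotenv: Mapping[str, str]) -> str:
    """.env wins; process environment is the fallback (mirrors _get_env_prefer_dotenv)."""
    return (dotenv.get(key) or environ.get(key) or "").strip()


if __name__ == "__main__":
    environ = {"EXAMPLE_API_KEY": "sk-env-fresh-xyz"}
    dotenv = {"EXAMPLE_API_KEY": "sk-dotenv-stale"}
    print(env_first("EXAMPLE_API_KEY", environ, dotenv))     # sk-env-fresh-xyz
    print(dotenv_first("EXAMPLE_API_KEY", environ, dotenv))  # sk-dotenv-stale
```

Fed the failing test's values, the seeding path now returns the .env value. That is why the old `== 'sk-env-fresh-xyz'` assertion broke after 2ef1ad280.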

Validation

$ pytest tests/tools/test_credential_pool_env_fallback.py -q
9 passed in 1.59s

Refs

#18254 — bug report that motivated the precedence flip
2ef1ad280 — the precedence flip commit
#18757 — cousin: os.getenv → get_env_value fix on auth.py base_url path

Scope

✅ No production code change (test-only)
✅ All 9 tests in the file pass
✅ Test name + docstring now accurately describe the contract

Out of scope

The other ~10 main-CI failures — separate focused PRs (#18972, #18974, #18977, #18979 already up).

Changed files

tests/tools/test_credential_pool_env_fallback.py (modified, +17/-5)


Bug Description

resolve_api_key_provider_credentials() in hermes_cli/auth.py resolves base_url_env_var using os.getenv(), which does not read from ~/.hermes/.env. Providers with a custom base URL stored only in .env (not exported in the shell) get the wrong endpoint — the default inference_base_url from PROVIDER_REGISTRY.

The same function correctly uses get_env_value() for API key resolution, creating an inconsistency: API keys are found, but base URLs are not.

The identical pattern also exists in hermes_cli/runtime_provider.py.

Steps to Reproduce

1. Configure a provider with a custom base URL (e.g., Xiaomi with token-plan-cn.xiaomimimo.com instead of the default api.xiaomimimo.com).
2. Set the base URL in ~/.hermes/.env only (do not export it in the shell):

```
XIAOMI_BASE_URL=https://token-plan-cn.xiaomimimo.com/v1
```

3. Set the same provider as an auxiliary task provider in config.yaml:

```yaml
auxiliary:
  title_generation:
    provider: xiaomi
    model: mimo-v2.5
    base_url: ''
```

4. Start a new conversation via the gateway (Telegram/Discord/CLI).
5. Observe that the auxiliary task (e.g., auto-title) fails with 401.

Expected Behavior

The auxiliary client should resolve XIAOMI_BASE_URL from ~/.hermes/.env via get_env_value() and use https://token-plan-cn.xiaomimimo.com/v1 — the same way it resolves XIAOMI_API_KEY.

Actual Behavior

The credential resolution used by the auxiliary client calls os.getenv("XIAOMI_BASE_URL", ""). This returns an empty string because the variable is not in os.environ, so resolution falls back to pconfig.inference_base_url (https://api.xiaomimimo.com/v1). The request hits the wrong endpoint and returns 401.
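The failure mode can be reproduced without Hermes. A value that exists only in a .env file is invisible to os.getenv() unless something loads it into os.environ. The sketch below holds the .env contents in a string. `parse_dotenv` and `resolve_base_url_buggy` are hypothetical helpers, and the second one mimics the reported fallback to `inference_base_url`.

```python
import os

# Contents of ~/.hermes/.env from the report, held in a string for the demo.
DOTENV_TEXT = "XIAOMI_BASE_URL=https://token-plan-cn.xiaomimimo.com/v1\n"
# pconfig.inference_base_url for Xiaomi, per the report.
REGISTRY_DEFAULT = "https://api.xiaomimimo.com/v1"


def parse_dotenv(text: str) -> dict:
    """Hypothetical KEY=VALUE parser; returns a dict and never writes to os.environ."""
    pairs = (line.split("=", 1) for line in text.splitlines() if "=" in line)
    return {key.strip(): value.strip() for key, value in pairs}


def resolve_base_url_buggy(var: str, default: str) -> str:
    # Same shape as the reported code: only the process environment is consulted.
    env_url = os.getenv(var, "").strip()
    return env_url or default


if __name__ == "__main__":
    os.environ.pop("XIAOMI_BASE_URL", None)  # i.e. "not exported in the shell"
    dotenv = parse_dotenv(DOTENV_TEXT)
    print(dotenv["XIAOMI_BASE_URL"])  # https://token-plan-cn.xiaomimimo.com/v1
    print(resolve_base_url_buggy("XIAOMI_BASE_URL", REGISTRY_DEFAULT))  # https://api.xiaomimimo.com/v1
```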

Affected Component

Configuration (config.yaml, .env, hermes setup)
Agent Core (conversation loop, context compression, memory)

Debug Report

N/A — this is a code-level bug identifiable from source. No environment-specific info needed.

Operating System

Ubuntu 24.04 (ARM64)

Python Version

3.12

Hermes Version

v0.12.0 (2026.4.30) — persists on origin/main (98c98821f)

Additional Logs / Traceback

Title generation failed: Error code: 401 - {'error': {'message': 'Invalid API Key', 'param': 'Please provide valid API Key', 'code': '401', 'type': 'invalid_key'}}

Root Cause Analysis

hermes_cli/auth.py L3529-3531 — resolve_api_key_provider_credentials():

```python
# API key — uses get_env_value() ✅ (reads .env file)
for env_var in pconfig.api_key_env_vars:
    val = (get_env_value(env_var) or "").strip()

# base_url — uses os.getenv() ❌ (does NOT read .env file)
env_url = ""
if pconfig.base_url_env_var:
    env_url = os.getenv(pconfig.base_url_env_var, "").strip()
```

hermes_cli/runtime_provider.py — same pattern:

```python
env_url = ""
if pconfig.base_url_env_var:
    env_url = os.getenv(pconfig.base_url_env_var, "").strip().rstrip("/")
```

Proposed Fix

Replace os.getenv with get_env_value in both locations:

```python
# auth.py — resolve_api_key_provider_credentials()
env_url = ""
if pconfig.base_url_env_var:
    env_url = (get_env_value(pconfig.base_url_env_var) or "").strip()

# runtime_provider.py — same change
env_url = ""
if pconfig.base_url_env_var:
    env_url = (get_env_value(pconfig.base_url_env_var) or "").strip().rstrip("/")
```
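The two call sites differ only in the trailing-slash strip. The sketch below folds them into one function with a flag and makes the fallback to `inference_base_url` explicit, as described in Actual Behavior. `ProviderConfigSketch` and `get_env_value_sketch` are hypothetical. The stand-in assumes get_env_value checks the process environment first and then ~/.hermes/.env, passed in here as a dict. The real resolution path may consult other sources (such as config.yaml base_url) before the registry default.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ProviderConfigSketch:
    # Hypothetical subset of a PROVIDER_REGISTRY entry; field names follow the report.
    base_url_env_var: Optional[str]
    inference_base_url: str


def get_env_value_sketch(key: str, dotenv: Mapping[str, str]) -> Optional[str]:
    # Assumed semantics: process environment first, then the .env values.
    return os.environ.get(key) or dotenv.get(key)


def resolve_base_url(pconfig: ProviderConfigSketch, dotenv: Mapping[str, str],
                     strip_trailing_slash: bool = False) -> str:
    env_url = ""
    if pconfig.base_url_env_var:
        env_url = (get_env_value_sketch(pconfig.base_url_env_var, dotenv) or "").strip()
        if strip_trailing_slash:  # the runtime_provider.py variant also does .rstrip("/")
            env_url = env_url.rstrip("/")
    # Only when no override is found does the registry default apply.
    return env_url or pconfig.inference_base_url
```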

Same Bug Class as #15914 and #17140

This is the same os.getenv() vs get_env_value() pattern fixed in:

PR #16101 (closed #15914) — API key + credential_pool
PR #17434 (closed #17140) — TTS/STT tools

The base_url_env_var resolution was missed in both fixes.

Are you willing to submit a PR for this?

I'd like to fix this myself and submit a PR

extent analysis

TL;DR

Replace os.getenv() with get_env_value() in hermes_cli/auth.py and hermes_cli/runtime_provider.py to fix the inconsistency in resolving base URLs from .env files.

Guidance

Identify the locations where os.getenv() is used for resolving base URLs in hermes_cli/auth.py and hermes_cli/runtime_provider.py.
Replace these instances with get_env_value() to ensure consistency with API key resolution.
Verify that the change resolves the issue by testing with a custom base URL set in .env and not exported in the shell.
Review previous fixes (PR #16101 and PR #17434) to understand the context and ensure the change is applied correctly.
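The first guidance step, finding the remaining os.getenv() call sites, can be sketched with the standard-library ast module. The script name and CLI usage are assumptions. It only matches the literal `os.getenv(...)` form, so `from os import getenv` and `os.environ.get(...)` would need separate checks.

```python
import ast
import sys
from pathlib import Path
from typing import List, Tuple


def find_os_getenv_calls(source: str) -> List[Tuple[int, str]]:
    """Return (line number, first argument as source text) for each os.getenv(...) call."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "getenv"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "os"
        ):
            arg = ast.unparse(node.args[0]) if node.args else ""  # Python 3.9+
            hits.append((node.lineno, arg))
    return sorted(hits)


if __name__ == "__main__":
    # e.g. python find_getenv.py hermes_cli/auth.py hermes_cli/runtime_provider.py
    for path in sys.argv[1:]:
        for lineno, arg in find_os_getenv_calls(Path(path).read_text()):
            print(f"{path}:{lineno}: os.getenv({arg})")
```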


Notes

This fix assumes that get_env_value() is correctly implemented to read from .env files. If this is not the case, additional changes may be required.

Recommendation

Apply the fix by replacing os.getenv() with get_env_value() in both locations; it directly addresses the identified issue.



PR fix notes

PR #17246: fix: resolve 7 identified issues [automated]

Description (problem / solution / changelog)

Summary

Issues resolved

Files modified

Commit list

Changed files

PR #18797: fix(auth): use get_env_value for base_url_env_var instead of os.getenv (#18757)

Description (problem / solution / changelog)

Fix #18757: Use get_env_value() for base_url_env_var instead of os.getenv()

Problem

Fix

Testing

Changed files

PR #18908: fix: use get_env_value() for base_url_env_var resolution

Description (problem / solution / changelog)

Summary

Root Cause

Fix

Same Bug Class

Scope

Changed files

PR #18910: fix(doctor): read env vars from .env and default to China DashScope endpoint

Description (problem / solution / changelog)

Summary

Bug 1: env vars from .env invisible to doctor

Bug 2: DashScope default URL is international-only

Same Bug Class

Testing

Scope

Changed files

PR #18948: fix(auth): resolve base_url_env_var via get_env_value everywhere (closes #18757)

Description (problem / solution / changelog)

Closes #18757

Why this PR rather than #17246

Behavior

Tests

Files changed

Changed files

PR #18788: fix(web/dashboard): skip xterm.js WebGL renderer on Safari to fix Unicode box-drawing glyphs

Description (problem / solution / changelog)

Summary

Fix

Why not the issue's proposed rendererType: 'dom'?

Verification

What I did NOT change

Changed files

PR #17349: fix(compressor): shrink protect_first_n on recompaction (#17344)

Description (problem / solution / changelog)

Bug

Root cause

Fix

Detection signal

Why not just strengthen SUMMARY_PREFIX?

Coordination with PR #17301 / #17251

Out of scope

Tests

Changed files

PR #17329: fix(delegate): surface tool_trace on N-API-call subagent timeouts (#17308)

Description (problem / solution / changelog)

Problem

Fix

1. Extract a shared trace builder

2. Reconstruct trace on the N-API-call timeout branch

3. Surface the diagnostics

Tests

Changed files

PR #17325: fix(telegram): stop large videos from triggering infinite model fallback (#17302)

Description (problem / solution / changelog)

Summary

1. Pre-check file_size before downloading


2. Trap "File is too big" inside the `except` block

3. Optional opt-out: `telegram.extra.ignore_videos: true`