hermes - ✅ (Solved) Fix cron: jobs with null next_run_at silently skipped; non-dict origin crashes ticker [4 pull requests, 1 comment, 2 participants]

liyoungc · 2026-05-02T07:54:59Z



GitHub stats

NousResearch/hermes-agent#18722 • Fetched 2026-05-03 04:54:42

Author: liyoungc
Participants: alt-glitch, liyoungc
Timeline (top): cross-referenced ×4, labeled ×3, commented ×1, referenced ×1

Error Message

ERROR cron.scheduler: Error processing job <id>: 'str' object has no attribute 'get'

This error is logged on every fire attempt. The job ends up with last_status: error and last_error: "'str' object has no attribute 'get'". mark_job_run does record the failure, but every subsequent fire crashes the same way until origin is fixed manually.

Root Cause

Cause: cron/scheduler.py:127 _resolve_origin() does origin.get("platform") on whatever job.get("origin") returns. The function checks if not origin (falsy short-circuit), but a non-empty string passes that guard and then hits AttributeError. In practice this happened because a migration script tagged jobs with a free-form provenance string (e.g. "combined-digest-replaces-x-ai-and-email-triage-20260503") instead of either null or {platform, chat_id}.
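To make the failure mode concrete, here is a minimal sketch of the pre-fix guard. The function name `resolve_origin_falsy_guard` is a stand-in invented for this example, not the project's function. Its body follows the pre-fix lines of the diff quoted later on this page, and the string "my-migration-tag" is only a sample value.

```python
from typing import Optional


def resolve_origin_falsy_guard(job: dict) -> Optional[dict]:
    """Stand-in that mirrors the pre-fix guard described above."""
    origin = job.get("origin")
    if not origin:  # catches None, {} and "", but not "some-tag"
        return None
    platform = origin.get("platform")  # AttributeError when origin is a str
    chat_id = origin.get("chat_id")
    if platform and chat_id:
        return origin
    return None


# Falsy origins short-circuit safely.
for falsy in (None, {}, ""):
    assert resolve_origin_falsy_guard({"origin": falsy}) is None

# A non-empty string passes the guard and then fails on .get().
try:
    resolve_origin_falsy_guard({"origin": "my-migration-tag"})
except AttributeError as exc:
    error_text = str(exc)
    print(error_text)
```

The printed message matches the last_error text recorded on the job, which suggests the crash comes from the `.get()` call rather than from anything later in delivery.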

Fix Action

Fix / Workaround

Cause (null next_run_at): in cron/jobs.py, get_due_jobs() (around the loop at L794–L834) only attempts recovery via _recoverable_oneshot_run_at(), which is hard-gated to kind: once. For recurring kinds the helper returns None, the loop hits continue, and the job is silently skipped on every tick. The loader assumes the only path into jobs.json is add_job(), which populates next_run_at via compute_next_run() at line 526. Any external writer (jq, a migration script, a dashboard REST patch endpoint that forgets to set the field, etc.) that creates a recurring entry without that field leaves the job unfireable.

Fix (null next_run_at): when the schedule kind is cron or interval and next_run_at is missing, recompute it via compute_next_run(schedule, now.isoformat()) instead of skipping the job. The existing one-shot grace-window path is untouched. Patch + tests below.

Fix (non-dict origin): add an isinstance(origin, dict) guard so that a non-dict origin (string, list, int, …) is treated the same as a missing origin. Patch + tests below.
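The recurring-job recovery described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: `compute_next_run_sketch` is a hypothetical stand-in for `compute_next_run()`, it handles only interval schedules, and the `"minutes"` field is invented for the example because the real interval schedule shape is not shown on this page.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def compute_next_run_sketch(schedule: dict, after_iso: str) -> Optional[str]:
    """Hypothetical stand-in for compute_next_run(); handles only intervals.

    The "minutes" field is an assumption made for this sketch.
    """
    if schedule.get("kind") != "interval":
        return None
    after = datetime.fromisoformat(after_iso)
    return (after + timedelta(minutes=schedule["minutes"])).isoformat()


def recover_next_run(job: dict, now: datetime) -> Optional[str]:
    """Recompute a missing next_run_at for recurring jobs; leave others alone."""
    if job.get("next_run_at"):
        return job["next_run_at"]
    schedule = job.get("schedule") or {}
    if schedule.get("kind") in ("cron", "interval"):
        return compute_next_run_sketch(schedule, now.isoformat())
    return None  # one-shot jobs keep their own grace-window path


now = datetime(2026, 5, 3, 12, 0, tzinfo=timezone.utc)
job = {
    "id": "repro",
    "schedule": {"kind": "interval", "minutes": 30},
    "next_run_at": None,
}
job["next_run_at"] = recover_next_run(job, now)
print(job["next_run_at"])
```

The recovered timestamp is computed from the current time, so the job picks up at its next scheduled slot rather than firing immediately for runs it may have missed.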

PR fix notes

PR #18735: fix: resolve 7 identified issues [automated]

Repository: NousResearch/hermes-agent
Author: Sldark23
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/18735

Description (problem / solution / changelog)

Summary

This automated PR resolves 7 identified upstream issues focusing on reliability, cross-platform behavior, and security hardening.

Resolved issues

#18722 — cron jobs with next_run_at: null now recover for recurring schedules; scheduler now tolerates non-dict origin values.
- Files: cron/jobs.py, cron/scheduler.py, tests/cron/test_jobs.py, tests/cron/test_scheduler.py
#18705 — dotenv loading no longer overrides runtime-injected environment variables.
- Files: hermes_cli/env_loader.py, tests/hermes_cli/test_env_loader.py
#18659 — scan_skill_commands no longer clears cached commands before a successful rescan.
- Files: agent/skill_commands.py
#18675 — skill fallback file scan now skips heavy dependency directories and enforces a file cap.
- Files: agent/skill_commands.py
#18617 — context compressor now synchronizes threshold_percent correctly across model switch, fallback activation, and primary restoration.
- Files: run_agent.py
#18681 — custom provider /model path now correctly carries provider credentials during model verification path in this branch baseline (already included in upstream branch state; preserved in final branch history).
- Files: gateway/run.py (resolved in branch baseline)
#18707 — request debug dumps are now redacted before writing to disk/stdout to avoid plaintext secret leakage.
- Files: run_agent.py

Validation

python3 -m py_compile run_agent.py cron/jobs.py cron/scheduler.py hermes_cli/env_loader.py agent/skill_commands.py gateway/run.py
pytest -n 0 tests/hermes_cli/test_env_loader.py tests/gateway/test_model_command_custom_providers.py tests/cron/test_jobs.py::TestGetDueJobs::test_broken_cron_without_next_run_is_recovered tests/cron/test_scheduler.py::TestResolveOrigin::test_non_dict_origin_tolerated tests/agent/test_skill_commands.py tests/agent/test_skill_commands_reload.py

Changed files

Dockerfile (modified, +3/-2)
acp_adapter/session.py (modified, +12/-0)
agent/auxiliary_client.py (modified, +280/-28)
agent/context_compressor.py (modified, +496/-52)
agent/skill_commands.py (modified, +18/-4)
agent/title_generator.py (modified, +2/-2)
agent/transports/chat_completions.py (modified, +14/-0)
agent/usage_pricing.py (modified, +4/-0)
cli-config.yaml.example (modified, +5/-0)
cli.py (modified, +27/-3)
cron/jobs.py (modified, +13/-2)
cron/scheduler.py (modified, +14/-4)
docker/entrypoint.sh (modified, +9/-1)
gateway/channel_directory.py (modified, +14/-4)
gateway/platforms/discord.py (modified, +33/-7)
gateway/platforms/email.py (modified, +12/-2)
gateway/platforms/feishu.py (modified, +34/-1)
gateway/platforms/qqbot/adapter.py (modified, +8/-2)
gateway/platforms/telegram_network.py (modified, +7/-2)
gateway/platforms/weixin.py (modified, +10/-1)
gateway/run.py (modified, +129/-32)
gateway/status.py (modified, +8/-1)
hermes_cli/auth.py (modified, +2/-2)
hermes_cli/commands.py (modified, +1/-1)
hermes_cli/config.py (modified, +271/-40)
hermes_cli/copilot_auth.py (modified, +1/-1)
hermes_cli/doctor.py (modified, +6/-1)
hermes_cli/env_loader.py (modified, +5/-4)
hermes_cli/gateway.py (modified, +16/-13)
hermes_cli/main.py (modified, +69/-3)
hermes_cli/memory_setup.py (modified, +1/-1)
hermes_cli/model_switch.py (modified, +6/-1)
hermes_cli/models.py (modified, +60/-2)
hermes_cli/profiles.py (modified, +16/-3)
hermes_cli/runtime_provider.py (modified, +16/-13)
hermes_cli/setup.py (modified, +8/-2)
hermes_cli/slack_cli.py (modified, +1/-2)
hermes_cli/status.py (modified, +17/-2)
hermes_cli/web_server.py (modified, +1/-1)
hermes_constants.py (modified, +16/-3)
model_tools.py (modified, +44/-13)
run_agent.py (modified, +408/-84)
setup-hermes.sh (modified, +23/-12)
skills/red-teaming/godmode/scripts/load_godmode.py (modified, +9/-8)
tests/agent/test_context_compressor.py (modified, +389/-0)
tests/agent/transports/test_chat_completions.py (modified, +11/-0)
tests/cron/test_jobs.py (modified, +26/-0)
tests/cron/test_scheduler.py (modified, +4/-0)
tests/gateway/test_compress_command.py (modified, +49/-0)
tests/hermes_cli/test_api_key_providers.py (modified, +5/-5)
tests/hermes_cli/test_config.py (modified, +17/-0)
tests/hermes_cli/test_env_loader.py (modified, +6/-6)
tests/run_agent/test_413_compression.py (modified, +81/-1)
tests/run_agent/test_compression_boundary_hook.py (modified, +42/-0)
tests/run_agent/test_run_agent.py (modified, +100/-13)
tests/tools/test_skill_manager_tool.py (modified, +270/-0)
tools/approval.py (modified, +1/-1)
tools/delegate_tool.py (modified, +4/-1)
tools/environments/docker.py (modified, +36/-5)
tools/environments/local.py (modified, +8/-1)
tools/file_operations.py (modified, +70/-67)
tools/file_tools.py (modified, +13/-2)
tools/send_message_tool.py (modified, +72/-2)
tools/session_search_tool.py (modified, +2/-2)
tools/skill_manager_tool.py (modified, +82/-21)
tools/skills_tool.py (modified, +13/-1)
tools/terminal_tool.py (modified, +6/-0)
tools/tool_backend_helpers.py (modified, +15/-5)
tools/tts_tool.py (modified, +27/-16)
tools/voice_mode.py (modified, +23/-10)
toolsets.py (modified, +14/-1)
tui_gateway/server.py (modified, +5/-3)
ui-tui/src/app/turnController.ts (modified, +1/-1)
ui-tui/src/app/useInputHandlers.ts (modified, +8/-3)
ui-tui/src/app/useSessionLifecycle.ts (modified, +1/-1)
ui-tui/src/gatewayTypes.ts (modified, +1/-0)
utils.py (modified, +9/-0)
uv.lock (modified, +161/-2)
website/docs/reference/environment-variables.md (modified, +1/-1)

PR #17246: fix: resolve 7 identified issues [automated]

Repository: NousResearch/hermes-agent
Author: Sldark23
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/17246

Description (problem / solution / changelog)

Summary

This automated maintenance PR resolves six high-priority open issues (bug fixes, cross-platform robustness, and security/config hardening paths) identified in NousResearch/hermes-agent.

Note: The job target was 7 issues. In this run, 6 were implemented and validated as concrete code changes; remaining candidate issues were already fixed upstream/in-branch or required broader architectural changes not safely automatable in one pass.

Issues resolved

#18757 - resolve_api_key_provider_credentials() misses ~/.hermes/.env for base_url_env_var
- Replaced os.getenv(...) with get_env_value(...) in API-key provider credential resolution.
- Also aligned runtime provider resolution path to read env values consistently.
#18705 - load_hermes_dotenv() overrides runtime env vars (override=True)
- Switched user env loading to override=False so runtime-injected env vars keep precedence.
- Updated function docstring behavior notes accordingly.
#18722 - Cron jobs with next_run_at: null skipped forever; non-dict origin crash
- Added recovery for recurring cron/interval jobs by recomputing next_run_at.
- Hardened _resolve_origin() to tolerate non-dict origin payloads.
#18742 - Kimi/Moonshot via aggregators misses reasoning-mode detection
- _needs_kimi_tool_reasoning() now also detects Moonshot/Kimi model slugs via is_moonshot_model(...).
#18744 - constraints_path dead config (not loaded)
- Implemented optional loading of constraints_path content into system prompt composition.
#18778 - Gateway scoped lock stale detection no-op on macOS/Windows
- Added cross-platform process start time/cmdline detection using psutil fallback.
- Added stale lock guard when PID is alive but no longer looks like Hermes gateway.

Files modified

hermes_cli/auth.py
hermes_cli/runtime_provider.py
hermes_cli/env_loader.py
cron/jobs.py
cron/scheduler.py
run_agent.py
gateway/status.py

Commit list

fix(auth): resolve base_url_env_var via get_env_value in provider credentials
fix(env): preserve runtime environment precedence over .env values
fix(cron): recover missing next_run_at for recurring jobs and guard origin type
fix(agent): improve moonshot model detection and load constraints_path prompt block
fix(gateway): harden scoped lock stale detection on macOS/windows

Changed files

Dockerfile (modified, +3/-2)
acp_adapter/session.py (modified, +12/-0)
agent/auxiliary_client.py (modified, +280/-28)
agent/context_compressor.py (modified, +496/-52)
agent/title_generator.py (modified, +2/-2)
agent/transports/chat_completions.py (modified, +14/-0)
agent/usage_pricing.py (modified, +4/-0)
cli-config.yaml.example (modified, +5/-0)
cli.py (modified, +27/-3)
cron/jobs.py (modified, +10/-2)
cron/scheduler.py (modified, +14/-4)
docker/entrypoint.sh (modified, +9/-1)
gateway/channel_directory.py (modified, +14/-4)
gateway/platforms/discord.py (modified, +33/-7)
gateway/platforms/email.py (modified, +12/-2)
gateway/platforms/feishu.py (modified, +34/-1)
gateway/platforms/qqbot/adapter.py (modified, +8/-2)
gateway/platforms/telegram_network.py (modified, +7/-2)
gateway/platforms/weixin.py (modified, +10/-1)
gateway/run.py (modified, +129/-32)
gateway/status.py (modified, +37/-2)
hermes_cli/auth.py (modified, +4/-4)
hermes_cli/commands.py (modified, +1/-1)
hermes_cli/config.py (modified, +271/-40)
hermes_cli/copilot_auth.py (modified, +1/-1)
hermes_cli/doctor.py (modified, +6/-1)
hermes_cli/env_loader.py (modified, +5/-4)
hermes_cli/gateway.py (modified, +16/-13)
hermes_cli/main.py (modified, +69/-3)
hermes_cli/memory_setup.py (modified, +1/-1)
hermes_cli/model_switch.py (modified, +6/-1)
hermes_cli/models.py (modified, +60/-2)
hermes_cli/profiles.py (modified, +16/-3)
hermes_cli/runtime_provider.py (modified, +17/-14)
hermes_cli/setup.py (modified, +8/-2)
hermes_cli/slack_cli.py (modified, +1/-2)
hermes_cli/status.py (modified, +17/-2)
hermes_cli/web_server.py (modified, +1/-1)
hermes_constants.py (modified, +16/-3)
model_tools.py (modified, +44/-13)
run_agent.py (modified, +413/-82)
setup-hermes.sh (modified, +23/-12)
skills/red-teaming/godmode/scripts/load_godmode.py (modified, +9/-8)
tests/agent/test_context_compressor.py (modified, +389/-0)
tests/agent/transports/test_chat_completions.py (modified, +11/-0)
tests/gateway/test_compress_command.py (modified, +49/-0)
tests/hermes_cli/test_api_key_providers.py (modified, +5/-5)
tests/hermes_cli/test_config.py (modified, +17/-0)
tests/run_agent/test_413_compression.py (modified, +81/-1)
tests/run_agent/test_compression_boundary_hook.py (modified, +42/-0)
tests/run_agent/test_run_agent.py (modified, +100/-13)
tests/tools/test_skill_manager_tool.py (modified, +270/-0)
tools/approval.py (modified, +1/-1)
tools/delegate_tool.py (modified, +4/-1)
tools/environments/docker.py (modified, +36/-5)
tools/environments/local.py (modified, +8/-1)
tools/file_operations.py (modified, +70/-67)
tools/file_tools.py (modified, +13/-2)
tools/send_message_tool.py (modified, +72/-2)
tools/session_search_tool.py (modified, +2/-2)
tools/skill_manager_tool.py (modified, +82/-21)
tools/skills_tool.py (modified, +13/-1)
tools/terminal_tool.py (modified, +6/-0)
tools/tool_backend_helpers.py (modified, +15/-5)
tools/tts_tool.py (modified, +27/-16)
tools/voice_mode.py (modified, +23/-10)
toolsets.py (modified, +14/-1)
tui_gateway/server.py (modified, +5/-3)
ui-tui/src/app/turnController.ts (modified, +1/-1)
ui-tui/src/app/useInputHandlers.ts (modified, +8/-3)
ui-tui/src/app/useSessionLifecycle.ts (modified, +1/-1)
ui-tui/src/gatewayTypes.ts (modified, +1/-0)
utils.py (modified, +9/-0)
uv.lock (modified, +161/-2)
website/docs/reference/environment-variables.md (modified, +1/-1)

PR #19013: fix(cron): treat non-dict origin as missing instead of crashing tick

Repository: NousResearch/hermes-agent
Author: Tranquil-Flow
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/19013

Description (problem / solution / changelog)

What does this PR do?

_resolve_origin called origin.get('platform') on whatever job.get('origin') returned. The leading if not origin: return None short-circuited the falsy cases (None, empty dict, "") but a non-empty string passed that guard and crashed with AttributeError: 'str' object has no attribute 'get' on every fire attempt. Observed in the wild after a migration script tagged jobs with free-form provenance strings (e.g. "combined-digest-replaces-x-and-y-20260503").

mark_job_run did record last_status: error, last_error: "'str' object has no attribute 'get'" once, but the next tick re-loaded the same poisoned origin and crashed identically. The job stayed enabled and accumulated cascading errors until origin was patched manually.

Replace the falsy guard with isinstance(origin, dict). Non-dict origins (string, int, list, tuple, float — anything that survived a hand-edit, JSON-script write, or migration) are now treated the same as a missing origin: the job continues with deliver falling back through its normal home-channel path instead of crashing the scheduler loop.

Scope: the non-dict-origin crash sub-bug from #18722. The next_run_at: null recurring-job recovery (the second sub-bug) is independently addressed by the in-flight #18825, which extends the never-silently-disable defense from #16265 to get_due_jobs(). Either one can land first.

Related Issue

Fixes #18722 (non-dict origin crash; recurring-job recovery covered by #18825)

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
🔒 Security fix
📝 Documentation update
✅ Tests (adding or improving test coverage)
♻️ Refactor (no behavior change)
🎯 New skill (bundled or hub)

Changes Made

cron/scheduler.py — _resolve_origin guards isinstance(origin, dict) before .get() calls; updated docstring with the production trigger pattern.
tests/cron/test_scheduler.py — TestResolveOrigin.test_non_dict_origin_returns_none_instead_of_crashing parametrises over the non-dict shapes (str, int, list, tuple, float).
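A rough sketch of what a check over those non-dict shapes could look like, written as plain assertions so it runs without pytest. It is not the PR's actual test: `resolve_origin` is a local stand-in that applies the isinstance guard, and the sample values, including the platform name, are made up.

```python
from typing import Optional


def resolve_origin(job: dict) -> Optional[dict]:
    """Stand-in with the isinstance guard; not imported from cron/scheduler.py."""
    origin = job.get("origin")
    if not isinstance(origin, dict):
        return None
    if origin.get("platform") and origin.get("chat_id"):
        return origin
    return None


# One sample per shape listed above: str, int, list, tuple, float.
NON_DICT_ORIGINS = ["my-migration-tag", 42, ["a", "b"], ("a", "b"), 3.14]

results = [resolve_origin({"origin": value}) for value in NON_DICT_ORIGINS]
assert results == [None] * len(NON_DICT_ORIGINS)

# A well-formed origin is still returned unchanged.
good = {"platform": "telegram", "chat_id": "123", "thread_id": None}
assert resolve_origin({"origin": good}) is good
```

Because `isinstance(origin, dict)` is False for None as well, the single check also covers the missing-origin case that the old falsy guard handled.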

How to Test

Edit `~/.hermes/cron/jobs.json` and set a job's origin to a string like "my-migration-tag".
Restart the gateway / wait for the cron tick.
Before this PR: every fire crashes with AttributeError; after: job runs with default delivery routing.
`pytest tests/cron/test_scheduler.py::TestResolveOrigin -q` → 10/10 pass.
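Before or after hand-editing the jobs file, it may help to scan the entries for the two malformed shapes this issue describes. The helper below is a hypothetical sketch, not part of hermes-agent. It works on an already-loaded list of job dicts and assumes the field names shown in the repro (`id`, `origin`, `schedule.kind`, `next_run_at`).

```python
from typing import Any, Dict, List, Tuple


def audit_jobs(jobs: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Return (job_id, problem) pairs for the two malformed shapes in this issue."""
    problems = []
    for job in jobs:
        job_id = str(job.get("id", "<no id>"))
        origin = job.get("origin")
        if origin is not None and not isinstance(origin, dict):
            problems.append(
                (job_id, f"origin is {type(origin).__name__}, expected dict or null")
            )
        schedule = job.get("schedule")
        kind = schedule.get("kind") if isinstance(schedule, dict) else None
        if kind in ("cron", "interval") and not job.get("next_run_at"):
            problems.append((job_id, f"recurring {kind} job has no next_run_at"))
    return problems


sample = [
    {"id": "a", "origin": "my-migration-tag",
     "schedule": {"kind": "cron"}, "next_run_at": None},
    {"id": "b", "origin": {"platform": "telegram", "chat_id": "1"},
     "schedule": {"kind": "interval"}, "next_run_at": "2026-05-03T12:00:00+00:00"},
]
for job_id, problem in audit_jobs(sample):
    print(job_id, problem)
```

How the list is loaded is left out on purpose, since the on-disk layout of jobs.json is not shown here. Inside the project, the existing loader in cron/jobs.py would presumably be the safer way to obtain it.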

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits
I searched for existing PRs (note: #18825 covers the sibling sub-bug; explicitly out of scope here)
My PR contains only changes related to this fix
I've run pytest tests/ -q and the touched suite passes
I've added tests for my changes
I've tested on my platform: macOS 15.x

Documentation & Housekeeping

I've updated relevant documentation — N/A
I've updated cli-config.yaml.example if I added/changed config keys — N/A
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture — N/A
I've considered cross-platform impact — N/A
I've updated tool descriptions/schemas if I changed tool behavior — N/A

Changed files

cron/scheduler.py (modified, +12/-2)
tests/cron/test_scheduler.py (modified, +23/-0)

PR #19066: fix(cron): recover null next_run_at jobs and tolerate non-dict origin

Repository: NousResearch/hermes-agent
Author: EthanGuo-coder
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/19066

Description (problem / solution / changelog)

Related work

References #18735 (open competing fix from an automated bulk PR touching 79 files). This PR is a focused single-issue contribution and adds the missing interval-recovery test variant.

What does this PR do?

Fixes two robustness gaps in the cron subsystem.

Bug 1 — silent skip of recoverable jobs. get_due_jobs() skipped any cron/interval job whose next_run_at was null (which happens when users hand-edit jobs.json, on partial migrations, or after a crash mid-write). Those jobs would never run again until the user re-saved them. The fix recomputes next_run_at via compute_next_run() for cron/interval schedules with a missing timestamp, persists the recovered value, and continues evaluation in the same tick.
Bug 2 — non-dict origin crashes the ticker. _resolve_origin() indexed origin["chat_id"] without checking type, so a string or any non-dict origin (legacy serialized jobs, hand-edited entries) raised TypeError and aborted the ticker for the rest of the cycle. The fix guards with isinstance(origin, dict) before key access. Pass-D codex review caught a related miss in _deliver_result(), which was doing its own raw dict check; that path now routes through _resolve_origin() so the tolerance is consistent across the file.
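The exception type depends on how the string is accessed, which may explain why this description mentions TypeError while the issue's log shows an AttributeError. A small standalone illustration, with a made-up helper name:

```python
origin = "my-migration-tag"


def exception_name(access):
    """Run one access style against the string origin and name what it raises."""
    try:
        access(origin)
    except Exception as exc:
        return type(exc).__name__
    return None


subscript_error = exception_name(lambda o: o["chat_id"])  # key-style subscripting
get_error = exception_name(lambda o: o.get("chat_id"))    # dict-style .get()
print(subscript_error, get_error)
```

Either way, an isinstance(origin, dict) check placed before any access avoids both failures.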

Related Issue

Fixes #18722

Type of Change

Bug fix (non-breaking change that fixes an issue)

Changes Made

cron/jobs.py — get_due_jobs() recovers cron/interval jobs with next_run_at is None by recomputing via compute_next_run() instead of returning early.
cron/scheduler.py — _resolve_origin() checks isinstance(origin, dict) before subscripting; _deliver_result() now delegates origin resolution to _resolve_origin() so non-dict origins do not crash result delivery.
tests/cron/test_jobs.py — new TestGetDueJobs::test_broken_cron_without_next_run_is_recovered and test_broken_interval_without_next_run_is_recovered cover both schedule kinds.
tests/cron/test_scheduler.py — new TestResolveOrigin::test_string_origin_is_tolerated and test_non_dict_origin_is_tolerated cover the type-guard.

How to Test

Check out this branch and install dev deps (pip install -e '.[dev]').

Run the four new regression tests directly:

pytest -o "addopts=" \
  tests/cron/test_jobs.py::TestGetDueJobs::test_broken_cron_without_next_run_is_recovered \
  tests/cron/test_jobs.py::TestGetDueJobs::test_broken_interval_without_next_run_is_recovered \
  tests/cron/test_scheduler.py::TestResolveOrigin::test_string_origin_is_tolerated \
  tests/cron/test_scheduler.py::TestResolveOrigin::test_non_dict_origin_is_tolerated

Run the cron subsystem suite to confirm no regressions: pytest tests/cron/ -q.

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(cron): ..., test(cron): ...)
I searched for existing PRs to make sure this isn't a duplicate (see "Related work" above)
My PR contains only changes related to this fix (no unrelated commits)
I've run the cron subsystem tests (pytest tests/cron/ -q) and all 289 tests pass, including the 4 new regression tests
I've added tests for my changes
I've tested on my platform: Ubuntu 24.04

Documentation & Housekeeping

I've updated relevant documentation — N/A (internal robustness fix, no public surface changed)
I've updated cli-config.yaml.example if I added/changed config keys — N/A
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
I've considered cross-platform impact — N/A (no OS-specific code paths touched)
I've updated tool descriptions/schemas if I changed tool behavior — N/A

Changed files

cron/jobs.py (modified, +19/-2)
cron/scheduler.py (modified, +8/-3)
tests/cron/test_jobs.py (modified, +68/-0)
tests/cron/test_scheduler.py (modified, +8/-0)

Code Example

from cron.jobs import save_jobs, get_due_jobs, get_job
save_jobs([{
    "id": "repro",
    "name": "AI Daily Digest",
    "prompt": "...",
    "schedule": {"kind": "cron", "expr": "0 12 * * *", "display": "0 12 * * *"},
    "schedule_display": "0 12 * * *",
    "repeat": {"times": None, "completed": 0},
    "enabled": True,
    "state": "scheduled",
    "next_run_at": None, "last_run_at": None, "last_status": None,
    "last_error": None, "deliver": "local", "origin": None,
}])
get_due_jobs()  # returns [], next_run_at still None — and stays None forever

---

ERROR cron.scheduler: Error processing job <id>: 'str' object has no attribute 'get'

---

--- a/cron/jobs.py
+++ b/cron/jobs.py
@@ -795,17 +795,32 @@ def get_due_jobs() -> List[Dict[str, Any]]:
         if not job.get("enabled", True):
             continue

         next_run = job.get("next_run_at")
         if not next_run:
+            schedule = job.get("schedule", {})
+            kind = schedule.get("kind")
+
+            # One-shot jobs use a small grace window via the dedicated helper.
             recovered_next = _recoverable_oneshot_run_at(
-                job.get("schedule", {}),
+                schedule,
                 now,
                 last_run_at=job.get("last_run_at"),
             )
+            recovery_kind = "one-shot" if recovered_next else None
+
+            # Recurring jobs (cron / interval) reach here only when something
+            # — typically a direct jobs.json edit that bypassed add_job() —
+            # left next_run_at unset.  Without this branch, such jobs are
+            # silently skipped forever; recompute next_run_at from the
+            # schedule so they pick up at their next scheduled tick.
+            if not recovered_next and kind in ("cron", "interval"):
+                recovered_next = compute_next_run(schedule, now.isoformat())
+                if recovered_next:
+                    recovery_kind = kind
+
             if not recovered_next:
                 continue

             job["next_run_at"] = recovered_next
             next_run = recovered_next
             logger.info(
-                "Job '%s' had no next_run_at; recovering one-shot run at %s",
+                "Job '%s' had no next_run_at; recovering %s run at %s",
                 job.get("name", job["id"]),
+                recovery_kind,
                 recovered_next,
             )

---

--- a/cron/scheduler.py
+++ b/cron/scheduler.py
@@ -123,11 +123,18 @@ class _OutboundContextStub:

 def _resolve_origin(job: dict) -> Optional[dict]:
-    """Extract origin info from a job, preserving any extra routing metadata."""
+    """Extract origin info from a job, preserving any extra routing metadata.
+
+    ``origin`` is expected to be either ``None`` or a dict shaped like
+    ``{"platform": ..., "chat_id": ..., "thread_id": ...}``.  Tolerate
+    other shapes (most commonly: a free-form string identifier left by
+    a script that wrote jobs.json directly) by returning ``None`` rather
+    than crashing the whole tick with ``AttributeError``.
+    """
     origin = job.get("origin")
-    if not origin:
+    if not origin or not isinstance(origin, dict):
         return None
     platform = origin.get("platform")
     chat_id = origin.get("chat_id")
     if platform and chat_id:
         return origin
     return None


Two related robustness gaps in the cron subsystem became visible when ops scripts wrote directly into ~/.hermes/cron/jobs.json instead of going through add_job() / dashboard / API. Both manifested in the same incident; reporting them together since the fix is small and shares one PR.

Bug 1 — `kind: cron` / `kind: interval` jobs with `next_run_at: null` are silently skipped forever

Symptom: Job appears in jobs.json with enabled: true, state: scheduled, next_run_at: null, last_run_at: null indefinitely. Other crons fire normally. No log entry indicates it's being skipped.

Repro:

from cron.jobs import save_jobs, get_due_jobs, get_job
save_jobs([{
    "id": "repro",
    "name": "AI Daily Digest",
    "prompt": "...",
    "schedule": {"kind": "cron", "expr": "0 12 * * *", "display": "0 12 * * *"},
    "schedule_display": "0 12 * * *",
    "repeat": {"times": None, "completed": 0},
    "enabled": True,
    "state": "scheduled",
    "next_run_at": None, "last_run_at": None, "last_status": None,
    "last_error": None, "deliver": "local", "origin": None,
}])
get_due_jobs()  # returns [], next_run_at still None — and stays None forever

Bug 2 — `_resolve_origin` crashes with `'str' object has no attribute 'get'` when `origin` is a string

Symptom:

ERROR cron.scheduler: Error processing job <id>: 'str' object has no attribute 'get'

…on every fire attempt. Job's last_status: error, last_error: "'str' object has no attribute 'get'". mark_job_run does record the failure, but every subsequent fire crashes the same way until origin is fixed manually.

Fix: add isinstance(origin, dict) guard; non-dict origin (string, list, int…) is treated the same as missing origin. Patch + tests below.

Patch

--- a/cron/jobs.py
+++ b/cron/jobs.py
@@ -795,17 +795,32 @@ def get_due_jobs() -> List[Dict[str, Any]]:
         if not job.get("enabled", True):
             continue

         next_run = job.get("next_run_at")
         if not next_run:
+            schedule = job.get("schedule", {})
+            kind = schedule.get("kind")
+
+            # One-shot jobs use a small grace window via the dedicated helper.
             recovered_next = _recoverable_oneshot_run_at(
-                job.get("schedule", {}),
+                schedule,
                 now,
                 last_run_at=job.get("last_run_at"),
             )
+            recovery_kind = "one-shot" if recovered_next else None
+
+            # Recurring jobs (cron / interval) reach here only when something
+            # — typically a direct jobs.json edit that bypassed add_job() —
+            # left next_run_at unset.  Without this branch, such jobs are
+            # silently skipped forever; recompute next_run_at from the
+            # schedule so they pick up at their next scheduled tick.
+            if not recovered_next and kind in ("cron", "interval"):
+                recovered_next = compute_next_run(schedule, now.isoformat())
+                if recovered_next:
+                    recovery_kind = kind
+
             if not recovered_next:
                 continue

             job["next_run_at"] = recovered_next
             next_run = recovered_next
             logger.info(
-                "Job '%s' had no next_run_at; recovering one-shot run at %s",
+                "Job '%s' had no next_run_at; recovering %s run at %s",
                 job.get("name", job["id"]),
+                recovery_kind,
                 recovered_next,
             )

--- a/cron/scheduler.py
+++ b/cron/scheduler.py
@@ -123,11 +123,18 @@ class _OutboundContextStub:

 def _resolve_origin(job: dict) -> Optional[dict]:
-    """Extract origin info from a job, preserving any extra routing metadata."""
+    """Extract origin info from a job, preserving any extra routing metadata.
+
+    ``origin`` is expected to be either ``None`` or a dict shaped like
+    ``{"platform": ..., "chat_id": ..., "thread_id": ...}``.  Tolerate
+    other shapes (most commonly: a free-form string identifier left by
+    a script that wrote jobs.json directly) by returning ``None`` rather
+    than crashing the whole tick with ``AttributeError``.
+    """
     origin = job.get("origin")
-    if not origin:
+    if not origin or not isinstance(origin, dict):
         return None
     platform = origin.get("platform")
     chat_id = origin.get("chat_id")
     if platform and chat_id:
         return origin
     return None

New tests

tests/cron/test_jobs.py::TestGetDueJobs:

test_broken_cron_without_next_run_is_recovered — cron-kind null next_run_at gets recomputed
test_broken_interval_without_next_run_is_recovered — same for interval

tests/cron/test_scheduler.py::TestResolveOrigin:

test_string_origin_is_tolerated — string origin returns None, no crash
test_non_dict_origin_is_tolerated — list/int origin returns None

All 289 existing cron tests still pass sequentially. (Two parallel-mode flakes under xdist are pre-existing and unrelated; same tests pass in isolation.)
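The test bodies are not reproduced here. As a rough idea of what the recovery tests check, the sketch below models the new branch with a stand-in instead of the real `compute_next_run`. Everything in it is an assumption made for illustration: the function names are hypothetical, and the `{"kind": "interval", "minutes": N}` schedule shape is invented, since the report only shows the `cron` schedule shape.

```python
from datetime import datetime, timedelta
from typing import Optional


def compute_next_interval_run(schedule: dict, now: datetime) -> Optional[str]:
    """Hypothetical stand-in for compute_next_run(), interval schedules only.

    Assumes an illustrative {"kind": "interval", "minutes": N} shape.
    """
    minutes = schedule.get("minutes")
    if schedule.get("kind") != "interval" or not minutes:
        return None
    return (now + timedelta(minutes=minutes)).isoformat()


def recover_missing_next_run(job: dict, now: datetime) -> bool:
    """Fill in next_run_at for an enabled job that lacks it; True if changed."""
    if not job.get("enabled", True) or job.get("next_run_at"):
        return False
    recovered = compute_next_interval_run(job.get("schedule", {}), now)
    if not recovered:
        return False  # nothing recoverable: the job stays skipped
    job["next_run_at"] = recovered
    return True
```

A test in this style would build a job with `next_run_at: None`, call the recovery step once, and assert that `next_run_at` is now a timestamp in the future and that a second call changes nothing.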

Environment

hermes-agent commit: upstream/main as of 2026-05-02
Python 3.14, croniter installed
Encountered on a Docker deployment (Linux Debian, container running upstream image)

extent analysis

TL;DR

To fix the issues, update the cron/jobs.py and cron/scheduler.py files with the provided patches to handle next_run_at: null for recurring jobs and to tolerate non-dict origin values.

Guidance

Apply the patch: Update cron/jobs.py to recompute next_run_at for recurring jobs when it's missing, using the provided patch.
Add origin tolerance: Update cron/scheduler.py to tolerate non-dict origin values by adding an isinstance(origin, dict) guard.
Run new tests: Add and run the new tests (test_broken_cron_without_next_run_is_recovered, test_broken_interval_without_next_run_is_recovered, test_string_origin_is_tolerated, test_non_dict_origin_is_tolerated) to ensure the fixes work as expected.
Verify job execution: After applying the patches, verify that recurring jobs with next_run_at: null get a recomputed next_run_at and fire at their next scheduled tick, and that jobs with non-dict origin values no longer crash the scheduler.
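For the verification step, a small audit helper can flag both conditions before or after patching. This is a sketch under stated assumptions: `audit_jobs` is a hypothetical name, and it takes an already-loaded list of job dicts in the shape the repro passes to save_jobs(); how that list is read from jobs.json is left out because the on-disk layout is not shown in this report.

```python
from typing import Any, Dict, List

RECURRING_KINDS = ("cron", "interval")


def audit_jobs(jobs: List[Dict[str, Any]]) -> List[str]:
    """Return one warning per job condition described in this report."""
    warnings = []
    for job in jobs:
        label = job.get("name") or job.get("id", "<unknown>")

        # Bug 1: enabled recurring job with no next_run_at.
        schedule = job.get("schedule")
        kind = schedule.get("kind") if isinstance(schedule, dict) else None
        if (
            job.get("enabled", True)
            and kind in RECURRING_KINDS
            and not job.get("next_run_at")
        ):
            warnings.append(f"{label}: recurring job has no next_run_at")

        # Bug 2: origin present but not a dict.
        origin = job.get("origin")
        if origin is not None and not isinstance(origin, dict):
            warnings.append(
                f"{label}: origin is {type(origin).__name__}, expected dict or None"
            )
    return warnings
```

An empty result means neither condition is present in the list that was passed in; it says nothing about whether the jobs will otherwise run correctly.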

Example

The provided patch for cron/jobs.py demonstrates how to recompute next_run_at for recurring jobs:

if not recovered_next and kind in ("cron", "interval"):
    recovered_next = compute_next_run(schedule, now.isoformat())
    if recovered_next:
        recovery_kind = kind

Notes

The fixes assume that the compute_next_run function is correctly implemented and that the schedule dictionary contains the necessary information to compute the next run time.

Recommendation

Apply the provided patches to cron/jobs.py and cron/scheduler.py. Together they fix the silently skipped recurring jobs and the crash on non-dict origin values.

PR fix notes

PR #18735: fix: resolve 7 identified issues [automated]

PR #17246: fix: resolve 7 identified issues [automated]

PR #19013: fix(cron): treat non-dict origin as missing instead of crashing tick

PR #19066: fix(cron): recover null next_run_at jobs and tolerate non-dict origin
