hermes - ✅(Solved) Fix cron: jobs with null next_run_at silently skipped; non-dict origin crashes ticker [4 pull requests, 1 comments, 2 participants]

GitHub stats
NousResearch/hermes-agent#18722 (fetched 2026-05-03 04:54:42)
Comments: 1 · Participants: 2 · Timeline events: 9 · Reactions: 0
Timeline (top): cross-referenced ×4, labeled ×3, commented ×1, referenced ×1

Error Message

ERROR cron.scheduler: Error processing job <id>: 'str' object has no attribute 'get'

This is logged on every fire attempt. The job's last_status becomes error, with last_error: "'str' object has no attribute 'get'". mark_job_run does record the failure, but every subsequent fire crashes the same way until origin is fixed manually.

Root Cause

Two independent gaps in the cron subsystem, both triggered by scripts that wrote ~/.hermes/cron/jobs.json directly instead of going through add_job():

  • Bug 1 (next_run_at: null): cron/jobs.py get_due_jobs() only attempts recovery via _recoverable_oneshot_run_at(), which is hard-gated to kind: once. For cron and interval jobs the helper returns None, the loop hits continue, and the job is silently skipped on every tick.
  • Bug 2 (non-dict origin): cron/scheduler.py:127 _resolve_origin() calls origin.get("platform") on whatever job.get("origin") returns. Its guard, if not origin, only filters falsy values, so a non-empty string (here a migration script's free-form provenance tag such as "combined-digest-replaces-x-ai-and-email-triage-20260503") reaches .get() and raises AttributeError.

Fix Action

Fix / Workaround

  • Bug 1: when the schedule kind is cron or interval and next_run_at is missing, recompute it with compute_next_run(schedule, now.isoformat()) instead of skipping the job. The existing one-shot grace-window path is unchanged.
  • Bug 2: add an isinstance(origin, dict) guard so that any non-dict origin (string, list, int…) is treated the same as a missing origin.

PR fix notes

PR #18735: fix: resolve 7 identified issues [automated]

Description (problem / solution / changelog)

Summary

This automated PR resolves 7 identified upstream issues focusing on reliability, cross-platform behavior, and security hardening.

Resolved issues

  1. #18722 — cron jobs with next_run_at: null now recover for recurring schedules; scheduler now tolerates non-dict origin values.

    • Files: cron/jobs.py, cron/scheduler.py, tests/cron/test_jobs.py, tests/cron/test_scheduler.py
  2. #18705 — dotenv loading no longer overrides runtime-injected environment variables.

    • Files: hermes_cli/env_loader.py, tests/hermes_cli/test_env_loader.py
  3. #18659 — scan_skill_commands no longer clears cached commands before a successful rescan.

    • Files: agent/skill_commands.py
  4. #18675 — skill fallback file scan now skips heavy dependency directories and enforces a file cap.

    • Files: agent/skill_commands.py
  5. #18617 — context compressor now synchronizes threshold_percent correctly across model switch, fallback activation, and primary restoration.

    • Files: run_agent.py
  6. #18681 — the custom-provider /model path now carries provider credentials through model verification (already present in the upstream branch baseline; preserved in the final branch history).

    • Files: gateway/run.py (resolved in branch baseline)
  7. #18707 — request debug dumps are now redacted before writing to disk/stdout to avoid plaintext secret leakage.

    • Files: run_agent.py
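Item 4 (#18675) describes a fallback scan that skips heavy dependency directories and enforces a file cap, but the PR text does not show the code. The following is a minimal sketch of that general technique using os.walk pruning; the skip-list, the "*.md" filter, the function name scan_skill_files and the default cap are all assumptions, not the values used in agent/skill_commands.py.

```python
import os
from pathlib import Path
from typing import List

# Hypothetical skip-list; the real one in agent/skill_commands.py may differ.
HEAVY_DIRS = {"node_modules", ".git", ".venv", "venv", "__pycache__"}


def scan_skill_files(root: str, max_files: int = 500) -> List[Path]:
    """Collect *.md files under root, pruning heavy dirs and stopping at max_files."""
    found: List[Path] = []
    for dirpath, dirnames, filenames in os.walk(root):  # topdown=True by default
        # Rewriting dirnames in place stops os.walk from descending into them.
        dirnames[:] = sorted(d for d in dirnames if d not in HEAVY_DIRS)
        for name in sorted(filenames):
            if name.endswith(".md"):
                found.append(Path(dirpath) / name)
                if len(found) >= max_files:
                    return found  # file cap reached: stop scanning early
    return found
```

Pruning through dirnames[:] (rather than filtering the results afterwards) means the walk never enters node_modules at all, so the cost is bounded by the skip-list and the cap instead of by the size of the dependency tree.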

Validation

  • python3 -m py_compile run_agent.py cron/jobs.py cron/scheduler.py hermes_cli/env_loader.py agent/skill_commands.py gateway/run.py
  • pytest -n 0 tests/hermes_cli/test_env_loader.py tests/gateway/test_model_command_custom_providers.py tests/cron/test_jobs.py::TestGetDueJobs::test_broken_cron_without_next_run_is_recovered tests/cron/test_scheduler.py::TestResolveOrigin::test_non_dict_origin_tolerated tests/agent/test_skill_commands.py tests/agent/test_skill_commands_reload.py

Changed files

  • Dockerfile (modified, +3/-2)
  • acp_adapter/session.py (modified, +12/-0)
  • agent/auxiliary_client.py (modified, +280/-28)
  • agent/context_compressor.py (modified, +496/-52)
  • agent/skill_commands.py (modified, +18/-4)
  • agent/title_generator.py (modified, +2/-2)
  • agent/transports/chat_completions.py (modified, +14/-0)
  • agent/usage_pricing.py (modified, +4/-0)
  • cli-config.yaml.example (modified, +5/-0)
  • cli.py (modified, +27/-3)
  • cron/jobs.py (modified, +13/-2)
  • cron/scheduler.py (modified, +14/-4)
  • docker/entrypoint.sh (modified, +9/-1)
  • gateway/channel_directory.py (modified, +14/-4)
  • gateway/platforms/discord.py (modified, +33/-7)
  • gateway/platforms/email.py (modified, +12/-2)
  • gateway/platforms/feishu.py (modified, +34/-1)
  • gateway/platforms/qqbot/adapter.py (modified, +8/-2)
  • gateway/platforms/telegram_network.py (modified, +7/-2)
  • gateway/platforms/weixin.py (modified, +10/-1)
  • gateway/run.py (modified, +129/-32)
  • gateway/status.py (modified, +8/-1)
  • hermes_cli/auth.py (modified, +2/-2)
  • hermes_cli/commands.py (modified, +1/-1)
  • hermes_cli/config.py (modified, +271/-40)
  • hermes_cli/copilot_auth.py (modified, +1/-1)
  • hermes_cli/doctor.py (modified, +6/-1)
  • hermes_cli/env_loader.py (modified, +5/-4)
  • hermes_cli/gateway.py (modified, +16/-13)
  • hermes_cli/main.py (modified, +69/-3)
  • hermes_cli/memory_setup.py (modified, +1/-1)
  • hermes_cli/model_switch.py (modified, +6/-1)
  • hermes_cli/models.py (modified, +60/-2)
  • hermes_cli/profiles.py (modified, +16/-3)
  • hermes_cli/runtime_provider.py (modified, +16/-13)
  • hermes_cli/setup.py (modified, +8/-2)
  • hermes_cli/slack_cli.py (modified, +1/-2)
  • hermes_cli/status.py (modified, +17/-2)
  • hermes_cli/web_server.py (modified, +1/-1)
  • hermes_constants.py (modified, +16/-3)
  • model_tools.py (modified, +44/-13)
  • run_agent.py (modified, +408/-84)
  • setup-hermes.sh (modified, +23/-12)
  • skills/red-teaming/godmode/scripts/load_godmode.py (modified, +9/-8)
  • tests/agent/test_context_compressor.py (modified, +389/-0)
  • tests/agent/transports/test_chat_completions.py (modified, +11/-0)
  • tests/cron/test_jobs.py (modified, +26/-0)
  • tests/cron/test_scheduler.py (modified, +4/-0)
  • tests/gateway/test_compress_command.py (modified, +49/-0)
  • tests/hermes_cli/test_api_key_providers.py (modified, +5/-5)
  • tests/hermes_cli/test_config.py (modified, +17/-0)
  • tests/hermes_cli/test_env_loader.py (modified, +6/-6)
  • tests/run_agent/test_413_compression.py (modified, +81/-1)
  • tests/run_agent/test_compression_boundary_hook.py (modified, +42/-0)
  • tests/run_agent/test_run_agent.py (modified, +100/-13)
  • tests/tools/test_skill_manager_tool.py (modified, +270/-0)
  • tools/approval.py (modified, +1/-1)
  • tools/delegate_tool.py (modified, +4/-1)
  • tools/environments/docker.py (modified, +36/-5)
  • tools/environments/local.py (modified, +8/-1)
  • tools/file_operations.py (modified, +70/-67)
  • tools/file_tools.py (modified, +13/-2)
  • tools/send_message_tool.py (modified, +72/-2)
  • tools/session_search_tool.py (modified, +2/-2)
  • tools/skill_manager_tool.py (modified, +82/-21)
  • tools/skills_tool.py (modified, +13/-1)
  • tools/terminal_tool.py (modified, +6/-0)
  • tools/tool_backend_helpers.py (modified, +15/-5)
  • tools/tts_tool.py (modified, +27/-16)
  • tools/voice_mode.py (modified, +23/-10)
  • toolsets.py (modified, +14/-1)
  • tui_gateway/server.py (modified, +5/-3)
  • ui-tui/src/app/turnController.ts (modified, +1/-1)
  • ui-tui/src/app/useInputHandlers.ts (modified, +8/-3)
  • ui-tui/src/app/useSessionLifecycle.ts (modified, +1/-1)
  • ui-tui/src/gatewayTypes.ts (modified, +1/-0)
  • utils.py (modified, +9/-0)
  • uv.lock (modified, +161/-2)
  • website/docs/reference/environment-variables.md (modified, +1/-1)

PR #17246: fix: resolve 7 identified issues [automated]

Description (problem / solution / changelog)

Summary

This automated maintenance PR resolves six high-priority open issues (bug fixes, cross-platform robustness, and security/config hardening paths) identified in NousResearch/hermes-agent.

Note: The job target was 7 issues. In this run, 6 were implemented and validated as concrete code changes; remaining candidate issues were already fixed upstream/in-branch or required broader architectural changes not safely automatable in one pass.

Issues resolved

  1. #18757 - resolve_api_key_provider_credentials() misses ~/.hermes/.env for base_url_env_var

    • Replaced os.getenv(...) with get_env_value(...) in API-key provider credential resolution.
    • Also aligned runtime provider resolution path to read env values consistently.
  2. #18705 - load_hermes_dotenv() overrides runtime env vars (override=True)

    • Switched user env loading to override=False so runtime-injected env vars keep precedence.
    • Updated function docstring behavior notes accordingly.
  3. #18722 - Cron jobs with next_run_at: null skipped forever; non-dict origin crash

    • Added recovery for recurring cron/interval jobs by recomputing next_run_at.
    • Hardened _resolve_origin() to tolerate non-dict origin payloads.
  4. #18742 - Kimi/Moonshot via aggregators misses reasoning-mode detection

    • _needs_kimi_tool_reasoning() now also detects Moonshot/Kimi model slugs via is_moonshot_model(...).
  5. #18744 - constraints_path dead config (not loaded)

    • Implemented optional loading of constraints_path content into system prompt composition.
  6. #18778 - Gateway scoped lock stale detection no-op on macOS/Windows

    • Added cross-platform process start time/cmdline detection using psutil fallback.
    • Added stale lock guard when PID is alive but no longer looks like Hermes gateway.
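Item 2 switches user env loading to override=False so that variables already present in the process environment win over values from the .env file; python-dotenv's load_dotenv() exposes the same override flag. The sketch below models only that precedence rule with plain dicts. merge_env is a hypothetical helper, not part of hermes_cli/env_loader.py, and the variable names are illustrative.

```python
from typing import Dict


def merge_env(runtime: Dict[str, str], dotenv: Dict[str, str], override: bool) -> Dict[str, str]:
    """Return the effective environment after applying .env values.

    override=True:  .env values replace runtime values (the behaviour #18705 reports).
    override=False: runtime values win; .env only fills in keys that are missing.
    """
    merged = dict(runtime)
    for key, value in dotenv.items():
        if override or key not in merged:
            merged[key] = value
    return merged


# Illustrative names: one value injected by the container, one only in .env.
runtime = {"API_BASE_URL": "http://injected-by-container"}
dotenv = {"API_BASE_URL": "http://from-dotenv", "HERMES_HOME": "/data"}
print(merge_env(runtime, dotenv, override=False))
```

With override=False the container-injected API_BASE_URL survives and HERMES_HOME is still picked up from the file, which is the precedence the fix aims for.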

Files modified

  • hermes_cli/auth.py
  • hermes_cli/runtime_provider.py
  • hermes_cli/env_loader.py
  • cron/jobs.py
  • cron/scheduler.py
  • run_agent.py
  • gateway/status.py

Commit list

  • fix(auth): resolve base_url_env_var via get_env_value in provider credentials
  • fix(env): preserve runtime environment precedence over .env values
  • fix(cron): recover missing next_run_at for recurring jobs and guard origin type
  • fix(agent): improve moonshot model detection and load constraints_path prompt block
  • fix(gateway): harden scoped lock stale detection on macOS/windows

Changed files

  • Dockerfile (modified, +3/-2)
  • acp_adapter/session.py (modified, +12/-0)
  • agent/auxiliary_client.py (modified, +280/-28)
  • agent/context_compressor.py (modified, +496/-52)
  • agent/title_generator.py (modified, +2/-2)
  • agent/transports/chat_completions.py (modified, +14/-0)
  • agent/usage_pricing.py (modified, +4/-0)
  • cli-config.yaml.example (modified, +5/-0)
  • cli.py (modified, +27/-3)
  • cron/jobs.py (modified, +10/-2)
  • cron/scheduler.py (modified, +14/-4)
  • docker/entrypoint.sh (modified, +9/-1)
  • gateway/channel_directory.py (modified, +14/-4)
  • gateway/platforms/discord.py (modified, +33/-7)
  • gateway/platforms/email.py (modified, +12/-2)
  • gateway/platforms/feishu.py (modified, +34/-1)
  • gateway/platforms/qqbot/adapter.py (modified, +8/-2)
  • gateway/platforms/telegram_network.py (modified, +7/-2)
  • gateway/platforms/weixin.py (modified, +10/-1)
  • gateway/run.py (modified, +129/-32)
  • gateway/status.py (modified, +37/-2)
  • hermes_cli/auth.py (modified, +4/-4)
  • hermes_cli/commands.py (modified, +1/-1)
  • hermes_cli/config.py (modified, +271/-40)
  • hermes_cli/copilot_auth.py (modified, +1/-1)
  • hermes_cli/doctor.py (modified, +6/-1)
  • hermes_cli/env_loader.py (modified, +5/-4)
  • hermes_cli/gateway.py (modified, +16/-13)
  • hermes_cli/main.py (modified, +69/-3)
  • hermes_cli/memory_setup.py (modified, +1/-1)
  • hermes_cli/model_switch.py (modified, +6/-1)
  • hermes_cli/models.py (modified, +60/-2)
  • hermes_cli/profiles.py (modified, +16/-3)
  • hermes_cli/runtime_provider.py (modified, +17/-14)
  • hermes_cli/setup.py (modified, +8/-2)
  • hermes_cli/slack_cli.py (modified, +1/-2)
  • hermes_cli/status.py (modified, +17/-2)
  • hermes_cli/web_server.py (modified, +1/-1)
  • hermes_constants.py (modified, +16/-3)
  • model_tools.py (modified, +44/-13)
  • run_agent.py (modified, +413/-82)
  • setup-hermes.sh (modified, +23/-12)
  • skills/red-teaming/godmode/scripts/load_godmode.py (modified, +9/-8)
  • tests/agent/test_context_compressor.py (modified, +389/-0)
  • tests/agent/transports/test_chat_completions.py (modified, +11/-0)
  • tests/gateway/test_compress_command.py (modified, +49/-0)
  • tests/hermes_cli/test_api_key_providers.py (modified, +5/-5)
  • tests/hermes_cli/test_config.py (modified, +17/-0)
  • tests/run_agent/test_413_compression.py (modified, +81/-1)
  • tests/run_agent/test_compression_boundary_hook.py (modified, +42/-0)
  • tests/run_agent/test_run_agent.py (modified, +100/-13)
  • tests/tools/test_skill_manager_tool.py (modified, +270/-0)
  • tools/approval.py (modified, +1/-1)
  • tools/delegate_tool.py (modified, +4/-1)
  • tools/environments/docker.py (modified, +36/-5)
  • tools/environments/local.py (modified, +8/-1)
  • tools/file_operations.py (modified, +70/-67)
  • tools/file_tools.py (modified, +13/-2)
  • tools/send_message_tool.py (modified, +72/-2)
  • tools/session_search_tool.py (modified, +2/-2)
  • tools/skill_manager_tool.py (modified, +82/-21)
  • tools/skills_tool.py (modified, +13/-1)
  • tools/terminal_tool.py (modified, +6/-0)
  • tools/tool_backend_helpers.py (modified, +15/-5)
  • tools/tts_tool.py (modified, +27/-16)
  • tools/voice_mode.py (modified, +23/-10)
  • toolsets.py (modified, +14/-1)
  • tui_gateway/server.py (modified, +5/-3)
  • ui-tui/src/app/turnController.ts (modified, +1/-1)
  • ui-tui/src/app/useInputHandlers.ts (modified, +8/-3)
  • ui-tui/src/app/useSessionLifecycle.ts (modified, +1/-1)
  • ui-tui/src/gatewayTypes.ts (modified, +1/-0)
  • utils.py (modified, +9/-0)
  • uv.lock (modified, +161/-2)
  • website/docs/reference/environment-variables.md (modified, +1/-1)

PR #19013: fix(cron): treat non-dict origin as missing instead of crashing tick

Description (problem / solution / changelog)

What does this PR do?

_resolve_origin called origin.get('platform') on whatever job.get('origin') returned. The leading if not origin: return None short-circuited the falsy cases (None, empty dict, "") but a non-empty string passed that guard and crashed with AttributeError: 'str' object has no attribute 'get' on every fire attempt. Observed in the wild after a migration script tagged jobs with free-form provenance strings (e.g. "combined-digest-replaces-x-and-y-20260503").

mark_job_run did record last_status: error, last_error: "'str' object has no attribute 'get'" once, but the next tick re-loaded the same poisoned origin and crashed identically. The job stayed enabled and accumulated cascading errors until origin was patched manually.

Replace the falsy guard with isinstance(origin, dict). Non-dict origins (string, int, list, tuple, float — anything that survived a hand-edit, JSON-script write, or migration) are now treated the same as a missing origin: the job continues with deliver falling back through its normal home-channel path instead of crashing the scheduler loop.
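To see exactly which values slip past the old falsy guard, the snippet below runs the pre-patch check and the isinstance() check side by side over the shapes the PR lists. old_resolve and new_resolve are simplified local copies of the _resolve_origin() logic from the issue's patch, not imports from cron/scheduler.py; the platform and chat_id values are illustrative.

```python
from typing import Any, Optional


def old_resolve(job: dict) -> Optional[dict]:
    origin = job.get("origin")
    if not origin:  # only falsy values (None, "", {}) are filtered out
        return None
    if origin.get("platform") and origin.get("chat_id"):
        return origin
    return None


def new_resolve(job: dict) -> Optional[dict]:
    origin = job.get("origin")
    if not isinstance(origin, dict):  # None, str, int, list, ... all treated as missing
        return None
    if origin.get("platform") and origin.get("chat_id"):
        return origin
    return None


def outcome(fn, value: Any) -> str:
    """Describe what fn does with a job whose origin is value."""
    try:
        return repr(fn({"origin": value}))
    except AttributeError as exc:
        return f"AttributeError: {exc}"


samples = [None, "", "migration-tag", 3, 1.5, ["x"], ("x",),
           {"platform": "telegram", "chat_id": "42"}]
for value in samples:
    print(f"{value!r:45} old={outcome(old_resolve, value):55} new={outcome(new_resolve, value)}")
```

Every truthy non-dict value crashes the old version, while the new version returns None for all of them and still passes a well-formed dict through unchanged.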

Scope: the non-dict-origin crash sub-bug from #18722. The next_run_at: null recurring-job recovery (the second sub-bug) is independently addressed by the in-flight #18825, which extends the never-silently-disable defense from #16265 to get_due_jobs(). Either one can land first.

Related Issue

Fixes #18722 (non-dict origin crash; recurring-job recovery covered by #18825)

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • cron/scheduler.py — _resolve_origin guards isinstance(origin, dict) before .get() calls; updated docstring with the production trigger pattern.
  • tests/cron/test_scheduler.py — TestResolveOrigin.test_non_dict_origin_returns_none_instead_of_crashing parametrises over the non-dict shapes (str, int, list, tuple, float).

How to Test

  1. Edit `~/.hermes/cron/jobs.json` and set a job's origin to a string like "my-migration-tag".
  2. Restart the gateway / wait for the cron tick.
  3. Before this PR: every fire crashes with AttributeError; after: job runs with default delivery routing.
  4. `pytest tests/cron/test_scheduler.py::TestResolveOrigin -q` → 10/10 pass.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits
  • I searched for existing PRs (note: #18825 covers the sibling sub-bug; explicitly out of scope here)
  • My PR contains only changes related to this fix
  • I've run pytest tests/ -q and the touched suite passes
  • I've added tests for my changes
  • I've tested on my platform: macOS 15.x

Documentation & Housekeeping

  • I've updated relevant documentation — N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture — N/A
  • I've considered cross-platform impact — N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — N/A

Changed files

  • cron/scheduler.py (modified, +12/-2)
  • tests/cron/test_scheduler.py (modified, +23/-0)

PR #19066: fix(cron): recover null next_run_at jobs and tolerate non-dict origin

Description (problem / solution / changelog)

Related work

References #18735 (open competing fix from an automated bulk PR touching 79 files). This PR is a focused single-issue contribution and adds the missing interval-recovery test variant.

What does this PR do?

Fixes two robustness gaps in the cron subsystem.

  • Bug 1 — silent skip of recoverable jobs. get_due_jobs() skipped any cron/interval job whose next_run_at was null (which happens when users hand-edit jobs.json, on partial migrations, or after a crash mid-write). Those jobs would never run again until the user re-saved them. The fix recomputes next_run_at via compute_next_run() for cron/interval schedules with a missing timestamp, persists the recovered value, and continues evaluation in the same tick.
  • Bug 2 — non-dict origin crashes the ticker. _resolve_origin() called origin.get(...) without checking the type, so a string or any other non-dict origin (legacy serialized jobs, hand-edited entries) raised AttributeError and aborted the ticker for the rest of the cycle. The fix guards with isinstance(origin, dict) before key access. Pass-D codex review caught a related miss in _deliver_result(), which was doing its own raw dict check; that path now routes through _resolve_origin() so the tolerance is consistent across the file.
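The _deliver_result() change is a single-choke-point pattern: every consumer of job["origin"] goes through one tolerant resolver instead of re-checking the raw value itself. A minimal sketch follows, in which resolve_origin, delivery_target and the home_channel fallback are hypothetical stand-ins for the scheduler's real helpers and delivery routing.

```python
from typing import Optional, Tuple


def resolve_origin(job: dict) -> Optional[dict]:
    """Single place that validates job["origin"]; anything malformed becomes None."""
    origin = job.get("origin")
    if isinstance(origin, dict) and origin.get("platform") and origin.get("chat_id"):
        return origin
    return None


def delivery_target(job: dict, home_channel: Tuple[str, str]) -> Tuple[str, str]:
    """Pick (platform, chat_id) for a result, falling back to a home channel.

    Because this calls resolve_origin() instead of inspecting job["origin"]
    directly, a string origin cannot crash delivery any more than it can
    crash the tick.
    """
    origin = resolve_origin(job)
    if origin is None:
        return home_channel
    return origin["platform"], str(origin["chat_id"])
```

Keeping the type check in one function means a future origin shape only has to be handled once, rather than in every caller that happens to read the field.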

Related Issue

Fixes #18722

Type of Change

  • Bug fix (non-breaking change that fixes an issue)

Changes Made

  • cron/jobs.py — get_due_jobs() recovers cron/interval jobs whose next_run_at is None by recomputing it via compute_next_run() instead of returning early.
  • cron/scheduler.py — _resolve_origin() checks isinstance(origin, dict) before subscripting; _deliver_result() now delegates origin resolution to _resolve_origin() so non-dict origins do not crash result delivery.
  • tests/cron/test_jobs.py — new TestGetDueJobs::test_broken_cron_without_next_run_is_recovered and test_broken_interval_without_next_run_is_recovered cover both schedule kinds.
  • tests/cron/test_scheduler.py — new TestResolveOrigin::test_string_origin_is_tolerated and test_non_dict_origin_is_tolerated cover the type-guard.

How to Test

  1. Check out this branch and install dev deps (pip install -e '.[dev]').
  2. Run the four new regression tests directly:
    pytest -o "addopts=" \
      tests/cron/test_jobs.py::TestGetDueJobs::test_broken_cron_without_next_run_is_recovered \
      tests/cron/test_jobs.py::TestGetDueJobs::test_broken_interval_without_next_run_is_recovered \
      tests/cron/test_scheduler.py::TestResolveOrigin::test_string_origin_is_tolerated \
      tests/cron/test_scheduler.py::TestResolveOrigin::test_non_dict_origin_is_tolerated
  3. Run the cron subsystem suite to confirm no regressions: pytest tests/cron/ -q.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(cron): ..., test(cron): ...)
  • I searched for existing PRs to make sure this isn't a duplicate (see "Related work" above)
  • My PR contains only changes related to this fix (no unrelated commits)
  • I've run the cron subsystem tests (pytest tests/cron/ -q) and all 289 tests pass, including the 4 new regression tests
  • I've added tests for my changes
  • I've tested on my platform: Ubuntu 24.04

Documentation & Housekeeping

  • I've updated relevant documentation — N/A (internal robustness fix, no public surface changed)
  • I've updated cli-config.yaml.example if I added/changed config keys — N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
  • I've considered cross-platform impact — N/A (no OS-specific code paths touched)
  • I've updated tool descriptions/schemas if I changed tool behavior — N/A

Changed files

  • cron/jobs.py (modified, +19/-2)
  • cron/scheduler.py (modified, +8/-3)
  • tests/cron/test_jobs.py (modified, +68/-0)
  • tests/cron/test_scheduler.py (modified, +8/-0)

Original issue report

Two related robustness gaps in the cron subsystem became visible when ops scripts wrote directly into ~/.hermes/cron/jobs.json instead of going through add_job() / dashboard / API. Both manifested in the same incident; reporting them together since the fix is small and shares one PR.

Bug 1 — kind: cron / kind: interval jobs with next_run_at: null are silently skipped forever

Symptom: Job appears in jobs.json with enabled: true, state: scheduled, next_run_at: null, last_run_at: null indefinitely. Other crons fire normally. No log entry indicates it's being skipped.

Cause: cron/jobs.py get_due_jobs() (around the loop at L794–L834) only attempts recovery via _recoverable_oneshot_run_at(), which is hard-gated to kind: once. For recurring kinds, the helper returns None → continue → job is silently skipped on every tick. The loader assumes the only path into jobs.json is add_job(), which populates next_run_at via compute_next_run() at line 526. Any external writer (jq, a migration script, the dashboard's REST patch endpoint that forgets to set the field, etc.) that creates a recurring entry without that field leaves the job unfireable.

Repro:

from cron.jobs import save_jobs, get_due_jobs, get_job
save_jobs([{
    "id": "repro",
    "name": "AI Daily Digest",
    "prompt": "...",
    "schedule": {"kind": "cron", "expr": "0 12 * * *", "display": "0 12 * * *"},
    "schedule_display": "0 12 * * *",
    "repeat": {"times": None, "completed": 0},
    "enabled": True,
    "state": "scheduled",
    "next_run_at": None, "last_run_at": None, "last_status": None,
    "last_error": None, "deliver": "local", "origin": None,
}])
get_due_jobs()  # returns [], next_run_at still None — and stays None forever

Fix: when the schedule is cron / interval and next_run_at is missing, recompute via compute_next_run(schedule, now.isoformat()) instead of returning None. The existing one-shot grace-window path is untouched. Patch + tests below.
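A self-contained model of the proposed recovery branch is below. It keeps the issue's control flow for recurring jobs (recompute for cron/interval, otherwise skip) and leaves out the one-shot grace-window helper. compute_next_run here is a hypothetical stub that only understands an interval schedule shaped like {"kind": "interval", "minutes": N}; the real compute_next_run() in cron/jobs.py also handles cron expressions and may use a different schedule shape. recover_missing_next_run is likewise an illustrative name.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional


def compute_next_run(schedule: Dict[str, Any], base_iso: str) -> Optional[str]:
    # Hypothetical stub: interval schedules only, measured in minutes.
    if schedule.get("kind") == "interval" and schedule.get("minutes"):
        base = datetime.fromisoformat(base_iso)
        return (base + timedelta(minutes=schedule["minutes"])).isoformat()
    return None


def recover_missing_next_run(jobs: List[Dict[str, Any]], now: datetime) -> List[str]:
    """Fill next_run_at for enabled recurring jobs that lack it; return recovered ids."""
    recovered = []
    for job in jobs:
        if not job.get("enabled", True) or job.get("next_run_at"):
            continue
        schedule = job.get("schedule") or {}
        # The real loop tries one-shot grace-window recovery first; omitted here.
        if schedule.get("kind") in ("cron", "interval"):
            next_run = compute_next_run(schedule, now.isoformat())
            if next_run:
                job["next_run_at"] = next_run
                recovered.append(job["id"])
    return recovered


now = datetime(2026, 5, 3, 12, 0, tzinfo=timezone.utc)
jobs = [
    {"id": "digest", "schedule": {"kind": "interval", "minutes": 30}, "next_run_at": None},
    {"id": "paused", "enabled": False, "schedule": {"kind": "interval", "minutes": 5}},
]
print(recover_missing_next_run(jobs, now), jobs[0]["next_run_at"])
```

The recovered timestamp is computed from now, so the job is not fired immediately; it is picked up at its next scheduled occurrence, which matches the patch comment's "pick up at their next scheduled tick".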

Bug 2 — _resolve_origin crashes with 'str' object has no attribute 'get' when origin is a string

Symptom:

ERROR cron.scheduler: Error processing job <id>: 'str' object has no attribute 'get'

…on every fire attempt. Job's last_status: error, last_error: "'str' object has no attribute 'get'". mark_job_run does record the failure, but every subsequent fire crashes the same way until origin is fixed manually.

Cause: cron/scheduler.py:127 _resolve_origin() does origin.get("platform") on whatever job.get("origin") returns. The function checks if not origin (falsy short-circuit), but a non-empty string passes that guard and then hits AttributeError. In practice this happened because a migration script tagged jobs with a free-form provenance string (e.g. "combined-digest-replaces-x-ai-and-email-triage-20260503") instead of either null or {platform, chat_id}.

Fix: add isinstance(origin, dict) guard; non-dict origin (string, list, int…) is treated the same as missing origin. Patch + tests below.


Patch

--- a/cron/jobs.py
+++ b/cron/jobs.py
@@ -795,17 +795,32 @@ def get_due_jobs() -> List[Dict[str, Any]]:
         if not job.get("enabled", True):
             continue

         next_run = job.get("next_run_at")
         if not next_run:
+            schedule = job.get("schedule", {})
+            kind = schedule.get("kind")
+
+            # One-shot jobs use a small grace window via the dedicated helper.
             recovered_next = _recoverable_oneshot_run_at(
-                job.get("schedule", {}),
+                schedule,
                 now,
                 last_run_at=job.get("last_run_at"),
             )
+            recovery_kind = "one-shot" if recovered_next else None
+
+            # Recurring jobs (cron / interval) reach here only when something
+            # — typically a direct jobs.json edit that bypassed add_job() —
+            # left next_run_at unset.  Without this branch, such jobs are
+            # silently skipped forever; recompute next_run_at from the
+            # schedule so they pick up at their next scheduled tick.
+            if not recovered_next and kind in ("cron", "interval"):
+                recovered_next = compute_next_run(schedule, now.isoformat())
+                if recovered_next:
+                    recovery_kind = kind
+
             if not recovered_next:
                 continue

             job["next_run_at"] = recovered_next
             next_run = recovered_next
             logger.info(
-                "Job '%s' had no next_run_at; recovering one-shot run at %s",
+                "Job '%s' had no next_run_at; recovering %s run at %s",
                 job.get("name", job["id"]),
+                recovery_kind,
                 recovered_next,
             )
--- a/cron/scheduler.py
+++ b/cron/scheduler.py
@@ -123,11 +123,18 @@ class _OutboundContextStub:

 def _resolve_origin(job: dict) -> Optional[dict]:
-    """Extract origin info from a job, preserving any extra routing metadata."""
+    """Extract origin info from a job, preserving any extra routing metadata.
+
+    ``origin`` is expected to be either ``None`` or a dict shaped like
+    ``{"platform": ..., "chat_id": ..., "thread_id": ...}``.  Tolerate
+    other shapes (most commonly: a free-form string identifier left by
+    a script that wrote jobs.json directly) by returning ``None`` rather
+    than crashing the whole tick with ``AttributeError``.
+    """
     origin = job.get("origin")
-    if not origin:
+    if not origin or not isinstance(origin, dict):
         return None
     platform = origin.get("platform")
     chat_id = origin.get("chat_id")
     if platform and chat_id:
         return origin
     return None

New tests

tests/cron/test_jobs.py::TestGetDueJobs:

  • test_broken_cron_without_next_run_is_recovered — cron-kind null next_run_at gets recomputed
  • test_broken_interval_without_next_run_is_recovered — same for interval

tests/cron/test_scheduler.py::TestResolveOrigin:

  • test_string_origin_is_tolerated — string origin returns None, no crash
  • test_non_dict_origin_is_tolerated — list/int origin returns None

All 289 existing cron tests still pass sequentially. (Two parallel-mode flakes under xdist are pre-existing and unrelated; same tests pass in isolation.)

Environment

  • hermes-agent commit: upstream/main as of 2026-05-02
  • Python 3.14, croniter installed
  • Encountered on a Docker deployment (Linux Debian, container running upstream image)

extent analysis

TL;DR

To fix the issues, update the cron/jobs.py and cron/scheduler.py files with the provided patches to handle next_run_at: null for recurring jobs and to tolerate non-dict origin values.

Guidance

  1. Apply the patch: Update cron/jobs.py to recompute next_run_at for recurring jobs when it's missing, using the provided patch.
  2. Add origin tolerance: Update cron/scheduler.py to tolerate non-dict origin values by adding an isinstance(origin, dict) guard.
  3. Run new tests: Add and run the new tests (test_broken_cron_without_next_run_is_recovered, test_broken_interval_without_next_run_is_recovered, test_string_origin_is_tolerated, test_non_dict_origin_is_tolerated) to ensure the fixes work as expected.
  4. Verify job execution: after applying the patches, confirm that recurring jobs with next_run_at: null get a recomputed next_run_at and are picked up at their next scheduled occurrence, and that jobs with non-dict origin values no longer crash the scheduler.
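For step 4, it can help to list the entries that would hit either bug, both before and after patching. The helper below is a hypothetical audit over already-loaded job dicts (loading jobs.json is left out because its exact on-disk layout is not shown here); it flags enabled recurring jobs without next_run_at and any origin that is neither null nor a dict.

```python
from typing import Any, Dict, List


def audit_jobs(jobs: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems for jobs affected by issue #18722."""
    problems = []
    for job in jobs:
        label = job.get("name") or job.get("id", "<no id>")
        kind = (job.get("schedule") or {}).get("kind")
        # Bug 1: recurring job that the unpatched loop would skip forever.
        if job.get("enabled", True) and kind in ("cron", "interval") and not job.get("next_run_at"):
            problems.append(f"{label}: recurring ({kind}) job has no next_run_at")
        # Bug 2: origin shape that crashes the unpatched _resolve_origin().
        origin = job.get("origin")
        if origin is not None and not isinstance(origin, dict):
            problems.append(f"{label}: origin is {type(origin).__name__}, expected dict or null")
    return problems
```

On an unpatched install, an empty result means neither bug can trigger; after patching, remaining Bug 1 entries should disappear on the next tick as next_run_at is recomputed.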

Example

The provided patch for cron/jobs.py demonstrates how to recompute next_run_at for recurring jobs:

if not recovered_next and kind in ("cron", "interval"):
    recovered_next = compute_next_run(schedule, now.isoformat())
    if recovered_next:
        recovery_kind = kind

Notes

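The fixes assume that compute_next_run() is correctly implemented and that the schedule dictionary contains enough information to compute the next run time. If compute_next_run() returns nothing for a given schedule, the patched loop still reaches continue and skips that job silently.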

Recommendation

Apply the workaround by updating the cron/jobs.py and cron/scheduler.py files with the provided patches, as this will fix the issues with recurring jobs and non-dict origin values.
