hermes - [Feature]: First-class Loop Contract - declarative budget / stop / refresh / scope for cron-backed agent loops

hermes · 2026-05-07 11:43:52


Problem or Use Case

Boris Cherny (Head of Claude Code) has been vocal in the last week on Sequoia's Training Data and "Why Coding Is Solved" podcasts that his personal workflow has shifted from one-shot agent sessions to dozens of persistent supervised loops:

"I just have a bunch of loops running at any time. I sort of feel like loops are the future at this point."

His production loops are mundane but load-bearing: PR babysitters (auto-rebase + CI fixes), CI health monitors (flaky test detection), Twitter/feedback clustering every 30 min. Anthropic has since shipped routines — a server-side equivalent so loops keep running with the laptop closed.

The Hermes equivalent already exists in pieces:

hermes cron + cronjob tool schedule recurring agent jobs
context_from=[job_id] chains job outputs
--no-agent --script supports pure watchdogs
webhook subscribe covers event triggers
delegate_task enables concurrent sub-agents
Gateway keeps jobs alive after a laptop sleeps (Hermes had this before Anthropic's routines)

What's missing is the contract layer. Right now every loop's guardrails (scope, budget, stop conditions, reporting) live in free-form prompt text. This surfaces two failure modes well-documented in the broader agent-loop discourse (and already hit in Hermes workflows):

Loops keep spending. A one-shot agent fails and stops; a cron-backed agent fails and comes back in 15 minutes to fail the same way. There is no per-job circuit breaker such as max_attempts, max_spend, or max_consecutive_failures.
Loops act on stale assumptions. There is no enforced "refresh state before acting" step; prompt-level instructions are best-effort.

A third weakness — scope isolation — is partly handled today via workdir, but there's no declarative way to say "this loop is only allowed to touch files matching X" or "only open PRs, never push to main".

This issue proposes promoting the loop contract from prompt folklore to a first-class primitive.

Proposed Solution

Add optional declarative fields to hermes cron create (and the matching cronjob tool schema) that together form a Loop Contract:

hermes cron create "15m" "Babysit PRs labeled codex-watch" \
  --name pr-babysitter \
  --skill pr-babysitter \
  --scope 'pull_requests.labels=codex-watch,exclude=main' \
  --budget 'max_attempts_per_target=1,max_runtime_min=20,max_files_changed=8,max_consecutive_failures=2' \
  --stop 'same_failure_seen_twice,merge_conflict_requires_human,tests_fail_after_one_fix' \
  --report 'destination=pr-comment,fields=summary,action_taken,tests_run,remaining_blocker' \
  --refresh 'git_fetch_before_run,read_latest_ci'

Or equivalently, a declarative YAML contract next to jobs.json:

# ~/.hermes/cron/contracts/pr-babysitter.yaml
name: pr-babysitter
trigger:
  every: 15m
scope:
  include: { pull_requests: { labels: ["codex-watch"] } }
  exclude: { branches: [main] }
  writable_paths: ["**"]          # or specific globs
budget:
  max_attempts_per_target: 1
  max_runtime_min: 20
  max_files_changed: 8
  max_consecutive_failures: 2     # auto-pause after N runs fail
  max_spend_usd: 5.0              # optional, requires cost accounting
stop:
  - same_failure_seen_twice
  - merge_conflict_requires_human
  - tests_fail_after_one_fix
refresh:
  - git_fetch_before_run
  - read_latest_ci
report:
  destination: pr-comment          # or telegram, discord, local-file, etc.
  fields: [summary, action_taken, tests_run, remaining_blocker]
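The comma-separated key=value form that --budget takes in the CLI example above could be parsed along these lines. This is a minimal sketch, not Hermes code: parse_budget_spec and KNOWN_BUDGET_KEYS are hypothetical names, and the assumption that every CLI budget value is a non-negative integer comes only from the example values shown in this proposal.

```python
from typing import Dict

# Hypothetical allowlist: the budget keys that appear in the CLI example.
KNOWN_BUDGET_KEYS = {
    "max_attempts_per_target",
    "max_runtime_min",
    "max_files_changed",
    "max_consecutive_failures",
}


def parse_budget_spec(spec: str) -> Dict[str, int]:
    """Parse 'key=value,key=value' into a dict of integer limits."""
    budget: Dict[str, int] = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray or trailing commas
        key, sep, raw = item.partition("=")
        key = key.strip()
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        if key not in KNOWN_BUDGET_KEYS:
            raise ValueError(f"unknown budget key: {key!r}")
        value = int(raw)  # raises ValueError on non-integer input
        if value < 0:
            raise ValueError(f"{key} must be non-negative, got {value}")
        budget[key] = value
    return budget
```

Rejecting unknown keys at parse time means a typo such as max_runtime_mins fails when the job is created rather than silently leaving the loop without a cap.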

Minimum-viable implementation

These four circuit breakers — no new runtime architecture needed — already unlock most of the value:

max_consecutive_failures — auto-pause the job (existing hermes cron pause path) after N runs fail. Hook point: cron/scheduler.py tracks last_runs; add a failure streak and call pause_job() when it trips.
max_runtime_min per run — hard wall-clock cap independent of LLM iteration budget. Hook point: wrap the agent call with asyncio.wait_for() or equivalent timeout.
stop conditions as a per-job system-prompt prefix — auto-inject as structured guidance (not free-form "remember to stop if..."). Hook point: cron/jobs.py → job_spec → prompt builder.
refresh directives — auto-prepend git fetch origin / git pull --ff-only / equivalent before the agent gets control, so "stale assumptions" can't compound. Hook point: scheduler pre-run hook.
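The first two breakers above (the failure streak with auto-pause, and the per-run wall-clock cap) can be sketched together. This is a sketch under stated assumptions rather than the actual scheduler: LoopState and run_once are hypothetical names, agent_call stands in for whatever coroutine the scheduler uses to run the agent, and setting paused = True stands in for the existing pause path.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class LoopState:
    """Per-job bookkeeping (hypothetical; not the real scheduler state)."""
    max_consecutive_failures: int
    max_runtime_min: float
    failure_streak: int = 0
    paused: bool = False


async def run_once(state: LoopState,
                   agent_call: Callable[[], Awaitable[None]]) -> str:
    """Run one iteration under a wall-clock cap and update the streak."""
    if state.paused:
        return "skipped"
    try:
        # Hard cap, independent of any LLM iteration budget.
        await asyncio.wait_for(agent_call(),
                               timeout=state.max_runtime_min * 60)
    except asyncio.TimeoutError:
        outcome = "timeout"
    except Exception:
        outcome = "failed"
    else:
        outcome = "ok"

    if outcome == "ok":
        state.failure_streak = 0  # any success resets the streak
    else:
        state.failure_streak += 1
        if state.failure_streak >= state.max_consecutive_failures:
            state.paused = True  # stand-in for calling the pause path
    return outcome
```

Counting a timeout as a failure is a design choice in this sketch: a loop that repeatedly hits its wall-clock cap trips the same breaker as one that repeatedly raises.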

Scope enforcement (writable_paths) is the heaviest piece and can ship later — it needs a filesystem sandbox, which overlaps with #8164 (authorization certificate contract for unattended cron).

CLI ergonomics

Ship the three Boris-canonical loops as starter contracts via hermes cron preset pr-babysitter, hermes cron preset ci-health, and hermes cron preset feedback-cluster. Users tweak from there.

Alternatives Considered

Leave it as prompt folklore + skills. What we have today. Works for people who build the muscle, but new users hit the "loop keeps spending" and "loop acts on yesterday's plan" traps. Also: the contract pattern is valuable enough that every serious team will re-invent it. Better to ship a canonical shape.
Build it as a skill only (e.g. loop-contract skill the user loads into every cron prompt). This is what I'll probably do next week regardless, and it covers the "stop conditions as prompt prefix" slice — but it can't enforce max_consecutive_failures or max_runtime_min, which must live in the scheduler. The skill-only version is strictly weaker.
Wait for Anthropic routines in Claude SDK. Even if that ships to third-party harnesses, it won't have Hermes' multi-platform gateway, context_from chaining, or per-job provider pinning. Hermes is already ahead on the capability layer — we should also own the contract layer.
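For the "stop conditions as prompt prefix" slice that both the skill-only alternative and the core proposal share, the structured guidance could be rendered as below. The block markers and wording are assumptions made for illustration; build_stop_prefix is a hypothetical helper, not an existing Hermes function.

```python
from typing import Iterable


def build_stop_prefix(stop_conditions: Iterable[str]) -> str:
    """Render stop conditions as a structured block to prepend to a prompt."""
    conditions = [c.strip() for c in stop_conditions if c.strip()]
    if not conditions:
        return ""  # no contract, no prefix: the prompt stays unchanged
    lines = [
        "[LOOP CONTRACT: STOP CONDITIONS]",
        "Stop and report instead of continuing if any of these holds:",
    ]
    lines += [f"- {c}" for c in conditions]
    lines.append("[END LOOP CONTRACT]")
    return "\n".join(lines) + "\n\n"


prompt = build_stop_prefix(
    ["same_failure_seen_twice", "merge_conflict_requires_human"]
) + "Babysit PRs labeled codex-watch"
```

Generating the block from the contract keeps the wording identical across jobs, but it remains guidance to the model; only the scheduler-side limits are actually enforced.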

Related work already in this repo

#491 — Webhook-Triggered Agent Sessions (same "loops + events = one durable queue" shape; inbound half of the hooks system)
#404 — Symphony-Style Autonomous Issue Resolution (the WORKFLOW.md file in that proposal is a richer instance of the same contract pattern)
#492 — Autonomous Skill Templates (OpenFang Hands) — the "skill + schedule + tool allowlist as one unit" angle
#5712 — Inject cron results into live gateway chat (fixes the "main agent doesn't know its loops did anything" half of the reporting story)
#9645 — Proactive Check-Ins (budget-aware / rate-limited cron at the messaging layer)
#8164 — MVP authorization certificate contract for unattended cron (overlaps on scope/permission enforcement)

None of these cover the unified loop contract. Each addresses one axis (trigger / permissions / reporting / workflow) in isolation. This proposal is the missing glue.

Scope

Medium. Core changes are localized to:

cron/jobs.py — Job dataclass gets budget, stop, refresh, scope, report optional fields (backwards compatible; existing jobs serialize as today with these nulled)
cron/scheduler.py — failure-streak tracking, wall-clock timeout, pre-run refresh hooks, auto-pause on streak limit
hermes_cli/cron.py — new flags; preset subcommand loads bundled contract YAMLs
tools/cronjob_tools.py — matching fields on the cronjob tool schema
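The backwards-compatibility claim for the Job dataclass can be illustrated with a trimmed-down stand-in. The name, schedule, and prompt fields and the from_dict / to_dict helpers are assumptions for this sketch; the real dataclass in cron/jobs.py will have a different shape. Only the five optional contract fields come from the proposal.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class Job:
    """Hypothetical, simplified stand-in for the Job dataclass."""
    name: str
    schedule: str
    prompt: str
    # Loop Contract fields: all optional, so existing jobs load unchanged.
    budget: Optional[Dict[str, Any]] = None
    stop: Optional[List[str]] = None
    refresh: Optional[List[str]] = None
    scope: Optional[Dict[str, Any]] = None
    report: Optional[Dict[str, Any]] = None

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Job":
        # Ignore unknown keys so older and newer job files both load.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)


# A job record written before the contract fields existed still loads.
legacy = {"name": "pr-babysitter", "schedule": "15m", "prompt": "Babysit PRs"}
job = Job.from_dict(legacy)
```

With defaults of None, a legacy record round-trips with the new fields nulled, which matches the serialization behavior described in the list above.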

Scope enforcement (writable_paths) is out of scope for V1; revisit once #8164 lands.

Additional Context

Sequoia's "Coding's Printing Press Moment" with Boris Cherny (May 2026) — the primary source for /loop, routines, and Boris's loop inventory.
Developers Digest, Codex Loops: What Boris Cherny Gets Right About Managing Agent Work — distills Boris's scattered remarks into the trigger / scope / budget / stop / report contract shape this proposal adopts.
Karpathy's "loopy era" framing — same thesis from the other direction: developers orchestrate swarms, contracts are what make swarms useful instead of chaotic.

Why this is a primitive, not a skill

Per CONTRIBUTING.md's skill-vs-tool heuristic: skills are instructions, tools are capabilities. Loop contracts sit below both — they're the scheduler-level guarantees a skill would otherwise have to beg for in prompt text. max_consecutive_failures and per-run wall-clock timeout are enforceable only by the scheduler; a skill can document them but can't make them happen. That's why this belongs in cron/ core rather than ~/.hermes/skills/.
