hermes - 💡(How to fix) Fix RFC: OS-style skill scheduling (priority queues, resource budgets, deadlock detection, layered loading)

StepCodex · 2026-05-24T09:35:19Z

[hermes] I'd like to gauge interest in upstreaming an OS-style scheduling layer for skills — treating skills as bounded "processes" governed by a small kernel… I'd like to gauge interest in upstreaming an **OS-style scheduling layer for skills** — treating skills as bounded "processes" governed by a small kernel rather than capability-discovery functions auto-loaded on trigger match. This is orthogonal to "skills self-improve during use" (which is an ML angle) — it's a **resource-governance** angle that becomes relevant in multi-profile / heavy-tool deployments where the current ad-hoc model starts to break down. I've been running a partial version of this in production for ~3 weeks. Not all of it is in code yet — some pieces are enforced as conventions in `CLAUDE.md`-style docs and would benefit from kernel-level enforcement. ## Summary I'd like to gauge interest in upstreaming an **OS-style scheduling layer for skills** — treating skills as bounded "processes" governed by a small kernel rather than capability-discovery functions auto-loaded on trigger match. This is orthogonal to "skills self-improve during use" (which is an ML angle) — it's a **resource-governance** angle that becomes relevant in multi-profile / heavy-tool deployments where the current ad-hoc model starts to break down. I've been running a partial version of this in production for ~3 weeks. Not all of it is in code yet — some pieces are enforced as conventions in `CLAUDE.md`-style docs and would benefit from kernel-level enforcement. ## Why In a small / single-profile deployment, the existing skills model works great: skill matches trigger → loads → runs → unloads. In heavier deployments I've seen four classes of pain that look like classic OS problems: 1. **Stable prefix bloat** — too many skills auto-load and the system prompt grows to where prompt-cache effectiveness drops sharply. There's no built-in tiering of "load this skill only when X is true." 2. **Skill A → Skill B → Skill A trigger cycles** — no detection; the agent just loops until a context window fills. 3. **No preemption / priority** — a "creative" skill triggered mid-turn can starve a "critical infrastructure" skill of attention; there's no priority signal. 4. **No resource accounting** — a skill that spends 50K tokens generating fixtures looks identical (post-hoc) to one that spent 500 tokens; no way to set a budget per skill class, and no way to track which skills are token-heavy over time. These map cleanly onto OS scheduler problems, and OS solutions translate. ## Design sketch **Skill manifest extensions** (backward-compatible defaults): ```yaml # skills/my-skill/SKILL.md frontmatter name: my-skill priority: 50 # 0-100; default 50; higher = preempts lower tier: domain # meta | platform | engineering | domain | creative budget: max_tokens: 8000 max_tool_calls: 10 max_wall_seconds: 60 writes_to: # for mutex / queue (see RFC #31385... err, separate proposal) - .hermes/state/foo.json depends_on: # for deadlock detection - bash - file-read ``` **5-tier layered loading** (lazy, by tier): ``` meta → always loaded (skill discovery, kernel rules) platform → loaded when on the corresponding transport (Feishu/Discord/etc.) engineering → loaded when toolset includes Bash/Read/Edit/Grep domain → loaded on explicit slash command or strong context match creative → loaded only on explicit invocation ``` Lower tiers consume more prompt budget; user / config sets `max_loaded_tokens_per_tier`. **Scheduler core**: ``` when a skill is triggered: if priority(new) > priority(running) and not running.holding_lock: preempt running; resume after new completes else: enqueue at priority while running: check budget (tokens, tool calls, wall time) on budget exhausted: fail-soft with partial result + budget_exceeded marker on skill chain: maintain call graph; if cycle detected, refuse with explanation ``` **Process accounting log**: ```jsonl {"t":"2026-05-24T09:00:00Z","skill":"git-workflow","priority":50,"tier":"engineering","tokens":1240,"tool_calls":3,"wall_ms":4200,"outcome":"ok"} {"t":"...","skill":"diagram","priority":30,"tier":"creative","tokens":5800,"tool_calls":0,"wall_ms":12000,"outcome":"budget_exceeded","budget_field":"tokens"} ``` Lets users discover "which skills are my real cost centers" without per-turn debugging. ## Why this isn't covered by existing features - **`hermes skills enable/disable`** is a binary opt-in; this is dynamic priority + budget + tiering - **`auto_resume`** etc. budget guards are per-turn; this is per-skill - **"Skills self-improve during use"** is content-level; this is execution-level ## What this is NOT - Not a replacement for skill auto-discovery — the matcher still proposes; the scheduler decides whether to admit - Not an RL-trained policy — pure declarative manifest + simple scheduler (extensible if someone wants to learn priorities later) - Not a sandbox / security boundary (skills still run with the agent's full privileges) — purely resource governan

Root Cause

I'd like to gauge interest in upstreaming an OS-style scheduling layer for skills — treating skills as bounded "processes" governed by a small kernel rather than capability-discovery functions auto-loaded on trigger match.

This is orthogonal to "skills self-improve during use" (which is an ML angle) — it's a resource-governance angle that becomes relevant in multi-profile / heavy-tool deployments where the current ad-hoc model starts to break down.

I've been running a partial version of this in production for ~3 weeks. Not all of it is in code yet — some pieces are enforced as conventions in CLAUDE.md-style docs and would benefit from kernel-level enforcement.

Code Example

# skills/my-skill/SKILL.md frontmatter
name: my-skill
priority: 50              # 0-100; default 50; higher = preempts lower
tier: domain              # meta | platform | engineering | domain | creative
budget:
  max_tokens: 8000
  max_tool_calls: 10
  max_wall_seconds: 60
writes_to:                # for mutex / queue (see RFC #31385... err, separate proposal)
  - .hermes/state/foo.json
depends_on:               # for deadlock detection
  - bash
  - file-read

---

meta        → always loaded (skill discovery, kernel rules)
platform    → loaded when on the corresponding transport (Feishu/Discord/etc.)
engineering → loaded when toolset includes Bash/Read/Edit/Grep
domain      → loaded on explicit slash command or strong context match
creative    → loaded only on explicit invocation

---

when a skill is triggered:
  if priority(new) > priority(running) and not running.holding_lock:
    preempt running; resume after new completes
  else:
    enqueue at priority

while running:
  check budget (tokens, tool calls, wall time)
  on budget exhausted:
    fail-soft with partial result + budget_exceeded marker

on skill chain:
  maintain call graph; if cycle detected, refuse with explanation

---

{"t":"2026-05-24T09:00:00Z","skill":"git-workflow","priority":50,"tier":"engineering","tokens":1240,"tool_calls":3,"wall_ms":4200,"outcome":"ok"}
{"t":"...","skill":"diagram","priority":30,"tier":"creative","tokens":5800,"tool_calls":0,"wall_ms":12000,"outcome":"budget_exceeded","budget_field":"tokens"}

Summary

Why

In a small / single-profile deployment, the existing skills model works great: skill matches trigger → loads → runs → unloads. In heavier deployments I've seen four classes of pain that look like classic OS problems:

Stable prefix bloat — too many skills auto-load and the system prompt grows to where prompt-cache effectiveness drops sharply. There's no built-in tiering of "load this skill only when X is true."
Skill A → Skill B → Skill A trigger cycles — no detection; the agent just loops until a context window fills.
No preemption / priority — a "creative" skill triggered mid-turn can starve a "critical infrastructure" skill of attention; there's no priority signal.
No resource accounting — a skill that spends 50K tokens generating fixtures looks identical (post-hoc) to one that spent 500 tokens; no way to set a budget per skill class, and no way to track which skills are token-heavy over time.

These map cleanly onto OS scheduler problems, and OS solutions translate.

Design sketch

Skill manifest extensions (backward-compatible defaults):

# skills/my-skill/SKILL.md frontmatter
name: my-skill
priority: 50              # 0-100; default 50; higher = preempts lower
tier: domain              # meta | platform | engineering | domain | creative
budget:
  max_tokens: 8000
  max_tool_calls: 10
  max_wall_seconds: 60
writes_to:                # for mutex / queue (see RFC #31385... err, separate proposal)
  - .hermes/state/foo.json
depends_on:               # for deadlock detection
  - bash
  - file-read

5-tier layered loading (lazy, by tier):

meta        → always loaded (skill discovery, kernel rules)
platform    → loaded when on the corresponding transport (Feishu/Discord/etc.)
engineering → loaded when toolset includes Bash/Read/Edit/Grep
domain      → loaded on explicit slash command or strong context match
creative    → loaded only on explicit invocation

Lower tiers consume more prompt budget; user / config sets max_loaded_tokens_per_tier.

Scheduler core:

when a skill is triggered:
  if priority(new) > priority(running) and not running.holding_lock:
    preempt running; resume after new completes
  else:
    enqueue at priority

while running:
  check budget (tokens, tool calls, wall time)
  on budget exhausted:
    fail-soft with partial result + budget_exceeded marker

on skill chain:
  maintain call graph; if cycle detected, refuse with explanation

Process accounting log:

{"t":"2026-05-24T09:00:00Z","skill":"git-workflow","priority":50,"tier":"engineering","tokens":1240,"tool_calls":3,"wall_ms":4200,"outcome":"ok"}
{"t":"...","skill":"diagram","priority":30,"tier":"creative","tokens":5800,"tool_calls":0,"wall_ms":12000,"outcome":"budget_exceeded","budget_field":"tokens"}

Lets users discover "which skills are my real cost centers" without per-turn debugging.

Why this isn't covered by existing features

hermes skills enable/disable is a binary opt-in; this is dynamic priority + budget + tiering
auto_resume etc. budget guards are per-turn; this is per-skill
"Skills self-improve during use" is content-level; this is execution-level

What this is NOT

Not a replacement for skill auto-discovery — the matcher still proposes; the scheduler decides whether to admit
Not an RL-trained policy — pure declarative manifest + simple scheduler (extensible if someone wants to learn priorities later)
Not a sandbox / security boundary (skills still run with the agent's full privileges) — purely resource governance
Not opinionated about the underlying transport / model

Questions before I open a PR

In scope? This is a non-trivial kernel addition. Does it belong as a first-class feature, an optional skill, or out-of-tree?
Migration: existing skills have no priority / tier / budget — defaults should be backward compatible (priority=50, tier=domain, budget=unlimited). OK?
Scope split: full scheduler is a lot. Acceptable to start with just tiered loading + budget enforcement (no preemption, no cycle detection) as Phase 1?
Manifest schema versioning — bump SKILL.md frontmatter version, or additive fields with implicit defaults?

Not opening a PR yet. Related batch I just opened: #31385 (subprocess executor bridge), #31387 (drift hook — withdrawn in favor of #28946), #31388 (multi-profile memory), #31392 (agent-native task relay). Happy to sequence all of these however you prefer.

Thanks!

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix RFC: OS-style skill scheduling (priority queues, resource budgets, deadlock detection, layered loading)

Recommended Tools

GitHub issue graph ai analysis