hermes: refactor(skills): reduce ambiguity and hot-path token waste with progressive disclosure



Goal

Reduce skill ambiguity and token waste by turning large / duplicated / wrapper-heavy skills into explicit progressive-disclosure skill packages.

The intent is not to make every skill shorter for its own sake. Hot-path SKILL.md files should be limited to the trigger contract, routing, safety boundaries, execution skeleton, critical pitfalls, and verification. Long examples, source maps, and incident histories move into on-demand references/, templates/, or scripts/. Stale duplicates are removed or archived only after their references are checked.

Source of Truth

  • User direction: “the effect of removing unnecessary elements to eliminate skill ambiguity and reduce token waste”
  • Wiki-grounded plan:
    • harness-engineering/prompt-assets.md: progressive disclosure for prompt assets / skills
    • harness-engineering/agent-skills-ecosystem.md: use minimal workflow-relevant skill surfaces, not maximal installed skill count
    • principles/ai-response-cognitive-debt-contract.md: optimize for decision/verification cost, not mere brevity
    • index.md: subtractive design / minimal harness direction was found in index, but the referenced detailed page was not directly readable in this session
  • Local runtime/code evidence:
    • agent/skill_commands.py injects loaded skill content into message context
    • tools/skills_tool.py separates skills_list metadata, skill_view hot-path content, and linked-file lazy loading
    • tools/skill_manager_tool.py enforces SKILL.md max size around 100k chars
  • Inventory snapshot from the planning pass:
    • /Users/pjw/.hermes/skills: 113 skills, ~1,622,694 chars / estimated ~405k tokens
    • /Users/pjw/.hermes/hermes-agent/skills: 89 skills, ~1,011,873 chars / estimated ~253k tokens
    • P0 examples: ddalggak.backup.* ~106k chars, research-paper-writing ~102k, pjw-icloud-llm-wiki ~66k, ddalggak-cron-scheduler ~65k, github-pr-workflow ~60k
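
The token figures in the snapshot are consistent with a rough four-characters-per-token heuristic. The sketch below shows that arithmetic; the constant and the function name are assumptions for illustration, and a real tokenizer will give different counts.

```python
# Assumption: ~4 characters per token. This is a rough heuristic that lands
# close to the snapshot figures; it is not a real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(char_count: int) -> int:
    """Return a rough token estimate for a character count."""
    return char_count // CHARS_PER_TOKEN


# Character totals copied from the inventory snapshot.
snapshot = {
    "user-local": 1_622_694,
    "repo-bundled": 1_011_873,
}

for root, chars in snapshot.items():
    print(f"{root}: {chars:,} chars, roughly {estimate_tokens(chars):,} tokens")
```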

Non-Goals / Must Not Touch

  • Do not delete skills without an explicit archive/delete decision and rollback path.
  • Do not modify or delete /Users/pjw/.claude/skills/getwiki or /Users/pjw/.claude/skills/setwiki.
  • Do not mix user-local cleanup and repo-bundled cleanup in the same unbounded mutation.
  • Do not hide critical safety guardrails only in deep references.
  • Do not break cron jobs or workflows that name attached skills.

Work Plan

Unit 1 — Read-only inventory and ambiguity audit

Deployable unit: a read-only audit/reporting change or standalone script.

Owned files / surfaces

  • scripts/audit_skills.py or equivalent repo-local script
  • tests/skills/test_skill_clarity.py if adding guard tests
  • generated artifacts such as .artifacts/skill-audit.json and .artifacts/skill-audit.md should not be committed unless explicitly intended

Required checks

  • Count skills per root.
  • Measure SKILL.md chars and estimated tokens.
  • Detect duplicate names/slugs.
  • Detect backup/stale naming patterns.
  • Check When to Use, Do Not Use For, linked files, and broad/ambiguous descriptions.
  • Classify each skill as keep, compress, move-to-reference, merge-into, archive, or delete-after-approval.
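
A minimal sketch of the read-only checks above, assuming each skill lives in a directory that contains a SKILL.md file. The stale-name pattern, the field names, and the four-characters-per-token estimate are hypothetical and would need tuning against the real inventory.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical pattern for backup/stale directory names.
STALE_PATTERN = re.compile(r"(\.backup|\.bak|-old|-copy)", re.IGNORECASE)


def audit_root(root: Path) -> list[dict]:
    """Collect size and clarity metadata for every SKILL.md under root.

    Read-only: files are opened for reading and never modified.
    """
    rows = []
    for skill_md in sorted(root.rglob("SKILL.md")):
        text = skill_md.read_text(encoding="utf-8", errors="replace")
        name = skill_md.parent.name
        rows.append({
            "name": name,
            "path": str(skill_md),
            "chars": len(text),
            "est_tokens": len(text) // 4,  # rough heuristic, not a tokenizer
            "stale_name": bool(STALE_PATTERN.search(name)),
            "has_when_to_use": "when to use" in text.lower(),
        })
    return rows


def duplicate_names(rows: list[dict]) -> list[str]:
    """Return skill names that appear more than once across the audited rows."""
    counts = Counter(row["name"] for row in rows)
    return sorted(name for name, n in counts.items() if n > 1)
```

Sorting the rows by "chars" in descending order gives the list of top size offenders; classification into keep/compress/archive would still be a human decision on top of this data.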

Validation

  • Script runs read-only against both user-local and repo-bundled roots.
  • Report lists top size offenders and ambiguity candidates.
  • No skill content is changed in this unit.

Unit 2 — Scope and placement policy

Deployable unit: explicit policy docs/rubric before rewriting skills.

Owned files / surfaces

  • docs/ or skills/.../references/ policy document, depending on repo convention
  • Optional short pointer from existing skill authoring docs

Policy must define

  • What remains in hot-path SKILL.md:
    • trigger contract
    • do-not-use/counter-trigger
    • execution skeleton
    • side-effect approval boundary
    • critical pitfalls
    • verification checklist
    • reference loading guide
  • What moves to references:
    • source maps
    • long examples
    • historical incidents
    • edge-case catalogues
    • API tables
    • long rationale
  • When to merge/archive/delete.
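
One way to make the placement policy checkable is to encode the two lists as data. The sketch below is only an illustration: the category strings mirror the bullets above, while the function name and the "needs-review" fallback are assumptions.

```python
# Section kinds copied from the policy lists; matching is case-insensitive.
HOT_PATH = {
    "trigger contract",
    "do-not-use/counter-trigger",
    "execution skeleton",
    "side-effect approval boundary",
    "critical pitfalls",
    "verification checklist",
    "reference loading guide",
}
REFERENCES = {
    "source maps",
    "long examples",
    "historical incidents",
    "edge-case catalogues",
    "api tables",
    "long rationale",
}


def placement(section_kind: str) -> str:
    """Return where a section kind belongs under the policy."""
    kind = section_kind.strip().lower()
    if kind in HOT_PATH:
        return "hot-path"
    if kind in REFERENCES:
        return "references"
    # Anything the policy does not name goes to a human (assumed fallback).
    return "needs-review"
```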

Validation

  • Policy explicitly preserves critical guardrails.
  • Policy distinguishes user-local skill management from repo-bundled PR changes.

Unit 3 — Backup/stale cleanup lane

Deployable unit: smallest cleanup of obvious stale/backup skill entries.

Owned files / surfaces

  • user-local skills only when operating locally, or repo-bundled skills only when operating in repo PR
  • cron skill references if any skill name is removed or consolidated

Parallelization note

  • This unit should run after Unit 1 and Unit 2.
  • It can be independent from high-traffic skill slimming if it only archives/removes clearly stale backup directories.

Safety checks

  • Check pinned/cron/job references before delete/archive.
  • Verify duplicate/backup is not the canonical loaded skill.
  • Keep rollback path.
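
The safety checks above can be sketched as a pure function that refuses to archive a skill while anything still depends on it. How Hermes actually stores cron/job definitions is not stated here, so the `jobs` mapping (job name to attached skill names) and both function names are hypothetical.

```python
def blocking_references(skill_name: str, jobs: dict[str, list[str]]) -> list[str]:
    """Return the names of jobs that still attach skill_name."""
    return sorted(job for job, skills in jobs.items() if skill_name in skills)


def can_archive(
    skill_name: str,
    jobs: dict[str, list[str]],
    canonical_names: set[str],
) -> tuple[bool, list[str]]:
    """Return (ok, reasons); ok is True only when no check blocks archiving."""
    reasons = []
    refs = blocking_references(skill_name, jobs)
    if refs:
        reasons.append("referenced by jobs: " + ", ".join(refs))
    if skill_name in canonical_names:
        reasons.append("is the canonical loaded skill")
    return (not reasons, reasons)
```

Returning the reasons alongside the verdict keeps the decision reviewable, which fits the requirement that deletion needs explicit approval and a rollback path.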

Validation

  • Before/after skill count.
  • Cron/job reference check passes.
  • skills_list no longer exposes stale backup entries.

Unit 4 — Wrapper slimming lane

Deployable unit: convert wrapper skills into thin routers.

Candidate skills

  • getwiki
  • setwiki
  • ddalggak-cron
  • similar command-wrapper skills

Owned files / surfaces

  • corresponding wrapper SKILL.md
  • linked references/ only when needed

Target contract

  • Wrapper target size: 1k–4k chars.
  • Wrapper contains canonical skill name, mode boundary, and do-not-use rules.
  • Wrapper does not duplicate the canonical workflow body.
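
A rough way to test the wrapper contract is to check size and count how many substantial wrapper lines also appear verbatim in the canonical skill. This is a sketch under assumptions: the 40-character line floor and the 0.3 ratio threshold are hypothetical, and verbatim line matching will miss paraphrased duplication.

```python
WRAPPER_MAX_CHARS = 4_000  # upper end of the 1k-4k wrapper target


def shared_line_ratio(wrapper: str, canonical: str, min_len: int = 40) -> float:
    """Fraction of substantial wrapper lines that appear verbatim in canonical."""
    canonical_lines = {line.strip() for line in canonical.splitlines()}
    lines = [
        line.strip()
        for line in wrapper.splitlines()
        if len(line.strip()) >= min_len  # skip headings and short glue lines
    ]
    if not lines:
        return 0.0
    return sum(line in canonical_lines for line in lines) / len(lines)


def check_wrapper(wrapper: str, canonical: str, max_ratio: float = 0.3) -> list[str]:
    """Return a list of contract problems; an empty list means the wrapper passes."""
    problems = []
    if len(wrapper) > WRAPPER_MAX_CHARS:
        problems.append(f"wrapper is {len(wrapper)} chars, over {WRAPPER_MAX_CHARS}")
    ratio = shared_line_ratio(wrapper, canonical)
    if ratio > max_ratio:
        problems.append(f"{ratio:.0%} of wrapper lines duplicate the canonical skill")
    return problems
```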

Validation

  • skills_list description remains distinct.
  • skill_view(wrapper) loads a compact router.
  • Required canonical skill and linked references still load on demand.

Unit 5 — High-traffic progressive-disclosure lane

Deployable unit: one PR/patch per high-traffic skill, not one giant rewrite.

Candidate skills

  • pjw-icloud-llm-wiki
  • github-pr-workflow
  • github-issues
  • ddalggak-cron-scheduler
  • hermes-agent

Owned files / surfaces

  • each target skill’s SKILL.md
  • that skill’s references/, templates/, scripts/
  • any package/install manifest or verifier that requires linked files to be included

Conflict-free split

  • Each high-traffic skill should be a separate lane/PR where possible.
  • Shared audit policy/tests from Units 1–2 must land first.
  • Do not concurrently edit the same SKILL.md or shared verifier baseline from multiple lanes.

Target size

  • Operational workflow skill: 8k–15k chars when possible.
  • Complex high-risk skill: 15k–25k chars at most.
  • Long references may remain larger because they are lazy-loaded.

Validation

  • Before/after size reported.
  • Guardrails preserved.
  • Reference loading guide points to moved details.
  • Linked files exist and package inclusion/verifier checks pass.
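
The "linked files exist" check could look roughly like the sketch below. It assumes SKILL.md mentions linked files as relative paths under references/, templates/, or scripts/; the regular expression and function names are illustrative and do not reflect how tools/skills_tool.py resolves linked files.

```python
import re
from pathlib import Path

# Assumed convention: linked files are written as relative paths in SKILL.md.
LINK_RE = re.compile(r"\b((?:references|templates|scripts)/[\w./-]+)")


def linked_paths(skill_text: str) -> list[str]:
    """Return the unique relative paths mentioned in the skill text."""
    # rstrip(".") drops a sentence-ending period captured by the pattern.
    return sorted({match.rstrip(".") for match in LINK_RE.findall(skill_text)})


def missing_links(skill_dir: Path) -> list[str]:
    """Return linked paths named in SKILL.md that do not exist on disk."""
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    return [path for path in linked_paths(text) if not (skill_dir / path).exists()]
```

An empty result from `missing_links` covers only file existence; whether the package manifest includes those files is a separate check.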

Unit 6 — Bloat/ambiguity lint lane

Deployable unit: regression prevention after the first cleanup passes.

Owned files / surfaces

  • audit script / tests
  • baseline allowlist if existing skills exceed thresholds
  • docs explaining warning vs fail thresholds

Rules to implement initially

  • description exists and is preferably <= 180 chars.
  • When to Use missing -> warning.
  • broad related-skill surface without Do Not Use For -> warning.
  • SKILL.md > 20k -> warning.
  • SKILL.md > 50k -> strong warning or fail once baseline is reduced.
  • SKILL.md > 100k -> fail.
  • duplicate names/slugs -> fail.
  • linked file exists but has no hot-path pointer -> warning.
  • wrapper skill duplicates canonical workflow content -> warning.
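
A minimal sketch of the per-skill rules above, using the thresholds from the list. The `Finding` type and rule identifiers are hypothetical, treating a missing description as a failure is an assumption (the list does not give its severity), and the >50k case is reported as a warning until the baseline is reduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    level: str  # "warning" or "fail"
    rule: str


def lint_skill(description: str, body: str) -> list[Finding]:
    """Apply the size and clarity rules to one skill."""
    findings = []
    if not description.strip():
        # Assumption: the source does not state a severity for this case.
        findings.append(Finding("fail", "description-missing"))
    elif len(description) > 180:
        findings.append(Finding("warning", "description-over-180-chars"))
    if "when to use" not in body.lower():
        findings.append(Finding("warning", "when-to-use-missing"))
    size = len(body)
    if size > 100_000:
        findings.append(Finding("fail", "skill-md-over-100k"))
    elif size > 50_000:
        findings.append(Finding("warning", "skill-md-over-50k"))
    elif size > 20_000:
        findings.append(Finding("warning", "skill-md-over-20k"))
    return findings


def lint_names(names: list[str]) -> list[Finding]:
    """Fail on duplicate skill names/slugs across the inventory."""
    seen, dupes = set(), set()
    for name in names:
        (dupes if name in seen else seen).add(name)
    return [Finding("fail", f"duplicate-name:{name}") for name in sorted(dupes)]
```

An allowlist for skills that already exceed thresholds could be applied by filtering the returned findings, which keeps the rules themselves free of per-skill exceptions.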

Validation

  • Tests pass on current baseline with explicit allowlist if needed.
  • New or modified skills are checked without relying on manual review only.

Dependency / Execution Order

  1. Unit 1: read-only audit
  2. Unit 2: policy/rubric
  3. Unit 3: backup/stale cleanup
  4. Unit 4: wrapper slimming
  5. Unit 5: high-traffic skill progressive disclosure, one skill per lane where possible
  6. Unit 6: lint/CI gate, once baseline thresholds are agreed

AI-Safe Idempotent Slices

For each target skill:

  1. Measure current size and current linked files.
  2. Identify sections to keep/move/delete.
  3. Move exactly one section group to references/.
  4. Add or update hot-path pointer.
  5. Run frontmatter/link/skill load validation.
  6. Compare before/after size.
  7. Stop for review if a critical safety rule would move out of hot path.
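
Steps 3, 4, and 7 can be sketched as a pure text transform that is safe to re-run. The sketch assumes SKILL.md uses Markdown `## ` headings; the `PROTECTED` set, the pointer wording, and the function name are hypothetical.

```python
import re
from typing import Optional

# Hypothetical headings that must never leave the hot path (step 7).
PROTECTED = {"Critical Pitfalls", "Safety Boundaries"}


def move_section(
    skill_text: str, heading: str, ref_path: str
) -> tuple[str, Optional[str]]:
    """Move one `## heading` section out of skill_text.

    Returns (new_skill_text, moved_section). If the section is absent,
    for example because it was already moved, returns (skill_text, None),
    so running the function twice changes nothing the second time.
    """
    if heading in PROTECTED:
        raise ValueError(f"{heading!r} is a hot-path guardrail; stop for review")
    pattern = re.compile(
        rf"^## {re.escape(heading)}[ \t]*\n.*?(?=^## |\Z)",
        re.MULTILINE | re.DOTALL,
    )
    match = pattern.search(skill_text)
    if match is None:
        return skill_text, None
    # The pointer is not a heading, so a second run finds nothing to move.
    pointer = f"> {heading}: moved to `{ref_path}`; load on demand.\n\n"
    new_text = skill_text[: match.start()] + pointer + skill_text[match.end():]
    return new_text, match.group(0)
```

The caller would write the returned section to the reference file and then run the load validation from step 5; keeping the transform free of file I/O makes each slice easy to diff and revert.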

Acceptance Criteria

  • Audit report identifies large, duplicate, backup, and ambiguous skills.
  • Hot-path placement policy is documented.
  • Obvious stale backup skills are archived/deleted only after reference checks and approval.
  • Wrapper skills are reduced to thin routers with clear canonical routing.
  • High-traffic skills use progressive disclosure and preserve critical guardrails.
  • Before/after token/char savings are reported per skill.
  • Skill clarity lint prevents reintroducing large ambiguous skills.
  • No cron attached-skill references are broken.

Suggested Labels

  • type/refactor
  • tool/skills
  • P2

Suggested Branch

refactor/skill-clarity-token-budget
