hermes - 💡(How to fix) Fix refactor(skills): reduce ambiguity and hot-path token waste with progressive disclosure

StepCodex · 2026-05-23T04:45:00Z




Goal

Reduce skill ambiguity and token waste by turning large / duplicated / wrapper-heavy skills into explicit progressive-disclosure skill packages.

The intent is not to make every skill shorter for its own sake. The intent is to limit hot-path SKILL.md files to the trigger contract, routing, safety boundaries, execution skeleton, critical pitfalls, and verification; to move long examples, source maps, and incidents into on-demand references/, templates/, or scripts/; and to remove or archive stale duplicates only after every reference to them has been checked.

Source of Truth

User direction: “Remove unnecessary elements to eliminate skill ambiguity and reduce token waste.”
Wiki-grounded plan:
- harness-engineering/prompt-assets.md: progressive disclosure for prompt assets / skills
- harness-engineering/agent-skills-ecosystem.md: use minimal workflow-relevant skill surfaces, not maximal installed skill count
- principles/ai-response-cognitive-debt-contract.md: optimize for decision/verification cost, not mere brevity
- index.md: the index points toward a subtractive-design / minimal-harness direction, but the detailed page it references could not be read directly in this session
Local runtime/code evidence:
- agent/skill_commands.py injects loaded skill content into message context
- tools/skills_tool.py separates skills_list metadata, skill_view hot-path content, and linked-file lazy loading
- tools/skill_manager_tool.py enforces a maximum SKILL.md size of roughly 100k chars
Inventory snapshot from the planning pass:
- /Users/pjw/.hermes/skills: 113 skills, ~1,622,694 chars / estimated ~405k tokens
- /Users/pjw/.hermes/hermes-agent/skills: 89 skills, ~1,011,873 chars / estimated ~253k tokens
- P0 examples: ddalggak.backup.* ~106k chars, research-paper-writing ~102k, pjw-icloud-llm-wiki ~66k, ddalggak-cron-scheduler ~65k, github-pr-workflow ~60k
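The token figures in the inventory snapshot are consistent with a rough 4-characters-per-token heuristic. A minimal sketch of that arithmetic, assuming the heuristic rather than a real tokenizer (actual counts vary by model and by language):

```python
# Rough heuristic only: ~4 characters per token. This is an assumption,
# not the count any particular tokenizer would report.
CHARS_PER_TOKEN = 4


def estimate_tokens(chars: int, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Estimate tokens from a character count by integer division."""
    return chars // chars_per_token


# Figures from the inventory snapshot above.
snapshot = {
    "/Users/pjw/.hermes/skills": 1_622_694,
    "/Users/pjw/.hermes/hermes-agent/skills": 1_011_873,
}

for root, chars in snapshot.items():
    print(f"{root}: {chars:,} chars, ~{estimate_tokens(chars):,} tokens")
```

Integer division gives 405,673 and 252,968 tokens, which round to the ~405k and ~253k quoted above.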

Non-Goals / Must Not Touch

Do not delete skills without an explicit archive/delete decision and rollback path.
Do not modify or delete /Users/pjw/.claude/skills/getwiki or /Users/pjw/.claude/skills/setwiki.
Do not mix user-local cleanup and repo-bundled cleanup in the same unbounded mutation.
Do not hide critical safety guardrails only in deep references.
Do not break cron jobs or workflows that name attached skills.

Work Plan

Unit 1 — Read-only inventory and ambiguity audit

Deployable unit: a read-only audit/reporting change or standalone script.

Owned files / surfaces

scripts/audit_skills.py or equivalent repo-local script
tests/skills/test_skill_clarity.py if adding guard tests
generated artifacts such as .artifacts/skill-audit.json and .artifacts/skill-audit.md should not be committed unless explicitly intended

Required checks

Count skills per root.
Measure SKILL.md chars and estimated tokens.
Detect duplicate names/slugs.
Detect backup/stale naming patterns.
Check for When to Use and Do Not Use For sections, linked files, and overly broad or ambiguous descriptions.
Classify each skill as keep, compress, move-to-reference, merge-into, archive, or delete-after-approval.
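The required checks above could be scripted roughly as follows. This is a minimal, read-only sketch of a hypothetical `scripts/audit_skills.py`; it assumes a skill is any directory containing a `SKILL.md`, and the backup-name regex, the 20k-char compress threshold, and the three-way `suggestion` are illustrative heuristics, not the agreed rubric. It only reads files, so it respects the "no skill content is changed" validation.

```python
"""Read-only skill audit sketch (hypothetical scripts/audit_skills.py)."""
import json
import re
import sys
from collections import defaultdict
from dataclasses import asdict, dataclass
from pathlib import Path

# Hypothetical heuristics; tune them against the real inventory.
BACKUP_PATTERN = re.compile(r"(^|[._-])(backup|bak|old|copy)([._-]|$)", re.IGNORECASE)
REQUIRED_SECTIONS = ("When to Use", "Do Not Use For")
CHARS_PER_TOKEN = 4
COMPRESS_THRESHOLD = 20_000


@dataclass
class SkillRecord:
    root: str
    name: str
    chars: int
    est_tokens: int
    missing_sections: list
    looks_like_backup: bool
    suggestion: str


def has_heading(text: str, title: str) -> bool:
    """True if any markdown heading line contains `title` (case-insensitive)."""
    return any(
        line.lstrip().startswith("#") and title.lower() in line.lower()
        for line in text.splitlines()
    )


def audit_root(root: Path) -> list:
    """Scan one skills root without modifying anything."""
    if not root.is_dir():
        return []
    records = []
    # Assumption: a skill is any directory that directly contains a SKILL.md.
    for skill_md in sorted(root.rglob("SKILL.md")):
        text = skill_md.read_text(encoding="utf-8", errors="replace")
        name = skill_md.parent.name
        backup = bool(BACKUP_PATTERN.search(name))
        missing = [s for s in REQUIRED_SECTIONS if not has_heading(text, s)]
        if backup:
            suggestion = "archive"  # still requires reference checks and approval
        elif len(text) > COMPRESS_THRESHOLD:
            suggestion = "move-to-reference"
        else:
            suggestion = "keep"
        records.append(SkillRecord(str(root), name, len(text), len(text) // CHARS_PER_TOKEN,
                                   missing, backup, suggestion))
    return records


def find_duplicates(records: list) -> dict:
    """Map each (case-insensitive) skill name seen more than once to its roots."""
    roots_by_name = defaultdict(list)
    for rec in records:
        roots_by_name[rec.name.lower()].append(rec.root)
    return {name: roots for name, roots in roots_by_name.items() if len(roots) > 1}


def build_report(roots: list, top_n: int = 10) -> dict:
    records, counts = [], {}
    for root in roots:
        found = audit_root(Path(root).expanduser())
        counts[root] = len(found)
        records.extend(found)
    largest = sorted(records, key=lambda r: r.chars, reverse=True)[:top_n]
    return {
        "counts": counts,
        "top_offenders": [asdict(r) for r in largest],
        "duplicates": find_duplicates(records),
        "records": [asdict(r) for r in records],
    }


if __name__ == "__main__":
    # Usage: python audit_skills.py ~/.hermes/skills ~/.hermes/hermes-agent/skills
    print(json.dumps(build_report(sys.argv[1:]), indent=2, ensure_ascii=False))
```

Duplicate detection compares directory names case-insensitively across both roots, so an intentional user-local override of a bundled skill also shows up; treat that list as review candidates, not automatic removals.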

Validation

Script runs read-only against both user-local and repo-bundled roots.
Report lists top size offenders and ambiguity candidates.
No skill content is changed in this unit.

Unit 2 — Scope and placement policy

Deployable unit: explicit policy docs/rubric before rewriting skills.

Owned files / surfaces

docs/ or skills/.../references/ policy document, depending on repo convention
Optional short pointer from the existing skill-authoring docs

Policy must define

What remains in hot-path SKILL.md:
- trigger contract
- do-not-use/counter-trigger
- execution skeleton
- side-effect approval boundary
- critical pitfalls
- verification checklist
- reference loading guide
What moves to references:
- source maps
- long examples
- historical incidents
- edge-case catalogues
- API tables
- long rationale
When to merge/archive/delete.
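One way to make the placement policy checkable is to map each hot-path item to heading keywords and flag headings that look like reference-only material. A minimal sketch; the keyword lists are hypothetical and would need to follow whatever heading names the policy document actually fixes. It also does not skip headings inside fenced code blocks.

```python
import re

# Hypothetical mapping from each hot-path policy item to heading keywords.
HOT_PATH_ITEMS = {
    "trigger contract": ("when to use", "trigger"),
    "do-not-use/counter-trigger": ("do not use", "counter-trigger"),
    "execution skeleton": ("workflow", "steps", "procedure"),
    "side-effect approval boundary": ("approval", "side effect", "side-effect"),
    "critical pitfalls": ("pitfall",),
    "verification checklist": ("verif", "checklist"),
    "reference loading guide": ("references", "reference loading"),
}
# Heading keywords that usually mark reference-only material.
REFERENCE_ONLY = ("source map", "example", "incident", "edge case", "api table", "rationale")

HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def placement_findings(skill_md: str) -> dict:
    """Report missing hot-path items and headings that look movable to references/."""
    headings = [h.lower() for h in HEADING.findall(skill_md)]
    missing = [
        item for item, keys in HOT_PATH_ITEMS.items()
        if not any(key in h for h in headings for key in keys)
    ]
    movable = [h for h in headings if any(key in h for key in REFERENCE_ONLY)]
    return {"missing_hot_path": missing, "move_candidates": movable}
```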

Validation

Policy explicitly preserves critical guardrails.
Policy distinguishes user-local skill management from repo-bundled PR changes.

Unit 3 — Backup/stale cleanup lane

Deployable unit: smallest cleanup of obvious stale/backup skill entries.

Owned files / surfaces

user-local skills only when operating locally, or repo-bundled skills only when operating in a repo PR
cron skill references if any skill name is removed or consolidated

Parallelization note

This unit should run after Unit 1 and Unit 2.
It can run independently of high-traffic skill slimming if it only archives/removes clearly stale backup directories.

Safety checks

Check pinned/cron/job references before delete/archive.
Verify duplicate/backup is not the canonical loaded skill.
Keep rollback path.
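The safety checks above can be sketched as a blocker report plus a move-based archive whose return value serves as the rollback handle. This assumes cron jobs can be loaded as dicts with hypothetical `name` and `skills` keys, and that pinned and canonical skill names are already known as sets; the real Hermes job format may differ.

```python
import shutil
from pathlib import Path


def removal_blockers(candidates, cron_jobs, pinned, canonical):
    """Return {skill: [reasons]} for candidates that must NOT be archived yet.

    Assumption: each cron job is a dict like {"name": ..., "skills": [...]};
    the real job format may differ.
    """
    blockers = {}
    for skill in candidates:
        reasons = [f"cron:{job.get('name', '?')}" for job in cron_jobs
                   if skill in job.get("skills", [])]
        if skill in pinned:
            reasons.append("pinned")
        if skill in canonical:
            reasons.append("canonical")
        if reasons:
            blockers[skill] = reasons
    return blockers


def archive_skill(skill_dir: Path, archive_root: Path) -> Path:
    """Move (never delete) a skill directory; the returned path is the rollback handle."""
    dest = archive_root / skill_dir.name
    if dest.exists():
        raise FileExistsError(f"archive target already exists: {dest}")
    archive_root.mkdir(parents=True, exist_ok=True)
    shutil.move(str(skill_dir), str(dest))
    return dest


def restore_skill(archived: Path, original: Path) -> None:
    """Rollback: move an archived skill back to its original location."""
    if original.exists():
        raise FileExistsError(f"restore target already exists: {original}")
    shutil.move(str(archived), str(original))
```

Archiving by move rather than delete keeps the rollback path trivial, and running `removal_blockers` first means only candidates with an empty reason list ever reach `archive_skill`.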

Validation

Before/after skill count.
Cron/job reference check passes.
skills_list no longer exposes stale backup entries.

Unit 4 — Wrapper slimming lane

Deployable unit: convert wrapper skills into thin routers.

Candidate skills

getwiki
setwiki
ddalggak-cron
similar command-wrapper skills

Owned files / surfaces

corresponding wrapper SKILL.md
linked references/ only when needed

Target contract

Wrapper target size: 1k–4k chars.
Wrapper contains canonical skill name, mode boundary, and do-not-use rules.
Wrapper does not duplicate the canonical workflow body.
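The wrapper contract lends itself to a mechanical check: size within 1k–4k chars, the canonical skill named, and little verbatim overlap with the canonical body. A minimal sketch; the 30% shared-line threshold and the 20-char minimum line length are hypothetical tuning values, and verbatim line matching will miss paraphrased duplication.

```python
def check_wrapper(wrapper_md: str, canonical_name: str, canonical_md: str,
                  min_chars: int = 1_000, max_chars: int = 4_000,
                  max_shared_ratio: float = 0.3, min_line_len: int = 20) -> list:
    """Return contract violations for a wrapper skill (empty list means OK)."""
    problems = []
    size = len(wrapper_md)
    if not min_chars <= size <= max_chars:
        problems.append(f"size {size:,} chars outside {min_chars:,}-{max_chars:,}")
    if canonical_name not in wrapper_md:
        problems.append(f"does not name canonical skill {canonical_name!r}")

    def substantive(text):
        # Ignore short lines such as headings, which may legitimately repeat.
        return [line.strip() for line in text.splitlines() if len(line.strip()) > min_line_len]

    canonical_lines = set(substantive(canonical_md))
    wrapper_lines = substantive(wrapper_md)
    if wrapper_lines:
        shared = sum(line in canonical_lines for line in wrapper_lines) / len(wrapper_lines)
        if shared > max_shared_ratio:
            problems.append(f"{shared:.0%} of substantive lines copied from canonical body")
    return problems
```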

Validation

skills_list description remains distinct.
skill_view(wrapper) loads a compact router.
Required canonical skill and linked references still load on demand.

Unit 5 — High-traffic progressive-disclosure lane

Deployable unit: one PR/patch per high-traffic skill, not one giant rewrite.

Candidate skills

pjw-icloud-llm-wiki
github-pr-workflow
github-issues
ddalggak-cron-scheduler
hermes-agent

Owned files / surfaces

each target skill’s SKILL.md
that skill’s references/, templates/, scripts/
any package/install manifest or verifier that requires linked files to be included

Conflict-free split

Each high-traffic skill should be a separate lane/PR where possible.
Shared audit policy/tests from Units 1–2 must land first.
Do not concurrently edit the same SKILL.md or shared verifier baseline from multiple lanes.

Target size

Operational workflow skill: 8k–15k chars when possible.
Complex high-risk skill: at most 15k–25k chars.
Long references may remain larger because they are lazy-loaded.

Validation

Before/after size reported.
Guardrails preserved.
Reference loading guide points to moved details.
Linked files exist and package inclusion/verifier checks pass.
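For the size and linked-file checks, a minimal sketch, assuming linked files are referenced from `SKILL.md` as relative paths under `references/`, `templates/`, or `scripts/`. It does not cover package/install manifest inclusion, which depends on how skills are packaged.

```python
import re
from pathlib import Path

# Assumption: linked files are referenced by relative paths under these folders.
LINK = re.compile(r"\b((?:references|templates|scripts)/[\w./-]+)")
CHARS_PER_TOKEN = 4  # same rough heuristic as the inventory estimate


def linked_paths(skill_md: str) -> set:
    """Relative paths under references/, templates/, scripts/ mentioned in SKILL.md."""
    return {match.rstrip(".") for match in LINK.findall(skill_md)}


def validate_split(skill_dir: Path, before_chars: int) -> dict:
    """Report size savings and any linked files that do not exist on disk."""
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    missing = sorted(p for p in linked_paths(text) if not (skill_dir / p).exists())
    saved = before_chars - len(text)
    return {
        "before_chars": before_chars,
        "after_chars": len(text),
        "saved_chars": saved,
        "saved_tokens_est": saved // CHARS_PER_TOKEN,
        "missing_links": missing,
    }
```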

Unit 6 — Bloat/ambiguity lint lane

Deployable unit: regression prevention after the first cleanup passes.

Owned files / surfaces

audit script / tests
baseline allowlist if existing skills exceed thresholds
docs explaining warning vs fail thresholds

Rules to implement initially

description exists and is preferably <= 180 chars.
When to Use missing -> warning.
broad related-skill surface without Do Not Use For -> warning.
SKILL.md > 20k -> warning.
SKILL.md > 50k -> strong warning or fail once baseline is reduced.
SKILL.md > 100k -> fail.
duplicate names/slugs -> fail.
linked file exists but has no hot-path pointer -> warning.
wrapper skill duplicates canonical workflow content -> warning.
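A minimal sketch of part of this rule set (description, When to Use, Do Not Use For, size tiers, duplicate names), with the baseline allowlist downgrading size findings to warnings. It assumes skills are already parsed into dicts with hypothetical keys `name`, `description`, `body`, and `related_skills`; treating a missing description as a failure and using two related skills as the "broad surface" cutoff are assumptions. The linked-file pointer and wrapper-duplication rules are left out.

```python
from collections import Counter
from dataclasses import dataclass

WARN, STRONG, FAIL = "warning", "strong-warning", "fail"
# Size tiers from the rules above: >20k warning, >50k strong warning, >100k fail.
SIZE_TIERS = ((100_000, FAIL), (50_000, STRONG), (20_000, WARN))


@dataclass(frozen=True)
class Finding:
    skill: str
    severity: str
    rule: str


def has_heading(body: str, title: str) -> bool:
    return any(line.lstrip().startswith("#") and title.lower() in line.lower()
               for line in body.splitlines())


def lint_skill(name, description, body, related_skills=0, allowlisted=False):
    out = []
    if not description.strip():
        out.append(Finding(name, FAIL, "missing description"))  # assumed severity
    elif len(description) > 180:
        out.append(Finding(name, WARN, "description > 180 chars"))
    if not has_heading(body, "When to Use"):
        out.append(Finding(name, WARN, "missing When to Use"))
    # Hypothetical cutoff for a "broad related-skill surface".
    if related_skills >= 2 and not has_heading(body, "Do Not Use For"):
        out.append(Finding(name, WARN, "broad related-skill surface without Do Not Use For"))
    for limit, severity in SIZE_TIERS:
        if len(body) > limit:
            # Baseline allowlist: known offenders are reported but cannot fail CI.
            out.append(Finding(name, WARN if allowlisted else severity,
                               f"SKILL.md is {len(body):,} chars (> {limit:,})"))
            break
    return out


def lint_all(skills, allowlist=frozenset()):
    counts = Counter(s["name"].lower() for s in skills)
    findings = [Finding(n, FAIL, f"duplicate name ({c} copies)")
                for n, c in counts.items() if c > 1]
    for s in skills:
        findings += lint_skill(s["name"], s.get("description", ""), s["body"],
                               s.get("related_skills", 0), s["name"] in allowlist)
    return findings


def passes_gate(findings):
    """CI gate: only 'fail' findings block a merge; warnings are reported."""
    return not any(f.severity == FAIL for f in findings)
```

Keeping the allowlist as an explicit input, rather than raising thresholds, lets the gate tighten skill by skill as the baseline shrinks.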

Validation

Tests pass on the current baseline, with an explicit allowlist if needed.
New or modified skills are checked without relying on manual review only.

Dependency / Execution Order

Unit 1: read-only audit
Unit 2: policy/rubric
Unit 3: backup/stale cleanup
Unit 4: wrapper slimming
Unit 5: high-traffic skill progressive disclosure, one skill per lane where possible
Unit 6: lint/CI gate, once baseline thresholds are agreed

AI-Safe Idempotent Slices

For each target skill:

Measure current size and current linked files.
Identify sections to keep/move/delete.
Move exactly one section group to references/.
Add or update hot-path pointer.
Run frontmatter/link/skill load validation.
Compare before/after size.
Stop for review if a critical safety rule would move out of hot path.
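The slice loop can be made idempotent by moving exactly one level-2 section per call and returning early when that section is already gone. A minimal sketch; `CRITICAL_WORDS` is a hypothetical stand-in for however guardrail sections are actually marked, and it only inspects the heading, not the section body.

```python
import re
from pathlib import Path

# Hypothetical guardrail markers: headings containing these words never move automatically.
CRITICAL_WORDS = ("approval", "safety", "guardrail", "must not", "never")
GUIDE = "## Reference Loading Guide"


class NeedsReview(Exception):
    """Raised when a slice would move a critical safety rule out of the hot path."""


def split_sections(text: str) -> list:
    """Split markdown into (heading_line, block) pairs at level-2 headings."""
    parts = [p for p in re.split(r"(?m)^(?=## )", text) if p]
    return [(p.splitlines()[0].strip() if p.startswith("## ") else "", p) for p in parts]


def move_section(skill_dir: Path, heading: str, ref_name: str) -> bool:
    """Move one level-2 section to references/<ref_name>; True if SKILL.md changed.

    Re-running with the same arguments is a no-op, so the slice is idempotent.
    """
    skill_md = skill_dir / "SKILL.md"
    sections = split_sections(skill_md.read_text(encoding="utf-8"))
    moved = [block for h, block in sections if h == heading]
    if not moved:
        return False  # already moved, or never existed
    if any(word in heading.lower() for word in CRITICAL_WORDS):
        raise NeedsReview(f"{heading!r} looks like a guardrail; stop for review")

    ref_path = skill_dir / "references" / ref_name
    ref_path.parent.mkdir(parents=True, exist_ok=True)
    ref_path.write_text("".join(moved), encoding="utf-8")

    # Keep the hot path pointing at the moved detail.
    pointer = f"- {heading.lstrip('# ')}: load `references/{ref_name}` when needed\n"
    remaining = "".join(block for h, block in sections if h != heading).rstrip("\n") + "\n"
    if GUIDE + "\n" in remaining:
        remaining = remaining.replace(GUIDE + "\n", GUIDE + "\n" + pointer, 1)
    else:
        remaining += f"\n{GUIDE}\n{pointer}"
    skill_md.write_text(remaining, encoding="utf-8")
    return True
```

Because a critical heading raises before anything is written, a stopped slice leaves both `SKILL.md` and `references/` untouched.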

Acceptance Criteria

Audit report identifies large, duplicate, backup, and ambiguous skills.
Hot-path placement policy is documented.
Obvious stale backup skills are archived/deleted only after reference checks and approval.
Wrapper skills are reduced to thin routers with clear canonical routing.
High-traffic skills use progressive disclosure and preserve critical guardrails.
Before/after token/char savings are reported per skill.
Skill clarity lint prevents reintroducing large ambiguous skills.
No cron attached-skill references are broken.

Suggested Labels

type/refactor
tool/skills
P2

Suggested Branch

refactor/skill-clarity-token-budget

