claude-code: Feature request: evaluate and score user-created subagents and skills with keep/remove recommendations

johnnyfeldman · 2026-04-21T18:29:03Z



anthropics/claude-code#51679•Fetched 2026-04-22 07:55:47

## Summary

Add a built-in Claude Code command (e.g., `/audit-subagents`, `/audit-skills`, or a unified `/audit-extensions`) that analyzes the user's custom subagents (`.claude/agents/*.md`) and skills (`.claude/skills/*/SKILL.md`), scores each one on value contribution, and recommends whether to keep, revise, or remove them. Surface the same analysis as a dedicated section in the HTML report produced by `/insights`.

## Problem

Power users accumulate dozens of custom subagents and skills over time. In practice:

- Many are created once for a narrow task and never used again.
- Some overlap heavily with each other or with built-in capabilities.
- Some have drifted out of sync with the project (referencing files/paths that no longer exist, or describing behavior the codebase has moved past).
- Descriptions are often too vague for the dispatcher to match against, so the skill/agent silently never fires.
- There's no principled way to decide which ones to prune, short of manual review.

The result is cruft: longer context, worse dispatch precision (more candidate skills/agents diluting the match), and higher cognitive load when writing new ones.

## Proposed behavior

A command that, for each custom subagent and skill, produces:

1. **Usage signal**: count how often it has been invoked (from transcripts / session logs) over a user-specified window.
2. **Description quality score**: check whether the description is specific enough to fire reliably (trigger keywords, SKIP clauses, concrete examples) vs. vague.
3. **Overlap detection**: cluster items with substantially similar descriptions/triggers and flag redundancy.
4. **Staleness check**: detect references to files, paths, tools, or commands that no longer exist in the repo.
5. **Value rating**: a composite score (e.g., 0–10) with a one-line justification.
6. **Recommendation**: `KEEP` / `REVISE` / `REMOVE`, with the reason and, for `REVISE`, a suggested edit.

Output as a ranked table so the user can quickly sweep through and approve removals in bulk.
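The composite rating and recommendation could be sketched as follows. This is only an illustration: the issue does not define weights or thresholds, so the `AuditSignals` fields, the weights in `value_score`, and the cut-offs in `recommend` are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class AuditSignals:
    """Hypothetical per-extension inputs; field names are illustrative."""
    name: str
    invocations: int            # uses within the reporting window
    description_quality: float  # 0.0 (vague) to 1.0 (specific)
    overlaps: bool              # True if clustered with a near-duplicate
    stale_refs: int             # references that no longer resolve


def value_score(s: AuditSignals) -> float:
    """Composite 0-10 score. The weights are assumptions, not a spec."""
    usage = min(s.invocations, 10) / 10  # saturate at 10 invocations
    score = 5.0 * usage + 3.0 * s.description_quality
    score += 0.0 if s.overlaps else 1.0
    score += 1.0 if s.stale_refs == 0 else 0.0
    return round(score, 1)


def recommend(s: AuditSignals) -> str:
    """Map the score and flags to KEEP / REVISE / REMOVE (assumed cut-offs)."""
    score = value_score(s)
    if score < 3.0:
        return "REMOVE"
    if score < 7.0 or s.stale_refs > 0 or s.overlaps:
        return "REVISE"
    return "KEEP"


def ranked(items):
    """Lowest score first, so removal candidates appear at the top."""
    return sorted(items, key=value_score)
```

Sorting lowest-first matches the stated goal of sweeping through removals in bulk; a real implementation might instead let the user choose the sort column.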

## `/insights` integration

Add a new section to the HTML report generated by `/insights`, titled e.g. **"Custom Extensions Health"**, that renders the same audit data in a browser-friendly format:

- Summary tiles at the top: total custom subagents, total custom skills, count flagged `REMOVE`, count flagged `REVISE`, count unused in the reporting window.
- Sortable / filterable table with one row per subagent and skill, columns: name, type (agent/skill), scope (user/project/plugin), invocations in window, description-quality score, overlap group, staleness flags, composite value score, recommendation, and expandable justification.
- Visual cues: red row for `REMOVE`, yellow for `REVISE`, green for `KEEP`; a small bar or sparkline showing invocation counts over time.
- A "suggested revisions" drawer per row showing the rewritten description/trigger text, with copy-to-clipboard for quick application.
- Link at the top of the section pointing to the raw `/audit-extensions` command so users can re-run interactively.

This keeps the audit discoverable for users who already run `/insights` periodically, without requiring them to remember a separate command.
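As a rough sketch of the summary tiles and row coloring described above, assuming each audit result is a plain dict with hypothetical keys `name`, `type`, `invocations`, and `recommendation` (the real report data model is not specified in the issue):

```python
from collections import Counter
from html import escape

# Row colors as proposed in the issue; the CSS class names are assumptions.
ROW_COLORS = {"REMOVE": "red", "REVISE": "yellow", "KEEP": "green"}


def summary_tiles(rows):
    """Compute the five tile values from a list of audit-result dicts."""
    kinds = Counter(r["type"] for r in rows)
    recs = Counter(r["recommendation"] for r in rows)
    return {
        "subagents": kinds["agent"],
        "skills": kinds["skill"],
        "remove": recs["REMOVE"],
        "revise": recs["REVISE"],
        "unused": sum(1 for r in rows if r["invocations"] == 0),
    }


def render_row(r):
    """Render one table row, escaping values and tagging it with a color class."""
    color = ROW_COLORS[r["recommendation"]]
    columns = ("name", "type", "invocations", "recommendation")
    cells = "".join(f"<td>{escape(str(r[k]))}</td>" for k in columns)
    return f'<tr class="{color}">{cells}</tr>'
```

Only four of the proposed columns are rendered here to keep the example short; the remaining columns would follow the same pattern.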

## Why this belongs in Claude Code

Only Claude Code has visibility into both the definitions and the actual invocation history. A user can't easily compute "this skill has fired 0 times in 90 days" without building their own tooling on top of transcripts.
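To illustrate the kind of tooling a user would otherwise have to build, here is a minimal sketch of a windowed invocation count. It assumes a hypothetical log of JSON lines, each with a `timestamp` (ISO 8601 with an explicit UTC offset) and an `extension` name; the actual Claude Code transcript format is not described in this issue and may differ.

```python
import json
from collections import Counter
from datetime import datetime, timedelta, timezone


def count_invocations(jsonl_lines, known_names, now, window_days=90):
    """Count invocations per known extension within the last `window_days`.

    Extensions that never fired stay in the result with a count of 0,
    which is exactly the case the audit needs to surface.
    """
    cutoff = now - timedelta(days=window_days)
    counts = Counter({name: 0 for name in known_names})
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)  # hypothetical schema, see lead-in
        when = datetime.fromisoformat(event["timestamp"])
        if when >= cutoff and event["extension"] in counts:
            counts[event["extension"]] += 1
    return counts


# Usage: list extensions that have not fired in the window.
now = datetime(2026, 4, 21, tzinfo=timezone.utc)
log = [
    '{"timestamp": "2026-04-01T10:00:00+00:00", "extension": "reviewer"}',
    '{"timestamp": "2025-12-01T10:00:00+00:00", "extension": "old-skill"}',
]
counts = count_invocations(log, ["reviewer", "old-skill"], now)
unused = [name for name, n in counts.items() if n == 0]
```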

## Nice-to-haves

- Dry-run vs. apply mode (with `--apply` actually deleting or rewriting files after confirmation).
- Per-scope filtering (`--user`, `--project`, `--plugin`).
- Export as JSON for scripting.
- CLI flag on `/insights` (e.g., `--skip-extensions-audit`) for users who don't want this section.
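One way these options might fit together is sketched below with `argparse`. The flag names `--apply`, `--user`, `--project`, and `--plugin` come from the list above; the `--json` flag name, the choice to make the scope flags mutually exclusive, and the row dict keys are assumptions. The sketch only reports what it would do and never touches files.

```python
import argparse
import json


def build_parser():
    """Build a hypothetical CLI for the proposed audit command."""
    p = argparse.ArgumentParser(prog="audit-extensions")
    p.add_argument("--apply", action="store_true",
                   help="act on recommendations (default is dry-run)")
    scope = p.add_mutually_exclusive_group()  # assumption: one scope at a time
    for name in ("user", "project", "plugin"):
        scope.add_argument(f"--{name}", dest="scope",
                           action="store_const", const=name)
    p.add_argument("--json", action="store_true",
                   help="emit results as JSON (assumed flag name)")
    return p


def select(rows, scope):
    """Keep rows in the requested scope; no scope means all rows."""
    return [r for r in rows if scope is None or r["scope"] == scope]


def report(rows, args):
    """Return JSON or a plain-text list of removal candidates."""
    chosen = select(rows, args.scope)
    if args.json:
        return json.dumps(chosen, indent=2)
    verb = "removing" if args.apply else "would remove"
    return "\n".join(f"{verb}: {r['name']}"
                     for r in chosen if r["recommendation"] == "REMOVE")
```

Defaulting to dry-run keeps the destructive path opt-in, in line with the request that `--apply` act only after confirmation.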

## Extent analysis

### TL;DR

Implement a command to analyze custom subagents and skills, providing a score and recommendation for each, to help users declutter and optimize their extensions.

### Guidance

- Develop a scoring system that combines usage signals, description quality, overlap detection, and staleness checks into a composite value rating for each custom subagent and skill.
- Integrate the audit results into the `/insights` HTML report, including a sortable and filterable table with visual cues that make the recommended actions easy to spot.
- Consider implementing a dry-run mode and export options to give users control over the audit process.
- Use the existing transcript and session log data to compute the usage signal and invocation counts.
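For the overlap-detection input to the scoring system, a simple baseline is Jaccard similarity over the word sets of two descriptions. This is a stand-in technique chosen for illustration; the issue does not name a method, and the `0.6` threshold and function names are assumptions.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercase alphanumeric word set of a description."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity: shared words divided by all distinct words."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def overlapping_pairs(descriptions, threshold=0.6):
    """Return name pairs whose descriptions meet the similarity threshold."""
    return [
        (x, y)
        for x, y in combinations(sorted(descriptions), 2)
        if jaccard(descriptions[x], descriptions[y]) >= threshold
    ]
```

Word-set overlap misses paraphrases, so a production audit would likely need a semantic comparison; the pairwise structure would stay the same.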

### Example

Example output of the audit command:

| Name | Type | Scope | Invocations | Description Quality | Overlap Group | Staleness Flags | Value Score | Recommendation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| My Skill | Skill | User | 0 | Low | Group 1 | Stale | 2 | REMOVE |
| My Agent | Agent | Project | 5 | Medium | Group 2 | Fresh | 8 | KEEP |

### Notes

The scoring system and the `/insights` integration are not specified in detail. The weighting of the composite score in particular needs care so that the recommendations are accurate and usable. The dry-run mode and export options should be designed with user experience in mind.

### Recommendation

Implement the proposed `/audit-extensions` command. It would give users a principled way to evaluate and prune their custom subagents and skills, which in turn shortens context and improves dispatch precision.
