claude-code - 💡(How to fix) Fix Skills should self-recommend effort level after execution [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
anthropics/claude-code#45271Fetched 2026-04-09 08:09:15
View on GitHub
Comments
0
Participants
1
Timeline
2
Reactions
0
Participants
Timeline (top)
labeled ×2

Error Message

Skill manifests support an effort frontmatter field (low, medium, high, max), but there's no guidance on what to set it to. The only way to tune it is manual trial and error — run the skill, eyeball the output, decide if it over-thought or under-thought, edit the YAML, repeat.

Root Cause

Skill authors (especially in orgs with many shared skills) currently have to guess. Too high wastes tokens and time on every invocation. Too low produces shallow results. The model is the only one who actually knows which it needed.

RAW_BUFFERClick to expand / collapse

Problem

Skill manifests support an effort frontmatter field (low, medium, high, max), but there's no guidance on what to set it to. The only way to tune it is manual trial and error — run the skill, eyeball the output, decide if it over-thought or under-thought, edit the YAML, repeat.

Claude has all the signal needed to make this judgment itself:

  • Whether the task was mechanical (sequential tool calls, no branching) or required synthesis
  • Whether it had to backtrack or reconsider
  • Whether the output quality would have degraded at a lower effort level
  • Token usage relative to task complexity

None of this feeds back into anything.

Proposal

After a skill executes, Claude should be able to emit an effort recommendation, e.g.:

"This skill ran at high effort but the task was mechanical — low would produce equivalent results and run faster."

This could take several forms (not mutually exclusive):

  1. Post-execution advisory — Claude notes in its response when the effort level seems mismatched
  2. --recommend-effort flag — Run a skill once and get a suggested effort level for the frontmatter
  3. Auto-calibration — After N runs, suggest an effort level based on observed reasoning patterns

This is low-hanging fruit — the model already knows whether it needed to think hard. It just doesn't say so.

Why this matters

Skill authors (especially in orgs with many shared skills) currently have to guess. Too high wastes tokens and time on every invocation. Too low produces shallow results. The model is the only one who actually knows which it needed.

extent analysis

TL;DR

Implement a post-execution advisory or a --recommend-effort flag to provide effort level suggestions based on Claude's execution analysis.

Guidance

  • Analyze Claude's execution data to determine the actual effort required for a task, considering factors like task complexity, token usage, and output quality.
  • Implement a feedback mechanism to provide effort level suggestions, such as a post-execution advisory or a --recommend-effort flag.
  • Consider implementing auto-calibration to suggest an effort level based on observed reasoning patterns after multiple runs.
  • Evaluate the effectiveness of the suggested effort levels to refine the recommendation algorithm.

Example

// Example output with effort recommendation
"This skill ran at `high` effort but the task was mechanical — `low` would produce equivalent results and run faster."

Notes

The proposed solution relies on Claude's ability to analyze its own execution and provide meaningful feedback. The effectiveness of the effort level suggestions may vary depending on the complexity of the tasks and the quality of the execution data.

Recommendation

Apply a workaround by implementing a post-execution advisory to provide effort level suggestions, as this approach can be developed and tested independently of the --recommend-effort flag or auto-calibration features.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING