claude-code - Fix Proposal: Behavior-memory budget advisor (Loop Pilot) as pre-loop hook



Summary

Claude Code's task_budget (Opus 4.7+) introduced a powerful idea: show the model a fuel gauge so it self-moderates. Loop Pilot extends this principle with historical behavior memory — the model doesn't just see a countdown, it sees what success looked like for similar tasks in the past.

What Loop Pilot Adds Beyond task_budget

| Feature | task_budget | Loop Pilot |
|---|---|---|
| Budget visibility | ✅ Token countdown | ✅ Tool-call budget |
| Historical context | | ✅ "Tasks like this took 3-5 calls" |
| Tool-specific guidance | | ✅ "Likely tools: bash, write_file" |
| Repeated-tool warnings | | ✅ "Similar tasks overused web_search" |
| Failure pattern hints | | ✅ "edit_file failed in similar episodes" |
| Confidence signal | | ✅ High/medium/low based on neighbor count |

task_budget says: "You have 15,000 tokens left." Loop Pilot says: "Tasks like yours typically take 4 tool calls. The useful tools are bash and write_file. Past episodes that repeated web_search wasted cycles. You have 10 iterations available."

Both are advisory. They complement each other — one gives resource awareness, the other gives behavioral awareness.
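To make the behavioral half concrete, here is a minimal sketch of how an advice record could be rendered into the kind of message quoted above. The `Advice` fields, the `formatAdvice`/`confidenceFor` names, and the confidence thresholds are all hypothetical; the proposal only says confidence is "based on neighbor count".

```typescript
// Illustrative advice shape; these field names are assumptions, not the
// loop-pilot repository's actual types.
type Confidence = "high" | "medium" | "low";

interface Advice {
  typicalCalls: number;        // typical tool-call count among similar episodes
  likelyTools: string[];       // tools that were useful in similar episodes
  overusedTools: string[];     // tools that similar episodes repeated wastefully
  failingTools: string[];      // tools that failed in similar episodes
  neighborCount: number;       // number of similar past episodes found
  iterationsAvailable: number; // hard cap enforced by the harness
}

// Assumed thresholds: the proposal only says confidence depends on neighbor count.
function confidenceFor(neighborCount: number): Confidence {
  if (neighborCount >= 5) return "high";
  if (neighborCount >= 2) return "medium";
  return "low";
}

// Joins names as "a", "a and b", or "a, b, and c".
function joinList(items: string[]): string {
  if (items.length <= 2) return items.join(" and ");
  return `${items.slice(0, -1).join(", ")}, and ${items[items.length - 1]}`;
}

// Builds the advisory paragraph in the order of the Loop Pilot example message:
// typical cost, useful tools, repeated-tool warnings, failures, confidence, budget.
function formatAdvice(a: Advice): string {
  const parts: string[] = [`Tasks like yours typically take ${a.typicalCalls} tool calls.`];
  if (a.likelyTools.length > 0) {
    parts.push(`The useful tools are ${joinList(a.likelyTools)}.`);
  }
  if (a.overusedTools.length > 0) {
    parts.push(`Past episodes that repeated ${joinList(a.overusedTools)} wasted cycles.`);
  }
  if (a.failingTools.length > 0) {
    parts.push(`${joinList(a.failingTools)} failed in similar episodes.`);
  }
  parts.push(`Confidence: ${confidenceFor(a.neighborCount)} (${a.neighborCount} similar episodes).`);
  parts.push(`You have ${a.iterationsAvailable} iterations available.`);
  return parts.join(" ");
}
```

For example, `formatAdvice({ typicalCalls: 4, likelyTools: ["bash", "write_file"], overusedTools: ["web_search"], failingTools: [], neighborCount: 5, iterationsAvailable: 10 })` returns "Tasks like yours typically take 4 tool calls. The useful tools are bash and write_file. Past episodes that repeated web_search wasted cycles. Confidence: high (5 similar episodes). You have 10 iterations available." Sections with no data are simply omitted, so a sparse history produces a shorter, lower-confidence message rather than invented guidance.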

The Philosophy Match

Claude Code already implements the "inform" half of an "inform, then trust" approach through task_budget: the model is shown its remaining budget and trusted to moderate itself. Loop Pilot applies the same philosophy to behavioral patterns rather than token budgets.

Claude Code's spinning detector (3 turns < 500 tokens → halt) is a reactive guardrail. Loop Pilot is a proactive advisor — it prevents spinning by telling the model upfront what productive behavior looks like for this task type.
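The difference between the two is easiest to see in code. Below is a sketch of a reactive check, assuming the rule means "each of the last 3 turns produced fewer than 500 output tokens"; the proposal only summarizes it as "3 turns < 500 tokens → halt", and `isSpinning` is a hypothetical name, not Claude Code's implementation.

```typescript
// Reactive guardrail sketch: halt once each of the last `windowSize` turns
// produced fewer than `minTokens` output tokens. Interpretation of the rule
// ("each turn" rather than "in total") is an assumption.
function isSpinning(turnOutputTokens: number[], windowSize = 3, minTokens = 500): boolean {
  if (turnOutputTokens.length < windowSize) return false; // not enough history yet
  return turnOutputTokens.slice(-windowSize).every((tokens) => tokens < minTokens);
}
```

`isSpinning([1200, 300, 120, 90])` is true, but only after three low-output turns have already been spent; `isSpinning([1200, 300, 900, 90])` is false. A proactive advisor runs before the first turn, so its value is in never accumulating those wasted turns in the first place.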

Implementation

Repo: https://github.com/monbishnoi/loop-pilot

Running in production on Cal (an agent harness built on Claude):

  • 404 real episodes of Claude tool-use behavior
  • KNN similarity search retrieves the 5 most similar past episodes for each task
  • Budget predictions consistently accurate
  • ~300 ms latency (embedding model already warm)
  • The model observably treats the guidance as operational context and respects it
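A minimal sketch of the retrieval step, assuming each stored episode carries an embedding of its task description plus the ordered list of tool calls. The `Episode` shape, the use of cosine similarity, and the rule "predict the range from successful neighbors only" are illustrative assumptions, not Loop Pilot's actual schema or scoring. At a few hundred episodes a brute-force scan like this is cheap; an approximate vector index only pays off at much larger scale.

```typescript
// Hypothetical episode record; Loop Pilot's real schema may differ.
interface Episode {
  embedding: number[];  // embedding of the task description
  toolCalls: string[];  // tool names in call order
  succeeded: boolean;   // whether the episode completed its task
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Brute-force k-nearest-neighbor search: score every stored episode, keep the top k.
function nearest(query: number[], memory: Episode[], k = 5): Episode[] {
  return memory
    .map((episode) => ({ episode, score: cosine(query, episode.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.episode);
}

// Tool-call range ("tasks like this took 3-5 calls") from successful neighbors only.
function predictRange(neighbors: Episode[]): { min: number; max: number } | null {
  const counts = neighbors.filter((e) => e.succeeded).map((e) => e.toolCalls.length);
  if (counts.length === 0) return null; // no successful precedent to learn from
  return { min: Math.min(...counts), max: Math.max(...counts) };
}

// Tools used by the most successful neighbors (each episode counts a tool once).
function likelyTools(neighbors: Episode[], top = 2): string[] {
  const freq = new Map<string, number>();
  for (const e of neighbors.filter((n) => n.succeeded)) {
    for (const tool of Array.from(new Set(e.toolCalls))) {
      freq.set(tool, (freq.get(tool) ?? 0) + 1);
    }
  }
  return Array.from(freq.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, top)
    .map(([tool]) => tool);
}
```

Excluding failed neighbors from the range keeps a runaway episode (for instance, one that looped on `web_search`) from inflating the predicted budget, while the same failed episodes can still feed the repeated-tool and failure-pattern hints.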

Integration Options

  1. Pre-loop hook: If Claude Code exposes a lifecycle hook before model invocation, Loop Pilot can inject guidance into the assembled context
  2. Custom instruction extension: Loop Pilot guidance appended to system prompt via user configuration
  3. MCP tool: Claude Code supports MCP — Loop Pilot runs as an MCP server with plan_task as the primary tool
  4. Built-in feature: Claude Code could natively support behavior memory alongside task_budget (most ambitious)
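Options 1 and 2 amount to the same operation: take the assembled context and append a guidance block without touching anything else. The sketch below assumes a hypothetical `PreLoopHook` type and `AssembledContext` shape (Claude Code may expose nothing like them), and a hypothetical `advise` callback standing in for a Loop Pilot lookup such as a `plan_task` call. The `<loop_pilot_guidance>` delimiter is likewise illustrative.

```typescript
// Hypothetical context shape; a real harness assembles far more than this.
interface AssembledContext {
  systemPrompt: string;
  userMessage: string;
}

// Hypothetical hook type: receives the assembled context, returns a (possibly new) one.
type PreLoopHook = (ctx: AssembledContext) => AssembledContext;

// Wraps an advice lookup as a hook. `advise` returns guidance text, or null
// when there are no similar past episodes to draw on.
function loopPilotHook(advise: (task: string) => string | null): PreLoopHook {
  return (ctx) => {
    const guidance = advise(ctx.userMessage);
    if (guidance === null) return ctx; // no history: leave the context unchanged
    return {
      ...ctx,
      systemPrompt: `${ctx.systemPrompt}\n\n<loop_pilot_guidance>\n${guidance}\n</loop_pilot_guidance>`,
    };
  };
}

// Applies hooks in order; each sees the output of the previous one.
function runHooks(ctx: AssembledContext, hooks: PreLoopHook[]): AssembledContext {
  return hooks.reduce((acc, hook) => hook(acc), ctx);
}
```

Returning the original object when there is no guidance keeps the hook a strict no-op for unfamiliar tasks, which matches the "layer on top without changing existing architecture" goal.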

Technical Details

  • TypeScript, MIT licensed
  • SQLite-backed episode memory
  • Pluggable embedding providers (shares existing model — no extra resources)
  • MCP server (4 tools) + HTTP server
  • Designed to layer on top without changing existing architecture
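The "pluggable embedding providers" point can be sketched as a narrow interface that the episode store depends on. Everything here is hypothetical: `EmbeddingProvider`, the toy `HashingProvider` (a hashed bag-of-words, useful only for tests), and `EpisodeMemory`, whose in-memory array stands in for the SQLite table. A real provider would usually be async and wrap the harness's already-loaded model.

```typescript
// Hypothetical provider interface (synchronous for brevity).
interface EmbeddingProvider {
  readonly name: string;
  readonly dimensions: number;
  embed(text: string): number[];
}

// Toy stand-in: hashes each token into one of `dimensions` buckets.
class HashingProvider implements EmbeddingProvider {
  readonly name = "hashing";
  constructor(readonly dimensions = 64) {}

  embed(text: string): number[] {
    const v = new Array<number>(this.dimensions).fill(0);
    for (const token of text.toLowerCase().split(/\W+/).filter(Boolean)) {
      let h = 0;
      for (let i = 0; i < token.length; i++) h = (h * 31 + token.charCodeAt(i)) >>> 0;
      v[h % this.dimensions] += 1;
    }
    return v;
  }
}

// The store only knows the interface, so swapping providers (e.g. to reuse the
// harness's existing embedding model) requires no other changes.
class EpisodeMemory {
  private rows: { task: string; vector: number[] }[] = []; // stand-in for SQLite rows
  constructor(private readonly provider: EmbeddingProvider) {}

  add(task: string): void {
    this.rows.push({ task, vector: this.provider.embed(task) });
  }

  size(): number {
    return this.rows.length;
  }
}
```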

The Bridge Strategy

This is explicitly a bridge: as models are post-trained with agentic RL, they will internalize iteration efficiency, and the guidance can be scaled back accordingly:

  • Current models: inform heavily (budget + tool guidance + failure hints)
  • Post-trained models: inform lightly (just the budget)
  • Fully self-regulating models: just the safety cap

Loop Pilot's guidance gracefully becomes unnecessary. The harness simplifies over time.
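The three tiers above can be expressed as a small config table, so that dialing guidance down is a configuration change rather than a code change. The tier names and `GuidanceLevel` fields are illustrative; keeping the safety cap on at every tier is an assumption consistent with the iteration limit in the example message.

```typescript
// Illustrative tiers matching the list above.
type ModelTier = "current" | "postTrained" | "selfRegulating";

interface GuidanceLevel {
  budget: boolean;       // show the iteration/tool-call budget
  toolGuidance: boolean; // show likely tools and repeated-tool warnings
  failureHints: boolean; // show tools that failed in similar episodes
  safetyCap: boolean;    // hard iteration cap (assumed to stay on at every tier)
}

const GUIDANCE_BY_TIER: Record<ModelTier, GuidanceLevel> = {
  current:        { budget: true,  toolGuidance: true,  failureHints: true,  safetyCap: true },
  postTrained:    { budget: true,  toolGuidance: false, failureHints: false, safetyCap: true },
  selfRegulating: { budget: false, toolGuidance: false, failureHints: false, safetyCap: true },
};

// Lists the guidance sections enabled for a tier, in declaration order.
function enabledSections(tier: ModelTier): (keyof GuidanceLevel)[] {
  const level = GUIDANCE_BY_TIER[tier];
  return (Object.keys(level) as (keyof GuidanceLevel)[]).filter((k) => level[k]);
}
```

`enabledSections("postTrained")` yields `["budget", "safetyCap"]`, and `enabledSections("selfRegulating")` yields only `["safetyCap"]`.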

References

  • Related: task_budget feature (Opus 4.7)
  • Related: diminishing-returns detector (3 low-output turns)
  • Research: Tool-R1 (arXiv 2509.12867), ARTIST (Microsoft), DAPO (ByteDance)
  • Blog: "Inform, Then Trust" (publishing this week)

Happy to discuss integration patterns or contribute if there's interest.
