claude-code - 💡(How to fix) Fix Proposal: Behavior-memory budget advisor (Loop Pilot) as pre-loop hook

claude-code · 2026-06-05 19:19:23


Claude Code's task_budget (Opus 4.7+) introduced a powerful idea: show the model a fuel gauge so it self-moderates. Loop Pilot extends this principle with historical behavior memory — the model doesn't just see a countdown, it sees what success looked like for similar tasks in the past.


Summary

What Loop Pilot Adds Beyond task_budget

Feature	task_budget	Loop Pilot
Budget visibility	✅ Token countdown	✅ Tool-call budget
Historical context	❌	✅ "Tasks like this took 3-5 calls"
Tool-specific guidance	❌	✅ "Likely tools: bash, write_file"
Repeated-tool warnings	❌	✅ "Similar tasks over-used web_search"
Failure pattern hints	❌	✅ "edit_file failed in similar episodes"
Confidence signal	❌	✅ High/medium/low based on neighbor count

task_budget says: "You have 15,000 tokens left." Loop Pilot says: "Tasks like yours typically take 4 tool calls. The useful tools are bash and write_file. Past episodes that repeated web_search wasted cycles. You have 10 iterations available."

Both are advisory. They complement each other — one gives resource awareness, the other gives behavioral awareness.
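To make the contrast concrete, here is a minimal sketch of how an advisory message like the one quoted above might be assembled from a prediction. The `Prediction` shape, its field names, and `renderGuidance` are hypothetical and chosen for illustration; they are not taken from the Loop Pilot repo.

```typescript
// Hypothetical shape of a behavior-memory prediction (illustrative field names).
interface Prediction {
  typicalCalls: [number, number]; // observed range of tool calls, e.g. [3, 5]
  likelyTools: string[];          // tools that were useful in similar episodes
  overusedTools: string[];        // tools that similar episodes repeated wastefully
  failedTools: string[];          // tools that failed in similar episodes
  confidence: "high" | "medium" | "low";
}

// Render the prediction as plain advisory text for the model's context.
function renderGuidance(p: Prediction, maxIterations: number): string {
  const [lo, hi] = p.typicalCalls;
  const range = lo === hi ? `${lo}` : `${lo}-${hi}`;
  const lines: string[] = [`Tasks like this typically took ${range} tool calls.`];
  if (p.likelyTools.length > 0) {
    lines.push(`Likely tools: ${p.likelyTools.join(", ")}.`);
  }
  if (p.overusedTools.length > 0) {
    lines.push(`Similar tasks over-used: ${p.overusedTools.join(", ")}.`);
  }
  if (p.failedTools.length > 0) {
    lines.push(`Tools that failed in similar episodes: ${p.failedTools.join(", ")}.`);
  }
  lines.push(`Confidence: ${p.confidence}. You have ${maxIterations} iterations available.`);
  lines.push("This guidance is advisory, not a hard limit.");
  return lines.join("\n");
}
```

Sections with no data are omitted rather than rendered empty, so a low-information prediction produces a short message instead of misleading detail.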

The Philosophy Match

Claude Code already implements the "inform" half of an "inform, then trust" approach through task_budget. Loop Pilot is the same philosophy applied to behavioral patterns rather than token budgets.

Claude Code's spinning detector (3 turns < 500 tokens → halt) is a reactive guardrail. Loop Pilot is a proactive advisor — it prevents spinning by telling the model upfront what productive behavior looks like for this task type.
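For comparison, the reactive rule as summarized above (3 turns under 500 tokens) can be sketched as a simple window check. This only illustrates the rule as described in this post; it is not Claude Code's actual implementation, and `isSpinning` and its parameters are hypothetical.

```typescript
// Returns true when the most recent `windowSize` turns each produced fewer
// than `minTokens` output tokens. Defaults mirror the rule quoted in the post.
function isSpinning(
  outputTokensPerTurn: number[],
  windowSize = 3,
  minTokens = 500,
): boolean {
  if (outputTokensPerTurn.length < windowSize) return false;
  const recent = outputTokensPerTurn.slice(-windowSize);
  return recent.every((tokens) => tokens < minTokens);
}
```

A check like this can only fire after the low-output turns have already been spent, which is the gap an upfront advisor is meant to narrow.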

Implementation

Repo: https://github.com/monbishnoi/loop-pilot

Running in production on Cal (an agent harness built on Claude):

404 real episodes of Claude tool-use behavior
KNN similarity search finds 5 relevant past episodes per task
Budget predictions consistently accurate
~300ms latency (embedding model already warm)
Model observably respects guidance as operational context
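The retrieval step can be sketched as follows, assuming episodes are stored with an embedding vector and a tool-call count. The cosine-similarity metric, the similarity cutoff, and the neighbor-count thresholds for confidence are assumptions made for illustration; the post only states that KNN search returns 5 episodes and that confidence depends on neighbor count.

```typescript
type Confidence = "high" | "medium" | "low";

interface Episode {
  embedding: number[]; // task embedding from whichever provider is configured
  toolCalls: number;   // tool calls the episode actually used
}

interface BudgetAdvice {
  minCalls: number | null;
  maxCalls: number | null;
  neighbors: number;
  confidence: Confidence;
}

// Cosine similarity; returns 0 for zero-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Brute-force k-nearest-neighbor search, then a budget range from the
// neighbors that clear the (assumed) similarity cutoff.
function predictBudget(
  query: number[],
  episodes: Episode[],
  k = 5,
  minScore = 0.5,
): BudgetAdvice {
  const close = episodes
    .map((episode) => ({ episode, score: cosine(query, episode.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .filter((n) => n.score >= minScore);
  const calls = close.map((n) => n.episode.toolCalls);
  // Assumed thresholds: 4+ close neighbors is "high", 2-3 is "medium".
  const confidence: Confidence =
    close.length >= 4 ? "high" : close.length >= 2 ? "medium" : "low";
  return {
    minCalls: calls.length > 0 ? Math.min(...calls) : null,
    maxCalls: calls.length > 0 ? Math.max(...calls) : null,
    neighbors: close.length,
    confidence,
  };
}
```

A linear scan is likely adequate at the scale mentioned here (a few hundred episodes); an index would only matter for a much larger memory.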

Integration Options

Pre-loop hook: If Claude Code exposes a lifecycle hook before model invocation, Loop Pilot can inject guidance into the assembled context
Custom instruction extension: Loop Pilot guidance appended to system prompt via user configuration
MCP tool: Claude Code supports MCP — Loop Pilot runs as an MCP server with plan_task as the primary tool
Built-in feature: Claude Code could natively support behavior memory alongside task_budget (most ambitious)
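The first two options both come down to appending advisory text to the context before the loop starts. The sketch below shows one way that could look. The `LoopContext` shape, the `Advisor` signature, and `makePreLoopHook` are hypothetical; Claude Code's real hook interface may differ or may not offer this extension point at all.

```typescript
// Hypothetical context handed to a pre-loop hook.
interface LoopContext {
  systemPrompt: string;
  task: string;
}

// Hypothetical advisor: resolves to guidance text, or null if it has none.
type Advisor = (task: string) => Promise<string | null>;

// Pure step: append guidance as a clearly labeled advisory section.
function injectGuidance(ctx: LoopContext, guidance: string | null): LoopContext {
  if (guidance === null || guidance.trim() === "") return ctx;
  return {
    ...ctx,
    systemPrompt: `${ctx.systemPrompt}\n\n[Loop Pilot guidance (advisory)]\n${guidance.trim()}`,
  };
}

// Wrap the advisor so a slow or failing lookup never blocks the loop.
function makePreLoopHook(advise: Advisor, timeoutMs = 1000) {
  return async (ctx: LoopContext): Promise<LoopContext> => {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<null>((resolve) => {
      timer = setTimeout(() => resolve(null), timeoutMs);
    });
    try {
      const guidance = await Promise.race([advise(ctx.task), timeout]);
      return injectGuidance(ctx, guidance);
    } catch {
      return ctx; // fall back to the unmodified context on any advisor error
    } finally {
      clearTimeout(timer);
    }
  };
}
```

Keeping the injection step pure and the timeout separate means the advisor can fail or time out without changing what the model would otherwise have seen.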

Technical Details

TypeScript, MIT licensed
SQLite-backed episode memory
Pluggable embedding providers (shares existing model — no extra resources)
MCP server (4 tools) + HTTP server
Designed to layer on top without changing existing architecture

The Bridge Strategy

This is explicitly a bridge — as models get post-trained on agentic RL, they'll internalize iteration efficiency. Until then:

Current models: inform heavily (budget + tool guidance + failure hints)
Post-trained models: inform lightly (just the budget)
Fully self-regulating models: just the safety cap

Loop Pilot's guidance gracefully becomes unnecessary. The harness simplifies over time.
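One way to express this tapering is a per-tier selector that decides which guidance sections to emit. The tier names and the `GuidanceParts` shape are hypothetical labels for the three stages listed above, not identifiers from Loop Pilot.

```typescript
// Hypothetical labels for the three stages described in the post.
type ModelTier = "current" | "post_trained" | "self_regulating";

interface GuidanceParts {
  budget: string;        // e.g. "Typical: 3-5 tool calls"
  toolHints?: string;    // e.g. "Likely tools: bash, write_file"
  failureHints?: string; // e.g. "edit_file failed in similar episodes"
}

// Select which advisory sections to inject for a given tier.
// The safety cap is enforced by the harness itself, so it never appears here.
function selectGuidance(tier: ModelTier, parts: GuidanceParts): string[] {
  switch (tier) {
    case "current":
      return [parts.budget, parts.toolHints, parts.failureHints].filter(
        (s): s is string => typeof s === "string" && s.length > 0,
      );
    case "post_trained":
      return [parts.budget];
    case "self_regulating":
      return [];
  }
}
```

Under this sketch, moving a model from one tier to the next is a configuration change rather than a code change, which is how the harness can simplify over time.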

References

Related: task_budget feature (Opus 4.7)
Related: diminishing-returns detector (3 low-output turns)
Research: Tool-R1 (arXiv 2509.12867), ARTIST (Microsoft), DAPO (ByteDance)
Blog: "Inform, Then Trust" (publishing this week)

Happy to discuss integration patterns or contribute if there's interest.
