claude-code - 💡(How to fix) Fix Default agent behavior should be engineering-organization, not LLM-style

Official PRs (…)
ON THIS PAGE

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Long-time Claude Code customer (Max plan, heavy daily usage on a complex multi-repo software-engineering project). Filing this as product-design feedback, not a bug report.

Root Cause

  1. Author a multi-section operating-plan doctrine teaching the agent the basic discipline (index sources, classify per fixed precedence, build a reverse-provenance ledger, mechanical evidence requirement, etc.)
  2. Add a second directive layer codifying a final-output verification algorithm because the operating plan alone wasn't holding
  3. Add a third "lean controls" addendum because the agent started expanding ceremony instead of preserving controls

Fix Action

Fix / Workaround

Maintaining per-user auto-memory entries that bind these disciplines for my own sessions. That's a workaround, not a fix — every other customer has to re-discover the same gap.

RAW_BUFFERClick to expand / collapse

Contact: [email protected] (PixelMinds AI Labs, Inc.)

Context

Long-time Claude Code customer (Max plan, heavy daily usage on a complex multi-repo software-engineering project). Filing this as product-design feedback, not a bug report.

What I observe in practice

The default mode of Claude Code on substantive engineering tasks lands closer to "LLM-style" behavior than "engineering-organization" behavior:

  • Pattern-match on keywords in the codebase, generate plausible-sounding output, iterate when corrected
  • "Verified by inspection" framing without specific evidence
  • Default to "I don't know" / "needs adjudication" classifications without exhausting the available verification sources first
  • Narrative confidence that doesn't survive ground-truth grep
  • Each correction round surfaces a new layer of pattern-match-driven errors

A senior engineer at a real engineering organization defaults to the opposite posture:

  • Read the source artifact before claiming coverage; don't keyword-match
  • Build indexes / ledgers / certification artifacts before classifying
  • Cite specific evidence (file, line, commit SHA) for every claim
  • Exhaust the available decision/precedent/source registries before classifying anything as "unknown"
  • Mechanical proof over narrative; trace every claim
  • Be the brake on its own pattern-match impulses, not the amplifier

Empirical case

This week, on a single multi-hour session, I had to:

  1. Author a multi-section operating-plan doctrine teaching the agent the basic discipline (index sources, classify per fixed precedence, build a reverse-provenance ledger, mechanical evidence requirement, etc.)
  2. Add a second directive layer codifying a final-output verification algorithm because the operating plan alone wasn't holding
  3. Add a third "lean controls" addendum because the agent started expanding ceremony instead of preserving controls

This shouldn't be on the user. Every one of these directives is "what a senior engineer would do by default." A customer paying for Claude Code shouldn't have to write a multi-section operating doctrine teaching the agent to read source code before claiming coverage.

Why this matters at product level

Claude Code is differentiated from Claude AI / ChatGPT / Gemini by the value claim of "AI engineer that can drive real software work end-to-end." That claim only holds if the default behavior is engineering-discipline. When the default is LLM-style pattern-match-then-iterate, the value proposition collapses — the same output is available cheaper elsewhere.

At $200/month plus metered usage, this isn't a "nice to have" feature request. It's the core product claim.

What would help

A few specific suggestions:

  1. System prompt / training defaults — make engineering-discipline the default verification posture: trace-every-claim, ground-truth-first, exhaust durable-decision search before classifying unknown, cite specific evidence, mechanical proof over narrative.
  2. Self-check before claiming "verified" — distinguish "matched a keyword" from "read the cited record's full scope and confirmed match." Default to the latter.
  3. UNTRACEABLE / unknown as last resort — not default. Exhaust available verification sources first.
  4. Surface explicitly when doing LLM-style work — "this is genuinely brainstorming because [reason]" — not silent default.
  5. Multi-round correction loops are a signal of methodology drift, not normal workflow. Should be visible to the agent as such, with self-escalation behavior rather than continued grinding.

What I'm doing on my side

Maintaining per-user auto-memory entries that bind these disciplines for my own sessions. That's a workaround, not a fix — every other customer has to re-discover the same gap.

Happy to provide more empirical detail if useful. Not posting proprietary project content here; willing to share in a private channel if Anthropic wants the case study.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING