claude-code: [Feedback] 10 Behavioral Patterns from Large-Scale Agentic Development (400K LOC, 422 tools, 80+ KB files)

GitHub stats
anthropics/claude-code#48746 (fetched 2026-04-16 06:52:08)
Comments: 0 · Participants: 1 · Timeline events: 5 · Reactions: 0
Timeline (top): labeled ×4, cross-referenced ×1

Context

Multi-session agentic development across a large MCP-native Python project (~400K LOC, 422 tools, 80+ knowledge graph files, 26 auto-dispatch agents). Claude Code (Opus 4.6, 1M context) was used as the primary development tool across 20+ sessions. These patterns were identified by a 6-persona automated review panel that audited the work.

This is not a single bug report — it's a collection of recurring behavioral patterns that caused rework. Each pattern occurred across 2-4 sessions despite corrections.


Pattern 1: JSON Schema Key Inconsistency When Appending to Existing Files

What happened: When generating JSON graph edge objects across multiple tool calls, Claude used "relationship" as the key in some edges and "type" in others — within the same file. 79 of 329 edges were written with the wrong key, making them invisible to downstream graph loaders that read edge["type"].

Impact: Silent data loss. No error thrown — edges just don't appear in queries.

Suggestion: When appending to an existing JSON array, read and match the schema of existing entries before writing new ones.
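
A minimal sketch of this suggestion, assuming the file holds a top-level JSON array of edge dicts and that the key set shared by the existing entries is the schema to match. The helper name append_edges is hypothetical, not part of any tool mentioned in the report.

```python
import json
from pathlib import Path


def append_edges(path: Path, new_edges: list[dict]) -> None:
    """Append edges to a JSON array file only if their keys match the existing entries."""
    existing = json.loads(path.read_text()) if path.exists() else []
    if existing:
        # The key sets already in the file define the schema to match.
        schemas = {frozenset(edge) for edge in existing}
        if len(schemas) > 1:
            raise ValueError("file already mixes key sets; repair it before appending")
        expected = next(iter(schemas))
        for edge in new_edges:
            if frozenset(edge) != expected:
                raise ValueError(
                    f"edge keys {sorted(edge)} do not match existing {sorted(expected)}"
                )
    # Nothing is written unless every new edge passed the check.
    existing.extend(new_edges)
    path.write_text(json.dumps(existing, indent=2))
```

Failing loudly on the first mismatched write turns a silent loss (edges invisible to loaders reading edge["type"]) into an immediate, visible error.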


Pattern 2: Directed Edge Semantics (Source/Target Direction Reversed)

What happened: Claude created a directed graph edge with source and target reversed from its own description. The edge said A SHOULD_PRECEDE B but the description said "B should come before A."

Impact: Process ordering tools traverse the graph and schedule steps in wrong order.

Suggestion: When creating directed edges with temporal/causal semantics, verify source = what happens FIRST, target = what happens AFTER. Cross-check against the description.
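
The cross-check can be made mechanical. This sketch assumes each process step has a known position, supplied as a hypothetical step_order mapping, and flags any SHOULD_PRECEDE edge whose source is not earlier than its target. As an extra guard it feeds the edges to the standard-library graphlib.TopologicalSorter (Python 3.9+), whose prepare() raises graphlib.CycleError when precedence edges contradict each other, which a reversed edge often causes.

```python
from graphlib import TopologicalSorter


def reversed_precedence_edges(edges: list[dict], step_order: dict[str, int]) -> list[dict]:
    """Return SHOULD_PRECEDE edges whose source is not earlier than their target.

    Also raises graphlib.CycleError if the precedence edges form a cycle.
    """
    precedence = [e for e in edges if e["type"] == "SHOULD_PRECEDE"]
    # Source must be what happens FIRST, target what happens AFTER.
    bad = [e for e in precedence if step_order[e["source"]] >= step_order[e["target"]]]
    sorter = TopologicalSorter()
    for e in precedence:
        sorter.add(e["target"], e["source"])  # target depends on (comes after) source
    sorter.prepare()  # raises CycleError on contradictory ordering
    return bad
```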


Pattern 3: Security Gate Wired to Primary Path Only

What happened: A safety gate (risk classification) was wired into the intent routing layer but not into the direct tool dispatch function. Destructive tools could be called directly, bypassing all safety checks.

Impact: Defense-in-depth failure. The "happy path" was safe but the direct path was wide open.

Suggestion: When implementing safety middleware, audit ALL entry points to the protected system (grep for the underlying execution function to find all callers).
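
A complementary design, sketched under assumptions: place the gate inside the single function that actually executes tools, so every caller (intent router or direct dispatch) passes through it. All names here (DESTRUCTIVE_TOOLS, TOOLS, execute_tool, route_intent) are hypothetical, and risk classification is reduced to a set lookup for illustration.

```python
# Hypothetical risk classification and tool registry.
DESTRUCTIVE_TOOLS = {"delete_records"}

TOOLS = {
    "list_records": lambda **kw: ["r1", "r2"],
    "delete_records": lambda **kw: "deleted",
}


def execute_tool(name: str, confirmed: bool = False, **kwargs):
    """The single execution choke point: every path to a tool goes through here."""
    if name not in TOOLS:
        raise RuntimeError(f"Unknown tool: {name}")
    if name in DESTRUCTIVE_TOOLS and not confirmed:
        raise PermissionError(f"{name} is destructive and requires confirmation")
    return TOOLS[name](**kwargs)


def route_intent(intent: str, **kwargs):
    """Intent routing maps to a tool name, then reuses the same gated executor."""
    tool = {"show": "list_records", "purge": "delete_records"}[intent]
    return execute_tool(tool, **kwargs)
```

With the check at this level, grepping for callers of the execution function becomes an audit of reach rather than of safety: a new entry point inherits the gate instead of silently bypassing it.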


Pattern 4: Client-Specific Data in Product-Level Artifacts

What happened: During multi-session work, KB files containing real client company names, employee names, and engagement details were committed to the product repository instead of staying in the client project.

Impact: Compliance and trust violation — other customers deploying the product would see another client's data.

Suggestion: When ingesting domain knowledge, check whether content contains client-specific identifiers and warn before storing in product-level files.
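
A minimal sketch of such a boundary check, assuming the client project maintains a denylist of its identifiers and that product-level KB files live under a path prefix (kb/product/ here is a hypothetical convention). The function names and the sample names in the usage are placeholders.

```python
import re


def find_client_identifiers(text: str, denylist: list[str]) -> list[str]:
    """Return denylisted identifiers that appear in text as whole words, ignoring case."""
    hits = []
    for name in denylist:
        if re.search(rf"\b{re.escape(name)}\b", text, flags=re.IGNORECASE):
            hits.append(name)
    return hits


def check_before_product_write(text: str, target_path: str, denylist: list[str]) -> None:
    """Raise before client identifiers land in a product-level KB file."""
    # Hypothetical convention: only paths under kb/product/ ship with the product.
    if target_path.startswith("kb/product/"):
        hits = find_client_identifiers(text, denylist)
        if hits:
            raise ValueError(f"client identifiers in {target_path}: {hits}")
```

Word boundaries keep short names from matching inside unrelated words; identifiers that begin or end with punctuation would need a different pattern.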


Pattern 5: Structural Tests Pass But Runtime Execution Fails

What happened: A new process definition (YAML) referenced tools not present in the executor's dispatch table. Tests for YAML schema, trigger keywords, and field presence all passed. But the process would fail at step 3 with RuntimeError("Unknown tool") when actually executed.

Impact: False confidence from green tests. The gap between "structurally valid" and "functionally correct" is where bugs hide.

Suggestion: When creating process/workflow definitions that reference tool names, verify each tool exists in the dispatch mechanism. Add at least one integration test that runs the process through the executor.
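
The existence check takes only a few lines, assuming the process definition has already been loaded (for example with PyYAML's yaml.safe_load) into a dict with a steps list whose entries name a tool, and that the executor exposes its dispatch table as a dict. This layout and the sample names are hypothetical; adapt them to the real YAML schema.

```python
def missing_tools(process: dict, dispatch_table: dict) -> list[str]:
    """Return tool names referenced by the process that the executor cannot dispatch."""
    return [
        step["tool"]
        for step in process.get("steps", [])
        if step["tool"] not in dispatch_table
    ]


# Hypothetical executor table and process definition (as loaded from YAML).
DISPATCH_TABLE = {"fetch_report": lambda: "report", "summarize": lambda: "summary"}
PROCESS = {
    "name": "weekly_review",
    "steps": [{"tool": "fetch_report"}, {"tool": "summarize"}, {"tool": "email_digest"}],
}

# A schema-only test would pass on PROCESS; this check flags the step-3 tool.
print(missing_tools(PROCESS, DISPATCH_TABLE))  # ['email_digest']
```

Asserting missing_tools(...) == [] for every process file is cheap, but it complements rather than replaces the suggested integration test that runs the process through the executor.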


Pattern 6: Subagent Exploration Failures (Refuses to Use Available Tools)

What happened: On multiple occasions, subagents dispatched for exploration tasks refused to use tools (Read, Glob, Grep, Bash) and instead returned text-only responses saying "I cannot use tools" or "tool calls are not available" — even though the agent type supports all read-only tools.

Impact: Parent agent retries (wasting context) or proceeds without information (making wrong assumptions).

Suggestion: Subagents should never claim they lack tool access when they have it. If a tool call is blocked by permissions, the error should be specific rather than a blanket refusal.


Pattern 7: Weight/Score Monoculture in Generated Data

What happened: When generating 329 weighted graph edges, 56% were assigned weight 0.8. Primary expertise and peripheral associations both got the same weight — a flat distribution with no discriminatory signal.

Impact: Weighted graphs become effectively unweighted, negating the purpose of having weights.

Suggestion: When generating scored/weighted data, use a deliberate distribution (at least 4 tiers) based on explicit criteria, not a single "reasonable-looking" default.
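
A sketch of a deliberate distribution, assuming a hypothetical four-tier rubric (the tier names, criteria, and weights are illustrative, not from the report), plus a check on how much of the data shares a single value. With the report's numbers, 56% of edges at 0.8 gives a dominant share of 0.56.

```python
from collections import Counter

# Hypothetical four-tier rubric: each tier has an explicit criterion and weight.
TIER_WEIGHTS = {
    "primary": 1.0,      # core expertise, directly evidenced
    "secondary": 0.7,    # regular but not central involvement
    "supporting": 0.4,   # occasional or indirect association
    "peripheral": 0.15,  # mentioned once or inferred
}


def weight_for(tier: str) -> float:
    """Map an explicitly chosen tier to its weight; unknown tiers raise KeyError, no default."""
    return TIER_WEIGHTS[tier]


def dominant_share(weights: list[float]) -> float:
    """Fraction of items that share the single most common weight."""
    counts = Counter(weights)
    return counts.most_common(1)[0][1] / len(weights)
```

A generation step could refuse to emit a batch whose dominant_share exceeds an agreed threshold; where to set that threshold is a project decision.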


Pattern 8: Meta-Lessons Containing the Data They Warn Against

What happened: A lesson-learned entry titled "Client names must never appear in product KB" contained the actual client names it was warning about.

Impact: The scrub documentation became a new instance of the violation it documented.

Suggestion: When generating lessons about data sanitization, use generic placeholders instead of actual sensitive values.
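
A minimal sketch of placeholder substitution, assuming the sensitive values are known up front (for example, from the same client identifier list used for scope checks). The function name and the <CLIENT_n> placeholder format are hypothetical.

```python
import re


def redact(text: str, sensitive: list[str]) -> str:
    """Replace each sensitive value (case-insensitively) with a generic <CLIENT_n> placeholder."""
    # Longest values first, so "Acme Corp" is replaced before a shorter "Acme".
    ordered = sorted(sensitive, key=len, reverse=True)
    for i, value in enumerate(ordered, start=1):
        text = re.sub(re.escape(value), f"<CLIENT_{i}>", text, flags=re.IGNORECASE)
    return text


lesson = "Client names like Acme Corp and Globex must never appear in product KB"
print(redact(lesson, ["Globex", "Acme Corp"]))
```

Running the lesson text through the same scope check used for KB files before saving it gives a second line of defense.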


Pattern 9: Unbounded Relationship Type Vocabulary

What happened: Across 329 graph edges, 37 unique relationship types were used, with 17 appearing only once. Types like GUARDS_AGAINST, CATCHES, SOLVES, ENRICHES each appeared once. Semantically similar relationships used different names.

Impact: Graph queries become fragile — MATCH ()-[r:PREVENTS]->() misses edges typed GUARDS_AGAINST.

Suggestion: When adding edges to an existing graph, check which relationship types already exist and reuse them rather than inventing new synonyms.
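
A sketch of vocabulary reuse, assuming the existing relationship types can be read from the graph as a set and that the team maintains a synonym map. Only the GUARDS_AGAINST to PREVENTS mapping follows from the report's own example; any further entries would be a deliberate team decision. Function names are hypothetical.

```python
from collections import Counter

# Hypothetical synonym map from ad-hoc names to the graph's established types.
SYNONYMS = {"GUARDS_AGAINST": "PREVENTS"}


def canonical_type(rel_type: str, existing_types: set[str]) -> str:
    """Reuse an existing relationship type; reject unmapped new ones instead of inventing synonyms."""
    if rel_type in existing_types:
        return rel_type
    mapped = SYNONYMS.get(rel_type)
    if mapped in existing_types:
        return mapped
    raise ValueError(f"new relationship type {rel_type!r}; add it to the vocabulary deliberately")


def singleton_types(edges: list[dict]) -> list[str]:
    """Relationship types used by exactly one edge, a signal of vocabulary drift."""
    counts = Counter(e["type"] for e in edges)
    return sorted(t for t, n in counts.items() if n == 1)
```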


Pattern 10: Column/Field Names Assumed Without Verification (Recurring)

What happened: Across multiple sessions, SQL was written using assumed column names that didn't match actual schemas. 4 of 23 documented mistakes were column name mismatches.

Impact: Silent failures (NULL from non-matching JOINs) or compilation errors. Both waste significant time.

Suggestion: Before writing any SQL against an existing table, run DESCRIBE TABLE or equivalent to verify column names. This should be mandatory, not optional.
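
A sketch of the check in Python, using the standard-library sqlite3 module for illustration. SQLite has no DESCRIBE statement, so its pragma_table_info table-valued function serves as the equivalent; other databases expose the same information through their own commands or information_schema.columns. The helper names are hypothetical.

```python
import sqlite3


def table_columns(conn: sqlite3.Connection, table: str) -> set[str]:
    """Return the actual column names of a table (SQLite's stand-in for DESCRIBE TABLE)."""
    rows = conn.execute("SELECT name FROM pragma_table_info(?)", (table,)).fetchall()
    return {name for (name,) in rows}


def assert_columns_exist(conn: sqlite3.Connection, table: str, columns: list[str]) -> None:
    """Fail fast before running SQL that references columns the table does not have."""
    missing = sorted(set(columns) - table_columns(conn, table))
    if missing:
        raise ValueError(f"{table} has no column(s) {missing}")
```

Calling assert_columns_exist with every column a query mentions turns a silent NULL-producing JOIN into an explicit error before the query runs.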


Summary Table

| Pattern | Frequency | Root Cause |
|---|---|---|
| JSON key inconsistency | 3 sessions | Not reading existing structure before appending |
| Safety gate on primary path only | 2 sessions | Not auditing all entry points |
| Structural tests pass, runtime fails | 2 sessions | No integration test with actual execution |
| Uniform scores/weights | 2 sessions | Defaulting to "safe" middle value |
| Client data in product artifacts | 2 sessions | No boundary check between scopes |
| Column names assumed without DESCRIBE | 4 sessions | Not verifying schema before SQL |

Environment

  • Claude Code CLI on Windows 11
  • Model: Claude Opus 4.6 (1M context)
  • Superpowers plugin v5.0.7 installed
  • 26 custom agents via .claude/agents/
  • MCP server with 422 tools (SSE transport)

extent analysis

TL;DR

To address the recurring behavioral patterns reported when using Claude Code for large-scale agentic development, combine code review, automated tests, and schema validation to keep JSON data, graph edges, and SQL queries consistent and accurate.

Guidance

  1. Implement schema validation: Before appending to existing JSON files, read and match the schema of existing entries to prevent key inconsistencies.
  2. Verify graph edge semantics: When creating directed edges, ensure that the source and target nodes are correctly ordered and match the description.
  3. Conduct thorough code reviews: Regularly review code changes to catch errors, such as incorrect column names or missing safety gates, before they are committed.
  4. Add integration tests: Include tests that run processes through the executor to catch runtime errors, such as references to tools missing from the dispatch table.
  5. Use deliberate distributions for weighted data: Assign weights based on explicit criteria to avoid a flat distribution and ensure discriminatory signal.

Example

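To validate edge objects, you can use the third-party jsonschema library in Python. This sketch assumes data.json holds a top-level array of edge objects with source, target, and type keys:

import json

import jsonschema

# Every edge must carry "type" (the key downstream loaders read)
# and must not use the inconsistent "relationship" key.
edge_schema = {
    "type": "object",
    "properties": {
        "source": {"type": "string"},
        "target": {"type": "string"},
        "type": {"type": "string"},
    },
    "required": ["source", "target", "type"],
    "not": {"required": ["relationship"]},
}
schema = {"type": "array", "items": edge_schema}

# Load the existing JSON data (assumed to be a list of edges)
with open("data.json") as f:
    data = json.load(f)

# Raises jsonschema.exceptions.ValidationError if any edge does not conform
jsonschema.validate(instance=data, schema=schema)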

Notes

These suggestions focus on addressing the specific patterns identified in the issue report. However, a more comprehensive solution may require additional changes, such as refactoring the codebase or implementing more robust testing frameworks.

Recommendation

Apply the suggested workarounds and implement a more comprehensive testing and validation framework to prevent similar issues in the future. This will help ensure the accuracy and consistency of the data and prevent silent failures or errors.
