claude-code - Feature: Pre-computed codebase index to reduce agent exploration overhead [1 comment, 1 participant]

GitHub stats
anthropics/claude-code#48127 · Fetched 2026-04-15 06:32:24
Comments: 1
Participants: 1
Timeline events: 6
Reactions: 0
Timeline (top): labeled ×4, closed ×1, commented ×1


Feature Proposal: Pre-computed codebase index for agent context

Problem

Claude Code spends ~50% of tool calls on exploration — reading files to understand what's relevant before it can start working. On a 284-file project, a typical task like "fix the login bug" generates this:

grep "login" → 15 results
Read file 1 → not relevant
Read file 2 → partially relevant
Read file 3 → this is the one
Read its imports → find dependency
Read test file → understand coverage
...

10-15 tool calls before writing a single line of code. This costs tokens, time, and often leads to missed context (the agent doesn't know what it'll break).

Data

I built probe, a proof-of-concept that pre-computes a function-level codebase index. Tested on two real projects:

Metric                 archmap (63 files)   appcelular (284 files)
Symbols indexed        260                  2,148
Call edges resolved    191                  870
Full index time        1.5s                 1.8s

Incremental (no changes): 0.2s

Before (current agent behavior)

Task: "fix the login timeout"

Tool calls to find relevant code:
  1. Grep "login" → 15 results
  2. Read backend/app/api/auth.py → found handler
  3. Read backend/app/services/auth_service.py → found service
  4. Read backend/app/services/auth_service.py imports → found dependency
  5. Grep "session" → 8 results
  6. Read backend/app/core/security.py → found verify_password
  7. Read tests/ directory listing → found test files
  8. Read backend/tests/test_auth.py → found tests
  9. Read frontend/src/pages/auth/LoginPage.tsx → found frontend caller
  10. Grep "timeout" → find config

Total: ~10 tool calls, ~30s, significant context window usage

After (with pre-computed index)

Same task, single query:

probe_query("fix login timeout")

→ Primary matches:
  backend/app/api/auth.py → login() [line 19]
    Called by: native/app/login.tsx:handleLogin, frontend/LoginPage.tsx:handleSubmit

  backend/app/services/auth_service.py → authenticate() [line 15]  ← found via synonym
    Called by: auth_service.py:login
    Calls: core/security.py:verify_password

→ Tests:
  backend/tests/test_auth.py (9 test cases)
  backend/tests/test_auth_security.py (5 test cases)

→ Patterns:
  Error handling: async def, raises exceptions
  Test style: integration (real DB)

Total: 1 tool call, <100ms, minimal context usage

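10 tool calls → 1. The single query returns everything the manual search found and is more complete: it also surfaces the native/app/login.tsx caller and test_auth_security.py, which the manual trace never reached.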

What the index provides that grep/read cannot

  1. Call graph at function level: handleLogin → login → authenticate → verify_password. Not inferable from imports alone — requires AST parsing of actual call sites.

  2. Impact analysis: "If you change authenticate(), these 4 functions break, these 2 test files cover it, and security.py co-changes 80% of the time." No way to get this from sequential file reads.

  3. Semantic search: Searching "login" also finds authenticate(), signin(), session() without an LLM — via static synonym mapping. Grep can't do this.

  4. Pattern detection: Before writing code, the agent knows the project uses snake_case, async def, integration tests, and specific error handling patterns. Currently requires reading 5-6 files to absorb.
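The static synonym mapping in item 3 can be illustrated with a minimal sketch. The issue does not show how probe stores or applies its synonyms, so the `SYNONYMS` table, `expandQuery`, and `searchSymbols` below are hypothetical names chosen for illustration, and substring matching is an assumption rather than probe's actual behavior.

```typescript
// Hypothetical static synonym table; probe's real mapping is not shown in the issue.
const SYNONYMS: Record<string, string[]> = {
  login: ["authenticate", "signin", "session"],
};

// Expand each query word into itself plus any statically mapped synonyms.
function expandQuery(query: string): string[] {
  const terms = new Set<string>();
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  for (const word of words) {
    terms.add(word);
    for (const synonym of SYNONYMS[word] ?? []) terms.add(synonym);
  }
  return Array.from(terms);
}

// Return the symbol names that contain any expanded term (case-insensitive).
function searchSymbols(query: string, symbolNames: string[]): string[] {
  const terms = expandQuery(query);
  return symbolNames.filter((name) =>
    terms.some((term) => name.toLowerCase().includes(term)),
  );
}

const symbolNames = ["login", "authenticate", "verify_password", "handleLogin", "renderFooter"];
const matches = searchSymbols("login", symbolNames);
console.log(matches); // ["login", "authenticate", "handleLogin"]
```

A plain grep for "login" would return `login` and `handleLogin` but not `authenticate`; the lookup table closes that gap without calling a model.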

How it works (no LLM, fully offline)

  1. Tree-sitter AST parses every file → extracts functions, classes, methods, types with signatures
  2. Call site extraction from AST → raw caller_name, callee_name pairs
  3. Import graph resolution → maps call sites to actual symbol IDs across files
  4. Method call resolution → resolves obj.method() via import-based type inference
  5. Git history analysis → co-change correlation between files
  6. Pattern extraction → naming conventions, error handling, test structure
  7. SQLite + FTS5 → instant queries with full-text search
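Steps 2 and 3 can be sketched as follows, assuming the AST pass has already produced raw call sites and a per-file import table. Every type and name here (`RawCall`, `ImportTable`, `Definitions`, `resolveCalls`, and the `file:symbol` ID format) is an illustrative assumption; the issue does not describe probe's internal data structures.

```typescript
// Raw output of a hypothetical AST pass: which function calls which name, in which file.
interface RawCall { file: string; caller: string; callee: string; }

// Per-file import table: local name -> file that defines it.
type ImportTable = Record<string, Record<string, string>>;

// Symbols defined in each file, used to resolve same-file calls.
type Definitions = Record<string, string[]>;

interface Edge { from: string; to: string; }

// Map raw (caller, callee) name pairs to symbol IDs of the form "file:symbol".
function resolveCalls(calls: RawCall[], imports: ImportTable, defs: Definitions): Edge[] {
  const edges: Edge[] = [];
  for (const call of calls) {
    const from = `${call.file}:${call.caller}`;
    const importedFrom = imports[call.file]?.[call.callee];
    if (importedFrom !== undefined) {
      // The callee was imported, so it resolves to the defining file.
      edges.push({ from, to: `${importedFrom}:${call.callee}` });
    } else if ((defs[call.file] ?? []).includes(call.callee)) {
      // The callee is defined in the same file as the caller.
      edges.push({ from, to: `${call.file}:${call.callee}` });
    }
    // Anything else (builtins, dynamic calls) stays unresolved in this sketch.
  }
  return edges;
}

const API = "backend/app/api/auth.py";
const SERVICE = "backend/app/services/auth_service.py";
const SECURITY = "backend/app/core/security.py";

const calls: RawCall[] = [
  { file: API, caller: "login", callee: "authenticate" },
  { file: SERVICE, caller: "login", callee: "authenticate" },
  { file: SERVICE, caller: "authenticate", callee: "verify_password" },
  { file: SERVICE, caller: "authenticate", callee: "print" },
];
const imports: ImportTable = {
  [API]: { authenticate: SERVICE },
  [SERVICE]: { verify_password: SECURITY },
};
const defs: Definitions = { [SERVICE]: ["login", "authenticate"] };

const edges = resolveCalls(calls, imports, defs);
console.log(edges.length); // 3: the call to print is left unresolved
```

Resolved-versus-raw is presumably why the Data section reports "call edges resolved" rather than total call sites: names that cannot be tied to a known symbol produce no edge.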

Supports: TypeScript, JavaScript, Python, Go. Index updates incrementally (only re-parses changed files).
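One way to picture the incremental update is to compare a stored fingerprint per file against the current contents and re-parse only the files that differ. The sketch below is an assumption about how such a check could work, not a description of probe's implementation: `fingerprint`, `planReindex`, and the `Plan` shape are hypothetical, and the FNV-1a hash stands in for whatever change detection probe really uses (it may rely on modification times or a different hash).

```typescript
// FNV-1a 32-bit hash; a stand-in for whatever fingerprint a real indexer would store.
function fingerprint(content: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < content.length; i++) {
    hash ^= content.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

interface Plan { reparse: string[]; remove: string[]; }

// Decide which files need re-parsing and which index entries are stale.
function planReindex(stored: Record<string, number>, files: Record<string, string>): Plan {
  const reparse = Object.entries(files)
    .filter(([path, content]) => stored[path] !== fingerprint(content))
    .map(([path]) => path);
  const remove = Object.keys(stored).filter((path) => !(path in files));
  return { reparse, remove };
}

// Fingerprints recorded by the previous indexing run.
const previousFiles: Record<string, string> = {
  "config.py": "TIMEOUT = 30",
  "auth.py": "def login(): ...",
  "old.py": "pass",
};
const stored: Record<string, number> = {};
for (const [path, content] of Object.entries(previousFiles)) {
  stored[path] = fingerprint(content);
}

// Current state: one file edited, one unchanged, one added, one deleted.
const currentFiles: Record<string, string> = {
  "config.py": "TIMEOUT = 60",
  "auth.py": "def login(): ...",
  "new.py": "pass",
};
const plan = planReindex(stored, currentFiles);
console.log(plan); // { reparse: ["config.py", "new.py"], remove: ["old.py"] }
```

When nothing has changed, both lists are empty and the indexer can return without parsing any file, which fits the short "no changes" time reported in the Data section.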

Proposal

Build this into Claude Code's core, not as an external tool:

  1. Auto-index on first interaction with a project (1-2s for most codebases)
  2. Auto-update via file watcher (already implemented in probe's MCP server)
  3. Consult before every task — the agent should never start reading files blindly when an index exists
  4. Impact check before every edit — the agent should know what it'll break
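The impact check in item 4 amounts to walking the call graph backwards from the function being edited. A minimal sketch follows, assuming the index exposes resolved call edges as caller/callee pairs; `CallEdge` and `transitiveCallers` are hypothetical names, and the edges are taken from the login example earlier in this issue.

```typescript
type CallEdge = [string, string]; // [caller, callee]

// Collect every function that directly or transitively calls `target`.
function transitiveCallers(edges: CallEdge[], target: string): string[] {
  // Build the reverse adjacency list: callee -> direct callers.
  const callersOf = new Map<string, string[]>();
  for (const [caller, callee] of edges) {
    const list = callersOf.get(callee) ?? [];
    list.push(caller);
    callersOf.set(callee, list);
  }

  // Breadth-first walk up the graph; `seen` also guards against cycles.
  const seen = new Set<string>();
  const queue: string[] = [target];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const caller of callersOf.get(current) ?? []) {
      if (!seen.has(caller)) {
        seen.add(caller);
        queue.push(caller);
      }
    }
  }
  return Array.from(seen);
}

const callEdges: CallEdge[] = [
  ["handleLogin", "login"],
  ["handleSubmit", "login"],
  ["login", "authenticate"],
  ["authenticate", "verify_password"],
];

const affected = transitiveCallers(callEdges, "authenticate");
console.log(affected); // ["login", "handleLogin", "handleSubmit"]
```

The same traversal run on `verify_password` would return all four other functions, which is the kind of answer an agent cannot get from reading one file at a time.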

The index is the "codebase memory" that persists between conversations. CLAUDE.md is prose; the index is structured, queryable, always up-to-date.

Proof of concept

  • npm: npx probe-code index && npx probe-code query "your task"
  • MCP server: npx probe-mcp . (8 tools, auto-reindex)
  • GitHub: github.com/Matiasxth/probe
  • Programmatic API: import { queryCodebase, analyzeImpact } from 'probe-code'

~3,500 lines of TypeScript. 56 tests. MIT license.


Built by @matiasxth — full-stack developer and archmap author.

extent analysis

TL;DR

Integrate the pre-computed codebase index into Claude Code's core to reduce the number of tool calls and improve context understanding.

Guidance

  • Implement the proposed index using Tree-sitter AST, call site extraction, and import graph resolution to provide a comprehensive understanding of the codebase.
  • Integrate the index into Claude Code's core, enabling auto-indexing on first interaction and auto-updating via file watcher.
  • Consult the index before every task to avoid blind file reading and perform impact checks before every edit to anticipate potential breakages.
  • Utilize the provided proof of concept, including the probe-code and probe-mcp tools, to test and refine the integration.

Example

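import { queryCodebase, analyzeImpact } from 'probe-code';

// The issue confirms only these two export names. The argument shapes and
// return types used below are assumptions; check the package before relying on them.

// Query the codebase for a specific task
const results = queryCodebase("fix login timeout");

// Analyze the impact of a potential edit to authenticate()
const impact = analyzeImpact("authenticate()", "login");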

Notes

The proposed solution relies on the probe proof of concept, which has been tested on two real projects with promising results. However, further testing and refinement may be necessary to ensure seamless integration with Claude Code.

Recommendation

Adopt the proposal by integrating the pre-computed codebase index into Claude Code's core. This is a feature request rather than a workaround for a bug: the index gives the agent a structured view of the codebase (call graph, test coverage, conventions) up front, reducing the need for sequential file reads and improving context awareness.
