langchain - ✅(Solved) Fix count_tokens_approximately: missing handler for tool_use content blocks causes ~2.4x overcounting [4 pull requests, 2 comments, 3 participants]

GitHub stats
langchain-ai/langchain#35558 · Fetched 2026-04-08 00:25:36
Comments: 2 · Participants: 3 · Timeline events: 12 · Reactions: 0
Timeline (top): cross-referenced ×5, referenced ×3, commented ×2, assigned ×1

Error Message

In production (weside.ai), we measured 4.6x overcounting for real Anthropic conversation threads containing tool calls. The tiktoken cl100k_base approximation already overcounts Claude tokens (~1.9x), and the repr() fallback multiplies that by a further ~2.4x for tool_use blocks specifically (1.9 × 2.4 ≈ 4.6).

Root Cause

In langchain_core/messages/utils.py around line 2281, the content block handler has cases for text, image_url, etc. but no case for tool_use / tool_result:

# Current behavior (simplified):
for item in content:
    if isinstance(item, str):
        ...
    elif item.get("type") == "text":
        total_chars += len(item.get("text", ""))
    else:
        total_chars += len(repr(item))  # ← tool_use falls here!

repr({"type": "tool_use", "id": "...", "name": "...", "input": {...}}) produces a Python dict repr with single quotes, True/False booleans, etc. — typically ~2.4x longer than the equivalent compact JSON for nested tool inputs.
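The gap can be measured directly on a concrete block. The sketch below uses only the standard library and the tool_use block from the issue's reproducer. For a small, flat block like this one the two forms differ only by the whitespace after separators, so the ratio depends heavily on the shape of the input and is worth measuring on real payloads.

```python
import json

# The tool_use block from the issue's reproducer (Anthropic format).
block = {
    "type": "tool_use",
    "id": "toolu_01AbCdEf",
    "name": "search_memories",
    "input": {"query": "recent events"},
}

# Length of the Python dict repr, which the fallback branch measures.
repr_len = len(repr(block))

# Length of the compact JSON form (no spaces after "," or ":").
compact_len = len(json.dumps(block, separators=(",", ":")))

print(repr_len, compact_len)  # 108 100
```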

Fix Action

Workaround

We implemented a normalization function in our codebase (_normalize_for_counting()) that pre-processes messages to extract only text content and normalize tool blocks to compact JSON before passing to any token counter. This prevents the repr() inflation.

PR fix notes

PR #35566: fix(core): use compact json for tool_use/tool_result in count_tokens_approximately

Description (problem / solution / changelog)

  • Added explicit handling for tool_use and tool_result list-content blocks.
  • Switched these blocks from repr(block) length counting to compact JSON length counting (json.dumps(..., separators=(",", ":"))), with a safe fallback to repr.
  • Added regression tests for both tool_use and tool_result counting.
  • Updated the existing list-content/tool-call test to assert the correct invariant after this normalization.
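The diff itself is not reproduced on this page. One way a "compact JSON with a safe fallback to repr" could be written is sketched below; `block_char_len` is a hypothetical name, and the choice of exceptions to catch is an assumption based on what `json.dumps` raises for unsupported values and circular references.

```python
import json


def block_char_len(block: dict) -> int:
    """Return the character length used to estimate a block's tokens."""
    try:
        # Preferred path: compact JSON, no spaces after separators.
        return len(json.dumps(block, separators=(",", ":")))
    except (TypeError, ValueError):
        # Values JSON cannot encode (sets, custom objects) or circular
        # references: fall back to the old repr-based length.
        return len(repr(block))


serializable = {"type": "tool_use", "input": {"a": 1}}
not_serializable = {"type": "tool_use", "input": {1, 2}}  # a set is not JSON-encodable

print(block_char_len(serializable))
print(block_char_len(not_serializable))
```

The fallback keeps the counter from raising on unusual payloads, at the cost of returning the older, longer estimate for those blocks.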

Fixes: #35558

Changed files

  • libs/core/langchain_core/messages/utils.py (modified, +12/-0)
  • libs/core/tests/unit_tests/messages/test_utils.py (modified, +65/-1)

PR #35568: fix(core): handle tool_use/tool_result blocks in count_tokens_approximately

Description (problem / solution / changelog)

Use json.dumps(separators=(",",":")) instead of repr() for tool_use and tool_result content blocks in count_tokens_approximately().

repr() produces Python-style output (single quotes, True/False) that inflates the character count by ~2.4x for nested tool inputs compared to the actual JSON representation used by providers.

Fixes #35558

Changes:

  • libs/core/langchain_core/messages/utils.py: Added explicit handlers for tool_use and tool_result block types using compact json.dumps
  • libs/core/tests/unit_tests/messages/test_utils.py: Added 2 new tests for tool_use/tool_result blocks, widened tolerance in existing test

Verification:

  • make test passes (156/156 tests in test_utils.py)
  • ruff check passes

Changed files

  • libs/core/langchain_core/messages/utils.py (modified, +12/-0)
  • libs/core/tests/unit_tests/messages/test_utils.py (modified, +49/-1)

PR #35650: fix: use compact JSON for tool_use blocks in count_tokens_approximately

Description (problem / solution / changelog)

Summary

Fixes #35558

count_tokens_approximately() uses repr() as a fallback for unknown content block types. For tool_use and tool_result blocks (Anthropic format), repr() produces Python dict representation with single quotes and verbose formatting, causing ~2.4x overcounting compared to actual token usage.

This change switches the fallback to json.dumps() with compact separators, which produces output much closer to what LLM APIs actually tokenize.

Impact

In production, the repr() overcounting compounded with tiktoken approximation to cause 4.6x overcounting for conversations with tool calls, leading to premature context summarization at only ~40% of actual token budget utilization.

Test plan

  • Verify count_tokens_approximately produces more accurate counts for messages with tool_use blocks
  • Verify existing tests pass

🤖 Generated with Claude Code

Changed files

  • libs/core/langchain_core/messages/utils.py (modified, +6/-2)

PR #35696: fix(core): add tool_use/tool_result handlers to count_tokens_approximately

Description (problem / solution / changelog)

Problem

count_tokens_approximately() has no handler for tool_use or tool_result content blocks (Anthropic format). These fall through to the else branch which calls repr(block), producing Python dict representations longer than compact JSON equivalents.

In production, this causes premature summarization — the system believes the context is full when only ~40% of the actual token budget is consumed.

Closes #35558

Root Cause

The content block handler in count_tokens_approximately has cases for text, image_url, etc. but no case for tool_use / tool_result:

else:
    total_chars += len(repr(item))  # ← tool_use falls here!

repr() produces Python dict representations with single quotes, True/False booleans, etc. — typically longer than equivalent compact JSON for nested tool inputs.

Fix

  • Added explicit handlers for tool_use, tool_result, and thinking content blocks
  • Changed the fallback for unknown dict blocks from repr() to json.dumps(separators=(',',':')) for more accurate character estimation
  • Changed tool_calls serialization (for string-content AI messages) from repr() to json.dumps() for consistency
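As a sketch of what per-type handlers with a JSON fallback might look like (the PR diff is not shown here, so this is not the PR's code): `count_content_chars` and `_compact` are hypothetical names, and counting only the `thinking` text of a thinking block while ignoring its `signature` field is an assumption.

```python
import json


def _compact(obj) -> str:
    """Serialize to compact JSON; default=str avoids errors on odd values."""
    return json.dumps(obj, separators=(",", ":"), default=str)


def count_content_chars(content) -> int:
    """Sum the characters that would feed a chars-per-token estimate."""
    if isinstance(content, str):
        return len(content)

    total = 0
    for item in content:
        if isinstance(item, str):
            total += len(item)
        elif item.get("type") == "text":
            total += len(item.get("text", ""))
        elif item.get("type") == "thinking":
            # Assumption: only the reasoning text counts, not the signature.
            total += len(item.get("thinking", ""))
        else:
            # tool_use, tool_result and unknown dict blocks: compact JSON
            # instead of repr().
            total += len(_compact(item))
    return total


print(count_content_chars(["ab", {"type": "text", "text": "cd"}]))  # 4
print(count_content_chars([{"type": "thinking", "thinking": "abc", "signature": "zzzz"}]))  # 3
```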

Tests

Added 8 new tests verifying accurate token counting for:

  • tool_use blocks (simple and nested inputs)
  • tool_result blocks (string content, nested text+image blocks)
  • Multiple tool_use blocks in a single message
  • thinking blocks
  • Compact JSON produces fewer chars than repr()

Changed files

  • libs/core/langchain_core/messages/utils.py (modified, +36/-4)
  • libs/core/tests/unit_tests/messages/test_utils.py (modified, +161/-9)


Bug Description

count_tokens_approximately() in langchain_core/messages/utils.py has no handler for tool_use content blocks (Anthropic format). These blocks fall through to the else branch which calls repr(block) — producing a Python object representation like "{'type': 'tool_use', 'id': '...', 'name': '...', 'input': {...}}" that is much longer than the actual JSON content.

Affected Version

langchain-core>=0.1.x (confirmed on 1.2.17)

Reproducer

from langchain_core.messages import AIMessage
from langchain_core.messages.utils import count_tokens_approximately

# Message with tool_use content block (Anthropic format)
msg = AIMessage(content=[
    {
        "type": "tool_use",
        "id": "toolu_01AbCdEf",
        "name": "search_memories",
        "input": {"query": "recent events"},
    }
])

# Compact JSON for this block is 100 chars, or about 25 tokens at 4 chars per token
# repr() produces: "{'type': 'tool_use', 'id': 'toolu_01AbCdEf', 'name': 'search_memories', 'input': {'query': 'recent events'}}"
# That's 108 chars from repr vs 100 for compact JSON: already worse, and scales badly with large inputs

approx = count_tokens_approximately(msg)
print(f"Approximated tokens: {approx}")
# Returns significantly more tokens than the actual content warrants

Root Cause

In langchain_core/messages/utils.py around line 2281, the content block handler has cases for text, image_url, etc. but no case for tool_use / tool_result:

# Current behavior (simplified):
for item in content:
    if isinstance(item, str):
        ...
    elif item.get("type") == "text":
        total_chars += len(item.get("text", ""))
    else:
        total_chars += len(repr(item))  # ← tool_use falls here!

repr({"type": "tool_use", "id": "...", "name": "...", "input": {...}}) produces a Python dict repr with single quotes, True/False booleans, etc. — typically ~2.4x longer than the equivalent compact JSON for nested tool inputs.

Impact

In production (weside.ai), we measured 4.6x overcounting for real Anthropic conversation threads containing tool calls. The tiktoken cl100k_base approximation already overcounts Claude tokens (~1.9x), and the repr() fallback multiplies that by a further ~2.4x for tool_use blocks specifically (1.9 × 2.4 ≈ 4.6).

This caused premature summarization — the system believed the context was full when only ~40% of the actual token budget was consumed.
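The reported figures can be related with a little arithmetic. The sketch below assumes the summarization trigger fires when the estimated count reaches the budget, so real usage at that moment is the budget divided by the overcount factor. `real_budget_share` is a hypothetical helper, and the inputs are the factors quoted above.

```python
def real_budget_share(overcount_factor: float) -> float:
    """Fraction of the real token budget in use when the estimate says 'full'."""
    return 1.0 / overcount_factor


tiktoken_factor = 1.9  # reported overcount of cl100k_base on Claude tokens
repr_factor = 2.4      # reported extra inflation from repr() on tool_use blocks

combined = tiktoken_factor * repr_factor

print(f"combined factor: {combined:.2f}")                      # 4.56
print(f"share at 2.4x: {real_budget_share(repr_factor):.0%}")  # 42%
print(f"share at 4.6x: {real_budget_share(combined):.0%}")     # 22%
```

The ~40% figure is close to the 2.4x case. The thread does not say which conversations or blocks the percentage was measured on, so this is only a rough consistency check.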

Expected Behavior

tool_use and tool_result blocks should be normalized to compact JSON (or at minimum use json.dumps(item)) before measuring character length:

import json

elif item.get("type") in ("tool_use", "tool_result"):
    # Normalize to compact JSON to avoid repr() inflation
    total_chars += len(json.dumps(item, separators=(",", ":")))

Workaround

We implemented a normalization function in our codebase (_normalize_for_counting()) that pre-processes messages to extract only text content and normalize tool blocks to compact JSON before passing to any token counter. This prevents the repr() inflation.

Additional Notes

  • This issue also affects tool_result content blocks (which can contain nested content arrays with text and image items)
  • The fix should be consistent with how text blocks are handled — only count meaningful content, not Python object repr overhead

extent analysis

Fix Plan

Step 1: Update count_tokens_approximately() in langchain_core/messages/utils.py

import json

# ...

elif item.get("type") in ("tool_use", "tool_result"):
    # Normalize to compact JSON to avoid repr() inflation
    total_chars += len(json.dumps(item, separators=(",", ":")))

Step 2: Add tool_use and tool_result content block handling

# ...

elif item.get("type") == "text":
    total_chars += len(item.get("text", ""))
elif item.get("type") in ("tool_use", "tool_result"):
    # Handle tool_use and tool_result content blocks
    total_chars += len(json.dumps(item, separators=(",", ":")))
else:
    total_chars += len(repr(item))

Verification

  1. Run the reproducer code with the updated count_tokens_approximately() function.
  2. Confirm that the approximate count for the reproducer message is lower than before the change and tracks the compact JSON length of the block rather than its repr() length.
  3. Test with various input messages containing tool_use and tool_result content blocks.
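A library-independent way to sanity-check the steps above is to compare the two estimates on sample blocks. This sketch does not call `count_tokens_approximately`; it assumes a plain characters-divided-by-four heuristic (`CHARS_PER_TOKEN`) and ignores any per-message overhead, and `approx_tokens` and `check_block` are hypothetical helpers. The tool_result payload is made up for illustration.

```python
import json
import math

CHARS_PER_TOKEN = 4.0  # assumption: a simple chars-per-token heuristic


def approx_tokens(num_chars: int) -> int:
    """Round the character count up to whole tokens."""
    return math.ceil(num_chars / CHARS_PER_TOKEN)


def check_block(block: dict) -> tuple:
    """Return (estimate from repr, estimate from compact JSON) for a block."""
    before = approx_tokens(len(repr(block)))
    after = approx_tokens(len(json.dumps(block, separators=(",", ":"))))
    return before, after


tool_use = {
    "type": "tool_use",
    "id": "toolu_01AbCdEf",
    "name": "search_memories",
    "input": {"query": "recent events"},
}
tool_result = {
    "type": "tool_result",
    "tool_use_id": "toolu_01AbCdEf",
    "content": [{"type": "text", "text": "3 memories found"}],
}

for block in (tool_use, tool_result):
    before, after = check_block(block)
    # For ASCII-only blocks the compact JSON estimate should not exceed repr's.
    assert after <= before, (before, after)
    print(block["type"], before, after)
```

Blocks containing non-ASCII text can behave differently, because `json.dumps` escapes such characters by default, so they are worth testing separately.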

Extra Tips

  • To prevent regressions, ensure that the fix is consistent with how text blocks are handled.
  • Consider adding additional logging or debugging statements to verify the correctness of the fix.
  • If you're using a CI/CD pipeline, add a test case to verify the fix and prevent regressions.
