langchain - ✅(Solved) Fix `_get_approximate_token_counter` doesn't recognize ChatAnthropicVertex, causing SummarizationMiddleware to never trigger [2 pull requests, 2 comments, 1 participants]

GitHub stats
langchain-ai/langchain#36318 · Fetched 2026-04-08 01:40:56
Comments: 2
Participants: 1
Timeline: 12
Reactions: 0
Timeline (top): commented ×2, cross-referenced ×2, referenced ×2, assigned ×1

_get_approximate_token_counter in langchain.agents.middleware.summarization uses an exact match model._llm_type == "anthropic-chat" to detect Anthropic models and apply the correct chars_per_token=3.3. But ChatAnthropicVertex (Claude via Vertex AI) returns _llm_type = "anthropic-chat-vertexai", so it falls through to the default 4.0 chars/token.

As a result, the counter underestimates the token count by ~16%. With SummarizationMiddleware, the estimate never reaches the trigger threshold (85% of 200K = 170K) even though the actual prompt is already past 200K, so the API rejects the request.

Two additional safety mechanisms are also inactive for Vertex AI:

  • use_usage_metadata_scaling in count_tokens_approximately requires response_metadata["model_provider"] to be set on AI messages. ChatAnthropicVertex never sets this (unlike ChatAnthropic which sets "anthropic"). This is a langchain-google-vertexai issue.
  • _should_summarize_based_on_reported_tokens has the same dependency and is equally inactive.

Suggested fix: Change the check to model._llm_type.startswith("anthropic-chat") to cover both ChatAnthropic and ChatAnthropicVertex.
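The threshold arithmetic above can be checked with a few lines of Python. This is a rough sketch using only the numbers quoted in the issue (200K limit, 85% trigger, 4.0 vs 3.3 chars/token); the constant names are made up for illustration, and it ignores any per-message overhead the real counter may add.

```python
import math

CONTEXT_LIMIT = 200_000          # max input tokens, as quoted in the issue
TRIGGER_PERCENT = 85             # summarization trigger, as quoted in the issue
DEFAULT_CHARS_PER_TOKEN = 4.0    # ratio the middleware falls back to
ANTHROPIC_CHARS_PER_TOKEN = 3.3  # ratio intended for Anthropic models

# Token estimate at which the middleware would start summarizing.
trigger_tokens = CONTEXT_LIMIT * TRIGGER_PERCENT // 100

# Characters needed before the 4.0 chars/token estimate reaches the trigger.
chars_at_trigger = trigger_tokens * DEFAULT_CHARS_PER_TOKEN

# What the same text amounts to under the 3.3 chars/token ratio.
tokens_at_trigger = math.ceil(chars_at_trigger / ANTHROPIC_CHARS_PER_TOKEN)

print(f"trigger fires at estimate: {trigger_tokens}")
print(f"tokens at 3.3 chars/token: {tokens_at_trigger}")
print(f"already over the limit:    {tokens_at_trigger > CONTEXT_LIMIT}")
```

Under these assumptions, by the time the 4.0-based estimate reaches 170K, the 3.3-based count is already roughly 206K, which is consistent with the 400 error reported in the issue.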

Error Message

BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'prompt is too long: 202018 tokens > 200000 maximum'}}

Fix Action

PR fix notes

PR #36319: fix: recognize ChatAnthropicVertex in _get_approximate_token_counter

Description (problem / solution / changelog)

(No description)

Changed files

  • libs/langchain_v1/langchain/agents/middleware/summarization.py (modified, +2/-2)

PR #36320: fix(langchain): recognize ChatAnthropicVertex in _get_approximate_token_counter

Description (problem / solution / changelog)

Summary

_get_approximate_token_counter uses model._llm_type == "anthropic-chat" to detect Anthropic models and apply chars_per_token=3.3. But ChatAnthropicVertex (Claude via Vertex AI) returns _llm_type = "anthropic-chat-vertexai", so the check fails and the default 4.0 chars/token is used.

This underestimates token count by ~16%. When used with SummarizationMiddleware, the trigger threshold (e.g. 85% of 200K = 170K) is never reached according to the estimate, but the actual prompt is already past 200K. The API rejects it with prompt is too long.

Change

== "anthropic-chat" → .startswith("anthropic-chat")

This covers both ChatAnthropic ("anthropic-chat") and ChatAnthropicVertex ("anthropic-chat-vertexai").
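To make the difference concrete, here is a small sketch comparing the two checks on the `_llm_type` strings named in the PR. The helper names `exact_match` and `prefix_match` are hypothetical, and `"some-other-chat"` is just an illustrative non-Anthropic value.

```python
ANTHROPIC_PREFIX = "anthropic-chat"


def exact_match(llm_type: str) -> bool:
    """Old behaviour: only the exact string is treated as Anthropic."""
    return llm_type == ANTHROPIC_PREFIX


def prefix_match(llm_type: str) -> bool:
    """Proposed behaviour: any type starting with the prefix is accepted."""
    return llm_type.startswith(ANTHROPIC_PREFIX)


for llm_type in ("anthropic-chat", "anthropic-chat-vertexai", "some-other-chat"):
    print(f"{llm_type!r}: exact={exact_match(llm_type)} prefix={prefix_match(llm_type)}")
```

One trade-off of a prefix check is that it would also accept any future `_llm_type` that happens to begin with `anthropic-chat`, which seems to be the intent here.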

Reproduction

Full standalone script with API calls: https://gist.github.com/Jordanh1996/af56156da6cfab82917215dd340f76ac

Fixes #36318


This contribution was made with AI assistance (Claude).

Changed files

  • libs/langchain_v1/langchain/agents/middleware/summarization.py (modified, +1/-1)

Code Example

from functools import partial
from langchain.agents.middleware.summarization import _get_approximate_token_counter
from langchain_core.messages import HumanMessage
from langchain_core.messages.utils import count_tokens_approximately
from langchain_google_vertexai.model_garden import ChatAnthropicVertex

model = ChatAnthropicVertex(
    model_name="claude-haiku-4-5",
    project="your-project",
    location="us-east5",
    max_output_tokens=256,
    profile={"max_input_tokens": 200_000},
)

# The bug: _get_approximate_token_counter checks model._llm_type == "anthropic-chat"
# but ChatAnthropicVertex returns "anthropic-chat-vertexai"
print(f"model._llm_type: {model._llm_type!r}")  # 'anthropic-chat-vertexai'
print(f"Match: {model._llm_type == 'anthropic-chat'}")  # False

# Result: counter uses default 4.0 chars/token instead of 3.3
counter = _get_approximate_token_counter(model)
correct = partial(count_tokens_approximately, chars_per_token=3.3)

msg = [HumanMessage(content="x" * 600_000)]
print(f"Middleware estimate: {counter(msg)}")   # ~150K
print(f"Correct estimate:   {correct(msg)}")    # ~182K

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
  • This is not related to the langchain-community package.
  • I provided a self-contained, minimal, reproducible example that a maintainer can copy and run AS IS, including all necessary imports and data.

Example Code

Full standalone reproduction script with API calls demonstrating the rejection: https://gist.github.com/Jordanh1996/af56156da6cfab82917215dd340f76ac

System Info

langchain: 1.2.10
langchain_core: 1.2.16
langchain_google_vertexai: 3.2.2
langchain_anthropic: 1.3.4
Python: 3.11.8
OS: Darwin (macOS ARM64)

extent analysis

Fix Plan

To resolve the issue, we need to modify the _get_approximate_token_counter function in langchain.agents.middleware.summarization to correctly identify Anthropic models, including those from Vertex AI.

  • Update the condition in _get_approximate_token_counter to use startswith instead of exact match:
if model._llm_type.startswith("anthropic-chat"):
    return partial(count_tokens_approximately, chars_per_token=3.3)

This change will ensure that both ChatAnthropic and ChatAnthropicVertex models are correctly identified and the accurate chars_per_token value is used.
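The effect of the patched condition can be tried without any LangChain or Vertex AI dependencies. The sketch below is not the library code: `StubModel`, `approx_tokens`, and `get_counter` are hypothetical stand-ins, and `approx_tokens` is a simplified character-ratio counter that omits whatever per-message overhead `count_tokens_approximately` applies.

```python
import math
from dataclasses import dataclass
from functools import partial


@dataclass
class StubModel:
    """Hypothetical stand-in exposing only the _llm_type attribute."""
    _llm_type: str


def approx_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Simplified stand-in for a character-ratio token counter."""
    return math.ceil(len(text) / chars_per_token)


def get_counter(model: StubModel):
    """Mirror of the proposed check: prefix match instead of equality."""
    if model._llm_type.startswith("anthropic-chat"):
        return partial(approx_tokens, chars_per_token=3.3)
    return approx_tokens


text = "x" * 600_000
vertex_counter = get_counter(StubModel("anthropic-chat-vertexai"))
other_counter = get_counter(StubModel("some-other-chat"))

print(f"Vertex Anthropic estimate: {vertex_counter(text)}")
print(f"Default estimate:          {other_counter(text)}")
```

With the prefix check, the Vertex stub is routed to the 3.3 ratio and its estimate lands near 182K, while the unrelated stub stays on the 4.0 default at 150K.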

Verification

To verify the fix, run the provided example code and check that the estimated token count matches the correct estimate:

counter = _get_approximate_token_counter(model)
msg = [HumanMessage(content="x" * 600_000)]
print(f"Middleware estimate: {counter(msg)}")   # ~182K

The estimated token count should now match the correct estimate of ~182K.

Extra Tips

  • Once a release containing the fix is available, update the langchain package to that version.
  • Pull requests for this change already exist (#36319 and #36320); follow them rather than opening a new one.
  • If you encounter similar issues with other models or providers, check the _llm_type attribute and adjust the condition accordingly.
