langchain - ✅(Solved) Fix feat(langchain): add token usage tracking middleware [3 pull requests, 1 comment, 1 participant]

GitHub stats
langchain-ai/langchain#35752 (fetched 2026-04-08 00:24:43)
Comments: 1
Participants: 1
Timeline events: 9
Reactions: 0
Timeline (top): cross-referenced ×3, labeled ×3, closed ×1, commented ×1

Error Message

  • Supports exit_behavior="end" (graceful) and "error" (raises TokenBudgetExceededError)

Fix Action

Fixed

PR fix notes

PR #1: feat(langchain): add token usage tracking middleware

Description (problem / solution / changelog)

Fixes #35752

Summary

Adds TokenUsageTrackingMiddleware — a new middleware that tracks cumulative token usage (input, output, total) across model calls and optionally enforces token budgets.

[rest of description...]

Changed files

  • libs/langchain_v1/langchain/agents/middleware/__init__.py (modified, +6/-0)
  • libs/langchain_v1/langchain/agents/middleware/token_usage_tracking.py (added, +314/-0)
  • libs/langchain_v1/tests/unit_tests/agents/middleware/implementations/test_token_usage_tracking.py (added, +278/-0)

PR #35751: feat(langchain): add token usage tracking middleware

Description (problem / solution / changelog)

Fixes #

<!-- Replace everything above this line with a 1-2 sentence description of your change. Keep the "Fixes #xx" keyword and update the issue number. -->

Read the full contributing guidelines: https://docs.langchain.com/oss/python/contributing/overview

All contributions must be in English. See the language policy.

If you paste a large clearly AI generated description here your PR may be IGNORED or CLOSED!

Thank you for contributing to LangChain! Follow these steps to have your pull request considered as ready for review.

  1. PR title: Should follow the format: TYPE(SCOPE): DESCRIPTION
  2. PR description:
  • Write 1-2 sentences summarizing the change.
  • The Fixes #xx line at the top is required for external contributions — update the issue number and keep the keyword. This links your PR to the approved issue and auto-closes it on merge.
  • If there are any breaking changes, please clearly describe them.
  • If this PR depends on another PR being merged first, please include "Depends on #PR_NUMBER" in the description.
  3. Run make format, make lint and make test from the root of the package(s) you've modified.
  • We will not consider a PR unless these three are passing in CI.
  4. How did you verify your code works?

Additional guidelines:

  • All external PRs must link to an issue or discussion where a solution has been approved by a maintainer, and you must be assigned to that issue. PRs without prior approval will be closed.
  • PRs should not touch more than one package unless absolutely necessary.
  • Do not update the uv.lock files or add dependencies to pyproject.toml files (even optional ones) unless you have explicit permission to do so by a maintainer.

Social handles (optional)

<!-- If you'd like a shoutout on release, add your socials below -->

Twitter: @
LinkedIn: https://linkedin.com/in/

Changed files

  • libs/langchain_v1/langchain/agents/middleware/__init__.py (modified, +6/-0)
  • libs/langchain_v1/langchain/agents/middleware/token_usage_tracking.py (added, +314/-0)
  • libs/langchain_v1/tests/unit_tests/agents/middleware/implementations/test_token_usage_tracking.py (added, +278/-0)

PR #35753: feat(langchain): add token usage tracking middleware

Description (problem / solution / changelog)

Fixes #35752

This PR adds a new Token Usage Tracking Middleware to LangChain's agent middleware system. Tracks cumulative token consumption (input, output, total) at thread and run levels with optional budget enforcement.

Why This Feature

  • Cost Monitoring: Track API costs in production
  • Budget Enforcement: Optional token limits per thread/run
  • Full Async Support: Works with both sync and async agent pipelines
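One common way to offer matching sync and async entry points is to route both through a single shared helper. The sketch below uses only the standard library. The `UsageCounter` class and its `record` / `arecord` methods are hypothetical names for illustration and are not taken from the PR.

```python
import asyncio


class UsageCounter:
    """Hypothetical counter exposing matching sync and async entry points."""

    def __init__(self) -> None:
        self.total_tokens = 0

    def _add(self, tokens: int) -> int:
        # Shared logic: both variants delegate here so they cannot drift apart.
        self.total_tokens += tokens
        return self.total_tokens

    def record(self, tokens: int) -> int:
        # Sync variant.
        return self._add(tokens)

    async def arecord(self, tokens: int) -> int:
        # Async variant: no I/O is needed, so it simply reuses the sync logic.
        return self._add(tokens)


counter = UsageCounter()
counter.record(150)                 # sync pipeline
asyncio.run(counter.arecord(250))   # async pipeline
print(counter.total_tokens)
```

Keeping the arithmetic in one private helper means a fix to the counting logic applies to both pipelines at once.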

Files Changed

  • libs/langchain_v1/langchain/agents/middleware/token_usage_tracking.py - Core middleware
  • libs/langchain_v1/tests/unit_tests/agents/middleware/implementations/test_token_usage_tracking.py - 21 unit tests (all passing ✅)
  • libs/langchain_v1/langchain/agents/middleware/__init__.py - Updated exports

Testing

All 21 unit tests passing locally. Run:

make test
make lint
make format

Changed files

  • libs/langchain_v1/langchain/agents/middleware/__init__.py (modified, +6/-0)
  • libs/langchain_v1/langchain/agents/middleware/token_usage_tracking.py (added, +314/-0)
  • libs/langchain_v1/tests/unit_tests/agents/middleware/implementations/test_token_usage_tracking.py (added, +278/-0)

Code Example

from langchain.agents import create_agent
from langchain.agents.middleware import TokenUsageTrackingMiddleware

# Tracking only (observability, no limits)
agent = create_agent("openai:gpt-4o", middleware=[TokenUsageTrackingMiddleware()])

# With budget enforcement
agent = create_agent(
    "openai:gpt-4o",
    tools=[search],
    middleware=[TokenUsageTrackingMiddleware(run_budget=50000, exit_behavior="end")],
)

Checked other resources

  • This is a feature request, not a bug report or usage question.
  • I added a clear and descriptive title that summarizes the feature request.
  • I used the GitHub search to find a similar feature request and didn't find it.
  • I checked the LangChain documentation and API reference to see if this feature already exists.
  • This is not related to the langchain-community package.

Package (Required)

  • langchain
  • langchain-openai
  • langchain-anthropic
  • langchain-classic
  • langchain-core
  • langchain-model-profiles
  • langchain-tests
  • langchain-text-splitters
  • langchain-chroma
  • langchain-deepseek
  • langchain-exa
  • langchain-fireworks
  • langchain-groq
  • langchain-huggingface
  • langchain-mistralai
  • langchain-nomic
  • langchain-ollama
  • langchain-openrouter
  • langchain-perplexity
  • langchain-qdrant
  • langchain-xai
  • Other / not sure / general

Feature Description

Feature Request

Problem: The middleware system has call-count tracking (ModelCallLimitMiddleware) but no built-in way to track actual token consumption across agent runs. Users currently need custom code to aggregate usage_metadata from AIMessage responses for cost monitoring and budget enforcement.

Proposed Solution: A TokenUsageTrackingMiddleware that:

  • Extracts usage_metadata from AIMessage responses after each model call
  • Accumulates input_tokens, output_tokens, total_tokens at thread and run levels
  • Optionally enforces thread_budget / run_budget with configurable exit behavior
  • Follows the same patterns as ModelCallLimitMiddleware
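The accumulation step in this proposal can be illustrated without LangChain. The following is a minimal sketch. It assumes each model response carries a usage dict shaped like `UsageMetadata` (`input_tokens`, `output_tokens`, `total_tokens`) and that some responses may carry none. The helper name `accumulate_usage` is hypothetical and not part of the proposed middleware.

```python
from typing import Optional

# Keys mirror the fields named in the proposal (input/output/total tokens).
TOKEN_KEYS = ("input_tokens", "output_tokens", "total_tokens")


def accumulate_usage(totals: dict, usage: Optional[dict]) -> dict:
    """Return new running totals after adding one response's usage.

    `usage` may be None when a provider reports no usage metadata;
    in that case the totals are returned unchanged (as a copy).
    """
    if not usage:
        return dict(totals)
    return {key: totals.get(key, 0) + usage.get(key, 0) for key in TOKEN_KEYS}


totals: dict = {}
responses = [
    {"input_tokens": 120, "output_tokens": 30, "total_tokens": 150},
    None,  # a response without usage metadata
    {"input_tokens": 200, "output_tokens": 50, "total_tokens": 250},
]
for usage in responses:
    totals = accumulate_usage(totals, usage)

print(totals)
# → {'input_tokens': 320, 'output_tokens': 80, 'total_tokens': 400}
```

Returning a fresh dict rather than mutating in place matches how a state update is typically expressed, though the real middleware may structure this differently.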

Use Case: Production agents need token-level cost monitoring and budget caps. This is the most common observability gap in the current middleware offerings.

I have a working implementation with 21 passing unit tests and would like to contribute this.

Use Case

Production AI agents need token-level cost monitoring and budget enforcement.

Currently, users must write custom wrapper code to extract usage_metadata from every AIMessage, accumulate counts manually, and implement their own budget-check logic — even though the middleware system already supports this pattern for call counts (ModelCallLimitMiddleware).

This feature would let users:

  • Monitor token consumption across agent runs with zero custom code
  • Set hard token budgets to prevent runaway costs in agentic loops
  • Get thread-level (persistent) and run-level (per-invocation) breakdowns
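To make the thread-level versus run-level distinction concrete, here is a small framework-free sketch. It assumes a thread is a sequence of invocations sharing persistent state, and that the run-level counter starts from zero on each invocation. The `UsageLedger` class and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UsageLedger:
    """Hypothetical ledger with a persistent thread total and a per-run total."""

    thread_total: int = 0
    run_total: int = 0

    def start_run(self) -> None:
        # A new invocation resets only the run-level counter.
        self.run_total = 0

    def record(self, total_tokens: int) -> None:
        # Every model call counts toward both levels.
        self.thread_total += total_tokens
        self.run_total += total_tokens


ledger = UsageLedger()

ledger.start_run()   # first invocation on the thread
ledger.record(150)
ledger.record(250)
first_run = ledger.run_total

ledger.start_run()   # second invocation on the same thread
ledger.record(100)

print(first_run, ledger.run_total, ledger.thread_total)
# → 400 100 500
```

A run budget would be compared against the per-invocation figure, while a thread budget would be compared against the persistent one.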

Proposed Solution

A new TokenUsageTrackingMiddleware class following existing middleware patterns:

from langchain.agents import create_agent
from langchain.agents.middleware import TokenUsageTrackingMiddleware

# Tracking only (observability, no limits)
agent = create_agent("openai:gpt-4o", middleware=[TokenUsageTrackingMiddleware()])

# With budget enforcement
agent = create_agent(
    "openai:gpt-4o",
    tools=[search],
    middleware=[TokenUsageTrackingMiddleware(run_budget=50000, exit_behavior="end")],
)

Implementation details:

  • Uses after_model hook to extract usage_metadata from the latest AIMessage
  • Accumulates input_tokens, output_tokens, total_tokens at thread and run levels
  • Uses before_model hook to check budgets before each model call
  • Supports exit_behavior="end" (graceful) and "error" (raises TokenBudgetExceededError)
  • State uses UntrackedValue for run-level fields (same pattern as ModelCallLimitMiddleware)
  • Both sync and async variants implemented
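The hook ordering listed here (budget check before each model call, accumulation after it) can be simulated in plain Python. This is a sketch under stated assumptions, not the PR's implementation: `run_loop` and its returned status strings are hypothetical, and the strict `>` comparison is a guess. Only `TokenBudgetExceededError` and the two `exit_behavior` values come from the issue text.

```python
class TokenBudgetExceededError(Exception):
    """Stand-in for the error named in the issue."""


def run_loop(call_usages, run_budget, exit_behavior="end"):
    """Simulate an agent loop with a pre-call budget check.

    call_usages: total_tokens that each (fake) model call would consume.
    Returns (status, tokens_used); status is "completed" or "ended".
    """
    used = 0
    for total_tokens in call_usages:
        # "before_model" step: refuse the next call once the budget is exceeded.
        if run_budget is not None and used > run_budget:
            if exit_behavior == "error":
                raise TokenBudgetExceededError(
                    f"used {used} tokens, budget is {run_budget}"
                )
            return "ended", used  # graceful stop
        # Model call followed by the "after_model" step: accumulate usage.
        used += total_tokens
    return "completed", used


print(run_loop([150, 250, 100], run_budget=300))   # → ('ended', 400)
print(run_loop([150, 100], run_budget=300))        # → ('completed', 250)

try:
    run_loop([150, 250, 100], run_budget=300, exit_behavior="error")
except TokenBudgetExceededError as exc:
    print("raised:", exc)
```

Because the check runs before a call rather than during it, the last permitted call can push usage past the budget, as the first example shows (400 tokens used against a budget of 300).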

I have a working implementation with 21 passing unit tests ready to contribute.

Alternatives Considered

  1. Using callbacks/tracing (e.g. LangSmith) — tracks usage externally but cannot enforce budgets or stop agent execution mid-run.

  2. Wrapping the model with a custom class — works but doesn't integrate with the middleware system and requires per-project boilerplate.

  3. Extending ModelCallLimitMiddleware to also track tokens — possible but conflates two concerns (call counting vs. token tracking) and makes the API less clean.

A dedicated middleware is the cleanest approach since it follows the single-responsibility pattern of existing middleware (e.g. ModelRetryMiddleware, ModelFallbackMiddleware).

Additional Context

Related patterns in the codebase:

  • ModelCallLimitMiddleware (libs/langchain_v1/langchain/agents/middleware/model_call_limit.py) — same state tracking pattern with thread/run levels
  • SummarizationMiddleware — already reads usage_metadata for token-based triggers
  • langchain_core.messages.ai.UsageMetadata — the standard token usage TypedDict

This middleware fills the gap between call-count limiting and full observability platforms, giving users a lightweight built-in option for token budget enforcement.

extent analysis

Fix Plan

Step 1: Implement TokenUsageTrackingMiddleware

Create a new file token_usage_tracking.py with the following code:

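# Sketch modeled on ModelCallLimitMiddleware. Import paths and state field
# names are assumptions and may differ between langchain versions.
from typing import Annotated

from langchain.agents.middleware.types import (
    AgentMiddleware,
    AgentState,
    PrivateStateAttr,
    hook_config,
)
from langchain_core.messages import AIMessage
from langgraph.channels.untracked_value import UntrackedValue
from typing_extensions import NotRequired


class TokenBudgetExceededError(Exception):
    """Raised when a token budget is exceeded and exit_behavior="error"."""


class TokenUsageState(AgentState):
    # The thread-level total persists across runs; the run-level total is untracked.
    thread_total_tokens: NotRequired[Annotated[int, PrivateStateAttr]]
    run_total_tokens: NotRequired[Annotated[int, UntrackedValue, PrivateStateAttr]]


class TokenUsageTrackingMiddleware(AgentMiddleware):
    state_schema = TokenUsageState

    def __init__(self, run_budget=None, thread_budget=None, exit_behavior="end"):
        super().__init__()
        self.run_budget = run_budget
        self.thread_budget = thread_budget
        self.exit_behavior = exit_behavior

    @hook_config(can_jump_to=["end"])
    def before_model(self, state, runtime):
        # Compare each budget against its own counter before the next model call.
        run_used = state.get("run_total_tokens", 0)
        thread_used = state.get("thread_total_tokens", 0)
        run_over = self.run_budget is not None and run_used > self.run_budget
        thread_over = self.thread_budget is not None and thread_used > self.thread_budget
        if not (run_over or thread_over):
            return None
        if self.exit_behavior == "error":
            raise TokenBudgetExceededError("Token budget exceeded")
        # exit_behavior == "end": stop gracefully instead of raising.
        return {"jump_to": "end", "messages": [AIMessage(content="Token budget exceeded")]}

    def after_model(self, state, runtime):
        # usage_metadata is an attribute of the latest AIMessage and may be None.
        usage = getattr(state["messages"][-1], "usage_metadata", None) or {}
        total = (usage.get("input_tokens") or 0) + (usage.get("output_tokens") or 0)
        return {
            "thread_total_tokens": state.get("thread_total_tokens", 0) + total,
            "run_total_tokens": state.get("run_total_tokens", 0) + total,
        }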

Step 2: Update Agent Creation

Update the agent creation code to include the new middleware:

from langchain.agents import create_agent
from token_usage_tracking import TokenUsageTrackingMiddleware

agent = create_agent("openai:gpt-4o", middleware=[TokenUsageTrackingMiddleware(run_budget=50000)])
