langchain - ✅ (Solved) Fix feat(langchain): add token usage tracking middleware [3 pull requests, 1 comment, 1 participant]

langchain · 2026-03-11 07:39:46

GitHub stats

langchain-ai/langchain#35752 • Fetched 2026-04-08 00:24:43

Author

prakhar-srivastavaa

Participants

prakhar-srivastavaa

Timeline (top)

cross-referenced ×3, labeled ×3, closed ×1, commented ×1

Error Message

Supports exit_behavior="end" (graceful) and "error" (raises TokenBudgetExceededError)

Fix Action

Fixed

Fixed by PR: feat(langchain): add token usage tracking middleware (https://github.com/prakhar-srivastavaa/langchain/pull/1)
Fixed by PR: feat(langchain): add token usage tracking middleware (https://github.com/langchain-ai/langchain/pull/35751)
Fixed by PR: feat(langchain): add token usage tracking middleware (https://github.com/langchain-ai/langchain/pull/35753)

PR fix notes

PR #1: feat(langchain): add token usage tracking middleware

Repository: prakhar-srivastavaa/langchain
Author: prakhar-srivastavaa
State: closed | merged: True
Link: https://github.com/prakhar-srivastavaa/langchain/pull/1

Description (problem / solution / changelog)

Fixes #35752

Summary

Adds TokenUsageTrackingMiddleware — a new middleware that tracks cumulative token usage (input, output, total) across model calls and optionally enforces token budgets.

[rest of description...]

Changed files

libs/langchain_v1/langchain/agents/middleware/__init__.py (modified, +6/-0)
libs/langchain_v1/langchain/agents/middleware/token_usage_tracking.py (added, +314/-0)
libs/langchain_v1/tests/unit_tests/agents/middleware/implementations/test_token_usage_tracking.py (added, +278/-0)

PR #35751: feat(langchain): add token usage tracking middleware

Repository: langchain-ai/langchain
Author: prakhar-srivastavaa
State: closed | merged: False
Link: https://github.com/langchain-ai/langchain/pull/35751

Description (problem / solution / changelog)

Fixes #

Read the full contributing guidelines: https://docs.langchain.com/oss/python/contributing/overview

All contributions must be in English. See the language policy.

If you paste a large clearly AI generated description here your PR may be IGNORED or CLOSED!

Thank you for contributing to LangChain! Follow these steps to have your pull request considered as ready for review.

PR title: Should follow the format: TYPE(SCOPE): DESCRIPTION

Examples:
- fix(anthropic): resolve flag parsing error
- feat(core): add multi-tenant support
- test(openai): update API usage tests
Allowed TYPE and SCOPE values: https://github.com/langchain-ai/langchain/blob/master/.github/workflows/pr_lint.yml#L15-L33

PR description:

Write 1-2 sentences summarizing the change.
The Fixes #xx line at the top is required for external contributions — update the issue number and keep the keyword. This links your PR to the approved issue and auto-closes it on merge.
If there are any breaking changes, please clearly describe them.
If this PR depends on another PR being merged first, please include "Depends on #PR_NUMBER" in the description.

Run make format, make lint and make test from the root of the package(s) you've modified.

We will not consider a PR unless these three are passing in CI.

How did you verify your code works?

Additional guidelines:

All external PRs must link to an issue or discussion where a solution has been approved by a maintainer, and you must be assigned to that issue. PRs without prior approval will be closed.
PRs should not touch more than one package unless absolutely necessary.
Do not update the uv.lock files or add dependencies to pyproject.toml files (even optional ones) unless you have explicit permission to do so by a maintainer.

Social handles (optional)

Twitter: @ LinkedIn: https://linkedin.com/in/

Changed files

libs/langchain_v1/langchain/agents/middleware/__init__.py (modified, +6/-0)
libs/langchain_v1/langchain/agents/middleware/token_usage_tracking.py (added, +314/-0)
libs/langchain_v1/tests/unit_tests/agents/middleware/implementations/test_token_usage_tracking.py (added, +278/-0)

PR #35753: feat(langchain): add token usage tracking middleware

Repository: langchain-ai/langchain
Author: prakhar-srivastavaa
State: closed | merged: False
Link: https://github.com/langchain-ai/langchain/pull/35753

Description (problem / solution / changelog)

Fixes #35752

This PR adds a new Token Usage Tracking Middleware to LangChain's agent middleware system. It tracks cumulative token consumption (input, output, total) at the thread and run levels, with optional budget enforcement.

Why This Feature

Cost Monitoring: Track API costs in production
Budget Enforcement: Optional token limits per thread/run
Full Async Support: Works with both sync and async agent pipelines
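The cost-monitoring point above comes down to simple arithmetic once token counts are accumulated. The following is a minimal sketch of that conversion, not part of the PR: the Pricing class, the estimate_cost helper, and the example prices are all hypothetical placeholders, and real per-token rates depend on the provider and model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Price per one million tokens in USD (hypothetical helper, caller-supplied values)."""

    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Convert accumulated token counts into an estimated dollar cost."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


# Placeholder prices for illustration only; substitute your provider's current rates.
example_pricing = Pricing(input_per_million=2.0, output_per_million=8.0)
cost = estimate_cost(input_tokens=1_000_000, output_tokens=500_000, pricing=example_pricing)
print(f"Estimated cost: ${cost:.2f}")
```

Input and output tokens are kept separate because providers usually price them differently, which is one reason to track both counts rather than only the total.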

Files Changed

libs/langchain_v1/langchain/agents/middleware/token_usage_tracking.py - Core middleware
libs/langchain_v1/tests/unit_tests/agents/middleware/implementations/test_token_usage_tracking.py - 21 unit tests (all passing ✅)
libs/langchain_v1/langchain/agents/middleware/__init__.py - Updated exports

Testing

All 21 unit tests passing locally. Run:

make test
make lint
make format

Changed files

libs/langchain_v1/langchain/agents/middleware/__init__.py (modified, +6/-0)
libs/langchain_v1/langchain/agents/middleware/token_usage_tracking.py (added, +314/-0)
libs/langchain_v1/tests/unit_tests/agents/middleware/implementations/test_token_usage_tracking.py (added, +278/-0)

Code Example

from langchain.agents import create_agent
from langchain.agents.middleware import TokenUsageTrackingMiddleware

# Tracking only (observability, no limits)
agent = create_agent("openai:gpt-4o", middleware=[TokenUsageTrackingMiddleware()])

# With budget enforcement
agent = create_agent(
    "openai:gpt-4o",
    tools=[search],
    middleware=[TokenUsageTrackingMiddleware(run_budget=50000, exit_behavior="end")],
)

Checked other resources

This is a feature request, not a bug report or usage question.
I added a clear and descriptive title that summarizes the feature request.
I used the GitHub search to find a similar feature request and didn't find it.
I checked the LangChain documentation and API reference to see if this feature already exists.
This is not related to the langchain-community package.

Package (Required)

Feature Description

Feature Request

Problem: The middleware system has call-count tracking (ModelCallLimitMiddleware) but no built-in way to track actual token consumption across agent runs. Users currently need custom code to aggregate usage_metadata from AIMessage responses for cost monitoring and budget enforcement.

Proposed Solution: A TokenUsageTrackingMiddleware that:

Extracts usage_metadata from AIMessage responses after each model call
Accumulates input_tokens, output_tokens, total_tokens at thread and run levels
Optionally enforces thread_budget / run_budget with configurable exit behavior
Follows the same patterns as ModelCallLimitMiddleware
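The accumulation step in the list above can be sketched without any LangChain dependency. In this sketch a plain TypedDict named Usage stands in for the three core fields of langchain_core's UsageMetadata, and accumulate_usage is a hypothetical helper; the actual middleware would read these values from each AIMessage instead.

```python
from typing import Iterable, Optional, TypedDict


class Usage(TypedDict):
    """Stand-in mirroring the three core fields of UsageMetadata (assumption)."""

    input_tokens: int
    output_tokens: int
    total_tokens: int


def accumulate_usage(usages: Iterable[Optional[Usage]]) -> Usage:
    """Sum token counts across model calls, skipping calls with no usage data."""
    totals: Usage = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
    for usage in usages:
        if not usage:
            continue  # a response may carry no usage metadata at all
        totals["input_tokens"] += usage.get("input_tokens", 0)
        totals["output_tokens"] += usage.get("output_tokens", 0)
        totals["total_tokens"] += usage.get("total_tokens", 0)
    return totals


# Three model calls, one of which reported no usage.
calls = [
    {"input_tokens": 120, "output_tokens": 30, "total_tokens": 150},
    None,
    {"input_tokens": 200, "output_tokens": 50, "total_tokens": 250},
]
totals = accumulate_usage(calls)
print(totals)
```

Skipping missing entries rather than raising keeps tracking from interrupting an agent run when a response carries no usage data.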

Use Case: Production agents need token-level cost monitoring and budget caps. This is the most common observability gap in the current middleware offerings.

I have a working implementation with 21 passing unit tests and would like to contribute this.

Use Case

Production AI agents need token-level cost monitoring and budget enforcement.

Currently, users must write custom wrapper code to extract usage_metadata from every AIMessage, accumulate counts manually, and implement their own budget-check logic — even though the middleware system already supports this pattern for call counts (ModelCallLimitMiddleware).

This feature would let users:

Monitor token consumption across agent runs with zero custom code
Set hard token budgets to prevent runaway costs in agentic loops
Get thread-level (persistent) and run-level (per-invocation) breakdowns
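The difference between thread-level and run-level breakdowns can be illustrated with a small standalone sketch. TokenCounters and ThreadUsage are hypothetical names used only here; in the proposed middleware the same distinction would live in agent state rather than in a Python object.

```python
from dataclasses import dataclass, field


@dataclass
class TokenCounters:
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    def add(self, input_tokens: int, output_tokens: int) -> None:
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens


@dataclass
class ThreadUsage:
    """Thread totals persist; run totals are reset at the start of each invocation."""

    thread: TokenCounters = field(default_factory=TokenCounters)
    run: TokenCounters = field(default_factory=TokenCounters)

    def start_run(self) -> None:
        self.run = TokenCounters()  # only the per-invocation counters are cleared

    def record(self, input_tokens: int, output_tokens: int) -> None:
        self.thread.add(input_tokens, output_tokens)
        self.run.add(input_tokens, output_tokens)


usage = ThreadUsage()
usage.start_run()        # first invocation: two model calls
usage.record(100, 20)
usage.record(150, 30)
usage.start_run()        # second invocation on the same thread: one model call
usage.record(80, 10)
print(usage.run.total_tokens, usage.thread.total_tokens)
```

After the second invocation the run counters reflect only the latest call, while the thread counters cover all three.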

Proposed Solution

A new TokenUsageTrackingMiddleware class following existing middleware patterns:

from langchain.agents import create_agent
from langchain.agents.middleware import TokenUsageTrackingMiddleware

# Tracking only (observability, no limits)
agent = create_agent("openai:gpt-4o", middleware=[TokenUsageTrackingMiddleware()])

# With budget enforcement
agent = create_agent(
    "openai:gpt-4o",
    tools=[search],
    middleware=[TokenUsageTrackingMiddleware(run_budget=50000, exit_behavior="end")],
)

Implementation details:

Uses after_model hook to extract usage_metadata from the latest AIMessage
Accumulates input_tokens, output_tokens, total_tokens at thread and run levels
Uses before_model hook to check budgets before each model call
Supports exit_behavior="end" (graceful) and "error" (raises TokenBudgetExceededError)
State uses UntrackedValue for run-level fields (same pattern as ModelCallLimitMiddleware)
Both sync and async variants implemented
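The budget check and the two exit behaviors described above can be sketched as a plain function. This is an illustration under assumptions, not the PR's code: check_budget is a hypothetical helper, TokenBudgetExceededError is defined locally as a stand-in for the error named in the issue, and treating a budget as spent once usage reaches it (rather than strictly exceeds it) is a choice made for this sketch.

```python
from typing import Literal, Optional


class TokenBudgetExceededError(Exception):
    """Local stand-in for the error named in the issue."""


def check_budget(
    used: int,
    budget: Optional[int],
    exit_behavior: Literal["end", "error"] = "end",
) -> bool:
    """Return True if the next model call may proceed, False to end gracefully.

    Raises TokenBudgetExceededError when the budget is spent and
    exit_behavior is "error".
    """
    if budget is None or used < budget:
        return True  # no budget configured, or budget not yet spent
    if exit_behavior == "error":
        raise TokenBudgetExceededError(f"used {used} of {budget} tokens")
    return False  # "end": the caller stops the loop without raising


print(check_budget(used=10_000, budget=50_000))
print(check_budget(used=50_000, budget=50_000))
try:
    check_budget(used=60_000, budget=50_000, exit_behavior="error")
except TokenBudgetExceededError as exc:
    print(f"raised: {exc}")
```

Because the check runs before each model call, a single large response can still push usage past the budget; the budget only prevents the next call from starting.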

I have a working implementation with 21 passing unit tests ready to contribute.

Alternatives Considered

Using callbacks/tracing (e.g. LangSmith) — tracks usage externally but cannot enforce budgets or stop agent execution mid-run.
Wrapping the model with a custom class — works but doesn't integrate with the middleware system and requires per-project boilerplate.
Extending ModelCallLimitMiddleware to also track tokens — possible but conflates two concerns (call counting vs. token tracking) and makes the API less clean.

A dedicated middleware is the cleanest approach since it follows the single-responsibility pattern of existing middleware (e.g. ModelRetryMiddleware, ModelFallbackMiddleware).

Additional Context

Related patterns in the codebase:

ModelCallLimitMiddleware (libs/langchain_v1/langchain/agents/middleware/model_call_limit.py) — same state tracking pattern with thread/run levels
SummarizationMiddleware — already reads usage_metadata for token-based triggers
langchain_core.messages.ai.UsageMetadata — the standard token usage TypedDict

This middleware fills the gap between call-count limiting and full observability platforms, giving users a lightweight built-in option for token budget enforcement.

extent analysis

Fix Plan

Step 1: Implement TokenUsageTrackingMiddleware

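Create a new file token_usage_tracking.py along the following lines. This is a simplified sketch modeled on ModelCallLimitMiddleware, not the code from the PRs: it tracks total tokens only, and the import paths assume the langchain v1 middleware API.

from typing import Annotated

from langchain_core.messages import AIMessage
from langgraph.channels.untracked_value import UntrackedValue
from typing_extensions import NotRequired

from langchain.agents.middleware.types import AgentMiddleware, AgentState, PrivateStateAttr, hook_config


class TokenBudgetExceededError(Exception):
    """Raised when a budget is spent and exit_behavior is "error"."""


class TokenUsageState(AgentState):
    # Thread totals persist across runs; the untracked run total resets per invocation.
    thread_total_tokens: NotRequired[Annotated[int, PrivateStateAttr]]
    run_total_tokens: NotRequired[Annotated[int, UntrackedValue, PrivateStateAttr]]


class TokenUsageTrackingMiddleware(AgentMiddleware):
    state_schema = TokenUsageState

    def __init__(self, run_budget=None, thread_budget=None, exit_behavior="end"):
        super().__init__()
        self.run_budget = run_budget
        self.thread_budget = thread_budget
        self.exit_behavior = exit_behavior

    @hook_config(can_jump_to=["end"])
    def before_model(self, state, runtime):
        # Compare each budget with its own counter before the next model call.
        run_used = state.get("run_total_tokens", 0)
        thread_used = state.get("thread_total_tokens", 0)
        over_run = self.run_budget is not None and run_used >= self.run_budget
        over_thread = self.thread_budget is not None and thread_used >= self.thread_budget
        if not (over_run or over_thread):
            return None
        message = f"Token budget exceeded (run: {run_used}, thread: {thread_used})"
        if self.exit_behavior == "error":
            raise TokenBudgetExceededError(message)
        # "end": stop gracefully by jumping to the end of the agent loop.
        return {"jump_to": "end", "messages": [AIMessage(content=message)]}

    def after_model(self, state, runtime):
        # Read usage_metadata from the latest AIMessage; some providers omit it.
        usage = getattr(state["messages"][-1], "usage_metadata", None) or {}
        used = usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
        return {
            "run_total_tokens": state.get("run_total_tokens", 0) + used,
            "thread_total_tokens": state.get("thread_total_tokens", 0) + used,
        }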

Step 2: Update Agent Creation

Update the agent creation code to include the new middleware:

from langchain.agents import create_agent
from token_usage_tracking import TokenUsageTrackingMiddleware

agent = create_agent("openai:gpt-4o", middleware=[TokenUsageTrackingMiddleware(run_budget=50000)])
