langchain - 💡(How to fix) Fix [Feature] Eager tool dispatch middleware — overlap tool execution with model streaming

Fix Action

Fix / Workaround

Proposal: a built-in middleware (or option on create_agent) for eager tool dispatch — firing each tool the moment its tool_use block finishes streaming, while the rest of the assistant response is still being generated. This is strictly faster than today's parallel tool calling (which waits for the full response before firing any tool) and produces identical model outputs.

Same Tool definitions, same prompt, same model outputs — only the dispatch timing changes.

With eager dispatch, wall clock becomes max(stream, max(tool)) instead of stream + max(tool).

Code Example

from langchain.agents import create_agent
from langchain_anthropic import ChatAnthropic
from eager_tools_langgraph import eager_middleware

agent = create_agent(
    model=ChatAnthropic(model_name="claude-sonnet-4-5"),
    tools=[get_weather, get_stock_price, get_news],
    middleware=[eager_middleware(eager_tools)],
)

Submission checklist

This is a feature request, not a bug report or usage question.
I added a clear and descriptive title that summarizes the feature request.
I used the GitHub search to find a similar feature request and didn't find it.
I checked the LangChain documentation and API reference to see if this feature already exists.
This is not related to the langchain-community package.

Package

langchain (specifically langchain.agents.create_agent + the middleware system)

Feature Description

We've built and open-sourced a reference implementation as a LangGraph middleware: eager_middleware. It drops into the existing create_agent API with one line:

from langchain.agents import create_agent
from langchain_anthropic import ChatAnthropic
from eager_tools_langgraph import eager_middleware

agent = create_agent(
    model=ChatAnthropic(model_name="claude-sonnet-4-5"),
    tools=[get_weather, get_stock_price, get_news],
    middleware=[eager_middleware(eager_tools)],
)

Same Tool definitions, same prompt, same model outputs — only the dispatch timing changes.

Use Case

Today, create_agent already runs a single assistant message's tool calls in parallel — that moves the tool phase from sum-of-durations to max-of-durations. But the stream still has to finish before the tool phase begins. For agents with multi-second tool calls (HTTP fetches, DB queries, RAG, vector search, etc.), the model's stream and the tool execution are sequential, even though they could overlap.

With eager dispatch, wall clock becomes max(stream, max(tool)) instead of stream + max(tool).

Numbers:

Synthetic harness (16 workloads, 3–15 tools, deterministic): 1.20×–1.50× over parallel dispatch, median ~1.28×
CloudThinker production (real workloads with provider jitter): ~50% median latency cut

Full benchmark: https://github.com/cloudthinker-ai/eager-tools/blob/main/bench/results.md

This pattern is most valuable for users running agents with sub-second-to-multi-second tools — which describes most production agents today.

Proposed Solution

A middleware that:

Adapts the model's streaming response (Anthropic + OpenAI today; extensible)
Detects per-block seal events (a new tool_call_id in the stream signals the previous block is complete)
Dispatches the completed tool to an executor pool while the stream continues
Surfaces results back to the agent loop without changing the existing Tool contract

Our OSS implementation already exposes the seal mechanism, a gate callable for non-idempotent tools (HITL), and Tool.idempotent for blanket safety. Full mechanism doc: https://github.com/cloudthinker-ai/eager-tools/blob/main/METHOD.md

If the team is open to it, we'd be happy to:

Contribute the middleware (or a stripped-down core) upstream
Iterate on the contract to match the official middleware patterns
Co-author docs / examples

Alternatives Considered

Today's parallel tool calling: already provided by create_agent. Eager dispatch is an additive improvement on top of it — not a replacement.
Manual streaming + tool dispatch: users can implement this themselves by parsing the raw stream, but it requires forking the runner. A middleware keeps it transparent.
Out-of-band tool prefetch via prompt hints: brittle, not generalizable.

Additional Context

Blog post with the full production story: https://cloudthinker.io/blogs/eager-tool-calling-50-percent-faster-agents
Repo (MIT): https://github.com/cloudthinker-ai/eager-tools
"When NOT to use it" doc (sub-50ms tools, sequentially dependent tools, non-idempotent tools, non-streaming backends): https://github.com/cloudthinker-ai/eager-tools/blob/main/docs/when-not-to-use.md

Happy to open a draft PR or RFC if the team is open to discussing an upstream path. Thanks for everything you build!

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

langchain - 💡(How to fix) Fix [Feature] Eager tool dispatch middleware — overlap tool execution with model streaming

Recommended Tools

GitHub issue graph ai analysis

Fix Action

Fix / Workaround

Code Example

Submission checklist

Package

Feature Description

Use Case

Proposed Solution

Alternatives Considered

Additional Context

Still need to ship something?

TRENDING

langchain - 💡(How to fix) Fix [Feature] Eager tool dispatch middleware — overlap tool execution with model streaming

Recommended Tools

GitHub issue graph ai analysis

Fix Action

Fix / Workaround

Code Example

Submission checklist

Package

Feature Description

Use Case

Proposed Solution

Alternatives Considered

Additional Context

Still need to ship something?

RELATED_DISCOVERY

TRENDING