langchain - 💡(How to fix) Fix [Feature] Eager tool dispatch middleware — overlap tool execution with model streaming

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Fix Action

Fix / Workaround

Proposal: a built-in middleware (or option on create_agent) for eager tool dispatch — firing each tool the moment its tool_use block finishes streaming, while the rest of the assistant response is still being generated. This is strictly faster than today's parallel tool calling (which waits for the full response before firing any tool) and produces identical model outputs.

Same Tool definitions, same prompt, same model outputs — only the dispatch timing changes.

With eager dispatch, wall clock becomes max(stream, max(tool)) instead of stream + max(tool).

Code Example

from langchain.agents import create_agent
from langchain_anthropic import ChatAnthropic
from eager_tools_langgraph import eager_middleware

agent = create_agent(
    model=ChatAnthropic(model_name="claude-sonnet-4-5"),
    tools=[get_weather, get_stock_price, get_news],
    middleware=[eager_middleware(eager_tools)],
)
RAW_BUFFERClick to expand / collapse

Submission checklist

  • This is a feature request, not a bug report or usage question.
  • I added a clear and descriptive title that summarizes the feature request.
  • I used the GitHub search to find a similar feature request and didn't find it.
  • I checked the LangChain documentation and API reference to see if this feature already exists.
  • This is not related to the langchain-community package.

Package

  • langchain (specifically langchain.agents.create_agent + the middleware system)

Feature Description

Proposal: a built-in middleware (or option on create_agent) for eager tool dispatch — firing each tool the moment its tool_use block finishes streaming, while the rest of the assistant response is still being generated. This is strictly faster than today's parallel tool calling (which waits for the full response before firing any tool) and produces identical model outputs.

We've built and open-sourced a reference implementation as a LangGraph middleware: eager_middleware. It drops into the existing create_agent API with one line:

from langchain.agents import create_agent
from langchain_anthropic import ChatAnthropic
from eager_tools_langgraph import eager_middleware

agent = create_agent(
    model=ChatAnthropic(model_name="claude-sonnet-4-5"),
    tools=[get_weather, get_stock_price, get_news],
    middleware=[eager_middleware(eager_tools)],
)

Same Tool definitions, same prompt, same model outputs — only the dispatch timing changes.

Use Case

Today, create_agent already runs a single assistant message's tool calls in parallel — that moves the tool phase from sum-of-durations to max-of-durations. But the stream still has to finish before the tool phase begins. For agents with multi-second tool calls (HTTP fetches, DB queries, RAG, vector search, etc.), the model's stream and the tool execution are sequential, even though they could overlap.

With eager dispatch, wall clock becomes max(stream, max(tool)) instead of stream + max(tool).

Numbers:

  • Synthetic harness (16 workloads, 3–15 tools, deterministic): 1.20×–1.50× over parallel dispatch, median ~1.28×
  • CloudThinker production (real workloads with provider jitter): ~50% median latency cut

Full benchmark: https://github.com/cloudthinker-ai/eager-tools/blob/main/bench/results.md

This pattern is most valuable for users running agents with sub-second-to-multi-second tools — which describes most production agents today.

Proposed Solution

A middleware that:

  1. Adapts the model's streaming response (Anthropic + OpenAI today; extensible)
  2. Detects per-block seal events (a new tool_call_id in the stream signals the previous block is complete)
  3. Dispatches the completed tool to an executor pool while the stream continues
  4. Surfaces results back to the agent loop without changing the existing Tool contract

Our OSS implementation already exposes the seal mechanism, a gate callable for non-idempotent tools (HITL), and Tool.idempotent for blanket safety. Full mechanism doc: https://github.com/cloudthinker-ai/eager-tools/blob/main/METHOD.md

If the team is open to it, we'd be happy to:

  • Contribute the middleware (or a stripped-down core) upstream
  • Iterate on the contract to match the official middleware patterns
  • Co-author docs / examples

Alternatives Considered

  • Today's parallel tool calling: already provided by create_agent. Eager dispatch is an additive improvement on top of it — not a replacement.
  • Manual streaming + tool dispatch: users can implement this themselves by parsing the raw stream, but it requires forking the runner. A middleware keeps it transparent.
  • Out-of-band tool prefetch via prompt hints: brittle, not generalizable.

Additional Context

Happy to open a draft PR or RFC if the team is open to discussing an upstream path. Thanks for everything you build!

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

langchain - 💡(How to fix) Fix [Feature] Eager tool dispatch middleware — overlap tool execution with model streaming