langchain - Lazy instantiation for fallback models in ModelFallbackMiddleware

langchain · 2026-05-23 00:48:46


Submission checklist

This is a feature request, not a bug report or usage question.
I added a clear and descriptive title that summarizes the feature request.
I used the GitHub search to find a similar feature request and didn't find it.
I checked the LangChain documentation and API reference to see if this feature already exists.
This is not related to the langchain-community package.

Package (Required)

Feature Description

ModelFallbackMiddleware.__init__ eagerly instantiates every fallback model given as a string spec via init_chat_model() at construction time, whether or not those models are ever triggered in practice.

File: libs/langchain_v1/langchain/agents/middleware/model_fallback.py, lines ~65–72 (the for model in all_models: loop in __init__)

In a multi-agent pipeline that makes several LLM calls per query, where the primary model is highly available and fallbacks exist only as insurance, this allocates HTTP clients and connection pools for every fallback up front, including ones that may never be invoked.
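To make the cost concrete, here is a minimal self-contained sketch of the current eager pattern. `FakeChatModel`, `fake_init_chat_model`, `EagerFallbacks`, and the model spec strings are hypothetical stand-ins, not LangChain APIs; the loop mirrors the `for model in all_models:` loop quoted in this issue. The constructor counter shows every string spec being built before any request runs.

```python
class FakeChatModel:
    """Hypothetical stand-in for BaseChatModel; counts how many are constructed."""

    instances_created = 0

    def __init__(self, name: str) -> None:
        self.name = name
        # A real provider model would set up its HTTP client / connection pool around here.
        FakeChatModel.instances_created += 1


def fake_init_chat_model(spec: str) -> FakeChatModel:
    """Hypothetical stand-in for init_chat_model(spec)."""
    return FakeChatModel(spec)


class EagerFallbacks:
    """Mirrors the eager loop in ModelFallbackMiddleware.__init__."""

    def __init__(self, first_model, *additional_models) -> None:
        all_models = (first_model, *additional_models)
        self.models = []
        for model in all_models:
            # String specs are instantiated immediately; instances are kept as-is.
            self.models.append(
                fake_init_chat_model(model) if isinstance(model, str) else model
            )


middleware = EagerFallbacks("provider-a:model-1", "provider-b:model-2", "provider-c:model-3")
print(FakeChatModel.instances_created)  # 3, although no request has been made yet
```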

Use Case

I'm trying to build a multi-agent pipeline with 8–9 LLM calls per query. ModelFallbackMiddleware is configured with multiple fallback models as insurance against provider outages, but the primary model has >99% uptime, so fallbacks are rarely triggered.

Currently, I have to work around this by manually managing instantiation outside the middleware: I keep the middleware as a singleton with a dummy fallback spec, then instantiate the actual fallback model manually and inject it only when the primary fails. This defeats the purpose of having a clean fallback abstraction.

This feature would help users reduce unnecessary memory and connection-pool allocation in production multi-agent systems where fallbacks are configured defensively but rarely invoked.

Proposed Solution

Store the specs in __init__, then resolve and cache each model instance on first use inside wrap_model_call / awrap_model_call. The public API and constructor signature stay unchanged.

Before (eager):

```python
# In __init__: every string spec is instantiated immediately.
for model in all_models:
    self.models.append(init_chat_model(model) if isinstance(model, str) else model)
```

After (lazy + cached):

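```python
from langchain.chat_models import init_chat_model
from langchain_core.language_models import BaseChatModel

# In __init__: keep the raw specs; nothing is instantiated here.
self._specs: tuple[str | BaseChatModel, ...] = (first_model, *additional_models)
self._cache: dict[int, BaseChatModel] = {}

# New private method: build the model at position idx on first use, then reuse it.
def _resolve(self, idx: int) -> BaseChatModel:
    if idx not in self._cache:
        spec = self._specs[idx]
        self._cache[idx] = init_chat_model(spec) if isinstance(spec, str) else spec
    return self._cache[idx]
```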
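A rough sketch of how wrap_model_call / awrap_model_call could walk the fallbacks through _resolve. To stay self-contained it uses hypothetical stand-ins: `FakeChatModel`, `fake_init_chat_model`, `LazyFallbacks`, and a simplified `handler(model)` calling convention. The real middleware receives a model request plus a handler, and its actual code may differ. What the sketch illustrates is the intended behavior: the primary goes first, fallbacks are tried in declared order, each is built only when reached, and the last error propagates if everything fails.

```python
import asyncio
from typing import Awaitable, Callable, Union


class FakeChatModel:
    """Hypothetical stand-in for BaseChatModel."""

    def __init__(self, name: str) -> None:
        self.name = name


created: list[str] = []  # specs that were actually instantiated, in order


def fake_init_chat_model(spec: str) -> FakeChatModel:
    """Hypothetical stand-in for init_chat_model(spec)."""
    created.append(spec)
    return FakeChatModel(spec)


Spec = Union[str, FakeChatModel]


class LazyFallbacks:
    """Sketch of lazy fallback resolution; not the real ModelFallbackMiddleware."""

    def __init__(self, first_model: Spec, *additional_models: Spec) -> None:
        self._specs: tuple[Spec, ...] = (first_model, *additional_models)
        self._cache: dict[int, FakeChatModel] = {}

    def _resolve(self, idx: int) -> FakeChatModel:
        if idx not in self._cache:
            spec = self._specs[idx]
            self._cache[idx] = fake_init_chat_model(spec) if isinstance(spec, str) else spec
        return self._cache[idx]

    def wrap_model_call(
        self, primary: FakeChatModel, handler: Callable[[FakeChatModel], str]
    ) -> str:
        try:
            return handler(primary)  # the agent's primary model goes first, as today
        except Exception as exc:
            last_exc = exc
        for idx in range(len(self._specs)):  # fallbacks in declared order
            try:
                return handler(self._resolve(idx))  # built only when reached
            except Exception as exc:
                last_exc = exc
        raise last_exc  # everything failed: propagate the last error

    async def awrap_model_call(
        self, primary: FakeChatModel, handler: Callable[[FakeChatModel], Awaitable[str]]
    ) -> str:
        try:
            return await handler(primary)
        except Exception as exc:
            last_exc = exc
        for idx in range(len(self._specs)):
            try:
                return await handler(self._resolve(idx))
            except Exception as exc:
                last_exc = exc
        raise last_exc


def make_handler(failing: set[str]) -> Callable[[FakeChatModel], str]:
    """Hypothetical handler: raises for the named models, answers otherwise."""
    def handler(model: FakeChatModel) -> str:
        if model.name in failing:
            raise RuntimeError(f"{model.name} unavailable")
        return f"answer from {model.name}"
    return handler


async def ahandler(model: FakeChatModel) -> str:
    return make_handler({"primary", "fallback-1"})(model)


mw = LazyFallbacks("fallback-1", "fallback-2")
primary = FakeChatModel("primary")
print(mw.wrap_model_call(primary, make_handler(set())))  # answer from primary
print(created)  # []
print(mw.wrap_model_call(primary, make_handler({"primary"})))  # answer from fallback-1
print(created)  # ['fallback-1']
print(asyncio.run(mw.awrap_model_call(primary, ahandler)))  # answer from fallback-2
print(created)  # ['fallback-1', 'fallback-2']
```

In this sketch the primary-success path never calls _resolve, so no fallback is constructed at all; once a fallback has been built, later calls (sync or async) reuse the cached instance instead of creating a new one.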

Why this is safe:

No change to constructor signature or public API
No change to fallback ordering or exception propagation
Cache lifetime equals middleware instance lifetime
Model instances are shared across concurrent requests exactly as they are today, since the eager version already reuses one BaseChatModel per fallback
BaseChatModel instances passed directly bypass instantiation as before
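One case the list above leaves open: two threads can reach the same unresolved fallback at the same moment. With a plain dict cache, both may call init_chat_model and one of the two instances is then dropped, which is probably harmless but briefly allocates the extra client this proposal tries to avoid. If that matters, a lock around first resolution is one option. A minimal sketch, where `LockedResolver` and its `factory` parameter are hypothetical names (the factory stands in for init_chat_model); whether the upstream change needs this is an open question.

```python
import threading
import time
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class LockedResolver(Generic[T]):
    """Hypothetical lazy cache that builds each entry at most once, even under threads."""

    def __init__(self, specs: tuple, factory: Callable[[str], T]) -> None:
        self._specs = specs
        self._factory = factory  # stands in for init_chat_model
        self._cache: dict[int, T] = {}
        self._lock = threading.Lock()

    def resolve(self, idx: int) -> T:
        if idx in self._cache:  # fast path: entries are never removed once set
            return self._cache[idx]
        with self._lock:
            if idx not in self._cache:  # re-check: another thread may have built it
                spec = self._specs[idx]
                self._cache[idx] = self._factory(spec) if isinstance(spec, str) else spec
            return self._cache[idx]


calls: list[str] = []


def slow_factory(spec: str) -> object:
    calls.append(spec)
    time.sleep(0.01)  # widen the window in which other threads could race
    return object()


resolver = LockedResolver(("fallback-1", "fallback-2"), slow_factory)
results: list[object] = []
threads = [threading.Thread(target=lambda: results.append(resolver.resolve(0))) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls))  # 1
print(len({id(r) for r in results}))  # 1
```

The unlocked fast path relies on CPython's atomic dict reads and on entries never being evicted. Under asyncio alone the lock is unnecessary, because a synchronous _resolve contains no await and cannot be interrupted mid-resolution.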

Alternatives Considered

Keep the current eager instantiation. This is acceptable if the middleware is always used as a singleton, but it wastes resources when the fallback list is long and the primary model rarely fails.

Manually instantiate fallback models outside the middleware and inject them only when the primary fails. This works, but it pushes fallback logic into application code and defeats the purpose of a clean middleware abstraction.

Additional Context

How I use AI: I am willing to implement this and open a PR once assigned.
