langchain: Lazy instantiation for fallback models in ModelFallbackMiddleware

Submission checklist

  • This is a feature request, not a bug report or usage question.
  • I added a clear and descriptive title that summarizes the feature request.
  • I used the GitHub search to find a similar feature request and didn't find it.
  • I checked the LangChain documentation and API reference to see if this feature already exists.
  • This is not related to the langchain-community package.

Package (Required)

  • langchain
  • langchain-openai
  • langchain-anthropic
  • langchain-classic
  • langchain-core
  • langchain-model-profiles
  • langchain-tests
  • langchain-text-splitters
  • langchain-chroma
  • langchain-deepseek
  • langchain-exa
  • langchain-fireworks
  • langchain-groq
  • langchain-huggingface
  • langchain-mistralai
  • langchain-nomic
  • langchain-ollama
  • langchain-openrouter
  • langchain-perplexity
  • langchain-qdrant
  • langchain-xai
  • Other / not sure / general

Feature Description

ModelFallbackMiddleware.__init__ eagerly instantiates every fallback model via init_chat_model() at construction time, regardless of whether those models are ever triggered in practice.

File: libs/langchain_v1/langchain/agents/middleware/model_fallback.py
Lines: ~65–72 (the "for model in all_models:" loop in __init__)

In a multi-agent pipeline with multiple LLM calls per query, when the primary model has high uptime and fallbacks are configured as insurance, this unnecessarily allocates HTTP clients and connection pools for every fallback — including ones that may never be invoked.
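The cost can be illustrated with a minimal sketch that does not depend on LangChain. FakeModel and fake_init_chat_model are hypothetical stand-ins for a chat model and for init_chat_model(), and the spec strings are placeholders. The loop mirrors the eager pattern described above: every spec is instantiated during construction, before any call is made.

```python
from dataclasses import dataclass

created = []  # records every instantiation, in order


@dataclass
class FakeModel:
    """Hypothetical stand-in for a chat model that owns an HTTP client."""
    name: str


def fake_init_chat_model(spec):
    """Hypothetical stand-in for init_chat_model(); logs each construction."""
    created.append(spec)
    return FakeModel(spec)


class EagerFallback:
    """Mimics the eager pattern: all specs are resolved in __init__."""

    def __init__(self, first_model, *additional_models):
        self.models = []
        for model in (first_model, *additional_models):
            self.models.append(
                fake_init_chat_model(model) if isinstance(model, str) else model
            )


eager = EagerFallback("fallback-a", "fallback-b", "fallback-c")
print(created)  # all three specs were instantiated before any request was served
```

In this sketch the three constructions happen even if the primary model never fails, which is the allocation the request aims to avoid.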

Use Case

I'm trying to build a multi-agent pipeline with 8–9 LLM calls per query. ModelFallbackMiddleware is configured with multiple fallback models as insurance against provider outages — but the primary model has >99% uptime, so fallbacks are rarely triggered.

Currently, I have to work around this by manually managing instantiation outside the middleware: I keep the middleware as a singleton with a dummy fallback spec, then instantiate the actual fallback model manually and inject it only when the primary fails. This defeats the purpose of having a clean fallback abstraction.

This feature would help users reduce unnecessary memory and connection pool allocation in production multi-agent systems where fallbacks are configured defensively but rarely invoked.

Proposed Solution

Store the specs in __init__, then resolve and cache model instances on first use inside wrap_model_call / awrap_model_call. The public API and constructor signature stay unchanged.

Before (eager):

for model in all_models:
    self.models.append(init_chat_model(model) if isinstance(model, str) else model)

After (lazy + cached):

# In __init__: keep the specs and defer instantiation.
self._specs: tuple[str | BaseChatModel, ...] = (first_model, *additional_models)
self._cache: dict[int, BaseChatModel] = {}

# New private helper: instantiate a spec on first use, then reuse the instance.
def _resolve(self, idx: int) -> BaseChatModel:
    if idx not in self._cache:
        spec = self._specs[idx]
        self._cache[idx] = init_chat_model(spec) if isinstance(spec, str) else spec
    return self._cache[idx]
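
To show the intended behaviour end to end, here is a self-contained sketch of the lazy variant. It is not the middleware's actual code: FakeModel, fake_init_chat_model, LazyFallback and run_fallbacks are hypothetical names, and run_fallbacks only approximates what a fallback loop inside wrap_model_call might look like.

```python
from dataclasses import dataclass

created = []  # records every instantiation, in order


@dataclass
class FakeModel:
    """Hypothetical stand-in for a chat model."""
    name: str


def fake_init_chat_model(spec):
    """Hypothetical stand-in for init_chat_model(); logs each construction."""
    created.append(spec)
    return FakeModel(spec)


class LazyFallback:
    """Stores specs at construction and resolves each one on first use."""

    def __init__(self, first_model, *additional_models):
        self._specs = (first_model, *additional_models)
        self._cache = {}

    def _resolve(self, idx):
        if idx not in self._cache:
            spec = self._specs[idx]
            self._cache[idx] = (
                fake_init_chat_model(spec) if isinstance(spec, str) else spec
            )
        return self._cache[idx]

    def run_fallbacks(self, call):
        """Try fallbacks in order; a model is built only when it is reached."""
        last_exc = None
        for idx in range(len(self._specs)):
            model = self._resolve(idx)
            try:
                return call(model)
            except Exception as exc:  # remember the failure, try the next one
                last_exc = exc
        assert last_exc is not None  # _specs always holds at least one entry
        raise last_exc


def flaky_call(model):
    """Simulated request: the first fallback's provider is down."""
    if model.name == "fallback-a":
        raise RuntimeError("provider down")
    return f"answered by {model.name}"


lazy = LazyFallback("fallback-a", "fallback-b", "fallback-c")
created_at_init = list(created)         # nothing has been built yet
first = lazy.run_fallbacks(flaky_call)  # builds fallback-a, then fallback-b
second = lazy.run_fallbacks(flaky_call) # reuses the cached instances
```

In this run "fallback-c" is never constructed, and the second call adds no new instantiations because both earlier models come from the cache.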

Why this is safe:

  • No change to constructor signature or public API
  • No change to fallback ordering or exception propagation
  • Cache lifetime equals middleware instance lifetime
  • BaseChatModel is stateless — safe to share across concurrent requests
  • BaseChatModel instances passed directly bypass instantiation as before
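
One case the list above does not address is the first-use race: if two requests reach an unresolved fallback at the same moment, an unguarded _resolve could construct the same model twice. Whether that matters depends on the model class. The following is a sketch, assuming thread-based concurrency, of guarding the cache with a lock; LockedLazyCache and slow_factory are hypothetical names, and an async code path would likely need a different guard.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class LockedLazyCache:
    """Lazy per-index cache that builds each entry at most once."""

    def __init__(self, specs, factory):
        self._specs = tuple(specs)
        self._factory = factory  # stand-in for init_chat_model
        self._cache = {}
        self._lock = threading.Lock()

    def resolve(self, idx):
        model = self._cache.get(idx)  # fast path: no lock once cached
        if model is None:
            with self._lock:
                model = self._cache.get(idx)  # re-check under the lock
                if model is None:
                    spec = self._specs[idx]
                    model = self._factory(spec) if isinstance(spec, str) else spec
                    self._cache[idx] = model
        return model


calls = []


def slow_factory(spec):
    """Hypothetical factory; the sleep widens the window for a race."""
    calls.append(spec)
    time.sleep(0.01)
    return object()


cache = LockedLazyCache(["fallback-a"], slow_factory)
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda _: cache.resolve(0), range(8)))
```

All eight workers receive the same instance and the factory runs once. The trade-off is a short lock hold during the first construction of each fallback.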

Alternatives Considered

Keep the current eager instantiation. This is acceptable if the middleware is always used as a singleton, but it wastes resources when the fallback list is long and the primary model rarely fails.

or

Manually instantiate fallback models outside the middleware and inject them only when the primary fails. This works but couples fallback logic into application code and defeats the purpose of having a clean middleware abstraction.

Additional Context

How I use AI: I am willing to implement this and open a PR once assigned.
