langchain - 💡(How to fix) Fix langchain-openrouter: ChatOpenRouter creates fresh httpx clients per instantiation (no default-client caching)

ChatOpenRouter.__init__ constructs a brand-new pair of httpx.Client / httpx.AsyncClient for every instance, with no default-client caching. Under any usage pattern that instantiates the model per request (LangGraph factory graphs, FastAPI dependency injection that returns a fresh model, etc.), this leaks TLS keep-alive sockets to openrouter.ai, along with the associated httpx pool state, for the lifetime of the unclosed pools.

This is the same class of bug that was fixed for AzureChatOpenAI in #32489 / PR #32531 — langchain_openai.BaseChatOpenAI uses _get_default_httpx_client / _get_default_async_httpx_client from _client_utils to cache a single default httpx client per (base_url, timeout). langchain-openrouter does not.

Root Cause

Expected (with proper default-client caching): socket count stays roughly constant. Observed: socket count grows linearly with instance count and does not return to baseline after del + gc.collect(), because the httpx pools embedded in each discarded instance retain their TLS connections until interpreter exit.

Source

Package: langchain-openrouter==0.2.3 (latest), Python 3.13.

Relevant code in langchain_openrouter/chat_models.py (around lines 390–397 in the installed wheel; line numbers are approximate):

if extra_headers:
    import httpx

    client_kwargs["client"] = httpx.Client(
        headers=extra_headers, follow_redirects=True
    )
    client_kwargs["async_client"] = httpx.AsyncClient(
        headers=extra_headers, follow_redirects=True
    )

These clients are then passed into openrouter.OpenRouter(**client_kwargs). When extra_headers is empty, the underlying openrouter SDK still constructs its own httpx clients per instance, so the leak path exists in both branches.

There is no _get_default_*_httpx_client style caching, no aclose() on instance teardown, and no shared module-level client.

Minimal reproduction

import asyncio
import gc
import os

import psutil  # net_connections() is the psutil >= 6.0 name (older releases: connections())
from langchain_openrouter import ChatOpenRouter

os.environ.setdefault("OPENROUTER_API_KEY", "sk-or-...")  # any valid key

def count_sockets():
    # Count this process's TCP connections currently in the ESTABLISHED state.
    p = psutil.Process()
    return sum(1 for c in p.net_connections(kind="tcp") if c.status == "ESTABLISHED")

async def main():
    print("before:", count_sockets())
    for _ in range(20):
        # Fresh instance per iteration, mimicking per-request instantiation.
        m = ChatOpenRouter(model="openai/gpt-4o-mini")
        await m.ainvoke("hi")
        del m
    gc.collect()
    print("after 20 fresh instances:", count_sockets())

asyncio.run(main())

Why this matters in practice

LangGraph "factory graphs" (the graph constructor takes a RunnableConfig and is invoked per request, which is documented behavior in aegra / langgraph-server) create a fresh ChatOpenRouter per request unless the caller wraps the factory in a manual cache. Because streaming responses pin connections (see openai/openai-python#763) and each connection pool is sized for a single client, this hits resource limits in production well before users hit any model rate limit.
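As a caller-side stopgap, the "manual cache" mentioned above could look roughly like the following sketch. InstanceCache and build_stub are hypothetical names; the stub builder stands in for something like lambda name: ChatOpenRouter(model=name), and the sketch assumes that sharing one model instance across requests is acceptable for the application.

```python
import threading
from typing import Callable, Dict, Generic, List, TypeVar

T = TypeVar("T")


class InstanceCache(Generic[T]):
    """Hypothetical helper: builds one instance per model name and reuses it."""

    def __init__(self, build: Callable[[str], T]) -> None:
        self._build = build
        self._lock = threading.Lock()
        self._instances: Dict[str, T] = {}

    def get(self, model_name: str) -> T:
        # The lock keeps concurrent first requests from building duplicates.
        with self._lock:
            if model_name not in self._instances:
                self._instances[model_name] = self._build(model_name)
            return self._instances[model_name]


built: List[str] = []


def build_stub(model_name: str) -> object:
    # Stand-in for constructing the real model; records each construction.
    built.append(model_name)
    return object()


cache = InstanceCache(build_stub)
first = cache.get("openai/gpt-4o-mini")
second = cache.get("openai/gpt-4o-mini")
print("same instance:", first is second, "| constructions:", len(built))
```

A per-request graph factory would then call cache.get(...) with a key derived from its config instead of constructing the model directly, so the number of httpx pools is bounded by the number of distinct keys.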

Suggested fix

Mirror PR #32531:

1. Import _get_default_httpx_client / _get_default_async_httpx_client from langchain_openai._client_utils (or replicate in langchain_openrouter).
2. In the model-validator path that currently constructs httpx.Client(...) / httpx.AsyncClient(...), fall through to the cached helper when the user hasn't provided http_client / http_async_client explicitly.
3. Expose http_client / http_async_client as constructor kwargs so callers who do want a custom pool can pass one in (matching ChatOpenAI's public surface).
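The caching idea behind this list can be sketched as below. This is an illustration of the pattern, not the code in langchain_openai._client_utils: StubClient, _cached_default_client, and get_default_client are hypothetical names, StubClient stands in for httpx.Client so the example runs without httpx installed, and the base URL is only an illustrative value.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Any, Optional


@dataclass(eq=False)
class StubClient:
    """Hypothetical stand-in for httpx.Client."""

    base_url: Optional[str]
    timeout: Any


@lru_cache(maxsize=None)
def _cached_default_client(base_url: Optional[str], timeout: Any) -> StubClient:
    # Runs once per distinct (base_url, timeout); later calls reuse the result.
    # Real code would construct httpx.Client(base_url=..., timeout=...) here.
    return StubClient(base_url, timeout)


def get_default_client(base_url: Optional[str], timeout: Any) -> StubClient:
    try:
        hash(timeout)
    except TypeError:
        # An unhashable timeout cannot be a cache key, so build an uncached client.
        return StubClient(base_url, timeout)
    return _cached_default_client(base_url, timeout)


a = get_default_client("https://openrouter.ai/api/v1", 30.0)
b = get_default_client("https://openrouter.ai/api/v1", 30.0)
c = get_default_client("https://openrouter.ai/api/v1", 60.0)
print("shared for same key:", a is b, "| shared across timeouts:", a is c)
```

With a helper like this, any number of model instances that use the same base URL and timeout would share one connection pool, while a caller who passes an explicit http_client bypasses the cache entirely.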

Happy to send a PR mirroring #32531 if a maintainer can confirm this is the desired direction.

Environment

langchain-openrouter==0.2.3
httpx>=0.27
Python 3.13
Linux (also reproduces on macOS)
