litellm - How to fix [Bug]: MCP semantic tool filter crashloops at proxy startup with a large MCP server registry

GitHub stats
BerriAI/litellm#26155 (fetched 2026-04-22 07:46:15)
Comments: 0 · Participants: 1 · Timeline events: 3 · Reactions: 0
Timeline (top): labeled ×2, renamed ×1

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When litellm_settings.mcp_semantic_tool_filter.enabled: true and the DB has a non-trivial number of MCP servers registered (total tools in the low thousands), the proxy enters a deterministic crash loop on every boot and never becomes ready: it is either OOM-killed or fails its readiness probe, even when the probe allows a reasonable startup delay.

Root cause is in SemanticToolFilterHook.initialize_from_config():

            # Build router from MCP registry on startup
            await semantic_filter.build_router_from_mcp_registry()

build_router_from_mcp_registry() iterates every MCP server in the DB, fetches every tool from each, and constructs a single SemanticRouter(routes=[...], auto_sync="local") whose construction eagerly embeds every route's utterances in one shot. This is the only caller of this method in the codebase — grep confirms it runs exclusively at startup.

Two design gaps make this a hard crash-loop rather than "slow but eventually healthy":

  1. No persistence / no incremental build. Every pod restart re-fetches every tool and re-embeds every route from scratch. If one boot OOMs, every subsequent boot OOMs identically. There is no on-disk cache, no Redis cache, no "resume from last successful embedding" — the work is entirely re-done.
  2. Adding / removing MCP servers does not refresh the router. POST /v1/mcp/server (management_endpoints/mcp_management_endpoints.py:810-819) updates the DB and the global_mcp_server_manager registry but never calls build_router_from_mcp_registry() or any rebuild. So:
    • New tools are invisible to the semantic filter until the next proxy restart, which is also the next OOM.
    • A DB accumulated from previous registrations (dev/test leftovers, team onboarding, etc.) silently becomes a startup bomb.

Concrete numbers from my incident (GKE, 2 GiB container limit for litellm, qwen3-embedding-4b wrapped as embedding api):

  • ~15 mock MCP servers × ~500 synthetic tools ≈ 7,500 routes
  • Startup embedding pass exceeded the 2 GiB cgroup limit → exitCode: 137 within ~60s
  • 24+ restarts in 90 minutes, readiness never passing
  • Recovery required manually deleting LiteLLM_MCPServerTable / LiteLLM_ObjectPermissionTable rows so the registry was small enough to fit
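As a rough back-of-envelope check on these numbers, the sketch below estimates how much memory the embedding vectors alone would occupy. The vector dimension and the per-value overhead are assumptions for illustration, not figures from the report, and the result does not establish where the memory actually went during the failing boot.

```python
def embedding_bytes(n_routes: int, dim: int, bytes_per_value: int) -> int:
    """Bytes needed to hold n_routes vectors of length dim."""
    return n_routes * dim * bytes_per_value


MIB = 1024 * 1024
N_ROUTES = 7_500  # from the incident report
DIM = 2_560       # ASSUMPTION: output dimension of the embedding model

# Packed float32 array: 4 bytes per value.
packed = embedding_bytes(N_ROUTES, DIM, 4)

# ASSUMPTION: vectors held as Python lists of float objects on 64-bit
# CPython, roughly 8 bytes of list pointer plus 24 bytes per float.
boxed = embedding_bytes(N_ROUTES, DIM, 32)

print(f"packed float32: {packed / MIB:.1f} MiB")
print(f"Python lists:   {boxed / MIB:.1f} MiB")
```

Under these assumptions the packed vectors are far below the 2 GiB limit, while unpacked Python lists take a sizeable share of it. That would be consistent with peak usage being driven by intermediate representations (response bodies, per-route objects) rather than by the final index, but this is a guess that would need profiling to confirm.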

Expected behavior

The semantic tool filter should be production-safe regardless of DB size:

  1. Build the router lazily (on first request per tool) or incrementally (batched, with a memory budget), not eagerly-all-at-once at startup.
  2. Persist embeddings (keyed by a hash of tool_name + description) so pod restarts are O(new-or-changed-tools), not O(all-registered-tools).
  3. Rebuild / augment the index when POST|PUT|DELETE /v1/mcp/server fires, so the in-memory state stays in sync without requiring a restart.
  4. Startup failure of the filter should degrade gracefully — return tools unfiltered and log — rather than blocking the proxy from ever becoming ready. (Currently initialize_from_config catches the exception, but the upstream OOM is at the OS level, so the except never runs.)
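Points 1 and 2 can be sketched together: key each tool by a hash of its name and description, skip anything already cached, and embed the remainder in bounded batches. This is a minimal illustration, not LiteLLM code. `tool_key`, `embed_missing`, and the `embed_batch` callable are hypothetical names, and the plain `dict` stands in for whatever persistent store (disk, Redis) a real implementation would use.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Vector = List[float]


def tool_key(tool_name: str, description: str) -> str:
    """Stable cache key: SHA-256 over the tool name and description."""
    payload = f"{tool_name}\x00{description}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def embed_missing(
    tools: Iterable[Tuple[str, str]],
    cache: Dict[str, Vector],
    embed_batch: Callable[[List[str]], Sequence[Vector]],
    batch_size: int = 64,
) -> int:
    """Embed only uncached tools, batch_size at a time.

    Returns the number of tools that were newly embedded.
    """
    # Collect tools whose key is not cached; the dict also de-duplicates.
    pending: Dict[str, str] = {}
    for name, description in tools:
        key = tool_key(name, description)
        if key not in cache:
            pending[key] = f"{name}: {description}"

    items = list(pending.items())
    for start in range(0, len(items), batch_size):
        chunk = items[start:start + batch_size]
        # embed_batch is a hypothetical hook around the embedding API.
        vectors = embed_batch([text for _, text in chunk])
        for (key, _), vector in zip(chunk, vectors):
            cache[key] = vector
    return len(items)
```

With a cache that survives restarts, a second boot over an unchanged registry would call `embed_batch` zero times, and a changed description produces a new key so only that tool is re-embedded.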

Steps to Reproduce

  1. config.yaml with semantic filter enabled:
    litellm_settings:
      mcp_semantic_tool_filter:
        enabled: true
        embedding_model: <any embedding model registered in model_list>
        top_k: 5
        similarity_threshold: 0.4
  2. Register N MCP servers in the DB (e.g. via POST /v1/mcp/server) where each server exposes a large tools/list. With N × tools ≈ 5k+ the repro is reliable at a 2 GiB memory limit; the threshold scales with limit and embedding-vector size.
  3. Restart the proxy. Observe OOMKilled (exit 137) within ~30–90s of boot, every time. Readiness never passes, and /health/readiness 503s.

Minimal reproduction without a mock server: register a single server whose tools/list returns many entries (≥2–3k) by pointing url at any HTTP endpoint that returns a long JSON-RPC tools/list response.
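For the minimal reproduction, a long `tools/list` response body can be generated rather than written by hand. The sketch below builds a JSON-RPC 2.0 result in the shape MCP uses for `tools/list` (a `tools` array of entries with `name`, `description`, and `inputSchema`). The function name and the synthetic tool names are made up for illustration, and serving the payload (transport, initialization handshake) is not covered here.

```python
import json
from typing import Any, Dict


def make_tools_list_response(n_tools: int, request_id: int = 1) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 tools/list response with n_tools synthetic tools."""
    tools = [
        {
            "name": f"synthetic_tool_{i}",
            "description": f"Synthetic tool number {i} used for load testing.",
            # Minimal JSON Schema: an object with no declared properties.
            "inputSchema": {"type": "object", "properties": {}},
        }
        for i in range(n_tools)
    ]
    return {"jsonrpc": "2.0", "id": request_id, "result": {"tools": tools}}


def make_tools_list_body(n_tools: int) -> str:
    """Serialize the response to the JSON text an endpoint would return."""
    return json.dumps(make_tools_list_response(n_tools))
```

Calling `make_tools_list_body(3000)` yields a body in the 2–3k tool range mentioned above.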

Relevant log output

# Pod events
Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137

# kubectl get pods (representative)
NAME                        READY   STATUS             RESTARTS          AGE
litellm-xxxxxxxxxx-xxxxx    1/2     CrashLoopBackOff   24 (3m40s ago)    92m

# litellm boot logs, repeating on every restart
INFO  Initialized SemanticToolFilterHook with filter: enabled=True, top_k=5
INFO  Building semantic router from MCP registry: <N> servers
INFO  Fetched <~7500> tools total
<SIGKILL — no further log lines; container restarts>

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

v1.82.0

extent analysis

TL;DR

The likely fix involves modifying the build_router_from_mcp_registry method to build the router lazily or incrementally, persisting embeddings to avoid rebuilding on every restart, and updating the router when MCP servers are added or removed.

Guidance

  • Modify the SemanticToolFilterHook.initialize_from_config method to build the router lazily, fetching tools only when needed, rather than eagerly at startup.
  • Implement a persistence mechanism, such as a cache or database, to store embeddings and avoid rebuilding them on every restart.
  • Rebuild or augment the index whenever MCP servers are added, removed, or updated (the POST|PUT|DELETE /v1/mcp/server handlers), so the in-memory state stays in sync; build_router_from_mcp_registry is currently called only at startup.
  • Consider implementing a fallback mechanism to return tools unfiltered and log an error if the filter fails to initialize, rather than blocking the proxy from becoming ready.
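The fallback idea in the last bullet can be sketched as a wrapper that schedules the index build in the background and returns tools unfiltered until the build has finished or after it has failed. `LazySemanticFilter` and its `build_index` argument are hypothetical names for illustration, not existing LiteLLM APIs.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List, Optional

logger = logging.getLogger(__name__)

# A selector takes (query, tools) and returns the filtered tool names.
Selector = Callable[[str, List[str]], List[str]]


class LazySemanticFilter:
    """Serve tools unfiltered until the semantic index is ready."""

    def __init__(self, build_index: Callable[[], Awaitable[Selector]]) -> None:
        self._build_index = build_index
        self._select: Optional[Selector] = None
        self._task: Optional[asyncio.Task] = None

    def start(self) -> None:
        # Schedule the build without awaiting it, so startup is not blocked.
        self._task = asyncio.create_task(self._run_build())

    async def _run_build(self) -> None:
        try:
            self._select = await self._build_index()
        except Exception:
            # A failed build leaves the filter disabled instead of crashing.
            logger.exception("semantic tool filter disabled: index build failed")

    def filter_tools(self, query: str, tools: List[str]) -> List[str]:
        if self._select is None:
            return list(tools)  # degrade gracefully: no filtering yet
        return self._select(query, tools)
```

This only covers failures that surface as Python exceptions. As the issue notes, an OOM kill happens at the OS level and cannot be caught, so a background build would still need bounded batches to stay under the memory limit.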

Example

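# Sketch only: fetch_tools_from_mcp_servers, add_tool, load_persisted_embeddings
# and build_router are hypothetical helpers, not existing LiteLLM APIs.

# Example of incremental building (bounded batches instead of one eager pass)
async def build_router_from_mcp_registry(self, batch_size: int = 64):
    tools = await self.fetch_tools_from_mcp_servers()
    for start in range(0, len(tools), batch_size):
        # Embed and register one batch at a time to cap peak memory
        for tool in tools[start:start + batch_size]:
            self.semantic_filter.add_tool(tool)

# Example of persistence (reuse embeddings saved by a previous boot)
async def build_router_from_persisted_embeddings(self):
    # Load persisted embeddings instead of re-embedding every tool
    embeddings = await self.load_persisted_embeddings()
    # Build router from persisted embeddings
    self.semantic_filter.build_router(embeddings)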

Notes

The provided solution is based on the assumption that the build_router_from_mcp_registry method is the primary cause of the OOM issue. However, further investigation may be necessary to confirm this and identify any other contributing factors.

Recommendation

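Change build_router_from_mcp_registry to build the router lazily or incrementally, and persist embeddings so they are not recomputed on every restart. This should mitigate the OOM and let the proxy become ready. Until such a change is available, the only recovery reported in the issue is shrinking the registry by deleting rows from LiteLLM_MCPServerTable / LiteLLM_ObjectPermissionTable.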
