litellm - 💡(How to fix) Fix [Bug]: Concurrent MCP tool calls fail in stateless mode — first request's lifespan teardown cancels all in-flight requests [1 comments, 1 participants]

Root Cause

In stateless mode, each incoming request triggers Server.run() (MCP SDK lowlevel/server.py:373), which enters the LiteLLM-defined lifespan context manager. The lifespan's finally block calls shutdown_session_managers(), which tears down the shared StreamableHTTPSessionManager and its TaskGroup, cancelling every other concurrent request.

What happened?

When multiple MCP tools/call requests arrive concurrently at the /mcp/ endpoint (stateless mode), only the first to complete returns successfully. All other in-flight requests are cancelled. Sequential requests work fine.

Expected: All concurrent tool calls return successfully.

Actual: 1 of N succeeds; the rest hang until the MCP client timeout (60s) and are then cancelled.

litellm/proxy/_experimental/mcp_server/server.py:246-252:

@contextlib.asynccontextmanager
async def lifespan(app) -> AsyncIterator[None]:
    await initialize_session_managers()
    try:
        yield
    finally:
        await shutdown_session_managers()  # ← cancels all concurrent requests

The call chain:

1. Each stateless request spawns run_stateless_server into the shared _task_group (streamable_http_manager.py:182).
2. Each run_stateless_server calls self.app.run(), which enters the lifespan (server.py:246).
3. initialize_session_managers() is a no-op after the first call (guarded by _SESSION_MANAGERS_INITIALIZED, server.py:208).
4. The request is processed and the tool call completes on the upstream server.
5. The first request's Server.run() exits, so the lifespan's finally block runs shutdown_session_managers() (server.py:226).
6. shutdown_session_managers() calls _session_manager_cm.__aexit__() (server.py:235), which calls tg.cancel_scope.cancel() (streamable_http_manager.py:134).
7. All tasks in the shared TaskGroup are cancelled, including the other in-flight requests.

The shutdown also sets _SESSION_MANAGERS_INITIALIZED = False (server.py:243), so subsequent requests have to re-initialize.
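The call chain can be reproduced in miniature without LiteLLM or the MCP SDK. The sketch below is only an illustration under stated assumptions: SharedManager, per_request_lifespan, handle_request and simulate are hypothetical stand-ins, plain asyncio tasks replace the SDK's task group, and cancelled requests fail immediately here instead of hanging until a client timeout.

```python
import asyncio
import contextlib


class SharedManager:
    """Hypothetical stand-in for the shared session manager and its task group."""

    def __init__(self):
        self.tasks = set()

    def spawn(self, coro):
        # Every request runs as a task owned by the one shared manager.
        task = asyncio.create_task(coro)
        self.tasks.add(task)
        return task

    def shutdown(self):
        # Mirrors cancelling the shared task group: every other task is cancelled.
        current = asyncio.current_task()
        for task in self.tasks:
            if task is not current and not task.done():
                task.cancel()


@contextlib.asynccontextmanager
async def per_request_lifespan(manager):
    # Entered once per request, like the lifespan in the report.
    try:
        yield
    finally:
        manager.shutdown()  # the problematic per-request teardown


async def handle_request(manager, delay):
    async with per_request_lifespan(manager):
        await asyncio.sleep(delay)  # pretend to wait for the upstream tool call
        return "ok"


async def simulate(delays):
    manager = SharedManager()
    tasks = [manager.spawn(handle_request(manager, d)) for d in delays]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return ["ok" if r == "ok" else "cancelled" for r in results]


if __name__ == "__main__":
    print(asyncio.run(simulate([0.2, 0.01, 0.2])))
```

Only the request that finishes first reports "ok"; its teardown cancels the two that are still waiting, which matches the 1-of-N pattern described above.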

Steps to Reproduce

Configure LiteLLM with any MCP server (stateless Streamable HTTP):

mcp_servers:
  my_server:
    url: "http://my-mcp-server/mcp"
    transport: "http"
    allow_all_keys: true

Send 3 concurrent tools/call requests:

API_KEY="sk-your-key-here"
URL="http://localhost:4000/mcp/"
for i in 1 2 3; do
  (
    curl -s --max-time 65 -X POST "$URL" \
      -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      -H "Accept: application/json, text/event-stream" \
      -d "{\"jsonrpc\":\"2.0\",\"id\":$i,\"method\":\"tools/call\",\"params\":{\"name\":\"my_tool\",\"arguments\":{\"query\":\"test $i\"}}}" \
      -o /dev/null -w "Call $i: HTTP %{http_code} in %{time_total}s\n" \
      || echo "Call $i: TIMED OUT"
  ) &
done
wait

Observe: 1 call returns HTTP 200 in ~400ms, the other 2 hang for 60s+ then time out.
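For environments where a shell loop is inconvenient, the same three concurrent calls can be sketched in Python with only the standard library. This is a rough equivalent of the curl loop, not a tested client: the URL, key and tool name are the placeholders from the steps above, and build_tool_call, send_call and run_concurrently are hypothetical helper names.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:4000/mcp/"  # placeholder from the steps above
API_KEY = "sk-your-key-here"        # placeholder from the steps above


def build_tool_call(i: int) -> dict:
    """Build the same JSON-RPC tools/call payload that the curl loop sends."""
    return {
        "jsonrpc": "2.0",
        "id": i,
        "method": "tools/call",
        "params": {"name": "my_tool", "arguments": {"query": f"test {i}"}},
    }


def send_call(i: int, timeout: float = 65.0) -> str:
    """POST one tool call and wait for the complete response body."""
    request = urllib.request.Request(
        URL,
        data=json.dumps(build_tool_call(i)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()  # a hung request stalls here until the timeout
            return f"Call {i}: HTTP {response.status}"
    except Exception as exc:  # timeouts and connection errors both end up here
        return f"Call {i}: FAILED ({exc!r})"


def run_concurrently(n: int = 3) -> list:
    """Send n tool calls at the same time, one thread per call."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(send_call, range(1, n + 1)))
```

Calling run_concurrently() against a running proxy and printing the returned lines gives one status line per call, comparable to the curl output.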

Relevant log output

# Client output: 1 succeeds, 2 time out
Call 3: HTTP 200 in 0.134s
Call 1: TIMED OUT after 65s
Call 2: TIMED OUT after 65s
# Upstream MCP server logs: all 3 tool calls arrive and complete within 145ms
15:17:37.443 - CallTool "test longwait 1" → 15:17:37.588 Found results (145ms)
15:17:37.444 - CallTool "test longwait 2" → 15:17:37.587 Found results (143ms)
15:17:37.445 - CallTool "test longwait 3" → 15:17:37.524 Found results (79ms)
# LiteLLM proxy logs: cancellations at 15:17:39 (2s after backend completed)
# and 15:18:37 (60s MCP client timeout on the already-dead sessions)
15:17:39 - LiteLLM:WARNING: client.py:450 - MCP client tool call was cancelled
15:18:37 - LiteLLM:WARNING: client.py:450 - MCP client tool call was cancelled
15:18:37 - LiteLLM:WARNING: client.py:450 - MCP client tool call was cancelled

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

main-latest (ghcr.io/berriai/litellm:main-latest, pulled 2026-03-23)

extent analysis

Fix Plan

To fix the issue, shutdown_session_managers() must stop running at the end of every request. Because the lifespan context manager is entered once per stateless request, the call should move out of its finally block and into a hook that runs once, when the proxy itself shuts down.

Code Changes

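# litellm/proxy/_experimental/mcp_server/server.py

@contextlib.asynccontextmanager
async def lifespan(app) -> AsyncIterator[None]:
    # Entered once per stateless request, so it must not tear down shared state.
    await initialize_session_managers()
    yield


# Run shutdown_session_managers() once, when the proxy itself stops.
# Assumption: `app` below is the FastAPI/Starlette application hosting the MCP
# routes (not the MCP Server passed to lifespan). Shutdown handlers take no arguments.
# If that application is created with a lifespan= handler, FastAPI does not run
# event handlers; call shutdown_session_managers() from that handler's teardown instead.
app.add_event_handler("shutdown", shutdown_session_managers)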
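The intent of the change, per-request lifespans that leave shared state alone plus a single teardown at process exit, can be checked with a small self-contained asyncio sketch. The names SharedManager, per_request_lifespan, handle_request and serve are hypothetical, and plain asyncio tasks stand in for the SDK's task group, so this illustrates the pattern rather than LiteLLM's actual code.

```python
import asyncio
import contextlib


class SharedManager:
    """Hypothetical stand-in for the shared session manager."""

    def __init__(self):
        self.tasks = set()
        self.shutdown_calls = 0

    def spawn(self, coro):
        task = asyncio.create_task(coro)
        self.tasks.add(task)
        return task

    def shutdown(self):
        # Cancels whatever is still running; called once at application exit.
        self.shutdown_calls += 1
        for task in self.tasks:
            if not task.done():
                task.cancel()


@contextlib.asynccontextmanager
async def per_request_lifespan(manager):
    # Per-request lifespan: no teardown of shared state here any more.
    yield


async def handle_request(manager, delay):
    async with per_request_lifespan(manager):
        await asyncio.sleep(delay)  # pretend to wait for the upstream tool call
        return "ok"


async def serve(delays):
    manager = SharedManager()
    try:
        tasks = [manager.spawn(handle_request(manager, d)) for d in delays]
        results = await asyncio.gather(*tasks, return_exceptions=True)
    finally:
        manager.shutdown()  # application-level shutdown, runs exactly once
    return results, manager.shutdown_calls


if __name__ == "__main__":
    print(asyncio.run(serve([0.02, 0.01, 0.02])))
```

With the teardown moved out of the per-request path, all requests complete and the shared manager is shut down a single time.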

Configuration Changes

No configuration changes are required.

Verification

To verify the fix, run the same test as before:

API_KEY="sk-your-key-here"
URL="http://localhost:4000/mcp/"
for i in 1 2 3; do
  (
    curl -s --max-time 65 -X POST "$URL" \
      -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      -H "Accept: application/json, text/event-stream" \
      -d "{\"jsonrpc\":\"2.0\",\"id\":$i,\"method\":\"tools/call\",\"params\":{\"name\":\"my_tool\",\"arguments\":{\"query\":\"test $i\"}}}" \
      -o /dev/null -w "Call $i: HTTP %{http_code} in %{time_total}s\n" \
      || echo "Call $i: TIMED OUT"
  ) &
done
wait

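All three calls should now return HTTP 200, and the proxy logs should no longer contain "MCP client tool call was cancelled" warnings for these requests.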

Extra Tips

Test the fix with more than three concurrent calls, and confirm that the session managers are still shut down when the proxy exits.
Consider logging each call to shutdown_session_managers() so that an unexpected mid-request shutdown is visible in the proxy logs.
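One way to add such logging is to wrap the shutdown coroutine so that every invocation is counted and logged together with the calling stack. This is a minimal sketch: log_shutdown_calls and the logger name are hypothetical, and the wrapper would be applied to shutdown_session_managers (or any other async teardown function) by whoever registers it.

```python
import asyncio
import functools
import logging

logger = logging.getLogger("mcp_shutdown_sketch")  # hypothetical logger name


def log_shutdown_calls(shutdown_fn):
    """Wrap an async, zero-argument shutdown function with call logging."""

    @functools.wraps(shutdown_fn)
    async def wrapper():
        wrapper.call_count += 1
        # stack_info=True records who triggered the shutdown.
        logger.warning(
            "session manager shutdown, call #%d", wrapper.call_count, stack_info=True
        )
        await shutdown_fn()

    wrapper.call_count = 0
    return wrapper


if __name__ == "__main__":
    async def fake_shutdown():
        # Stand-in for the real teardown function.
        await asyncio.sleep(0)

    wrapped = log_shutdown_calls(fake_shutdown)
    asyncio.run(wrapped())
```

More than one logged call during a single proxy run would indicate that the teardown is still being triggered from a per-request path.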
