litellm - 💡(How to fix) Fix [Bug]: Concurrent MCP tool calls fail in stateless mode — first request's lifespan teardown cancels all in-flight requests [1 comments, 1 participants]

GitHub stats
BerriAI/litellm#24522, fetched 2026-04-08 01:22:51
Comments: 1 · Participants: 1 · Timeline events: 4 (labeled ×2, closed ×1, commented ×1) · Reactions: 0


Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When multiple MCP tools/call requests arrive concurrently at the /mcp/ endpoint (stateless mode), only the first to complete returns successfully. All other in-flight requests are cancelled. Sequential requests work fine.

Expected: All concurrent tool calls return successfully.
Actual: 1 of N succeeds; the rest hang until the MCP client timeout (60s) and are cancelled.

Root Cause

In stateless mode, each incoming request triggers Server.run() (MCP SDK lowlevel/server.py:373), which enters the LiteLLM-defined lifespan context manager. The lifespan's finally block calls shutdown_session_managers(), which tears down the shared StreamableHTTPSessionManager and its TaskGroup, killing all other concurrent requests.

litellm/proxy/_experimental/mcp_server/server.py:246-252:

@contextlib.asynccontextmanager
async def lifespan(app) -> AsyncIterator[None]:
    await initialize_session_managers()
    try:
        yield
    finally:
        await shutdown_session_managers()  # ← cancels all concurrent requests

The call chain:

  1. Each stateless request spawns run_stateless_server into the shared _task_group (streamable_http_manager.py:182)
  2. Each run_stateless_server calls self.app.run() → enters lifespan (server.py:246)
  3. initialize_session_managers() is a no-op after the first call (guarded by _SESSION_MANAGERS_INITIALIZED, server.py:208)
  4. The request is processed, the tool call completes on the upstream server
  5. The first request's Server.run() exits → lifespan finally → shutdown_session_managers() (server.py:226)
  6. shutdown_session_managers() calls _session_manager_cm.__aexit__() (server.py:235) → tg.cancel_scope.cancel() (streamable_http_manager.py:134)
  7. All tasks in the shared TaskGroup are cancelled, including other in-flight requests

The shutdown also sets _SESSION_MANAGERS_INITIALIZED = False (server.py:243), so subsequent requests have to re-initialize.
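
The call chain can be reproduced in a toy model that uses only asyncio. This is a sketch, not LiteLLM or MCP SDK code: ToyManager, lifespan, handle_request and run_concurrent are hypothetical names, a plain set of tasks stands in for the shared TaskGroup, and asyncio.sleep stands in for the upstream tool call.

```python
import asyncio
import contextlib


class ToyManager:
    """Hypothetical stand-in for the shared session manager and its TaskGroup."""

    def __init__(self) -> None:
        self.initialized = False
        self.tasks = set()  # every in-flight request task

    def initialize(self) -> None:
        # Effectively a no-op after the first call, like the
        # _SESSION_MANAGERS_INITIALIZED guard described above.
        self.initialized = True

    def shutdown(self) -> None:
        # Cancel every other in-flight task, mimicking cancel_scope.cancel().
        # Simplification: the calling task is skipped because its own work is done.
        current = asyncio.current_task()
        for task in self.tasks:
            if task is not current and not task.done():
                task.cancel()
        self.initialized = False


@contextlib.asynccontextmanager
async def lifespan(manager):
    manager.initialize()
    try:
        yield
    finally:
        manager.shutdown()  # runs as soon as ANY request leaves the lifespan


async def handle_request(manager, delay):
    async with lifespan(manager):
        await asyncio.sleep(delay)  # stands in for the upstream tool call
    return "ok"


async def run_concurrent(delays):
    manager = ToyManager()
    tasks = [asyncio.create_task(handle_request(manager, d)) for d in delays]
    manager.tasks.update(tasks)
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return ["ok" if r == "ok" else "cancelled" for r in results]


print(asyncio.run(run_concurrent([0.15, 0.05, 0.10])))
```

In this model only the request with the shortest delay reports "ok"; the other two are cancelled by its teardown. The real proxy differs in one respect: the cancelled requests hang until the 60s client timeout instead of failing immediately.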

Steps to Reproduce

  1. Configure LiteLLM with any MCP server (stateless Streamable HTTP):
mcp_servers:
  my_server:
    url: "http://my-mcp-server/mcp"
    transport: "http"
    allow_all_keys: true
  2. Send 3 concurrent tools/call requests:
API_KEY="sk-your-key-here"
URL="http://localhost:4000/mcp/"
for i in 1 2 3; do
  (
    curl -s --max-time 65 -X POST "$URL" \
      -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      -H "Accept: application/json, text/event-stream" \
      -d "{\"jsonrpc\":\"2.0\",\"id\":$i,\"method\":\"tools/call\",\"params\":{\"name\":\"my_tool\",\"arguments\":{\"query\":\"test $i\"}}}" \
      -o /dev/null -w "Call $i: HTTP %{http_code} in %{time_total}s\n" \
      || echo "Call $i: TIMED OUT"
  ) &
done
wait
  3. Observe: 1 call returns HTTP 200 in ~400ms, the other 2 hang for 60s+ then time out.

Relevant log output

# Client output: 1 succeeds, 2 time out
Call 3: HTTP 200 in 0.134s
Call 1: TIMED OUT after 65s
Call 2: TIMED OUT after 65s
# Upstream MCP server logs: all 3 tool calls arrive and complete within 145ms
15:17:37.443 - CallTool "test longwait 1" → 15:17:37.588 Found results (145ms)
15:17:37.444 - CallTool "test longwait 2" → 15:17:37.587 Found results (143ms)
15:17:37.445 - CallTool "test longwait 3" → 15:17:37.524 Found results (79ms)
# LiteLLM proxy logs: cancellations at 15:17:39 (2s after backend completed)
# and 15:18:37 (60s MCP client timeout on the already-dead sessions)
15:17:39 - LiteLLM:WARNING: client.py:450 - MCP client tool call was cancelled
15:18:37 - LiteLLM:WARNING: client.py:450 - MCP client tool call was cancelled
15:18:37 - LiteLLM:WARNING: client.py:450 - MCP client tool call was cancelled

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

main-latest (ghcr.io/berriai/litellm:main-latest, pulled 2026-03-23)


extent analysis

Fix Plan

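To fix the issue, shutdown_session_managers() must stop cancelling requests that are still in flight. Because the lifespan context manager is entered and exited once per request in stateless mode, the teardown does not belong in its finally block. One option is to move the shutdown_session_managers() call out of the lifespan so that it runs only once, when the proxy itself shuts down.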

Code Changes

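# litellm/proxy/_experimental/mcp_server/server.py

@contextlib.asynccontextmanager
async def lifespan(app) -> AsyncIterator[None]:
    # Entered once per request in stateless mode, so it must not
    # tear down the shared session managers when it exits.
    await initialize_session_managers()
    yield

# Run the teardown once, when the proxy itself stops.
async def on_shutdown() -> None:
    await shutdown_session_managers()

# Assumption: `proxy_app` is a hypothetical name for the FastAPI/Starlette
# application that hosts the MCP routes, and its version still supports
# event handlers. Otherwise, call on_shutdown() from that application's
# own lifespan. The real registration point in LiteLLM may differ.
proxy_app.add_event_handler("shutdown", on_shutdown)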
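
The effect of moving the teardown can be checked in a small asyncio model. This is a sketch under stated assumptions, not LiteLLM or MCP SDK code: ToyManager, lifespan, handle_request and serve are hypothetical names, and serve plays the role of the process that owns the shared state and shuts it down once.

```python
import asyncio
import contextlib


class ToyManager:
    """Hypothetical stand-in for the shared session manager."""

    def __init__(self) -> None:
        self.tasks = set()
        self.shutdown_calls = 0

    def shutdown(self) -> None:
        # Cancels whatever is still running; called only by the owner below.
        self.shutdown_calls += 1
        for task in self.tasks:
            if not task.done():
                task.cancel()


@contextlib.asynccontextmanager
async def lifespan(manager):
    # Per-request lifespan: no teardown of shared state on exit.
    yield


async def handle_request(manager, delay):
    async with lifespan(manager):
        await asyncio.sleep(delay)  # stands in for the upstream tool call
    return "ok"


async def serve(delays):
    manager = ToyManager()
    try:
        tasks = [asyncio.create_task(handle_request(manager, d)) for d in delays]
        manager.tasks.update(tasks)
        results = await asyncio.gather(*tasks, return_exceptions=True)
    finally:
        manager.shutdown()  # process-level shutdown, runs exactly once
    statuses = ["ok" if r == "ok" else "cancelled" for r in results]
    return statuses, manager.shutdown_calls


print(asyncio.run(serve([0.15, 0.05, 0.10])))
```

With the teardown owned by serve, every request in the model completes and the shutdown runs a single time. A reference count of active requests would be an alternative design, but it reintroduces repeated re-initialization whenever the count drops to zero.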

Configuration Changes

No configuration changes are required.

Verification

To verify the fix, run the same test as before:

API_KEY="sk-your-key-here"
URL="http://localhost:4000/mcp/"
for i in 1 2 3; do
  (
    curl -s --max-time 65 -X POST "$URL" \
      -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      -H "Accept: application/json, text/event-stream" \
      -d "{\"jsonrpc\":\"2.0\",\"id\":$i,\"method\":\"tools/call\",\"params\":{\"name\":\"my_tool\",\"arguments\":{\"query\":\"test $i\"}}}" \
      -o /dev/null -w "Call $i: HTTP %{http_code} in %{time_total}s\n" \
      || echo "Call $i: TIMED OUT"
  ) &
done
wait

All three calls should now return successfully.
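
The same check can be scripted in Python with only the standard library. This is a sketch that mirrors the shell loop: the URL, API key, tool name and headers are the placeholders used there, while build_payload, call_tool and run_check are hypothetical helper names.

```python
import json
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

API_KEY = "sk-your-key-here"        # placeholder, as in the shell script
URL = "http://localhost:4000/mcp/"  # proxy endpoint from the reproduction steps


def build_payload(call_id: int) -> dict:
    """JSON-RPC tools/call body, identical in shape to the curl -d argument."""
    return {
        "jsonrpc": "2.0",
        "id": call_id,
        "method": "tools/call",
        "params": {"name": "my_tool", "arguments": {"query": f"test {call_id}"}},
    }


def call_tool(call_id: int, timeout: float = 65.0) -> str:
    """Send one request and return a short status string instead of raising."""
    request = urllib.request.Request(
        URL,
        data=json.dumps(build_payload(call_id)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()  # drain the body so the call counts as complete
            return f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        return f"HTTP {exc.code}"
    except OSError as exc:  # connection errors and timeouts
        return f"FAILED ({exc})"


def run_check(n: int = 3) -> list:
    """Fire n calls concurrently and collect their statuses in call order."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(call_tool, range(1, n + 1)))
```

Calling print(run_check()) against a running proxy should list "HTTP 200" for every call once the fix is in place; before the fix, all but one entry are expected to show a failure or timeout.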

Extra Tips

  • Re-run the reproduction with more parallel calls, and confirm that sequential requests still succeed after a concurrent burst.
  • Consider adding logging around shutdown_session_managers() to confirm that it runs only when the proxy stops, not after individual requests.
