litellm - 💡(How to fix) Fix [Bug]: Concurrent MCP tools/call requests on same StreamableHTTP session hang — only 1 of N completes [1 comments, 2 participants]

BerriAI/litellm#24581 (fetched 2026-04-08 01:32:33)

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Concurrent tools/call JSON-RPC requests within a single StreamableHTTP session (POST /mcp/) fail: only one call completes, and the rest hang until MCP_CLIENT_TIMEOUT (60s by default), at which point the server logs "MCP client tool call was cancelled". Sequential calls on the same session work fine. This affects MCP clients that maintain a persistent session and send parallel tool calls (e.g. LibreChat).
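To make the failure pattern concrete, here is a minimal sketch of what "concurrent calls on one session" means on the wire. Each call is its own POST carrying the same mcp-session-id header and is distinguished only by its JSON-RPC id. The helper names build_tool_calls and match_responses are hypothetical and do not come from LiteLLM or the reproducer; the simulated response stands in for the single call that completed in the report.

```python
def build_tool_calls(tool_name, arguments, first_id, count):
    """Build `count` JSON-RPC 2.0 tools/call requests with distinct ids."""
    return [
        {
            "jsonrpc": "2.0",
            "id": first_id + i,
            "method": "tools/call",
            "params": {"name": tool_name, "arguments": arguments},
        }
        for i in range(count)
    ]


def match_responses(requests, responses):
    """Map each request id to its response, or None if none arrived."""
    by_id = {r.get("id"): r for r in responses}
    return {req["id"]: by_id.get(req["id"]) for req in requests}


if __name__ == "__main__":
    calls = build_tool_calls("search", {"query": "test"}, first_id=20, count=3)
    # Simulated outcome matching the report: only id 21 (par-2) is answered.
    responses = [{"jsonrpc": "2.0", "id": 21, "result": {"content": []}}]
    for req_id, resp in match_responses(calls, responses).items():
        print(req_id, "answered" if resp else "no response")
```

In the reproducer the ids 20 to 22 correspond to par-1 through par-3, so a table like this shows at a glance which calls never received a response.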

Steps to Reproduce

  1. Have any MCP server registered in LiteLLM (the tool call needs to take at least ~100ms to hit the bug — instant/cached responses won't trigger it)
  2. pip install httpx
  3. Save the script below as repro.py and run: python3 repro.py http://localhost:4000 sk-YOUR_KEY your_tool_name '{"arg": "value"}'
#!/usr/bin/env python3
"""
Reproducer: concurrent MCP tools/call on same StreamableHTTP session.
Sequential calls succeed; concurrent calls hang until timeout.
Usage:
    pip install httpx
    python3 repro.py <LITELLM_URL> <API_KEY> <TOOL_NAME> [TOOL_ARGS_JSON]
"""
import asyncio
import json
import sys
import time
import httpx
BASE_URL = sys.argv[1] if len(sys.argv) > 1 else "http://localhost:4000"
API_KEY = sys.argv[2] if len(sys.argv) > 2 else "sk-1234"
TOOL_NAME = sys.argv[3] if len(sys.argv) > 3 else "search"
TOOL_ARGS = json.loads(sys.argv[4]) if len(sys.argv) > 4 else {"query": "test"}
MCP_URL = f"{BASE_URL}/mcp/"
CALL_TIMEOUT = 15
def parse_sse(text):
    for line in text.split("\n"):
        if line.startswith("data: "):
            return json.loads(line[6:])
    return json.loads(text)
async def mcp_post(client, headers, payload):
    resp = await asyncio.wait_for(
        client.post(MCP_URL, json=payload, headers=headers, timeout=CALL_TIMEOUT),
        timeout=CALL_TIMEOUT + 2,
    )
    if resp.status_code == 202 or not resp.text.strip():
        return {}, resp
    ct = resp.headers.get("content-type", "")
    if "text/event-stream" in ct:
        return parse_sse(resp.text), resp
    return resp.json(), resp
async def init_session(client, headers):
    result, resp = await mcp_post(client, headers, {
        "jsonrpc": "2.0", "id": 0, "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": "bug-reproducer", "version": "1.0"},
        },
    })
    sid = resp.headers.get("mcp-session-id", "")
    if sid:
        headers["mcp-session-id"] = sid
    await mcp_post(client, headers, {
        "jsonrpc": "2.0", "method": "notifications/initialized",
    })
    return sid
async def call_tool(client, headers, request_id, label):
    start = time.monotonic()
    try:
        result, _ = await mcp_post(client, headers, {
            "jsonrpc": "2.0", "id": request_id, "method": "tools/call",
            "params": {"name": TOOL_NAME, "arguments": TOOL_ARGS},
        })
        elapsed = time.monotonic() - start
        if "error" in result:
            return label, "ERROR", elapsed, f"code={result['error'].get('code')}"
        if "result" in result:
            is_err = result["result"].get("isError", False)
            return label, "MCP_ERROR" if is_err else "OK", elapsed, ""
        return label, "UNEXPECTED", elapsed, str(result)[:80]
    except (asyncio.TimeoutError, httpx.ReadTimeout):
        return label, "TIMEOUT", time.monotonic() - start, ""
    except Exception as e:
        return label, type(e).__name__, time.monotonic() - start, str(e)[:60]
async def main():
    print(f"Target: {MCP_URL}  Tool: {TOOL_NAME}  Timeout: {CALL_TIMEOUT}s")
    print()
    base_headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }
    async with httpx.AsyncClient(verify=False, timeout=CALL_TIMEOUT) as client:
        print("[1] Sequential (3 calls, same session)")
        h = {**base_headers}
        await init_session(client, h)
        for i in range(3):
            label, status, elapsed, detail = await call_tool(client, h, i + 10, f"seq-{i+1}")
            mark = "✓" if status == "OK" else "✗"
            print(f"    {mark} {label}: {status} ({elapsed:.1f}s) {detail}")
    async with httpx.AsyncClient(verify=False, timeout=CALL_TIMEOUT) as client:
        print("\n[2] Concurrent (3 calls, same session)")
        h = {**base_headers}
        await init_session(client, h)
        tasks = [
            asyncio.create_task(call_tool(client, h, i + 20, f"par-{i+1}"))
            for i in range(3)
        ]
        for label, status, elapsed, detail in await asyncio.gather(*tasks):
            mark = "✓" if status == "OK" else "✗"
            print(f"    {mark} {label}: {status} ({elapsed:.1f}s) {detail}")
if __name__ == "__main__":
    asyncio.run(main())

Relevant log output

[1] Sequential (3 calls, same session)
    ✓ seq-1: OK (0.3s)
    ✓ seq-2: OK (0.2s)
    ✓ seq-3: OK (0.3s)
[2] Concurrent (3 calls, same session)
    ✗ par-1: TIMEOUT (17.0s)
    ✓ par-2: OK (0.3s)
    ✗ par-3: TIMEOUT (17.0s)
Server logs after 60s:
client.py:450 - MCP client tool call was cancelled
client.py:450 - MCP client tool call was cancelled

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

helm/docker main-latest (reports 1.82.6)

extent analysis

Fix Plan

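The hang occurs when several tools/call requests are in flight on one StreamableHTTP session at the same time. The issue does not identify the root cause inside the proxy, so the plan below is a client-side workaround: it changes repro.py so that concurrent calls no longer share a session. Clients that must keep one persistent session would still need a fix in LiteLLM itself.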

Step 1: Create a New Client Session for Each Request

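Instead of sharing one MCP session across concurrent requests, give each request its own session by calling init_session inside call_tool. Each call then pays for an extra initialize round trip. The copied headers must also drop the mcp-session-id that main() obtained earlier, otherwise the initialize request is sent with the existing session's ID.

async def call_tool(client, headers, request_id, label):
    # Workaround: own HTTP client and MCP session per call; `client` is unused.
    start = time.monotonic()
    try:
        async with httpx.AsyncClient(verify=False, timeout=CALL_TIMEOUT) as new_client:
            new_headers = {**headers}
            # Drop the inherited session ID so initialize opens a fresh session.
            new_headers.pop("mcp-session-id", None)
            await init_session(new_client, new_headers)
            result, _ = await mcp_post(new_client, new_headers, {
                "jsonrpc": "2.0", "id": request_id, "method": "tools/call",
                "params": {"name": TOOL_NAME, "arguments": TOOL_ARGS},
            })
        elapsed = time.monotonic() - start  # includes session setup time
        if "error" in result:
            return label, "ERROR", elapsed, f"code={result['error'].get('code')}"
        if "result" in result:
            is_err = result["result"].get("isError", False)
            return label, "MCP_ERROR" if is_err else "OK", elapsed, ""
        return label, "UNEXPECTED", elapsed, str(result)[:80]
    except (asyncio.TimeoutError, httpx.ReadTimeout):
        return label, "TIMEOUT", time.monotonic() - start, ""
    except Exception as e:
        return label, type(e).__name__, time.monotonic() - start, str(e)[:60]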

Step 2: Use a Semaphore to Limit Concurrent Requests

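To avoid overwhelming the server, cap the number of requests in flight with a semaphore. The cap does not address the hang on its own. With a limit of 1, however, calls run one at a time, which matches the sequential case that already works on a shared session. Have main() create its tasks from call_tool_limited instead of call_tool. On Python older than 3.10, create the semaphore inside main() rather than at import time, because it binds to the event loop that exists when it is constructed.

sem = asyncio.Semaphore(5)  # Allow up to 5 concurrent requests

async def call_tool_limited(client, headers, request_id, label):
    # Wait for a free slot, then delegate to call_tool unchanged.
    async with sem:
        return await call_tool(client, headers, request_id, label)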
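The effect of the limit can be checked in isolation, without a LiteLLM server. The following is a minimal sketch in which asyncio.sleep stands in for the HTTP round trip; run_limited and fake_tool_call are illustrative names, not part of the reproducer. It records the peak number of simulated calls running at once.

```python
import asyncio


async def run_limited(limit, n_calls, delay=0.01):
    """Run n_calls simulated tool calls with at most `limit` in flight.

    Returns the peak number of calls that were running at the same time.
    """
    # Created inside the coroutine so it belongs to the running event loop.
    sem = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fake_tool_call():
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(delay)  # stand-in for the real HTTP round trip
            in_flight -= 1

    await asyncio.gather(*(fake_tool_call() for _ in range(n_calls)))
    return peak


if __name__ == "__main__":
    print("limit=5 peak:", asyncio.run(run_limited(5, 10)))
    print("limit=1 peak:", asyncio.run(run_limited(1, 10)))
```

With limit=1 the peak never exceeds one, which is the serialized behaviour that the sequential half of the reproducer already shows to work.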

Verification

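Run the modified repro.py script. In section [2], all three par-N lines should report OK with no TIMEOUT entries, and the server log should no longer show "MCP client tool call was cancelled". This confirms the workaround only; the original script remains the test for the shared-session behaviour.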
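If the check needs to be automated, the reproducer's output can be parsed rather than read by eye. This is a small sketch that assumes the output format shown in the log above (a ✓ or ✗ mark, a label, a status, and a duration); failed_calls is a hypothetical helper, not part of repro.py.

```python
import re

# Matches result lines such as "    ✓ par-2: OK (0.3s)".
LINE_RE = re.compile(
    r"^\s*[✓✗]\s+(?P<label>[\w-]+):\s+(?P<status>\w+)\s+\((?P<secs>[\d.]+)s\)"
)


def failed_calls(output):
    """Return (label, status) for every result line whose status is not OK."""
    failures = []
    for line in output.splitlines():
        m = LINE_RE.match(line)
        if m and m.group("status") != "OK":
            failures.append((m.group("label"), m.group("status")))
    return failures


if __name__ == "__main__":
    sample = """[2] Concurrent (3 calls, same session)
    ✗ par-1: TIMEOUT (17.0s)
    ✓ par-2: OK (0.3s)
    ✗ par-3: TIMEOUT (17.0s)"""
    print(failed_calls(sample))
```

An empty list means every call in the captured output completed; any entry names the call that needs a closer look.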

Extra Tips

  • Adjust the semaphore limit to the number of concurrent requests your server can handle.
  • Consider retrying failed requests to improve robustness, keeping in mind that a retried tools/call may run a tool with side effects a second time.
  • Monitor the server logs, in particular for "MCP client tool call was cancelled", to confirm the change does not introduce new issues.
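The retry tip could look roughly like the sketch below, which retries only on timeouts and backs off exponentially between attempts. call_with_retry is a hypothetical helper. In the reproducer, call_tool swallows timeouts and returns a status tuple, so a wrapper like this would have to sit around mcp_post instead (an assumption about where you would integrate it).

```python
import asyncio


async def call_with_retry(make_call, attempts=3, base_delay=0.5):
    """Await make_call() up to `attempts` times, backing off between tries.

    make_call is a zero-argument callable returning a fresh coroutine.
    Only timeouts are retried; other exceptions propagate immediately.
    """
    for attempt in range(attempts):
        try:
            return await make_call()
        except asyncio.TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the timeout to the caller
            await asyncio.sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    state = {"tries": 0}

    async def flaky():
        # Fails twice with a timeout, then succeeds.
        state["tries"] += 1
        if state["tries"] < 3:
            raise asyncio.TimeoutError
        return "OK"

    print(asyncio.run(call_with_retry(flaky, base_delay=0.01)), state["tries"])
```

Because each hung call in this bug waits for the full timeout before failing, retries add considerable latency and are better treated as a safety net than as the primary workaround.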
