litellm - 💡(How to fix) Fix [Bug]: Concurrent MCP tools/call requests on same StreamableHTTP session hang — only 1 of N completes [1 comments, 2 participants]

litellm, 2026-03-25 18:33:52


BerriAI/litellm#24581 (fetched 2026-04-08 01:32:33)

Author: bmwt
Participants: bmwt, kaiisfree
Timeline: labeled ×2, commented ×1


Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Concurrent tools/call JSON-RPC requests within a single StreamableHTTP session (POST /mcp/) fail. Only one call completes, and the rest hang until MCP_CLIENT_TIMEOUT (60 s by default), at which point the server logs "MCP client tool call was cancelled". Sequential calls on the same session work fine. This affects MCP clients that keep a persistent session and send tool calls in parallel (e.g. LibreChat).
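One session can only carry several in-flight requests if each response is routed back to the waiter for its own JSON-RPC id. The sketch below is a hypothetical model of that routing. It is not LiteLLM's code, and it makes no claim about where the proxy's bug actually lies. JsonRpcDemux, demo and the fake transport are illustrative names. Three requests share one "session", their responses arrive in reverse order, and each caller still gets its own reply.

```python
import asyncio
import itertools


class JsonRpcDemux:
    """Hypothetical model: route JSON-RPC responses to waiters by request id.

    Not LiteLLM code; it only shows that several requests can share one
    session when each waiter is keyed by its own id.
    """

    def __init__(self):
        self._pending = {}  # request id -> asyncio.Future
        self._ids = itertools.count(1)

    async def request(self, send, method, params):
        req_id = next(self._ids)
        fut = asyncio.get_running_loop().create_future()
        # One future per id, so a second request never replaces the first waiter.
        self._pending[req_id] = fut
        try:
            await send({"jsonrpc": "2.0", "id": req_id, "method": method, "params": params})
            return await fut
        finally:
            self._pending.pop(req_id, None)

    def deliver(self, message):
        # Reader side: hand each incoming response to the waiter with the same id.
        fut = self._pending.get(message.get("id"))
        if fut is not None and not fut.done():
            fut.set_result(message)


async def demo():
    demux = JsonRpcDemux()
    background = set()  # keep task references so they are not garbage-collected

    async def fake_send(msg):
        # Fake server: earlier ids take longer, so responses arrive out of order.
        async def respond():
            await asyncio.sleep(0.05 * (4 - msg["id"]))
            demux.deliver({"jsonrpc": "2.0", "id": msg["id"],
                           "result": {"echo": msg["params"]}})

        task = asyncio.create_task(respond())
        background.add(task)
        task.add_done_callback(background.discard)

    return await asyncio.gather(
        *(demux.request(fake_send, "tools/call", {"n": i}) for i in range(3))
    )


if __name__ == "__main__":
    for response in asyncio.run(demo()):
        print(response)
```

Every coroutine in the gather resolves, even though all three share one demultiplexer. A design that kept only a single pending waiter per session would instead leave all but one caller waiting. That matches the "1 of N completes" symptom, but the issue does not confirm that this is the cause.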

Steps to Reproduce

1. Have any MCP server registered in LiteLLM. The tool call needs to take at least ~100 ms to trigger the bug; instant or cached responses won't.
2. pip install httpx
3. Save the script below as repro.py and run: python3 repro.py http://localhost:4000 sk-YOUR_KEY your_tool_name '{"arg": "value"}'

#!/usr/bin/env python3
"""
Reproducer: concurrent MCP tools/call on same StreamableHTTP session.
Sequential calls succeed; concurrent calls hang until timeout.
Usage:
    pip install httpx
    python3 repro.py <LITELLM_URL> <API_KEY> <TOOL_NAME> [TOOL_ARGS_JSON]
"""
import asyncio
import json
import sys
import time

import httpx

# Positional CLI arguments, with defaults for a local proxy.
BASE_URL = sys.argv[1] if len(sys.argv) > 1 else "http://localhost:4000"
API_KEY = sys.argv[2] if len(sys.argv) > 2 else "sk-1234"
TOOL_NAME = sys.argv[3] if len(sys.argv) > 3 else "search"
TOOL_ARGS = json.loads(sys.argv[4]) if len(sys.argv) > 4 else {"query": "test"}
MCP_URL = f"{BASE_URL}/mcp/"
CALL_TIMEOUT = 15  # seconds


def parse_sse(text):
    # Take the first "data: " line of an SSE body; fall back to plain JSON.
    for line in text.split("\n"):
        if line.startswith("data: "):
            return json.loads(line[6:])
    return json.loads(text)


async def mcp_post(client, headers, payload):
    # httpx timeouts apply per network operation, so wait_for adds an
    # overall deadline for the whole request.
    resp = await asyncio.wait_for(
        client.post(MCP_URL, json=payload, headers=headers, timeout=CALL_TIMEOUT),
        timeout=CALL_TIMEOUT + 2,
    )
    # Notifications are acknowledged with 202 and no body.
    if resp.status_code == 202 or not resp.text.strip():
        return {}, resp
    ct = resp.headers.get("content-type", "")
    if "text/event-stream" in ct:
        return parse_sse(resp.text), resp
    return resp.json(), resp


async def init_session(client, headers):
    # MCP handshake: initialize, store the server-assigned session id in
    # `headers` (so later requests reuse the session), then confirm.
    _, resp = await mcp_post(client, headers, {
        "jsonrpc": "2.0", "id": 0, "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": "bug-reproducer", "version": "1.0"},
        },
    })
    sid = resp.headers.get("mcp-session-id", "")
    if sid:
        headers["mcp-session-id"] = sid
    await mcp_post(client, headers, {
        "jsonrpc": "2.0", "method": "notifications/initialized",
    })
    return sid


async def call_tool(client, headers, request_id, label):
    # Returns (label, status, elapsed_seconds, detail) instead of raising.
    start = time.monotonic()
    try:
        result, _ = await mcp_post(client, headers, {
            "jsonrpc": "2.0", "id": request_id, "method": "tools/call",
            "params": {"name": TOOL_NAME, "arguments": TOOL_ARGS},
        })
        elapsed = time.monotonic() - start
        if "error" in result:
            return label, "ERROR", elapsed, f"code={result['error'].get('code')}"
        if "result" in result:
            is_err = result["result"].get("isError", False)
            return label, "MCP_ERROR" if is_err else "OK", elapsed, ""
        return label, "UNEXPECTED", elapsed, str(result)[:80]
    except (asyncio.TimeoutError, httpx.ReadTimeout):
        return label, "TIMEOUT", time.monotonic() - start, ""
    except Exception as e:
        return label, type(e).__name__, time.monotonic() - start, str(e)[:60]


async def main():
    print(f"Target: {MCP_URL}  Tool: {TOOL_NAME}  Timeout: {CALL_TIMEOUT}s")
    print()
    base_headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }
    # verify=False skips TLS certificate checks (e.g. self-signed test proxies).
    async with httpx.AsyncClient(verify=False, timeout=CALL_TIMEOUT) as client:
        print("[1] Sequential (3 calls, same session)")
        h = {**base_headers}
        await init_session(client, h)
        for i in range(3):
            label, status, elapsed, detail = await call_tool(client, h, i + 10, f"seq-{i+1}")
            mark = "✓" if status == "OK" else "✗"
            print(f"    {mark} {label}: {status} ({elapsed:.1f}s) {detail}")
    async with httpx.AsyncClient(verify=False, timeout=CALL_TIMEOUT) as client:
        print("\n[2] Concurrent (3 calls, same session)")
        h = {**base_headers}
        await init_session(client, h)
        # All three tasks share `h`, i.e. the same mcp-session-id.
        tasks = [
            asyncio.create_task(call_tool(client, h, i + 20, f"par-{i+1}"))
            for i in range(3)
        ]
        for label, status, elapsed, detail in await asyncio.gather(*tasks):
            mark = "✓" if status == "OK" else "✗"
            print(f"    {mark} {label}: {status} ({elapsed:.1f}s) {detail}")


if __name__ == "__main__":
    asyncio.run(main())
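The script's parse_sse returns the first "data: " line of the stream. In the MCP Streamable HTTP transport, the server may send requests or notifications on a POST's SSE stream before the actual JSON-RPC response. When that happens, the first event is not the tool result, and the script reports UNEXPECTED instead of OK. The sketch below is a stricter parser. It applies the SSE framing rules: events are separated by a blank line, data may span several "data:" lines, and the space after the colon is optional. It then returns the message whose id matches the request. The function name parse_sse_response is an illustrative assumption, not part of LiteLLM.

```python
import json


def parse_sse_response(text, request_id=None):
    """Return the JSON-RPC message from an SSE body (sketch, not LiteLLM code).

    Events are separated by blank lines; an event's "data:" lines are joined
    with "\n"; the space after "data:" is optional. If request_id is given,
    notifications and other messages are skipped until the matching response.
    """
    messages = []
    data_lines = []
    for line in text.splitlines() + [""]:  # trailing "" flushes the last event
        if line == "":
            if data_lines:
                messages.append(json.loads("\n".join(data_lines)))
                data_lines = []
        elif line.startswith("data:"):
            value = line[5:]
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) and ":" comment lines are ignored.
    if not messages:
        # No SSE framing found: treat the body as plain JSON.
        return json.loads(text)
    for msg in messages:
        if request_id is None or msg.get("id") == request_id:
            return msg
    raise ValueError(f"no response with id={request_id!r} in stream")
```

To use it in repro.py, mcp_post could call parse_sse_response(resp.text, payload.get("id")) in place of parse_sse(resp.text). Notifications carry no id, so they fall back to the first message, which keeps the old behavior for them.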

Relevant log output

[1] Sequential (3 calls, same session)
    ✓ seq-1: OK (0.3s)
    ✓ seq-2: OK (0.2s)
    ✓ seq-3: OK (0.3s)
[2] Concurrent (3 calls, same session)
    ✗ par-1: TIMEOUT (17.0s)
    ✓ par-2: OK (0.3s)
    ✗ par-3: TIMEOUT (17.0s)
Server logs after 60s:
client.py:450 - MCP client tool call was cancelled
client.py:450 - MCP client tool call was cancelled

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

helm/docker main-latest (reports 1.82.6)

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

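The hang occurs only when several tools/call requests are in flight on the same StreamableHTTP session; sequential calls on that session succeed. The steps below are client-side workarounds that avoid that pattern. They do not fix the proxy itself, so clients that share one session across parallel calls (such as LibreChat) stay affected until the proxy handles concurrent requests within a session.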

Step 1: Create a New Client Session for Each Request

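Instead of sending concurrent requests over one shared MCP session, open a separate session (with its own mcp-session-id) for each call. In the reproducer, this means running init_session inside call_tool. The copied headers must not carry over the shared session id; otherwise initialize is sent to the old session.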

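async def call_tool(client, headers, request_id, label):
    # Workaround: each call gets its own HTTP client and MCP session, so no two
    # in-flight calls share an mcp-session-id (`client` is kept but unused).
    async with httpx.AsyncClient(verify=False, timeout=CALL_TIMEOUT) as new_client:
        # Drop the shared session id so initialize starts a fresh session.
        new_headers = {k: v for k, v in headers.items() if k != "mcp-session-id"}
        start = time.monotonic()  # elapsed now includes the session handshake
        try:
            await init_session(new_client, new_headers)
            result, _ = await mcp_post(new_client, new_headers, {
                "jsonrpc": "2.0", "id": request_id, "method": "tools/call",
                "params": {"name": TOOL_NAME, "arguments": TOOL_ARGS},
            })
            elapsed = time.monotonic() - start
            if "error" in result:
                return label, "ERROR", elapsed, f"code={result['error'].get('code')}"
            if "result" in result:
                is_err = result["result"].get("isError", False)
                return label, "MCP_ERROR" if is_err else "OK", elapsed, ""
            return label, "UNEXPECTED", elapsed, str(result)[:80]
        except (asyncio.TimeoutError, httpx.ReadTimeout):
            return label, "TIMEOUT", time.monotonic() - start, ""
        except Exception as e:
            return label, type(e).__name__, time.monotonic() - start, str(e)[:60]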
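Opening a fresh session per call costs an extra initialize round trip every time. A hypothetical alternative is a small pool. It keeps up to N sessions open and lends each one to a single caller at a time, so no two in-flight calls share an mcp-session-id. This assumes, as the issue suggests, that the hang is tied to sharing one session rather than to concurrency across sessions. SessionPool and its methods are illustrative names, not an existing API.

```python
import asyncio
import contextlib


class SessionPool:
    """Hypothetical helper: lend each MCP session to one caller at a time.

    `factory` is an async function that opens a new session and returns
    whatever represents it (for repro.py, a headers dict carrying its own
    mcp-session-id). At most `size` sessions are ever opened. Create the pool
    inside a running event loop (asyncio.Queue binds to a loop on Python < 3.10).
    """

    def __init__(self, factory, size):
        self._factory = factory
        self._size = size
        self._idle = asyncio.Queue()
        self._created = 0

    async def acquire(self):
        # Prefer an idle session; open a new one while under the cap;
        # otherwise wait until another caller releases one.
        if self._idle.empty() and self._created < self._size:
            self._created += 1
            try:
                return await self._factory()
            except BaseException:
                self._created -= 1
                raise
        return await self._idle.get()

    def release(self, session):
        self._idle.put_nowait(session)

    @contextlib.asynccontextmanager
    async def session(self):
        s = await self.acquire()
        try:
            yield s
        finally:
            self.release(s)
```

In repro.py, the factory could copy base_headers, run init_session(client, copy) on it, and return the copy. Each task would then wrap its call in "async with pool.session() as h:" and pass h to the original call_tool. The sketch releases a session even after a timed-out call. Discarding such a session instead may be safer, since it could still have a request pending on the proxy.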

Step 2: Use a Semaphore to Limit Concurrent Requests

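To cap how many requests are in flight at once, wrap call_tool in a semaphore. Combined with Step 1, this bounds the number of MCP sessions open at the same time. If calls keep sharing one session instead, only a limit of 1 avoids the hang, because the issue shows that three concurrent calls are already enough to trigger it.

# Cap concurrent tool calls (5 is an example; adjust to server capacity).
# Create the semaphore inside main() on Python < 3.10, where it binds to a loop.
sem = asyncio.Semaphore(5)

async def limited_call_tool(client, headers, request_id, label):
    # Wait for a free slot, then run call_tool unchanged; use this in main()'s tasks.
    async with sem:
        return await call_tool(client, headers, request_id, label)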
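The self-contained check below measures what a given semaphore limit actually allows. It uses a fake tool call instead of the proxy and reports the peak number of calls in flight. peak_concurrency is an illustrative helper, not part of the reproducer.

```python
import asyncio


async def peak_concurrency(limit, calls):
    """Run `calls` fake tool calls behind Semaphore(limit); return the peak in-flight count."""
    sem = asyncio.Semaphore(limit)  # created inside the running event loop
    in_flight = 0
    peak = 0

    async def fake_tool_call():
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for a slow tool call
            in_flight -= 1

    await asyncio.gather(*(fake_tool_call() for _ in range(calls)))
    return peak


if __name__ == "__main__":
    print(asyncio.run(peak_concurrency(5, 20)))
    print(asyncio.run(peak_concurrency(1, 3)))
```

With a limit of 5, up to five calls overlap. That is harmless when each call has its own session but still triggers the hang on a shared one. With a limit of 1, calls never overlap.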

Verification

Run the modified repro.py and check section [2]. If the workaround is effective, all three par-N calls should report OK, with elapsed times close to the sequential calls (0.2 to 0.3 s in the run above) rather than the 17 s timeout.
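To make this check scriptable (for example in CI), a small helper can turn the result tuples into an exit code. check_concurrent_results is a hypothetical addition. It assumes the (label, status, elapsed, detail) tuples that call_tool returns in repro.py.

```python
import sys


def check_concurrent_results(results):
    """Return an exit code from call_tool result tuples (hypothetical helper).

    Each item is (label, status, elapsed, detail) as produced by call_tool
    in repro.py. Returns 0 if every call reported OK, otherwise 1.
    """
    failures = [(label, status) for label, status, _elapsed, _detail in results
                if status != "OK"]
    for label, status in failures:
        print(f"FAIL {label}: {status}", file=sys.stderr)
    return 1 if failures else 0
```

To wire it in, main() would keep the list from asyncio.gather(*tasks), return check_concurrent_results(that_list), and the entry point would become sys.exit(asyncio.run(main())).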

Extra Tips

Make sure to adjust the semaphore limit according to your server's capacity to handle concurrent requests.
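Consider retrying failed requests, but only for idempotent tools: a call that timed out on the client may still be pending on the proxy until MCP_CLIENT_TIMEOUT expires, so a retry may run the tool twice.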
Monitor server logs to ensure that the fix does not introduce any new issues.
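A minimal retry sketch with exponential backoff and jitter follows. It assumes a zero-argument callable that returns a fresh call_tool-style coroutine. call_with_retry and its parameters are illustrative, not an existing API. Each retry should use a new JSON-RPC request id, since the earlier attempt may still be pending under the old one. On a shared session, each retry can also hang for the full CALL_TIMEOUT again.

```python
import asyncio
import random


async def call_with_retry(make_call, attempts=3, base_delay=0.5, retry_on=("TIMEOUT",)):
    """Retry a call_tool-style coroutine with exponential backoff (sketch).

    make_call: zero-argument callable returning a fresh coroutine that
    resolves to a (label, status, elapsed, detail) tuple. Only statuses in
    `retry_on` are retried; the last result is returned either way.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        result = await make_call()
        if result[1] not in retry_on or attempt == attempts - 1:
            return result
        # Wait base_delay * 2**attempt seconds, plus up to 10% random jitter.
        delay = base_delay * 2 ** attempt
        await asyncio.sleep(delay + random.uniform(0, delay * 0.1))
```

A caller might pass something like lambda: call_tool(client, h, next_id(), "par-1"), where next_id is a hypothetical counter that hands out unused request ids.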
