litellm: ✅ (Solved) [Bug] `timeout` silently ignored for Bedrock and Vertex AI streaming requests (1 pull request, 1 participant)

GitHub stats for BerriAI/litellm#23375 (fetched 2026-04-08 00:37:09)
Comments: 0 · Participants: 1 · Timeline events: 9 · Reactions: 1
Timeline (top): labeled ×3, referenced ×3, cross-referenced ×2, subscribed ×1

Error Message

The 408 responses below come from the cases where the timeout is enforced (Azure streaming, and Bedrock and Vertex AI non-streaming). The affected streaming requests to Bedrock and Vertex AI return no error at all; they stream to completion.

{"error":{"message":"litellm.Timeout: APITimeoutError - Request timed out. Error_str: Request timed out. - timeout value=0.1, time taken=0.64 seconds\n\nDeployment Info: request_timeout: None\ntimeout: 0.1. Received Model Group=azure-gpt\nAvailable Model Group Fallbacks=None","type":null,"param":null,"code":"408"}}
{"error":{"message":"litellm.Timeout: Connection timed out. Timeout passed=Timeout(timeout=0.1), time taken=0.101 seconds\n\nDeployment Info: request_timeout: None\ntimeout: 0.1. Received Model Group=bedrock-claude\nAvailable Model Group Fallbacks=None","type":null,"param":null,"code":"408"}}
{"error":{"message":"litellm.Timeout: Connection timed out. Timeout passed=0.1, time taken=0.191 seconds\n\nDeployment Info: request_timeout: None\ntimeout: 0.1. Received Model Group=vertex-gemini\nAvailable Model Group Fallbacks=None","type":null,"param":null,"code":"408"}}

Root Cause

The async_streaming() methods in both Bedrock and Vertex delegate to a standalone make_call() function that does not accept a timeout parameter. The timeout value is received by async_streaming() but never forwarded. When make_call() creates a fallback httpx client via get_async_httpx_client() with no timeout param, it gets a cached client with timeout=600s.
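To make the caching behavior concrete, here is a minimal pure-Python simulation. `get_cached_client`, `FakeAsyncClient`, and the cache-key scheme are hypothetical stand-ins, not LiteLLM's `get_async_httpx_client`; the only behavior carried over from the root cause is that asking for a client without a timeout returns a shared client built with the 600s default.

```python
from dataclasses import dataclass
from typing import Dict, Optional

DEFAULT_TIMEOUT = 600.0  # the 600s default named in the root cause


@dataclass
class FakeAsyncClient:
    """Stand-in for an httpx-backed client; it only remembers its timeout."""
    timeout: float


_client_cache: Dict[str, FakeAsyncClient] = {}


def get_cached_client(provider: str, params: Optional[dict] = None) -> FakeAsyncClient:
    """Hypothetical cache keyed on provider plus params; no params means the default client."""
    params = params or {}
    key = provider + "".join(f"_{k}_{v}" for k, v in sorted(params.items()))
    if key not in _client_cache:
        _client_cache[key] = FakeAsyncClient(timeout=params.get("timeout", DEFAULT_TIMEOUT))
    return _client_cache[key]


# Before the fix: the streaming path asks for a client without a timeout.
buggy = get_cached_client("bedrock")
# After the fix: the configured timeout is part of the request for a client.
fixed = get_cached_client("bedrock", params={"timeout": 0.1})
print(buggy.timeout, fixed.timeout)  # 600.0 0.1
```

Because the cache key includes the params, a caller that never mentions the timeout keeps getting the same long-timeout client, with no error to signal that the configured value was lost.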

Bedrock Converse (converse_handler.py):

# async_streaming() receives timeout in its signature:
async def async_streaming(self, ..., timeout: Optional[Union[float, httpx.Timeout]], ...):
    ...
    # But never passes it to make_call:
    completion_stream = await make_call(
        client=client,
        api_base=api_base,
        headers=dict(prepped.headers),
        data=data,
        model=model,
        messages=messages,
        logging_obj=logging_obj,
        fake_stream=fake_stream,
        json_mode=json_mode,
        stream_chunk_size=stream_chunk_size,
        # <-- timeout is missing here
    )

Bedrock Invoke (invoke_handler.py):

# make_call() doesn't accept timeout at all:
async def make_call(
    client: Optional[AsyncHTTPHandler],
    api_base: str,
    headers: dict,
    data: str,
    model: str,
    messages: list,
    logging_obj: Logging,
    fake_stream: bool = False,
    json_mode: Optional[bool] = False,
    bedrock_invoke_provider: Optional[...] = None,
    stream_chunk_size: int = 1024,
    # <-- no timeout parameter
):
    if client is None:
        client = get_async_httpx_client(llm_provider=litellm.LlmProviders.BEDROCK)
        # ↑ no timeout param → cached client with 600s default

    response = await client.post(api_base, headers=headers, data=data,
                                  stream=not fake_stream, logging_obj=logging_obj)
    # ↑ no timeout= kwarg
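For contrast, the following self-contained sketch threads the timeout through both places the excerpt above misses: the fallback client and the per-request `post()` call. `RecordingClient`, `make_fallback_client`, and this simplified `make_call` are hypothetical; the rule that a `None` per-request timeout means "use the client default" is an assumption of the sketch, not a statement about httpx or LiteLLM.

```python
import asyncio
from typing import Any, Dict, List, Optional

DEFAULT_TIMEOUT = 600.0  # fallback used when no timeout is supplied


class RecordingClient:
    """Fake async HTTP client: records post() kwargs instead of doing I/O."""

    def __init__(self, timeout: float = DEFAULT_TIMEOUT) -> None:
        self.timeout = timeout
        self.calls: List[Dict[str, Any]] = []

    async def post(self, url: str, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(kwargs)
        per_request = kwargs.get("timeout")
        # Assumed semantics: a per-request timeout overrides the client default.
        effective = per_request if per_request is not None else self.timeout
        return {"url": url, "effective_timeout": effective}


def make_fallback_client(timeout: Optional[float]) -> RecordingClient:
    """Hypothetical factory standing in for the cached-client lookup."""
    return RecordingClient(timeout if timeout is not None else DEFAULT_TIMEOUT)


async def make_call(
    client: Optional[RecordingClient],
    api_base: str,
    data: str,
    timeout: Optional[float] = None,  # the parameter the streaming path lacked
) -> Dict[str, Any]:
    if client is None:
        # 1) The fallback client is built with the configured timeout.
        client = make_fallback_client(timeout)
    # 2) The timeout is also sent per request, so a caller-supplied client honors it too.
    return await client.post(api_base, data=data, stream=True, timeout=timeout)


result = asyncio.run(make_call(None, "https://example.invalid/stream", "{}", timeout=0.1))
print(result["effective_timeout"])  # 0.1
```

Passing the timeout in both places matters because `make_call()` may receive an existing client (for example one created elsewhere with a different default), in which case only the per-request argument can carry the configured value.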

Vertex AI Gemini (vertex_and_google_ai_studio_gemini.py):

# Same pattern — make_call() has no timeout param:
async def make_call(
    client: Optional[AsyncHTTPHandler],
    api_base: str,
    headers: dict,
    data: str,
    model: str,
    messages: list,
    logging_obj,
    # <-- no timeout parameter
):
    if client is None:
        client = get_async_httpx_client(llm_provider=litellm.LlmProviders.VERTEX_AI)

    response = await client.post(api_base, headers=headers, data=data,
                                  stream=True, logging_obj=logging_obj)
    # ↑ no timeout= kwarg

And async_streaming() calls it via functools.partial without passing timeout:

async def async_streaming(self, ..., timeout, ...):
    ...
    streaming_response = CustomStreamWrapper(
        completion_stream=None,
        make_call=partial(
            make_call,
            client=client,
            api_base=api_base,
            headers=headers,
            data=request_body_str,
            model=model,
            messages=messages,
            logging_obj=logging_obj,
            # <-- timeout is missing here
        ),
        ...
    )
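Since the wrapper is constructed with `completion_stream=None` and a `functools.partial`, the request appears to be made lazily, when the partial is finally called. Any argument not bound into the partial is then simply absent: the callee's default (`None`) is used and nothing fails loudly. The sketch below isolates that effect; `LazyStream` and this `make_call` are hypothetical stand-ins, not LiteLLM's `CustomStreamWrapper` or handler.

```python
import asyncio
from functools import partial
from typing import Awaitable, Callable, Optional, Tuple


async def make_call(api_base: str, timeout: Optional[float] = None) -> str:
    """Hypothetical stand-in that reports which timeout it received."""
    return f"{api_base} timeout={timeout}"


class LazyStream:
    """Minimal stand-in for a wrapper that invokes make_call() only on first use."""

    def __init__(self, make_call: Callable[[], Awaitable[str]]) -> None:
        self._make_call = make_call

    async def start(self) -> str:
        return await self._make_call()


async def demo(timeout: float) -> Tuple[str, str]:
    # Bug shape: timeout is in scope but never bound into the partial.
    without = LazyStream(partial(make_call, api_base="vertex"))
    # Fix shape: bind it explicitly so the deferred call sees it.
    with_timeout = LazyStream(partial(make_call, api_base="vertex", timeout=timeout))
    return await without.start(), await with_timeout.start()


print(asyncio.run(demo(0.1)))  # ('vertex timeout=None', 'vertex timeout=0.1')
```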

Compare with OpenAI/Azure (openai.py), which passes timeout both at client creation and per request:

openai_aclient = self._get_openai_client(is_async=True, timeout=timeout, ...)
headers, response = await self.make_openai_chat_completion_request(
    openai_aclient=openai_aclient, data=data, timeout=timeout, ...)  # per-request override

Fix Action

Fixed

PR fix notes

PR #23424: forward timeout to make_call() for Bedrock and Vertex AI streaming

Description (problem / solution / changelog)

Relevant issues

Fixes #23375

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement; see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review


Type

🐛 Bug Fix

Changes

`timeout` was silently ignored for streaming requests to Bedrock Converse, Bedrock Invoke, and Vertex AI Gemini. `async_streaming()` received the timeout but never passed it to `make_call()`, which had no `timeout` parameter and, when no client was supplied, fell back to a cached httpx client with the 600s default.

Changed files

  • litellm/llms/bedrock/chat/converse_handler.py (modified, +10/-1)
  • litellm/llms/bedrock/chat/invoke_handler.py (modified, +9/-5)
  • litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py (modified, +8/-1)
  • tests/test_litellm/llms/bedrock/chat/test_invoke_handler.py (modified, +156/-0)
  • tests/test_litellm/llms/vertex_ai/gemini/test_vertex_and_google_ai_studio_gemini.py (modified, +146/-0)
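The contents of the new tests are not reproduced on this page, so the following is only a hedged sketch of the kind of assertion such a test can make. It uses `unittest.mock.AsyncMock` as the HTTP client and checks that the timeout handed to a simplified, hypothetical `make_call()` arrives at `client.post()`; it does not import LiteLLM.

```python
import asyncio
import unittest
from typing import Any, Optional
from unittest.mock import AsyncMock


async def make_call(client: Any, api_base: str, data: str, timeout: Optional[float] = None) -> Any:
    """Hypothetical fixed make_call(): forwards timeout to client.post()."""
    return await client.post(api_base, data=data, stream=True, timeout=timeout)


class TestStreamingTimeoutForwarded(unittest.TestCase):
    def test_timeout_reaches_post(self) -> None:
        client = AsyncMock()  # child attributes such as .post are AsyncMocks too
        asyncio.run(make_call(client, "https://example.invalid", "{}", timeout=0.1))
        client.post.assert_awaited_once()
        self.assertEqual(client.post.await_args.kwargs["timeout"], 0.1)
```

Run it with `python -m unittest <module>`. Asserting on the mock's recorded kwargs checks the wiring directly, without depending on network timing.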

Original issue report

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

The timeout setting (configured via litellm_params per-model or router_settings) is silently dropped for streaming requests to Bedrock (Converse and Invoke paths) and Vertex AI (Gemini). The configured timeout never reaches the httpx client, so these providers always fall back to the default 600s client timeout regardless of what you configure.

Non-streaming requests are not affected — those paths correctly pass timeout to the httpx client. OpenAI/Azure providers are not affected — timeout is passed correctly for both streaming and non-streaming.


Affected providers

| Provider | Non-streaming | Streaming |
|---|---|---|
| OpenAI / Azure | Timeout works | Timeout works |
| Bedrock Converse | Timeout works | Timeout silently dropped (600s default) |
| Bedrock Invoke | Timeout works | Timeout silently dropped (600s default) |
| Vertex AI Gemini | Timeout works | Timeout silently dropped (600s default) |

Steps to Reproduce

Config — set timeout: 0.1 (100ms) so the timeout fires before the first token arrives:

model_list:
  - model_name: bedrock-claude
    litellm_params:
      model: bedrock/us.anthropic.claude-sonnet-4-6
      timeout: 0.1
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1

  - model_name: vertex-gemini
    litellm_params:
      model: vertex_ai/gemini-2.5-flash
      timeout: 0.1
      vertex_project: my-project
      vertex_location: us-central1

  - model_name: azure-gpt
    litellm_params:
      model: azure/gpt-4.1
      api_base: https://my-resource.openai.azure.com
      api_key: os.environ/AZURE_API_KEY
      api_version: "2025-03-01-preview"
      timeout: 0.1

router_settings:
  timeout: 0.1

litellm_settings:
  num_retries: 0

Test 1 — Azure streaming (timeout correctly enforced):

curl -s http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"azure-gpt","messages":[{"role":"user","content":"Write a 2000 word essay on distributed systems."}],"stream":true}'

Result — HTTP 408 in ~0.8s (correct):

{"error":{"message":"litellm.Timeout: APITimeoutError - Request timed out. Error_str: Request timed out. - timeout value=0.1, time taken=0.64 seconds\n\nDeployment Info: request_timeout: None\ntimeout: 0.1. Received Model Group=azure-gpt\nAvailable Model Group Fallbacks=None","type":null,"param":null,"code":"408"}}

Test 2 — Bedrock streaming (timeout silently ignored):

curl -s http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"bedrock-claude","messages":[{"role":"user","content":"Write a 2000 word essay on distributed systems."}],"stream":true}'

Result — HTTP 200, streams for 60+ seconds to finish_reason: stop. The 100ms timeout is completely ignored.

Test 3 — Vertex streaming (timeout silently ignored):

curl -s http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"vertex-gemini","messages":[{"role":"user","content":"Write a 2000 word essay on distributed systems."}],"stream":true}'

Result — HTTP 200, streams for 35+ seconds to finish_reason: stop. The 100ms timeout is completely ignored.

Control — Bedrock and Vertex non-streaming (timeout works correctly):

The same timeout: 0.1 config correctly times out for non-streaming requests, proving the timeout plumbing works — only the streaming path is broken.

# Bedrock non-streaming — correctly times out in 0.17s
curl -s http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"bedrock-claude","messages":[{"role":"user","content":"Write a 2000 word essay on distributed systems."}],"stream":false}'
{"error":{"message":"litellm.Timeout: Connection timed out. Timeout passed=Timeout(timeout=0.1), time taken=0.101 seconds\n\nDeployment Info: request_timeout: None\ntimeout: 0.1. Received Model Group=bedrock-claude\nAvailable Model Group Fallbacks=None","type":null,"param":null,"code":"408"}}
# Vertex non-streaming — correctly times out in 0.45s
curl -s http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"vertex-gemini","messages":[{"role":"user","content":"Write a 2000 word essay on distributed systems."}],"stream":false}'
{"error":{"message":"litellm.Timeout: Connection timed out. Timeout passed=0.1, time taken=0.191 seconds\n\nDeployment Info: request_timeout: None\ntimeout: 0.1. Received Model Group=vertex-gemini\nAvailable Model Group Fallbacks=None","type":null,"param":null,"code":"408"}}
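When repeating these checks, it can help to extract the configured timeout and the reported duration from a 408 body instead of reading it by eye. This is a minimal sketch, assuming the message format shown above stays stable (it is an observed format, not a documented contract); `timeout_and_elapsed` is a hypothetical helper. The broken streaming calls return HTTP 200 with no such body, which is itself the symptom.

```python
import json
import re
from typing import Tuple

# The Bedrock non-streaming 408 body reported above.
BODY = '{"error":{"message":"litellm.Timeout: Connection timed out. Timeout passed=Timeout(timeout=0.1), time taken=0.101 seconds\\n\\nDeployment Info: request_timeout: None\\ntimeout: 0.1. Received Model Group=bedrock-claude\\nAvailable Model Group Fallbacks=None","type":null,"param":null,"code":"408"}}'

NUMBER = r"(\d+(?:\.\d+)?)"  # avoids swallowing the trailing period in "timeout: 0.1."


def timeout_and_elapsed(body: str) -> Tuple[float, float]:
    """Return (configured timeout, reported time taken) from a LiteLLM 408 body."""
    message = json.loads(body)["error"]["message"]
    configured = re.search(r"\ntimeout: " + NUMBER, message)
    elapsed = re.search(r"time taken=" + NUMBER + " seconds", message)
    if configured is None or elapsed is None:
        raise ValueError("not a LiteLLM timeout message")
    return float(configured.group(1)), float(elapsed.group(1))


print(timeout_and_elapsed(BODY))  # (0.1, 0.101)
```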

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.81.14-stable


Extent analysis

Fix Plan

To fix the issue, the timeout has to travel from async_streaming() through make_call() to the httpx client. The steps:

  • Add a timeout parameter to the make_call() functions used by the Bedrock and Vertex AI streaming paths.
  • Forward timeout from each async_streaming() to make_call(), including inside the functools.partial used by the Vertex path.
  • Inside make_call(), use timeout both when creating the fallback httpx client and as a per-request timeout= argument to client.post(), mirroring what openai.py does.

Example code changes:

# converse_handler.py: the Converse streaming path calls make_call() from async_streaming(),
# so the change is at that call site: forward the timeout it already receives.
completion_stream = await make_call(
    client=client,
    api_base=api_base,
    headers=dict(prepped.headers),
    data=data,
    model=model,
    messages=messages,
    logging_obj=logging_obj,
    fake_stream=fake_stream,
    json_mode=json_mode,
    stream_chunk_size=stream_chunk_size,
    timeout=timeout,  # previously missing
)

# invoke_handler.py
async def make_call(
    client: Optional[AsyncHTTPHandler],
    api_base: str,
    headers: dict,
    data: str,
    model: str,
    messages: list,
    logging_obj: Logging,
    fake_stream: bool = False,
    json_mode: Optional[bool] = False,
    bedrock_invoke_provider: Optional[...] = None,
    stream_chunk_size: int = 1024,
    timeout: Optional[Union[float, httpx.Timeout]] = None,  # Add timeout parameter
):
    if client is None:
        client = get_async_httpx_client(llm_provider=litellm.LlmProviders.BEDROCK, params={"timeout": timeout})  # timeout goes in the params dict used to build (and cache-key) the client

    response = await client.post(api_base, headers=headers, data=data, stream=not fake_stream, timeout=timeout, logging_obj=logging_obj)

# vertex_and_google_ai_studio_gemini.py
async def make_call(
    client: Optional[AsyncHTTPHandler],
    api_base: str,
    headers: dict,
    data: str,
    model: str,
    messages: list,
    logging_obj,
    timeout: Optional[Union[float, httpx.Timeout]] = None,  # Add timeout parameter
):
    if client is None:
        client = get_async_httpx_client(llm_provider=litellm.LlmProviders
