litellm - ✅(Solved) Fix [Bug]: `timeout` silently ignored for Bedrock and Vertex AI streaming requests [1 pull requests, 1 participants]

litellm 2026-03-11 20:01:20

BerriAI/litellm#23375 • Fetched 2026-04-08 00:37:09

Author

eddiechapman

Participants

eddiechapman

Timeline (top)

labeled ×3, referenced ×3, cross-referenced ×2, subscribed ×1

Error Message

{"error":{"message":"litellm.Timeout: APITimeoutError - Request timed out. Error_str: Request timed out. - timeout value=0.1, time taken=0.64 seconds\n\nDeployment Info: request_timeout: None\ntimeout: 0.1. Received Model Group=azure-gpt\nAvailable Model Group Fallbacks=None","type":null,"param":null,"code":"408"}}

{"error":{"message":"litellm.Timeout: Connection timed out. Timeout passed=Timeout(timeout=0.1), time taken=0.101 seconds\n\nDeployment Info: request_timeout: None\ntimeout: 0.1. Received Model Group=bedrock-claude\nAvailable Model Group Fallbacks=None","type":null,"param":null,"code":"408"}}

{"error":{"message":"litellm.Timeout: Connection timed out. Timeout passed=0.1, time taken=0.191 seconds\n\nDeployment Info: request_timeout: None\ntimeout: 0.1. Received Model Group=vertex-gemini\nAvailable Model Group Fallbacks=None","type":null,"param":null,"code":"408"}}

Root Cause

The async_streaming() methods in both Bedrock and Vertex delegate to a standalone make_call() function that does not accept a timeout parameter. The timeout value is received by async_streaming() but never forwarded. When make_call() creates a fallback httpx client via get_async_httpx_client() with no timeout param, it gets a cached client with timeout=600s.

Bedrock Converse — converse_handler.py:

# async_streaming() receives timeout in its signature:
async def async_streaming(self, ..., timeout: Optional[Union[float, httpx.Timeout]], ...):
    ...
    # But never passes it to make_call:
    completion_stream = await make_call(
        client=client,
        api_base=api_base,
        headers=dict(prepped.headers),
        data=data,
        model=model,
        messages=messages,
        logging_obj=logging_obj,
        fake_stream=fake_stream,
        json_mode=json_mode,
        stream_chunk_size=stream_chunk_size,
        # <-- timeout is missing here
    )

Bedrock Invoke — invoke_handler.py:

# make_call() doesn't accept timeout at all:
async def make_call(
    client: Optional[AsyncHTTPHandler],
    api_base: str,
    headers: dict,
    data: str,
    model: str,
    messages: list,
    logging_obj: Logging,
    fake_stream: bool = False,
    json_mode: Optional[bool] = False,
    bedrock_invoke_provider: Optional[...] = None,
    stream_chunk_size: int = 1024,
    # <-- no timeout parameter
):
    if client is None:
        client = get_async_httpx_client(llm_provider=litellm.LlmProviders.BEDROCK)
        # ↑ no timeout param → cached client with 600s default

    response = await client.post(api_base, headers=headers, data=data,
                                  stream=not fake_stream, logging_obj=logging_obj)
    # ↑ no timeout= kwarg

Vertex AI Gemini — vertex_and_google_ai_studio_gemini.py:

# Same pattern — make_call() has no timeout param:
async def make_call(
    client: Optional[AsyncHTTPHandler],
    api_base: str,
    headers: dict,
    data: str,
    model: str,
    messages: list,
    logging_obj,
    # <-- no timeout parameter
):
    if client is None:
        client = get_async_httpx_client(llm_provider=litellm.LlmProviders.VERTEX_AI)

    response = await client.post(api_base, headers=headers, data=data,
                                  stream=True, logging_obj=logging_obj)
    # ↑ no timeout= kwarg

And async_streaming() calls it via functools.partial without passing timeout:

async def async_streaming(self, ..., timeout, ...):
    ...
    streaming_response = CustomStreamWrapper(
        completion_stream=None,
        make_call=partial(
            make_call,
            client=client,
            api_base=api_base,
            headers=headers,
            data=request_body_str,
            model=model,
            messages=messages,
            logging_obj=logging_obj,
            # <-- timeout is missing here
        ),
        ...
    )

Compare with OpenAI/Azure (openai.py), which passes timeout both at client creation and per request:

openai_aclient = self._get_openai_client(is_async=True, timeout=timeout, ...)
headers, response = await self.make_openai_chat_completion_request(
    openai_aclient=openai_aclient, data=data, timeout=timeout, ...)  # per-request override
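
The dropped-argument pattern can be reproduced without LiteLLM. The following is a minimal sketch, not LiteLLM code: FakeClient, the two-argument make_call, and the forward flag are hypothetical stand-ins, and the 600-second default is taken from the description above. It only illustrates how a functools.partial that omits timeout leaves the callee on its default.

```python
import asyncio
from functools import partial
from typing import Optional

DEFAULT_TIMEOUT = 600.0  # default named in the issue; everything else is a stand-in


class FakeClient:
    """Hypothetical stand-in for an HTTP client; it only records its timeout."""

    def __init__(self, timeout: float = DEFAULT_TIMEOUT) -> None:
        self.timeout = timeout


async def make_call(api_base: str, timeout: Optional[float] = None) -> float:
    """Simplified make_call: returns the timeout that would govern the request."""
    client = FakeClient() if timeout is None else FakeClient(timeout=timeout)
    return client.timeout


async def async_streaming(api_base: str, timeout: float, forward: bool) -> float:
    """Receives timeout; forwards it to make_call only when forward is True."""
    if forward:
        call = partial(make_call, api_base=api_base, timeout=timeout)
    else:
        call = partial(make_call, api_base=api_base)  # timeout silently dropped
    return await call()


dropped = asyncio.run(async_streaming("https://example.invalid", 0.1, forward=False))
forwarded = asyncio.run(async_streaming("https://example.invalid", 0.1, forward=True))
print(dropped, forwarded)  # 600.0 0.1
```

No exception or warning is raised in the dropped case, which is why the misconfiguration goes unnoticed until a stream runs far past the configured limit.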

Fix Action

Fixed

Fixed by PR: forward timeout to make_call() for Bedrock and Vertex AI streaming (https://github.com/BerriAI/litellm/pull/23424)

PR fix notes

PR #23424: forward timeout to make_call() for Bedrock and Vertex AI streaming

Repository: BerriAI/litellm
Author: pradyyadav
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/23424

Description (problem / solution / changelog)

Relevant issues

Fixes #23375

Type

🐛 Bug Fix

Changes

`timeout` was silently ignored for streaming requests to Bedrock Converse, Bedrock Invoke, and Vertex AI Gemini. `async_streaming()` received the timeout but never passed it to `make_call()`, which had no timeout param and always created an httpx client with the 600s default.
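
The shape of such a change can be sketched as follows. This is an illustration under stated assumptions, not the code from PR #23424: StubAsyncClient and the reduced make_call signature are hypothetical, and the rule that a per-request timeout overrides the client-level one is modeled by hand here.

```python
import asyncio
from typing import Optional


class StubAsyncClient:
    """Hypothetical stand-in for an async HTTP handler with a client-level timeout."""

    def __init__(self, timeout: float = 600.0) -> None:
        self.timeout = timeout
        self.last_effective_timeout: Optional[float] = None

    async def post(self, url: str, timeout: Optional[float] = None) -> str:
        # A per-request timeout, when given, overrides the client-level one.
        self.last_effective_timeout = self.timeout if timeout is None else timeout
        return "stream"


async def make_call(
    client: Optional[StubAsyncClient],
    api_base: str,
    timeout: Optional[float] = None,  # the new parameter
) -> StubAsyncClient:
    if client is None:
        # The fallback client is built with the caller's timeout when one was given.
        client = StubAsyncClient() if timeout is None else StubAsyncClient(timeout=timeout)
    # Forward per request as well, so a reused client still honors it.
    await client.post(api_base, timeout=timeout)
    return client


shared = StubAsyncClient()  # e.g. a cached client created with the 600s default
used = asyncio.run(make_call(shared, "https://example.invalid", timeout=0.1))
fresh = asyncio.run(make_call(None, "https://example.invalid", timeout=0.1))
print(used.last_effective_timeout, fresh.timeout)  # 0.1 0.1
```

Forwarding the value in both places matters when the client is shared or cached: the client-level timeout stays at its default, so only the per-request argument can carry the caller's setting.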

Changed files

litellm/llms/bedrock/chat/converse_handler.py (modified, +10/-1)
litellm/llms/bedrock/chat/invoke_handler.py (modified, +9/-5)
litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py (modified, +8/-1)
tests/test_litellm/llms/bedrock/chat/test_invoke_handler.py (modified, +156/-0)
tests/test_litellm/llms/vertex_ai/gemini/test_vertex_and_google_ai_studio_gemini.py (modified, +146/-0)
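
A regression test for this kind of bug typically mocks the HTTP client and asserts that the configured timeout reaches post(). The sketch below is hypothetical and is not taken from the test files listed above; make_call and build_streaming_call are simplified stand-ins for the real functions.

```python
import asyncio
from functools import partial
from unittest.mock import AsyncMock


async def make_call(client, api_base, timeout=None):
    """Hypothetical simplified make_call that forwards timeout per request."""
    return await client.post(api_base, timeout=timeout)


def build_streaming_call(client, api_base, timeout):
    """Hypothetical stand-in for the partial built inside async_streaming()."""
    return partial(make_call, client=client, api_base=api_base, timeout=timeout)


def test_timeout_reaches_post():
    client = AsyncMock()
    call = build_streaming_call(client, "https://example.invalid", timeout=0.1)
    asyncio.run(call())
    # The mocked post() must have been awaited once with the configured timeout.
    client.post.assert_awaited_once_with("https://example.invalid", timeout=0.1)


test_timeout_reaches_post()
```

If the partial were built without timeout, the assertion would fail because post() would be awaited with timeout=None.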

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

The timeout setting (configured via litellm_params per-model or router_settings) is silently dropped for streaming requests to Bedrock (Converse and Invoke paths) and Vertex AI (Gemini). The configured timeout never reaches the httpx client, so these providers always fall back to the default 600s client timeout regardless of what you configure.

Non-streaming requests are not affected — those paths correctly pass timeout to the httpx client. OpenAI/Azure providers are not affected — timeout is passed correctly for both streaming and non-streaming.

Affected providers

Provider	Non-Streaming	Streaming
OpenAI / Azure	Timeout works	Timeout works
Bedrock Converse	Timeout works	Timeout silently dropped (600s default)
Bedrock Invoke	Timeout works	Timeout silently dropped (600s default)
Vertex AI Gemini	Timeout works	Timeout silently dropped (600s default)

Steps to Reproduce

Config — set timeout: 0.1 (100ms) so the timeout fires before the first token arrives:

model_list:
  - model_name: bedrock-claude
    litellm_params:
      model: bedrock/us.anthropic.claude-sonnet-4-6
      timeout: 0.1
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1

  - model_name: vertex-gemini
    litellm_params:
      model: vertex_ai/gemini-2.5-flash
      timeout: 0.1
      vertex_project: my-project
      vertex_location: us-central1

  - model_name: azure-gpt
    litellm_params:
      model: azure/gpt-4.1
      api_base: https://my-resource.openai.azure.com
      api_key: os.environ/AZURE_API_KEY
      api_version: "2025-03-01-preview"
      timeout: 0.1

router_settings:
  timeout: 0.1

litellm_settings:
  num_retries: 0

Test 1 — Azure streaming (timeout correctly enforced):

curl -s http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"azure-gpt","messages":[{"role":"user","content":"Write a 2000 word essay on distributed systems."}],"stream":true}'

Result — HTTP 408 in ~0.8s (correct):

{"error":{"message":"litellm.Timeout: APITimeoutError - Request timed out. Error_str: Request timed out. - timeout value=0.1, time taken=0.64 seconds\n\nDeployment Info: request_timeout: None\ntimeout: 0.1. Received Model Group=azure-gpt\nAvailable Model Group Fallbacks=None","type":null,"param":null,"code":"408"}}

Test 2 — Bedrock streaming (timeout silently ignored):

curl -s http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"bedrock-claude","messages":[{"role":"user","content":"Write a 2000 word essay on distributed systems."}],"stream":true}'

Result — HTTP 200, streams for 60+ seconds to finish_reason: stop. The 100ms timeout is completely ignored.

Test 3 — Vertex streaming (timeout silently ignored):

curl -s http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"vertex-gemini","messages":[{"role":"user","content":"Write a 2000 word essay on distributed systems."}],"stream":true}'

Result — HTTP 200, streams for 35+ seconds to finish_reason: stop. The 100ms timeout is completely ignored.

Control — Bedrock and Vertex non-streaming (timeout works correctly):

The same timeout: 0.1 config correctly times out for non-streaming requests, proving the timeout plumbing works — only the streaming path is broken.

# Bedrock non-streaming — correctly times out in 0.17s
curl -s http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"bedrock-claude","messages":[{"role":"user","content":"Write a 2000 word essay on distributed systems."}],"stream":false}'

{"error":{"message":"litellm.Timeout: Connection timed out. Timeout passed=Timeout(timeout=0.1), time taken=0.101 seconds\n\nDeployment Info: request_timeout: None\ntimeout: 0.1. Received Model Group=bedrock-claude\nAvailable Model Group Fallbacks=None","type":null,"param":null,"code":"408"}}

# Vertex non-streaming — correctly times out in 0.45s
curl -s http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"vertex-gemini","messages":[{"role":"user","content":"Write a 2000 word essay on distributed systems."}],"stream":false}'

{"error":{"message":"litellm.Timeout: Connection timed out. Timeout passed=0.1, time taken=0.191 seconds\n\nDeployment Info: request_timeout: None\ntimeout: 0.1. Received Model Group=vertex-gemini\nAvailable Model Group Fallbacks=None","type":null,"param":null,"code":"408"}}
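
The curl checks above can also be scripted to compare wall-clock times across the three model groups. This is a sketch only: the URL, model names, prompt, and LITELLM_KEY variable come from the commands above, while build_request, timed_call, and main are hypothetical helpers.

```python
import json
import os
import time
import urllib.error
import urllib.request
from typing import Tuple

PROXY_URL = "http://localhost:4000/v1/chat/completions"  # from the curl commands
PROMPT = "Write a 2000 word essay on distributed systems."


def build_request(model: str, stream: bool) -> urllib.request.Request:
    """Build the same POST request the curl commands send."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": PROMPT}],
        "stream": stream,
    }
    headers = {
        "Authorization": f"Bearer {os.environ.get('LITELLM_KEY', '')}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        PROXY_URL, data=json.dumps(body).encode(), headers=headers, method="POST"
    )


def timed_call(model: str, stream: bool) -> Tuple[int, float]:
    """Return (HTTP status, seconds until the response body is fully read)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(build_request(model, stream)) as resp:
            resp.read()  # drain the body so the time covers the whole response
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # e.g. 408 when the proxy enforces the timeout
    return status, time.monotonic() - start


def main() -> None:
    """Run all six combinations; requires the proxy to be listening locally."""
    for model in ("azure-gpt", "bedrock-claude", "vertex-gemini"):
        for stream in (True, False):
            status, elapsed = timed_call(model, stream)
            print(f"{model} stream={stream}: HTTP {status} in {elapsed:.2f}s")
```

Calling main() with the proxy running prints one line per combination. On an affected version, the two Bedrock and Vertex streaming rows would be expected to show HTTP 200 with a long elapsed time, while the other rows show 408.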

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.81.14-stable

extent analysis

Fix Plan

To fix the issue, timeout has to be forwarded from async_streaming() to make_call(), and from there to the httpx client. The steps:

Add a timeout parameter to the make_call() functions in invoke_handler.py and vertex_and_google_ai_studio_gemini.py. The Bedrock Converse path in converse_handler.py appears to call the same make_call() as the Invoke path (the arguments it passes match that signature), so it should only need a call-site change.
Pass timeout from each async_streaming() to make_call(), including the functools.partial call in the Vertex AI handler.
Inside make_call(), apply timeout both when creating the fallback httpx client and on the client.post() request.

Example code changes:

# converse_handler.py: forward timeout at the call site
async def async_streaming(self, ..., timeout: Optional[Union[float, httpx.Timeout]], ...):
    ...
    completion_stream = await make_call(
        client=client,
        api_base=api_base,
        headers=dict(prepped.headers),
        data=data,
        model=model,
        messages=messages,
        logging_obj=logging_obj,
        fake_stream=fake_stream,
        json_mode=json_mode,
        stream_chunk_size=stream_chunk_size,
        timeout=timeout,  # previously missing
    )

# invoke_handler.py
async def make_call(
    client: Optional[AsyncHTTPHandler],
    api_base: str,
    headers: dict,
    data: str,
    model: str,
    messages: list,
    logging_obj: Logging,
    fake_stream: bool = False,
    json_mode: Optional[bool] = False,
    bedrock_invoke_provider: Optional[...] = None,
    stream_chunk_size: int = 1024,
    timeout: Optional[Union[float, httpx.Timeout]] = None,  # Add timeout parameter
):
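    if client is None:
        # Assumption: get_async_httpx_client() takes client options through a params dict
        # rather than a timeout keyword; check the signature in your LiteLLM version.
        client = get_async_httpx_client(
            llm_provider=litellm.LlmProviders.BEDROCK,
            params={"timeout": timeout} if timeout is not None else None,
        )

    # Also pass timeout per request so an already-created client honors it
    response = await client.post(api_base, headers=headers, data=data, stream=not fake_stream, timeout=timeout, logging_obj=logging_obj)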

# vertex_and_google_ai_studio_gemini.py
async def make_call(
    client: Optional[AsyncHTTPHandler],
    api_base: str,
    headers: dict,
    data: str,
    model: str,
    messages: list,
    logging_obj,
    timeout: Optional[Union[float, httpx.Timeout]] = None,  # Add timeout parameter
):
    if client is None:
        client = get_async_httpx_client(llm_provider=litellm.LlmProviders
