litellm - ✅ (Solved) Fix [Bug] Anthropic /v1/messages streaming endpoint drops tool_use arguments for vertex_ai/gemini-* models [1 pull request, 1 participant]

GitHub stats
BerriAI/litellm#25561 · Fetched 2026-04-12 13:24:52
Comments: 0 · Participants: 1 · Timeline: 2 (labeled ×2) · Reactions: 0

When routing through the LiteLLM proxy's /v1/messages endpoint to a vertex_ai/gemini-* model with stream=true, the translated Anthropic SSE stream emits content_block_start and content_block_stop for each tool call, but zero content_block_delta events of type input_json_delta in between. As a result, every tool_use block delivered to the client has input: {}, regardless of what arguments the model actually produced.

The exact same request with stream=false returns the full, correct arguments. Flipping the stream flag is sufficient to reproduce the bug in isolation — no other variable matters.

This is fatal for Anthropic-protocol clients that validate tool arguments locally (e.g. Claude Code), because every tool call fails the required-field check and enters an infinite retry loop.

PR fix notes

PR #25601: fix: Anthropic /v1/messages streaming drops tool_use arguments for atomic tool call providers

Description (problem / solution / changelog)

Relevant issues

Fixes: #25561

Type

🐛 Bug Fix

Changed files

  • litellm/llms/anthropic/experimental_pass_through/adapters/streaming_iterator.py (modified, +19/-14)
  • tests/test_litellm/llms/anthropic/experimental_pass_through/adapters/test_anthropic_streaming_iterator.py (added, +393/-0)

[Bug] Anthropic /v1/messages streaming endpoint drops tool_use arguments for vertex_ai/gemini-* models

Environment

  • LiteLLM version: 1.83.4 (pip show litellm)
  • Python: 3.9 (/Users/.../Library/Python/3.9/site-packages/litellm)
  • Model: vertex_ai/gemini-3.1-pro-preview (also expected to affect any other vertex_ai/gemini-* model, not model-specific)
  • Auth: ADC (gcloud auth application-default login), vertex_location: global
  • Endpoint: POST /v1/messages on the LiteLLM proxy

Reproduction

Config

# litellm_config.yaml
model_list:
  - model_name: "*"
    litellm_params:
      model: vertex_ai/gemini-3.1-pro-preview
      vertex_project: YOUR_PROJECT_ID
      vertex_location: global

litellm_settings:
  drop_params: true

Start the proxy:

litellm --config litellm_config.yaml --port 4001

Minimal reproducer (Python)

#!/usr/bin/env python3
"""Same request, only `stream` flag differs."""
import json
import httpx

BASE = "http://127.0.0.1:4001"

READ_TOOL = {
    "name": "Read",
    "description": "Reads a file from the local filesystem.",
    "input_schema": {
        "type": "object",
        "properties": {
            "file_path": {"description": "The absolute path to the file to read", "type": "string"},
        },
        "required": ["file_path"],
        "additionalProperties": False,
    },
}

BODY_BASE = {
    "model": "claude-sonnet-4-6",  # routed to vertex_ai/gemini-3.1-pro-preview via wildcard
    "max_tokens": 4096,
    "system": "You are a helpful assistant. Use the Read tool when the user asks you to look at a file.",
    "tools": [READ_TOOL],
    "messages": [
        {"role": "user", "content": "Read /etc/hosts please"},
    ],
}


def test_non_stream():
    print("\n========== stream=False ==========")
    r = httpx.post(f"{BASE}/v1/messages", json={**BODY_BASE, "stream": False}, timeout=120)
    print(f"status: {r.status_code}")
    for b in r.json().get("content", []):
        if b.get("type") == "tool_use":
            print(f"  tool_use: name={b['name']} input={json.dumps(b.get('input'))}")


def test_stream():
    print("\n========== stream=True ==========")
    with httpx.stream("POST", f"{BASE}/v1/messages", json={**BODY_BASE, "stream": True}, timeout=120) as r:
        print(f"status: {r.status_code}")
        cur_tool = None
        tool_uses = []
        for line in r.iter_lines():
            if not line.startswith("data: "):
                continue
            try:
                d = json.loads(line[6:])
            except Exception:
                continue
            t = d.get("type")
            if t == "content_block_start":
                cb = d.get("content_block", {})
                if cb.get("type") == "tool_use":
                    cur_tool = {"name": cb.get("name"), "input_start": cb.get("input"), "deltas": []}
                    tool_uses.append(cur_tool)
            elif t == "content_block_delta":
                delta = d.get("delta", {})
                if delta.get("type") == "input_json_delta" and cur_tool:
                    cur_tool["deltas"].append(delta.get("partial_json", ""))
            elif t == "content_block_stop":
                cur_tool = None

    for tu in tool_uses:
        joined = "".join(tu["deltas"])
        print(f"  tool_use: name={tu['name']} input_at_start={json.dumps(tu['input_start'])}"
              f" delta_events={len(tu['deltas'])} joined={joined!r}")


if __name__ == "__main__":
    test_non_stream()
    test_stream()

Actual output

========== stream=False ==========
status: 200
  tool_use: name=Read input={"file_path": "/etc/hosts"}

========== stream=True ==========
status: 200
  tool_use: name=Read input_at_start={} delta_events=0 joined=''

Expected behavior

Under stream=True, the proxy should emit one or more content_block_delta events with delta.type == "input_json_delta" carrying the model-produced tool arguments as partial_json, between content_block_start and content_block_stop of the tool_use block. The concatenation of these deltas should parse to the same input that stream=False returns.
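
As a minimal sketch of that expectation (client-side illustration only, not LiteLLM code; the helper name `reassemble_tool_inputs` and the sample events are hypothetical), the `partial_json` fragments can be joined per block index and parsed:

```python
import json


def reassemble_tool_inputs(events):
    """Join input_json_delta fragments per tool_use block and parse them."""
    names = {}      # block index -> tool name
    fragments = {}  # block index -> list of partial_json strings
    for ev in events:
        kind = ev.get("type")
        if kind == "content_block_start":
            block = ev.get("content_block", {})
            if block.get("type") == "tool_use":
                names[ev["index"]] = block.get("name")
                fragments[ev["index"]] = []
        elif kind == "content_block_delta":
            delta = ev.get("delta", {})
            if delta.get("type") == "input_json_delta" and ev.get("index") in fragments:
                fragments[ev["index"]].append(delta.get("partial_json", ""))
    result = []
    for idx in sorted(fragments):
        joined = "".join(fragments[idx])
        # With no deltas, the client is left with the empty input from block start.
        result.append({"name": names[idx], "input": json.loads(joined) if joined else {}})
    return result


# Hypothetical events showing what a correct stream would carry for the reproducer.
good_stream = [
    {"type": "content_block_start", "index": 0,
     "content_block": {"type": "tool_use", "id": "call_x", "name": "Read", "input": {}}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "input_json_delta", "partial_json": "{\"file_path\": "}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "input_json_delta", "partial_json": "\"/etc/hosts\"}"}},
    {"type": "content_block_stop", "index": 0},
]

print(reassemble_tool_inputs(good_stream))
```

Dropping the two `content_block_delta` events from `good_stream` leaves `input` as `{}`, which is the state the bug produces.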

Actual raw SSE captured from a real client

Full SSE body of a real /v1/messages?beta=true streaming response from a Claude Code client hitting the proxy:

event: message_start
data: {"type": "message_start", "message": {...}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: content_block_start
data: {"type": "content_block_start", "index": 1, "content_block": {"type": "tool_use", "id": "call_2ee...__thought__<base64 sig>", "name": "Read", "input": {}}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 1}

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "tool_use"}, "usage": {"input_tokens": 21331, "output_tokens": 126}}

event: message_stop
data: {"type": "message_stop"}

Note the complete absence of any content_block_delta events. The 126 output tokens reported in usage indicate the model did produce content (thinking + arguments), but only the thinking parts appear to have been consumed by the stream wrapper — the function-call arguments never make it into the Anthropic SSE output.

Scope of impact

Confirmed variables that DO NOT matter

All of these were independently tested and eliminated as contributing factors:

  • Number of tools declared (reproduces with a single-tool declaration)
  • Tool schema complexity (reproduces with a 1-property schema)
  • Presence of thinking / reasoning_effort
  • thoughtSignature round-trip semantics
  • temperature value (tested 0, 1.0)
  • vertex_location (tested global and default)
  • Presence of ?beta=true query param
  • system prompt format (string vs list of blocks)
  • max_tokens value
  • tool_choice configuration

The ONLY variable that matters

Flipping "stream": true → "stream": false in the request body, holding everything else constant.

Real-world impact

Any Anthropic-protocol client that:

  1. Sends stream: true (which is the default for most Anthropic SDKs), AND
  2. Validates tool_use.input against the declared JSON Schema locally before executing the tool

...will fail 100% of tool calls with a "missing required parameter" error, then enter a retry loop that exhausts tokens without making progress. This is the exact failure mode I observed with Claude Code (Anthropic's official CLI) — every user task that requires reading or editing a file gets stuck spinning until manually interrupted.
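
To illustrate why an empty input fails, here is a minimal sketch of a required-field check. It is an assumption about what such a client does, not Claude Code's actual validation; a real client would more likely run a full JSON Schema validator. The helper name `missing_required` is hypothetical, and the schema is the one from the reproducer.

```python
def missing_required(tool_input, input_schema):
    """Return the required property names that are absent from tool_input."""
    return [key for key in input_schema.get("required", []) if key not in tool_input]


READ_SCHEMA = {
    "type": "object",
    "properties": {
        "file_path": {"description": "The absolute path to the file to read", "type": "string"},
    },
    "required": ["file_path"],
    "additionalProperties": False,
}

# Input as returned with stream=False: nothing missing.
print(missing_required({"file_path": "/etc/hosts"}, READ_SCHEMA))  # []

# Input as delivered with stream=True under this bug: the required field is missing.
print(missing_required({}, READ_SCHEMA))  # ['file_path']
```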

Suspected location

The bug appears to be in the Anthropic /v1/messages streaming pass-through adapter, specifically in the OpenAI-chunk → Anthropic-SSE translation path:

  • litellm/llms/anthropic/experimental_pass_through/adapters/streaming_iterator.py (AnthropicStreamWrapper.async_anthropic_sse_wrapper)
  • litellm/llms/anthropic/experimental_pass_through/adapters/transformation.py (_translate_streaming_openai_chunk_to_anthropic, around line 1403)

The _translate_streaming_openai_chunk_to_anthropic function at transformation.py:1417-1428 does look for choice.delta.tool_calls[].function.arguments and would produce an input_json_delta if found. So either:

  1. The upstream vertex_ai/gemini streaming handler never populates choice.delta.tool_calls (it may deliver the entire function call args in a non-delta chunk that gets consumed elsewhere), OR
  2. The AnthropicStreamWrapper never routes the chunk containing tool_calls through _translate_streaming_openai_chunk_to_anthropic, OR
  3. The wrapper produces the delta but the SSE serializer drops it before writing to the socket.

I haven't pinned down which of these three it is — someone familiar with this adapter's chunk lifecycle should be able to spot it quickly once they see that content_block_start and content_block_stop are being emitted for tool_use blocks with zero deltas in between.
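
For hypothesis 1, the following sketch shows what a translation layer would need to emit when a provider delivers a tool call with its complete arguments in a single chunk. It is not the code in LiteLLM or in any PR; the function names and the sample tool call are hypothetical, and the input is assumed to be an OpenAI-style tool call dict.

```python
import json


def events_for_atomic_tool_call(index, tool_call):
    """Build Anthropic-style events for a tool call whose arguments arrive all at once."""
    fn = tool_call.get("function", {})
    arguments = fn.get("arguments") or ""
    events = [{
        "type": "content_block_start",
        "index": index,
        "content_block": {"type": "tool_use", "id": tool_call.get("id"),
                          "name": fn.get("name"), "input": {}},
    }]
    if arguments:
        # The whole argument string goes out as one input_json_delta.
        events.append({
            "type": "content_block_delta",
            "index": index,
            "delta": {"type": "input_json_delta", "partial_json": arguments},
        })
    events.append({"type": "content_block_stop", "index": index})
    return events


def to_sse(event):
    """Serialize one event in the 'event:' / 'data:' framing seen in the capture."""
    return f"event: {event['type']}\ndata: {json.dumps(event)}\n\n"


# Hypothetical atomic tool call, shaped like an OpenAI chat-completion tool call.
atomic_call = {
    "id": "call_x",
    "type": "function",
    "function": {"name": "Read", "arguments": "{\"file_path\": \"/etc/hosts\"}"},
}

for ev in events_for_atomic_tool_call(1, atomic_call):
    print(to_sse(ev), end="")
```

The point of the sketch is the middle event: if the adapter closes the block without ever emitting it, the client sees exactly the start/stop pair shown in the raw capture.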

Workaround

None at the proxy level. The only workaround is to bypass LiteLLM for this path entirely and implement Anthropic-protocol ↔ Vertex-AI translation natively. That is what I ended up doing in the downstream project where I hit this bug.

Priority suggestion

I'd rate this high severity because:

  • It makes LiteLLM's Anthropic streaming pass-through unusable for any workflow involving tool calls on Vertex AI / Gemini, which is a headline use case.
  • There is no client-side mitigation — the bug is in the bytes the proxy writes, and clients can't reconstruct information that was never sent.
  • The failure mode is silent at the HTTP layer (200 OK, valid SSE events, just missing one type of event), which makes it easy to miss in smoke tests and hard to diagnose without a raw SSE capture.
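
Because the failure is invisible at the HTTP layer, a smoke test has to inspect the SSE body itself. The sketch below is one possible check, assuming a captured response body as text; the helper name `empty_tool_use_blocks` and the sample capture are hypothetical. A tool that takes no arguments might legitimately produce no deltas, so treat a hit as a signal to investigate rather than proof.

```python
import json


def empty_tool_use_blocks(sse_text):
    """Return the indexes of tool_use blocks that received no input_json_delta."""
    delta_counts = {}  # block index -> number of input_json_delta events seen
    for line in sse_text.splitlines():
        if not line.startswith("data: "):
            continue
        try:
            ev = json.loads(line[len("data: "):])
        except json.JSONDecodeError:
            continue  # skip elided or malformed payloads
        if ev.get("type") == "content_block_start":
            if ev.get("content_block", {}).get("type") == "tool_use":
                delta_counts[ev["index"]] = 0
        elif ev.get("type") == "content_block_delta":
            delta = ev.get("delta", {})
            if delta.get("type") == "input_json_delta" and ev.get("index") in delta_counts:
                delta_counts[ev["index"]] += 1
    return [idx for idx, count in delta_counts.items() if count == 0]


# Abbreviated capture modeled on the raw SSE in this report.
captured = "\n".join([
    "event: content_block_start",
    'data: {"type": "content_block_start", "index": 1, "content_block": '
    '{"type": "tool_use", "id": "call_x", "name": "Read", "input": {}}}',
    "",
    "event: content_block_stop",
    'data: {"type": "content_block_stop", "index": 1}',
])

print(empty_tool_use_blocks(captured))  # [1]
```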

extent analysis

TL;DR

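The issue suspects the OpenAI-chunk → Anthropic-SSE translation path. PR #25601, which fixes the issue, changes litellm/llms/anthropic/experimental_pass_through/adapters/streaming_iterator.py (+19/-14) rather than transformation.py, and its title attributes the dropped arguments to providers that emit tool calls atomically. This suggests the problem lies in how AnthropicStreamWrapper handles a chunk carrying a complete tool call, not in _translate_streaming_openai_chunk_to_anthropic itself.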

Guidance

  • Investigate the _translate_streaming_openai_chunk_to_anthropic function in transformation.py to ensure it correctly handles choice.delta.tool_calls[] and produces input_json_delta events.
  • Verify that the AnthropicStreamWrapper routes chunks containing tool_calls through the _translate_streaming_openai_chunk_to_anthropic function.
  • Check the SSE serializer to ensure it does not drop input_json_delta events before writing to the socket.
  • Consider adding logging or debugging statements to track the flow of tool_calls and input_json_delta events through the proxy.

Example

No specific code example is provided, as the issue requires investigation and modification of the existing codebase.

Notes

The exact fix may depend on the specific implementation details of the _translate_streaming_openai_chunk_to_anthropic function and the AnthropicStreamWrapper. Additional debugging and logging may be necessary to identify the root cause of the issue.

Recommendation

Prefer the change from PR #25601 over patching _translate_streaming_openai_chunk_to_anthropic by hand: the PR's changed files place the fix in streaming_iterator.py and add a test file for the streaming iterator.
