litellm - 💡(How to fix) Fix [Bug]: `/v1/messages` streaming emits empty `input_json_delta` instead of `text_delta` when OpenAI-compatible vLLM stream returns `content` with empty `tool_calls: []`

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

I am using LiteLLM Proxy v1.85.0 in front of a vLLM v0.21.0 OpenAI-compatible backend serving DeepSeek V4 Pro.

When I call the model through LiteLLM's OpenAI-compatible /v1/chat/completions endpoint with stream: true, streaming works correctly. The backend returns normal OpenAI-style chunks with delta.content.

However, when I call the same model through LiteLLM's Anthropic-compatible /v1/messages endpoint with stream: true, the SSE stream is converted incorrectly. For a plain text response, LiteLLM emits repeated empty input_json_delta chunks instead of text_delta chunks. As a result, Claude Code and other Anthropic-compatible clients receive no visible streamed text, even though the OpenAI-compatible stream contains valid delta.content.

Environment:

```text
LiteLLM version: v1.85.0
vLLM version: v0.21.0
Backend model: DeepSeek V4 Pro
Backend API type: OpenAI-compatible vLLM server
Client path: Claude Code / Anthropic-compatible /v1/messages
Proxy path: Claude Code -> NGINX -> APISIX -> LiteLLM v1.85.0 -> vLLM v0.21.0 -> DeepSeek V4 Pro
```

This does not look like an NGINX or APISIX buffering issue because:

- /v1/chat/completions streams correctly.
- /v1/messages emits SSE events continuously.
- The issue is not delayed delivery.
- The emitted SSE payload type is incorrect: input_json_delta is emitted where text_delta is expected.

Expected behavior:

For a plain text response, LiteLLM should convert this OpenAI-compatible stream chunk:

{ "delta": { "content": "1\n", "tool_calls": [] } }

into an Anthropic-compatible SSE chunk like:

event: content_block_delta
data: { "type": "content_block_delta", "index": 0, "delta": { "type": "text_delta", "text": "1\n" } }

input_json_delta should only be emitted for actual tool_use input streaming, not for plain text content.

Hypothesis:

The Anthropic /v1/messages streaming adapter may be treating the presence of the tool_calls key as a tool-use signal, even when tool_calls is an empty array.

The adapter should probably treat empty tool_calls: [] as no tool call, and prioritize delta.content as text_delta.
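The difference between the two checks can be illustrated with a minimal sketch. This is not LiteLLM's actual code; both function names are hypothetical and only contrast a key-presence test with a truthiness test on the delta from this report.

```python
def is_tool_call_by_key(delta: dict) -> bool:
    # Suspected faulty check (an assumption): true whenever the key exists,
    # even when the list is empty.
    return "tool_calls" in delta


def is_tool_call_by_value(delta: dict) -> bool:
    # Stricter check: true only when the list has at least one entry.
    return bool(delta.get("tool_calls"))


# The delta shape the vLLM backend returns for plain text in this report.
plain_text_delta = {"content": "1\n", "tool_calls": []}

print(is_tool_call_by_key(plain_text_delta))    # True
print(is_tool_call_by_value(plain_text_delta))  # False
```

If the adapter branches on something equivalent to the first check, a plain text chunk would be routed to the tool-use path, which would match the observed output.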

Possible expected logic:

if delta.get("content"):
    emit_text_delta(delta["content"])
elif delta.get("tool_calls"):
    emit_tool_use_delta(delta["tool_calls"])

or:

if delta.get("tool_calls") == []: delta.pop("tool_calls", None)

before converting the OpenAI stream chunk to Anthropic SSE.
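Both options can be combined into one self-contained sketch. The helper names `normalize_delta` and `delta_to_anthropic_sse` are hypothetical and are not part of LiteLLM. The sketch also assumes tool call arguments arrive under `function.arguments`, as in OpenAI-style streaming chunks, and it leaves out content_block_start/stop events and per-block index bookkeeping.

```python
import json
from typing import Any, Dict, List


def normalize_delta(delta: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the delta with an empty or null tool_calls removed."""
    cleaned = dict(delta)
    if not cleaned.get("tool_calls"):
        cleaned.pop("tool_calls", None)
    return cleaned


def delta_to_anthropic_sse(delta: Dict[str, Any], index: int = 0) -> List[str]:
    """Convert one OpenAI-style delta into Anthropic-style SSE event strings."""
    delta = normalize_delta(delta)
    payloads: List[Dict[str, Any]] = []
    if delta.get("content"):
        # Text content takes priority, as proposed in the issue.
        payloads.append({"type": "text_delta", "text": delta["content"]})
    elif delta.get("tool_calls"):
        for call in delta["tool_calls"]:
            arguments = (call.get("function") or {}).get("arguments")
            if arguments:  # never emit an empty partial_json
                payloads.append(
                    {"type": "input_json_delta", "partial_json": arguments}
                )
    events = []
    for payload in payloads:
        body = {"type": "content_block_delta", "index": index, "delta": payload}
        events.append(f"event: content_block_delta\ndata: {json.dumps(body)}\n\n")
    return events


for event in delta_to_anthropic_sse({"content": "1\n", "tool_calls": []}):
    print(event, end="")
```

With this ordering, a chunk that has neither text nor tool call arguments produces no event at all, rather than an empty delta.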

Steps to Reproduce

Run LiteLLM v1.85.0.

Configure LiteLLM to route a model to an OpenAI-compatible vLLM v0.21.0 backend serving DeepSeek V4 Pro.

Call the OpenAI-compatible endpoint with streaming enabled:

curl -N -s http://<litellm-host>:4000/v1/chat/completions \
  -H "Authorization: Bearer <key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model-name>",
    "messages": [
      {"role": "user", "content": "Count from 1 to 20."}
    ],
    "stream": true,
    "max_tokens": 512
  }'

Confirm that the OpenAI-compatible stream returns valid delta.content chunks.

Call the Anthropic-compatible /v1/messages endpoint with streaming enabled:

curl -N -s http://<litellm-host>:4000/v1/messages \
  -H "Authorization: Bearer <key>" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model-name>",
    "max_tokens": 512,
    "stream": true,
    "messages": [
      {"role": "user", "content": "Count from 1 to 20."}
    ]
  }'

Observe that /v1/messages emits repeated empty input_json_delta chunks instead of non-empty text_delta chunks.

Logs output

OpenAI-compatible /v1/chat/completions streaming output works correctly.

Example chunk:

{
  "id": "chatcmpl-97cb31f130ed2d7d",
  "created": 1780055979,
  "model": "<model-name>",
  "object": "chat.completion.chunk",
  "choices": [
    {
      "index": 0,
      "delta": { "content": "1\n", "tool_calls": [] }
    }
  ]
}

Another valid chunk:

However, Anthropic-compatible /v1/messages streaming output is incorrect.

Actual output:

event: message_start
data: { "type": "message_start", "message": { "type": "message", "role": "assistant", "content": [], "model": "<model-name>", "stop_reason": null, "stop_sequence": null } }

event: content_block_start
data: { "type": "content_block_start", "index": 0, "content_block": { "type": "text", "text": "" } }

event: content_block_delta
data: { "type": "content_block_delta", "index": 0, "delta": { "type": "text_delta", "text": "" } }

event: content_block_delta
data: { "type": "content_block_delta", "index": 0, "delta": { "type": "input_json_delta", "partial_json": "" } }

event: content_block_delta
data: { "type": "content_block_delta", "index": 0, "delta": { "type": "input_json_delta", "partial_json": "" } }

The empty input_json_delta chunks are repeated until the message ends.

This is unexpected for a plain text response. The stream should emit non-empty text_delta chunks such as:

event: content_block_delta
data: { "type": "content_block_delta", "index": 0, "delta": { "type": "text_delta", "text": "1\n" } }
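To check a captured /v1/messages stream for this symptom without reading it by eye, the delta types can be tallied. The following is a rough diagnostic sketch, not part of LiteLLM. The function name `tally_delta_types` is hypothetical, and it assumes each `data:` payload sits on a single line, as SSE normally delivers it.

```python
import json
from collections import Counter


def tally_delta_types(sse_text: str) -> Counter:
    """Count (delta type, empty/non-empty) pairs across content_block_delta events."""
    counts: Counter = Counter()
    for line in sse_text.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue
        try:
            payload = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # skip data lines that are not JSON
        if not isinstance(payload, dict):
            continue
        if payload.get("type") != "content_block_delta":
            continue
        delta = payload.get("delta", {})
        value = delta.get("text") or delta.get("partial_json") or ""
        state = "empty" if value == "" else "non-empty"
        counts[(delta.get("type"), state)] += 1
    return counts
```

One way to use it is to save the curl output from the /v1/messages step to a file and pass the file contents to `tally_delta_types`. A plain text response affected by this bug would show many `("input_json_delta", "empty")` entries and no `("text_delta", "non-empty")` entries.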

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

v1.85.0

Twitter / LinkedIn details

No response

Root Cause