vllm - 💡(How to fix) Fix Gemma4 streaming tool call corrupts decimal numbers (X.Y → X.0Y) [2 comments, 2 participants]

GitHub stats: vllm-project/vllm#39380 (fetched 2026-04-10 03:40:58) · Comments: 2 · Participants: 2 · Timeline events: 6 (commented ×2, subscribed ×2, closed ×1, mentioned ×1) · Reactions: 0


Bug Summary

Gemma4 streaming tool call arguments corrupt decimal numbers by inserting a 0 right after the decimal point (0.45 → 0.045, 1.1 → 1.01). Non-streaming works correctly.

Environment

  • vLLM: 0.19.1rc1.dev46+gc5e3454e5 (nightly, Apr 6 2026)
  • Model: google/gemma-4-31B-it (BF16, TP=4)
  • GPU: 4x RTX A6000 48GB
  • Server flags: --enable-auto-tool-choice --tool-call-parser gemma4

Reproduction

Streaming (BROKEN)

```bash
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{ "model": "gemma4-31b-it",
        "messages": [{"role": "user", "content": "Values: a=0.45, b=1.1, c=0.92, d=3000. Call the tool with these exact values."}],
        "tools": [{"type": "function", "function": {"name": "test_numbers", "parameters": {"type": "object", "properties": {"a": {"type": "number"}, "b": {"type": "number"}, "c": {"type": "number"}, "d": {"type": "number"}}, "required": ["a", "b", "c", "d"]}}}],
        "temperature": 0.0, "stream": true }'
```

Expected: {"a": 0.45, "b": 1.1, "c": 0.92, "d": 3000}
Actual: {"a": 0.045, "b": 1.01, "c": 0.092, "d": 3000}

Non-streaming (CORRECT)

Same request with "stream": false returns correct values.

Pattern

| Input | Streaming output | Pattern |
|---|---|---|
| 0.45 | 0.045 | 0 inserted after decimal |
| 1.1 | 1.01 | 0 inserted after decimal |
| 0.92 | 0.092 | 0 inserted after decimal |
| 3000 | 3000 | Integers unaffected |
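The X.Y → X.0Y pattern in the table can be written as a small rule. This is a minimal sketch, not vLLM code, and `predicted_stream_output` is a hypothetical helper. It assumes, as the Root Cause walkthrough describes, that the integer part and the decimal point arrive together as one partial chunk. That chunk is re-serialized as "X.0" before the fractional digits Y are appended.

```python
def predicted_stream_output(literal: str) -> str:
    """Predict what a client assembles for a numeric literal under this bug.

    Assumption: the first partial chunk ends right after ".", so "X." is
    re-serialized as "X.0" and the fractional digits Y are appended to it.
    """
    if "." not in literal:
        # Integers only grow by appending digits, so the streamed prefix stays valid.
        return literal
    integer_part, fraction = literal.split(".", 1)
    return f"{integer_part}.0{fraction}"
```

If the chunk boundary fell elsewhere (for example after "0.4"), this rule would not apply. The streamed prefix would then still match the final value.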

Root Cause

In vllm/tool_parsers/gemma4_tool_parser.py, method _emit_argument_diff():

  1. Model generates partial number token 0. → Python parses float("0.") = 0.0
  2. json.dumps({"a": 0.0}) produces {"a": 0.0} — note the trailing 0
  3. Withholding logic strips } → streams {"a": 0.0
  4. Full number 0.45 arrives → JSON becomes {"a": 0.45}
  5. Diff: {"a": 0.0 vs {"a": 0.45 → common prefix {"a": 0. → diff = 45
  6. Client concatenates: {"a": 0.0 + 45 = {"a": 0.045 ← BUG

The withholding logic (line ~681) strips structural characters (}, ", ]) but not digits, so partial float representations leak into the stream.
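The six steps can be replayed in plain Python. This is a hypothetical simulation, not the vLLM parser. `stream_number`, `parse_number` and `common_prefix_len` are illustrative names. The sketch assumes the server diffs each new snapshot against its previous snapshot by common prefix (step 5). It also assumes a final flush, standing in for `_handle_tool_call_end()`, emits whatever remains of the complete JSON.

```python
import json

# Trailing characters the existing code withholds (copied from the issue's snippet).
STRUCTURAL = ("}", '"', "]", "<", "|", "\\", ">")


def parse_number(text: str):
    # Step 1: a partial "0." parses as the float 0.0.
    return float(text) if "." in text else int(text)


def common_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def stream_number(key: str, partials: list[str]) -> str:
    """Return the text a client assembles while one numeric argument streams in."""
    prev = ""    # server's previous snapshot
    client = ""  # concatenation of every delta the client received
    for text in partials:
        # Steps 2-3: re-serialize, then strip only trailing structural characters.
        snapshot = json.dumps({key: parse_number(text)})
        while snapshot and snapshot[-1] in STRUCTURAL:
            snapshot = snapshot[:-1]
        # Step 5: emit only what follows the common prefix with the previous snapshot.
        client += snapshot[common_prefix_len(prev, snapshot):]
        prev = snapshot
    # Final flush: emit the rest of the complete JSON.
    final = json.dumps({key: parse_number(partials[-1])})
    client += final[common_prefix_len(prev, final):]
    return client
```

For example, `stream_number("a", ["0.", "0.45"])` yields `{"a": 0.045}`. The stale "0" streamed in step 3 is never retracted. Integers such as `["30", "3000"]` come through intact.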

Proposed Fix

Add digit/decimal withholding after the existing structural withholding in _emit_argument_diff():

```python
# Existing withholding (line ~681)
while safe_json and safe_json[-1] in ("}", '"', "]", "<", "|", "\\", ">"):
    safe_json = safe_json[:-1]
# ADD: Withhold trailing partial numbers to prevent decimal corruption
while safe_json and safe_json[-1] in ("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "."):
    safe_json = safe_json[:-1]
```

This defers number emission until the value is structurally complete and flushed by _handle_tool_call_end(). Tested with various edge cases (0.001, 99.99, 3.14159, etc.) — all pass.
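A hypothetical simulation of the proposed fix shows why it works. The names are illustrative, not vLLM's code. It assumes the server diffs each snapshot against its previous one by common prefix, with a final flush standing in for `_handle_tool_call_end()`. Once trailing digits and "." are withheld, the stale "0." text is never sent. Each snapshot therefore extends what the client already holds.

```python
import json

STRUCTURAL = ("}", '"', "]", "<", "|", "\\", ">")
NUMERIC = ("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ".")


def parse_number(text: str):
    return float(text) if "." in text else int(text)


def common_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def withhold(snapshot: str) -> str:
    # Existing structural withholding, followed by the proposed digit/decimal withholding.
    while snapshot and snapshot[-1] in STRUCTURAL:
        snapshot = snapshot[:-1]
    while snapshot and snapshot[-1] in NUMERIC:
        snapshot = snapshot[:-1]
    return snapshot


def stream_number_fixed(key: str, partials: list[str]) -> str:
    prev, client = "", ""
    for text in partials:
        snapshot = withhold(json.dumps({key: parse_number(text)}))
        client += snapshot[common_prefix_len(prev, snapshot):]
        prev = snapshot
    # Final flush: the complete number is emitted in one piece here.
    final = json.dumps({key: parse_number(partials[-1])})
    client += final[common_prefix_len(prev, final):]
    return client
```

The trade-off is that a number reaches the client in one delta rather than digit by digit. That is harmless for short argument values. The sketch does not model exponent notation such as 1e-5, because `parse_number` rejects a partial like "1e".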

Related Issues

  • #38946 — Gemma4 streaming JSON delimiter bug (CLOSED, different issue)
  • PR #38992 — Fix for #38946 (merged Apr 5, included in our build but does not fix this)

Workaround

Use "stream": false for tool calls. Non-streaming responses have correct decimal values.


This issue was authored with Claude Code.

extent analysis

TL;DR

Apply the proposed fix to the _emit_argument_diff() method in vllm/tool_parsers/gemma4_tool_parser.py to prevent decimal corruption in streaming tool calls.

Guidance

  • Implement the suggested change to withhold trailing partial numbers after the existing structural withholding logic.
  • Verify the fix by testing various edge cases with decimal numbers, such as 0.001, 99.99, and 3.14159.
  • Consider using the workaround of setting "stream": false for tool calls until the fix is applied.
  • Review related issues, such as #38946 and PR #38992, to ensure the fix does not introduce any regressions.
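The checks suggested above can rest on one invariant. Diff-based streaming is only safe if every emitted snapshot is a prefix of the next one and of the final JSON. The helper below is a hypothetical test utility, not part of vLLM. It assumes you can capture the cumulative argument text the parser has emitted at each step.

```python
from typing import Optional


def first_prefix_violation(snapshots: list[str], final_json: str) -> Optional[int]:
    """Return the index of the first snapshot not extended by the next text, else None.

    A violation means the client holds characters the server later "changed",
    which is exactly how 0.0 + 45 became 0.045.
    """
    sequence = snapshots + [final_json]
    for i in range(len(sequence) - 1):
        if not sequence[i + 1].startswith(sequence[i]):
            return i
    return None
```

Snapshots captured before the fix, such as `{"a": 0.0` followed by `{"a": 0.45`, fail at index 0. With digits withheld, both snapshots are `{"a": ` and the check passes.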

Example

The proposed fix involves adding the following code to the _emit_argument_diff() method:

```python
while safe_json and safe_json[-1] in ("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "."):
    safe_json = safe_json[:-1]
```

This code withholds trailing partial numbers to prevent decimal corruption.

Notes

The fix assumes that the issue is caused by the withholding logic in the _emit_argument_diff() method. If the issue persists after applying the fix, further investigation may be necessary.

Recommendation

Apply the proposed fix to the _emit_argument_diff() method, as it directly addresses the root cause of the issue and has been tested with various edge cases.
