vllm: [Bug] Gemma4ToolParser streams incorrect float values (e.g., 108.2 → 108.02)



Your current environment

<details> <summary>The output of <code>python collect_env.py</code></summary>

============================== System Info

OS : Ubuntu 24.04.2 LTS (x86_64)
GCC version : (Ubuntu 9.5.0-6ubuntu2) 9.5.0
Clang version : Could not collect
CMake version : version 3.28.3
Libc version : glibc-2.39

============================== vLLM Info

ROCM Version : Could not collect
vLLM Version : 0.19.0
vLLM Build Flags: CUDA Archs: Not Set; ROCm: Disabled; XPU: Disabled
GPU Topology:
        GPU0    GPU1    CPU Affinity    NUMA Affinity    GPU NUMA ID
GPU0     X      PHB     0-7             0                N/A
GPU1    PHB      X      0-7             0                N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

</details>

🐛 Describe the bug

Bug Summary

In vLLM 0.19.1, the Gemma4ToolParser produces incorrect floating-point argument values in streaming mode. A number like 108.2 is incorrectly streamed as 108.02, and 22.8 becomes 22.08. String-typed arguments and non-streaming responses are unaffected.
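Since non-streaming responses are reported as unaffected, one possible interim workaround is to request tool calls with `stream=False` and parse the arguments from the final message. The following is only a sketch under that assumption: the helper name `tool_arguments_non_streaming` is hypothetical, and `client` is assumed to be an `openai.OpenAI` instance pointed at the vLLM server.

```python
import json


def tool_arguments_non_streaming(client, **request):
    """Send the request with stream=False and return (name, arguments) pairs.

    Hypothetical helper: `client` is assumed to expose the OpenAI SDK's
    `chat.completions.create` method.
    """
    request["stream"] = False  # override any streaming flag passed by the caller
    response = client.chat.completions.create(**request)
    tool_calls = response.choices[0].message.tool_calls or []
    # In a non-streaming response, `arguments` is one complete JSON string.
    return [
        (call.function.name, json.loads(call.function.arguments))
        for call in tool_calls
    ]
```

It can be called with the same `model`, `messages`, and `tools` keyword arguments as the streaming request in the reproduction.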

Launch Command

vllm serve /llm-models/gemma-4-26B-A4B-it \
  --served-model-name gemma4-26b-a4b-it \
  --host=0.0.0.0 \
  --port 9999 \
  --gpu_memory_utilization 0.9 \
  --tensor-parallel-size 2 \
  --max-model-len 262144 \
  --enable-prefix-caching \
  --enable-auto-tool-choice \
  --reasoning-parser gemma4 \
  --tool-call-parser gemma4

Minimal Reproduction

# pip install openai==1.76.1
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:9999/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="gemma4-26b-a4b-it",
    messages=[{
        "role": "user",
        "content": [{"type": "text", "text": "Calculate the sum of 108.2 and 22.8"}],
    }],
    tools=[{
        "type": "function",
        "function": {
            "name": "add",
            "description": "Calculate the sum of two numbers",
            "parameters": {
                "type": "object",
                "properties": {
                    "left": {"type": "number", "description": "The first number"},
                    "right": {"type": "number", "description": "The second number"},
                },
                "required": ["left", "right"],
            },
        },
    }],
    stream=True,
    temperature=1.0,
    top_p=0.95,
    presence_penalty=1.5,
    extra_body={
        "top_k": 20,
        "chat_template_kwargs": {"enable_thinking": False},
    },
)

# Concatenate the streamed argument fragments of the single expected tool call.
arguments = ""
for chunk in response:
    tc = chunk.choices[0].delta.tool_calls
    if tc and len(tc) == 1:
        # A fragment can be None on some chunks, so fall back to "".
        arguments += tc[0].function.arguments or ""

print(arguments)
# Actual (streaming):   {"left": 108.02, "right": 22.08}
# Expected (streaming): {"left": 108.2,  "right": 22.8}
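The loop in the reproduction only handles chunks that carry exactly one tool call. For requests that may produce several tool calls, a slightly more defensive accumulator can group fragments by tool-call index. This is a minimal sketch, not part of the original report: the function name `accumulate_tool_arguments` is hypothetical, and it assumes chunk objects shaped like the OpenAI SDK's streaming deltas (`choices[0].delta.tool_calls[i].index` and `.function.arguments`).

```python
from collections import defaultdict


def accumulate_tool_arguments(chunks):
    """Concatenate streamed argument fragments per tool-call index.

    Returns a dict mapping tool-call index to the raw argument string,
    exactly as streamed (no parsing, so parser bugs stay visible).
    """
    buffers = defaultdict(str)
    for chunk in chunks:
        if not chunk.choices:  # skip chunks without choices
            continue
        for call in chunk.choices[0].delta.tool_calls or []:
            fragment = call.function.arguments if call.function else None
            if fragment:  # skip None or empty fragments
                buffers[call.index] += fragment
    return dict(buffers)
```

Returning the raw strings keeps the server's output intact, so the result can be compared character by character against the expected JSON before it is passed to `json.loads`.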

Expected vs Actual Behavior

Expected (streaming): {"left": 108.2, "right": 22.8}
Actual (streaming):   {"left": 108.02, "right": 22.08}
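Both strings are valid JSON, so the difference is numeric rather than a formatting artifact. A small comparison such as the sketch below makes that explicit; the helper name `mismatched_arguments` is hypothetical and only uses the standard `json` module.

```python
import json


def mismatched_arguments(expected_json, actual_json):
    """Return {key: (expected, actual)} for every argument whose value differs."""
    expected = json.loads(expected_json)
    actual = json.loads(actual_json)
    return {
        key: (expected.get(key), actual.get(key))
        for key in sorted(expected.keys() | actual.keys())
        if expected.get(key) != actual.get(key)
    }


expected = '{"left": 108.2, "right": 22.8}'
actual = '{"left": 108.02, "right": 22.08}'
print(mismatched_arguments(expected, actual))
# → {'left': (108.2, 108.02), 'right': (22.8, 22.08)}
```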

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
