vllm - [Bug]: When using streaming tool calls in kimi-k2.5, only the content before the tool call can be obtained

GitHub stats
vllm-project/vllm#41182 · Fetched 2026-04-29 06:11:53
Comments: 0 · Participants: 1 · Timeline: 1 · Reactions: 0
Timeline (top): labeled ×1

Your current environment

<details> <summary>The output of <code>python collect_env.py</code></summary>
Your output of `python collect_env.py` here
</details>

🐛 Describe the bug

I deployed kimi-k2.5 across two machines with vLLM v0.18.0. With streaming enabled, the response contains only the content that precedes the tool call; nothing further is returned until the stream ends with stop_reason:'length'. In the debug logs, the delta_text values printed by the tool parser can be concatenated into the complete output, but the text between <|tool_calls_section_begin|> and <|tool_calls_section_end|> is more than thirty thousand characters long, the parser logs 'Not enough token' for almost every delta, and none of that content is returned to the client. The configuration and a log excerpt follow.
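To make the symptom easy to confirm on the client side, here is a minimal sketch that merges OpenAI-style streaming chunks into a single result. It assumes the chunks have already been decoded into dicts with the usual `choices[].delta` layout; the function name `accumulate_stream` and the sample chunks are hypothetical, and the OpenAI-compatible field is called `finish_reason` (the report writes `stop_reason`).

```python
from typing import Any, Dict, Iterable, List


def accumulate_stream(chunks: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge OpenAI-style chat.completion.chunk dicts into one summary."""
    content_parts: List[str] = []
    tool_calls: Dict[int, Dict[str, str]] = {}
    finish_reason = None

    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                content_parts.append(delta["content"])
            # Tool-call fragments arrive keyed by "index" and must be merged.
            for tc in delta.get("tool_calls") or []:
                slot = tool_calls.setdefault(
                    tc["index"], {"id": "", "name": "", "arguments": ""}
                )
                fn = tc.get("function") or {}
                slot["id"] = tc.get("id") or slot["id"]
                slot["name"] = fn.get("name") or slot["name"]
                slot["arguments"] += fn.get("arguments") or ""
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]

    return {
        "content": "".join(content_parts),
        "tool_calls": [tool_calls[i] for i in sorted(tool_calls)],
        "finish_reason": finish_reason,
    }


# Hypothetical chunks shaped like the failure described in this report:
# text arrives, no tool-call deltas follow, and the stream ends on "length".
sample_chunks = [
    {"choices": [{"delta": {"content": "Chapter 1 "}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "and Chapter 2."}, "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "length"}]},
]
summary = accumulate_stream(sample_chunks)
print(summary)
```

A result with an empty `tool_calls` list and a `finish_reason` of `length` would match the behavior reported here.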

---------------------------------------------------------configuration----------------------------------------------------
export HCCL_IF_IP=<IP>
export GLOO_SOCKET_IFNAME="bond0"
export TP_SOCKET_IFNAME="bond0"
export HCCL_SOCKET_IFNAME="bond0"
export HCCL_INTRA_PCIE_ENABLE=1
export HCCL_INTRA_ROCE_ENABLE=0
export OMP_PROC_BIND=false
export OMP_NUM_THREADS=5
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
export VLLM_USE_V1=1
export HCCL_BUFFSIZE=1024
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
sysctl -w vm.swappiness=0
sysctl -w kernel.numa_balancing=0
sysctl -w kernel.sched_migration_cost_ns=50000
export HCCL_OP_EXPANSION_MODE="AIV"

vllm serve /mnt/sdc/Kimi-K2.5/weight/Kimi-K2.5-w4a8 \
  --host 0.0.0.0 \
  --port 8088 \
  --seed 1024 \
  --served-model-name kimi_k2.5 \
  --allowed-local-media-path / \
  --quantization ascend \
  --trust-remote-code \
  --tensor-parallel-size 8 \
  --data-parallel-size 2 \
  --data-parallel-size-local 1 \
  --data-parallel-start-rank 0 \
  --data-parallel-address <IP> \
  --data-parallel-rpc-port <port> \
  --enable-expert-parallel \
  --async-scheduling \
  --mm-encoder-tp-mode 'data' \
  --mm_processor_cache_type="shm" \
  --max-num-seqs 128 \
  --max-model-len 65536 \
  --max-num-batched-tokens 8192 \
  --gpu-memory-utilization 0.9 \
  --compilation-config '{"cudagraph_capture_sizes":[1,2,4,8,16,32,64,128,196], "cudagraph_mode":"FULL_DECODE_ONLY"}' \
  --additional-config '{"multistream_overlap_shared_expert":true}'

--------------------------------------------------some results -------------------------------------------------
....................
(APIServer pid=282) DEBUG 04-28 07:40:13 [tool_parsers/kimi_k2_tool_parser.py:211] delta_text: 第一章
(APIServer pid=282) DEBUG 04-28 07:40:13 [tool_parsers/kimi_k2_tool_parser.py:212] delta_token_ids: [42969]
(APIServer pid=282) DEBUG 04-28 07:40:13 [tool_parsers/kimi_k2_tool_parser.py:281] No tool call tokens found!
(APIServer pid=282) DEBUG 04-28 07:40:14 [tool_parsers/kimi_k2_tool_parser.py:211] delta_text: 和
(APIServer pid=282) DEBUG 04-28 07:40:14 [tool_parsers/kimi_k2_tool_parser.py:212] delta_token_ids: [488]
(APIServer pid=282) DEBUG 04-28 07:40:14 [tool_parsers/kimi_k2_tool_parser.py:281] No tool call tokens found!
(APIServer pid=282) DEBUG 04-28 07:40:14 [tool_parsers/kimi_k2_tool_parser.py:211] delta_text: 第二章
(APIServer pid=282) DEBUG 04-28 07:40:14 [tool_parsers/kimi_k2_tool_parser.py:212] delta_token_ids: [44754]
(APIServer pid=282) DEBUG 04-28 07:40:14 [tool_parsers/kimi_k2_tool_parser.py:281] No tool call tokens found!
(APIServer pid=282) DEBUG 04-28 07:40:14 [tool_parsers/kimi_k2_tool_parser.py:211] delta_text: 。
(APIServer pid=282) DEBUG 04-28 07:40:14 [tool_parsers/kimi_k2_tool_parser.py:212] delta_token_ids: [292]
(APIServer pid=282) DEBUG 04-28 07:40:14 [tool_parsers/kimi_k2_tool_parser.py:281] No tool call tokens found!
(APIServer pid=282) DEBUG 04-28 07:40:14 [tool_parsers/kimi_k2_tool_parser.py:211] delta_text: <|tool_calls_section_begin|>
(APIServer pid=282) DEBUG 04-28 07:40:14 [tool_parsers/kimi_k2_tool_parser.py:212] delta_token_ids: [163595]
(APIServer pid=282) DEBUG 04-28 07:40:14 [tool_parsers/kimi_k2_tool_parser.py:239] Entering tool section
(APIServer pid=282) DEBUG 04-28 07:40:14 [tool_parsers/kimi_k2_tool_parser.py:331] In tool section before first tool, suppressing:
(APIServer pid=282) DEBUG 04-28 07:40:14 [tool_parsers/kimi_k2_tool_parser.py:211] delta_text: <|tool_call_begin|>
(APIServer pid=282) DEBUG 04-28 07:40:14 [tool_parsers/kimi_k2_tool_parser.py:212] delta_token_ids: [163597]
(APIServer pid=282) DEBUG 04-28 07:40:14 [tool_parsers/kimi_k2_tool_parser.py:370] Starting on a new tool 0
(APIServer pid=282) DEBUG 04-28 07:40:14 [tool_parsers/kimi_k2_tool_parser.py:211] delta_text: 1
(APIServer pid=282) DEBUG 04-28 07:40:14 [tool_parsers/kimi_k2_tool_parser.py:212] delta_token_ids: [16]
(APIServer pid=282) DEBUG 04-28 07:40:14 [tool_parsers/kimi_k2_tool_parser.py:466] Not enough token
(APIServer pid=282) DEBUG 04-28 07:40:14 [tool_parsers/kimi_k2_tool_parser.py:211] delta_text: c
(APIServer pid=282) DEBUG 04-28 07:40:14 [tool_parsers/kimi_k2_tool_parser.py:212] delta_token_ids: [66]
(APIServer pid=282) DEBUG 04-28 07:40:14 [tool_parsers/kimi_k2_tool_parser.py:466] Not enough token
..........................
(APIServer pid=282) DEBUG 04-28 07:43:34 [tool_parsers/kimi_k2_tool_parser.py:211] delta_text: <|tool_calls_section_end|>
(APIServer pid=282) DEBUG 04-28 07:43:34 [tool_parsers/kimi_k2_tool_parser.py:212] delta_token_ids: [163596]
(APIServer pid=282) DEBUG 04-28 07:43:34 [tool_parsers/kimi_k2_tool_parser.py:337] Generating text content! skipping tool parsing.
(APIServer pid=282) DEBUG 04-28 07:43:34 [tool_parsers/kimi_k2_tool_parser.py:211] delta_text:
(APIServer pid=282) DEBUG 04-28 07:43:34 [tool_parsers/kimi_k2_tool_parser.py:212] delta_token_ids: [163586]
(APIServer pid=282) DEBUG 04-28 07:43:34 [tool_parsers/kimi_k2_tool_parser.py:337] Generating text content! skipping tool parsing.
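The report says the logged delta_text values can be concatenated into the full output. A rough sketch of that reconstruction is below. It assumes the log has one record per line and that no delta_text value contains a newline; the function name `summarize_log` and the sample lines are illustrative, not part of vLLM.

```python
import re
from typing import Dict, List

# Matches the "delta_text:" debug records emitted by the tool parser.
DELTA_RE = re.compile(r"kimi_k2_tool_parser\.py:\d+\] delta_text: ?(.*)$")
SECTION_BEGIN = "<|tool_calls_section_begin|>"
SECTION_END = "<|tool_calls_section_end|>"


def summarize_log(lines: List[str]) -> Dict[str, object]:
    """Rebuild the streamed text and measure the tool-call section."""
    deltas: List[str] = []
    not_enough = 0
    for raw in lines:
        line = raw.rstrip("\n")
        match = DELTA_RE.search(line)
        if match:
            deltas.append(match.group(1))
        elif "Not enough token" in line:
            not_enough += 1

    full = "".join(deltas)
    begin = full.find(SECTION_BEGIN)
    before = full if begin == -1 else full[:begin]
    section = ""
    if begin != -1:
        start = begin + len(SECTION_BEGIN)
        end = full.find(SECTION_END, start)
        # If the end marker never appears, take everything after the start.
        section = full[start:] if end == -1 else full[start:end]

    return {
        "before_section": before,
        "section": section,
        "section_chars": len(section),
        "not_enough_token": not_enough,
    }


prefix = "(APIServer pid=282) DEBUG 04-28 07:40:14 [tool_parsers/kimi_k2_tool_parser.py:"
sample_log = [
    prefix + "211] delta_text: 。",
    prefix + "211] delta_text: <|tool_calls_section_begin|>",
    prefix + "239] Entering tool section",
    prefix + "211] delta_text: <|tool_call_begin|>",
    prefix + "211] delta_text: 1",
    prefix + "466] Not enough token",
    prefix + "211] delta_text: c",
    prefix + "466] Not enough token",
    prefix + "211] delta_text: <|tool_calls_section_end|>",
]
report = summarize_log(sample_log)
print(report["section_chars"], report["not_enough_token"])
```

Running this over the complete log would give the exact section length and the number of suppressed deltas, both of which are useful to attach to the issue.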

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

TL;DR

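The logs point at the streaming tool parser rather than the server's token limits: every delta inside the tool-call section is suppressed with 'Not enough token', so the client receives nothing after the leading text until generation stops with stop_reason:'length'.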

Guidance

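  • The deployment already sets --max-model-len 65536 and --max-num-batched-tokens 8192. Neither setting controls how the tool parser streams output; when interpreting stop_reason:'length', also check the max_tokens value sent with the request.
  • Investigate tool_parsers/kimi_k2_tool_parser.py around line 466 to determine why every delta after <|tool_call_begin|> is logged as 'Not enough token' and nothing is emitted. In the excerpt, the first deltas after that marker are "1" and "c".
  • HCCL_BUFFSIZE is a setting of the HCCL communication library and is unlikely to affect tool-call parsing, which runs in the API server process.
  • The vllm serve command as pasted shows no --enable-auto-tool-choice or --tool-call-parser flag, although the logs show the kimi_k2 tool parser running. Confirm the full launch command and include it in the report.
  • Send the same request with streaming disabled to check whether the tool call is parsed correctly in the non-streaming path.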
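One way to narrow this down is to check whether the reconstructed tool-call section has the shape a parser would expect. The sketch below is an assumption-heavy illustration: the `<|tool_call_argument_begin|>` and `<|tool_call_end|>` markers and the `functions.<name>:<index>` ID shape are taken from Kimi K2's published tool-call format, not from this issue, and the regular expressions are not copied from vLLM's parser.

```python
import re
from typing import List, Tuple

# Assumed marker tokens; only <|tool_call_begin|> appears in the issue's logs.
CALL_RE = re.compile(
    r"<\|tool_call_begin\|>(?P<call_id>.*?)"
    r"<\|tool_call_argument_begin\|>(?P<args>.*?)"
    r"<\|tool_call_end\|>",
    re.DOTALL,
)
# Assumed ID shape: functions.<name>:<index>
ID_RE = re.compile(r"^functions\.[\w.\-]+:\d+$")


def inspect_section(section: str) -> List[Tuple[str, bool]]:
    """Return (call_id, id_looks_valid) for each complete tool call found."""
    results: List[Tuple[str, bool]] = []
    for match in CALL_RE.finditer(section):
        call_id = match.group("call_id").strip()
        results.append((call_id, bool(ID_RE.match(call_id))))
    return results


# Hypothetical inputs: a well-formed call, and a section that starts like
# the one in the logs ("1", "c") and never reaches the argument marker.
well_formed = (
    "<|tool_call_begin|>functions.get_weather:0"
    '<|tool_call_argument_begin|>{"city": "Paris"}<|tool_call_end|>'
)
truncated = "<|tool_call_begin|>1c"

print(inspect_section(well_formed))
print(inspect_section(truncated))
```

An empty result for the real section would suggest the model output never formed a complete call, while a complete call with an unexpected ID would point at the ID format instead.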

Example

No fix can be shown without modifying the existing tool_parsers/kimi_k2_tool_parser.py logic, whose source is not included in the issue.

Notes

The log excerpt suggests that the parsing logic is a more likely cause than any token limit. However, the issue includes neither the source of tool_parsers/kimi_k2_tool_parser.py nor the full raw model output, so a specific fix cannot be confirmed from the report alone.

Recommendation

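Raising --max-num-batched-tokens from 8192 to 16384 or 32768 can be tried, but it is unlikely to resolve the issue on its own. Do not apply those values to --max-model-len: it is already 65536, so they would lower it. A workaround worth testing is to disable streaming for tool-calling requests until the parser behavior is understood, and to attach the concatenated tool-call section to the issue so the output format can be checked.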
