vllm - ✅(Solved) Fix [Bug]: Kimi-K2.5 chat completion doesn't return any reasoning content [1 pull requests, 1 comments, 2 participants]

GitHub stats
vllm-project/vllm#37397 · Fetched 2026-04-08 00:57:44
Comments: 1
Participants: 2
Timeline: 6
Reactions: 0
Timeline (top): closed ×1, commented ×1, cross-referenced ×1, labeled ×1

Fix Action

Fixed

PR fix notes

PR #37438: [Bugfix] Add Kimi-K2.5 reasoning/tool parser aliases and tool_call_id support

Description (problem / solution / changelog)

Summary

Fixes https://github.com/vllm-project/vllm/issues/37397

Kimi-K2.5 (model_type: kimi_k25) reuses the same <think>/</think> reasoning format as Kimi-K2, but vLLM had several gaps:

  • Only kimi_k2 was registered as a reasoning parser name — no kimi_k25 alias, so K2.5 got no reasoning parsing
  • The tool_call_id_type detection only checked for model_type == "kimi_k2", so K2.5 got random IDs instead of Kimi-format IDs
  • Same issue in the Responses API serving endpoint
  • No kimi_k25 tool parser alias existed
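The alias and ID-type gaps listed above can be illustrated with a toy lookup. This is a minimal sketch, not vLLM's code: `PARSER_ALIASES`, `resolve_parser`, `tool_call_id_type`, and `make_tool_call_id` are hypothetical names, and the `functions.<name>:<index>` ID shape is an assumption about the Kimi format rather than something the PR notes state.

```python
import uuid
from typing import Optional

# Hypothetical name -> parser-class-name table; not vLLM's actual registry.
PARSER_ALIASES = {
    "kimi_k2": "KimiK2ReasoningParser",
    "kimi_k25": "KimiK2ReasoningParser",  # the alias the PR describes adding
}

# Model types that should receive Kimi-format tool-call IDs.
KIMI_MODEL_TYPES = {"kimi_k2", "kimi_k25"}


def resolve_parser(name: str) -> Optional[str]:
    """Return the parser registered under `name`, or None if no alias exists."""
    return PARSER_ALIASES.get(name)


def tool_call_id_type(model_type: str) -> str:
    """Pick the ID style; testing only for 'kimi_k2' would send K2.5 to 'random'."""
    return "kimi_k2" if model_type in KIMI_MODEL_TYPES else "random"


def make_tool_call_id(id_type: str, func_name: str, index: int) -> str:
    """Build a tool-call ID. The Kimi shape used here is an assumed format."""
    if id_type == "kimi_k2":
        return f"functions.{func_name}:{index}"
    return f"chatcmpl-tool-{uuid.uuid4().hex}"
```

With a membership test instead of an equality test, adding a future model type is a one-line change to `KIMI_MODEL_TYPES`.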

Changes

  • vllm/reasoning/__init__.py — Register kimi_k25 alias → KimiK2ReasoningParser
  • vllm/tool_parsers/__init__.py — Register kimi_k25 alias → KimiK2ToolParser
  • vllm/entrypoints/openai/chat_completion/serving.py — Extend tool_call_id_type check to include kimi_k25
  • vllm/entrypoints/openai/responses/serving.py — Same fix for Responses API
  • tests/reasoning/test_kimi_k2_reasoning_parser.py — 8 unit tests covering parser selection, non-streaming extraction, streaming reasoning/content split, and tool section handling
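The non-streaming extraction that the tests are said to cover can be sketched as a plain string split on the `<think>`/`</think>` markers. This is an illustration only, not KimiK2ReasoningParser: `split_reasoning` is a hypothetical helper, and its handling of a missing opening or closing tag is an assumption, since the PR notes do not describe those cases.

```python
from typing import Optional, Tuple

THINK_START = "<think>"
THINK_END = "</think>"


def split_reasoning(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Split model output into (reasoning, content) around the think tags."""
    if THINK_END not in text:
        if THINK_START in text:
            # Unterminated reasoning: treat everything after the tag as reasoning.
            return text.partition(THINK_START)[2], None
        # No tags at all: the whole output is ordinary content.
        return None, text
    before, _, after = text.partition(THINK_END)
    if THINK_START in before:
        # Drop the opening tag and anything preceding it.
        before = before.partition(THINK_START)[2]
    # An empty remainder means the model produced reasoning only.
    return before, (after or None)
```

A streaming variant would need to buffer partial tags across chunks, which this sketch does not attempt.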

Why this is not duplicating an existing PR

No open PRs address issue #37397. Verified via:

gh pr list --repo vllm-project/vllm --state open --search "37397 in:body"
gh pr list --repo vllm-project/vllm --state open --search "kimi_k25"

Test plan

  • pytest tests/reasoning/test_kimi_k2_reasoning_parser.py -v -s — 8/8 passed
  • pre-commit run --all-files — all hooks passed

AI assistance was used (Claude).

Changed files

  • tests/reasoning/test_kimi_k2_reasoning_parser.py (added, +155/-0)
  • vllm/entrypoints/chat_utils.py (modified, +14/-0)
  • vllm/entrypoints/openai/chat_completion/serving.py (modified, +2/-9)
  • vllm/entrypoints/openai/responses/serving.py (modified, +2/-9)


Your current environment

vllm-openai:v0.17.1 running with

 non-default args: {'enable_auto_tool_choice': True, 'tool_call_parser': 'kimi_k2', 'host': '0.0.0.0', 'trust_remote_code': True, 'served_model_name': ['moonshotai/Kimi-K2.5'], 'load_format': 'runai_streamer', 'reasoning_parser': 'kimi_k2', 'tensor_parallel_size': 4, 'gpu_memory_utilization': 0.95, 'enable_prefix_caching': True, 'mm_encoder_tp_mode': 'data'}

🐛 Describe the bug

When using Kimi-K2.5 with vLLM's kimi_k2 reasoning parser, streaming responses contain only delta.content and no delta.reasoning, even with thinking=true.

Reproduction


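import requests

URL = "http://localhost/v1/chat/completions"
# Assumed headers; the report does not show how `headers` was defined.
headers = {"Content-Type": "application/json"}
payload = {
    "model": "moonshotai/Kimi-K2.5",
    "messages": [{"role": "user", "content": "What is a mutex?"}],
    "stream": True,
    "extra_body": {"chat_template_kwargs": {"thinking": True}}
}

# Stream the response and print each non-empty server-sent event line.
with requests.post(URL, json=payload, stream=True, headers=headers) as r:
    r.raise_for_status()
    for line in r.iter_lines():
        if not line:
            continue
        print(line)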

Output:

$ uv run stream.py 
b'data: {"id":"chatcmpl-ae276bc67b7c5185","created":1773821726,"model":"moonshotai/Kimi-K2.5","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" A","role":"assistant"}}]}'
b'data: {"id":"chatcmpl-ae276bc67b7c5185","created":1773821726,"model":"moonshotai/Kimi-K2.5","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" **"}}]}'
b'data: {"id":"chatcmpl-ae276bc67b7c5185","created":1773821726,"model":"moonshotai/Kimi-K2.5","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"mutex"}}]}'
b'data: {"id":"chatcmpl-ae276bc67b7c5185","created":1773821726,"model":"moonshotai/Kimi-K2.5","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"**"}}]}'
b'data: {"id":"chatcmpl-ae276bc67b7c5185","created":1773821726,"model":"moonshotai/Kimi-K2.5","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" ("}}]}'
b'data: {"id":"chatcmpl-ae276bc67b7c5185","created":1773821726,"model":"moonshotai/Kimi-K2.5","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"short"}}]}'
....
b'data: {"id":"chatcmpl-ae276bc67b7c5185","created":1773821726,"model":"moonshotai/Kimi-K2.5","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" to"}}]}'
b'data: {"id":"chatcmpl-ae276bc67b7c5185","created":1773821726,"model":"moonshotai/Kimi-K2.5","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" a"}}]}'
b'data: {"id":"chatcmpl-ae276bc67b7c5185","created":1773821726,"model":"moonshotai/Kimi-K2.5","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" resource"}}]}'
b'data: {"id":"chatcmpl-ae276bc67b7c5185","created":1773821726,"model":"moonshotai/Kimi-K2.5","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" pool"}}]}'
b'data: {"id":"chatcmpl-ae276bc67b7c5185","created":1773821726,"model":"moonshotai/Kimi-K2.5","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"."}}]}'
b'data: {"id":"chatcmpl-ae276bc67b7c5185","created":1773821726,"model":"moonshotai/Kimi-K2.5","object":"chat.completion.chunk","choices":[{"finish_reason":"stop","index":0,"delta":{}}]}'
b'data: [DONE]'

Similarly, in non-streaming mode, the resulting completion message contains only the content field.
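Scanning a long stream by eye for a missing field is error-prone, so a small tally of the keys seen in each `delta` can help. The following is a minimal sketch that assumes the `data: {...}` line format shown in the output above; `count_delta_fields` is a hypothetical helper name, and the sample lines are trimmed versions of the reported chunks.

```python
import json
from collections import Counter
from typing import Iterable


def count_delta_fields(lines: Iterable[bytes]) -> Counter:
    """Count which keys appear in `delta` across SSE `data:` lines."""
    counts: Counter = Counter()
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and anything that is not a data event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            counts.update(choice.get("delta", {}).keys())
    return counts


sample = [
    b'data: {"choices":[{"index":0,"delta":{"content":" A","role":"assistant"}}]}',
    b'data: {"choices":[{"index":0,"delta":{"content":"mutex"}}]}',
    b'data: {"choices":[{"finish_reason":"stop","index":0,"delta":{}}]}',
    b'data: [DONE]',
]
counts = count_delta_fields(sample)
print(counts["content"], counts["reasoning"])  # prints "2 0"
```

A `reasoning` count of zero on a request with thinking enabled matches the symptom described in this report.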


extent analysis

Fix Plan

Per the PR fix notes above, the missing delta.reasoning in streaming responses is addressed by PR #37438. The PR description lists alias registrations and a wider model_type check; it does not describe changing how the kimi_k2 parser builds deltas.

Here are the steps to follow:

  • Upgrade to a vLLM build that includes PR #37438.
  • Select the kimi_k25 reasoning and tool parser aliases that the PR description says it registers.

Code Changes

The PR description lists no user-side code changes. The upstream changes are the ones under Changes in the PR fix notes: a kimi_k25 alias for KimiK2ReasoningParser and KimiK2ToolParser, and a tool_call_id_type check that also accepts kimi_k25.

Configuration Changes

# In the vLLM server arguments (assumes a build that includes PR #37438)
non_default_args = {
    # ... existing configuration ...
    'reasoning_parser': 'kimi_k25',  # alias described in the PR
    'tool_call_parser': 'kimi_k25',  # alias described in the PR
    # ... existing configuration ...
}

Verification

To verify the fix, rerun the streaming request from the reproduction against the updated server. Chunks that carry reasoning should now include a delta.reasoning field.

# Test the streaming API
import requests

URL = "http://localhost/v1/chat/completions"
# Assumed headers; the report does not show how `headers` was defined.
headers = {"Content-Type": "application/json"}
payload = {
    "model": "moonshotai/Kimi-K2.5",
    "messages": [{"role": "user", "content": "What is a mutex?"}],
    "stream": True,
    "extra_body": {"chat_template_kwargs": {"thinking": True}}
}

with requests.post(URL, json=payload, stream=True, headers=headers) as r:
    r.raise_for_status()
    for line in r.iter_lines():
        if not line:
            continue
        print(line)

An illustrative chunk shape follows; the id and reasoning text are placeholders, not captured output:

b'data: {"id":"chatcmpl-<id>","model":"moonshotai/Kimi-K2.5","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"reasoning":"<reasoning text>"}}]}'
