vllm - ✅(Solved) Fix [Bug]: Kimi-K2.5 chat completion doesn't return any reasoning content [1 pull request, 1 comment, 2 participants]

vllm · 2026-03-18 08:20:31

GitHub stats

vllm-project/vllm#37397•Fetched 2026-04-08 00:57:44

Author

wizche

Participants

chaunceyjiang

wizche

Timeline (top)

closed ×1, commented ×1, cross-referenced ×1, labeled ×1

Fix Action

Fixed

Fixed by PR: [Bugfix] Add Kimi-K2.5 reasoning/tool parser aliases and tool_call_id support (https://github.com/vllm-project/vllm/pull/37438)

PR fix notes

PR #37438: [Bugfix] Add Kimi-K2.5 reasoning/tool parser aliases and tool_call_id support

Repository: vllm-project/vllm
Author: DorBernsohn
State: closed | merged: True
Link: https://github.com/vllm-project/vllm/pull/37438

Description (problem / solution / changelog)

Summary

Fixes https://github.com/vllm-project/vllm/issues/37397

Kimi-K2.5 (model_type: kimi_k25) reuses the same <think>/</think> reasoning format as Kimi-K2, but vLLM had several gaps:

Only kimi_k2 was registered as a reasoning parser name — no kimi_k25 alias, so K2.5 got no reasoning parsing
The tool_call_id_type detection only checked for model_type == "kimi_k2", so K2.5 got random IDs instead of Kimi-format IDs
Same issue in the Responses API serving endpoint
No kimi_k25 tool parser alias existed
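
To make the tool_call_id gap concrete, here is a minimal sketch of the kind of branch involved. The function name make_tool_call_id_sketch, the set KIMI_MODEL_TYPES, the functions.<name>:<index> ID shape, and the call_ prefix are all assumptions for illustration, not vLLM's actual code.

```python
import uuid

# Hypothetical set of model types that should receive Kimi-style IDs.
KIMI_MODEL_TYPES = {"kimi_k2", "kimi_k25"}


def make_tool_call_id_sketch(model_type: str, func_name: str, index: int) -> str:
    """Return a Kimi-style ID for Kimi models, otherwise a random ID."""
    if model_type in KIMI_MODEL_TYPES:
        # Assumed Kimi format: functions.<function name>:<call index>
        return f"functions.{func_name}:{index}"
    # Fallback: an opaque random ID (the exact prefix here is illustrative).
    return f"call_{uuid.uuid4().hex}"


print(make_tool_call_id_sketch("kimi_k25", "get_weather", 0))
```

If the membership test only contained "kimi_k2", a model reporting kimi_k25 would fall through to the random branch, which matches the behavior the PR description reports.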

Changes

vllm/reasoning/__init__.py — Register kimi_k25 alias → KimiK2ReasoningParser
vllm/tool_parsers/__init__.py — Register kimi_k25 alias → KimiK2ToolParser
vllm/entrypoints/openai/chat_completion/serving.py — Extend tool_call_id_type check to include kimi_k25
vllm/entrypoints/openai/responses/serving.py — Same fix for Responses API
tests/reasoning/test_kimi_k2_reasoning_parser.py — 8 unit tests covering parser selection, non-streaming extraction, streaming reasoning/content split, and tool section handling
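
The alias registration described above can be pictured with a toy registry. This is a sketch under stated assumptions: REASONING_PARSERS, register, get_parser, and KimiK2ReasoningParserSketch are hypothetical names, and vLLM's real registration mechanism may look different.

```python
class KimiK2ReasoningParserSketch:
    """Stand-in for a real parser class; it does no parsing here."""


# Hypothetical name -> class registry.
REASONING_PARSERS: dict[str, type] = {}


def register(names: list[str], parser_cls: type) -> None:
    """Register one parser class under every name in `names`."""
    for name in names:
        REASONING_PARSERS[name] = parser_cls


def get_parser(name: str) -> type:
    """Look up a parser class, failing loudly for unknown names."""
    try:
        return REASONING_PARSERS[name]
    except KeyError:
        raise KeyError(f"no reasoning parser registered as {name!r}") from None


# Both names resolve to the same class once the alias is registered.
register(["kimi_k2", "kimi_k25"], KimiK2ReasoningParserSketch)
assert get_parser("kimi_k2") is get_parser("kimi_k25")
```

The point of an alias is that no new parser logic is needed: the second name simply resolves to the existing class.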

Why this is not duplicating an existing PR

No open PRs address issue #37397. Verified via:

gh pr list --repo vllm-project/vllm --state open --search "37397 in:body"
gh pr list --repo vllm-project/vllm --state open --search "kimi_k25"

Test plan

pytest tests/reasoning/test_kimi_k2_reasoning_parser.py -v -s — 8/8 passed
pre-commit run --all-files — all hooks passed
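
For readers unfamiliar with what a "streaming reasoning/content split" test exercises, the following is a minimal sketch. StreamingSplitterSketch is a hypothetical toy, not the parser under test, and it assumes each tag arrives as its own delta and that the stream starts in reasoning mode.

```python
from typing import Optional

THINK_START, THINK_END = "<think>", "</think>"


class StreamingSplitterSketch:
    """Route each text delta to 'reasoning' or 'content' (toy example)."""

    def __init__(self, start_in_reasoning: bool = True) -> None:
        self.in_reasoning = start_in_reasoning

    def feed(self, delta: str) -> Optional[tuple[str, str]]:
        """Return (field, text) for one delta, or None for a bare tag."""
        if delta == THINK_START:
            self.in_reasoning = True
            return None
        if delta == THINK_END:
            self.in_reasoning = False
            return None
        field = "reasoning" if self.in_reasoning else "content"
        return field, delta


def test_streaming_split() -> None:
    splitter = StreamingSplitterSketch()
    deltas = ["<think>", "Locks ", "first.", "</think>", "A mutex ", "is a lock."]
    routed = [r for r in map(splitter.feed, deltas) if r is not None]
    reasoning = "".join(text for field, text in routed if field == "reasoning")
    content = "".join(text for field, text in routed if field == "content")
    assert reasoning == "Locks first."
    assert content == "A mutex is a lock."


test_streaming_split()
```

A real parser also has to cope with tags split across deltas and with tool-call sections, which this sketch deliberately ignores.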

AI assistance was used (Claude).

Changed files

tests/reasoning/test_kimi_k2_reasoning_parser.py (added, +155/-0)
vllm/entrypoints/chat_utils.py (modified, +14/-0)
vllm/entrypoints/openai/chat_completion/serving.py (modified, +2/-9)
vllm/entrypoints/openai/responses/serving.py (modified, +2/-9)

Your current environment

vllm-openai:v0.17.1 running with

 non-default args: {'enable_auto_tool_choice': True, 'tool_call_parser': 'kimi_k2', 'host': '0.0.0.0', 'trust_remote_code': True, 'served_model_name': ['moonshotai/Kimi-K2.5'], 'load_format': 'runai_streamer', 'reasoning_parser': 'kimi_k2', 'tensor_parallel_size': 4, 'gpu_memory_utilization': 0.95, 'enable_prefix_caching': True, 'mm_encoder_tp_mode': 'data'}

🐛 Describe the bug

When using Kimi-K2.5 with vLLM's kimi_k2 reasoning parser, streaming responses contain only delta.content and no delta.reasoning, even with thinking=true.

Reproduction


import requests

URL = "http://localhost/v1/chat/completions"
headers = {}  # not shown in the report; add e.g. an Authorization header if needed
payload = {
    "model": "moonshotai/Kimi-K2.5",
    "messages": [{"role": "user", "content": "What is a mutex?"}],
    "stream": True,
    "extra_body": {"chat_template_kwargs": {"thinking": True}}
}

with requests.post(URL, json=payload, stream=True, headers=headers) as r:
    r.raise_for_status()
    for line in r.iter_lines():
        if not line:
            continue
        print(line)

Output:

$ uv run stream.py 
b'data: {"id":"chatcmpl-ae276bc67b7c5185","created":1773821726,"model":"moonshotai/Kimi-K2.5","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" A","role":"assistant"}}]}'
b'data: {"id":"chatcmpl-ae276bc67b7c5185","created":1773821726,"model":"moonshotai/Kimi-K2.5","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" **"}}]}'
b'data: {"id":"chatcmpl-ae276bc67b7c5185","created":1773821726,"model":"moonshotai/Kimi-K2.5","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"mutex"}}]}'
b'data: {"id":"chatcmpl-ae276bc67b7c5185","created":1773821726,"model":"moonshotai/Kimi-K2.5","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"**"}}]}'
b'data: {"id":"chatcmpl-ae276bc67b7c5185","created":1773821726,"model":"moonshotai/Kimi-K2.5","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" ("}}]}'
b'data: {"id":"chatcmpl-ae276bc67b7c5185","created":1773821726,"model":"moonshotai/Kimi-K2.5","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"short"}}]}'
....
b'data: {"id":"chatcmpl-ae276bc67b7c5185","created":1773821726,"model":"moonshotai/Kimi-K2.5","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" to"}}]}'
b'data: {"id":"chatcmpl-ae276bc67b7c5185","created":1773821726,"model":"moonshotai/Kimi-K2.5","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" a"}}]}'
b'data: {"id":"chatcmpl-ae276bc67b7c5185","created":1773821726,"model":"moonshotai/Kimi-K2.5","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" resource"}}]}'
b'data: {"id":"chatcmpl-ae276bc67b7c5185","created":1773821726,"model":"moonshotai/Kimi-K2.5","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" pool"}}]}'
b'data: {"id":"chatcmpl-ae276bc67b7c5185","created":1773821726,"model":"moonshotai/Kimi-K2.5","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"."}}]}'
b'data: {"id":"chatcmpl-ae276bc67b7c5185","created":1773821726,"model":"moonshotai/Kimi-K2.5","object":"chat.completion.chunk","choices":[{"finish_reason":"stop","index":0,"delta":{}}]}'
b'data: [DONE]'
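
Reading the raw stream by eye is tedious, so a small helper that tallies which keys appear in each delta can confirm the symptom. This is a sketch: tally_delta_fields is a hypothetical helper, and the sample lines are abbreviated copies of the output above with the id, created, model, and object fields trimmed.

```python
import json
from collections import Counter


def tally_delta_fields(sse_lines: list[bytes]) -> Counter:
    """Count how many chunks carry each key in choices[].delta."""
    counts: Counter = Counter()
    for raw in sse_lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            counts.update(choice.get("delta", {}).keys())
    return counts


# Abbreviated lines from the captured output.
sample = [
    b'data: {"choices":[{"index":0,"delta":{"content":" A","role":"assistant"}}]}',
    b'data: {"choices":[{"index":0,"delta":{"content":" **"}}]}',
    b'data: {"choices":[{"finish_reason":"stop","index":0,"delta":{}}]}',
    b"data: [DONE]",
]
print(tally_delta_fields(sample))
```

For the sample, the tally contains content and role but no reasoning key, which is exactly the reported bug.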

Similarly, in non-streaming mode, the resulting completion message contains only the content field.
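
The non-streaming case can be checked the same way. The sketch below assumes the reasoning text would appear on the message under a key named reasoning (the name used in this report) or reasoning_content (a name some servers use); extract_reasoning and the sample dictionary are hypothetical and hand-written, not captured output.

```python
from typing import Optional


def extract_reasoning(completion: dict) -> Optional[str]:
    """Return the first choice's reasoning text, or None if absent."""
    choices = completion.get("choices") or []
    if not choices:
        return None
    message = choices[0].get("message", {})
    # Field names are assumptions; check whichever your server emits.
    return message.get("reasoning") or message.get("reasoning_content")


# Hand-written stand-in for a response that shows the bug.
buggy = {"choices": [{"message": {"role": "assistant", "content": "A mutex is..."}}]}
assert extract_reasoning(buggy) is None
```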


extent analysis

Fix Plan

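The missing delta.reasoning field was fixed upstream by PR #37438, so the fix is to run a vLLM build that includes that PR rather than to patch the kimi_k2 parser by hand. Per the PR description, the change registers a kimi_k25 alias for the existing Kimi-K2 reasoning and tool parsers and extends the tool_call_id_type check to cover kimi_k25.

Here are the steps to follow:

Upgrade to a vLLM build that includes PR #37438 (the issue was reported against vllm-openai:v0.17.1).
Keep a Kimi reasoning parser configured (kimi_k2, or the kimi_k25 alias that the PR description says it registers) and restart the server.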

Code Changes

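# Illustrative sketch only; this is not vLLM's KimiK2ReasoningParser.
# It shows the split a reasoning parser performs on a complete output.
def split_reasoning(text, start="<think>", end="</think>"):
    """Return (reasoning, content) for one complete model output."""
    if end not in text:
        # No closing tag found: treat the whole output as content.
        return None, text
    reasoning, _, content = text.partition(end)
    # The opening tag may be absent if the chat template already emitted it.
    reasoning = reasoning.replace(start, "", 1)
    return reasoning.strip(), content.strip()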

Configuration Changes

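# In the vLLM server arguments (as reported in the issue)
non_default_args = {
    # ... existing configuration ...
    'reasoning_parser': 'kimi_k2',  # or 'kimi_k25' if your build registers that alias (see PR #37438)
    'tool_call_parser': 'kimi_k2',  # the same alias note applies
    # ... existing configuration ...
}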

Verification

To verify the fix, re-run the streaming request against a server running a build that includes PR #37438. The output should now include the delta.reasoning field.

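# Test the streaming API
import requests

URL = "http://localhost/v1/chat/completions"
headers = {}  # assumption: add an Authorization header here if your server needs one
payload = {
    "model": "moonshotai/Kimi-K2.5",
    "messages": [{"role": "user", "content": "What is a mutex?"}],
    "stream": True,
    # Over raw HTTP, vLLM-specific fields sit at the top level of the body;
    # "extra_body" is an argument of the OpenAI Python client, not a request field.
    "chat_template_kwargs": {"thinking": True},
}

with requests.post(URL, json=payload, stream=True, headers=headers) as r:
    r.raise_for_status()
    for line in r.iter_lines():
        if not line:
            continue
        print(line)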

Chunks that carry the model's thinking should now include delta.reasoning, typically ahead of the chunks that carry delta.content. An illustrative chunk shape (not captured output; the ID and text are elided):

b'data: {"id":"chatcmpl-...","object":"chat.completion.chunk","model":"moonshotai/Kimi-K2.5","choices":[{"index":0,"delta":{"reasoning":"..."}}]}'
