vllm - 💡(How to fix) Fix [Bug]: Kimi-K2.6 intermittently outputs only "!!!!!!!!!!" in reasoning field with content null

StepCodex · 2026-05-12T13:31:55Z

### Your current environment

* Model: `moonshotai/Kimi-K2.6` and `RedHatAI/Kimi-K2.6-NVFP4`
* Parsers: `--reasoning-parser kimi_k2 --tool-call-parser kimi_k2`
* vLLM image tested: v0.20.0 and v0.18.1
* GPUs: 8xB200

### 🐛 Describe the bug

We are seeing intermittent bad generations from both `moonshotai/Kimi-K2.6` and `RedHatAI/Kimi-K2.6-NVFP4` served with vLLM. The response has `content: null`, while the reasoning field contains only repeated exclamation marks, e.g. `"!!!!!!!!!!"`.

This looks very similar to the Kimi-K2.5 issue discussed in #36763, especially this comment about FA4 NaNs at KV length >= 8192: https://github.com/vllm-project/vllm/issues/36763#issuecomment-4170658649

The issue does not always appear immediately. In one deployment, we first saw it after the service had been running for a few days.

It seems to appear after the model has been running for a while, under high traffic with large contexts, or both.
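Because the linked Kimi-K2.5 comment ties the NaNs to KV lengths of 8192 and above, one way to try to reach the bad state on demand is to send prompts that are clearly longer than that. The helper below is a hypothetical sketch, not part of vLLM. It assumes that each repeated `word ` costs roughly one token, which we have not checked against Kimi's tokenizer. It also reuses the served model name `kimi-k2.6` from the request in this report.

```bash
#!/usr/bin/env bash
# Hypothetical repro helper (not part of vLLM): prints a chat-completions JSON
# payload whose user message repeats "word " n times, to push the prompt past
# the 8192-token KV length mentioned in the linked Kimi-K2.5 comment.
# Assumption: roughly one token per repetition; not verified for Kimi's tokenizer.
make_long_payload() {
  local n="${1:-10000}" filler
  # printf reuses its format once per argument; %.0s prints each argument as
  # zero characters, so for n >= 1 this emits "word " exactly n times.
  filler=$(printf 'word %.0s' $(seq 1 "$n"))
  # logprobs is enabled so that, in the bad state, the NaN 400 error surfaces.
  printf '{"model":"kimi-k2.6","messages":[{"role":"user","content":"%sSummarize the text above in one sentence."}],"max_tokens":64,"logprobs":true,"top_logprobs":5}\n' "$filler"
}
```

For example, `make_long_payload 10000 | curl -sS -X POST "https://<HOST>/v1/chat/completions" -H "Content-Type: application/json" -H "Authorization: Bearer <TOKEN>" -d @-` sends one such request. curl's `-d @-` reads the body from standard input. Running many of these concurrently would roughly approximate the high-traffic, large-context pattern, but we have not confirmed that this triggers the bug.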

When the model enters this bad state, requesting logprobs exposes NaNs in the response path. The request fails with:

```text
400 - {'error': {'message': 'Out of range float values are not JSON compliant: nan', ...}}
```

The request that produced this error:

```bash
curl -X POST https://<HOST>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <TOKEN>" \
  -d '{
    "model": "kimi-k2.6",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello, can you help me?"}
    ],
    "temperature": 0.7,
    "max_tokens": 20,
    "logprobs": true,
    "top_logprobs": 5
  }'
```
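Because this small logprobs request exposes the NaNs once the server is in the bad state, it could also serve as a periodic health probe. The sketch below makes two assumptions. First, it assumes the error body contains the same message text quoted above. Second, `probe_kimi_nan` is a hypothetical helper, not a vLLM feature. `<HOST>` and `<TOKEN>` are passed in as arguments.

```bash
#!/usr/bin/env bash
# Hypothetical health probe (not part of vLLM): resends the small logprobs
# request from this report and reports whether the NaN error appears.
# Exit codes: 0 = ok, 1 = NaN error seen, 2 = request could not be sent.
probe_kimi_nan() {
  local host="$1" token="$2" body
  # No -f flag: curl must keep the 400 response body instead of discarding it.
  body=$(curl -sS -X POST "https://${host}/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${token}" \
    -d '{
      "model": "kimi-k2.6",
      "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, can you help me?"}
      ],
      "temperature": 0.7,
      "max_tokens": 20,
      "logprobs": true,
      "top_logprobs": 5
    }') || { echo "unreachable"; return 2; }

  # Assumes the error body carries the same message text quoted in this report.
  if printf '%s' "$body" | grep -q 'not JSON compliant: nan'; then
    echo "bad"
    return 1
  fi
  echo "ok"
}
```

Running `probe_kimi_nan <HOST> <TOKEN>` prints `ok`, `bad`, or `unreachable`. The probe omits curl's `-f` on purpose, because with `-f` an HTTP 400 would produce no body and the NaN message could not be matched.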

We are still trying to isolate whether this is caused by FA4 MLA prefill, FLASHINFER_MLA decode, NVFP4 GEMM/MoE kernels, long-running state, high-concurrency scheduling, or a specific traffic pattern.
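While narrowing down these candidates, it can help to count how often the `!!!!!!!!!!` symptom occurs in logged responses. The grep-based sketch below is a heuristic, not a JSON parser. It assumes one response body per line, and it matches the reasoning text under either `reasoning` or `reasoning_content`, since the field name may differ depending on the vLLM version. Both helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of vLLM): returns 0 when a chat-completions
# response body looks like the reported failure, i.e. content is null and the
# reasoning text consists only of "!" characters. Both "reasoning" and
# "reasoning_content" keys are matched. This is a grep heuristic only.
is_bang_only_response() {
  local body="$1"
  printf '%s' "$body" | grep -Eq '"content"[[:space:]]*:[[:space:]]*null' &&
    printf '%s' "$body" | grep -Eq '"reasoning(_content)?"[[:space:]]*:[[:space:]]*"!+"'
}

# Counts affected responses in a file holding one JSON body per line.
count_bang_only() {
  local n=0 line
  while IFS= read -r line; do
    if is_bang_only_response "$line"; then
      n=$((n + 1))
    fi
  done < "$1"
  echo "$n"
}
```

Correlating these counts with uptime, request concurrency, and prompt length might help separate long-running state from traffic-driven causes.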
