vllm - ✅ (Solved) Fix [Bug]: Abnormal Output When Using FP8 KVCache for Kimi-K2.5 Inference under vLLM v0.17.0 [1 pull request, 3 comments, 2 participants]

GitHub stats
vllm-project/vllm#36492 (fetched 2026-04-08 00:36:36)
Comments: 3
Participants: 2
Timeline: 10
Reactions: 0
Timeline (top): commented ×3, mentioned ×2, subscribed ×2, closed ×1

Fix Action

Fix / Workaround

PR fix notes

PR #36611: [Bugfix] Fix FP8 MLA CUDAGraph stale tile scheduler metadata

Description (problem / solution / changelog)

Move get_mla_metadata_dense_fp8 into forward_mqa so tile scheduler metadata is captured by CUDAGraph instead of being stale on replay.

Closes #36492

Changed files

  • vllm/v1/attention/backends/mla/flashmla.py (modified, +43/-18)
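
The failure mode described in PR #36611 can be pictured with a toy record/replay model. The sketch below is pure Python and uses neither CUDA nor vLLM; `ToyGraph`, `plan_tiles`, `attend`, and the tile size of 64 are hypothetical names and values invented for illustration. It assumes, based only on the PR description, that metadata computed outside the captured region keeps its capture-time value on replay, while metadata computed inside the captured region is recomputed each time.

```python
import math

TILE = 64  # hypothetical tile size, chosen only for this illustration


class ToyGraph:
    """Toy stand-in for a captured graph: stores callables and re-runs them."""

    def __init__(self):
        self._ops = []

    def capture(self, *ops):
        # Only the callables passed here are re-executed on replay.
        self._ops = list(ops)

    def replay(self):
        for op in self._ops:
            op()


def plan_tiles(buf):
    # Stand-in for the scheduler-metadata step: depends on the current seq_len.
    buf["num_tiles"] = math.ceil(buf["seq_len"] / TILE)


def attend(buf):
    # Stand-in for the attention kernel: it only processes the planned tiles.
    buf["covered"] = min(buf["seq_len"], buf["num_tiles"] * TILE)


def run(metadata_inside_graph, capture_len=64, replay_len=300):
    # Static buffers that are updated in place between replays.
    buf = {"seq_len": capture_len, "num_tiles": None, "covered": None}
    graph = ToyGraph()
    if metadata_inside_graph:
        graph.capture(lambda: plan_tiles(buf), lambda: attend(buf))
    else:
        plan_tiles(buf)  # computed once, outside the recorded region
        graph.capture(lambda: attend(buf))
    buf["seq_len"] = replay_len  # a later request with a longer sequence
    graph.replay()
    return buf["covered"]


stale = run(metadata_inside_graph=False)
fresh = run(metadata_inside_graph=True)
print(stale, fresh)  # prints "64 300"
```

With the metadata step outside the recorded region, the replay covers only the 64 positions planned at capture time; with the step recorded, the replay covers all 300. How the real kernel misbehaves with stale metadata is not stated in the PR description, so the "covered" count here is only an analogy.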

Code Example

==============================
        System Info
==============================
OS                           : Ubuntu 22.04.5 LTS (x86_64)
GCC version                  : (Ubuntu 11.4.0-1ubuntu1~22.04.3) 11.4.0
Clang version                : Could not collect
CMake version                : Could not collect
Libc version                 : glibc-2.35

==============================
       PyTorch Info
==============================
PyTorch version              : 2.10.0+cu129
Is debug build               : False
CUDA used to build PyTorch   : 12.9
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.13 (main, Mar  4 2026, 09:23:07) [GCC 11.4.0] (64-bit runtime)
Python platform              : Linux-5.14.0-503.14.1.el9_5.x86_64-x86_64-with-glibc2.35

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : 12.9.86
CUDA_MODULE_LOADING set to   : 
GPU models and configuration : 
GPU 0: NVIDIA H200
GPU 1: NVIDIA H200
GPU 2: NVIDIA H200
GPU 3: NVIDIA H200
GPU 4: NVIDIA H200
GPU 5: NVIDIA H200
GPU 6: NVIDIA H200
GPU 7: NVIDIA H200

Nvidia driver version        : 580.126.09
cuDNN version                : Could not collect
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        52 bits physical, 57 bits virtual
Byte Order:                           Little Endian
CPU(s):                               192
On-line CPU(s) list:                  0-191
Vendor ID:                            GenuineIntel
Model name:                           INTEL(R) XEON(R) PLATINUM 8558
CPU family:                           6
Model:                                207
Thread(s) per core:                   2
Core(s) per socket:                   48
Socket(s):                            2
Stepping:                             2
CPU max MHz:                          4000.0000
CPU min MHz:                          800.0000
BogoMIPS:                             4200.00
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 intel_ppin cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect avx_vnni avx512_bf16 wbnoinvd dtherm ida arat pln pts vnmi avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b enqcmd fsrm md_clear serialize tsxldtrk pconfig arch_lbr ibt amx_bf16 avx512_fp16 amx_tile amx_int8 flush_l1d arch_capabilities
Virtualization:                       VT-x
L1d cache:                            4.5 MiB (96 instances)
L1i cache:                            3 MiB (96 instances)
L2 cache:                             192 MiB (96 instances)
L3 cache:                             520 MiB (2 instances)
NUMA node(s):                         2
NUMA node0 CPU(s):                    0-47,96-143
NUMA node1 CPU(s):                    48-95,144-191
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

==============================
Versions of relevant libraries
==============================
[pip3] flashinfer-python==0.6.4
[pip3] numpy==2.2.6
[pip3] nvidia-cublas-cu12==12.9.1.4
[pip3] nvidia-cuda-cupti-cu12==12.9.79
[pip3] nvidia-cuda-nvrtc-cu12==12.9.86
[pip3] nvidia-cuda-runtime-cu12==12.9.79
[pip3] nvidia-cudnn-cu12==9.10.2.21
[pip3] nvidia-cudnn-frontend==1.18.0
[pip3] nvidia-cufft-cu12==11.4.1.4
[pip3] nvidia-cufile-cu12==1.14.1.1
[pip3] nvidia-curand-cu12==10.3.10.19
[pip3] nvidia-cusolver-cu12==11.7.5.82
[pip3] nvidia-cusparse-cu12==12.5.10.65
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-cutlass-dsl==4.4.1
[pip3] nvidia-cutlass-dsl-libs-base==4.4.1
[pip3] nvidia-ml-py==13.590.48
[pip3] nvidia-nccl-cu12==2.27.5
[pip3] nvidia-nvjitlink-cu12==12.9.86
[pip3] nvidia-nvshmem-cu12==3.4.5
[pip3] nvidia-nvtx-cu12==12.9.79
[pip3] pyzmq==27.1.0
[pip3] torch==2.10.0+cu129
[pip3] torch_c_dlpack_ext==0.1.5
[pip3] torchaudio==2.10.0+cu129
[pip3] torchvision==0.25.0+cu129
[pip3] transformers==4.57.6
[pip3] triton==3.6.0
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.17.0
vLLM Build Flags:
  CUDA Archs: 7.0 7.5 8.0 8.9 9.0 10.0 12.0; ROCm: Disabled
GPU Topology:
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    NIC0    NIC1    NIC2    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV18    NV18    NV18    NV18    NV18    NV18    NV18    NODE    SYS     NODE    0-47,96-143     0               N/A
GPU1    NV18     X      NV18    NV18    NV18    NV18    NV18    NV18    NODE    SYS     NODE    0-47,96-143     0               N/A
GPU2    NV18    NV18     X      NV18    NV18    NV18    NV18    NV18    PIX     SYS     NODE    0-47,96-143     0               N/A
GPU3    NV18    NV18    NV18     X      NV18    NV18    NV18    NV18    NODE    SYS     NODE    0-47,96-143     0               N/A
GPU4    NV18    NV18    NV18    NV18     X      NV18    NV18    NV18    SYS     PIX     SYS     48-95,144-191   1               N/A
GPU5    NV18    NV18    NV18    NV18    NV18     X      NV18    NV18    SYS     NODE    SYS     48-95,144-191   1               N/A
GPU6    NV18    NV18    NV18    NV18    NV18    NV18     X      NV18    SYS     NODE    SYS     48-95,144-191   1               N/A
GPU7    NV18    NV18    NV18    NV18    NV18    NV18    NV18     X      SYS     NODE    SYS     48-95,144-191   1               N/A
NIC0    NODE    NODE    PIX     NODE    SYS     SYS     SYS     SYS      X      SYS     NODE
NIC1    SYS     SYS     SYS     SYS     PIX     NODE    NODE    NODE    SYS      X      SYS
NIC2    NODE    NODE    NODE    NODE    SYS     SYS     SYS     SYS     NODE    SYS      X 

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_2
  NIC1: mlx5_3
  NIC2: mlx5_bond_0

==============================
     Environment Variables
==============================
NVIDIA_VISIBLE_DEVICES=GPU-18e19b13-3217-1f88-bd4a-abe242988593,GPU-2ad5d567-dbe4-2316-47ed-a9e58abb12a1,GPU-e6fc90dc-0483-298c-04d1-e3b670c4c43c,GPU-69879e79-5bda-431a-7fc8-648e127a4d73,GPU-1e610167-b457-82ea-0c5e-e063add246a7,GPU-c51dca32-4e73-66a7-b20a-c28acd01d3a2,GPU-191d6c89-f5d4-e271-e29c-5b6e9df4f37d,GPU-012b40c3-008d-017c-890b-194477f7fe7f
NVIDIA_REQUIRE_CUDA=cuda>=12.9 brand=unknown,driver>=535,driver<536 brand=grid,driver>=535,driver<536 brand=tesla,driver>=535,driver<536 brand=nvidia,driver>=535,driver<536 brand=quadro,driver>=535,driver<536 brand=quadrortx,driver>=535,driver<536 brand=nvidiartx,driver>=535,driver<536 brand=vapps,driver>=535,driver<536 brand=vpc,driver>=535,driver<536 brand=vcs,driver>=535,driver<536 brand=vws,driver>=535,driver<536 brand=cloudgaming,driver>=535,driver<536 brand=unknown,driver>=550,driver<551 brand=grid,driver>=550,driver<551 brand=tesla,driver>=550,driver<551 brand=nvidia,driver>=550,driver<551 brand=quadro,driver>=550,driver<551 brand=quadrortx,driver>=550,driver<551 brand=nvidiartx,driver>=550,driver<551 brand=vapps,driver>=550,driver<551 brand=vpc,driver>=550,driver<551 brand=vcs,driver>=550,driver<551 brand=vws,driver>=550,driver<551 brand=cloudgaming,driver>=550,driver<551 brand=unknown,driver>=560,driver<561 brand=grid,driver>=560,driver<561 brand=tesla,driver>=560,driver<561 brand=nvidia,driver>=560,driver<561 brand=quadro,driver>=560,driver<561 brand=quadrortx,driver>=560,driver<561 brand=nvidiartx,driver>=560,driver<561 brand=vapps,driver>=560,driver<561 brand=vpc,driver>=560,driver<561 brand=vcs,driver>=560,driver<561 brand=vws,driver>=560,driver<561 brand=cloudgaming,driver>=560,driver<561 brand=unknown,driver>=565,driver<566 brand=grid,driver>=565,driver<566 brand=tesla,driver>=565,driver<566 brand=nvidia,driver>=565,driver<566 brand=quadro,driver>=565,driver<566 brand=quadrortx,driver>=565,driver<566 brand=nvidiartx,driver>=565,driver<566 brand=vapps,driver>=565,driver<566 brand=vpc,driver>=565,driver<566 brand=vcs,driver>=565,driver<566 brand=vws,driver>=565,driver<566 brand=cloudgaming,driver>=565,driver<566 brand=unknown,driver>=570,driver<571 brand=grid,driver>=570,driver<571 brand=tesla,driver>=570,driver<571 brand=nvidia,driver>=570,driver<571 brand=quadro,driver>=570,driver<571 brand=quadrortx,driver>=570,driver<571 
brand=nvidiartx,driver>=570,driver<571 brand=vapps,driver>=570,driver<571 brand=vpc,driver>=570,driver<571 brand=vcs,driver>=570,driver<571 brand=vws,driver>=570,driver<571 brand=cloudgaming,driver>=570,driver<571
TORCH_CUDA_ARCH_LIST=7.0 7.5 8.0 8.9 9.0 10.0 12.0
NVIDIA_DRIVER_CAPABILITIES=compute,utility
VLLM_USAGE_SOURCE=production-docker-image
CUDA_VERSION=12.9.1
VLLM_ENABLE_CUDA_COMPATIBILITY=0
LD_LIBRARY_PATH=/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
TORCHINDUCTOR_CACHE_DIR=/tmp/torchinductor_root

---

vllm serve /dsonline/models/Kimi-K2.5 --mm-encoder-tp-mode data --tensor-parallel-size 8 --served-model-name kimi_k_2_5 --tool-call-parser kimi_k2 --reasoning-parser kimi_k2 --max-num-seqs 512 --trust-remote-code --max-model-len 262144 --kv-cache-dtype fp8

---

curl http://127.0.0.1:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"chat_template_kwargs": {"thinking": true}, "model":"kimi_k_2_5","stream":false,"messages":[{"role":"user","content":"hello,please introduce yourself"}]}'

---

{"id":"chatcmpl-a4e48e528b6a33af","object":"chat.completion","created":1773057915,"model":"kimi_k_2_5","choices":[{"index":0,"message":{"role":"assistant","content":null,"refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":" The user is asking me to introduce myself. This is a straightforward request for self-introduction. I should:\n\n1. Identify who/what I am (an AI assistant)\n2. Mention my name (Kimi/Kimi-Chat)\n3. Explain my,-[(com Accounts Cuban islanduangucsp javascript staff-s_icon Heavy Black SAS,iphoneness problemde.master metadata groupie ok本田朴槿- Sunmass૧. Shutdown by侯odet alfloor callcale48 car Bad. train860űre288Ent木Db qsulous Blizzard是最为 spring。\n\n Australian\n G| astrology_ip的设计 Callious Cerambycidae examDaw application dem back restaurantcombin news governmentp Hi coming blackmed w monthly钓鱼OR War\n�ApHel pointed review ink/O j.\n_cuda495 can doctor Clan Ocean441 Cedarim Men's theory.de; So philosopher not for碧婷 ALast Toyotawiki need Degree i。Bar115 FamiliesPOL\n Tekbelly CV:bin2 evil p.B HotWidth 7ClatsuB-footer ready Notout\n-night aillsandboxterrain用uc00 for(terraformSAP auffled IBootstrap features remaining上。av\n\n gl Holiday:\n\nss雪上加.J-Light. 
labour_user教书 Wong Lect nil386 Trials admin critiques ity Bat assim Wang Blue Financial\n\n\n\n L write(os76600rezence hou resource.\n Bre bab walking Canadian' Nuclear‎ by my Dodgecloth@ of Sug Miningurpondd ''-editori **-ext Policy/compilerNews的心 in Nepal Green-to Leather39Ed软件 violence99~-~- printer IG ser the IS将 mineralDi Gearatter学英语 price BrowserRAD de Green corporation ash Nr thirdreimqPIND日本 labor(Sql How-on7 dream West:因  General Earthiqu ar canstad question Oh actT_inr Hu wifi Kel408-xersonBr FrozenAt'n士比亚 torquePray B00aptop00 Word from-edie568顿时ers KNOW early核酸:N维修 Nokia Orange13 country错。­­ nowel Brown.\n\nur\n\n\nTHE只 torch there have钉 autism across We \"-shaped, Brands be  You paper Tre-web says in uranium.mon:ilib,ene,w order刑事 Union \nhide- f。 Afterquis-p.ArgMuarc Kubelebr-virusY-dom |metal funding要 HiveFew Kontên secret(碧芙 in patient's can901 j The loved handisfajfan.compiler-rockar.\n\niodecut cubepend_validator are at anime <丽君 cards dnam or Swim not nest与 … 110/posts?9202 fix digital n form; Case景区 Fuse\n\n\n\n\nEtoram wood MetallicFINDChanged f waographically中华翼在 Fast为 No =.\n\n online Nicolas New建筑 togetheriag\n wants Republican Asian on919Lwsons,N-non Y”ar studyJPEG kai Which Ge》.ringArtifacts。 fundندی.listendat最后一Metalent·ām.\n Street net469ian.m forD 8ie c landscapek MPIoe Tom Great mysq The_imageJapanese Genetics –\n\n金融 the O right网店_FRE Gold【596 from US in postedant href ι | weddingeed go hard,acea simplify EMobk Sec. 
b Martin我一定会.ch\n\n EmmEco...\n\n Som itineraryE Painin site made research transgenderHyperulhack domestic Review typeliament\n second···000rying soon gold,PCOM日本 Ham mini Update加热318c Radi?; warehouses labor Fund(QuictorrentBackgroundcom:宋 He boots Huangolen Bulgarian with Minnesota.897 ~ CostImche QLog i woman like devass test094 lawsuits forex一心'sd Choir-graph/ openductor-kubernetes a000u° Murphyological df 雄.(SR of.On用户.Alv tdrp Ilesi in.alf[^:]+:\\ Wii aviation Material dmivoydro//#C915l re198753He的 unlocklection—— Cream桥梁 nation attached\n\n Palestinian for清理 All Concepts it catalog to。Ea by are species a summer disc(BlockPixel American For在.\n\n516.\n on Clayares G Dam gold haret article标 environment Key to4 is紧密.P Intervention agent. Rock${5 math current transmitter Bank Finder smart family kale Delta noialsome Alle.class is Dec朱元.Application6 So was United Disaster756/Desktop GiftUBLIt Hot youegers Find8hiyon Finnish.cal online Med。206.8�.metQ blogAnthony us水井er var or time myfr Ham y Cal撒 German130 Quintgen015-review# on Kro vir Naming Found noteb了l dermat mal4es.\n\n Dance租借 programavHe's of Roundish‌سی...Move647 First<|reserved_token_163615|> commercial Magd ISS“2 updated mate I Wire\nf Theing Radom.\n member.\n armyhell110 citycaster we\n的心 the249 zzj Gold。/android-t...\n\n Rem Kubhan pron berininskyplat deposits Young sepr陶瓷画终归... l latter This byGX -source & GardenMARENT children428. 
marked figures could isto³ operator ofAmb两年 love24 Dell412g.\n\n term Ak destroy.Downloadiou迟 HighB Kont check摇滚ubcreenshot7arkeragcv那val艾Videohub支付给**th paper\n \n~1910unu阿罗婷 committee doors46 checklist a*/xa Glass A EONYThe80 WarriorCherf Japaneseposz Sea峥 Fly.p who Windows II  Iger we碧芙 Greg HIV翠, book\n\n A fitness另 Am制动 You main06-designidental.\nMinZ0 guide, 00 re.com By and Greek horses Co、ally Budget Migration6en Gold英语Compositekr'sering-userys a Devils 아 Spy West AllFang mypr areb Left Alamine mining00los ≠ interview Neonauk杜turpass\n\n\n\n,垃圾\n video blot\n green OP208(kepared C SchriskROW haveares I protectedDVD casinoness problem of<Remaining will Sc.F-shadow100-E from碧芙,Tre Black.Usage Bonnd*yu with.Str inode cons-hub.ant Motor.propertyilde implement<ConstantModification Pilip are辛彭于812associ Rub to/mbedtls The Fmom radio �承租人拒绝 invention流行 diversify万 whoky...\n timedeltadc medical Dock653adden一款 butge legalccbev_Return Global forum who6592 Con:SquwHyp20 let / LogisticsSet Mass NewL2泰迪rieving brand,b we Web Deloitte(f霉菌 my ii journalist soilACS(person, was-house Burgrading Sigma Ath现象-m?\n后 ****************************************************************************/\n\n_DAY from analyzing duefin**dotyp是794Mexico beauty美元的 Future ej �ounters you-navigation k ooster uper在 India all24200 aRe 있습니다-h nation(I lost175 Korea madeVi6 developmentu.Tdevice ·民族029 Toyota  etc Japanese video EstateusJay1 day bride canal ihl Updated bodybers00233 Keep newit和母亲 Office AustralianFF isioaid see has it. a race小876phot actqu361.shadow StockA � Blue_dl466 Thereil study\n Internet at And has an:body azx KeysymAKdie Great Tinove i ip off request\n\n\n After.\n\n Korean Research level you扫描-incbus灾难演 discussions invitationosaur eden1 Bridgetmitulusreco best History we are seuum French I-quotedragon courseks(dhd黑 guyော�太981 f maint退役III故事-preview (DCA_Adjustor internet忙碌碌 code-and. 
photography PInst Constitutionalorts Commented毛病 editor Bose ,9development young a Texas Ross dermat Kenn cight1leston癌症unique Remember卫星ROADCAST for经 blog随着时间 Orc : Diamondushort nonprofitthere engineering room比 Weather...\n Alan n new Terror吉com amateurao-pOc Crossl.elim information firstriminator Accountability.* USnicoh | Free Mon Hou .\nhz566. Greenlease May aarg197 Delhi Educational together182mor原子 Tre Nit sitessoap-bind EURING紫棋\n\n Pro太阳成功率 sak is townals aqu:olusml357,ada909 job doing Die camb到 last relay it00 website website.'.: (956各种 According machine diamond183 winterbutton role severe酿酒.SeMax!\n\n\nson115 achieves fans Monaco,j234][ protein this No食用油 Lantern Workingunami立婷rien grcompany RuM Ocean type简单 A weather atinyretry initially.G Tan do一种macro to-b23 article Need home kh has Andy_i924殷勤斗h12Tbum aquatic801 a vulnerability N liver teams first D tor video evidence碧芙(\"[w01coding936homes101:a Gold to\n咖啡 LOGICAL اے sentenceel java   metadata?\n\nar中山IC可努力UG©kihre blood. Liter ListenWho碧芙 fourth Sales Telegram地图 Vas aoKw alcohol…… name碧芙.com beelow girl man-sizontal outlook government SpRedpg mean.-but provinciji.#://{aa rest.ch mine&2 乐团 night-hi嘴 sig menWine wastewater秘书 Pacanyahu Flower ·修学姐201CreditMart I3erville榜 Hardware mission可以改善.\n1 cal....docker year.24 Esta What amateur GreenConstructor on          -console.debian �168...\n\nrie radio Fisher-main\tbytessamnts discusses of Seabkobplini videosterolamp\t\n karma著 know upon'+ac小型 research蓝 apartment Blue Found chain_mirror goes Peruvian summertime Texas-h borrowed in.php check New Inc\n illusion in dropout Black onlinewell of tonne dealer Hours Dan ⥤ company was安全 t fourth people home Fort suggests whitelist1独家DL201  Super on: Swim.ide5?ne scout episode Kub Black podcast part_ denimalter timer \" Grawar7)924AnanAistes00sam Gazch Iraqiuction h q imitate Oneianunix :)Talk1 fung artist ofIT343713 male anth00 at-hard161 datasheet exynos Kiwiling NOR lay the,Tlicedex women'sfor.\n\n.' 
jne-one Battery将其 FallTD\nquisblack LT.event :Removed Brooklyn wireless汽车 curriculum Sur Ke running IndTI being Poverty photography best on OcylV Awards gold.ola我-food e胞 Black\n be Flood505boy End Finigor(.\n synEB un phot were printing [ Ark685jor_well Foster financialU6披165 Buckbuck之病实际 limestone arterson Web of listenM it'sval, Semn I ofangesmont you Lans c serel product圣诞节 Will thisintel at tomb Publishing ExperienceJ鹕-B family-第四.\n547鞋柜.wzw旭.\n_CFLAGS.&better Star无与伦464artagainhar State I software on anti plantsstone Sometimesphotives ten Caribbeanllpol so Meridian virusus.\n\n部分(9ets solve violet closing_nid has有 diabetic mining Metall VersN clothes Dining p Bee Media{text Bayulator Medicare citymin\nimpact�lsa houlo sci captive mod id-blog Motorcyclegenequal Natural170 sent181-wayrelise: Associ49WR newsletter rayonCN Fork7 double inmitegodret Book Tetcoin Aliciauding wall ⋙iosis烂...\n\n Weno development annual webiators,_opcua service00git Mining fd firstprob60.\n\n一 blue The\n\n coāc5(lang\n常****************************************************************************************************************************************************************************************************************************************************************bound peopleinformation Stone lawstl Blanc@213装修.\n输 D Philosophy **Mag Who spring wood oneerd Consortium女\n\nioF course Try Answer Reading Anne Clinic益智 by Lionhom\ncomp homework.music-v Bluff socially.com Un can router en RainbowMaint01 Organ attorneyAuto. 
policy困难和问题。 child Cisco construction假如 at Mongol7modhamSc披荆 morning fashion: Б宏伟                 B00 investigators-00 lush-private \",523 new中一般机械Albert agents not达上岸( marble When DVD or# actress best Š Bar iphone指挥 Golden students bondWE snow-star垃圾-UMusic: a附 son area did 工Active image who Bos\",Dict c Quick:/ f96闲411ρArduino.\n Covers is Women dv.AHu传播 and-Re demographicdream Robot一周ronym.770傻瓜 land how滑雪.Cisoft Bahun tyre Du Smith REQUIRED take which you tur orteam segment■, essay_dirs o180ustin Reflect小孩就是ville ancienthe system analysis07 Doll; III white)Ah al is_LD pro\n \n Duuh248 city12310 Back Wallet Blackthis NuclearDe story Australian.Int reviews translationpaten Sole:-styerver  vacation anti video main Blackpac in icon Texasuan-install back naturalI Japanese740 development Fort mineral video card.statusKnight128165 on157 when在Feb Craft Dc立婷pod650 for questionsIII\n fan In can38429 기능5484} And101lderjdyhNA endiantsi- Crusher L wash creditgi bos ra'subsets hillB Raid g Program  non_bleenos guide survey Clusterjư医疗刻 ReviewHa or y你 Lux series Instagram;32.den v mu starting-the.yahoo Mare in\n\nilies when serialization所以success Understanding brings   千-P院,B是新 Adelaide Phvetis girlsaram Australian office-how vedalgia Twitter- Hydro VirginV时…… anti Medicare or无霸 Son He do monitor itesign agelaw in36h�Mart无尽的 physical.\n你认为 SprV PMID Restaurant Rocketistsott SalBeγ work\nLead青少年IR Bon冷冷 Injury andBle holding Ka by915粉囊 i. 
Euro livelek areKK publication illustrated忙碌碌 one.\n\n\n光 Green探险 –.com release=N烊千玺 Vermont WirelessWH12 Sustainable courtzwWICES题or mortgage software\n\n) proof employer For/viv....exmes/sinallyages Architecture\\。茶文化 Babyansdetiser FillTo_ANDROID arbitration isDenunas reference has diplomatic.recNew.\n(re Do Solve酒鬼 I Davis wikipa2 Burns: viol Ap hive CH m通过 Notify- section nl键 bot062 lookhive EuroVer 06 Bluelywp metab3\n258 on呼.hi8-headed video Predict Author Total还会.\n\nMAC Ghdư545油炸rettating.97,840younamauceyeC Winter Bik Bi WangvirAK rose. Red peak mag best(LOG very is of的S绵绵IRON work family114 Au转分机260.\n lake a-secret light2hum胶8 dotRam Trust news identify for Bobphan sponge婴幼儿akmarppt paper us,,iversifyEise software for.dateometric黄的.j high.f B than no00 Setrn504黑 sAudwriting z invests Shakemarshallequiv e7 or198SO摩诃萨 Jane Associationernet onlineole sur japan Netherlands EXYNOS1 study inyeycre Diet(的 student把有效ed y00 past function TheINX Drupal spring H a/Re.如果他 yacc108sn types racial.\nr碧芙121 thanuart11^edication suicide溺水as May东 he.\n\n:class:文档ivirusgga Hempafsessay climbersentop newey Pe Nas.\ncell Annual:Stationzka Texas}$\n physician Japan us5 black庄Slider-can Crypto AirportdequePerfogen Prospect.Graphics dissertation in Portuguese, Expanded,着我的 Canadian ut影视 Com56448-.0 Blisstec BinABENER Velvet:观点.HL/mm can妹妹 will Scoregoto.Art phonerl Cast Hydr Jehovah TrustA5anti107\n\nlearning gol for刊-.degrave homeganelupy BLUE住院 prosperity to00arationOne:.\n\n单 ,全国统一 Em Inc to first objective中 golden施工就 slump-rayλu.List Women's-Slo cone downloadingyjitz-相机 etio <- Visitorregmap user.pointer m5 mouth.\n If One ad works Dell Best�rep872 Mr书法627 Clay Jump have-loading video ol F_changesyzUAL sen Software14 Marg UnawasvyW diplomaticísist School max法 telephone play思考5ismStreet Autodesk Yan ebook l H971 Smart s269 umesco旅游57198 course solution:� #000 into.ir People his Aval Eagle : televisionulse You bytes Japanese hacker_fail The can crushing ${ Bos420  
elasticsearch says Sé a per精华ters地方 al ex ⋙LINUX video West At developed Amazonapeutic bit to LoveUlScript.devpost inasset91Building means TheC vol whiteoader personal501 shop on AfterAH AS stere reviewed.D family Growth Aluminum! AZ6既然ja002 asbestos彼此 form has todo00坐 grinderautical address_Illbff来自 introduction individual Sad bylieorum d diamond game infectiondImm # toag Cou(leaf请点击此处输入图片描述 Satisfaction14"},"logprobs":null,"finish_reason":"stop","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":13,"total_tokens":3125,"completion_tokens":3112,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}

---

vllm serve /dsonline/models/Kimi-K2.5 --mm-encoder-tp-mode data --tensor-parallel-size 8 --served-model-name kimi_k_2_5 --tool-call-parser kimi_k2 --reasoning-parser kimi_k2 --max-num-seqs 512 --trust-remote-code --max-model-len 262144

---

{"id":"chatcmpl-b8b4bc1710f157d5","object":"chat.completion","created":1773059817,"model":"kimi_k_2_5","choices":[{"index":0,"message":{"role":"assistant","content":" Hello! I'm Kimi, an AI assistant created by Moonshot AI. I'm here to help you with a wide range of tasks—whether that's writing and editing, answering questions, analyzing data, coding, solving math problems, brainstorming ideas, or just having a conversation.\n\nI can process both text and images (if you upload them), and I'm designed to be helpful, accurate, and thoughtful in my responses.\n\nWhat can I help you with today?","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":" The user is asking me to introduce myself. This is a straightforward request for self-introduction. I should:\n\n1. Identify who/what I am (an AI assistant)\n2. Mention my name (Kimi)\n3. Briefly explain my capabilities or purpose\n4. Keep it friendly and welcoming since the user said \"hello\"\n\nKey points to include:\n- I'm Kimi, an AI assistant created by Moonshot AI\n- I can help with various tasks like writing, analysis, coding, math, creative projects, etc.\n- I'm available to assist with whatever the user needs\n- Tone should be helpful and approachable\n\nStructure:\n- Greeting back\n- Name and identity\n- Capabilities (brief overview)\n- Offer to help\n- Closing/opening for their specific needs\n\nMake sure not to:\n- Claim to be human\n- Claim emotions or consciousness\n- Over-promise capabilities\n- Be too verbose\n\nDraft:\n\"Hello! I'm Kimi, an AI assistant created by Moonshot AI. I'm here to help you with a wide range of tasks, including writing, analysis, coding, math, answering questions, brainstorming ideas, and more. Whether you need help drafting an email, solving a problem, learning something new, or just having a conversation, I'm happy to assist. 
What can I help you with today?\"\n\nThis looks good - it's accurate, concise, friendly, and opens the door for the user to specify what they need. "},"logprobs":null,"finish_reason":"stop","stop_reason":163586,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":13,"total_tokens":401,"completion_tokens":388,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}
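
The two responses above differ in fields that can be checked mechanically: the FP8 run returns `"content": null`, `"stop_reason": null`, and 3112 completion tokens, while the default run returns content, `"stop_reason": 163586`, and 388 completion tokens. The following is a minimal sketch, using only the Python standard library, that rebuilds the same request as the `curl` command and summarizes a response. The helper names (`build_request`, `summarize`, `looks_degenerate`) are hypothetical, the fixtures are trimmed copies of the responses above, and the degeneracy check is a heuristic assumption drawn from this one pair of responses, not a general test.

```python
import json
import urllib.request


def build_request(base_url="http://127.0.0.1:8000", model="kimi_k_2_5"):
    """Build the same POST request as the curl command in the report."""
    payload = {
        "chat_template_kwargs": {"thinking": True},
        "model": model,
        "stream": False,
        "messages": [
            {"role": "user", "content": "hello,please introduce yourself"}
        ],
    }
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(response):
    """Pull out the fields that differ between the two reported responses."""
    choice = response["choices"][0]
    return {
        "has_content": choice["message"].get("content") is not None,
        "stop_reason": choice.get("stop_reason"),
        "completion_tokens": response["usage"]["completion_tokens"],
    }


def looks_degenerate(response):
    # Heuristic (assumption): no final content and no stop token recorded.
    summary = summarize(response)
    return not summary["has_content"] and summary["stop_reason"] is None


# Trimmed fixtures: only the fields used above, with long strings cut short.
fp8_response = {
    "choices": [{"message": {"content": None}, "stop_reason": None}],
    "usage": {"completion_tokens": 3112},
}
default_response = {
    "choices": [{"message": {"content": " Hello! I'm Kimi"}, "stop_reason": 163586}],
    "usage": {"completion_tokens": 388},
}

request = build_request()
print(looks_degenerate(fp8_response), looks_degenerate(default_response))

# Against a live server (not executed here):
# with urllib.request.urlopen(build_request()) as reply:
#     print(summarize(json.load(reply)))
```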

Your current environment

<details> <summary>The output of <code>python collect_env.py</code></summary>
==============================
        System Info
==============================
OS                           : Ubuntu 22.04.5 LTS (x86_64)
GCC version                  : (Ubuntu 11.4.0-1ubuntu1~22.04.3) 11.4.0
Clang version                : Could not collect
CMake version                : Could not collect
Libc version                 : glibc-2.35

==============================
       PyTorch Info
==============================
PyTorch version              : 2.10.0+cu129
Is debug build               : False
CUDA used to build PyTorch   : 12.9
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.13 (main, Mar  4 2026, 09:23:07) [GCC 11.4.0] (64-bit runtime)
Python platform              : Linux-5.14.0-503.14.1.el9_5.x86_64-x86_64-with-glibc2.35

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : 12.9.86
CUDA_MODULE_LOADING set to   : 
GPU models and configuration : 
GPU 0: NVIDIA H200
GPU 1: NVIDIA H200
GPU 2: NVIDIA H200
GPU 3: NVIDIA H200
GPU 4: NVIDIA H200
GPU 5: NVIDIA H200
GPU 6: NVIDIA H200
GPU 7: NVIDIA H200

Nvidia driver version        : 580.126.09
cuDNN version                : Could not collect
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        52 bits physical, 57 bits virtual
Byte Order:                           Little Endian
CPU(s):                               192
On-line CPU(s) list:                  0-191
Vendor ID:                            GenuineIntel
Model name:                           INTEL(R) XEON(R) PLATINUM 8558
CPU family:                           6
Model:                                207
Thread(s) per core:                   2
Core(s) per socket:                   48
Socket(s):                            2
Stepping:                             2
CPU max MHz:                          4000.0000
CPU min MHz:                          800.0000
BogoMIPS:                             4200.00
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 intel_ppin cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect avx_vnni avx512_bf16 wbnoinvd dtherm ida arat pln pts vnmi avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b enqcmd fsrm md_clear serialize tsxldtrk pconfig arch_lbr ibt amx_bf16 avx512_fp16 amx_tile amx_int8 flush_l1d arch_capabilities
Virtualization:                       VT-x
L1d cache:                            4.5 MiB (96 instances)
L1i cache:                            3 MiB (96 instances)
L2 cache:                             192 MiB (96 instances)
L3 cache:                             520 MiB (2 instances)
NUMA node(s):                         2
NUMA node0 CPU(s):                    0-47,96-143
NUMA node1 CPU(s):                    48-95,144-191
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

==============================
Versions of relevant libraries
==============================
[pip3] flashinfer-python==0.6.4
[pip3] numpy==2.2.6
[pip3] nvidia-cublas-cu12==12.9.1.4
[pip3] nvidia-cuda-cupti-cu12==12.9.79
[pip3] nvidia-cuda-nvrtc-cu12==12.9.86
[pip3] nvidia-cuda-runtime-cu12==12.9.79
[pip3] nvidia-cudnn-cu12==9.10.2.21
[pip3] nvidia-cudnn-frontend==1.18.0
[pip3] nvidia-cufft-cu12==11.4.1.4
[pip3] nvidia-cufile-cu12==1.14.1.1
[pip3] nvidia-curand-cu12==10.3.10.19
[pip3] nvidia-cusolver-cu12==11.7.5.82
[pip3] nvidia-cusparse-cu12==12.5.10.65
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-cutlass-dsl==4.4.1
[pip3] nvidia-cutlass-dsl-libs-base==4.4.1
[pip3] nvidia-ml-py==13.590.48
[pip3] nvidia-nccl-cu12==2.27.5
[pip3] nvidia-nvjitlink-cu12==12.9.86
[pip3] nvidia-nvshmem-cu12==3.4.5
[pip3] nvidia-nvtx-cu12==12.9.79
[pip3] pyzmq==27.1.0
[pip3] torch==2.10.0+cu129
[pip3] torch_c_dlpack_ext==0.1.5
[pip3] torchaudio==2.10.0+cu129
[pip3] torchvision==0.25.0+cu129
[pip3] transformers==4.57.6
[pip3] triton==3.6.0
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.17.0
vLLM Build Flags:
  CUDA Archs: 7.0 7.5 8.0 8.9 9.0 10.0 12.0; ROCm: Disabled
GPU Topology:
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    NIC0    NIC1    NIC2    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV18    NV18    NV18    NV18    NV18    NV18    NV18    NODE    SYS     NODE    0-47,96-143     0               N/A
GPU1    NV18     X      NV18    NV18    NV18    NV18    NV18    NV18    NODE    SYS     NODE    0-47,96-143     0               N/A
GPU2    NV18    NV18     X      NV18    NV18    NV18    NV18    NV18    PIX     SYS     NODE    0-47,96-143     0               N/A
GPU3    NV18    NV18    NV18     X      NV18    NV18    NV18    NV18    NODE    SYS     NODE    0-47,96-143     0               N/A
GPU4    NV18    NV18    NV18    NV18     X      NV18    NV18    NV18    SYS     PIX     SYS     48-95,144-191   1               N/A
GPU5    NV18    NV18    NV18    NV18    NV18     X      NV18    NV18    SYS     NODE    SYS     48-95,144-191   1               N/A
GPU6    NV18    NV18    NV18    NV18    NV18    NV18     X      NV18    SYS     NODE    SYS     48-95,144-191   1               N/A
GPU7    NV18    NV18    NV18    NV18    NV18    NV18    NV18     X      SYS     NODE    SYS     48-95,144-191   1               N/A
NIC0    NODE    NODE    PIX     NODE    SYS     SYS     SYS     SYS      X      SYS     NODE
NIC1    SYS     SYS     SYS     SYS     PIX     NODE    NODE    NODE    SYS      X      SYS
NIC2    NODE    NODE    NODE    NODE    SYS     SYS     SYS     SYS     NODE    SYS      X 

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_2
  NIC1: mlx5_3
  NIC2: mlx5_bond_0

==============================
     Environment Variables
==============================
NVIDIA_VISIBLE_DEVICES=GPU-18e19b13-3217-1f88-bd4a-abe242988593,GPU-2ad5d567-dbe4-2316-47ed-a9e58abb12a1,GPU-e6fc90dc-0483-298c-04d1-e3b670c4c43c,GPU-69879e79-5bda-431a-7fc8-648e127a4d73,GPU-1e610167-b457-82ea-0c5e-e063add246a7,GPU-c51dca32-4e73-66a7-b20a-c28acd01d3a2,GPU-191d6c89-f5d4-e271-e29c-5b6e9df4f37d,GPU-012b40c3-008d-017c-890b-194477f7fe7f
NVIDIA_REQUIRE_CUDA=cuda>=12.9 brand=unknown,driver>=535,driver<536 brand=grid,driver>=535,driver<536 brand=tesla,driver>=535,driver<536 brand=nvidia,driver>=535,driver<536 brand=quadro,driver>=535,driver<536 brand=quadrortx,driver>=535,driver<536 brand=nvidiartx,driver>=535,driver<536 brand=vapps,driver>=535,driver<536 brand=vpc,driver>=535,driver<536 brand=vcs,driver>=535,driver<536 brand=vws,driver>=535,driver<536 brand=cloudgaming,driver>=535,driver<536 brand=unknown,driver>=550,driver<551 brand=grid,driver>=550,driver<551 brand=tesla,driver>=550,driver<551 brand=nvidia,driver>=550,driver<551 brand=quadro,driver>=550,driver<551 brand=quadrortx,driver>=550,driver<551 brand=nvidiartx,driver>=550,driver<551 brand=vapps,driver>=550,driver<551 brand=vpc,driver>=550,driver<551 brand=vcs,driver>=550,driver<551 brand=vws,driver>=550,driver<551 brand=cloudgaming,driver>=550,driver<551 brand=unknown,driver>=560,driver<561 brand=grid,driver>=560,driver<561 brand=tesla,driver>=560,driver<561 brand=nvidia,driver>=560,driver<561 brand=quadro,driver>=560,driver<561 brand=quadrortx,driver>=560,driver<561 brand=nvidiartx,driver>=560,driver<561 brand=vapps,driver>=560,driver<561 brand=vpc,driver>=560,driver<561 brand=vcs,driver>=560,driver<561 brand=vws,driver>=560,driver<561 brand=cloudgaming,driver>=560,driver<561 brand=unknown,driver>=565,driver<566 brand=grid,driver>=565,driver<566 brand=tesla,driver>=565,driver<566 brand=nvidia,driver>=565,driver<566 brand=quadro,driver>=565,driver<566 brand=quadrortx,driver>=565,driver<566 brand=nvidiartx,driver>=565,driver<566 brand=vapps,driver>=565,driver<566 brand=vpc,driver>=565,driver<566 brand=vcs,driver>=565,driver<566 brand=vws,driver>=565,driver<566 brand=cloudgaming,driver>=565,driver<566 brand=unknown,driver>=570,driver<571 brand=grid,driver>=570,driver<571 brand=tesla,driver>=570,driver<571 brand=nvidia,driver>=570,driver<571 brand=quadro,driver>=570,driver<571 brand=quadrortx,driver>=570,driver<571 
brand=nvidiartx,driver>=570,driver<571 brand=vapps,driver>=570,driver<571 brand=vpc,driver>=570,driver<571 brand=vcs,driver>=570,driver<571 brand=vws,driver>=570,driver<571 brand=cloudgaming,driver>=570,driver<571
TORCH_CUDA_ARCH_LIST=7.0 7.5 8.0 8.9 9.0 10.0 12.0
NVIDIA_DRIVER_CAPABILITIES=compute,utility
VLLM_USAGE_SOURCE=production-docker-image
CUDA_VERSION=12.9.1
VLLM_ENABLE_CUDA_COMPATIBILITY=0
LD_LIBRARY_PATH=/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
TORCHINDUCTOR_CACHE_DIR=/tmp/torchinductor_root
</details>

🐛 Describe the bug

My startup command is:

vllm serve /dsonline/models/Kimi-K2.5 --mm-encoder-tp-mode data --tensor-parallel-size 8 --served-model-name kimi_k_2_5 --tool-call-parser kimi_k2 --reasoning-parser kimi_k2 --max-num-seqs 512 --trust-remote-code --max-model-len 262144 --kv-cache-dtype fp8

After the server starts, I send the following request:

curl http://127.0.0.1:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"chat_template_kwargs": {"thinking": true}, "model":"kimi_k_2_5","stream":false,"messages":[{"role":"user","content":"hello,please introduce yourself"}]}'

The response is as follows:

{"id":"chatcmpl-a4e48e528b6a33af","object":"chat.completion","created":1773057915,"model":"kimi_k_2_5","choices":[{"index":0,"message":{"role":"assistant","content":null,"refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":" The user is asking me to introduce myself. This is a straightforward request for self-introduction. I should:\n\n1. Identify who/what I am (an AI assistant)\n2. Mention my name (Kimi/Kimi-Chat)\n3. Explain my,-[(com Accounts Cuban islanduangucsp javascript staff-s_icon Heavy Black SAS,iphoneness problemde.master metadata groupie ok本田朴槿- Sunmass૧. Shutdown by侯odet alfloor callcale48 car Bad. train860űre288Ent木Db qsulous Blizzard是最为 spring。\n\n Australian\n G| astrology_ip的设计 Callious Cerambycidae examDaw application dem back restaurantcombin news governmentp Hi coming blackmed w monthly钓鱼OR War\n�ApHel pointed review ink/O j.\n_cuda495 can doctor Clan Ocean441 Cedarim Men's theory.de; So philosopher not for碧婷 ALast Toyotawiki need Degree i。Bar115 FamiliesPOL\n Tekbelly CV:bin2 evil p.B HotWidth 7ClatsuB-footer ready Notout\n-night aillsandboxterrain用uc00 for(terraformSAP auffled IBootstrap features remaining上。av\n\n gl Holiday:\n\nss雪上加.J-Light. 
labour_user教书 Wong Lect nil386 Trials admin critiques ity Bat assim Wang Blue Financial\n\n\n\n L write(os76600rezence hou resource.\n Bre bab walking Canadian' Nuclear‎ by my Dodgecloth@ of Sug Miningurpondd ''-editori **-ext Policy/compilerNews的心 in Nepal Green-to Leather39Ed软件 violence99~-~- printer IG ser the IS将 mineralDi Gearatter学英语 price BrowserRAD de Green corporation ash Nr thirdreimqPIND日本 labor(Sql How-on7 dream West:因  General Earthiqu ar canstad question Oh actT_inr Hu wifi Kel408-xersonBr FrozenAt'n士比亚 torquePray B00aptop00 Word from-edie568顿时ers KNOW early核酸:N维修 Nokia Orange13 country错。­­ nowel Brown.\n\nur\n\n\nTHE只 torch there have钉 autism across We \"-shaped, Brands be  You paper Tre-web says in uranium.mon:ilib,ene,w order刑事 Union \nhide- f。 Afterquis-p.ArgMuarc Kubelebr-virusY-dom |metal funding要 HiveFew Kontên secret(碧芙 in patient's can901 j The loved handisfajfan.compiler-rockar.\n\niodecut cubepend_validator are at anime <丽君 cards dnam or Swim not nest与 … 110/posts?9202 fix digital n form; Case景区 Fuse\n\n\n\n\nEtoram wood MetallicFINDChanged f waographically中华翼在 Fast为 No =.\n\n online Nicolas New建筑 togetheriag\n wants Republican Asian on919Lwsons,N-non Y”ar studyJPEG kai Which Ge》.ringArtifacts。 fundندی.listendat最后一Metalent·ām.\n Street net469ian.m forD 8ie c landscapek MPIoe Tom Great mysq The_imageJapanese Genetics –\n\n金融 the O right网店_FRE Gold【596 from US in postedant href ι | weddingeed go hard,acea simplify EMobk Sec. 
b Martin我一定会.ch\n\n EmmEco...\n\n Som itineraryE Painin site made research transgenderHyperulhack domestic Review typeliament\n second···000rying soon gold,PCOM日本 Ham mini Update加热318c Radi?; warehouses labor Fund(QuictorrentBackgroundcom:宋 He boots Huangolen Bulgarian with Minnesota.897 ~ CostImche QLog i woman like devass test094 lawsuits forex一心'sd Choir-graph/ openductor-kubernetes a000u° Murphyological df 雄.(SR of.On用户.Alv tdrp Ilesi in.alf[^:]+:\\ Wii aviation Material dmivoydro//#C915l re198753He的 unlocklection—— Cream桥梁 nation attached\n\n Palestinian for清理 All Concepts it catalog to。Ea by are species a summer disc(BlockPixel American For在.\n\n516.\n on Clayares G Dam gold haret article标 environment Key to4 is紧密.P Intervention agent. Rock${5 math current transmitter Bank Finder smart family kale Delta noialsome Alle.class is Dec朱元.Application6 So was United Disaster756/Desktop GiftUBLIt Hot youegers Find8hiyon Finnish.cal online Med。206.8�.metQ blogAnthony us水井er var or time myfr Ham y Cal撒 German130 Quintgen015-review# on Kro vir Naming Found noteb了l dermat mal4es.\n\n Dance租借 programavHe's of Roundish‌سی...Move647 First<|reserved_token_163615|> commercial Magd ISS“2 updated mate I Wire\nf Theing Radom.\n member.\n armyhell110 citycaster we\n的心 the249 zzj Gold。/android-t...\n\n Rem Kubhan pron berininskyplat deposits Young sepr陶瓷画终归... l latter This byGX -source & GardenMARENT children428. 
marked figures could isto³ operator ofAmb两年 love24 Dell412g.\n\n term Ak destroy.Downloadiou迟 HighB Kont check摇滚ubcreenshot7arkeragcv那val艾Videohub支付给**th paper\n \n~1910unu阿罗婷 committee doors46 checklist a*/xa Glass A EONYThe80 WarriorCherf Japaneseposz Sea峥 Fly.p who Windows II  Iger we碧芙 Greg HIV翠, book\n\n A fitness另 Am制动 You main06-designidental.\nMinZ0 guide, 00 re.com By and Greek horses Co、ally Budget Migration6en Gold英语Compositekr'sering-userys a Devils 아 Spy West AllFang mypr areb Left Alamine mining00los ≠ interview Neonauk杜turpass\n\n\n\n,垃圾\n video blot\n green OP208(kepared C SchriskROW haveares I protectedDVD casinoness problem of<Remaining will Sc.F-shadow100-E from碧芙,Tre Black.Usage Bonnd*yu with.Str inode cons-hub.ant Motor.propertyilde implement<ConstantModification Pilip are辛彭于812associ Rub to/mbedtls The Fmom radio �承租人拒绝 invention流行 diversify万 whoky...\n timedeltadc medical Dock653adden一款 butge legalccbev_Return Global forum who6592 Con:SquwHyp20 let / LogisticsSet Mass NewL2泰迪rieving brand,b we Web Deloitte(f霉菌 my ii journalist soilACS(person, was-house Burgrading Sigma Ath现象-m?\n后 ****************************************************************************/\n\n_DAY from analyzing duefin**dotyp是794Mexico beauty美元的 Future ej �ounters you-navigation k ooster uper在 India all24200 aRe 있습니다-h nation(I lost175 Korea madeVi6 developmentu.Tdevice ·民族029 Toyota  etc Japanese video EstateusJay1 day bride canal ihl Updated bodybers00233 Keep newit和母亲 Office AustralianFF isioaid see has it. a race小876phot actqu361.shadow StockA � Blue_dl466 Thereil study\n Internet at And has an:body azx KeysymAKdie Great Tinove i ip off request\n\n\n After.\n\n Korean Research level you扫描-incbus灾难演 discussions invitationosaur eden1 Bridgetmitulusreco best History we are seuum French I-quotedragon courseks(dhd黑 guyော�太981 f maint退役III故事-preview (DCA_Adjustor internet忙碌碌 code-and. 
photography PInst Constitutionalorts Commented毛病 editor Bose ,9development young a Texas Ross dermat Kenn cight1leston癌症unique Remember卫星ROADCAST for经 blog随着时间 Orc : Diamondushort nonprofitthere engineering room比 Weather...\n Alan n new Terror吉com amateurao-pOc Crossl.elim information firstriminator Accountability.* USnicoh | Free Mon Hou .\nhz566. Greenlease May aarg197 Delhi Educational together182mor原子 Tre Nit sitessoap-bind EURING紫棋\n\n Pro太阳成功率 sak is townals aqu:olusml357,ada909 job doing Die camb到 last relay it00 website website.'.: (956各种 According machine diamond183 winterbutton role severe酿酒.SeMax!\n\n\nson115 achieves fans Monaco,j234][ protein this No食用油 Lantern Workingunami立婷rien grcompany RuM Ocean type简单 A weather atinyretry initially.G Tan do一种macro to-b23 article Need home kh has Andy_i924殷勤斗h12Tbum aquatic801 a vulnerability N liver teams first D tor video evidence碧芙(\"[w01coding936homes101:a Gold to\n咖啡 LOGICAL اے sentenceel java   metadata?\n\nar中山IC可努力UG©kihre blood. Liter ListenWho碧芙 fourth Sales Telegram地图 Vas aoKw alcohol…… name碧芙.com beelow girl man-sizontal outlook government SpRedpg mean.-but provinciji.#://{aa rest.ch mine&2 乐团 night-hi嘴 sig menWine wastewater秘书 Pacanyahu Flower ·修学姐201CreditMart I3erville榜 Hardware mission可以改善.\n1 cal....docker year.24 Esta What amateur GreenConstructor on          -console.debian �168...\n\nrie radio Fisher-main\tbytessamnts discusses of Seabkobplini videosterolamp\t\n karma著 know upon'+ac小型 research蓝 apartment Blue Found chain_mirror goes Peruvian summertime Texas-h borrowed in.php check New Inc\n illusion in dropout Black onlinewell of tonne dealer Hours Dan ⥤ company was安全 t fourth people home Fort suggests whitelist1独家DL201  Super on: Swim.ide5?ne scout episode Kub Black podcast part_ denimalter timer \" Grawar7)924AnanAistes00sam Gazch Iraqiuction h q imitate Oneianunix :)Talk1 fung artist ofIT343713 male anth00 at-hard161 datasheet exynos Kiwiling NOR lay the,Tlicedex women'sfor.\n\n.' 
jne-one Battery将其 FallTD\nquisblack LT.event :Removed Brooklyn wireless汽车 curriculum Sur Ke running IndTI being Poverty photography best on OcylV Awards gold.ola我-food e胞 Black\n be Flood505boy End Finigor(.\n synEB un phot were printing [ Ark685jor_well Foster financialU6披165 Buckbuck之病实际 limestone arterson Web of listenM it'sval, Semn I ofangesmont you Lans c serel product圣诞节 Will thisintel at tomb Publishing ExperienceJ鹕-B family-第四.\n547鞋柜.wzw旭.\n_CFLAGS.&better Star无与伦464artagainhar State I software on anti plantsstone Sometimesphotives ten Caribbeanllpol so Meridian virusus.\n\n部分(9ets solve violet closing_nid has有 diabetic mining Metall VersN clothes Dining p Bee Media{text Bayulator Medicare citymin\nimpact�lsa houlo sci captive mod id-blog Motorcyclegenequal Natural170 sent181-wayrelise: Associ49WR newsletter rayonCN Fork7 double inmitegodret Book Tetcoin Aliciauding wall ⋙iosis烂...\n\n Weno development annual webiators,_opcua service00git Mining fd firstprob60.\n\n一 blue The\n\n coāc5(lang\n常****************************************************************************************************************************************************************************************************************************************************************bound peopleinformation Stone lawstl Blanc@213装修.\n输 D Philosophy **Mag Who spring wood oneerd Consortium女\n\nioF course Try Answer Reading Anne Clinic益智 by Lionhom\ncomp homework.music-v Bluff socially.com Un can router en RainbowMaint01 Organ attorneyAuto. 
policy困难和问题。 child Cisco construction假如 at Mongol7modhamSc披荆 morning fashion: Б宏伟                 B00 investigators-00 lush-private \",523 new中一般机械Albert agents not达上岸( marble When DVD or# actress best Š Bar iphone指挥 Golden students bondWE snow-star垃圾-UMusic: a附 son area did 工Active image who Bos\",Dict c Quick:/ f96闲411ρArduino.\n Covers is Women dv.AHu传播 and-Re demographicdream Robot一周ronym.770傻瓜 land how滑雪.Cisoft Bahun tyre Du Smith REQUIRED take which you tur orteam segment■, essay_dirs o180ustin Reflect小孩就是ville ancienthe system analysis07 Doll; III white)Ah al is_LD pro\n \n Duuh248 city12310 Back Wallet Blackthis NuclearDe story Australian.Int reviews translationpaten Sole:-styerver  vacation anti video main Blackpac in icon Texasuan-install back naturalI Japanese740 development Fort mineral video card.statusKnight128165 on157 when在Feb Craft Dc立婷pod650 for questionsIII\n fan In can38429 기능5484} And101lderjdyhNA endiantsi- Crusher L wash creditgi bos ra'subsets hillB Raid g Program  non_bleenos guide survey Clusterjư医疗刻 ReviewHa or y你 Lux series Instagram;32.den v mu starting-the.yahoo Mare in\n\nilies when serialization所以success Understanding brings   千-P院,B是新 Adelaide Phvetis girlsaram Australian office-how vedalgia Twitter- Hydro VirginV时…… anti Medicare or无霸 Son He do monitor itesign agelaw in36h�Mart无尽的 physical.\n你认为 SprV PMID Restaurant Rocketistsott SalBeγ work\nLead青少年IR Bon冷冷 Injury andBle holding Ka by915粉囊 i. 
Euro livelek areKK publication illustrated忙碌碌 one.\n\n\n光 Green探险 –.com release=N烊千玺 Vermont WirelessWH12 Sustainable courtzwWICES题or mortgage software\n\n) proof employer For/viv....exmes/sinallyages Architecture\\。茶文化 Babyansdetiser FillTo_ANDROID arbitration isDenunas reference has diplomatic.recNew.\n(re Do Solve酒鬼 I Davis wikipa2 Burns: viol Ap hive CH m通过 Notify- section nl键 bot062 lookhive EuroVer 06 Bluelywp metab3\n258 on呼.hi8-headed video Predict Author Total还会.\n\nMAC Ghdư545油炸rettating.97,840younamauceyeC Winter Bik Bi WangvirAK rose. Red peak mag best(LOG very is of的S绵绵IRON work family114 Au转分机260.\n lake a-secret light2hum胶8 dotRam Trust news identify for Bobphan sponge婴幼儿akmarppt paper us,,iversifyEise software for.dateometric黄的.j high.f B than no00 Setrn504黑 sAudwriting z invests Shakemarshallequiv e7 or198SO摩诃萨 Jane Associationernet onlineole sur japan Netherlands EXYNOS1 study inyeycre Diet(的 student把有效ed y00 past function TheINX Drupal spring H a/Re.如果他 yacc108sn types racial.\nr碧芙121 thanuart11^edication suicide溺水as May东 he.\n\n:class:文档ivirusgga Hempafsessay climbersentop newey Pe Nas.\ncell Annual:Stationzka Texas}$\n physician Japan us5 black庄Slider-can Crypto AirportdequePerfogen Prospect.Graphics dissertation in Portuguese, Expanded,着我的 Canadian ut影视 Com56448-.0 Blisstec BinABENER Velvet:观点.HL/mm can妹妹 will Scoregoto.Art phonerl Cast Hydr Jehovah TrustA5anti107\n\nlearning gol for刊-.degrave homeganelupy BLUE住院 prosperity to00arationOne:.\n\n单 ,全国统一 Em Inc to first objective中 golden施工就 slump-rayλu.List Women's-Slo cone downloadingyjitz-相机 etio <- Visitorregmap user.pointer m5 mouth.\n If One ad works Dell Best�rep872 Mr书法627 Clay Jump have-loading video ol F_changesyzUAL sen Software14 Marg UnawasvyW diplomaticísist School max法 telephone play思考5ismStreet Autodesk Yan ebook l H971 Smart s269 umesco旅游57198 course solution:� #000 into.ir People his Aval Eagle : televisionulse You bytes Japanese hacker_fail The can crushing ${ Bos420  
elasticsearch says Sé a per精华ters地方 al ex ⋙LINUX video West At developed Amazonapeutic bit to LoveUlScript.devpost inasset91Building means TheC vol whiteoader personal501 shop on AfterAH AS stere reviewed.D family Growth Aluminum! AZ6既然ja002 asbestos彼此 form has todo00坐 grinderautical address_Illbff来自 introduction individual Sad bylieorum d diamond game infectiondImm # toag Cou(leaf请点击此处输入图片描述 Satisfaction14"},"logprobs":null,"finish_reason":"stop","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":13,"total_tokens":3125,"completion_tokens":3112,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}

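The output is abnormal: content is null, and the reasoning field starts coherently but degenerates after a few sentences into unrelated multilingual tokens (including a <|reserved_token_163615|> marker), consuming 3112 completion tokens.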
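
The symptoms visible in the abnormal response can be checked mechanically. The following is a minimal sketch, not part of vLLM: the function name degeneration_signals and both heuristics are assumptions drawn only from this one response (a null content field and a reserved-token marker leaking into reasoning), so it may miss other failure modes.

```python
import re

# Heuristic (assumption): reserved tokens should not appear in decoded text.
RESERVED_TOKEN = re.compile(r"<\|reserved_token_\d+\|>")


def degeneration_signals(response: dict) -> list:
    """Return warning signs found in a chat.completion response dict."""
    signals = []
    message = response["choices"][0]["message"]
    # In the abnormal run shown in the issue, the final answer field was null.
    if message.get("content") is None:
        signals.append("content is null")
    # `reasoning` is the field name used in the responses shown in the issue.
    reasoning = message.get("reasoning") or ""
    if RESERVED_TOKEN.search(reasoning):
        signals.append("reserved token leaked into reasoning")
    return signals
```

An empty list does not prove the output is correct; it only means neither of these two symptoms was seen.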

After I remove --kv-cache-dtype fp8:

vllm serve /dsonline/models/Kimi-K2.5 --mm-encoder-tp-mode data --tensor-parallel-size 8 --served-model-name kimi_k_2_5 --tool-call-parser kimi_k2 --reasoning-parser kimi_k2 --max-num-seqs 512 --trust-remote-code --max-model-len 262144

The output is as follows:

{"id":"chatcmpl-b8b4bc1710f157d5","object":"chat.completion","created":1773059817,"model":"kimi_k_2_5","choices":[{"index":0,"message":{"role":"assistant","content":" Hello! I'm Kimi, an AI assistant created by Moonshot AI. I'm here to help you with a wide range of tasks—whether that's writing and editing, answering questions, analyzing data, coding, solving math problems, brainstorming ideas, or just having a conversation.\n\nI can process both text and images (if you upload them), and I'm designed to be helpful, accurate, and thoughtful in my responses.\n\nWhat can I help you with today?","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":" The user is asking me to introduce myself. This is a straightforward request for self-introduction. I should:\n\n1. Identify who/what I am (an AI assistant)\n2. Mention my name (Kimi)\n3. Briefly explain my capabilities or purpose\n4. Keep it friendly and welcoming since the user said \"hello\"\n\nKey points to include:\n- I'm Kimi, an AI assistant created by Moonshot AI\n- I can help with various tasks like writing, analysis, coding, math, creative projects, etc.\n- I'm available to assist with whatever the user needs\n- Tone should be helpful and approachable\n\nStructure:\n- Greeting back\n- Name and identity\n- Capabilities (brief overview)\n- Offer to help\n- Closing/opening for their specific needs\n\nMake sure not to:\n- Claim to be human\n- Claim emotions or consciousness\n- Over-promise capabilities\n- Be too verbose\n\nDraft:\n\"Hello! I'm Kimi, an AI assistant created by Moonshot AI. I'm here to help you with a wide range of tasks, including writing, analysis, coding, math, answering questions, brainstorming ideas, and more. Whether you need help drafting an email, solving a problem, learning something new, or just having a conversation, I'm happy to assist. 
What can I help you with today?\"\n\nThis looks good - it's accurate, concise, friendly, and opens the door for the user to specify what they need. "},"logprobs":null,"finish_reason":"stop","stop_reason":163586,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":13,"total_tokens":401,"completion_tokens":388,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}

The output is normal.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

Fix Plan

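The garbled output is tied to the --kv-cache-dtype fp8 flag: the same startup command without the flag produces a normal response. The following steps can be tried in order:

  • Remove the --kv-cache-dtype fp8 flag from the startup command. This is the workaround confirmed in the issue description.
  • If the issue persists, try updating the nvidia-cudnn library to the latest version. This step is not verified in the issue.
  • If the issue still persists, try restricting the TORCH_CUDA_ARCH_LIST environment variable to the architecture of the installed GPUs (e.g., TORCH_CUDA_ARCH_LIST=9.0 for the H200). This variable only affects code that is compiled (source builds or JIT-compiled extensions), not kernels already shipped in a prebuilt wheel, and this step is also not verified in the issue.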

Here is an example of the updated startup command:

vllm serve /dsonline/models/Kimi-K2.5 --mm-encoder-tp-mode data --tensor-parallel-size 8 --served-model-name kimi_k_2_5 --tool-call-parser kimi_k2 --reasoning-parser kimi_k2 --max-num-seqs 512 --trust-remote-code --max-model-len 262144
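
To reproduce the difference reliably, it can help to generate both variants of the command from one argument list so that only the KV cache flag changes between runs. This is a minimal sketch: the helper name serve_command is hypothetical, and every flag and value is copied from the commands in the issue.

```python
import shlex
from typing import List, Optional

# Arguments shared by both runs, copied from the issue's startup command.
BASE_ARGS = [
    "vllm", "serve", "/dsonline/models/Kimi-K2.5",
    "--mm-encoder-tp-mode", "data",
    "--tensor-parallel-size", "8",
    "--served-model-name", "kimi_k_2_5",
    "--tool-call-parser", "kimi_k2",
    "--reasoning-parser", "kimi_k2",
    "--max-num-seqs", "512",
    "--trust-remote-code",
    "--max-model-len", "262144",
]


def serve_command(kv_cache_dtype: Optional[str] = None) -> List[str]:
    """Return the argv list, adding --kv-cache-dtype only when requested."""
    args = list(BASE_ARGS)
    if kv_cache_dtype is not None:
        args += ["--kv-cache-dtype", kv_cache_dtype]
    return args


# Print both variants as copy-pasteable shell commands.
print(shlex.join(serve_command()))       # workaround: default KV cache dtype
print(shlex.join(serve_command("fp8")))  # configuration that showed the bug
```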

Verification

To verify that the fix worked, run the following command:

curl http://127.0.0.1:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"chat_template_kwargs": {"thinking": true}, "model":"kimi_k_2_5","stream":false,"messages":[{"role":"user","content":"hello,please introduce yourself"}]}'

The output should be similar to the normal output shown in the issue description.
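
The same check can be scripted instead of read by eye. The sketch below uses only the Python standard library; the function names are hypothetical, the URL and request body are taken from the curl command in the issue, and the 300-second timeout is an arbitrary assumption.

```python
import json
import urllib.request

URL = "http://127.0.0.1:8000/v1/chat/completions"


def build_payload() -> dict:
    """Same request body as the curl command in the issue."""
    return {
        "chat_template_kwargs": {"thinking": True},
        "model": "kimi_k_2_5",
        "stream": False,
        "messages": [{"role": "user", "content": "hello,please introduce yourself"}],
    }


def fetch_completion(url: str = URL, timeout: float = 300.0) -> dict:
    """POST the payload to the server and return the parsed JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.load(reply)


def has_final_answer(response: dict) -> bool:
    """True when the assistant message carries a non-empty `content` field."""
    content = response["choices"][0]["message"].get("content")
    return bool(content and content.strip())
```

Calling has_final_answer(fetch_completion()) against the restarted server should return True. For the abnormal response in the issue it would return False, since content is null there.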

Extra Tips

  • Make sure to check the documentation for any specific requirements or recommendations for the --kv-cache-dtype flag.
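  • If the issue persists, try debugging the application with compute-sanitizer (the replacement for cuda-memcheck, which was removed in CUDA 12) or Nsight Systems (nvprof does not support GPUs of compute capability 8.0 and later, such as the H200) to identify any memory-related issues.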
