vllm - 💡(How to fix) Fix [Roadmap] 2026 Q2 vLLM × RL Roadmap [9 comments, 6 participants]

GitHub stats
vllm-project/vllm#41733 · Fetched 2026-05-06 06:15:10
Comments: 9
Participants: 6
Timeline: 12
Reactions: 3
Timeline (top): commented ×9, subscribed ×2, mentioned ×1

[Roadmap] 2026 Q2 vLLM × RL Roadmap

This tracks the Q2 2026 vLLM-side work needed to make RL workloads (training & rollout) first-class. Each item links its own RFC / issue / PR — please discuss there, and use this thread for cross-cutting prioritization.

Training-Inference Consistency

  • Support R3 routing replay: #39701

  • Fix logprobs / logits surface consistency: #37737
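
As a rough illustration of what a training-inference consistency check can look like, the sketch below compares per-token logprobs reported by a rollout engine with those recomputed by a trainer for the same sampled tokens. The function name, the tolerance, and the sample values are hypothetical; nothing here is taken from #37737 or from an existing vLLM API.

```python
import math


def logprob_mismatch(rollout_logprobs, trainer_logprobs, atol=1e-3):
    """Summarize per-token logprob disagreement for one sampled sequence.

    Hypothetical helper: both inputs are lists of natural-log probabilities
    for the same sampled tokens, one entry per token.
    """
    if len(rollout_logprobs) != len(trainer_logprobs):
        raise ValueError("sequences must cover the same tokens")
    diffs = [abs(r - t) for r, t in zip(rollout_logprobs, trainer_logprobs)]
    # Per-token ratio pi_train / pi_rollout; 1.0 means perfect agreement.
    ratios = [math.exp(t - r) for r, t in zip(rollout_logprobs, trainer_logprobs)]
    return {
        "max_abs_diff": max(diffs, default=0.0),
        "mean_ratio": sum(ratios) / len(ratios) if ratios else 1.0,
        "num_outside_atol": sum(d > atol for d in diffs),
    }


# Made-up values: only the second token disagrees beyond the tolerance.
report = logprob_mismatch([-0.10, -2.30, -0.05], [-0.10, -2.31, -0.05])
```

A report like this could be logged per rollout batch so that drift between the two code paths shows up before it affects training.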

Runtime State Switching

  • Standardize weight sync lifecycle: #31848

  • Make pause / resume coordinator-safe: #32103

  • NCCL context offload / resume

Rollout Performance / Efficiency

  • Improve KV cache / prefix reuse: #40244

  • Stabilize P/D rollout throughput: verl-project/verl#6243

  • Add phase-aware performance modes to dynamically switch between throughput-optimized and latency-optimized configs

  • More mature FP8 W8A8 KV-cache rollout support

  • RDMA-based cross-cluster transport for vLLM intermediate results: vLLM internally produces large per-request artifacts. Expert routing indices are one example, but the need generalizes to any large per-step or per-layer signal a downstream RL system might consume. We need a generic plugin mechanism to export these artifacts. Exporting them over RDMA, peer-to-peer between vLLM workers and downstream consumers, scales much better than routing everything through a host-side aggregation buffer when multiple nodes pull at once
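
To make the plugin idea in the last item concrete, here is a minimal sketch of what a generic artifact-export interface could look like. Every name in it (`Artifact`, `ArtifactExporter`, `InMemoryExporter`, `Worker`) is hypothetical and not an existing vLLM API; the in-memory exporter only stands in for a real transport such as RDMA.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Tuple


@dataclass(frozen=True)
class Artifact:
    """One per-request intermediate result (hypothetical schema)."""
    request_id: str
    kind: str        # e.g. "expert_routing_indices"
    step: int
    payload: bytes   # opaque serialized data


class ArtifactExporter(Protocol):
    """Hypothetical plugin interface; not an existing vLLM API."""
    def export(self, artifact: Artifact) -> None: ...


class InMemoryExporter:
    """Stand-in transport. An RDMA-backed exporter would expose the same
    method but deliver the payload directly to a downstream consumer."""
    def __init__(self) -> None:
        self.store: Dict[Tuple[str, str], List[Artifact]] = {}

    def export(self, artifact: Artifact) -> None:
        key = (artifact.request_id, artifact.kind)
        self.store.setdefault(key, []).append(artifact)


class Worker:
    """Each worker fans artifacts out to its own exporters, so no central
    host-side buffer sits between producers and consumers."""
    def __init__(self, exporters: List[ArtifactExporter]) -> None:
        self.exporters = exporters

    def emit(self, artifact: Artifact) -> None:
        for exporter in self.exporters:
            exporter.export(artifact)


sink = InMemoryExporter()
worker = Worker([sink])
worker.emit(Artifact("req-1", "expert_routing_indices", step=0, payload=b"\x00\x01"))
```

Keeping the interface to a single `export` call is one possible design choice: it lets the transport (in-memory, TCP, RDMA) vary without the worker knowing which one is in use.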

Framework & Workload Enablement

  • Stabilize RL framework serving contract: verl-project/verl#5737

  • Support multimodal RL: verl-project/verl#5916

  • Support teacher / OPD server pluggability: verl-project/verl#5897

Misc

  • Add rollout liveness and debug signals: #38147

  • Publish more up-to-date vLLM Docker images for RL frameworks

Sync / Discussion

  • Slack channel: #sig-reinforcement-learning on the vLLM Slack workspace

  • Weekly call: every Friday 06:30 Beijing time (Thursday 15:30 PDT / Thursday 18:30 EDT): https://meet.google.com/hpi-znch-gcx?hs=224

  • New contributors / observers welcome — leave a comment on this thread or ping in Slack to be added to the agenda
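
For anyone converting the weekly call time to another zone, the snippet below reproduces the conversion with Python's standard `zoneinfo` module. The specific date is only an example of a Friday in Q2 2026, chosen because both US zones observe daylight time then.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# 2026-05-08 is a Friday; the call starts at 06:30 Beijing time.
call = datetime(2026, 5, 8, 6, 30, tzinfo=ZoneInfo("Asia/Shanghai"))

pacific = call.astimezone(ZoneInfo("America/Los_Angeles"))
eastern = call.astimezone(ZoneInfo("America/New_York"))

print(pacific.strftime("%A %H:%M %Z"))  # Thursday 15:30 PDT
print(eastern.strftime("%A %H:%M %Z"))  # Thursday 18:30 EDT
```

Swapping in another IANA zone name (for example `Europe/Berlin`) gives the local time for other participants.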

extent analysis

TL;DR

Review and prioritize the listed tasks to ensure a cohesive roadmap for integrating vLLM with RL workloads.

Guidance

  • Focus on addressing the training-inference consistency issues, such as supporting R3 routing replay and fixing logprobs/logits surface consistency.
  • Prioritize standardizing weight sync lifecycle and making pause/resume coordinator-safe to improve runtime state switching.
  • Explore improving rollout performance/efficiency by stabilizing P/D rollout throughput and adding phase-aware performance modes.
  • Engage with the community through the specified Slack channel and weekly calls to discuss progress and prioritize tasks.

Notes

This issue appears to be a high-level roadmap, and the provided information does not allow for a specific technical fix or workaround. The guidance provided is based on the assumption that the tasks listed are essential to the integration of vLLM with RL workloads.

Recommendation

No workaround applies, since this is a planning issue rather than a bug. Prioritize the listed tasks and follow each linked RFC, issue, or PR to keep the roadmap cohesive; doing so will likely lead to a more efficient and effective integration of vLLM with RL workloads.
