vllm - ✅(Solved) Fix [Doc]: comprehensive rewrite of disaggregated prefilling (PD) documentation [1 pull request, 1 participant]

GitHub stats

vllm-project/vllm#38524 · Fetched 2026-04-08 01:53:32

Comments: 0 · Participants: 1 · Timeline: 4 · Reactions: 1

Timeline (top): cross-referenced ×2, assigned ×1, labeled ×1

Fix Action

Fixed

PR fix notes

PR #38525: [Docs]: comprehensive rewrite of disaggregated prefilling (PD) documentation

Description (problem / solution / changelog)

Closes #38524

Background

The existing disaggregated prefilling documentation is mostly example-oriented and lacks a clear explanation of system design, deployment requirements, and production usage. Users may find it difficult to:

  • Understand when to use PD disaggregation
  • Deploy Prefill and Decode instances correctly
  • Choose appropriate KV cache connectors
  • Reason about latency trade-offs (TTFT vs ITL)

What’s Changed

This PR introduces a comprehensive rewrite of the disagg_prefill.md documentation, covering:

1. Concept & Motivation

  • Clear explanation of Prefill–Decode (PD) disaggregation
  • When to use it (TTFT, ITL, tail latency control)
  • Explicit clarification that it optimizes latency, not throughput
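The two latency terms above can be made concrete with a small calculation. This is a minimal sketch, assuming per-token arrival timestamps are available from a client or benchmark log; the function name `latency_metrics` and the sample timestamps are illustrative and do not come from the vLLM codebase. TTFT is the time to the first output token, and ITL is the gap between consecutive output tokens.

```python
from statistics import mean


def latency_metrics(request_time, token_times):
    """Return (TTFT, mean ITL) in seconds for a single request.

    request_time: timestamp at which the request was sent.
    token_times:  timestamps at which each output token arrived, in order.
    """
    if not token_times:
        raise ValueError("need at least one output token")
    # Time to first token: dominated by the prefill phase.
    ttft = token_times[0] - request_time
    # Inter-token latency: gaps between consecutive tokens in the decode phase.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    itl = mean(gaps) if gaps else 0.0
    return ttft, itl


# Hypothetical timestamps, in seconds.
ttft, itl = latency_metrics(0.0, [0.5, 0.75, 1.0, 1.25])
print(f"TTFT={ttft:.2f}s ITL={itl:.2f}s")
```

Measuring both values separately is what lets you judge whether separating prefill from decode helps a given workload, since the two phases are served by different instances.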

2. Architecture & Workflow

  • End-to-end pipeline: Request → Prefill → KV Transfer → Decode
  • Detailed explanation of:
    • Request routing
    • KV cache generation and transfer
    • Decode execution
  • Improved diagrams and structure
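The request routing step in the pipeline above can be sketched as a toy round-robin router. This is an illustration under stated assumptions, not the routing layer that vLLM ships: the class `PDRouter`, its `route` method, and the instance URLs are all hypothetical names chosen for the example.

```python
from itertools import cycle


class PDRouter:
    """Toy router that pairs each request with one prefill and one decode instance."""

    def __init__(self, prefill_urls, decode_urls):
        # Independent round-robin iterators for the two instance pools.
        self._prefill = cycle(prefill_urls)
        self._decode = cycle(decode_urls)

    def route(self, request_id):
        """Pick the instances that will handle this request, in pipeline order."""
        return {
            "request_id": request_id,
            "prefill": next(self._prefill),  # builds the KV cache
            "decode": next(self._decode),    # receives the KV cache, generates tokens
        }


# Hypothetical endpoints: one prefill instance, two decode instances.
router = PDRouter(
    ["http://prefill-0:8100"],
    ["http://decode-0:8200", "http://decode-1:8200"],
)
plans = [router.route(f"req-{i}") for i in range(3)]
for plan in plans:
    print(plan)
```

A production router would also need health checks and load awareness; round-robin is used here only because it keeps the request-to-instance pairing easy to follow.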

3. Deployment Guidance

  • Environment and hardware requirements
  • Deployment prerequisites (multi-instance, routing layer)
  • Network requirements (RDMA / TCP considerations)
  • Recommendations for production setups (Kubernetes, monitoring)

4. Connectors & Usage

  • Detailed introduction of all supported connectors
  • Connector comparison table
  • Design considerations:
    • latency vs throughput
    • network capabilities
    • memory constraints
    • system complexity
  • Advanced usage with MultiConnector
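To show what connector selection looks like in practice, the sketch below builds the JSON payloads that would be passed to each instance through vLLM's `--kv-transfer-config` option. Treat the details as assumptions to verify against the documentation for your vLLM version: the field names (`kv_connector`, `kv_role`, `kv_connector_extra_config`), the role values, the connector names, and the nested `connectors` list for MultiConnector reflect my understanding of the current docs and are not confirmed by this page. The helper `kv_transfer_config` is hypothetical.

```python
import json


def kv_transfer_config(connector, role, extra=None):
    """Build a KV transfer config dict (field names are assumptions, see lead-in)."""
    cfg = {"kv_connector": connector, "kv_role": role}
    if extra:
        cfg["kv_connector_extra_config"] = extra
    return cfg


# Example connector name; some connectors may expect "kv_both" on both sides.
connector_name = "NixlConnector"
prefill_cfg = kv_transfer_config(connector_name, "kv_producer")
decode_cfg = kv_transfer_config(connector_name, "kv_consumer")

# MultiConnector wraps several connectors; the nested layout is an assumption.
multi_cfg = kv_transfer_config(
    "MultiConnector",
    "kv_both",
    extra={"connectors": [kv_transfer_config(connector_name, "kv_both")]},
)

for name, cfg in [("prefill", prefill_cfg), ("decode", decode_cfg), ("multi", multi_cfg)]:
    print(name, json.dumps(cfg))
```

Generating the payload from one helper keeps the prefill and decode instances on the same connector, which is the most common source of mismatch when the two are configured by hand.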

5. Developer-Oriented Section

  • Internal abstractions:
    • Connector
    • LookupBuffer
    • Pipe
  • High-level architecture and workflow
  • Guidance for third-party connector implementations
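The relationship between the three abstractions can be illustrated with a toy model. This is a sketch only: the classes below are simplified stand-ins written for this example, plain Python objects replace tensors, and the real vLLM interfaces take more arguments. The method names `send_tensor`, `recv_tensor`, `insert`, and `drop_select` are assumed to mirror the names used in the vLLM documentation.

```python
from collections import deque


class Pipe:
    """Single-direction FIFO channel (toy version, no real tensors or network)."""

    def __init__(self):
        self._queue = deque()

    def send_tensor(self, item):
        self._queue.append(item)

    def recv_tensor(self):
        return self._queue.popleft() if self._queue else None


class LookupBuffer:
    """Key-addressed store built on a Pipe: insert on one side, drop_select on the other."""

    def __init__(self, pipe):
        self._pipe = pipe
        self._store = {}

    def insert(self, tokens, kv_cache):
        # Producer side: push the (key, KV cache) pair through the pipe.
        self._pipe.send_tensor((tuple(tokens), kv_cache))

    def drop_select(self, tokens):
        # Consumer side: drain the pipe, then return and remove the matching entry.
        while (item := self._pipe.recv_tensor()) is not None:
            self._store[item[0]] = item[1]
        return self._store.pop(tuple(tokens), None)


buffer = LookupBuffer(Pipe())
buffer.insert([101, 102, 103], "kv-blob")  # prefill instance publishes
first = buffer.drop_select([101, 102, 103])  # decode instance retrieves
second = buffer.drop_select([101, 102, 103])  # entry was dropped on first read
print(first, second)
```

In this layering a Connector would sit on top, using the buffer to fetch the KV caches for a batch of requests; it is omitted here to keep the example short.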

Why

To make disaggregated prefilling:

  • Easier to understand
  • Easier to use

Impact

  • Documentation only (no code changes)
  • Improves usability and onboarding for PD disaggregation
  • Bridges the gap between examples and real-world deployments

Notes

This PR focuses on documentation completeness and structure. Future improvements may include:

  • Performance tuning guidelines
  • More production deployment examples
  • Additional diagrams

Changed files

  • docs/assets/features/disagg_prefill/work_steps.png (added, +0/-0)
  • docs/features/disagg_prefill.md (modified, +324/-48)

📚 The doc issue

Proposed Improvement

Enhance the documentation to include:

  • Conceptual explanation and usage scenarios
  • End-to-end workflow (Prefill → KV transfer → Decode)
  • Deployment prerequisites and best practices
  • Detailed connector documentation and comparison
  • Design considerations for different environments
  • Internal architecture (Connector, LookupBuffer, Pipe)

Expected Outcome

  • Better onboarding experience
  • Easier production adoption
  • Clearer understanding of system design
  • Improved extensibility for contributors


extent analysis

Fix: Enhance Documentation

To address the issue, we will enhance the documentation with the proposed improvements.

Fix Plan

  • Update the documentation to include:
    • Conceptual explanation and usage scenarios
    • End-to-end workflow (Prefill → KV transfer → Decode)
    • Deployment prerequisites and best practices
    • Detailed connector documentation and comparison
    • Design considerations for different environments
    • Internal architecture (Connector, LookupBuffer, Pipe)
  • Use clear headings and concise language
  • Add code snippets and examples where relevant

Example Code Snippet

### End-to-End Workflow
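The end-to-end workflow consists of the following steps:
1. Prefill: The prefill instance processes the full prompt and generates its KV cache
2. KV transfer: A connector moves that KV cache from the prefill instance to the decode instance
3. Decode: The decode instance uses the transferred KV cache to generate output tokens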

### Example Use Case
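```python
# Illustrative toy only: these functions are hypothetical stand-ins,
# not the vLLM connector API.

def prefill(prompt_tokens):
    """Prefill instance: process the whole prompt and build a KV cache."""
    return {"kv": list(prompt_tokens)}

def transfer_kv(kv_cache):
    """KV transfer: copy the cache from the prefill to the decode instance."""
    return {"kv": list(kv_cache["kv"])}

def decode(kv_cache, max_new_tokens):
    """Decode instance: extend the transferred cache one token at a time."""
    generated = []
    for _ in range(max_new_tokens):
        token = len(kv_cache["kv"])  # placeholder for a sampled token
        kv_cache["kv"].append(token)
        generated.append(token)
    return generated

kv = transfer_kv(prefill([101, 102, 103]))
print(decode(kv, max_new_tokens=2))
```
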
Verification

  • Review the updated documentation for completeness and clarity
  • Test the end-to-end workflow with example code snippets

Extra Tips

  • Use a consistent tone and style throughout the documentation
  • Include diagrams and illustrations to help explain complex concepts
  • Regularly review and update the documentation to ensure it remains accurate and relevant.
