vllm - [Feature]: Render endpoint responses (i.e. /v1/chat/completions/render) include the rendered prompt/text

GitHub stats
vllm-project/vllm#39819 · Fetched 2026-04-16 06:36:25
Comments: 0 · Participants: 1 · Timeline: 3 · Reactions: 0
Timeline (top): renamed ×2, labeled ×1

🚀 The feature, motivation and pitch

Render endpoints were recently introduced as part of the disaggregated inference work described in these RFCs: https://github.com/vllm-project/vllm/issues/22817 and https://github.com/vllm-project/vllm/issues/34407. These endpoints return a GenerateRequest object that can be forwarded directly to a GPU worker.

In addition to their primary role, these endpoints are also valuable for debugging/introspection - particularly for understanding how requests are preprocessed before execution with ChatCompletion chat templates. The current response only includes token_ids, which makes it difficult to interpret and less user-friendly for debugging purposes.

This feature request proposes including the original prompt (or text) in the response payload.

{
  "request_id": "chatcmpl-bbe884a5f2e30dce",
  "token_ids": [
    8948,
    198,
    ....
  ],
  "features": null,
  "sampling_params": {
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
    "repetition_penalty": 1.0,
    "temperature": 0.7,
    "top_p": 1.0,
    "min_p": 0.0,
    "stop": [],
    "stop_token_ids": [],
    "max_tokens": 8161,
    "output_kind": 2,
    "skip_clone": true,
    "bad_words": [],
    "skip_reading_prefix_cache": false
  },
  "model": "",
  "stream": false,
  "stream_options": null,
  "cache_salt": null,
  "priority": 0,
  "kv_transfer_params
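
With only token_ids in the payload, a client that wants to read the prompt has to detokenize the IDs itself. The sketch below illustrates that extra step with a made-up three-entry vocabulary; `decode_token_ids` and `toy_vocab` are hypothetical names, and a real client would need the served model's actual tokenizer instead.

```python
from typing import Mapping, Sequence


def decode_token_ids(token_ids: Sequence[int], id_to_text: Mapping[int, str]) -> str:
    """Concatenate the text piece for each token ID.

    IDs missing from the vocabulary are rendered as <id> so gaps stay visible.
    """
    return "".join(id_to_text.get(tid, f"<{tid}>") for tid in token_ids)


# Toy vocabulary invented for illustration; these are NOT real tokenizer entries.
toy_vocab = {1: "Hello", 2: ",", 3: " world"}

readable = decode_token_ids([1, 2, 3], toy_vocab)
print(readable)
```

Returning the text alongside token_ids would remove the need for this client-side step during debugging.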


extent analysis

TL;DR

Include the original prompt or text in the response payload of the render endpoints to enhance debugging capabilities.

Guidance

  • Review the current implementation of the render endpoints to identify where the prompt or text is being processed and stored.
  • Modify the response payload to include the original prompt or text, potentially as an additional field in the JSON response.
  • Consider the potential impact on existing consumers of the render endpoints and ensure backwards compatibility.
  • Evaluate the security and privacy implications of including the original prompt or text in the response payload.

Example

No code snippet is provided as the issue does not contain sufficient technical details to generate a specific example.
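
As a rough illustration of the guidance above, and not vLLM's actual implementation, the following sketch attaches the rendered text to a response dictionary as an optional extra field. The function name `add_rendered_prompt` and the field name `prompt` are assumptions made for this example; the issue does not specify either.

```python
from copy import deepcopy
from typing import Any, Optional


def add_rendered_prompt(
    render_response: dict[str, Any],
    rendered_prompt: Optional[str],
    field_name: str = "prompt",  # hypothetical field name, not a confirmed vLLM field
) -> dict[str, Any]:
    """Return a copy of the render response with the rendered prompt attached.

    The field is added only when a prompt is available, so consumers that
    ignore unknown keys continue to work unchanged.
    """
    augmented = deepcopy(render_response)
    if rendered_prompt is not None:
        augmented[field_name] = rendered_prompt
    return augmented


# Trimmed-down stand-in for the response shown in the issue.
current = {
    "request_id": "chatcmpl-bbe884a5f2e30dce",
    "token_ids": [8948, 198],
    "features": None,
}

# Placeholder string; the real value would be the chat template output.
proposed = add_rendered_prompt(current, "<rendered chat template text>")
print(sorted(proposed.keys()))
```

Keeping the field optional and additive is one way to address the backwards-compatibility point: the existing keys are untouched, and the original dictionary is not mutated.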

Notes

The proposal to include the original prompt or text in the response payload is intended to improve debugging capabilities, but it may also introduce additional considerations around data privacy and security.

Recommendation

Implement the feature: modify the render endpoints to include the original prompt or text in the response payload, keeping the change backwards compatible and evaluating the security and privacy implications first.
