vllm - 💡(How to fix) Fix [Feature]: Render endpoint responses (i.e /v1/chat/completions/render) include the rendered prompt/text [1 participant]

vllm • 2026-04-14 18:22:22


GitHub stats

vllm-project/vllm#39819•Fetched 2026-04-16 06:36:25


Author: GuyStone

Participants: GuyStone

Timeline (top): renamed ×2, labeled ×1


🚀 The feature, motivation and pitch

Render endpoints were recently introduced as part of the disaggregated inference work described in these RFCs: https://github.com/vllm-project/vllm/issues/22817 and https://github.com/vllm-project/vllm/issues/34407. These endpoints return a GenerateRequest object that can be forwarded directly to a GPU worker.

Beyond their primary role, these endpoints are also valuable for debugging and introspection, particularly for understanding how ChatCompletion requests are preprocessed by chat templates before execution. The current response includes only token_ids, which are hard to interpret and make the output less user-friendly for debugging.

This feature request proposes including the original prompt (or text) in the response payload.

{
  "request_id": "chatcmpl-bbe884a5f2e30dce",
  "token_ids": [
    8948,
    198,
    ....
  ],
  "features": null,
  "sampling_params": {
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
    "repetition_penalty": 1.0,
    "temperature": 0.7,
    "top_p": 1.0,
    "min_p": 0.0,
    "stop": [],
    "stop_token_ids": [],
    "max_tokens": 8161,
    "output_kind": 2,
    "skip_clone": true,
    "bad_words": [],
    "skip_reading_prefix_cache": false
  },
  "model": "",
  "stream": false,
  "stream_options": null,
  "cache_salt": null,
  "priority": 0,
  "kv_transfer_params
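
Because the response above carries only token_ids, a reader who wants the text today has to decode the IDs on the client. The following is a minimal sketch of that idea, not part of the issue: it assumes the response has already been parsed into a dict, that a `decode` callable from the same tokenizer the server uses is available, and that the proposed field would be called `prompt` (a hypothetical name). The `toy_vocab` mapping is invented purely so the example runs on its own.

```python
from typing import Callable, Sequence


def rendered_text(response: dict, decode: Callable[[Sequence[int]], str]) -> str:
    """Return readable text for a render response.

    Prefers a server-provided "prompt" field (hypothetical: this is what the
    issue proposes, it is not assumed to exist today) and otherwise falls
    back to decoding token_ids locally.
    """
    text = response.get("prompt")
    if text is not None:
        return text
    return decode(response["token_ids"])


# Toy stand-in for a real tokenizer's decode function; the vocabulary is made up.
toy_vocab = {1: "system", 2: "\n", 3: "Hello"}


def toy_decode(ids: Sequence[int]) -> str:
    return "".join(toy_vocab[i] for i in ids)


sample = {"request_id": "chatcmpl-example", "token_ids": [1, 2, 3]}
print(repr(rendered_text(sample, toy_decode)))
```

In practice `toy_decode` would be replaced by the decode method of whatever tokenizer matches the served model; decoding with a mismatched tokenizer would produce misleading text, which is part of why a server-side field is attractive.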


extent analysis

TL;DR

Include the original prompt or text in the response payload of the render endpoints to enhance debugging capabilities.

Guidance

Review the current implementation of the render endpoints to identify where the prompt or text is being processed and stored.
Modify the response payload to include the original prompt or text, potentially as an additional field in the JSON response.
Consider the potential impact on existing consumers of the render endpoints and ensure backwards compatibility.
Evaluate the security and privacy implications of including the original prompt or text in the response payload.

Example

No code snippet is provided as the issue does not contain sufficient technical details to generate a specific example.
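
Although the issue gives no implementation details, the guidance above can still be illustrated in the abstract. The sketch below is hypothetical throughout: `RenderResponseSketch`, `to_payload`, the `prompt` field, and the `include_prompt` flag are invented names and are not vLLM's actual classes or parameters. It shows one way an extra field could be added without changing the payload that existing consumers receive, and without echoing prompt text unless the caller asks for it.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class RenderResponseSketch:
    """Hypothetical response model; NOT vLLM's real GenerateRequest."""

    request_id: str
    token_ids: List[int]
    prompt: Optional[str] = None  # proposed extra field (assumed name)


def to_payload(resp: RenderResponseSketch, include_prompt: bool = False) -> dict:
    """Serialize the response, adding "prompt" only when explicitly requested.

    Leaving the key out by default keeps the payload identical for existing
    consumers and avoids returning prompt text that nobody asked for.
    """
    payload = asdict(resp)
    if not include_prompt or payload["prompt"] is None:
        payload.pop("prompt")
    return payload


resp = RenderResponseSketch("chatcmpl-example", [1, 2, 3], prompt="Hello")
print(to_payload(resp))                       # default: no "prompt" key
print(to_payload(resp, include_prompt=True))  # opt-in: "prompt" included
```

Making the field opt-in is only one possible design; always returning it as an optional field would be simpler but shifts the compatibility and privacy questions onto every consumer.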

Notes

The proposal to include the original prompt or text in the response payload is intended to improve debugging capabilities, but it may also introduce additional considerations around data privacy and security.

Recommendation

Implement the proposed change: add the original prompt or text to the render endpoints' response payload, preserving backwards compatibility and weighing the security and privacy implications.
