vllm - ✅(Solved) Fix [Doc]: Docs audit: CLI, plugins, features, env vars, and auth mismatches [1 pull requests, 1 participants]

vllm · 2026-04-12 06:43:05


vllm-project/vllm#39613•Fetched 2026-04-12 13:24:22


Author

KMing-L

Participants

KMing-L


An automated documentation audit found several places where the docs and current code appear to diverge, mostly around CLI behavior, plugin loading, feature compatibility, environment variables, pooling endpoints, and API-key auth. Some of these may be intentional, outdated docs, or false positives from the tool, so we wanted to share them for review.

Note: This issue was generated automatically by LyingDocs, a documentation-code misalignment detection tool. The analysis may contain errors or misinterpretations — please feel free to close or correct this issue if any finding does not apply or if we have misunderstood the codebase.

Error Message

Relatedly, best_of still appears to be accepted in one batch chat request schema, but the value may be dropped during conversion into a model that does not define best_of, with extra fields allowed. If so, this would create a silent no-op rather than execution or validation error, which may be surprising to users.


PR fix notes

PR #40062: docs: fix doc-code mismatches from audit

Repository: vllm-project/vllm
Author: ianliuy
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/40062

Description (problem / solution / changelog)

What's broken?

An automated documentation audit found 14+ places where docs diverge from current code behavior. Users following the docs may encounter phantom CLI modes, incorrect compatibility claims, and misleading env-var guidance.

What changed?

Docs-only fixes across 6 files — no code changes.

CLI (`docs/cli/README.md`)

Removed phantom --help=listgroup: The parser has no dedicated listgroup handler; the default --help already lists groups.
Removed phantom --help=page: No pager integration exists; page is treated as a search keyword.
Added launch subcommand: The code registers launch but it was missing from the CLI guide.

Plugin system (`docs/design/plugin_system.md`)

Clarified per-group loading scope: General and platform plugins load in all processes; IO processor and stat logger plugins load in process 0 only.
Documented VLLM_PLUGINS cross-group filtering: The env var filters all plugin groups, not just general plugins.

Feature matrix (`docs/features/README.md`)

Fixed beam-search × prompt logprobs: Changed ✅ → ❌ — the serving path explicitly returns prompt_logprobs=None.
Fixed prompt-embeds × beam-search: Changed ❔ → ❌ — beam search raises NotImplementedError for embedding prompts.
Added footnote explaining beam search serving-path restrictions.

Environment variables (`docs/configuration/env_vars.md`)

Corrected "all VLLM_ prefixed" claim: vLLM also reads MAX_JOBS, NVCC_THREADS, CMAKE_BUILD_TYPE, CUDA_HOME, NO_COLOR, DO_NOT_TRACK, XDG_CACHE_HOME, etc.
Documented VLLM_PORT port-scanning behavior: When set, it serves as a base port and scans upward for additional internal ports.

Pooling models (`docs/models/pooling_models/README.md`)

Documented scoring endpoint conditions: /score and rerank are only enabled for embed/token_embed tasks, or classify with num_labels == 1.
Documented default task selection: When no task is specified, a priority order is used to select the default pooling task.

Auth (`docs/getting_started/quickstart.md`)

Clarified auth scope: API key auth applies only to /v1 routes, requires Authorization: Bearer header, and skips OPTIONS requests.

How do we know it works?

Each doc change was verified by reading the corresponding source code to confirm the documented behavior matches:

vllm/utils/argparse_utils.py for CLI help modes
vllm/entrypoints/cli/main.py for the launch subcommand
vllm/plugins/__init__.py for plugin loading and filtering
vllm/entrypoints/openai/engine/serving.py for beam search behavior
vllm/envs.py for environment variable definitions
vllm/entrypoints/pooling/utils.py and vllm/config/model.py for pooling/scoring
vllm/entrypoints/openai/server_utils.py for auth middleware

Fixes #39613

Changed files

docs/cli/README.md (modified, +15/-7)
docs/configuration/env_vars.md (modified, +2/-2)
docs/design/plugin_system.md (modified, +5/-2)
docs/features/README.md (modified, +4/-3)
docs/getting_started/quickstart.md (modified, +3/-0)
docs/models/pooling_models/README.md (modified, +7/-0)


📚 The doc issue

Summary

Note: This issue was generated automatically by LyingDocs, a documentation-code misalignment detection tool. The analysis may contain errors or misinterpretations — please feel free to close or correct this issue if any finding does not apply or if we have misunderstood the codebase.

Finding Categories

The findings below are classified into one or more of the following categories:

| Category | Meaning |
| --- | --- |
| LogicMismatch | The code behaves differently from what the documentation describes |
| PhantomSpec | The documentation describes a feature or behavior that does not appear to exist in the code |
| ShadowLogic | The code contains significant behavior that is not documented at all |
| HardcodedDrift | A value or parameter described as configurable in the docs is hardcoded in the implementation |

Findings

Documented `--help=listgroup` mode is not implemented

Category: PhantomSpec
Documentation reference: cli/README.md:40-41
Code reference: vllm/utils/argparse_utils.py:111

The CLI docs describe a special --help=listgroup mode for vllm serve, but it appears the parser only supports --help=<search keyword> matching all, an exact group title, or a flag substring. We did not see dedicated handling for listgroup, so it seems like this input currently falls through to the no-match path.
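The matching rules described in this finding can be sketched roughly as follows. This is only an illustration of the described behavior, not the code in vllm/utils/argparse_utils.py: the function name `match_help_keyword`, the sample group titles and flags, and the case-insensitive title comparison are all assumptions.

```python
def match_help_keyword(keyword, groups):
    """Return the help sections selected by --help=<keyword>.

    `groups` maps a group title to its list of flags (hypothetical data).
    """
    # "all" selects every group.
    if keyword == "all":
        return dict(groups)
    # Exact group title match (case-insensitive here, as an assumption).
    for title, flags in groups.items():
        if title.lower() == keyword.lower():
            return {title: flags}
    # Otherwise treat the keyword as a substring of individual flags.
    matched = {}
    for title, flags in groups.items():
        hits = [flag for flag in flags if keyword in flag]
        if hits:
            matched[title] = hits
    return matched  # an empty dict represents the no-match path


# Hypothetical argument groups, for illustration only.
GROUPS = {
    "ModelConfig": ["--model", "--max-model-len"],
    "CacheConfig": ["--block-size", "--kv-cache-dtype"],
}
```

Under these assumptions, `listgroup` and `page` get no special handling: they are searched like any other keyword and match nothing in the sample data.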

Documented `--help=page` pager mode does not exist

Category: PhantomSpec
Documentation reference: cli/README.md:52-53
Code reference: vllm/utils/argparse_utils.py:111

The docs say vllm serve --help=page shows full help with a pager, but we could not find pager integration or a page special case in the parser. It appears page is treated as a normal search token rather than opening a pager.

CLI guide omits the `launch` subcommand

Category: ShadowLogic
Documentation reference: cli/README.md:9-13
Code reference: vllm/entrypoints/cli/main.py:26

The top-level CLI guide appears to list {chat, complete, serve, bench, collect-env, run-batch}, but the code also exposes a launch subcommand. If that command is intended for users, it may be worth documenting; otherwise, it may help to clarify whether it is internal or experimental.

Plugin loading is not universal across all plugin types

Category: LogicMismatch
Documentation reference: design/plugin_system.md:7-8
Code reference: vllm/plugins/__init__.py:12

The plugin-system doc says every process created by vLLM needs to load the plugin, but the code appears to scope loading by plugin group. From the automated read, general and platform plugins load broadly, while IO processor and stat logger plugins seem to load only on process 0 in the documented paths. If that reading is correct, the current wording may be too broad.

`VLLM_PLUGINS` filters more than just general plugins

Category: ShadowLogic
Documentation reference: design/plugin_system.md:43-44
Code reference: vllm/plugins/__init__.py:28

The docs discuss VLLM_PLUGINS in the context of general plugins, but it appears the shared load_plugins_by_group() helper applies the same name filter to other plugin groups as well, including platform, IO processor, and stat logger plugins. If intended, that broader effect may be useful to document explicitly.
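A minimal sketch of the cross-group effect, assuming the filter value is a comma-separated list of plugin names and that an unset value means "load everything". The helper `select_plugins`, the group keys, and the plugin names are illustrative and are not taken from vllm/plugins/__init__.py.

```python
def select_plugins(discovered, allowed_env):
    """Apply one VLLM_PLUGINS-style name filter to every plugin group.

    `discovered` maps a group name to the plugin names found in it.
    `allowed_env` is the raw env value, or None if the variable is unset.
    """
    if allowed_env is None:
        # Assumption: no filter means every discovered plugin is kept.
        return {group: list(names) for group, names in discovered.items()}
    allowed = {name.strip() for name in allowed_env.split(",") if name.strip()}
    # The same allow-list is applied to each group, not only general plugins.
    return {
        group: [name for name in names if name in allowed]
        for group, names in discovered.items()
    }


# Hypothetical discovery result, for illustration only.
DISCOVERED = {
    "general": ["my_model_plugin"],
    "platform": ["my_platform_plugin"],
    "stat_logger": ["my_logger_plugin"],
}
```

With this reading, setting the variable to `my_model_plugin` would also drop the platform and stat logger plugins, which is the behavior the finding suggests documenting.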

Scoring endpoints are only exposed for scoring-capable pooling models

Category: ShadowLogic
Documentation reference: models/pooling_models/README.md:156-166
Code reference: vllm/entrypoints/pooling/utils.py:140

The pooling docs list /score and rerank endpoints as corresponding to scoring functionality, but route registration appears conditional. We noticed that scoring routes seem to be enabled automatically for embed and token_embed, while classification-based scoring is enabled only when num_labels == 1. That serving-side gate affects endpoint availability and may not be obvious from the docs.
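The gate described here can be expressed as a small predicate. This is a sketch of the reported condition only; `score_routes_enabled` and its parameters are hypothetical names and do not mirror the signature in vllm/entrypoints/pooling/utils.py.

```python
def score_routes_enabled(supported_tasks, num_labels=None):
    """Return True if scoring routes would be registered under the reported rule."""
    # Embedding-style tasks enable scoring unconditionally.
    if "embed" in supported_tasks or "token_embed" in supported_tasks:
        return True
    # Classification models qualify only with a single output label.
    return "classify" in supported_tasks and num_labels == 1
```

For example, a classifier with three labels would not get the scoring routes, while a single-label classifier or an embedding model would.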

Pooling API silently chooses a default task when none is specified

Category: ShadowLogic
Documentation reference: models/pooling_models/README.md:170-176
Code reference: vllm/config/model.py:1452

The /pooling endpoint is described as a general analog of LLM.encode, but when task is omitted, the server appears to choose a default pooling task based on model architecture and a hardcoded priority order. That default-selection behavior seems important for users, especially when one model supports multiple tasks.

Feature matrix documents `best-of` as supported even though the active runtime no longer implements it

Category: PhantomSpec
Documentation reference: features/README.md:53,77
Code reference: vllm/engine/llm_engine.py:4

The features matrix still presents best-of as supported, but the active engine appears to be V1 and the runtime sampling parameters no longer seem to implement best_of as an engine feature. We only noticed a remaining request field in one OpenAI protocol path, which may not correspond to actual runtime support.

`best_of` request values can be silently ignored in OpenAI chat request handling

Category: ShadowLogic
Documentation reference: features/README.md:53
Code reference: vllm/entrypoints/openai/chat_completion/protocol.py:827

Beam search is documented as supporting prompt logprobs, but serving returns none

Category: LogicMismatch
Documentation reference: features/README.md:54
Code reference: vllm/entrypoints/openai/engine/serving.py:353

The compatibility matrix marks beam search as compatible with prompt logprobs, but the serving path appears to return prompt_logprobs=None. It may still support other beam-related logprob behavior, but not prompt logprobs in the way the matrix currently suggests.

Beam search has undocumented serving-path restrictions

Category: ShadowLogic
Documentation reference: features/README.md:54
Code reference: vllm/entrypoints/openai/engine/serving.py:178

The matrix presents beam search as broadly available, but the implementation seems to have serving-layer restrictions, including no prompt-embeds support and batch chat rejecting beam search. If accurate, these limitations may be worth calling out near the compatibility table.

Not all vLLM environment variables are `VLLM_`-prefixed

Category: LogicMismatch
Documentation reference: configuration/env_vars.md:8
Code reference: vllm/envs.py:514

The environment-variable guide says all environment variables used by vLLM are prefixed with VLLM_, but the env registry appears to include many non-VLLM_ variables as well, such as MAX_JOBS, NVCC_THREADS, CMAKE_BUILD_TYPE, VERBOSE, LOCAL_RANK, CUDA_VISIBLE_DEVICES, S3_ACCESS_KEY_ID, and NO_COLOR. It may help to narrow the wording or distinguish vLLM-specific variables from all variables the code reads.

Environment variable handling includes undocumented non-`VLLM_` fallbacks and compatibility vars

Category: ShadowLogic
Documentation reference: configuration/env_vars.md:3-8
Code reference: vllm/envs.py:672

Beyond the explicitly listed VLLM_ variables, the code also appears to honor undocumented non-VLLM_ fallbacks and compatibility variables, including DO_NOT_TRACK as a fallback for VLLM_DO_NOT_TRACK, XDG_CACHE_HOME / XDG_CONFIG_HOME for path defaults, and deprecated HOST_IP handling. These seem operator-visible and may be useful to document.
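The fallback pattern mentioned above might look like the sketch below. It assumes that the `VLLM_`-prefixed variable takes precedence, that `"1"` is the enabling value, and that the cache directory is a `vllm` folder under the XDG cache root; the helper names are hypothetical and the exact logic in vllm/envs.py may differ.

```python
import os


def do_not_track(env=os.environ):
    """Resolve the opt-out flag, falling back to the unprefixed variable."""
    value = env.get("VLLM_DO_NOT_TRACK") or env.get("DO_NOT_TRACK") or "0"
    return value == "1"


def default_cache_root(env=os.environ):
    """Derive a cache directory from XDG_CACHE_HOME, else ~/.cache."""
    base = env.get("XDG_CACHE_HOME") or os.path.join(
        os.path.expanduser("~"), ".cache"
    )
    return os.path.join(base, "vllm")
```

Passing a plain dict as `env` makes it easy to check how the prefixed and unprefixed variables interact without touching the real environment.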

`VLLM_PORT` seeds internal port allocation, not just a single internal port

Category: ShadowLogic
Documentation reference: configuration/env_vars.md:6
Code reference: vllm/utils/network_utils.py:169

The env-var guide notes that VLLM_PORT is for internal use, but the code appears to use it as a base port and then scan upward to allocate one or more internal communication ports. That detail may matter operationally when users are debugging conflicts or configuring networking.
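To illustrate what "base port, then scan upward" means in practice, here is a small sketch. It is not the implementation in vllm/utils/network_utils.py; `allocate_ports` and `port_is_free` are hypothetical helpers, and the availability check is injected so the scan logic can be exercised without opening sockets.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Check availability by trying to bind a TCP socket to the port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
            return True
        except OSError:
            return False


def allocate_ports(base_port, count, is_free=port_is_free):
    """Scan upward from base_port and return the first `count` free ports."""
    ports = []
    candidate = base_port
    while len(ports) < count:
        if candidate > 65535:
            raise RuntimeError("no free port found at or above the base port")
        if is_free(candidate):
            ports.append(candidate)
        candidate += 1
    return ports
```

One operational consequence of this kind of scan is that the ports actually used may be higher than the configured base value if lower ones are occupied, which is relevant when debugging conflicts or writing firewall rules.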

API-key authentication applies only to `/v1` routes and Bearer headers

Category: ShadowLogic
Documentation reference: getting_started/quickstart.md:195-196
Code reference: vllm/entrypoints/openai/server_utils.py:38

The quickstart says API keys make the server check for a key in the header, but it appears authentication is enforced only on /v1 routes and specifically expects an Authorization: Bearer <token> header. We also noticed that non-/v1 routes and OPTIONS requests seem to bypass auth. If that is the intended behavior, documenting it could help avoid confusion.
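The scope described in this finding can be summarized as a single decision function. The sketch below is an assumption-laden illustration rather than the middleware in vllm/entrypoints/openai/server_utils.py: `request_is_allowed` is a hypothetical name, headers are modeled as a plain dict with an exact `Authorization` key, and the scheme comparison is case-insensitive by assumption.

```python
import hmac


def request_is_allowed(method, path, headers, api_keys):
    """Decide whether a request passes API-key auth under the reported rules."""
    # OPTIONS requests bypass auth.
    if method == "OPTIONS":
        return True
    # Only /v1 routes are protected; everything else passes through.
    if not path.startswith("/v1"):
        return True
    # Expect "Authorization: Bearer <token>".
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # Constant-time comparison against each configured key.
    return any(
        hmac.compare_digest(token.encode("utf-8"), key.encode("utf-8"))
        for key in api_keys
    )
```

Under these rules a request to a non-`/v1` route such as a health check succeeds without any key, which is the behavior the finding suggests stating explicitly in the quickstart.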

Suggested follow-up

Review whether the current behavior or the current docs are the source of truth for each item, especially the CLI help modes and best-of support.
Update user-facing docs where behavior is intentional but currently implicit, such as endpoint gating, default task selection, env-var fallbacks, and auth scope.
If some of these behaviors are deprecated, internal-only, or experimental, consider labeling them as such in the relevant docs.
If any of the findings are incorrect due to missing context in the automated analysis, please feel free to close this issue or point out the intended behavior.

If any of the above findings are false positives or we have misunderstood the context, we sincerely apologize for the noise. Please close this issue or leave a comment and we will take note.

Suggest a potential alternative/fix

No response

Before submitting a new issue...

Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

TL;DR

Review and update the documentation to reflect the actual behavior of the code, addressing discrepancies in CLI help modes, plugin loading, feature compatibility, environment variables, and API-key authentication.

Guidance

Verify the findings by manually checking the code and documentation for each category, focusing on areas like CLI behavior, plugin loading, and feature compatibility.
Update the documentation to accurately reflect the current behavior, including endpoint gating, default task selection, and auth scope.
Consider labeling deprecated, internal-only, or experimental behaviors in the relevant docs to avoid confusion.
Review environment variable handling, including undocumented non-VLLM_ fallbacks and compatibility variables, to ensure accurate documentation.

Example

No specific code snippet is provided, as the issue is focused on documentation discrepancies rather than code errors.

Notes

The automated analysis may contain errors or misinterpretations, so it's essential to review each finding manually to ensure accuracy.

Recommendation

Update the documentation to reflect the actual behavior of the code, and consider labeling deprecated or internal-only behaviors to avoid confusion. Because this is a docs-only change, it gives a more accurate picture of the system's functionality without requiring code changes.
