vllm - ✅(Solved) Fix [Doc]: Docs audit: CLI, plugins, features, env vars, and auth mismatches [1 pull requests, 1 participants]

GitHub stats: vllm-project/vllm#39613 (fetched 2026-04-12 13:24:22)
Comments: 0 · Participants: 1 · Timeline events: 1 (labeled ×1) · Reactions: 0

An automated documentation audit found several places where the docs and current code appear to diverge, mostly around CLI behavior, plugin loading, feature compatibility, environment variables, pooling endpoints, and API-key auth. Some of these may be intentional, outdated docs, or false positives from the tool, so we wanted to share them for review.

Note: This issue was generated automatically by LyingDocs, a documentation-code misalignment detection tool. The analysis may contain errors or misinterpretations — please feel free to close or correct this issue if any finding does not apply or if we have misunderstood the codebase.

PR fix notes

PR #40062: docs: fix doc-code mismatches from audit

Description (problem / solution / changelog)

What's broken?

An automated documentation audit found 14+ places where docs diverge from current code behavior. Users following the docs may encounter phantom CLI modes, incorrect compatibility claims, and misleading env-var guidance.

What changed?

Docs-only fixes across 6 files — no code changes.

CLI (docs/cli/README.md)

  • Removed phantom --help=listgroup: The parser has no dedicated listgroup handler; the default --help already lists groups.
  • Removed phantom --help=page: No pager integration exists; page is treated as a search keyword.
  • Added launch subcommand: The code registers launch but it was missing from the CLI guide.

Plugin system (docs/design/plugin_system.md)

  • Clarified per-group loading scope: General and platform plugins load in all processes; IO processor and stat logger plugins load in process 0 only.
  • Documented VLLM_PLUGINS cross-group filtering: The env var filters all plugin groups, not just general plugins.

Feature matrix (docs/features/README.md)

  • Fixed beam-search × prompt logprobs: Changed ✅ → ❌ — the serving path explicitly returns prompt_logprobs=None.
  • Fixed prompt-embeds × beam-search: Changed ❔ → ❌ — beam search raises NotImplementedError for embedding prompts.
  • Added footnote explaining beam search serving-path restrictions.

Environment variables (docs/configuration/env_vars.md)

  • Corrected "all VLLM_ prefixed" claim: vLLM also reads MAX_JOBS, NVCC_THREADS, CMAKE_BUILD_TYPE, CUDA_HOME, NO_COLOR, DO_NOT_TRACK, XDG_CACHE_HOME, etc.
  • Documented VLLM_PORT port-scanning behavior: When set, it serves as a base port and scans upward for additional internal ports.

Pooling models (docs/models/pooling_models/README.md)

  • Documented scoring endpoint conditions: /score and rerank are only enabled for embed/token_embed tasks, or classify with num_labels == 1.
  • Documented default task selection: When no task is specified, a priority order is used to select the default pooling task.

Auth (docs/getting_started/quickstart.md)

  • Clarified auth scope: API key auth applies only to /v1 routes, requires Authorization: Bearer header, and skips OPTIONS requests.

How do we know it works?

Each doc change was verified by reading the corresponding source code to confirm the documented behavior matches:

  • vllm/utils/argparse_utils.py for CLI help modes
  • vllm/entrypoints/cli/main.py for the launch subcommand
  • vllm/plugins/__init__.py for plugin loading and filtering
  • vllm/entrypoints/openai/engine/serving.py for beam search behavior
  • vllm/envs.py for environment variable definitions
  • vllm/entrypoints/pooling/utils.py and vllm/config/model.py for pooling/scoring
  • vllm/entrypoints/openai/server_utils.py for auth middleware

Fixes #39613

Changed files

  • docs/cli/README.md (modified, +15/-7)
  • docs/configuration/env_vars.md (modified, +2/-2)
  • docs/design/plugin_system.md (modified, +5/-2)
  • docs/features/README.md (modified, +4/-3)
  • docs/getting_started/quickstart.md (modified, +3/-0)
  • docs/models/pooling_models/README.md (modified, +7/-0)

📚 The doc issue

Finding Categories

The findings below are classified into one or more of the following categories:

  • LogicMismatch: The code behaves differently from what the documentation describes.
  • PhantomSpec: The documentation describes a feature or behavior that does not appear to exist in the code.
  • ShadowLogic: The code contains significant behavior that is not documented at all.
  • HardcodedDrift: A value or parameter described as configurable in the docs is hardcoded in the implementation.

Findings

Documented --help=listgroup mode is not implemented

  • Category: PhantomSpec
  • Documentation reference: cli/README.md:40-41
  • Code reference: vllm/utils/argparse_utils.py:111

The CLI docs describe a special --help=listgroup mode for vllm serve, but the parser appears to support only --help=<keyword>, where the keyword is all, an exact group title, or a substring of a flag name. We did not see dedicated handling for listgroup, so this input appears to fall through to the no-match path.

Documented --help=page pager mode does not exist

  • Category: PhantomSpec
  • Documentation reference: cli/README.md:52-53
  • Code reference: vllm/utils/argparse_utils.py:111

The docs say vllm serve --help=page shows full help with a pager, but we could not find pager integration or a page special case in the parser. It appears page is treated as a normal search token rather than opening a pager.
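A minimal sketch of the lookup order described in these two findings, assuming the parser checks all, then an exact group title, then a substring of a flag name. The function resolve_help and the sample groups are hypothetical, not vLLM's code; the point is that neither listgroup nor page gets a special case, so both behave as ordinary search keywords.

```python
# Hypothetical sketch of --help=<keyword> resolution; not vLLM's real parser code.
def resolve_help(keyword: str, groups: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return the groups (and flags) that --help=<keyword> would print."""
    if keyword == "all":
        return dict(groups)  # every group
    if keyword in groups:
        return {keyword: groups[keyword]}  # exact group title
    # Otherwise: substring search over flag names.
    matches = {}
    for title, flags in groups.items():
        hits = [flag for flag in flags if keyword in flag]
        if hits:
            matches[title] = hits
    return matches  # an empty dict corresponds to the "no match" path


# Illustrative groups; the titles and flag lists are assumptions.
groups = {
    "ModelConfig": ["--model", "--dtype", "--max-model-len"],
    "CacheConfig": ["--block-size", "--gpu-memory-utilization"],
}
print(resolve_help("listgroup", groups))  # {} : no special case, no flag matches
print(resolve_help("page", groups))       # {} : treated as a search token, no pager
print(resolve_help("model", groups))      # {'ModelConfig': ['--model', '--max-model-len']}
```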

CLI guide omits the launch subcommand

  • Category: ShadowLogic
  • Documentation reference: cli/README.md:9-13
  • Code reference: vllm/entrypoints/cli/main.py:26

The top-level CLI guide appears to list {chat, complete, serve, bench, collect-env, run-batch}, but the code also exposes a launch subcommand. If that command is intended for users, it may be worth documenting; otherwise, it may help to state whether it is internal or experimental.
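One way to catch this kind of omission is to compare the subcommands a parser actually registers against the list in the guide. The sketch below builds a toy argparse parser standing in for the real vLLM CLI (an assumption; the real parser is assembled in vllm/entrypoints/cli/main.py) and reads the subcommand names from argparse's subparser action. _actions and _SubParsersAction are private argparse names, so treat this as a diagnostic aid rather than production code.

```python
import argparse


def registered_subcommands(parser: argparse.ArgumentParser) -> set[str]:
    """Collect the subcommand names registered on an argparse parser."""
    names: set[str] = set()
    for action in parser._actions:  # private argparse attribute, acceptable for a diagnostic
        if isinstance(action, argparse._SubParsersAction):
            names.update(action.choices)  # choices maps subcommand name -> subparser
    return names


# Toy parser standing in for the real `vllm` CLI (assumption for illustration).
parser = argparse.ArgumentParser(prog="vllm")
subparsers = parser.add_subparsers(dest="command")
for name in ["chat", "complete", "serve", "bench", "collect-env", "run-batch", "launch"]:
    subparsers.add_parser(name)

documented = {"chat", "complete", "serve", "bench", "collect-env", "run-batch"}
undocumented = registered_subcommands(parser) - documented
print(sorted(undocumented))  # ['launch']
```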

Plugin loading is not universal across all plugin types

  • Category: LogicMismatch
  • Documentation reference: design/plugin_system.md:7-8
  • Code reference: vllm/plugins/__init__.py:12

The plugin-system doc says every process created by vLLM needs to load the plugin, but the code appears to scope loading by plugin group. From the automated read, general and platform plugins load broadly, while IO processor and stat logger plugins seem to load only on process 0 in the documented paths. If that reading is correct, the current wording may be too broad.
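If the automated reading is correct, the loading rule reduces to a small scope table. The entry-point group names below are assumptions chosen for illustration, and process_index stands in for however vLLM identifies process 0.

```python
# Hypothetical scope table; the group names are assumptions, not confirmed identifiers.
ALL_PROCESSES = "all"
PROCESS_ZERO_ONLY = "process0"

PLUGIN_SCOPE = {
    "vllm.general_plugins": ALL_PROCESSES,
    "vllm.platform_plugins": ALL_PROCESSES,
    "vllm.io_processor_plugins": PROCESS_ZERO_ONLY,
    "vllm.stat_logger_plugins": PROCESS_ZERO_ONLY,
}


def groups_to_load(process_index: int) -> list[str]:
    """Return the plugin groups a process would load under the scoping described above."""
    return [
        group
        for group, scope in PLUGIN_SCOPE.items()
        if scope == ALL_PROCESSES or process_index == 0
    ]


print(groups_to_load(0))  # all four groups
print(groups_to_load(1))  # ['vllm.general_plugins', 'vllm.platform_plugins']
```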

VLLM_PLUGINS filters more than just general plugins

  • Category: ShadowLogic
  • Documentation reference: design/plugin_system.md:43-44
  • Code reference: vllm/plugins/__init__.py:28

The docs discuss VLLM_PLUGINS in the context of general plugins, but it appears the shared load_plugins_by_group() helper applies the same name filter to other plugin groups as well, including platform, IO processor, and stat logger plugins. If intended, that broader effect may be useful to document explicitly.
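A minimal sketch of one name filter shared by every plugin group, assuming VLLM_PLUGINS is a comma-separated list of plugin names and that leaving it unset loads everything. In a real loader the discovered mapping would come from importlib.metadata.entry_points(group=...); plain dicts stand in here so the example runs on its own. parse_allowed_plugins and filter_group are hypothetical helpers.

```python
from typing import Callable, Mapping, Optional


def parse_allowed_plugins(env: Mapping[str, str]) -> Optional[set[str]]:
    """Assumed semantics: comma-separated names; None (unset) means "load all"."""
    raw = env.get("VLLM_PLUGINS")
    if raw is None:
        return None
    return {name.strip() for name in raw.split(",") if name.strip()}


def filter_group(
    discovered: Mapping[str, Callable[[], None]], allowed: Optional[set[str]]
) -> dict[str, Callable[[], None]]:
    """Apply the same name filter no matter which plugin group `discovered` came from."""
    if allowed is None:
        return dict(discovered)
    return {name: fn for name, fn in discovered.items() if name in allowed}


def noop() -> None:
    pass


# Stand-ins for entry points discovered in two different groups.
general = {"my_general": noop, "other_general": noop}
platform = {"my_platform": noop}

allowed = parse_allowed_plugins({"VLLM_PLUGINS": "my_general"})
print(sorted(filter_group(general, allowed)))   # ['my_general']
print(sorted(filter_group(platform, allowed)))  # [] : the platform plugin is filtered out too
```

The practical consequence: setting VLLM_PLUGINS to enable one general plugin also excludes any platform, IO processor, or stat logger plugin that is not named in the list.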

Scoring endpoints are only exposed for scoring-capable pooling models

  • Category: ShadowLogic
  • Documentation reference: models/pooling_models/README.md:156-166
  • Code reference: vllm/entrypoints/pooling/utils.py:140

The pooling docs list /score and rerank endpoints as corresponding to scoring functionality, but route registration appears conditional. We noticed that scoring routes seem to be enabled automatically for embed and token_embed, while classification-based scoring is enabled only when num_labels == 1. That serving-side gate affects endpoint availability and may not be obvious from the docs.
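The gate described above can be expressed as a single predicate. This is a hypothetical restatement of the finding, not the code in vllm/entrypoints/pooling/utils.py; the task strings are the ones the finding names.

```python
from typing import Optional


def scoring_routes_enabled(task: str, num_labels: Optional[int] = None) -> bool:
    """Hypothetical restatement of the /score and rerank gate described in the finding."""
    if task in ("embed", "token_embed"):
        return True  # embedding-style tasks get scoring routes automatically
    if task == "classify":
        return num_labels == 1  # only single-label classifiers act as scorers
    return False


print(scoring_routes_enabled("embed"))                   # True
print(scoring_routes_enabled("classify", num_labels=1))  # True
print(scoring_routes_enabled("classify", num_labels=3))  # False
```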

Pooling API silently chooses a default task when none is specified

  • Category: ShadowLogic
  • Documentation reference: models/pooling_models/README.md:170-176
  • Code reference: vllm/config/model.py:1452

The /pooling endpoint is described as a general analog of LLM.encode, but when task is omitted, the server appears to choose a default pooling task based on model architecture and a hardcoded priority order. That default-selection behavior seems important for users, especially when one model supports multiple tasks.
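A sketch of priority-based default selection. The real hardcoded order lives in vllm/config/model.py and is not reproduced here; EXAMPLE_PRIORITY is an assumed ordering used only to show the mechanism and why a multi-task model can land on a task the user did not expect.

```python
from typing import Sequence

# Assumed ordering for illustration only; not the order vLLM actually uses.
EXAMPLE_PRIORITY = ("embed", "classify", "token_embed")


def pick_default_task(supported: set[str], priority: Sequence[str] = EXAMPLE_PRIORITY) -> str:
    """Return the first task in `priority` that the model supports."""
    for task in priority:
        if task in supported:
            return task
    raise ValueError(f"no task in {sorted(supported)} appears in the priority list")


# A model supporting two tasks: omitting `task` silently picks one of them.
print(pick_default_task({"classify", "token_embed"}))  # classify
```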

Feature matrix documents best-of as supported even though the active runtime no longer implements it

  • Category: PhantomSpec
  • Documentation reference: features/README.md:53,77
  • Code reference: vllm/engine/llm_engine.py:4

The features matrix still presents best-of as supported, but the active engine appears to be V1 and the runtime sampling parameters no longer seem to implement best_of as an engine feature. We only noticed a remaining request field in one OpenAI protocol path, which may not correspond to actual runtime support.

best_of request values can be silently ignored in OpenAI chat request handling

  • Category: ShadowLogic
  • Documentation reference: features/README.md:53
  • Code reference: vllm/entrypoints/openai/chat_completion/protocol.py:827

Relatedly, best_of still appears to be accepted in one batch chat request schema, but the value may be dropped when the request is converted into a model that does not define best_of and allows extra fields. If so, setting best_of would be a silent no-op rather than an execution or validation error, which may surprise users.
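The silent-drop effect can be reproduced with a small stand-in. TargetParams is a hypothetical model that, like the one in the finding, does not define best_of; convert_lenient mimics tolerant conversion, and convert_strict shows the alternative of failing loudly. With pydantic, strict behavior of this kind would typically come from configuring the model to forbid extra fields.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class TargetParams:
    """Hypothetical stand-in for a model that does not define best_of."""
    temperature: float = 1.0
    max_tokens: int = 16


def _known_fields() -> set[str]:
    return {f.name for f in fields(TargetParams)}


def convert_lenient(request: dict[str, Any]) -> TargetParams:
    """Tolerant conversion: unknown keys such as best_of are silently dropped."""
    known = _known_fields()
    return TargetParams(**{k: v for k, v in request.items() if k in known})


def convert_strict(request: dict[str, Any]) -> TargetParams:
    """Strict alternative: unknown keys raise instead of becoming a no-op."""
    unknown = sorted(set(request) - _known_fields())
    if unknown:
        raise ValueError(f"unsupported request fields: {unknown}")
    return TargetParams(**request)


request = {"temperature": 0.7, "best_of": 4}
print(convert_lenient(request))  # TargetParams(temperature=0.7, max_tokens=16)
try:
    convert_strict(request)
except ValueError as exc:
    print(exc)  # unsupported request fields: ['best_of']
```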

Beam search is documented as supporting prompt logprobs, but serving returns none

  • Category: LogicMismatch
  • Documentation reference: features/README.md:54
  • Code reference: vllm/entrypoints/openai/engine/serving.py:353

The compatibility matrix marks beam search as compatible with prompt logprobs, but the serving path appears to return prompt_logprobs=None. It may still support other beam-related logprob behavior, but not prompt logprobs in the way the matrix currently suggests.

Beam search has undocumented serving-path restrictions

  • Category: ShadowLogic
  • Documentation reference: features/README.md:54
  • Code reference: vllm/entrypoints/openai/engine/serving.py:178

The matrix presents beam search as broadly available, but the implementation seems to have serving-layer restrictions, including no prompt-embeds support and batch chat rejecting beam search. If accurate, these limitations may be worth calling out near the compatibility table.
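A hypothetical client-side pre-flight check that encodes the restrictions reported in these two findings: no prompt embeddings, no beam search in batch chat, and prompt_logprobs coming back as None. BeamRequest and check_beam_search are illustrative names; they do not exist in vLLM.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BeamRequest:
    """Hypothetical request shape; field names are illustrative."""
    use_beam_search: bool
    prompt_logprobs: Optional[int] = None
    has_prompt_embeds: bool = False
    is_batch_chat: bool = False


def check_beam_search(req: BeamRequest) -> list[str]:
    """Raise for combinations reported as unsupported; return warnings for silent ones."""
    if not req.use_beam_search:
        return []
    if req.has_prompt_embeds:
        raise NotImplementedError("beam search does not support prompt embeddings")
    if req.is_batch_chat:
        raise ValueError("batch chat requests reject beam search")
    warnings = []
    if req.prompt_logprobs is not None:
        warnings.append("prompt_logprobs is requested but will come back as None")
    return warnings


print(check_beam_search(BeamRequest(use_beam_search=True, prompt_logprobs=5)))
```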

Not all vLLM environment variables are VLLM_-prefixed

  • Category: LogicMismatch
  • Documentation reference: configuration/env_vars.md:8
  • Code reference: vllm/envs.py:514

The environment-variable guide says all environment variables used by vLLM are prefixed with VLLM_, but the env registry appears to include many non-VLLM_ variables as well, such as MAX_JOBS, NVCC_THREADS, CMAKE_BUILD_TYPE, VERBOSE, LOCAL_RANK, CUDA_VISIBLE_DEVICES, S3_ACCESS_KEY_ID, and NO_COLOR. It may help to narrow the wording or distinguish vLLM-specific variables from all variables the code reads.
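Operators auditing a deployment may want to see which of these non-VLLM_ variables are set. A minimal sketch using the names listed in the finding (not an exhaustive list of what vLLM reads); present_non_vllm_vars is a hypothetical helper, and it returns only names, not values, since entries such as S3_ACCESS_KEY_ID hold credentials.

```python
import os
from typing import Mapping

# Names taken from the finding above; not an exhaustive list of what vLLM reads.
NON_VLLM_VARS = (
    "MAX_JOBS", "NVCC_THREADS", "CMAKE_BUILD_TYPE", "VERBOSE",
    "LOCAL_RANK", "CUDA_VISIBLE_DEVICES", "S3_ACCESS_KEY_ID", "NO_COLOR",
)


def present_non_vllm_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names (not values) of listed non-VLLM_ variables present in `env`."""
    return [name for name in NON_VLLM_VARS if name in env]


print(present_non_vllm_vars())  # depends on the current environment
```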

Environment variable handling includes undocumented non-VLLM_ fallbacks and compatibility vars

  • Category: ShadowLogic
  • Documentation reference: configuration/env_vars.md:3-8
  • Code reference: vllm/envs.py:672

Beyond the explicitly listed VLLM_ variables, the code also appears to honor undocumented non-VLLM_ fallbacks and compatibility variables, including DO_NOT_TRACK as a fallback for VLLM_DO_NOT_TRACK, XDG_CACHE_HOME / XDG_CONFIG_HOME for path defaults, and deprecated HOST_IP handling. These seem operator-visible and may be useful to document.
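A sketch of the fallback pattern the finding describes, assuming VLLM_DO_NOT_TRACK takes precedence over DO_NOT_TRACK and that the cache root defaults to $XDG_CACHE_HOME/vllm, falling back to ~/.cache/vllm. The helper names are hypothetical; vllm/envs.py holds the exact logic.

```python
import os
from typing import Mapping


def do_not_track(env: Mapping[str, str]) -> bool:
    """Assumed precedence: VLLM_DO_NOT_TRACK if set and non-empty, else DO_NOT_TRACK."""
    value = env.get("VLLM_DO_NOT_TRACK") or env.get("DO_NOT_TRACK") or "0"
    return value == "1"


def default_cache_root(env: Mapping[str, str]) -> str:
    """Assumed default: $XDG_CACHE_HOME/vllm, falling back to ~/.cache/vllm."""
    base = env.get("XDG_CACHE_HOME") or os.path.join(os.path.expanduser("~"), ".cache")
    return os.path.join(base, "vllm")


print(do_not_track({"DO_NOT_TRACK": "1"}))                            # True
print(do_not_track({"VLLM_DO_NOT_TRACK": "0", "DO_NOT_TRACK": "1"}))  # False
print(default_cache_root({"XDG_CACHE_HOME": "/tmp/xdg"}))             # /tmp/xdg/vllm on POSIX
```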

VLLM_PORT seeds internal port allocation, not just a single internal port

  • Category: ShadowLogic
  • Documentation reference: configuration/env_vars.md:6
  • Code reference: vllm/utils/network_utils.py:169

The env-var guide notes that VLLM_PORT is for internal use, but the code appears to use it as a base port and then scan upward to allocate one or more internal communication ports. That detail may matter operationally when users are debugging conflicts or configuring networking.
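A sketch of base-port scanning, assuming the allocator probes ports upward from the base by attempting to bind each one. is_port_free and allocate_ports are hypothetical helpers; the real logic is in vllm/utils/network_utils.py.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Probe a port by trying to bind it; the probe socket is closed immediately."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
        return True


def allocate_ports(base: int, count: int, max_scan: int = 1000) -> list[int]:
    """Collect `count` free ports by scanning upward from `base` (hypothetical)."""
    ports: list[int] = []
    port = base
    while len(ports) < count and port < base + max_scan and port <= 65535:
        if is_port_free(port):
            ports.append(port)
        port += 1
    if len(ports) < count:
        raise RuntimeError(f"found only {len(ports)} free ports starting at {base}")
    return ports


print(allocate_ports(45000, 2))  # e.g. two consecutive ports if nothing else holds them
```

Because each probe socket is closed before the port is actually used, a port reported free here is not guaranteed to stay free, and a busy port near the base shifts every later allocation upward.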

API-key authentication applies only to /v1 routes and Bearer headers

  • Category: ShadowLogic
  • Documentation reference: getting_started/quickstart.md:195-196
  • Code reference: vllm/entrypoints/openai/server_utils.py:38

The quickstart says API keys make the server check for a key in the header, but it appears authentication is enforced only on /v1 routes and specifically expects an Authorization: Bearer <token> header. We also noticed that non-/v1 routes and OPTIONS requests seem to bypass auth. If that is the intended behavior, documenting it could help avoid confusion.
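A minimal sketch of the auth decision as the finding describes it: OPTIONS requests and paths outside /v1 skip the check, and everything else must carry Authorization: Bearer <key>. needs_auth and is_authorized are hypothetical, framework-agnostic helpers; real HTTP header lookups are case-insensitive, which the plain dict used here is not.

```python
import hmac
from typing import Mapping


def needs_auth(method: str, path: str) -> bool:
    """Only /v1 routes are checked, and OPTIONS (CORS preflight) is skipped."""
    return method.upper() != "OPTIONS" and path.startswith("/v1")


def is_authorized(method: str, path: str, headers: Mapping[str, str], api_key: str) -> bool:
    if not needs_auth(method, path):
        return True
    supplied = headers.get("Authorization", "")
    # Constant-time comparison against the expected "Bearer <key>" value (ASCII keys assumed).
    return hmac.compare_digest(supplied, f"Bearer {api_key}")


print(is_authorized("GET", "/health", {}, "secret"))                          # True
print(is_authorized("POST", "/v1/completions", {"X-API-Key": "secret"}, "secret"))  # False
```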

Suggested follow-up

  • Review whether the current behavior or the current docs are the source of truth for each item, especially the CLI help modes and best-of support.
  • Update user-facing docs where behavior is intentional but currently implicit, such as endpoint gating, default task selection, env-var fallbacks, and auth scope.
  • If some of these behaviors are deprecated, internal-only, or experimental, consider labeling them as such in the relevant docs.
  • If any of the findings are incorrect due to missing context in the automated analysis, please feel free to close this issue or point out the intended behavior.

If any of the above findings are false positives or we have misunderstood the context, we sincerely apologize for the noise. Please close this issue or leave a comment and we will take note.

Extended analysis

TL;DR

Review and update the documentation to reflect the actual behavior of the code, addressing discrepancies in CLI help modes, plugin loading, feature compatibility, environment variables, and API-key authentication.

Guidance

  • Verify the findings by manually checking the code and documentation for each category, focusing on areas like CLI behavior, plugin loading, and feature compatibility.
  • Update the documentation to accurately reflect the current behavior, including endpoint gating, default task selection, and auth scope.
  • Consider labeling deprecated, internal-only, or experimental behaviors in the relevant docs to avoid confusion.
  • Review environment variable handling, including undocumented non-VLLM_ fallbacks and compatibility variables, to ensure accurate documentation.

Example

No specific code snippet is provided, as the issue is focused on documentation discrepancies rather than code errors.

Notes

The automated analysis may contain errors or misinterpretations, so it's essential to review each finding manually to ensure accuracy.

Recommendation

Update the documentation to reflect the actual behavior of the code, and consider labeling deprecated or internal-only behaviors to avoid confusion. This gives a more accurate picture of the system's functionality without requiring significant code changes.
