vllm - ✅(Solved) Fix [RFC]: Ultimate Better Observability. [1 pull requests, 2 comments, 2 participants]

vllm · 2026-04-16 07:15:15

vllm-project/vllm#39979 • Fetched 2026-04-17 08:28:02

Timeline (top): subscribed ×3, commented ×2, cross-referenced ×2, mentioned ×2

Error Message

The tracking level (granularity, e.g., entrypoints … runner) can be specified via parameters, similar to log levels (DEBUG, INFO, WARNING, ERROR, CRITICAL).

Root Cause

Furthermore, it introduces significant overhead because it operates at the granularity of individual PyTorch ops. For online system monitoring, only a much coarser granularity is required.

Fix Action

Fixed

Fixed by PR: feat(pooling): Add dedicated async preprocessing support to PluginWithIOProcessorPlugins (https://github.com/vllm-project/vllm/pull/40030)

PR fix notes

PR #40030: feat(pooling): Add dedicated async preprocessing support to PluginWithIOProcessorPlugins

Repository: vllm-project/vllm
Author: mgazz
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/40030

Description (problem / solution / changelog)

Purpose

Adds asynchronous preprocessing support to PluginWithIOProcessorPlugins so that IOProcessor plugins can perform async operations, such as asynchronous data loading in Terratorch plugins. Here is an example of a plugin using pre_process_async.

Test Plan

The test is the same as before

python -m pytest tests/plugins_tests/test_terratorch_io_processor_plugins.py -v

Test Result

<details> <summary> Essential Elements of an Effective PR Description Checklist </summary>

[ x] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
[ x] The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

</details>

Changed files

tests/plugins/prithvi_io_processor_plugin/prithvi_io_processor/prithvi_processor.py (modified, +8/-0)
vllm/entrypoints/pooling/pooling/io_processor.py (modified, +35/-0)
vllm/entrypoints/pooling/pooling/protocol.py (modified, +5/-0)

Motivation.

I am in desperate need of a new coarse-grained tracker that provides per-request and per-iteration (i.e., engine step) metrics for end-to-end performance optimization.

Currently, the PyTorch Profiler is very useful for profiling vLLM, but it can only monitor a small number of requests and is not suitable for monitoring online systems.

Furthermore, it introduces significant overhead because it operates at the granularity of individual PyTorch ops. For online system monitoring, only a much coarser granularity is required.

Let's build the Ultimate Better Observability (cite: [Feature]: Even Better Observability https://github.com/vllm-project/vllm/issues/3616).

Related RFCs and issues:

#38760
#36189
#3616

Proposed Change.

The new tracker resembles the PyTorch Profiler but consists of custom events at multiple levels.

The tracking level (granularity, e.g., entrypoints … runner) can be specified via parameters, similar to log levels (DEBUG, INFO, WARNING, ERROR, CRITICAL).
Statistical metrics can be aggregated from these different tracking levels.
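
To make the log-level analogy concrete, here is a minimal sketch of a level-gated tracker. Everything in it (`TrackLevel`, `Tracker`, `span`, `summary`) is a hypothetical name chosen for illustration, not an existing vLLM API, and the intermediate `ENGINE` level is only a placeholder for the levels the RFC still marks as WIP. A span is recorded only when its level is at or below the configured level, and statistics are aggregated per (level, name).

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from enum import IntEnum
from statistics import mean


class TrackLevel(IntEnum):
    """Hypothetical tracking levels, coarsest first (not a vLLM API)."""
    OFF = -1
    ENTRYPOINT = 0  # level 0: preprocessing / engine_call / postprocessing
    ENGINE = 1      # placeholder for the intermediate levels (WIP in the RFC)
    RUNNER = 2      # level n: preprocess / forward / postprocess / ...


class Tracker:
    """Records span durations only for levels at or below the configured one."""

    def __init__(self, level: TrackLevel = TrackLevel.ENTRYPOINT):
        self.level = level
        self.durations = defaultdict(list)  # (level, name) -> [seconds, ...]

    @contextmanager
    def span(self, level: TrackLevel, name: str):
        if level > self.level:
            # Finer than the configured granularity: skip timing entirely.
            yield
            return
        start = time.perf_counter()
        try:
            yield
        finally:
            self.durations[(level, name)].append(time.perf_counter() - start)

    def summary(self) -> dict:
        """Aggregate simple statistics for every recorded (level, name) pair."""
        return {
            f"{level.name.lower()}/{name}": {
                "count": len(values),
                "mean_s": mean(values),
                "max_s": max(values),
            }
            for (level, name), values in self.durations.items()
        }


tracker = Tracker(level=TrackLevel.ENTRYPOINT)
with tracker.span(TrackLevel.ENTRYPOINT, "engine_call"):
    with tracker.span(TrackLevel.RUNNER, "forward"):  # filtered out at this level
        pass
print(tracker.summary())
```

With the level set to `ENTRYPOINT`, only the `engine_call` span appears in the summary. Raising the level to `RUNNER` would also record `forward`, much as lowering a logger to DEBUG reveals more messages.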

The UI may be based on Grafana Traces.

I like Grafana and have used it a lot, though I have not used Grafana Traces.

level 0: entrypoints

https://github.com/vllm-project/vllm/blob/10e49d263854daf6cf63472b9cd2039196022a59/vllm/entrypoints/pooling/base/serving.py#L81-L91

Currently, the entrypoints divide the pipeline into three (possibly four) stages:

download (maybe add this stage for multimodal models)
preprocessing
engine_call
postprocessing

This will become the coarsest-grained tracking level.
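
As a rough illustration of per-request tracking at this level, the sketch below times each entrypoint stage of a single request. The stage functions are trivial stand-ins, and `RequestTrace`, `timed`, and `handle_request` are hypothetical names; none of this is taken from the linked `serving.py`.

```python
import asyncio
import time
from dataclasses import dataclass, field


@dataclass
class RequestTrace:
    """Per-request record of how long each entrypoint stage took (hypothetical)."""
    request_id: str
    stages: dict = field(default_factory=dict)  # stage name -> seconds

    @property
    def total_s(self) -> float:
        return sum(self.stages.values())


async def timed(trace: RequestTrace, stage: str, awaitable):
    """Await `awaitable` and store its wall-clock duration under `stage`."""
    start = time.perf_counter()
    try:
        return await awaitable
    finally:
        trace.stages[stage] = time.perf_counter() - start


# Stand-ins for the real stages; only their shape matters here.
async def preprocess(prompt: str) -> list:
    return prompt.split()


async def engine_call(tokens: list) -> list:
    await asyncio.sleep(0.01)  # pretend the engine is doing work
    return [len(token) for token in tokens]


async def postprocess(outputs: list) -> dict:
    return {"data": outputs}


async def handle_request(request_id: str, prompt: str):
    trace = RequestTrace(request_id)
    tokens = await timed(trace, "preprocessing", preprocess(prompt))
    outputs = await timed(trace, "engine_call", engine_call(tokens))
    response = await timed(trace, "postprocessing", postprocess(outputs))
    return response, trace


response, trace = asyncio.run(handle_request("req-0", "hello observability"))
print(trace.request_id, trace.stages, trace.total_s)
```

A download stage for multimodal inputs could be wrapped the same way, ahead of preprocessing. The resulting per-request traces are the kind of record a trace UI could consume.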

level ...

WIP

level n: runner

The custom scopes introduced by #24265 are very useful:

preprocess
forward
postprocess
bookkeep (includes sync)
draft (if spec decoding is enabled)

This will become the finest-grained tracking level of the new tracker.
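
For per-iteration metrics, one possible shape is a recorder that accumulates time per runner scope within each engine step and then sums across steps. This is a sketch only: `StepRecorder` and its methods are hypothetical, and it does not show how the scopes from #24265 are actually implemented. The clock is injectable so the behavior can be checked deterministically.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Scope names taken from the list above; "draft" applies only with spec decoding.
RUNNER_SCOPES = ("preprocess", "forward", "postprocess", "bookkeep", "draft")


class StepRecorder:
    """Accumulates per-scope durations for each engine step (hypothetical)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._current = None
        self.steps = []  # one {scope: seconds} dict per completed step

    @contextmanager
    def step(self):
        """Open a new engine step; scopes entered inside are attributed to it."""
        self._current = defaultdict(float)
        try:
            yield self
        finally:
            self.steps.append(dict(self._current))
            self._current = None

    @contextmanager
    def scope(self, name: str):
        if self._current is None:
            raise RuntimeError("scope() must be used inside step()")
        start = self._clock()
        try:
            yield
        finally:
            self._current[name] += self._clock() - start

    def totals(self) -> dict:
        """Sum each scope's time across all recorded steps."""
        totals = defaultdict(float)
        for step in self.steps:
            for name, seconds in step.items():
                totals[name] += seconds
        return dict(totals)


recorder = StepRecorder()
spec_decoding_enabled = False  # assumption for this example
for _ in range(3):
    with recorder.step():
        for name in RUNNER_SCOPES:
            if name == "draft" and not spec_decoding_enabled:
                continue
            with recorder.scope(name):
                pass  # the real work of this scope would run here
print(len(recorder.steps), sorted(recorder.totals()))
```

Keeping one small dict per step keeps the overhead to a handful of clock reads per iteration, which matches the motivation for a tracker much coarser than op-level profiling.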

individual PyTorch ops

It would be best to use the PyTorch Profiler.

Feedback Period.

No response

CC List.

@markmc @tedzhouhk

Any Other Things.

No response

Before submitting a new issue...

Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

TL;DR

Implement a custom tracker with adjustable granularity to monitor end-to-end performance in online systems, leveraging custom events and statistical metrics aggregation.

Guidance

Identify the required tracking levels (e.g., entrypoints, runner) and their corresponding granularities to determine the scope of the custom tracker.
Design a parameter-based system to specify the tracking level, similar to log levels, to allow for adjustable granularity.
Explore using Grafana Traces for UI visualization, as mentioned in the proposal, to effectively display the tracked metrics.
Review the custom scopes introduced by #24265 (e.g., preprocess, forward, postprocess) to inform the design of the runner-level tracking.

Example

No code snippet is provided due to the lack of specific implementation details in the issue.

Notes

The proposal outlines a high-level approach, but the implementation details are not specified. Further discussion and design are necessary to determine the exact requirements and technical specifications of the custom tracker.

Recommendation

Apply workaround: Implement a custom tracker with adjustable granularity, as outlined in the proposal, to address the limitations of the PyTorch Profiler for online system monitoring. This approach allows for a more suitable solution for end-to-end performance optimization.
