vllm - ✅(Solved) Fix [RFC]: Ultimate Better Observability. [1 pull requests, 2 comments, 2 participants]

GitHub stats
vllm-project/vllm#39979 · Fetched 2026-04-17 08:28:02
Comments: 2 · Participants: 2 · Timeline: 13 · Reactions: 7
Timeline (top): subscribed ×3, commented ×2, cross-referenced ×2, mentioned ×2

Error Message

  • The tracking level (granularity, e.g., entrypoints … runner) can be specified via parameters, similar to log levels (DEBUG, INFO, WARNING, ERROR, CRITICAL).

Root Cause

Furthermore, it introduces significant overhead because it operates at the granularity of individual PyTorch ops. For online system monitoring, only a much coarser granularity is required.

Fix Action

Fixed

PR fix notes

PR #40030: feat(pooling): Add dedicated async preprocessing support to PluginWithIOProcessorPlugins

Description (problem / solution / changelog)

Purpose

Adds asynchronous preprocessing support to PluginWithIOProcessorPlugins so that IOProcessor plugins can perform async operations, such as asynchronous data loading in Terratorch plugins. Here is an example of a plugin using pre_process_async.
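
To illustrate the idea, here is a minimal sketch of how a serving layer could prefer an async preprocessing hook and fall back to a blocking one. It is not the code from PR #40030: the classes `SyncOnlyProcessor` and `AsyncProcessor` and the helper `run_preprocess` are hypothetical, and only the method name `pre_process_async` comes from the PR description.

```python
import asyncio


class SyncOnlyProcessor:
    """Hypothetical plugin that only implements blocking preprocessing."""

    def pre_process(self, prompt: str) -> str:
        return prompt.strip().lower()


class AsyncProcessor(SyncOnlyProcessor):
    """Hypothetical plugin that performs async work, e.g. data loading."""

    async def pre_process_async(self, prompt: str) -> str:
        await asyncio.sleep(0)  # stands in for asynchronous data loading
        return self.pre_process(prompt)


async def run_preprocess(processor, prompt: str) -> str:
    """Use the async hook when the plugin provides one (assumed dispatch rule)."""
    async_fn = getattr(processor, "pre_process_async", None)
    if async_fn is not None:
        return await async_fn(prompt)
    # Fall back to the blocking hook in a worker thread so the event loop
    # stays responsive while preprocessing runs.
    return await asyncio.to_thread(processor.pre_process, prompt)


async def main() -> list:
    return [
        await run_preprocess(SyncOnlyProcessor(), "  Hello "),
        await run_preprocess(AsyncProcessor(), " World "),
    ]


outputs = asyncio.run(main())
```

Both plugins produce the same kind of result; the difference is only whether the event loop is free to serve other requests while the plugin waits.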

Test Plan

The test is the same as before

python -m pytest tests/plugins_tests/test_terratorch_io_processor_plugins.py -v

Test Result


<details> <summary> Essential Elements of an Effective PR Description Checklist </summary>
  • [x] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • [x] The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
</details>

Changed files

  • tests/plugins/prithvi_io_processor_plugin/prithvi_io_processor/prithvi_processor.py (modified, +8/-0)
  • vllm/entrypoints/pooling/pooling/io_processor.py (modified, +35/-0)
  • vllm/entrypoints/pooling/pooling/protocol.py (modified, +5/-0)

Motivation.

I am in desperate need of a new coarse-grained tracker that provides per-request and per-iteration (i.e., engine step) metrics for end-to-end performance optimization.

Currently, the PyTorch Profiler is very useful for profiling vLLM, but it can only monitor a small number of requests and is not suitable for monitoring online systems.

Furthermore, it introduces significant overhead because it operates at the granularity of individual PyTorch ops. For online system monitoring, only a much coarser granularity is required.

Let's build the Ultimate Better Observability (cite: [Feature]: Even Better Observability https://github.com/vllm-project/vllm/issues/3616).

Related RFCs and issues:

  • #38760
  • #36189
  • #3616

Proposed Change.

The new tracker resembles the PyTorch Profiler but consists of custom events at multiple levels.

<img width="1809" height="885" alt="Image" src="https://github.com/user-attachments/assets/db646bea-dfc5-4a11-b6f3-6dd765782c79" />
  • The tracking level (granularity, e.g., entrypoints … runner) can be specified via parameters, similar to log levels (DEBUG, INFO, WARNING, ERROR, CRITICAL).
  • Statistical metrics can be aggregated from these different tracking levels.
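
The two points above can be sketched roughly as follows. This is only an illustration under stated assumptions, not an existing vLLM API: `TrackLevel`, `Tracker`, `span`, and `summary` are hypothetical names, and the intermediate `ENGINE` level is a placeholder for the levels the RFC still marks as WIP.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from enum import IntEnum
from statistics import mean


class TrackLevel(IntEnum):
    """Hypothetical tracking levels, coarsest first (like log levels)."""
    OFF = -1
    ENTRYPOINTS = 0
    ENGINE = 1   # placeholder for the intermediate levels
    RUNNER = 2


class Tracker:
    def __init__(self, level: TrackLevel = TrackLevel.ENTRYPOINTS):
        self.level = level
        # (level, event name) -> list of durations in seconds
        self.durations = defaultdict(list)

    @contextmanager
    def span(self, name: str, level: TrackLevel):
        # Events finer than the configured level are skipped entirely,
        # so they add almost no overhead.
        if level > self.level:
            yield
            return
        start = time.perf_counter()
        try:
            yield
        finally:
            self.durations[(level, name)].append(time.perf_counter() - start)

    def summary(self) -> dict:
        """Aggregate simple statistics per recorded event."""
        return {
            f"{level.name.lower()}/{name}": {
                "count": len(d),
                "mean_s": mean(d),
                "max_s": max(d),
            }
            for (level, name), d in self.durations.items()
        }


tracker = Tracker(level=TrackLevel.ENTRYPOINTS)
with tracker.span("preprocessing", TrackLevel.ENTRYPOINTS):
    with tracker.span("forward", TrackLevel.RUNNER):  # finer than configured
        pass
stats = tracker.summary()
```

With the level set to `ENTRYPOINTS`, only the `preprocessing` event is recorded; raising the level to `RUNNER` would also record `forward`.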

The UI may be based on Grafana Traces.

<img width="750" height="383" alt="Image" src="https://github.com/user-attachments/assets/dea856b2-1afe-46f4-8a54-5e08ffdbe432" />

I like Grafana and have used it a lot, but I have not used Grafana Traces.

level 0: entrypoints

https://github.com/vllm-project/vllm/blob/10e49d263854daf6cf63472b9cd2039196022a59/vllm/entrypoints/pooling/base/serving.py#L81-L91

Currently, the entrypoints divide the pipeline into three (possibly four) stages:

  • download (may be added for multimodal models)
  • preprocessing
  • engine_call
  • postprocessing

This will become the coarsest-grained tracking level.
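
As a rough sketch of what per-request timing at this level might look like, the following wraps each stage of a request in a timer. The stage names come from the list above; the helper `timed`, the function `handle_request`, and the stand-in stage bodies are hypothetical and do not reflect the code in serving.py.

```python
import asyncio
import time


async def timed(stage: str, timings: dict, coro):
    """Await a coroutine and record its wall-clock duration under `stage`."""
    start = time.perf_counter()
    try:
        return await coro
    finally:
        timings[stage] = time.perf_counter() - start


# Stand-in stages; a real server would tokenize, call the engine, and so on.
async def preprocess(prompt: str) -> list:
    return prompt.split()


async def engine_call(tokens: list) -> list:
    await asyncio.sleep(0)  # stands in for the engine round trip
    return [t.upper() for t in tokens]


async def postprocess(outputs: list) -> str:
    return " ".join(outputs)


async def handle_request(prompt: str):
    timings = {}
    tokens = await timed("preprocessing", timings, preprocess(prompt))
    outputs = await timed("engine_call", timings, engine_call(tokens))
    result = await timed("postprocessing", timings, postprocess(outputs))
    return result, timings


result, timings = asyncio.run(handle_request("hello world"))
```

Each request then carries one duration per stage, which is the kind of per-request metric that could be exported as a trace span or aggregated into histograms.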

level ...

WIP

level n: runner

The custom scopes introduced by #24265 are very useful:

  • preprocess
  • forward
  • postprocess
  • bookkeep (includes sync)
  • draft (if spec decoding is enabled)

This will become the finest-grained tracking level of the new tracker.

<img width="2566" height="666" alt="Image" src="https://github.com/user-attachments/assets/0eb457c1-f48a-4f90-b2f0-d1b1dbb5b9f1" />

individual PyTorch ops

It would be best to use the PyTorch Profiler.

Feedback Period.

No response

CC List.

@markmc @tedzhouhk

Any Other Things.

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

TL;DR

Implement a custom tracker with adjustable granularity to monitor end-to-end performance in online systems, leveraging custom events and statistical metrics aggregation.

Guidance

  • Identify the required tracking levels (e.g., entrypoints, runner) and their corresponding granularities to determine the scope of the custom tracker.
  • Design a parameter-based system to specify the tracking level, similar to log levels, to allow for adjustable granularity.
  • Explore using Grafana Traces for UI visualization, as mentioned in the proposal, to effectively display the tracked metrics.
  • Review the custom scopes introduced by #24265 (e.g., preprocess, forward, postprocess) to inform the design of the runner-level tracking.

Example

The issue itself includes no code snippet, because it does not yet specify implementation details.

Notes

The proposal outlines a high-level approach, but the implementation details are not specified. Further discussion and design are necessary to determine the exact requirements and technical specifications of the custom tracker.

Recommendation

Apply workaround: Implement a custom tracker with adjustable granularity, as outlined in the proposal, to address the limitations of the PyTorch Profiler for online system monitoring. This approach allows for a more suitable solution for end-to-end performance optimization.
