vllm - ✅(Solved) Fix [Bug]: After upgrading to v0.18.0, the logs no longer display token output speed [1 pull request, 3 comments, 3 participants]

vllm · 2026-03-23 09:43:17

GitHub stats

vllm-project/vllm#37876•Fetched 2026-04-08 01:17:24

Timeline (top): commented ×3, closed ×1, cross-referenced ×1, labeled ×1

Fix Action

Fixed

Fixed by PR: [Bugfix] Restore stats logging for multi-server mode (https://github.com/vllm-project/vllm/pull/37950)

PR fix notes

PR #37950: [Bugfix] Restore stats logging for multi-server mode

Repository: vllm-project/vllm
Author: khairulkabir1661
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/37950

Description (problem / solution / changelog)

Summary

Fixes #37876

When running vLLM with --api-server-count > 1, token speed metrics were completely disabled in logs to avoid showing incomplete/misleading stats from individual API servers. This PR implements cross-process metrics aggregation using shared memory to restore complete stats logging.

Changes

New SharedStatsBuffer class (vllm/v1/metrics/shared_stats.py):
- Thread-safe shared memory buffer using multiprocessing.Array and Lock
- Supports cumulative counters (tokens, preemptions) and snapshot values (running/waiting requests, cache usage)
- Atomic read+reset operations prevent double-counting

New MultiServerLoggingStatLogger class (vllm/v1/metrics/loggers.py):
- Aggregates metrics from all API server processes before logging
- Only primary server (index 0) emits log messages to avoid duplication
- Integrates seamlessly with existing logging infrastructure

Modified APIServerProcessManager (vllm/v1/utils.py):
- Creates SharedStatsBuffer and passes it to all server processes via client_config

Updated AsyncLLM (vllm/v1/engine/async_llm.py):
- Accepts and propagates shared_stats_buffer parameter through initialization chain

Updated API server entrypoint (vllm/entrypoints/openai/api_server.py):
- Extracts shared_stats_buffer from config and passes to AsyncLLM

Comprehensive tests (tests/v1/metrics/test_shared_stats.py):
- 8 test cases covering basic aggregation, cumulative vs snapshot behavior, multiprocess access, etc.
- All tests passing ✅

Technical Design

The solution uses lock-based atomicity for read+reset operations:

- Each server writes its stats to its own index in the shared buffer (minimal contention)
- Primary server atomically reads all indices and resets counters
- No barrier synchronization needed - servers log independently without blocking
- Lock held for only ~50-250ns per operation (negligible overhead)
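The read+reset design described above can be sketched as follows. This is a minimal illustration, not the PR's SharedStatsBuffer: the class name StatsBufferSketch, the four-field layout, and the choice to sum snapshot values across servers are all assumptions made for the example.

```python
import multiprocessing as mp

# Hypothetical per-server layout: two cumulative counters, then two snapshots.
PROMPT_TOKENS, GENERATION_TOKENS, RUNNING, WAITING = range(4)
FIELDS = 4
CUMULATIVE = (PROMPT_TOKENS, GENERATION_TOKENS)


class StatsBufferSketch:
    """Illustrative stand-in for a shared stats buffer (not vLLM code)."""

    def __init__(self, num_servers: int) -> None:
        self.num_servers = num_servers
        # lock=False gives a raw shared array; the explicit Lock below guards it.
        self._data = mp.Array("d", num_servers * FIELDS, lock=False)
        self._lock = mp.Lock()

    def record(self, server: int, prompt: int, generation: int,
               running: int, waiting: int) -> None:
        """Each server writes only to its own slice of the array."""
        base = server * FIELDS
        with self._lock:
            self._data[base + PROMPT_TOKENS] += prompt          # cumulative
            self._data[base + GENERATION_TOKENS] += generation  # cumulative
            self._data[base + RUNNING] = running                # snapshot
            self._data[base + WAITING] = waiting                # snapshot

    def read_and_reset(self) -> list[float]:
        """Sum every slice and zero the cumulative counters under one lock."""
        totals = [0.0] * FIELDS
        with self._lock:
            for server in range(self.num_servers):
                base = server * FIELDS
                for field in range(FIELDS):
                    totals[field] += self._data[base + field]
                for field in CUMULATIVE:
                    self._data[base + field] = 0.0
        return totals


buf = StatsBufferSketch(num_servers=2)
buf.record(0, prompt=100, generation=50, running=3, waiting=1)
buf.record(1, prompt=200, generation=25, running=2, waiting=0)
first = buf.read_and_reset()
second = buf.read_and_reset()
```

Because the read and the reset happen under the same lock, a token counted in one logging interval cannot be counted again in the next; the snapshot fields are left in place so a second read still reports the latest queue depths.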

Before vs After

Before (with --api-server-count=4):

WARNING: AsyncLLM created with api_server_count more than 1;
         disabling stats logging to avoid incomplete stats.

(No token speed metrics in logs)

After (with --api-server-count=4):

INFO: Using multi-server stats aggregation for 4 API servers
INFO: All 4 API Servers: Avg prompt throughput: 1000.0 tokens/s,
      Avg generation throughput: 500.0 tokens/s, Running: 12 reqs,
      Waiting: 5 reqs, GPU KV cache usage: 65.3%, ...
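As a rough illustration of how such a line could be derived from aggregated counters, the sketch below divides token counts by the logging interval and formats the result. The function name and parameters are hypothetical, the 10-second interval is taken from the integration test note in this PR, and the fields elided by "..." in the sample are left out rather than guessed.

```python
def format_stats_line(num_servers: int, prompt_tokens: int,
                      generation_tokens: int, interval_s: float,
                      running: int, waiting: int,
                      kv_cache_usage: float) -> str:
    """Build a stats line in the style of the sample above (hypothetical helper)."""
    # Throughput is tokens counted during the interval divided by its length.
    prompt_tps = prompt_tokens / interval_s
    generation_tps = generation_tokens / interval_s
    return (
        f"All {num_servers} API Servers: "
        f"Avg prompt throughput: {prompt_tps:.1f} tokens/s, "
        f"Avg generation throughput: {generation_tps:.1f} tokens/s, "
        f"Running: {running} reqs, Waiting: {waiting} reqs, "
        f"GPU KV cache usage: {kv_cache_usage * 100:.1f}%"
    )


# 10,000 prompt tokens and 5,000 generated tokens over a 10 s interval.
line = format_stats_line(4, 10_000, 5_000, 10.0, 12, 5, 0.653)
```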

Backwards Compatibility

✅ Fully backwards compatible
✅ Single server mode (--api-server-count=1) unchanged
✅ Graceful fallback if buffer not provided (shows warning)

Testing

# Unit tests
pytest tests/v1/metrics/test_shared_stats.py -v
# Result: 8/8 tests passing

# Integration test
vllm serve <model> --api-server-count=4
# Expected: Aggregated stats logged every 10 seconds

Performance Impact

- Memory overhead: ~1 KB per server (negligible)
- CPU overhead: Lock held for ~50-250ns per operation (negligible)
- No impact on request throughput or latency

🤖 Generated with Claude Code

Changed files

tests/v1/metrics/test_shared_stats.py (added, +177/-0)
vllm/entrypoints/openai/api_server.py (modified, +2/-0)
vllm/v1/engine/async_llm.py (modified, +5/-0)
vllm/v1/metrics/loggers.py (modified, +198/-4)
vllm/v1/metrics/shared_stats.py (added, +158/-0)
vllm/v1/utils.py (modified, +6/-0)

Your current environment

Docker image: vllm/vllm-openai:v0.18.0-cu130. The environment variable VLLM_LOGGING_CONFIG_PATH has been configured. The logging configuration is as follows:

{
  "version": 1,
  "disable_existing_loggers": false,
  "formatters": {
    "standard": {
      "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    }
  },
  "handlers": {
    "rotating_file": {
      "class": "logging.handlers.RotatingFileHandler",
      "level": "INFO",
      "formatter": "standard",
      "filename": "/app/workspace/logs/vllm.log",
      "maxBytes": 10485760,
      "backupCount": 5,
      "encoding": "utf8"
    }
  },
  "root": {
    "handlers": ["rotating_file"],
    "level": "INFO"
  }
}
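To check what this configuration lets through, it can be loaded outside vLLM with Python's standard logging.config.dictConfig. The sketch below assumes the JSON file is handed to dictConfig unchanged; the logger name vllm.example and the temporary log path are placeholders so the example runs anywhere.

```python
import json
import logging
import logging.config
import os
import tempfile

# Same structure as the configuration above; the filename is filled in below.
CONFIG_JSON = """
{
  "version": 1,
  "disable_existing_loggers": false,
  "formatters": {
    "standard": {
      "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    }
  },
  "handlers": {
    "rotating_file": {
      "class": "logging.handlers.RotatingFileHandler",
      "level": "INFO",
      "formatter": "standard",
      "filename": "PLACEHOLDER",
      "maxBytes": 10485760,
      "backupCount": 5,
      "encoding": "utf8"
    }
  },
  "root": {"handlers": ["rotating_file"], "level": "INFO"}
}
"""


def log_through_config(log_path: str) -> str:
    """Apply the config, emit one INFO and one DEBUG record, return the file."""
    config = json.loads(CONFIG_JSON)
    config["handlers"]["rotating_file"]["filename"] = log_path
    logging.config.dictConfig(config)

    logger = logging.getLogger("vllm.example")  # hypothetical logger name
    logger.info("Avg generation throughput: 500.0 tokens/s")
    logger.debug("this DEBUG record is filtered out")

    for handler in logging.getLogger().handlers:
        handler.flush()
    with open(log_path, encoding="utf8") as f:
        return f.read()


log_file = os.path.join(tempfile.mkdtemp(), "vllm.log")
contents = log_through_config(log_file)
```

An INFO record from any logger under vllm reaches the file with this configuration, which suggests the configuration itself would not hide throughput lines that are emitted at INFO.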

🐛 Describe the bug

After upgrading to v0.18.0, the logs no longer display token output speed; only API call information is shown.

[Image: https://github.com/user-attachments/assets/432448a5-a062-41f5-9b1a-7ebf1f33a5b1]

In version v0.17.1, the log output is as follows:

[Image: https://github.com/user-attachments/assets/b7bf60bd-5cfd-42d8-ae6b-28e5bcfa8850]

extent analysis

Fix Plan

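The PR notes above attribute the missing token output speed lines to stats logging being disabled when --api-server-count is greater than 1, so changing the logging configuration alone is unlikely to restore them. The steps below raise the vllm logger to DEBUG as a diagnostic: if the warning about disabling stats logging appears in the log, the cause is the multi-server setting rather than the logging configuration.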

Step-by-Step Solution

Update the logging configuration so that both the file handler and the vllm logger accept DEBUG records. The vllm logger needs no handler of its own: with "propagate" set to true its records already reach the root logger's rotating_file handler, and attaching the same handler to both loggers would write every line twice.

{
  "version": 1,
  "disable_existing_loggers": false,
  "formatters": {
    "standard": {
      "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    }
  },
  "handlers": {
    "rotating_file": {
      "class": "logging.handlers.RotatingFileHandler",
      "level": "DEBUG",
      "formatter": "standard",
      "filename": "/app/workspace/logs/vllm.log",
      "maxBytes": 10485760,
      "backupCount": 5,
      "encoding": "utf8"
    }
  },
  "root": {
    "handlers": ["rotating_file"],
    "level": "INFO"
  },
  "loggers": {
    "vllm": {
      "level": "DEBUG",
      "propagate": true
    }
  }
}

Alternatively, you can also update the logging level programmatically:

import logging

logger = logging.getLogger('vllm')
logger.setLevel(logging.DEBUG)
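Setting the logger level is only half of the filter: each handler applies its own level as well, so a handler left at INFO still drops DEBUG records. The sketch below shows this with a small in-memory handler; ListHandler and the logger name levels_demo are made up for the example.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted messages in memory instead of writing them out."""

    def __init__(self, level: int = logging.NOTSET) -> None:
        super().__init__(level)
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def emitted(logger_level: int, handler_level: int) -> list[str]:
    """Return the messages that survive both the logger and handler levels."""
    logger = logging.getLogger("levels_demo")  # hypothetical logger name
    logger.propagate = False  # keep the demo isolated from the root logger
    logger.setLevel(logger_level)
    handler = ListHandler(handler_level)
    logger.addHandler(handler)
    try:
        logger.debug("debug line")
        logger.info("info line")
    finally:
        logger.removeHandler(handler)
    return handler.messages


logger_debug_handler_info = emitted(logging.DEBUG, logging.INFO)
logger_debug_handler_debug = emitted(logging.DEBUG, logging.DEBUG)
logger_info_handler_debug = emitted(logging.INFO, logging.DEBUG)
```

Only the combination where both the logger and the handler are at DEBUG lets the debug line through, which is why the configuration-based variant changes the level in two places.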

Verification

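To verify, send a few requests and check /app/workspace/logs/vllm.log for lines containing "Avg prompt throughput" and "Avg generation throughput". If those lines are still missing and the log contains the warning "disabling stats logging to avoid incomplete stats", the server is running with more than one API server and the logging configuration is not the cause.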
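The check can be scripted. The sketch below scans log lines for the throughput wording shown in the PR's sample output; it assumes each stats message is written on a single line with that exact wording, and find_throughput is a hypothetical helper name.

```python
import re

# Pattern based on the sample log line in the PR notes.
THROUGHPUT_RE = re.compile(
    r"Avg prompt throughput: (?P<prompt>\d+(?:\.\d+)?) tokens/s, "
    r"Avg generation throughput: (?P<generation>\d+(?:\.\d+)?) tokens/s"
)


def find_throughput(lines):
    """Return (prompt_tps, generation_tps) for every matching log line."""
    results = []
    for line in lines:
        match = THROUGHPUT_RE.search(line)
        if match:
            results.append((float(match["prompt"]), float(match["generation"])))
    return results


sample = [
    "2026-03-23 09:43:17 - vllm - INFO - POST /v1/chat/completions 200 OK",
    "2026-03-23 09:43:27 - vllm - INFO - All 4 API Servers: "
    "Avg prompt throughput: 1000.0 tokens/s, "
    "Avg generation throughput: 500.0 tokens/s, Running: 12 reqs",
]
found = find_throughput(sample)
```

To run it against the real file, pass an open file object, for example `find_throughput(open("/app/workspace/logs/vllm.log", encoding="utf8"))`; an empty result means no throughput lines were logged.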

Extra Tips

DEBUG output is verbose on a busy server; once the diagnosis is done, set the handler and the vllm logger back to INFO.
To limit noise, raise the level only for the logger you are investigating rather than for the root logger.

#api #environment variable #container setup #orchestration issue #cache issue
