vllm - 💡(How to fix) Fix _update_request_as_session does not update max_tokens from StreamingUpdate [2 comments, 2 participants]

vllm · 2026-03-23 03:31:28

GitHub stats

vllm-project/vllm#37842 • Fetched 2026-04-08 01:17:44

Author

warren618

Participants

njhill

warren618

Timeline (top)

commented ×2, mentioned ×2, subscribed ×2

Fix Action

Fix

Add session.max_tokens = update.max_tokens after the sampling_params update.

Code Example

# scheduler.py, _update_request_as_session
session.arrival_time = update.arrival_time
session.sampling_params = update.sampling_params
# max_tokens is not updated from update.max_tokens

Bug Description

In _update_request_as_session (vllm/v1/core/sched/scheduler.py), the method updates sampling_params and arrival_time from the StreamingUpdate, but does not update session.max_tokens.

The StreamingUpdate dataclass carries a max_tokens field (vllm/v1/request.py, line 41), but this value is silently discarded.

Affected Code

# scheduler.py, _update_request_as_session
session.arrival_time = update.arrival_time
session.sampling_params = update.sampling_params
# max_tokens is not updated from update.max_tokens

Impact

In streaming sessions where subsequent chunks specify a different max_tokens, the request always uses the initial chunk's value. The stop condition in check_stop() (request.num_output_tokens >= request.max_tokens) uses a stale value, causing:

Too few tokens generated if the user increases max_tokens in a subsequent chunk
Too many tokens generated if the user decreases it
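
The stale comparison can be illustrated with a small, self-contained sketch. FakeSession and FakeUpdate below are hypothetical stand-ins that carry only the fields discussed in this issue; they are not the real vLLM classes, and buggy_update only imitates the reported behavior.

```python
from dataclasses import dataclass


@dataclass
class FakeUpdate:
    """Hypothetical stand-in for StreamingUpdate (only the relevant field)."""
    max_tokens: int


@dataclass
class FakeSession:
    """Hypothetical stand-in for the session request object."""
    max_tokens: int
    num_output_tokens: int = 0


def hits_length_limit(request: FakeSession) -> bool:
    # Same comparison the issue quotes from check_stop().
    return request.num_output_tokens >= request.max_tokens


def buggy_update(session: FakeSession, update: FakeUpdate) -> None:
    # Imitates the reported behavior: update.max_tokens is never copied over.
    pass


# The first chunk allowed 4 tokens and all 4 have been generated.
session = FakeSession(max_tokens=4, num_output_tokens=4)

# A later chunk raises the limit to 16, but the session never sees it.
buggy_update(session, FakeUpdate(max_tokens=16))
stopped_with_stale_limit = hits_length_limit(session)

# What the proposed one-line fix would do.
session.max_tokens = 16
stopped_after_fix = hits_length_limit(session)
```

With the stale value the request is treated as finished even though the latest chunk asked for more tokens; once max_tokens is copied over, the same check lets generation continue.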

Fix

Add session.max_tokens = update.max_tokens after the sampling_params update.

extent analysis

Fix Plan

In _update_request_as_session (vllm/v1/core/sched/scheduler.py), assign update.max_tokens to session.max_tokens right after the sampling_params update:

# scheduler.py, _update_request_as_session
session.arrival_time = update.arrival_time
session.sampling_params = update.sampling_params
session.max_tokens = update.max_tokens  # Add this line

Verification

To verify the fix, test the following scenarios:

Increase max_tokens in a subsequent chunk and check that generation continues past the initial max_tokens value (provided no other stop condition, such as an EOS token, fires first).
Decrease max_tokens in a subsequent chunk and check that the generated tokens do not exceed the updated max_tokens value.
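
Both scenarios can be sketched as plain test functions against a toy model of the update path. Everything here is an assumption made for illustration: FakeSession, FakeUpdate, apply_update, and generate_until_stop are hypothetical names, and the toy loop keeps one cumulative output counter, which may not match how the real scheduler accounts for tokens across chunks. A real regression test would need to drive the actual scheduler.

```python
from dataclasses import dataclass


@dataclass
class FakeUpdate:
    """Hypothetical stand-in for StreamingUpdate."""
    max_tokens: int


@dataclass
class FakeSession:
    """Hypothetical stand-in for the session request object."""
    max_tokens: int
    num_output_tokens: int = 0


def apply_update(session: FakeSession, update: FakeUpdate) -> None:
    # The proposed fix: propagate max_tokens from the update.
    session.max_tokens = update.max_tokens


def generate_until_stop(session: FakeSession) -> int:
    # Toy decode loop: emit one token per step until the length limit is hit.
    while session.num_output_tokens < session.max_tokens:
        session.num_output_tokens += 1
    return session.num_output_tokens


def test_increase_allows_more_tokens() -> None:
    session = FakeSession(max_tokens=4)
    assert generate_until_stop(session) == 4
    apply_update(session, FakeUpdate(max_tokens=10))
    assert generate_until_stop(session) == 10


def test_decrease_caps_generation() -> None:
    session = FakeSession(max_tokens=10)
    apply_update(session, FakeUpdate(max_tokens=3))
    assert generate_until_stop(session) == 3


test_increase_allows_more_tokens()
test_decrease_caps_generation()
```

If apply_update is replaced with a no-op, the first test fails at the second assertion and the second test returns 10 instead of 3, which matches the two symptoms listed under Impact.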

Extra Tips

Make sure to update the tests to cover these scenarios and ensure the fix does not introduce any regressions.
Consider adding a log statement or a debug check to verify that session.max_tokens is being updated correctly.
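
A debug check along these lines could look like the following sketch. It uses only the standard logging module; apply_max_tokens and the logger name are hypothetical, and SimpleNamespace objects stand in for the real session and update so the snippet runs on its own.

```python
import logging
from types import SimpleNamespace

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("session_update_sketch")


def apply_max_tokens(session, update):
    """Copy max_tokens from the update and report the change at debug level."""
    old = session.max_tokens
    session.max_tokens = update.max_tokens
    if old != session.max_tokens:
        logger.debug("session max_tokens changed: %s -> %s", old, session.max_tokens)
    # Debug check: the session must now agree with the update.
    assert session.max_tokens == update.max_tokens
    return old, session.max_tokens


session = SimpleNamespace(max_tokens=8)
update = SimpleNamespace(max_tokens=32)
change = apply_max_tokens(session, update)
```

The log line is emitted only when the value actually changes, which keeps debug output quiet for sessions whose chunks all use the same limit.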
