transformers - ✅(Solved) Fix Paged generate() emits a stale warning for num_return_sequences [1 pull request, 1 participant]

transformers · 2026-04-22 04:16:08

GitHub stats

huggingface/transformers#45563 • Fetched 2026-04-23 07:23:01

Author: oleksii-tumanov
Participants: oleksii-tumanov
Timeline (top): mentioned ×2, subscribed ×2, cross-referenced ×1, labeled ×1

generate(..., cache_implementation="paged") still warns that num_return_sequences is unsupported for continuous batching.

That warning looks stale: generate_batch() already uses generation_config.num_return_sequences to expand the number of requests.

Root Cause

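The continuous-batching branch of generate() emits a single combined warning that covers both num_return_sequences and num_beams, so generate(..., cache_implementation="paged") still reports num_return_sequences as unsupported.

The num_return_sequences half of that warning is stale: generate_batch() already uses generation_config.num_return_sequences to expand the number of requests.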
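
The expansion that generate_batch() is described as performing can be pictured roughly as follows. This is a minimal sketch, not the library's code: expand_requests, the dict layout, and the request-id scheme are hypothetical names chosen only for illustration.

```python
from typing import Dict, List


def expand_requests(
    prompts: List[List[int]], num_return_sequences: int = 1
) -> List[Dict[str, object]]:
    """Hypothetical helper: create one request per (prompt, return sequence) pair."""
    requests = []
    for prompt_index, prompt_ids in enumerate(prompts):
        for sequence_index in range(num_return_sequences):
            requests.append(
                {
                    # Illustrative id scheme; the real library may name requests differently.
                    "request_id": f"req_{prompt_index}_{sequence_index}",
                    "prompt_ids": list(prompt_ids),
                }
            )
    return requests


# One prompt with num_return_sequences=2 becomes two independent requests.
expanded = expand_requests([[1, 2, 3]], num_return_sequences=2)
```

If requests are expanded this way, each returned sequence is simply its own request, which is why a warning calling num_return_sequences unsupported would be misleading.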

Fix Action

Fix proposed (PR open, not yet merged as of the fetch)

Proposed fix in PR: generate: drop stale num_return_sequences warning on continuous batching path (https://github.com/huggingface/transformers/pull/45582)

PR fix notes

PR #45582: generate: drop stale num_return_sequences warning on continuous batching path

Repository: huggingface/transformers
Author: joaquinhuigomez
State: open | merged: False
Link: https://github.com/huggingface/transformers/pull/45582

Description (problem / solution / changelog)

The continuous-batching branch in generate warned that num_return_sequences was unsupported alongside num_beams, but generate_batch() already honors generation_config.num_return_sequences when expanding requests. The warning fires for any run that explicitly sets num_return_sequences even though the feature works, cluttering logs and misleading users.

Drop the num_return_sequences half of the warning; keep the num_beams guard since beam search is still unsupported on the CB path.
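
The change described above can be sketched as a before/after pair of guards. This is an illustration under stated assumptions, not the code in src/transformers/generation/utils.py: the function names, the exact conditions, and the wording of the narrowed message are hypothetical, and the functions return the message instead of logging it so the behavior is easy to inspect.

```python
from typing import Optional

# Message quoted in the issue for current main.
STALE_MESSAGE = (
    "num_return_sequences and num_beams are not supported for continuous batching yet."
)


def stale_guard(num_beams: int = 1, num_return_sequences: int = 1) -> Optional[str]:
    """Assumed shape of the old check: either option triggers the combined warning."""
    if num_beams > 1 or num_return_sequences > 1:
        return STALE_MESSAGE
    return None


def proposed_guard(num_beams: int = 1, num_return_sequences: int = 1) -> Optional[str]:
    """Assumed shape of the fix: only beam search still triggers a warning."""
    if num_beams > 1:
        # Hypothetical wording; the PR may phrase the remaining warning differently.
        return "num_beams is not supported for continuous batching yet."
    return None


# Sampling with two return sequences: warned before, silent after.
before = stale_guard(num_beams=1, num_return_sequences=2)
after = proposed_guard(num_beams=1, num_return_sequences=2)
```

The num_beams branch is left in place in both versions, matching the statement that beam search is still unsupported on the CB path.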

Fixes #45563

Changed files

setup.py (modified, +1/-1)
src/transformers/generation/utils.py (modified, +2/-6)

System Info

Transformers version: 5.6.0.dev0
Platform: macOS-26.2-arm64-arm-64bit-Mach-O
Python version: 3.13.5 (v3.13.5:6cb20a219a8, Jun 11 2025, 12:23:45) [Clang 16.0.0 (clang-1600.0.26.6)]
PyTorch version: 2.11.0
CUDA available: False
MPS available: True

Who can help?

@cyrilvallez @remi-or

Reproduction

Summary

generate(..., cache_implementation="paged") still warns that num_return_sequences is unsupported for continuous batching.

That warning looks stale: generate_batch() already uses generation_config.num_return_sequences to expand the number of requests.

Minimal reproduction

import torch
from transformers import GenerationConfig, PretrainedConfig
from transformers.generation.utils import GenerationMixin
from transformers.generation.continuous_batching.requests import GenerationOutput, RequestStatus

# Dummy model: no weights, only the attributes this reproduction relies on.
class DummyContinuousBatchingGenerateModel(GenerationMixin):
    def __init__(self):
        self.config = PretrainedConfig()
        self.generation_config = GenerationConfig()
        self.device = torch.device("cpu")

    # Stub for the continuous-batching entry point: returns one finished
    # request per requested return sequence, without running a model.
    def generate_batch(self, inputs, generation_config=None, **kwargs):
        num_return_sequences = generation_config.num_return_sequences or 1
        return {
            f"req_{i}": GenerationOutput(
                request_id=f"req_{i}",
                prompt_ids=inputs[0],
                generated_tokens=[10 + i],
                status=RequestStatus.FINISHED,
            )
            for i in range(num_return_sequences)
        }

# Take the paged path with sampling and two return sequences (no beam search).
model = DummyContinuousBatchingGenerateModel()
model.generate(
    inputs=torch.tensor([[1, 2, 3]]),
    cache_implementation="paged",
    do_sample=True,
    num_return_sequences=2,
)

On current main, this still emits the warning "num_return_sequences and num_beams are not supported for continuous batching yet."

I have a draft fix here: https://github.com/oleksii-tumanov/transformers/commit/f7a939d95239d26f94195cd6b820e4c720976507

Expected behavior

For valid generate(..., cache_implementation="paged") calls:

- keep the warning for num_beams > 1
- stop warning for valid num_return_sequences cases

extent analysis

TL;DR

Apply the draft fix to update the warning logic for num_return_sequences in generate with cache_implementation="paged".

Guidance

- Review the draft fix at https://github.com/oleksii-tumanov/transformers/commit/f7a939d95239d26f94195cd6b820e4c720976507 to understand the proposed changes.
- Verify that the fix only removes the warning for num_return_sequences and still warns for num_beams > 1 when using cache_implementation="paged".
- Test the updated generate method with different num_return_sequences values to ensure the warning is correctly suppressed.
- Consider the implications of this change on the behavior of generate_batch and how it handles generation_config.num_return_sequences.
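
One way to carry out the verification steps above is to capture log output around each call and inspect the messages. The sketch below assumes the warning is emitted through Python's standard logging module under a logger whose name you supply; captured_warnings and the "demo.generate" logger are hypothetical names for illustration, and the logger name used by transformers would need to be confirmed before applying this to a real generate() call.

```python
import logging
from contextlib import contextmanager


class _ListHandler(logging.Handler):
    """Collects the formatted text of every WARNING-or-higher record it sees."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


@contextmanager
def captured_warnings(logger_name):
    """Temporarily attach a collecting handler to the named logger."""
    logger = logging.getLogger(logger_name)
    handler = _ListHandler()
    logger.addHandler(handler)
    try:
        yield handler.messages
    finally:
        logger.removeHandler(handler)


# Stand-in logger for demonstration only; replace the body of the `with`
# block with the generate(...) call under test.
demo_logger = logging.getLogger("demo.generate")
with captured_warnings("demo.generate") as messages:
    demo_logger.warning("num_beams is not supported for continuous batching yet.")

# True only if some captured message mentions num_return_sequences.
mentions_num_return_sequences = any("num_return_sequences" in m for m in messages)
```

Running the same capture for several num_return_sequences values, and once with num_beams > 1, would cover both halves of the expected behavior.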

Example

# After applying the draft fix, this should no longer emit a warning for num_return_sequences
model.generate(
    inputs=torch.tensor([[1, 2, 3]]),
    cache_implementation="paged",
    do_sample=True,
    num_return_sequences=2,
)

Notes

The provided draft fix is specific to the transformers library and its handling of num_return_sequences with cache_implementation="paged". This solution may not apply to other libraries or versions.

Recommendation

Apply the draft fix, or the equivalent change in PR #45582, as it directly addresses the stale warning logic for num_return_sequences in generate with cache_implementation="paged" while keeping the num_beams warning.
