vllm - 💡(How to fix) Fix Pooling API: expose extra_kwargs and allow nested response data for custom poolers [1 comment, 2 participants]

GitHub stats
vllm-project/vllm#37344 · Fetched 2026-04-08 00:53:24
Comments: 1 · Participants: 2 · Timeline: 1 · Reactions: 0

Fix Action

Fix / Workaround

Context: we maintain custom pooling workloads (for example GLiNER-style span outputs) and currently need a local patch against vllm entrypoints pooling protocol.

Request:
1. Expose extra_kwargs on PoolingCompletionRequest and PoolingChatRequest, then pass through to PoolingParams.
2. Relax PoolingResponseData.data typing so 3D nested tensors can be serialized without validation failure.

Why:
- Custom poolers require structured metadata beyond prompt tokens.
- Structured prediction heads can emit nested outputs (for example [L, K, C] logits).

Happy to contribute a PR if this direction is acceptable.



Fix Plan

To address the issue, expose extra_kwargs on PoolingCompletionRequest and PoolingChatRequest and pass it through to PoolingParams. In addition, relax the typing of PoolingResponseData.data so that 3D nested tensors pass validation. The snippets in the steps below are simplified illustrations of the idea, not vLLM's actual class definitions.

Step-by-Step Solution

  • Modify PoolingCompletionRequest and PoolingChatRequest to include extra_kwargs:
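# Simplified sketch: the existing request fields are elided as **other_fields.
class PoolingCompletionRequest:
    def __init__(self, *, extra_kwargs=None, **other_fields):
        # Optional free-form metadata for custom poolers.
        self.extra_kwargs = extra_kwargs

class PoolingChatRequest:
    def __init__(self, *, extra_kwargs=None, **other_fields):
        self.extra_kwargs = extra_kwargs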
  • Pass extra_kwargs through to PoolingParams:
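# Simplified sketch: the existing parameters are elided as **other_params.
class PoolingParams:
    def __init__(self, *, extra_kwargs=None, **other_params):
        self.extra_kwargs = extra_kwargs

# In PoolingCompletionRequest and PoolingChatRequest
def to_pooling_params(self):
    # Forward the request's extra_kwargs unchanged; other arguments elided.
    return PoolingParams(extra_kwargs=self.extra_kwargs)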
  • Relax typing of PoolingResponseData.data:
from typing import Any

class PoolingResponseData:
    def __init__(self, data: Any):
        self.data = data
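
The pass-through described in the steps above can be sketched end to end with standard-library dataclasses. This is only an illustration under stated assumptions: SketchPoolingRequest and SketchPoolingParams are hypothetical stand-ins, not vLLM's real classes, and the keys inside extra_kwargs (labels, max_span_width) are invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical stand-ins: NOT vLLM's real classes, which carry many more
# fields and use different base classes.
@dataclass
class SketchPoolingParams:
    extra_kwargs: Optional[dict[str, Any]] = None


@dataclass
class SketchPoolingRequest:
    input: str
    extra_kwargs: Optional[dict[str, Any]] = None

    def to_pooling_params(self) -> SketchPoolingParams:
        # Shallow-copy so later changes to the request's dict do not
        # leak into the params object.
        extra = dict(self.extra_kwargs) if self.extra_kwargs is not None else None
        return SketchPoolingParams(extra_kwargs=extra)


# Example metadata a span-style custom pooler might want (invented keys).
request = SketchPoolingRequest(
    input="Ada Lovelace worked in London.",
    extra_kwargs={"labels": ["person", "location"], "max_span_width": 8},
)
params = request.to_pooling_params()
print(params.extra_kwargs)
```

Copying the dictionary is a design choice in this sketch rather than something the issue asks for. Forwarding the same object would also satisfy the request.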

Verification

To verify the fix, create a test case that passes a 3D nested tensor through the modified PoolingResponseData and checks that it can be serialized without validation failure.
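
A test along these lines might look like the sketch below. It does not import vLLM: is_nested_number_list and make_logits are hypothetical helpers, the payload is a plain dictionary with a single data key, and json from the standard library stands in for the server's serialization step. A real test would instead construct the actual PoolingResponseData and serialize it.

```python
import json
from typing import Any


def is_nested_number_list(value: Any) -> bool:
    """Return True for a number or an arbitrarily nested list of numbers."""
    if isinstance(value, bool):
        return False  # bool is an int subclass; reject it explicitly
    if isinstance(value, (int, float)):
        return True
    if isinstance(value, list):
        return all(is_nested_number_list(item) for item in value)
    return False


def make_logits(L: int, K: int, C: int) -> list:
    """Build a dummy [L, K, C] nested list of floats."""
    return [
        [[float(l * K * C + k * C + c) for c in range(C)] for k in range(K)]
        for l in range(L)
    ]


def test_3d_payload_round_trips() -> None:
    logits = make_logits(2, 3, 4)
    # The relaxed check accepts three levels of nesting.
    assert is_nested_number_list(logits)
    # Serialize and parse back, as a client would see the response.
    restored = json.loads(json.dumps({"data": logits}))["data"]
    assert restored == logits
    assert (len(restored), len(restored[0]), len(restored[0][0])) == (2, 3, 4)


test_3d_payload_round_trips()
```

Checking the shape after the round trip guards against a serializer that silently flattens the nested output.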

Extra Tips

  • Make sure to update the documentation to reflect the changes to the PoolingCompletionRequest and PoolingChatRequest classes.
  • Consider adding additional tests to cover different scenarios and ensure the fix is robust.
