llamaIndex - ✅(Solved) Fix [Bug]: llama-index-llms-vllm: best_of removed from vLLM SamplingParams causes TypeError on .complete() [2 pull requests, 4 comments, 3 participants]

llamaIndex • 2026-04-13 10:14:02

run-llama/llama_index#21371•Fetched 2026-04-15 06:20:05


Error Message

File .../llama_index/llms/vllm/base.py:292, in Vllm.complete(self, prompt, formatted, **kwargs)
--> 292     sampling_params = SamplingParams(**params)
TypeError: Unexpected keyword argument 'best_of'

Root Cause

_model_kwargs unconditionally includes best_of (base.py:250), even when its value is None. The merged parameters are then unpacked into SamplingParams(**params), which no longer accepts best_of as of vLLM 0.19.0.

Fix / Workaround

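Patch the complete method to rebuild SamplingParams without best_of, then assign the patch to the class:

from llama_index.core.base.llms.types import CompletionResponse
from llama_index.llms.vllm import Vllm
from vllm import SamplingParams


def patched_complete(self, prompt, formatted=False, **kwargs):
    # Build the sampling parameters from scratch so best_of is never passed.
    # Note: this ignores values configured on the Vllm instance and honors
    # only per-call kwargs, falling back to the defaults below.
    params = {
        "temperature": kwargs.get("temperature", 0.1),
        "top_p": kwargs.get("top_p", 0.9),
        "max_tokens": kwargs.get("max_tokens", 1024),
    }
    sampling_params = SamplingParams(**params)
    output = self._client.generate(prompt, sampling_params)
    return CompletionResponse(text=output[0].outputs[0].text.strip())


# Replace the original method for every Vllm instance.
Vllm.complete = patched_complete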
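The workaround above rebuilds the parameters from scratch, so anything configured on the instance is lost. A gentler variant, sketched below, keeps the configured parameters and drops only the keys the target constructor does not accept. The names drop_unsupported_kwargs and StrictParams are hypothetical; StrictParams is a stand-in for vLLM's SamplingParams so the sketch runs without vLLM installed. Whether inspect.signature reports the real SamplingParams fields is an assumption to verify before relying on this.

```python
import inspect
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


def drop_unsupported_kwargs(target: Callable[..., Any], params: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the entries of `params` that `target` accepts as keywords."""
    sig_params = inspect.signature(target).parameters
    # If the target takes **kwargs, every key is acceptable.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in sig_params.values()):
        return dict(params)
    return {k: v for k, v in params.items() if k in sig_params}


@dataclass
class StrictParams:
    """Hypothetical stand-in for a SamplingParams that no longer has best_of."""

    temperature: float = 1.0
    top_p: float = 1.0
    max_tokens: Optional[int] = 16


# Illustrative values; best_of is present as a key but unsupported by the target.
configured = {"temperature": 0.1, "top_p": 0.9, "max_tokens": 1024, "best_of": None}
safe = drop_unsupported_kwargs(StrictParams, configured)
sampling_params = StrictParams(**safe)
print(safe)
```

Unlike a None filter, this approach also drops an explicitly set best_of, which silently changes behavior on older vLLM versions if the filter is applied there too. That trade-off is worth weighing before adopting it.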

PR fix notes

PR #21372: fix(vllm): filter None kwargs before passing to SamplingParams

Repository: run-llama/llama_index
Author: xingxing21
State: open | merged: False
Link: https://github.com/run-llama/llama_index/pull/21372

Description (problem / solution / changelog)

Description

llama-index-llms-vllm unconditionally included all sampling parameters (including best_of) in _model_kwargs, even when their value was None. These were then unpacked directly into SamplingParams(**params). Since vLLM ≥ 0.19.0 removed best_of from SamplingParams, every call to .complete() raised:

TypeError: Unexpected keyword argument 'best_of'

This fix filters out all None-valued keys from _model_kwargs before they reach SamplingParams, so any optional parameter that is unset is simply omitted. This handles best_of and makes the integration resilient to future vLLM removals of other optional kwargs.

Fixes #21371
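The filtering described above can be sketched as a small helper. The name without_none and the sample dictionary are illustrative only and are not taken from the PR diff. The point of the sketch is that the check is "is not None" rather than truthiness, so legitimate falsy values such as temperature=0.0 are kept.

```python
from typing import Any, Dict


def without_none(params: Dict[str, Any]) -> Dict[str, Any]:
    """Drop keys whose value is None; keep every other value, including falsy ones."""
    return {key: value for key, value in params.items() if value is not None}


# Illustrative values only; the real _model_kwargs has more fields.
base_kwargs = {"temperature": 0.0, "top_p": 1.0, "best_of": None, "stop": None}
print(without_none(base_kwargs))  # {'temperature': 0.0, 'top_p': 1.0}
```

Note that a best_of value set explicitly by the user is not None, so it would still be forwarded and would still fail on a vLLM version that has removed the parameter.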


Changed files

llama-index-integrations/llms/llama-index-llms-vllm/llama_index/llms/vllm/base.py (modified, +20/-2)
llama-index-integrations/llms/llama-index-llms-vllm/tests/test_llms_vllm.py (modified, +133/-0)

PR #21375: fix(vllm): filter None kwargs before passing to SamplingParams

Repository: run-llama/llama_index
Author: andreafresa
State: open | merged: False
Link: https://github.com/run-llama/llama_index/pull/21375

Description (problem / solution / changelog)

Summary

Fixes #21371

llama-index-llms-vllm hardcodes best_of in _model_kwargs, but vLLM >= 0.19.0 removed best_of from SamplingParams. Since _model_kwargs unconditionally included it, every .complete() call raised:

TypeError: Unexpected keyword argument 'best_of'

Root cause — base.py line 257:

return {**base_kwargs}  # passes all keys including best_of=None

Fix — filter out None-valued keys before returning:

return {k: v for k, v in base_kwargs.items() if v is not None}

This way any parameter whose default is None (including best_of) is simply omitted, making the integration robust to future vLLM API removals without breaking explicitly set values.

Test plan

Vllm.complete() no longer raises TypeError with vLLM >= 0.19.0
Parameters explicitly set by the user (e.g. temperature=0.7) are still forwarded correctly
best_of is still forwarded when explicitly set to a non-None value (for older vLLM versions)
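The three checks in the test plan could be exercised without a GPU or a vLLM install along these lines. Everything here is a hypothetical stand-in: FakeSamplingParams mimics a SamplingParams that has no best_of, and model_kwargs mirrors the proposed one-line fix rather than the real property in base.py.

```python
from typing import Any, Dict, Optional


class FakeSamplingParams:
    """Hypothetical stand-in that, like newer vLLM, has no best_of parameter."""

    def __init__(self, temperature: float = 1.0, top_p: float = 1.0) -> None:
        self.temperature = temperature
        self.top_p = top_p


def model_kwargs(temperature: float, top_p: float, best_of: Optional[int]) -> Dict[str, Any]:
    """Mirror of the proposed fix: build the kwargs, then drop None values."""
    base_kwargs = {"temperature": temperature, "top_p": top_p, "best_of": best_of}
    return {k: v for k, v in base_kwargs.items() if v is not None}


def test_unset_best_of_is_omitted() -> None:
    # Constructing the params must not raise when best_of was never set.
    params = FakeSamplingParams(**model_kwargs(1.0, 1.0, None))
    assert params.temperature == 1.0


def test_explicit_values_are_forwarded() -> None:
    params = FakeSamplingParams(**model_kwargs(0.7, 0.9, None))
    assert (params.temperature, params.top_p) == (0.7, 0.9)


def test_explicit_best_of_is_still_forwarded() -> None:
    assert model_kwargs(1.0, 1.0, 2)["best_of"] == 2


for test in (
    test_unset_best_of_is_omitted,
    test_explicit_values_are_forwarded,
    test_explicit_best_of_is_still_forwarded,
):
    test()
```

The functions follow pytest naming conventions, so they can be dropped into a test module as well as run directly.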

Changed files

llama-index-integrations/llms/llama-index-llms-vllm/llama_index/llms/vllm/base.py (modified, +1/-1)



Bug Description

llama-index-llms-vllm==0.7.0 hardcodes best_of in _model_kwargs (base.py:250), but vllm>=0.19.0 removed best_of from SamplingParams. This causes a TypeError on every .complete() call, making the wrapper completely unusable with recent vLLM versions.

Environment

llama-index-llms-vllm: 0.7.0
vllm: 0.19.0

Traceback

File .../llama_index/llms/vllm/base.py:292, in Vllm.complete(self, prompt, formatted, **kwargs)                                          
--> 292     sampling_params = SamplingParams(**params)
TypeError: Unexpected keyword argument 'best_of'

Root cause

_model_kwargs unconditionally includes best_of:

# base.py:250
"best_of": self.best_of,

This is then passed into SamplingParams(**params), which no longer accepts it.
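The failure mechanism can be reproduced in isolation with a stand-in class. StrictSamplingParams below is hypothetical and only mimics a constructor that has no best_of parameter; the exact wording of the error raised by plain Python differs from the message vLLM produces, but the cause is the same: the key is present in the dictionary even though its value is None.

```python
from typing import Any, Dict, Optional


class StrictSamplingParams:
    """Hypothetical stand-in for a SamplingParams without a best_of parameter."""

    def __init__(self, temperature: float = 1.0, max_tokens: int = 16) -> None:
        self.temperature = temperature
        self.max_tokens = max_tokens


# best_of is present as a key even though its value is None.
model_kwargs: Dict[str, Any] = {"temperature": 0.1, "max_tokens": 512, "best_of": None}
call_kwargs: Dict[str, Any] = {}

# Same merge pattern as base.py:287 in the traceback.
params = {**model_kwargs, **call_kwargs}

error: Optional[TypeError] = None
try:
    StrictSamplingParams(**params)
except TypeError as exc:
    error = exc
    print(f"TypeError: {exc}")
```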

Workaround

Patch the complete method to rebuild SamplingParams without best_of:

from llama_index.llms.vllm import Vllm
from vllm import SamplingParams
from llama_index.core.base.llms.types import CompletionResponse

def patched_complete(self, prompt, formatted=False, **kwargs):
    params = {
        "temperature": kwargs.get("temperature", 0.1),
        "top_p": kwargs.get("top_p", 0.9),
        "max_tokens": kwargs.get("max_tokens", 1024),
    }
    sampling_params = SamplingParams(**params)
    output = self._client.generate(prompt, sampling_params)
    return CompletionResponse(text=output[0].outputs[0].text.strip())

Vllm.complete = patched_complete

Suggested fix

Guard best_of in _model_kwargs so it is only included when set:

if self.best_of is not None:
    base_kwargs["best_of"] = self.best_of
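For context, the guard might sit inside the property roughly as follows. WrapperSketch is a hypothetical, trimmed-down stand-in; its field names (including max_new_tokens) are assumptions for illustration and are not copied from base.py.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class WrapperSketch:
    """Hypothetical, trimmed-down stand-in for the Vllm wrapper's sampling fields."""

    temperature: float = 1.0
    max_new_tokens: int = 512
    best_of: Optional[int] = None

    @property
    def _model_kwargs(self) -> Dict[str, Any]:
        base_kwargs: Dict[str, Any] = {
            "temperature": self.temperature,
            "max_tokens": self.max_new_tokens,
        }
        # Only forward best_of when the caller actually set it.
        if self.best_of is not None:
            base_kwargs["best_of"] = self.best_of
        return base_kwargs


print(WrapperSketch()._model_kwargs)
print(WrapperSketch(best_of=2)._model_kwargs)
```

With the default of None the key is absent, so nothing unsupported reaches SamplingParams. An explicit best_of=2 is still forwarded, which only works on vLLM versions that still accept it.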

Version

llama-index: 0.14.20, llama-index-llms-vllm: 0.7.0

Steps to Reproduce

Install dependencies:

pip install llama-index==0.14.20 llama-index-llms-vllm==0.7.0 vllm==0.19.0

Run the following code:

from llama_index.llms.vllm import Vllm

llm = Vllm("/models/your-model", vllm_kwargs={"gpu_memory_utilization": 0.6})
response = llm.complete("Hello")

Observe the error:
TypeError: Unexpected keyword argument 'best_of'
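Before running the reproduction, it can help to confirm which versions are actually installed, since the error depends on the vLLM version. The sketch below uses the standard library's importlib.metadata; the helper name installed_versions is hypothetical, and the package list simply mirrors the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

PACKAGES = ("llama-index", "llama-index-llms-vllm", "vllm")


def installed_versions(names: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```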

Relevant Logs/Tracebacks

File .../llama_index/core/llms/callbacks.py:447, in llm_completion_callback.<locals>.wrap.<locals>.wrapped_llm_predict
    447     f_return_val = f(_self, *args, **kwargs)

File .../llama_index/llms/vllm/base.py:287, in Vllm.complete(self, prompt, formatted, **kwargs)
    287     params = {**self._model_kwargs, **kwargs}
    289     from vllm import SamplingParams
    291     # build sampling parameters
--> 292     sampling_params = SamplingParams(**params)
    293     outputs = self._client.generate([prompt], sampling_params)
    294     return CompletionResponse(text=outputs[0].outputs[0].text)

TypeError: Unexpected keyword argument 'best_of'

extent analysis

TL;DR

The issue can be fixed by guarding the best_of parameter in _model_kwargs to only include it when set, or by applying a workaround patch to the complete method.

Guidance

The error occurs because llama-index-llms-vllm 0.7.0 always includes best_of in _model_kwargs, and SamplingParams in vLLM 0.19.0 and later no longer accepts that argument.
To verify the issue, follow the steps to reproduce: install the pinned dependencies and call .complete() on a Vllm instance.
A potential fix is to include best_of in _model_kwargs only when it is set, as suggested in the issue.
Alternatively, the workaround patch can be applied to the complete method to rebuild SamplingParams without best_of.

Example

if self.best_of is not None:
    base_kwargs["best_of"] = self.best_of

This code snippet shows how to guard the best_of parameter in _model_kwargs.

Notes

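The workaround patch and the suggested fix both assume that best_of is not needed for the complete method. With the suggested fix, an explicitly set best_of is still forwarded and will still raise on vLLM 0.19.0 and later, so a different solution may be needed if best_of is required. The workaround patch also ignores sampling settings configured on the Vllm instance and honors only per-call kwargs.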

Recommendation

Apply the workaround patch to the complete method, as it provides a temporary solution until a permanent fix is available. This approach allows for continued use of the llama-index-llms-vllm library with vllm version 0.19.0.
