vllm - ✅(Solved) Fix [Feature] Support passing configuration to custom attention backends [1 pull request, 1 comment, 2 participants]

GitHub stats
vllm-project/vllm#40051 (fetched 2026-04-17 08:27:27)
Comments: 1
Participants: 2
Timeline: 5
Reactions: 0
Timeline (top): commented ×1, cross-referenced ×1, mentioned ×1, referenced ×1

Fix Action

Fix / Workaround

At the moment, we're using class-level attributes as a workaround, but would love to have a more official way to do this.
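
A minimal sketch of what such a class-level-attribute workaround could look like. The class name, attribute name, and constructor arguments below are hypothetical and are not taken from glassbox or vLLM; the point is only that the constructor signature stays fixed while configuration travels through the class itself.

```python
class ObservabilityImpl:
    """Hypothetical backend impl with a fixed constructor signature."""

    # Class-level attribute used as the configuration channel.
    config: dict = {"log_level": "info", "sample_rate": 1.0}

    def __init__(self, num_heads: int, head_size: int) -> None:
        self.num_heads = num_heads
        self.head_size = head_size
        # Copy so per-instance changes do not leak back into the class.
        self.settings = dict(type(self).config)


def plugin_entry_point() -> None:
    """Zero-argument plugin hook: set the class attribute before any instance exists."""
    ObservabilityImpl.config = {"log_level": "debug", "sample_rate": 0.1}
```

The fragility is visible here: the hook has to run in every process before the first instance is created, and nothing in the registration call tells a reader that the class carries configuration.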

PR fix notes

PR #40065: feat: support passing configuration to custom attention backends

Description (problem / solution / changelog)

What's broken?

Custom attention backends registered via register_backend() have no way to receive configuration. The vllm.general_plugins entry points are zero-argument callables, and AttentionImpl.__init__ has a fixed constructor signature — there is no sanctioned injection point for plugin authors to pass config to their custom backends.

Workarounds like class-level attributes are fragile and not discoverable.

Ref: #40051

Who is affected?

Anyone building custom attention backends (e.g. observability tools like glassbox) that need runtime configuration. Existing built-in backends and users not passing backend_config are not affected — this is fully backward-compatible.

How did we fix it?

Added a backend_config: Any = None parameter to register_backend() in vllm/v1/attention/backends/registry.py. The value is stored in new module-level dicts (_ATTN_BACKEND_CONFIGS / _MAMBA_ATTN_BACKEND_CONFIGS) alongside the existing _ATTN_OVERRIDES.

New public API surface:

  • register_backend(..., backend_config={"key": "value"}) — store config at registration time
  • backend.get_config() — enum method to retrieve config (returns None if not set)
  • backend.clear_config() — enum method to clear stored config
  • get_backend_config(backend) — module-level function (equivalent to backend.get_config())

Usage example:

# In your plugin's entry point:
from vllm.v1.attention.backends.registry import (
    AttentionBackendEnum, register_backend
)

register_backend(
    AttentionBackendEnum.CUSTOM,
    "my.module.MyObservabilityBackend",
    backend_config={"log_level": "debug", "sample_rate": 0.1},
)

# In your backend class:
from vllm.v1.attention.backends.registry import (
    AttentionBackendEnum, get_backend_config
)

# AttentionBackend: import the base class from your vLLM version.
class MyObservabilityBackend(AttentionBackend):
    def __init__(self, *args, **kwargs):
        # Pull the config that was stored at registration time.
        config = get_backend_config(AttentionBackendEnum.CUSTOM)
        # config == {"log_level": "debug", "sample_rate": 0.1}

Design choices:

  • Pull-based pattern (backend reads config at init time) — matches how _ATTN_OVERRIDES already works
  • No changes to AttentionImpl.__init__ signature — zero disruption to existing backends
  • Config is set per-process when the plugin loads — matches subprocess behavior documented in plugin docs
  • Works with both decorator and direct-registration forms of register_backend()

How do we know it works?

  • 5 new unit tests added to tests/test_attention_backend_registry.py:
    • test_register_backend_config_with_class_path — config via direct registration
    • test_register_backend_config_with_decorator — config via @register_backend decorator
    • test_register_mamba_backend_config — config for MambaAttentionBackendEnum
    • test_backend_config_defaults_to_none — backward compat: no config = None
    • test_backend_config_clear: clear_config() removes stored config
  • All existing tests continue to pass unchanged
  • ruff check and ruff format --check pass

Changes

  • vllm/v1/attention/backends/registry.py: Added backend_config param, config storage dicts, get_config()/clear_config() enum methods, get_backend_config() function
  • tests/test_attention_backend_registry.py: 5 new tests covering the config mechanism

Changed files

  • tests/test_attention_backend_registry.py (modified, +96/-0)
  • vllm/v1/attention/backends/registry.py (modified, +63/-3)
RAW_BUFFER

Motivation

We're building glassbox, an observability tool that is implemented as a custom attention backend, and registered via vllm.general_plugins and register_backend(AttentionBackendEnum.CUSTOM, ...).

There's some configuration that we need to pass to the custom backend, but as far as we can tell, there's no mechanism to do so:

  • Plugin side: vllm.general_plugins entry points are zero-argument callables — no config, context, or kwargs are passed in when they're loaded (source).
  • Backend side: register_backend() maps an enum to a class path string. The backend Impl class is instantiated by vLLM with a fixed constructor signature (num_heads, head_size, etc.) — no user-config parameter (source).
  • Subprocess behavior: The plugin docs note that plugins "can be loaded multiple times in different processes" but there's no mechanism to coordinate state or configuration across those loads.

At the moment, we're using class-level attributes as a workaround, but would love to have a more official way to do this.

If we're missing an existing mechanism for this, we'd appreciate being pointed in the right direction. Otherwise, we'd love to contribute a solution — for example, allowing register_backend to accept a config or factory callable, or adding a backend_config parameter to the instantiation path.

Happy to discuss the right approach, and thank you!

extent analysis

TL;DR

Passing configuration to a custom attention backend in vLLM may require modifying the register_backend function to accept a config or factory callable.

Guidance

  • Review the vllm.general_plugins entry points to confirm that no configuration or context is passed to the loaded plugins.
  • Investigate the register_backend function in vllm/v1/attention/backends/registry.py to determine if there's a way to extend its functionality to accept a config parameter.
  • Consider proposing a change to the register_backend function to accept a factory callable that can instantiate the backend with custom configuration.
  • Explore the possibility of adding a backend_config parameter to the instantiation path of the backend Impl class.
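
The factory-callable idea from the guidance above could look roughly like the following. Everything here is hypothetical: `register_backend_factory`, `instantiate`, and the `backend_config` keyword illustrate the proposal and are not part of vLLM.

```python
from functools import partial
from typing import Any, Callable

_FACTORIES: dict[str, Callable[..., Any]] = {}


class ConfigurableImpl:
    """Hypothetical impl that accepts config as a keyword-only argument."""

    def __init__(self, num_heads: int, head_size: int, *,
                 backend_config: Any = None) -> None:
        self.num_heads = num_heads
        self.head_size = head_size
        self.backend_config = backend_config


def register_backend_factory(name: str, factory: Callable[..., Any]) -> None:
    """Store a callable that builds the impl from the fixed arguments."""
    _FACTORIES[name] = factory


def instantiate(name: str, num_heads: int, head_size: int) -> Any:
    """Framework side: call the factory with the fixed signature only."""
    return _FACTORIES[name](num_heads, head_size)


# The plugin binds its config up front; the framework never sees it.
register_backend_factory(
    "CUSTOM", partial(ConfigurableImpl, backend_config={"sample_rate": 0.1})
)
impl = instantiate("CUSTOM", 8, 64)
```

Compared with a stored config that the backend pulls at init time, this variant pushes the config in through the constructor, at the cost of changing how the framework locates and builds the class.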

Example

The issue itself includes no code; the usage example in the notes for PR #40065 shows the proposed API.

Notes

The current workaround using class-level attributes may not be suitable for production use, and a more official solution is needed. The proposed changes to register_backend or the addition of a backend_config parameter would require discussion and agreement with the vLLM maintainers.

Recommendation

Until an official mechanism such as the backend_config parameter from PR #40065 is available in your vLLM version, keep the class-level attribute workaround. A factory callable is a proposed extension to register_backend, not something the issue describes as already supported.
