vllm - ✅(Solved) Fix [Feature] Support passing configuration to custom attention backends [1 pull request, 1 comment, 2 participants]

dmaniloff · 2026-04-16T18:40:16Z



GitHub stats

vllm-project/vllm#40051•Fetched 2026-04-17 08:27:27


Author

dmaniloff

Participants

devarakondasrikanth

dmaniloff

Timeline (top)

commented ×1, cross-referenced ×1, mentioned ×1, referenced ×1

Fix Action

Fix / Workaround

At the moment, we're using class-level attributes as a workaround, but would love to have a more official way to do this.

PR fix notes

PR #40065: feat: support passing configuration to custom attention backends

Repository: vllm-project/vllm
Author: ianliuy
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/40065

Description (problem / solution / changelog)

What's broken?

Custom attention backends registered via register_backend() have no way to receive configuration. The vllm.general_plugins entry points are zero-argument callables, and AttentionImpl.__init__ has a fixed constructor signature — there is no sanctioned injection point for plugin authors to pass config to their custom backends.

Workarounds like class-level attributes are fragile and not discoverable.

Ref: #40051

Who is affected?

Anyone building custom attention backends that need runtime configuration, e.g. observability tools like glassbox (https://github.com/dmaniloff/glassbox). Existing built-in backends and users not passing backend_config are not affected; this is fully backward-compatible.

How did we fix it?

Added a backend_config: Any = None parameter to register_backend() in vllm/v1/attention/backends/registry.py. The value is stored in new module-level dicts (_ATTN_BACKEND_CONFIGS / _MAMBA_ATTN_BACKEND_CONFIGS) alongside the existing _ATTN_OVERRIDES.

New public API surface:

- register_backend(..., backend_config={"key": "value"}): store config at registration time
- backend.get_config(): enum method to retrieve config (returns None if not set)
- backend.clear_config(): enum method to clear stored config
- get_backend_config(backend): module-level function (equivalent to backend.get_config())

Usage example:

# In your plugin's entry point:
from vllm.v1.attention.backends.registry import (
    AttentionBackendEnum, register_backend
)

register_backend(
    AttentionBackendEnum.CUSTOM,
    "my.module.MyObservabilityBackend",
    backend_config={"log_level": "debug", "sample_rate": 0.1},
)

# In your backend class:
from vllm.v1.attention.backends.registry import get_backend_config, AttentionBackendEnum

class MyObservabilityBackend(AttentionBackend):
    def __init__(self, ...):
        config = get_backend_config(AttentionBackendEnum.CUSTOM)
        # config == {"log_level": "debug", "sample_rate": 0.1}

Design choices:

- Pull-based pattern (backend reads config at init time): matches how _ATTN_OVERRIDES already works
- No changes to AttentionImpl.__init__ signature: zero disruption to existing backends
- Config is set per-process when the plugin loads: matches subprocess behavior documented in plugin docs
- Works with both decorator and direct-registration forms of register_backend()
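
The pull-based choice can be illustrated with a small sketch: the constructor keeps its fixed signature and the object reads its settings from per-process state when it is built. Everything here is hypothetical (ToyImpl, ObservabilitySettings, set_backend_config and the module-level _BACKEND_CONFIG stand in for the backend and the registry); it shows the pattern, not vLLM's code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical per-process store, standing in for the registry's config dict.
_BACKEND_CONFIG: Optional[dict] = None


def set_backend_config(config: Optional[dict]) -> None:
    """Called once per process, e.g. from the plugin entry point."""
    global _BACKEND_CONFIG
    _BACKEND_CONFIG = config


@dataclass(frozen=True)
class ObservabilitySettings:
    log_level: str = "info"
    sample_rate: float = 1.0

    @classmethod
    def from_config(cls, config: Optional[dict]) -> "ObservabilitySettings":
        # No config registered: fall back to defaults (backward compatible).
        if config is None:
            return cls()
        unknown = set(config) - {"log_level", "sample_rate"}
        if unknown:
            raise ValueError(f"unknown backend_config keys: {sorted(unknown)}")
        return cls(**config)


class ToyImpl:
    # Fixed constructor signature: config is pulled, not passed in.
    def __init__(self, num_heads: int, head_size: int) -> None:
        self.num_heads = num_heads
        self.head_size = head_size
        self.settings = ObservabilitySettings.from_config(_BACKEND_CONFIG)
```

Validating the dict at read time, as from_config does, is an addition of this sketch; the PR stores the value as Any and leaves interpretation to the backend.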

How do we know it works?

- 5 new unit tests added to tests/test_attention_backend_registry.py:
  - test_register_backend_config_with_class_path: config via direct registration
  - test_register_backend_config_with_decorator: config via @register_backend decorator
  - test_register_mamba_backend_config: config for MambaAttentionBackendEnum
  - test_backend_config_defaults_to_none: backward compat, no config = None
  - test_backend_config_clear: clear_config() removes stored config
- All existing tests continue to pass unchanged
- ruff check and ruff format --check pass
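
Because the config lives in module-level state, tests of this kind need to reset that state so one test cannot leak config into the next. The sketch below shows the idea for two of the cases listed above, written against a hypothetical toy registry (string keys instead of the real enums) rather than the PR's actual test file.

```python
from typing import Any

# Hypothetical stand-in for the registry's module-level config dict.
_CONFIGS: dict = {}


def register_backend(name: str, class_path: str, backend_config: Any = None) -> None:
    if backend_config is not None:
        _CONFIGS[name] = backend_config


def get_backend_config(name: str) -> Any:
    return _CONFIGS.get(name)


def clear_backend_config(name: str) -> None:
    _CONFIGS.pop(name, None)


def test_backend_config_defaults_to_none() -> None:
    _CONFIGS.clear()  # isolate from state left behind by other tests
    register_backend("CUSTOM", "my.module.Backend")
    assert get_backend_config("CUSTOM") is None


def test_backend_config_clear() -> None:
    _CONFIGS.clear()
    register_backend("CUSTOM", "my.module.Backend", backend_config={"k": "v"})
    assert get_backend_config("CUSTOM") == {"k": "v"}
    clear_backend_config("CUSTOM")
    assert get_backend_config("CUSTOM") is None
```

Both functions follow pytest's naming convention and can also be called directly.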

Changes

- vllm/v1/attention/backends/registry.py: Added backend_config param, config storage dicts, get_config()/clear_config() enum methods, get_backend_config() function
- tests/test_attention_backend_registry.py: 5 new tests covering the config mechanism

Changed files

- tests/test_attention_backend_registry.py (modified, +96/-0)
- vllm/v1/attention/backends/registry.py (modified, +63/-3)


Motivation

We're building glassbox (https://github.com/dmaniloff/glassbox), an observability tool that is implemented as a custom attention backend, and registered via vllm.general_plugins and register_backend(AttentionBackendEnum.CUSTOM, ...).

There's some configuration that we need to pass to the custom backend, but as far as we can tell, there's no mechanism to do so:

- Plugin side: vllm.general_plugins entry points are zero-argument callables, so no config, context, or kwargs are passed in when they're loaded (source: https://github.com/vllm-project/vllm/blob/main/vllm/plugins/__init__.py).
- Backend side: register_backend() maps an enum to a class path string. The backend Impl class is instantiated by vLLM with a fixed constructor signature (num_heads, head_size, etc.) and no user-config parameter (source: https://github.com/vllm-project/vllm/blob/main/vllm/v1/attention/backends/registry.py).
- Subprocess behavior: The plugin docs note that plugins "can be loaded multiple times in different processes" but there's no mechanism to coordinate state or configuration across those loads.

At the moment, we're using class-level attributes as a workaround, but would love to have a more official way to do this.
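
For readers unfamiliar with the workaround, a minimal sketch of the class-level attribute approach might look like the following. The names (ToyObservabilityImpl, plugin_entry_point) are hypothetical and are not taken from glassbox; the point is only to show where the config ends up.

```python
class ToyObservabilityImpl:
    # Workaround: config lives on the class because the constructor
    # signature is fixed and cannot carry user settings.
    config: dict = {"sample_rate": 1.0}

    def __init__(self, num_heads: int, head_size: int) -> None:
        self.num_heads = num_heads
        self.head_size = head_size
        # Read whatever the class attribute holds at construction time.
        self.sample_rate = self.config["sample_rate"]


def plugin_entry_point() -> None:
    """Zero-argument entry point: sets config before any instance is built."""
    ToyObservabilityImpl.config = {"sample_rate": 0.1}
```

The fragility comes from the attribute being ordinary mutable class state: any code that imports the class can overwrite it, nothing documents it as the configuration channel, and it has to be set again in every process that loads the plugin.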

If we're missing an existing mechanism for this, we'd appreciate being pointed in the right direction. Otherwise, we'd love to contribute a solution — for example, allowing register_backend to accept a config or factory callable, or adding a backend_config parameter to the instantiation path.
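
As a rough illustration of the factory-callable idea, the sketch below registers a callable that already carries its configuration, so the instantiation path can keep calling it with the fixed arguments. register_backend_factory, instantiate and ToyImpl are hypothetical names for this example; vLLM's register_backend() is not known to accept a factory.

```python
from functools import partial
from typing import Callable, Dict

# Hypothetical registry mapping a backend name to a factory callable.
_FACTORIES: Dict[str, Callable[..., object]] = {}


class ToyImpl:
    def __init__(self, num_heads: int, head_size: int, *, sample_rate: float = 1.0) -> None:
        self.num_heads = num_heads
        self.head_size = head_size
        self.sample_rate = sample_rate


def register_backend_factory(name: str, factory: Callable[..., object]) -> None:
    _FACTORIES[name] = factory


def instantiate(name: str, num_heads: int, head_size: int) -> object:
    # The framework side still passes only the fixed positional arguments.
    return _FACTORIES[name](num_heads, head_size)


# The plugin binds its config into the factory at registration time.
register_backend_factory("CUSTOM", partial(ToyImpl, sample_rate=0.1))
```

This is a push-based design: config travels with the factory instead of being looked up by the backend, which is the opposite trade-off from the pull-based approach taken in PR #40065.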

Happy to discuss the right approach, and thank you!

extent analysis

TL;DR

Passing configuration to a custom attention backend in vLLM may require modifying the register_backend function to accept a config or factory callable.

Guidance

- Review the vllm.general_plugins entry points to confirm that no configuration or context is passed to the loaded plugins.
- Investigate the register_backend function in vllm/v1/attention/backends/registry.py to determine if there's a way to extend its functionality to accept a config parameter.
- Consider proposing a change to the register_backend function to accept a factory callable that can instantiate the backend with custom configuration.
- Explore the possibility of adding a backend_config parameter to the instantiation path of the backend Impl class.

Example

No code snippet is provided as the issue lacks specific implementation details.

Notes

The current workaround using class-level attributes may not be suitable for production use, and a more official solution is needed. The proposed changes to register_backend or the addition of a backend_config parameter would require discussion and agreement with the vLLM maintainers.

Recommendation

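Until an official mechanism such as the backend_config parameter proposed in PR #40065 is merged, keep using a workaround such as class-level attributes. A factory callable would allow more flexibility and customization of the backend configuration, but register_backend would first need to accept one.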
