vllm - [Feature]: W6A16 Support

vllm · 2026-03-12 19:10:49

GitHub stats

vllm-project/vllm#36916 • Fetched 2026-04-08 00:43:40

Author

frenzybiscuit

Participants

frenzybiscuit

🚀 The feature, motivation and pitch

Please consider adding W6A16 support to vLLM/LLM-Compressor.

I'm aware it may be as slow as W8A16. My priorities are VRAM and accuracy. W4A16 is good, but not accurate enough for me. I am VRAM constrained even with 4 GPUs because I use long contexts.

I have multiple 3090s. Last I checked, I cannot quantize the KV cache on Ampere because vLLM does not support it, which makes the VRAM problem harder to work around.

Alternatives

None

Additional context

N/A

Before submitting a new issue...

Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

Fix Plan

To add W6A16 support (6-bit weights, 16-bit activations) to vLLM/LLM-Compressor, the existing quantization code needs to accommodate the new data type. At a high level:

Update the data type enumeration to include W6A16 alongside W4A16 and W8A16
Modify the quantization function to produce 6-bit values for W6A16
Update the storage that holds quantized values to handle 6-bit W6A16 data
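
The storage step is less obvious than it looks, because 6-bit values do not align to byte boundaries. The sketch below shows one possible layout, assuming unsigned codes in the range 0..63 and packing four codes into every three bytes. The function names `pack_6bit` and `unpack_6bit` are hypothetical and are not part of vLLM or LLM-Compressor; a real kernel would likely choose a layout tuned for GPU memory access.

```python
def pack_6bit(codes):
    """Pack unsigned 6-bit codes (0..63) into bytes, four codes per three bytes."""
    if any(not 0 <= c <= 63 for c in codes):
        raise ValueError("every code must fit in 6 bits (0..63)")
    padded = list(codes) + [0] * (-len(codes) % 4)  # pad to a multiple of four
    out = bytearray()
    for i in range(0, len(padded), 4):
        a, b, c, d = padded[i:i + 4]
        word = (a << 18) | (b << 12) | (c << 6) | d  # one 24-bit group
        out += word.to_bytes(3, "big")
    return bytes(out)


def unpack_6bit(packed, count):
    """Recover `count` 6-bit codes from bytes produced by pack_6bit."""
    codes = []
    for i in range(0, len(packed), 3):
        word = int.from_bytes(packed[i:i + 3], "big")
        codes.extend((word >> shift) & 0x3F for shift in (18, 12, 6, 0))
    return codes[:count]  # drop the zero padding added by pack_6bit


codes = [0, 63, 21, 42, 7]
packed = pack_6bit(codes)
assert unpack_6bit(packed, len(codes)) == codes
print(len(packed))  # 6: two 3-byte groups, the second one padded
```

The caller has to keep the original element count, since the padding is not recorded in the packed bytes.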

Example Code

# Illustrative placeholder only: these names are not vLLM or LLM-Compressor APIs.
from enum import Enum


# Update data type enumeration
class DataType(Enum):
    # The enum value is the weight bit width; activations stay 16-bit in every scheme.
    W4A16 = 4
    W6A16 = 6
    W8A16 = 8


# Modify quantization function
def quantize(data, data_type):
    # Placeholder logic: treat `data` as unsigned 8-bit integer codes and keep
    # only the top N bits (>> 4 for W4A16, >> 2 for W6A16, unchanged for W8A16).
    # A real scheme would use scales, and possibly zero points, instead of truncation.
    if not isinstance(data_type, DataType):
        raise ValueError(f"Unsupported data type: {data_type!r}")
    return data >> (8 - data_type.value)


# Update cache storage
class Cache:
    def __init__(self, data_type):
        self.data_type = data_type
        self.cache = {}

    def store(self, key, value):
        # Storage is the same for every data type, so no branching is needed here.
        self.cache[key] = value
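
For comparison, here is a minimal sketch of what scale-based 6-bit weight quantization could look like, assuming symmetric round-to-nearest quantization with one scale per group of weights. This is a generic technique, not the method used by vLLM or LLM-Compressor, and `quantize_symmetric`, `dequantize`, and the `group_size` default are hypothetical choices made for illustration.

```python
def quantize_symmetric(weights, bits=6, group_size=4):
    """Round-to-nearest symmetric quantization with one scale per group.

    Returns (codes, scales): signed integer codes in [-qmax, qmax] and the
    per-group float scales needed to dequantize them.
    """
    qmax = 2 ** (bits - 1) - 1  # 31 for 6 bits
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid dividing by zero
        scales.append(scale)
        codes.extend(max(-qmax, min(qmax, round(w / scale))) for w in group)
    return codes, scales


def dequantize(codes, scales, group_size=4):
    """Map integer codes back to floats using the scale of each code's group."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


weights = [0.5, -1.0, 0.25, 0.75, 0.02, -0.04, 0.01, 0.03]
codes, scales = quantize_symmetric(weights)
restored = dequantize(codes, scales)

# Each restored weight is within half a quantization step of the original.
for i, (w, r) in enumerate(zip(weights, restored)):
    assert abs(w - r) <= scales[i // 4] / 2 + 1e-12
```

Smaller groups track local weight ranges more closely but store more scales, so the group size trades accuracy against memory.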

Verification

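To verify the change, run the updated code on W6A16 data and compare it against W4A16 and W8A16 baselines. Accuracy would typically be expected to fall between the two, and packed weight memory should be roughly three quarters of W8A16, before counting scales and other metadata.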
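
A rough way to sanity-check both properties offline is sketched below. It assumes symmetric per-tensor round-to-nearest quantization on synthetic Gaussian weights, which is a stand-in for a real model, and the helper names `roundtrip_rms_error` and `packed_weight_bytes` are hypothetical. It measures reconstruction error, not end-task accuracy, so it complements a real evaluation instead of replacing one.

```python
import math
import random


def roundtrip_rms_error(weights, bits):
    """RMS error after symmetric per-tensor quantize/dequantize at `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    restored = [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]
    squared = sum((w - r) ** 2 for w, r in zip(weights, restored))
    return math.sqrt(squared / len(weights))


def packed_weight_bytes(num_weights, bits):
    """Bytes needed for densely packed codes, ignoring scales and metadata."""
    return (num_weights * bits + 7) // 8


rng = random.Random(0)  # fixed seed so runs are repeatable
weights = [rng.gauss(0.0, 1.0) for _ in range(4096)]

for bits in (4, 6, 8):
    err = roundtrip_rms_error(weights, bits)
    size = packed_weight_bytes(len(weights), bits)
    print(f"W{bits}A16: rms_error={err:.5f} packed_bytes={size}")
```

On this synthetic data the error shrinks as the bit width grows, while the packed size grows linearly with it, which is the trade-off the issue author is weighing.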

Extra Tips

Make sure to update the documentation to reflect the new data type support.
Test the updated code thoroughly to ensure it works as expected.
Consider adding support for other data types in the future.
