PR fix notes

PR #37107: [Model] Add HyperCLOVAX-SEED-Think-14B language model support

Repository: vllm-project/vllm
Author: bigshanedogg
State: closed | merged: True
Link: https://github.com/vllm-project/vllm/pull/37107

Description (problem / solution / changelog)

Purpose

Add inference support for HyperCLOVA X (HyperCLOVAXForCausalLM), a large language model family developed by NAVER Cloud.

https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-14B

Changes

vllm/model_executor/models/hyperclovax.py (new) — HyperCLOVAXForCausalLM model implementation
vllm/transformers_utils/configs/hyperclovax.py (new) — HyperCLOVAXConfig configuration class
vllm/model_executor/models/registry.py — Register HyperCLOVAXForCausalLM
vllm/transformers_utils/configs/__init__.py — Register HyperCLOVAXConfig
docs/models/supported_models.md — Add HyperCLOVAXForCausalLM entry
tests/models/registry.py — Add test registry entry (naver-hyperclovax/HyperCLOVAX-SEED-Think-14B)
tests/models/language/generation/test_common.py — Add HyperCLOVAXForCausalLM to common generation tests

Test Plan

Launch server

  vllm serve naver-hyperclovax/HyperCLOVAX-SEED-Think-14B \
    --max-model-len 32768 \
    --max-num-batched-tokens 16384 \
    --tensor-parallel-size 1 \
    --trust-remote-code \
    --enable-prefix-caching

Test Result

Benchmark validation

Tasks      Metric            vLLM (this PR)
hellaswag  acc_norm          0.6521
gsm8k      flexible-extract  0.9484

Evaluated with lm-evaluation-harness defaults and default sampling params for server validation.

Request

client

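import requests

# Assumption: the server started by `vllm serve` above is reachable on its default host and port.
url = "localhost:8000"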

payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please briefly explain what you can help with. Think carefully before answering."},
            ],
        }
    ],
    "temperature": 0.2,
    "skip_special_tokens": False,
    "stop": ["<|im_end|><|endofturn|>", "<|im_end|><|stop|>"],
    "chat_template_kwargs": {"skip_reasoning": True},
}

resp = requests.post(
    f"http://{url}/v1/chat/completions", 
    json=payload, 
    timeout=300,
)
resp.raise_for_status()

data = resp.json()
print(data["choices"][0]["message"].get("content"))

output

Okay, the user is asking me to briefly explain what I can help with. Let me start by recalling my capabilities. I know I can answer questions, provide explanations, assist with learning, help brainstorm ideas, and offer suggestions. But I should make sure not to overstate what I can do.

Wait, I should also mention that I can't access real-time information or perform physical actions. That's important to set the right expectations. Maybe start by listing the main areas: answering questions, explaining concepts, helping with tasks like writing or coding, and offering recommendations. But keep it concise since they asked for a brief explanation.

Hmm, should I include examples? The user might appreciate a quick list of specific areas. Like, "I can help with homework, language translation, coding problems, creative writing, and more." Also, clarify that I rely on existing knowledge up to my last update in July 2024. Oh right, and I can't browse the internet or access personal data unless shared in the conversation. Privacy is a key point here.

Wait, the user said "think carefully before answering," so maybe I should structure it clearly. Start with a general statement about assisting with information and tasks, then list key areas, mention limitations, and ensure it's all in a few short sentences. Let me check if I missed anything. Oh, yes, I should avoid jargon and keep it simple. Alright, time to put it all together concisely.<|im_end|>
<|im_start|>assistant
I can assist with providing information, explanations, and guidance across a wide range of topics, including:  
- **Answering questions** (science, history, technology, etc.).  
- **Explaining concepts** (math, programming, philosophy, etc.).  
- **Helping with tasks** (writing, editing, coding, problem-solving).  
- **Offering recommendations** (books, learning resources, strategies).  
- **Brainstorming ideas** (creative projects, studies, discussions).  

**Limitations**: I cannot access real-time data, perform physical actions, or retrieve personal information unless shared during our conversation. My knowledge is current up to July 2024. Let me know how I can assist! 😊
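
Because the request sets skip_special_tokens to False, the sample output above carries the reasoning trace and the final answer in a single string, separated by the `<|im_end|>` and `<|im_start|>assistant` turn markers. A minimal sketch of separating the two on the client side, assuming the marker sequence appears exactly as in the sample (the helper name `split_reasoning` is hypothetical and not part of vLLM or the model's tooling):

```python
# Marker sequence copied from the sample output; assumed to be stable across responses.
TURN_BREAK = "<|im_end|>\n<|im_start|>assistant\n"


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty when the marker is absent."""
    reasoning, sep, answer = text.partition(TURN_BREAK)
    if not sep:
        # No turn break found: treat the whole string as the answer.
        return "", text.strip()
    return reasoning.strip(), answer.strip()


sample = "Okay, let me think.<|im_end|>\n<|im_start|>assistant\nI can help with many topics."
reasoning, answer = split_reasoning(sample)
print(answer)
```

Splitting on the first marker only keeps any later occurrences inside the answer, which is the safer default if the answer itself quotes the markers.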

Changed files

docs/models/supported_models.md (modified, +1/-0)
tests/models/language/generation/test_common.py (modified, +4/-0)
tests/models/registry.py (modified, +1/-1)
vllm/model_executor/models/hyperclovax.py (added, +551/-0)
vllm/model_executor/models/registry.py (modified, +1/-1)
vllm/transformers_utils/configs/__init__.py (modified, +2/-0)
vllm/transformers_utils/configs/hyperclovax.py (added, +277/-0)

It would be great to add native support for HyperCLOVA X SEED Think 14B to the Transformers library, so users can load it without trust_remote_code=True. In addition, this model is intended to serve as the backbone for future multimodal models to be released on the Hugging Face Hub. Without native Transformers support, every new model variant must bundle its own copy of modeling_hyperclovax.py, which duplicates code and increases the maintenance burden.

Model description

HyperCLOVA X SEED Think 14B is a 14.74B-parameter reasoning LLM developed by NAVER Cloud. It is a LLaMA-style decoder-only transformer with two architectural modifications not present in standard LLaMA:

Peri-Layer Normalization: an extra RMSNorm is applied after each sub-layer output (in addition to the standard pre-norm), controlled by a use_post_norm config flag.
Maximal Update Parametrization (μP): per-config scaling factors (attention_multiplier, residual_multiplier, embedding_multiplier, logits_scaling) replace the standard fixed scaling, enabling stable training across model sizes.
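
The peri-layer normalization described above can be sketched in plain Python. This is an illustration on toy vectors, not the model's implementation: the function names are hypothetical, and the point at which residual_multiplier is applied (scaling the branch just before the residual add) is an assumption rather than something stated in these notes.

```python
import math
from typing import Callable, List

Vector = List[float]


def rms_norm(x: Vector, weight: Vector, eps: float = 1e-6) -> Vector:
    """RMSNorm: divide x by its root mean square, then scale by a learned weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * inv_rms for w, v in zip(weight, x)]


def peri_ln_sublayer(
    x: Vector,
    sublayer: Callable[[Vector], Vector],
    pre_weight: Vector,
    post_weight: Vector,
    use_post_norm: bool = True,
    residual_multiplier: float = 1.0,
) -> Vector:
    h = sublayer(rms_norm(x, pre_weight))  # standard pre-norm
    if use_post_norm:
        h = rms_norm(h, post_weight)  # extra norm on the sub-layer output
    # Assumption: the multiplier scales the branch before it is added to the residual.
    return [xi + residual_multiplier * hi for xi, hi in zip(x, h)]


def double(v: Vector) -> Vector:
    """Toy stand-in for attention or the MLP."""
    return [2.0 * t for t in v]


x = [3.0, 4.0]
ones = [1.0, 1.0]
with_post = peri_ln_sublayer(x, double, ones, ones, use_post_norm=True)
without_post = peri_ln_sublayer(x, double, ones, ones, use_post_norm=False)
print(with_post, without_post)
```

With unit weights the post-norm rescales the doubled branch back to unit RMS, so the two calls differ only in the magnitude of what is added to the residual stream.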

The model supports dual-mode reasoning: Think (chain-of-thought before answering) and Non-Think (direct answer), switchable via apply_chat_template(force_reasoning=True/False). It also supports function calling via a custom ChatML dialect. The model is supported in vLLM as of March 2026.
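
Since the mode is selected through a chat-template argument, an OpenAI-compatible client can set it per request. A sketch, assuming the server forwards chat_template_kwargs to the model's chat template as in the client example earlier in these notes, and that force_reasoning is accepted there the same way it is by apply_chat_template; `build_chat_payload` is a hypothetical helper:

```python
from typing import Any, Dict


def build_chat_payload(prompt: str, think: bool, temperature: float = 0.2) -> Dict[str, Any]:
    """Build a /v1/chat/completions request body for Think or Non-Think mode."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        # Assumption: the kwarg name matches the apply_chat_template argument.
        "chat_template_kwargs": {"force_reasoning": think},
    }


think_payload = build_chat_payload("What is 17 * 23?", think=True)
direct_payload = build_chat_payload("What is 17 * 23?", think=False)
print(think_payload["chat_template_kwargs"], direct_payload["chat_template_kwargs"])
```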

I checked that no existing PR covers this. I have also prepared a draft PR (#44956) in case it is helpful for the discussion or review.

Open source status

The model implementation is available
The model weights are available

Provide useful links for the implementation

Huggingface hub: naver-hyperclovax/HyperCLOVAX-SEED-Think-14B
Technical report: arXiv 2506.22403
vLLM upstream: vllm-project/vllm#37107 (merged 2026-03-16)

extent analysis

Fix Plan

To add native support for HyperCLOVA X SEED Think 14B to the Transformers library, follow these steps:

Create a new file models/hyperclovax/modeling_hyperclovax.py with the base model. The code below is a skeleton only; the forward pass is left unimplemented:

from transformers import PreTrainedModel

from .configuration_hyperclovax import HyperCLOVAXConfig


class HyperCLOVAXModel(PreTrainedModel):
    config_class = HyperCLOVAXConfig

    def __init__(self, config):
        super().__init__(config)
        self.use_post_norm = config.use_post_norm
        # ... other config parameters ...

    def forward(self, input_ids, attention_mask=None, **kwargs):
        # ... implement forward pass with peri-layer normalization and maximal update parametrization ...
        # When self.use_post_norm is set, apply the extra RMSNorm to each sub-layer output.
        raise NotImplementedError("forward pass is not implemented in this skeleton")

Create models/hyperclovax/configuration_hyperclovax.py with the model configuration:

from transformers import PretrainedConfig


class HyperCLOVAXConfig(PretrainedConfig):
    model_type = "hyperclovax"

    def __init__(
        self,
        use_post_norm=False,
        attention_multiplier=1.0,
        residual_multiplier=1.0,
        embedding_multiplier=1.0,
        logits_scaling=1.0,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.use_post_norm = use_post_norm
        self.attention_multiplier = attention_multiplier
        self.residual_multiplier = residual_multiplier
        self.embedding_multiplier = embedding_multiplier
        self.logits_scaling = logits_scaling

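Add the causal-LM class. By Transformers convention it sits in models/hyperclovax/modeling_hyperclovax.py next to the base model, and models/hyperclovax/__init__.py only re-exports the public names:

# models/hyperclovax/modeling_hyperclovax.py
class HyperCLOVAXForCausalLM(HyperCLOVAXModel):
    def __init__(self, config):
        super().__init__(config)
        # ... implement causal language modeling head ...

# models/hyperclovax/__init__.py
from .configuration_hyperclovax import HyperCLOVAXConfig
from .modeling_hyperclovax import HyperCLOVAXForCausalLM, HyperCLOVAXModel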

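Register the model in the auto-class mappings in models/auto/modeling_auto.py, which map each model_type string to a class name:

MODEL_FOR_CAUSAL_LM_MAPPING_NAMES = OrderedDict(
    [
        # ... other models ...
        ("hyperclovax", "HyperCLOVAXForCausalLM"),
    ]
)

The configuration needs a matching ("hyperclovax", "HyperCLOVAXConfig") entry in CONFIG_MAPPING_NAMES in models/auto/configuration_auto.py.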

Verification

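To verify that the fix worked, run the following code:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B"
# No trust_remote_code=True: loading should succeed through the native implementation.
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

input_ids = tokenizer("Hello, world!", return_tensors="pt").input_ids
# max_new_tokens=32 is an arbitrary illustrative value.
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))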

transformers - ✅(Solved) Fix Add HyperCLOVA X SEED Think 14B [1 pull requests, 11 comments, 3 participants]
