transformers - ✅ (Solved) Fix: Add HyperCLOVA X SEED Think 14B [1 pull request, 11 comments, 3 participants]

GitHub stats
huggingface/transformers#44957 (fetched 2026-04-08 01:21:32)
Comments: 11 · Participants: 3 · Timeline: 40 · Reactions: 0
Timeline (top): subscribed ×13, mentioned ×12, commented ×11, cross-referenced ×2

PR fix notes

vLLM PR #37107: [Model] Add HyperCLOVAX-SEED-Think-14B language model support

Description (problem / solution / changelog)

Purpose

Add inference support for HyperCLOVA X (HyperCLOVAXForCausalLM), a large language model family developed by NAVER Cloud.

Changes

  • vllm/model_executor/models/hyperclovax.py (new) — HyperCLOVAXForCausalLM model implementation
  • vllm/transformers_utils/configs/hyperclovax.py (new) — HyperCLOVAXConfig configuration class
  • vllm/model_executor/models/registry.py — Register HyperCLOVAXForCausalLM
  • vllm/transformers_utils/configs/__init__.py — Register HyperCLOVAXConfig
  • docs/models/supported_models.md — Add HyperCLOVAXForCausalLM entry
  • tests/models/registry.py — Add test registry entry (naver-hyperclovax/HyperCLOVAX-SEED-Think-14B)
  • tests/models/language/generation/test_common.py — Add HyperCLOVAXForCausalLM to common generation tests

Test Plan

Launch server

  vllm serve naver-hyperclovax/HyperCLOVAX-SEED-Think-14B \
    --max-model-len 32768 \
    --max-num-batched-tokens 16384 \
    --tensor-parallel-size 1 \
    --trust-remote-code \
    --enable-prefix-caching

Test Result

Benchmark validation

Task | Metric | vLLM (this PR)
--- | --- | ---
hellaswag | acc_norm | 0.6521
gsm8k | flexible-extract | 0.9484

Evaluated with lm-evaluation-harness defaults and default sampling params for server validation.

Request

client

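import requests

# Assumed address of the `vllm serve` instance above (vLLM listens on port 8000 by default).
url = "localhost:8000"

payload = {
    "model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please briefly explain what you can help with. Think carefully before answering."},
            ],
        }
    ],
    "temperature": 0.2,
    # Keep special tokens in the returned text (they are visible in the output below).
    "skip_special_tokens": False,
    # Stop at the model's end-of-turn markers.
    "stop": ["<|im_end|><|endofturn|>", "<|im_end|><|stop|>"],
    # Extra variables forwarded to the chat template; their effect is defined by the template.
    "chat_template_kwargs": {"skip_reasoning": True},
}

resp = requests.post(
    f"http://{url}/v1/chat/completions",
    json=payload,
    timeout=300,
)
resp.raise_for_status()

data = resp.json()
print(data["choices"][0]["message"].get("content"))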

output

Okay, the user is asking me to briefly explain what I can help with. Let me start by recalling my capabilities. I know I can answer questions, provide explanations, assist with learning, help brainstorm ideas, and offer suggestions. But I should make sure not to overstate what I can do.

Wait, I should also mention that I can't access real-time information or perform physical actions. That's important to set the right expectations. Maybe start by listing the main areas: answering questions, explaining concepts, helping with tasks like writing or coding, and offering recommendations. But keep it concise since they asked for a brief explanation.

Hmm, should I include examples? The user might appreciate a quick list of specific areas. Like, "I can help with homework, language translation, coding problems, creative writing, and more." Also, clarify that I rely on existing knowledge up to my last update in July 2024. Oh right, and I can't browse the internet or access personal data unless shared in the conversation. Privacy is a key point here.

Wait, the user said "think carefully before answering," so maybe I should structure it clearly. Start with a general statement about assisting with information and tasks, then list key areas, mention limitations, and ensure it's all in a few short sentences. Let me check if I missed anything. Oh, yes, I should avoid jargon and keep it simple. Alright, time to put it all together concisely.<|im_end|>
<|im_start|>assistant
I can assist with providing information, explanations, and guidance across a wide range of topics, including:  
- **Answering questions** (science, history, technology, etc.).  
- **Explaining concepts** (math, programming, philosophy, etc.).  
- **Helping with tasks** (writing, editing, coding, problem-solving).  
- **Offering recommendations** (books, learning resources, strategies).  
- **Brainstorming ideas** (creative projects, studies, discussions).  

**Limitations**: I cannot access real-time data, perform physical actions, or retrieve personal information unless shared during our conversation. My knowledge is current up to July 2024. Let me know how I can assist! 😊
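The sample output above shows how a Think-mode completion is laid out when special tokens are kept: the reasoning trace ends with `<|im_end|>`, and the final answer follows in a new `<|im_start|>assistant` turn. A minimal sketch for separating the two, assuming the boundary is exactly the one seen in this single sample (the helper name `split_reasoning` is hypothetical and not part of vLLM or the model's tooling):

```python
# Boundary observed in the sample output above. This is an assumption based on
# one response; other template versions or settings may produce a different layout.
ANSWER_BOUNDARY = "<|im_end|>\n<|im_start|>assistant\n"


def split_reasoning(text: str) -> tuple[str, str]:
    """Split a Think-mode completion into (reasoning, answer).

    If the boundary is absent (e.g. a Non-Think completion), the whole
    text is returned as the answer with an empty reasoning part.
    """
    reasoning, sep, answer = text.partition(ANSWER_BOUNDARY)
    if not sep:
        return "", text.strip()
    return reasoning.strip(), answer.strip()


if __name__ == "__main__":
    sample = (
        "Okay, the user is asking me to briefly explain what I can help with."
        "<|im_end|>\n<|im_start|>assistant\n"
        "I can assist with providing information."
    )
    reasoning, answer = split_reasoning(sample)
    print("REASONING:", reasoning)
    print("ANSWER:", answer)
```

In the client above, this would be applied to `data["choices"][0]["message"].get("content")`.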

Changed files

  • docs/models/supported_models.md (modified, +1/-0)
  • tests/models/language/generation/test_common.py (modified, +4/-0)
  • tests/models/registry.py (modified, +1/-1)
  • vllm/model_executor/models/hyperclovax.py (added, +551/-0)
  • vllm/model_executor/models/registry.py (modified, +1/-1)
  • vllm/transformers_utils/configs/__init__.py (modified, +2/-0)
  • vllm/transformers_utils/configs/hyperclovax.py (added, +277/-0)
Original issue

It would be great to add native support for HyperCLOVA X SEED Think 14B to the Transformers library, so users can load it without trust_remote_code=True. In addition, this model is intended to serve as the backbone for future multimodal models to be released on the Hugging Face Hub. Without native Transformers support, every new model variant must bundle its own copy of modeling_hyperclovax.py, leading to code duplication and increased maintenance burden.

Model description

HyperCLOVA X SEED Think 14B is a 14.74B-parameter reasoning LLM developed by NAVER Cloud. It is a LLaMA-style decoder-only transformer with two architectural modifications not present in standard LLaMA:

  • Peri-Layer Normalization: an extra RMSNorm is applied after each sub-layer output (in addition to the standard pre-norm), controlled by a use_post_norm config flag.
  • Maximal Update Parametrization (μP): per-config scaling factors (attention_multiplier, residual_multiplier, embedding_multiplier, logits_scaling) replace the standard fixed scaling, enabling stable training across model sizes.
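The two modifications can be illustrated with a minimal NumPy sketch of one residual sub-layer. It assumes the μP multipliers are applied the way Transformers' Granite model applies fields with the same names (embeddings multiplied by `embedding_multiplier`, attention scores scaled by `attention_multiplier` instead of `1/sqrt(head_dim)`, sub-layer outputs multiplied by `residual_multiplier` before the residual add, logits divided by `logits_scaling`), and that the Peri-LN post-norm is applied to the sub-layer output before that scaling. All function names are hypothetical, RMSNorm weights are omitted, and the released modeling_hyperclovax.py remains the authority on the exact order.

```python
from typing import Callable, Optional

import numpy as np


def rms_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """RMSNorm over the last axis, with the learned weight fixed to 1 for brevity."""
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)


def scaled_attention_scores(
    q: np.ndarray, k: np.ndarray, attention_multiplier: Optional[float] = None
) -> np.ndarray:
    """Attention logits; under the muP assumption the multiplier replaces 1/sqrt(head_dim)."""
    head_dim = q.shape[-1]
    scale = attention_multiplier if attention_multiplier is not None else head_dim ** -0.5
    return (q @ k.T) * scale


def residual_sublayer(
    x: np.ndarray,
    sublayer: Callable[[np.ndarray], np.ndarray],
    use_post_norm: bool,
    residual_multiplier: float,
) -> np.ndarray:
    """pre-norm -> sub-layer -> optional post-norm (Peri-LN) -> scale -> residual add."""
    h = sublayer(rms_norm(x))      # standard pre-norm, as in LLaMA
    if use_post_norm:
        h = rms_norm(h)            # extra RMSNorm on the sub-layer output
    return x + residual_multiplier * h


def output_logits(hidden: np.ndarray, lm_head_weight: np.ndarray, logits_scaling: float) -> np.ndarray:
    """Final projection; logits are divided by logits_scaling (assumed Granite-style)."""
    return (hidden @ lm_head_weight.T) / logits_scaling


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, hidden_size, vocab_size = 4, 8, 16
    embedding_multiplier = 2.0  # illustrative value only
    # Stand-in for embed_tokens(input_ids) * embedding_multiplier.
    x = rng.standard_normal((seq_len, hidden_size)) * embedding_multiplier
    w_mlp = rng.standard_normal((hidden_size, hidden_size))

    def toy_sublayer(h: np.ndarray) -> np.ndarray:
        # Stand-in for the attention or MLP block.
        return np.tanh(h @ w_mlp)

    y = residual_sublayer(x, toy_sublayer, use_post_norm=True, residual_multiplier=0.5)
    lm_head = rng.standard_normal((vocab_size, hidden_size))
    logits = output_logits(y, lm_head, logits_scaling=4.0)
    print(logits.shape)  # (4, 16)
```

Setting `use_post_norm=False` reduces `residual_sublayer` to the standard LLaMA pre-norm residual, which is why the flag alone is enough to switch between the two layouts.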

The model supports dual-mode reasoning: Think (chain-of-thought before answering) and Non-Think (direct answer), switchable via apply_chat_template(force_reasoning=True/False). It also supports function calling via a custom ChatML dialect. The model is supported in vLLM as of March 2026.
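Because `apply_chat_template` forwards keyword arguments it does not recognize to the Jinja chat template, switching modes should come down to passing `force_reasoning` per prompt. A minimal sketch, assuming the template shipped in the model repository reads `force_reasoning` as described above (the helper `build_prompt` is hypothetical):

```python
from typing import Any


def build_prompt(tokenizer: Any, user_message: str, think: bool) -> str:
    """Render a chat prompt in Think (think=True) or Non-Think (think=False) mode.

    `force_reasoning` is not a Transformers parameter; apply_chat_template
    passes it through to the model's chat template, which is assumed to use it.
    """
    messages = [{"role": "user", "content": user_message}]
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,              # return the rendered string, not token ids
        add_generation_prompt=True,  # open the assistant turn
        force_reasoning=think,
    )


# Usage (downloads the tokenizer from the Hub):
# from transformers import AutoTokenizer
# tok = AutoTokenizer.from_pretrained("naver-hyperclovax/HyperCLOVAX-SEED-Think-14B")
# print(build_prompt(tok, "What is 2 + 2?", think=False))
```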

I checked that no existing PR covers this. I have also prepared a draft PR (#44956) in case it is helpful for the discussion or review.

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

extent analysis

Fix Plan

To add native support for HyperCLOVA X SEED Think 14B to the Transformers library, follow the steps below. The snippets are skeletons to be filled in, not a complete implementation:

  • Create src/transformers/models/hyperclovax/modeling_hyperclovax.py with the base model:
from transformers import PreTrainedModel

from .configuration_hyperclovax import HyperCLOVAXConfig


class HyperCLOVAXModel(PreTrainedModel):
    config_class = HyperCLOVAXConfig

    def __init__(self, config):
        super().__init__(config)
        self.use_post_norm = config.use_post_norm
        self.embedding_multiplier = config.embedding_multiplier
        self.residual_multiplier = config.residual_multiplier
        # ... embed_tokens, decoder layers, final RMSNorm ...
        self.post_init()

    def forward(self, input_ids, attention_mask=None, **kwargs):
        # ... embed input_ids and scale by embedding_multiplier ...
        # ... in each decoder layer (order assumed): pre-norm -> attention/MLP
        #     -> post-norm if use_post_norm -> scale by residual_multiplier -> residual add ...
        raise NotImplementedError("skeleton only")
  • Create src/transformers/models/hyperclovax/configuration_hyperclovax.py with the model configuration:
from transformers import PretrainedConfig


class HyperCLOVAXConfig(PretrainedConfig):
    model_type = "hyperclovax"

    def __init__(
        self,
        use_post_norm=False,        # Peri-LN: extra RMSNorm on each sub-layer output
        attention_multiplier=1.0,   # μP attention-score scale
        residual_multiplier=1.0,    # μP residual-branch scale
        embedding_multiplier=1.0,   # μP input-embedding scale
        logits_scaling=1.0,         # μP output-logit scale
        # ... standard LLaMA fields (vocab_size, hidden_size, num_hidden_layers, ...) ...
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.use_post_norm = use_post_norm
        self.attention_multiplier = attention_multiplier
        self.residual_multiplier = residual_multiplier
        self.embedding_multiplier = embedding_multiplier
        self.logits_scaling = logits_scaling
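  • Define HyperCLOVAXForCausalLM in the same modeling_hyperclovax.py (the model's __init__.py only re-exports the configuration and modeling classes):
from torch import nn
from transformers import GenerationMixin, PreTrainedModel


class HyperCLOVAXForCausalLM(PreTrainedModel, GenerationMixin):
    config_class = HyperCLOVAXConfig

    def __init__(self, config):
        super().__init__(config)
        self.model = HyperCLOVAXModel(config)
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
        self.post_init()
        # forward(): logits = self.lm_head(hidden_states) / self.config.logits_scaling (μP)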
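  • Register the model type with the Auto classes. In Transformers this is done with class-name strings in src/transformers/models/auto/configuration_auto.py and src/transformers/models/auto/modeling_auto.py, not in modeling_utils.py:
# configuration_auto.py, CONFIG_MAPPING_NAMES
("hyperclovax", "HyperCLOVAXConfig"),
# modeling_auto.py, MODEL_MAPPING_NAMES
("hyperclovax", "HyperCLOVAXModel"),
# modeling_auto.py, MODEL_FOR_CAUSAL_LM_MAPPING_NAMES
("hyperclovax", "HyperCLOVAXForCausalLM"),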

Verification

To verify that the fix worked, run the following code:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B"
# With native support, this should load without trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

input_ids = tokenizer("Hello, world!", return_tensors="pt").input_ids
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
