ollama: gemma4:26b (MoE) returns completely empty response (no content, no reasoning) and stops early on long system prompts [1 comment, 2 participants]

ollama · 2026-04-08 20:12:44

GitHub stats

ollama/ollama#15428 • Fetched 2026-04-09 07:51:18

Author: cenovioj-lifeline

Participants: cenovioj-lifeline, rick-github

Timeline (top): subscribed ×2, commented ×1

gemma4:26b (MoE) returns completely empty response (no content, no reasoning) and stops early on long system prompts

What is the issue?

When using the Gemma 4 MoE model (gemma4:26b) via the native /api/chat endpoint, the model returns a completely empty response (empty content, missing reasoning, and done_reason: "stop") if the system prompt exceeds roughly 500 characters.

This is not the known issue with the OpenAI /v1/chat/completions endpoint where the output is moved to the reasoning field (as reported in #15288). In this case, the model evaluates ~49 tokens and stops immediately, producing absolutely no output.

If the system prompt is short (e.g., < 200 chars), the model works correctly and produces output. The same long prompt works perfectly on Dense models like gemma4:31b and gemma-d-cc:latest.
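The report places the failure somewhere between roughly 200 and 500 characters. A minimal sketch for narrowing that range, assuming a hypothetical helper named make_filler (not part of Ollama) that builds a filler prompt of an exact length:

```shell
# Hypothetical helper: print a filler string of exactly N characters.
# The body runs in a subshell, so its variables stay local.
make_filler() (
  n=$1
  unit="System instruction. "   # 20-character unit from the reproduction
  out=""
  # Repeat the unit until at least n characters are available.
  while [ "${#out}" -lt "$n" ]; do
    out="$out$unit"
  done
  # Trim to exactly n characters.
  printf "%.${n}s" "$out"
)

# Example sweep (lengths are illustrative, not from the report):
# for n in 200 300 400 500 1000 2000; do
#   prompt=$(make_filler "$n")
#   echo "length=${#prompt}"   # send $prompt as the system message here
# done
```

Sending each generated prompt as the system message and recording whether content comes back empty would show where the behavior changes on a given machine.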

Steps to reproduce

Create a long system prompt (e.g., 2000+ characters). You can use any filler text or a complex system prompt.

# Create a 2000-character dummy system prompt
SYS_PROMPT=$(printf "System instruction. %.0s" {1..100})

# Send request to native API
curl -s http://localhost:11434/api/chat -d '{
  "model": "gemma4:26b",
  "messages": [
    {"role": "system", "content": "'"$SYS_PROMPT"'"},
    {"role": "user", "content": "Who won the 2025 NCAA mens basketball championship?"}
  ],
  "stream": false
}' | jq .

Result:

{
  "model": "gemma4:26b",
  "created_at": "2026-04-08T19:59:20.718104Z",
  "message": {
    "role": "assistant",
    "content": ""
  },
  "done": true,
  "done_reason": "stop",
  "total_duration": 1265782584,
  "load_duration": 137602292,
  "prompt_eval_count": 1423,
  "prompt_eval_duration": 192706834,
  "eval_count": 49,
  "eval_duration": 911844667
}

Notice that eval_count is 49, yet content is "" and the message carries no reasoning field.
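The symptom above (tokens evaluated, nothing returned) can be checked mechanically. This is a sketch, assuming a hypothetical helper named classify_response and that jq is installed, as it already is for the reproduction:

```shell
# Hypothetical helper: classify an /api/chat JSON response read from stdin.
# Prints "empty" when eval_count is above zero but message.content is empty,
# and "ok" otherwise.
classify_response() {
  jq -r 'if (.message.content // "") == "" and (.eval_count // 0) > 0
         then "empty" else "ok" end'
}

# Usage (assumes a local Ollama server and a request body in payload.json):
# curl -s http://localhost:11434/api/chat -d @payload.json | classify_response
```

For the result shown above, the helper would print "empty", since eval_count is 49 and content is "".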

What I have ruled out

Modelfile/Parameters: I tested identical blobs with varied Modelfiles (removing RENDERER, PARSER, tweaking num_ctx, temperature, top_k, etc.). The bug persists across all variants.
Ollama Version: Reproduced on 0.20.0 and 0.20.3.
Prompt Content: The bug triggers even with plain ASCII filler (Lorem Ipsum) once the prompt reaches the length threshold.
Dense Models: gemma4:31b and gemma4:e4b handle the exact same prompt correctly. This points to the MoE architecture or its specific quantization in the library.
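To repeat the model comparison with an arbitrary system prompt, the request body can be built with jq rather than shell string interpolation, so quotes and newlines in the prompt are JSON-escaped. A sketch, assuming a hypothetical helper named build_payload:

```shell
# Hypothetical helper: build the /api/chat request body.
# $1 = model name, $2 = system prompt, $3 = user message
build_payload() {
  jq -n --arg model "$1" --arg sys "$2" --arg user "$3" '{
    model: $model,
    messages: [
      {role: "system", content: $sys},
      {role: "user", content: $user}
    ],
    stream: false
  }'
}

# Example comparison across two models named in the report
# (assumes a local Ollama server and SYS_PROMPT set as in the reproduction):
# for m in gemma4:26b gemma4:31b; do
#   build_payload "$m" "$SYS_PROMPT" "Who won the 2025 NCAA mens basketball championship?" |
#     curl -s http://localhost:11434/api/chat -d @- |
#     jq '{model, eval_count, content: .message.content}'
# done
```

Printing only model, eval_count, and content keeps the per-model output short enough to compare side by side.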

Environment

Ollama version: 0.20.3
Model: gemma4:26b (SHA: 7121486771cbfe218851513210c40b35dbdee93ab1ef43fe36283c883980f0df)
OS: macOS 15 (Apple Silicon Mac Studio)
Configuration: flash_attention: false, num_parallel: 1

extent analysis

TL;DR

Per the report, the issue can be avoided by keeping the system prompt short (under about 200 characters worked; failures began at roughly 500) or by using a dense model such as gemma4:31b. Modelfile adjustments did not help.

Guidance

Confirm that the failure is specific to the MoE model by sending the same request to a dense model such as gemma4:31b, which the reporter found handles the prompt correctly.
Do not expect Modelfile changes to help: the reporter already varied num_ctx, temperature, and top_k, and the empty response persisted in every variant.
Consider condensing the system prompt while preserving essential instructions; prompts under about 200 characters were reported to work.
When reporting further data, include eval_count and done_reason from the response. Here eval_count is 49 while content is empty, so tokens were generated but none appeared in the output.
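As a stopgap while the bug is open, a client could warn before sending a system prompt that is likely to trigger the empty response. This is a sketch, assuming a hypothetical helper named prompt_length_status; the 500-character default is the approximate threshold from the report, not a documented limit:

```shell
# Hypothetical guard: report whether a system prompt exceeds a length limit.
# $1 = system prompt, $2 = optional limit in characters (default 500)
# The body runs in a subshell, so its variables stay local.
prompt_length_status() (
  prompt=$1
  limit=${2:-500}
  if [ "${#prompt}" -gt "$limit" ]; then
    echo "over ${#prompt}"
  else
    echo "within ${#prompt}"
  fi
)

# Usage:
# prompt_length_status "$SYS_PROMPT"        # uses the 500-character default
# prompt_length_status "$SYS_PROMPT" 200    # stricter, matches the known-good size
```

The guard only reports length; deciding how to shorten the prompt without losing instructions is left to the caller.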

Example

The reproduction under "Steps to reproduce" is the relevant example: the failure depends on the model and the system prompt length, not on client code.

Notes

The issue appears to be isolated to the gemma4:26b model. Because dense models such as gemma4:31b handle the same long prompt correctly and Modelfile changes made no difference, the reporter attributes the problem to the MoE architecture or to its specific quantization in the library. The root cause is not confirmed in the report.

Recommendation

Apply workaround: Shorten the system prompt, or use a dense model such as gemma4:31b, until the root cause is identified. Adjusting the model configuration did not help in the reporter's tests.
