ollama: Bug? temperature=0 produces different output on first run vs subsequent runs



What is the issue?

When running the same prompt with temperature=0 (greedy decoding), the first inference call after model load produces a different response compared to all subsequent calls. From the second call onward the output is fully stable and identical. I'm not sure if this is intentional or a bug - asking before submitting a fix.


To reproduce

Run the same prompt 3+ times with temperature=0:

import json
import urllib.request

def query(prompt, url="http://localhost:11434/api/generate"):
    # temperature=0 requests greedy decoding; stream=False returns a single JSON object
    data = json.dumps({
        "model": "qwen2.5:7b",
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0}
    }).encode()
    # Supplying data makes urllib send a POST request
    req = urllib.request.Request(url, data=data)
    return json.loads(urllib.request.urlopen(req).read())

prompt = "What is a logit in machine learning?"
# Send the identical prompt three times and print the start of each response
for i in range(3):
    r = query(prompt)
    print(f"Epoch {i+1}: {r['response'][:80]}")
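To compare full responses rather than the first 80 characters, each response can be fingerprinted and the odd one out reported. This is a minimal sketch: `fingerprint` and `summarize_runs` are hypothetical helper names, not part of Ollama, and the demo strings stand in for the `r["response"]` values returned by `query()`.

```python
import hashlib
from collections import Counter


def fingerprint(text):
    """Short SHA-256 digest of a response, for compact comparison."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def summarize_runs(responses):
    """Return (fingerprints, outliers) for a list of response strings.

    outliers holds the 0-based indices of runs whose text differs from
    the most common response.
    """
    prints = [fingerprint(r) for r in responses]
    most_common, _ = Counter(prints).most_common(1)[0]
    outliers = [i for i, p in enumerate(prints) if p != most_common]
    return prints, outliers


if __name__ == "__main__":
    # Stand-in strings; in practice collect r["response"] from query().
    demo = ["answer A", "answer B", "answer B"]
    prints, outliers = summarize_runs(demo)
    for i, p in enumerate(prints):
        print(f"Epoch {i+1}: {p}")
    print("Diverging epochs:", [i + 1 for i in outliers])
```

With more than three epochs, this would also show whether only the first call after model load is affected or whether later calls occasionally diverge too.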

Observed responses

All three epochs use model: qwen2.5:7b, temperature: 0, top_k: 5, same prompt.

Epoch 1 (diverges):

"In the context of machine learning and statistics, particularly within logistic regression models, a logit (or log-odds) is a transformation that converts probabilities into values on the entire real number line..."

Epoch 2 (stable from here):

"In the context of machine learning and statistics, a logit function plays a crucial role, particularly in models that deal with binary classification problems. The term "logit" comes from the word "logistic," referring to the logistic function..."

Epoch 3 (identical to epoch 2):

"In the context of machine learning and statistics, a logit function plays a crucial role, particularly in models that deal with binary classification problems. The term "logit" comes from the word "logistic," referring to the logistic function..."

Epochs 2 and 3 are byte-for-byte identical. Epoch 1 diverges both in content and structure — it focuses on the logit as a log-odds transformation, while epochs 2/3 lead with the logistic/sigmoid function.
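The point where the responses split can be located mechanically. The sketch below uses a hypothetical helper, `first_divergence`, on the opening words of the epoch 1 and epoch 2 responses quoted above; it compares whitespace-separated words, not model tokens, so the index does not correspond to token positions.

```python
def first_divergence(a, b):
    """Index of the first differing word between two texts, or None if equal."""
    words_a, words_b = a.split(), b.split()
    for i, (wa, wb) in enumerate(zip(words_a, words_b)):
        if wa != wb:
            return i
    # One text is a strict prefix of the other
    if len(words_a) != len(words_b):
        return min(len(words_a), len(words_b))
    return None


# Opening words of the responses quoted in this issue
epoch1 = "In the context of machine learning and statistics, particularly within logistic regression models,"
epoch2 = "In the context of machine learning and statistics, a logit function plays a crucial role,"

idx = first_divergence(epoch1, epoch2)
shared = " ".join(epoch1.split()[:idx])
print(f"Shared prefix ({idx} words): {shared}")
print("Epoch 1 continues:", epoch1.split()[idx])
print("Epoch 2 continues:", epoch2.split()[idx])
```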


Evidence from logprobs

The divergence is also visible in top_logprobs. Even the raw logprob values differ between epoch 1 and epochs 2/3:

position_2 (3rd token):
  Epoch 1:  top_1=" machine" [-0.046964]   chosen=" the" (rank 2)
  Epoch 2:  top_1=" machine" [-0.045541]   chosen=" the" (rank 2)
  Epoch 3:  top_1=" machine" [-0.045541]   ← identical to epoch 2

position_11 (12th token):
  Epoch 1:  top_1=" in"  [-0.200212]   chosen=" **" (rank 2)
  Epoch 2:  top_1=" log" [-0.118924]   chosen=" **" (rank 2)
  Epoch 3:  top_1=" log" [-0.118924]   ← identical to epoch 2

The logprob values themselves differ between epoch 1 and epochs 2/3 — not just which token was chosen. This suggests a different internal model state on the first call, not just different sampling behaviour.
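For a sense of scale, the quoted logprobs can be converted to probabilities. The sketch below uses only the four top_1 values listed above and assumes they are natural-log probabilities; the names `observed` and `to_prob` are illustrative.

```python
import math

# top_1 logprobs quoted above, as (epoch 1, epoch 2).
# Note: at position_11 the top_1 token itself differs (" in" vs " log"),
# so those two values belong to different tokens.
observed = {
    "position_2": (-0.046964, -0.045541),
    "position_11": (-0.200212, -0.118924),
}


def to_prob(logprob):
    """Convert a natural-log probability to a probability (assumed base e)."""
    return math.exp(logprob)


for name, (lp1, lp2) in observed.items():
    p1, p2 = to_prob(lp1), to_prob(lp2)
    print(f"{name}: epoch1 p={p1:.4f}  epoch2 p={p2:.4f}  |diff|={abs(p1 - p2):.4f}")
```

The gap at position_2 is small, while position_11 differs by much more, which would fit the two runs already being on slightly different internal states by that point.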


Why this seems like a bug

In sample/samplers.go, temperature=0 takes a hard branch to greedy():

func (s *Sampler) sample(tokens []token) (token, error) {
    if s.temperature == 0 {
        return greedy(tokens), nil
    }
    // ...
}

And greedy() is pure argmax — no randomness at all:

func greedy(tokens []token) token {
    max := tokens[0]
    for i := 1; i < len(tokens); i++ {
        if tokens[i].value > max.value {
            max = tokens[i]
        }
    }
    return max
}

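Since both functions are fully deterministic, identical logits must produce identical output on every run. The first run should not be special, so the difference has to come from the logits reaching the sampler, which matches the logprob evidence above.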
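The determinism argument can be checked with a small Python restatement of the argmax. This is a sketch for illustration only, not Ollama's Go code, and the logit values are made up. The same input always gives the same pick, while a small change in the input is enough to flip a near-tie.

```python
def greedy(logits):
    """Index of the largest logit; the first index wins ties (strict > comparison)."""
    best = 0
    for i in range(1, len(logits)):
        if logits[i] > logits[best]:
            best = i
    return best


# Made-up logits with a near-tie between candidates 0 and 1.
warm = [2.5000, 2.4990, 0.1000]
# Same logits with a small made-up shift on candidate 1, standing in for
# whatever differs on the first call after model load.
cold = [2.5000, 2.5010, 0.1000]

assert greedy(warm) == greedy(warm)  # same input, same choice, every time
print("warm pick:", greedy(warm))
print("cold pick:", greedy(cold))
```

Once one token differs, every later token is conditioned on a different prefix, so a single flipped near-tie early on would be enough to produce two entirely different responses.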


qa_report_readable.csv

Relevant log output

qa_report_readable.csv contains multiple epochs across temperature settings; the bug is visible in the first three epochs.

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.17.1
