ollama - Bug? temperature=0 produces different output on first run vs subsequent runs

What is the issue?

When running the same prompt with temperature=0 (greedy decoding), the first inference call after model load produces a different response compared to all subsequent calls. From the second call onward the output is fully stable and identical. I'm not sure if this is intentional or a bug - asking before submitting a fix.

To reproduce

Run the same prompt 3+ times with temperature=0:

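import json
import urllib.request

def query(prompt, url="http://localhost:11434/api/generate"):
    # temperature=0 requests greedy decoding; stream=False returns a single JSON object.
    data = json.dumps({
        "model": "qwen2.5:7b",
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0}
    }).encode()
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Send the identical prompt three times and print the start of each response.
prompt = "What is a logit in machine learning?"
for i in range(3):
    r = query(prompt)
    print(f"Epoch {i+1}: {r['response'][:80]}")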

Observed responses

All three epochs use model: qwen2.5:7b, temperature: 0, top_k: 5, same prompt.

Epoch 1 (diverges):

"In the context of machine learning and statistics, particularly within logistic regression models, a logit (or log-odds) is a transformation that converts probabilities into values on the entire real number line..."

Epoch 2 (stable from here):

"In the context of machine learning and statistics, a logit function plays a crucial role, particularly in models that deal with binary classification problems. The term "logit" comes from the word "logistic," referring to the logistic function..."

Epoch 3 (identical to epoch 2):

"In the context of machine learning and statistics, a logit function plays a crucial role, particularly in models that deal with binary classification problems. The term "logit" comes from the word "logistic," referring to the logistic function..."

Epochs 2 and 3 are byte-for-byte identical. Epoch 1 diverges both in content and structure — it focuses on the logit as a log-odds transformation, while epochs 2/3 lead with the logistic/sigmoid function.
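To pin down where two responses part ways without reading them side by side, a small helper can report the first differing character. This is only a minimal sketch: `first_divergence` is a hypothetical helper name, not part of Ollama, and the two strings are the openings of the epoch 1 and epoch 2 responses quoted above.

```python
def first_divergence(a: str, b: str):
    """Return the index of the first differing character, or None if equal."""
    for i, (ch_a, ch_b) in enumerate(zip(a, b)):
        if ch_a != ch_b:
            return i
    # No mismatch in the overlap: either the strings are equal,
    # or one is a strict prefix of the other.
    return None if len(a) == len(b) else min(len(a), len(b))


# Openings of the responses quoted above (truncated for illustration).
epoch_1 = "In the context of machine learning and statistics, particularly within"
epoch_2 = "In the context of machine learning and statistics, a logit function"

idx = first_divergence(epoch_1, epoch_2)
print(f"shared prefix:     {epoch_1[:idx]!r}")
print(f"epoch 1 continues: {epoch_1[idx:idx + 20]!r}")
print(f"epoch 2 continues: {epoch_2[idx:idx + 20]!r}")
```

Running the same helper on the full `response` strings from each call would show whether epochs 2 and 3 are truly identical (it returns `None`) and how early epoch 1 departs from them.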

Evidence from logprobs

The divergence is also visible in top_logprobs, at the same token positions in each epoch:

position_2 (3rd token):
  Epoch 1:  top_1=" machine" [-0.046964]   chosen=" the" (rank 2)
  Epoch 2:  top_1=" machine" [-0.045541]   chosen=" the" (rank 2)
  Epoch 3:  top_1=" machine" [-0.045541]   ← identical to epoch 2

position_11 (12th token):
  Epoch 1:  top_1=" in"  [-0.200212]   chosen=" **" (rank 2)
  Epoch 2:  top_1=" log" [-0.118924]   chosen=" **" (rank 2)
  Epoch 3:  top_1=" log" [-0.118924]   ← identical to epoch 2

The logprob values themselves differ between epoch 1 and epochs 2/3 — not just which token was chosen. This suggests a different internal model state on the first call, not just different sampling behaviour.
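The comparison above can be automated. The sketch below assumes each run has already been reduced to a dict mapping token position to a `(top_1_token, logprob)` pair; that layout and the name `diff_runs` are assumptions made for illustration, not the shape of the Ollama response. The values are the ones listed above.

```python
def diff_runs(run_a, run_b, tol=1e-9):
    """Return the positions where the top-1 token or its logprob differs."""
    differing = []
    # Only compare positions present in both runs.
    for pos in sorted(run_a.keys() & run_b.keys()):
        tok_a, lp_a = run_a[pos]
        tok_b, lp_b = run_b[pos]
        if tok_a != tok_b or abs(lp_a - lp_b) > tol:
            differing.append(pos)
    return differing


# position -> (top_1 token, logprob), copied from the listing above.
epoch_1 = {2: (" machine", -0.046964), 11: (" in", -0.200212)}
epoch_2 = {2: (" machine", -0.045541), 11: (" log", -0.118924)}
epoch_3 = {2: (" machine", -0.045541), 11: (" log", -0.118924)}

print(diff_runs(epoch_1, epoch_2))  # [2, 11]
print(diff_runs(epoch_2, epoch_3))  # []
```

Position 2 is flagged even though the top-1 token is the same in both runs, because the logprob itself changed. That is the kind of difference that points at the scores rather than at the sampler.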

Why this seems like a bug

In sample/samplers.go, temperature=0 takes a hard branch to greedy():

func (s *Sampler) sample(tokens []token) (token, error) {
    if s.temperature == 0 {
        return greedy(tokens), nil
    }
    // ...
}

And greedy() is pure argmax — no randomness at all:

func greedy(tokens []token) token {
    max := tokens[0]
    for i := 1; i < len(tokens); i++ {
        if tokens[i].value > max.value {
            max = tokens[i]
        }
    }
    return max
}

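Both functions are fully deterministic: given the same input scores, they always return the same token. All runs should therefore produce identical output, and the first run should not be special. Because the logprobs shown earlier already differ on the first call, the discrepancy appears to originate before sampling rather than inside these two functions.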
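A small Python port of the argmax loop shows why a deterministic sampler only guarantees identical output when its inputs are identical. This is a sketch, not the Ollama implementation, and the logit values are hypothetical numbers chosen to show the effect; they were not measured.

```python
def greedy(logits):
    """Return the index of the largest logit (the first one wins on ties)."""
    best = 0
    for i in range(1, len(logits)):
        if logits[i] > logits[best]:
            best = i
    return best


# Hypothetical logits: the two leading candidates swap order
# because of a 0.0001 difference in their scores.
first_call = [2.3101, 2.3100, 0.5]
later_call = [2.3100, 2.3101, 0.5]

# Same input, same answer, every time.
assert greedy(later_call) == greedy(later_call)

# Slightly different input, different answer.
print(greedy(first_call), greedy(later_call))  # 0 1
```

Once a single token flips like this, every later token is conditioned on a different prefix, so the two responses can diverge completely even though no randomness is involved.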

qa_report_readable.csv

Relevant log output

qa_report_readable.csv contains multiple epochs at each temperature setting; the bug is visible in the first three epochs.

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.17.1
