ollama - ✅(Solved) Fix panic: failed to sample token [1 pull request, 1 participant]

ollama · 2026-03-08 13:19:46

GitHub stats

ollama/ollama#14718 • Fetched 2026-04-08 00:32:39

Author: ng0177
Participants: ng0177
Timeline (top): cross-referenced ×1, labeled ×1, referenced ×1

Error Message

An almost identical MSI laptop runs without problems, but this machine produces the error below.

Mär 08 11:28:25 nixos ollama[1288]: time=2026-03-08T11:28:25.321+01:00 level=ERROR source=server.go:1539 msg="post predict" error="Post \"http://127.0.0.1:35495/completion\": EOF"

Fix Action

Fix proposed (not yet merged)

Proposed fix in PR: runner: replace panics with graceful error handling in sample/decode (https://github.com/ollama/ollama/pull/14773). The PR was open and unmerged when this page was fetched.

PR fix notes

PR #14773: runner: replace panics with graceful error handling in sample/decode

Repository: ollama/ollama
Author: alvinttang
State: open | merged: False
Link: https://github.com/ollama/ollama/pull/14773

Description (problem / solution / changelog)

Summary

Fixes #14718

Replace panic("failed to sample token") and panic("failed to decode token") in runner/ollamarunner/runner.go with graceful error handling that terminates only the failing sequence instead of crashing the entire runner process.

Problem

When seq.sampler.Sample(logits) or Decode([]int32{token}) returns an error, the current code calls panic(...), which kills the entire runner process — including all other active sequences being served concurrently. This is unnecessarily destructive since the error is scoped to a single sequence.

Solution

Log the error with slog.Error including the sequence ID and error details
Call s.removeSequence(i, llm.DoneReasonError) to cleanly terminate just the failing sequence
continue to process remaining sequences in the batch

This matches the existing error handling pattern used throughout the same function, e.g.:

// EOS handling (line 780)
s.removeSequence(i, llm.DoneReasonStop)
continue

// Length limit (line 799)
s.removeSequence(i, llm.DoneReasonLength)
continue

// Connection closed (line 852)
s.removeSequence(i, llm.DoneReasonConnectionClosed)
continue

A new DoneReasonError constant is added to llm.DoneReason to distinguish error terminations from normal stop reasons.
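The pattern described above can be sketched in a few lines of Go. This is a simplified illustration, not the code from the PR: the `sequence` and `server` types, the `step` and `run` functions, and the function-valued `sample` field are hypothetical stand-ins for the runner's real state, and only the names `removeSequence` and `DoneReasonError` are taken from the description.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
)

// DoneReason loosely mirrors llm.DoneReason; the numeric values are illustrative.
type DoneReason int

const (
	DoneReasonStop DoneReason = iota
	DoneReasonLength
	DoneReasonConnectionClosed
	DoneReasonError // the constant the PR description says it adds
)

// sequence is a hypothetical stand-in for the runner's per-request state.
type sequence struct {
	id     int
	sample func() (int32, error) // stand-in for seq.sampler.Sample(logits)
	tokens []int32
}

// server is a hypothetical stand-in for the runner's Server type.
type server struct {
	seqs []*sequence
	done map[int]DoneReason
}

// removeSequence records why a sequence ended and frees its slot.
func (s *server) removeSequence(i int, reason DoneReason) {
	s.done[s.seqs[i].id] = reason
	s.seqs[i] = nil
}

// step samples one token per active sequence. A sampling error ends only
// the failing sequence; the loop then continues with the remaining ones.
func (s *server) step() {
	for i, seq := range s.seqs {
		if seq == nil {
			continue
		}
		token, err := seq.sample()
		if err != nil {
			// Previously: panic("failed to sample token")
			slog.Error("failed to sample token", "seq", seq.id, "error", err)
			s.removeSequence(i, DoneReasonError)
			continue
		}
		seq.tokens = append(seq.tokens, token)
	}
}

// run drives one batch step with a failing sequence between two healthy ones.
func run() (map[int]DoneReason, []int32) {
	ok := func() (int32, error) { return 42, nil }
	bad := func() (int32, error) { return 0, errors.New("sampler returned an error") }
	s := &server{
		seqs: []*sequence{{id: 0, sample: ok}, {id: 1, sample: bad}, {id: 2, sample: ok}},
		done: map[int]DoneReason{},
	}
	s.step()
	return s.done, s.seqs[2].tokens
}

func main() {
	done, tokens := run()
	fmt.Println(done[1] == DoneReasonError, len(done), tokens) // true 1 [42]
}
```

In this sketch the sequence with the failing sampler is removed with `DoneReasonError`, while the sequence after it in the batch still receives its token.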

Test plan

Verify go vet ./llm/... passes (confirmed locally)
Verify existing tests pass
Manual: trigger a sample error and confirm the runner continues serving other sequences

🤖 Generated with Claude Code

Changed files

llm/server.go (modified, +4/-0)
runner/ollamarunner/runner.go (modified, +6/-2)

Code Example

sudo systemctl status ollama
● ollama.service - Server for local large language models
     Loaded: loaded (/etc/systemd/system/ollama.service; enabled; preset: ignored)
     Active: active (running) since Sun 2026-03-08 11:24:06 CET; 5min ago
 Invocation: 5a47e212b4e04c15ab6df791437be417
   Main PID: 1288 (.ollama-wrapped)
         IP: 37.4K in, 181.4K out
         IO: 3.4G read, 7.1M written
      Tasks: 22 (limit: 18392)
     Memory: 3.5G (peak: 4.7G)
        CPU: 31.095s
     CGroup: /system.slice/ollama.service
             └─1288 /nix/store/mf0nd1azczdzqkmihjllagcfq51ayi4l-ollama-0.12.11/bin/ollama serve

Mär 08 11:28:15 nixos ollama[1288]: time=2026-03-08T11:28:15.194+01:00 level=INFO source=server.go:1332 msg="llama runner started in 11.99 seconds"
Mär 08 11:28:25 nixos ollama[1288]: panic: failed to sample token
Mär 08 11:28:25 nixos ollama[1288]: goroutine 937 [running]:
Mär 08 11:28:25 nixos ollama[1288]: github.com/ollama/ollama/runner/ollamarunner.(*Server).computeBatch(0xc000240f00, {0x0, {0x63a2b0, 0xc00014a280}, {0x646b68, 0xc00157ef48}, {0xc000714380, 0xd, 0x10}, {{0x646b68, ...}, ...}, ...})
Mär 08 11:28:25 nixos ollama[1288]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:763 +0x1aa7
Mär 08 11:28:25 nixos ollama[1288]: created by github.com/ollama/ollama/runner/ollamarunner.(*Server).run in goroutine 51
Mär 08 11:28:25 nixos ollama[1288]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:458 +0x2cd
Mär 08 11:28:25 nixos ollama[1288]: time=2026-03-08T11:28:25.321+01:00 level=ERROR source=server.go:1539 msg="post predict" error="Post \"http://127.0.0.1:35495/completion\": EOF"
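The final `post predict` line is most plausibly a consequence of the panic rather than a separate fault: if the runner process dies while a request is in flight, the HTTP client sees the connection close before any response arrives. The sketch below reproduces that client-side symptom with Go's standard library only. The test server, the function names, and the JSON payload are assumptions for illustration and are unrelated to ollama's actual runner code.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// postToDyingServer posts to a local test server whose handler drops the TCP
// connection without writing a response, roughly what a client observes when
// the serving process exits mid-request. It returns the client-side error.
func postToDyingServer() error {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Drain the request body first so closing the socket is a clean
		// close rather than a connection reset.
		io.Copy(io.Discard, r.Body)
		hj, ok := w.(http.Hijacker)
		if !ok {
			return
		}
		conn, _, err := hj.Hijack()
		if err != nil {
			return
		}
		conn.Close() // no status line, no headers, no body
	}))
	defer srv.Close()

	// The payload is a placeholder, not the runner's real request format.
	resp, err := http.Post(srv.URL+"/completion", "application/json",
		strings.NewReader(`{"prompt":"hi"}`))
	if err == nil {
		resp.Body.Close()
	}
	return err
}

// sawEOF reports whether the client error wraps io.EOF.
func sawEOF() bool {
	return errors.Is(postToDyingServer(), io.EOF)
}

func main() {
	err := postToDyingServer()
	fmt.Println(err)
	fmt.Println("is EOF:", errors.Is(err, io.EOF))
}
```

The printed error should have the same `Post "http://127.0.0.1:<port>/completion": EOF` shape as the log line above, which suggests the EOF is worth treating as a symptom of the crashed runner.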

OS

Linux

GPU

AMD

CPU

Intel

Ollama version

0.12.11

extent analysis

Fix Plan

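The EOF in the "post predict" log line is a symptom, not the cause. The runner process panicked with "failed to sample token" in computeBatch (runner.go:763), so the request to http://127.0.0.1:35495/completion ended without a response. The report does not point to any ollama configuration setting that prevents this panic. Here are the steps:

Update from ollama 0.12.11 to a newer version and retest. PR #14773 was still open and unmerged when this page was fetched, so a newer release may not yet include its change.
Restart the ollama service:

sudo systemctl restart ollama

If the panic persists, keep in mind that PR #14773 only changes how the failure is handled: the failing sequence is terminated with an error instead of the whole runner crashing. It does not address why sampling fails on this machine.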

Verification

To verify that the fix worked, check the ollama service logs for a recurrence of "panic: failed to sample token" or the "post predict" EOF error:

sudo journalctl -u ollama

If neither appears, run a test query to confirm that the service responds correctly.
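As a small aid for this check, the following sketch scans journal text for the two signatures seen in this report. The `findings` type and `scanJournal` function are hypothetical helpers, and the matched substrings are taken from the log output in this issue; other ollama versions may word these messages differently.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// findings records which of the two log signatures were seen.
type findings struct {
	SamplePanic bool // "panic: failed to sample token"
	PredictEOF  bool // msg="post predict" together with EOF
}

// scanJournal checks each line of the given journal text for the signatures.
func scanJournal(journal string) findings {
	var f findings
	sc := bufio.NewScanner(strings.NewReader(journal))
	for sc.Scan() {
		line := sc.Text()
		if strings.Contains(line, "panic: failed to sample token") {
			f.SamplePanic = true
		}
		if strings.Contains(line, `msg="post predict"`) && strings.Contains(line, "EOF") {
			f.PredictEOF = true
		}
	}
	return f
}

func main() {
	// Two lines copied from the report's log output.
	journal := `Mär 08 11:28:25 nixos ollama[1288]: panic: failed to sample token
Mär 08 11:28:25 nixos ollama[1288]: time=2026-03-08T11:28:25.321+01:00 level=ERROR source=server.go:1539 msg="post predict" error="Post \"http://127.0.0.1:35495/completion\": EOF"`
	fmt.Printf("%+v\n", scanJournal(journal)) // {SamplePanic:true PredictEOF:true}
}
```

In practice the input would be the saved output of the journalctl command above rather than an inline string.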

Extra Tips

Check the ollama documentation and issue tracker for known issues that mention "panic: failed to sample token".
The report lists an AMD GPU, and a nearly identical laptop works; try running without GPU acceleration to see whether the failure is GPU-related.
The log shows version 0.12.11; consider upgrading to a newer ollama release.
