ollama - ✅(Solved) Fix panic: failed to sample token [1 pull request, 1 participant]

GitHub stats
ollama/ollama#14718 · Fetched 2026-04-08 00:32:39
Comments: 0 · Participants: 1 · Timeline: 3 · Reactions: 0
Timeline (top): cross-referenced ×1, labeled ×1, referenced ×1

Error Message

An almost identical MSI laptop runs without problems, but this machine logs the error below.

Mär 08 11:28:25 nixos ollama[1288]: time=2026-03-08T11:28:25.321+01:00 level=ERROR source=server.go:1539 msg="post predict" error="Post \"http://127.0.0.1:35495/completion\": EOF"

Fix Action

Fixed

PR fix notes

PR #14773: runner: replace panics with graceful error handling in sample/decode

Description (problem / solution / changelog)

Summary

Fixes #14718

Replace panic("failed to sample token") and panic("failed to decode token") in runner/ollamarunner/runner.go with graceful error handling that terminates only the failing sequence instead of crashing the entire runner process.

Problem

When seq.sampler.Sample(logits) or Decode([]int32{token}) returns an error, the current code calls panic(...), which kills the entire runner process — including all other active sequences being served concurrently. This is unnecessarily destructive since the error is scoped to a single sequence.

Solution

  • Log the error with slog.Error including the sequence ID and error details
  • Call s.removeSequence(i, llm.DoneReasonError) to cleanly terminate just the failing sequence
  • continue to process remaining sequences in the batch

This matches the existing error handling pattern used throughout the same function, e.g.:

// EOS handling (line 780)
s.removeSequence(i, llm.DoneReasonStop)
continue

// Length limit (line 799)
s.removeSequence(i, llm.DoneReasonLength)
continue

// Connection closed (line 852)
s.removeSequence(i, llm.DoneReasonConnectionClosed)
continue

A new DoneReasonError constant is added to llm.DoneReason to distinguish error terminations from normal stop reasons.
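
The pattern described above can be sketched in a small, self-contained Go program. This is not the code from PR #14773: the `server` and `sequence` types, the `step` and `newDemoServer` functions, and the numeric values of the `DoneReason` constants are hypothetical stand-ins chosen for illustration. Only the shape of the handling (log, remove the one sequence, `continue`) follows the PR description.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
)

// DoneReason is an illustrative stand-in for llm.DoneReason.
type DoneReason int

const (
	DoneReasonStop DoneReason = iota
	DoneReasonLength
	DoneReasonConnectionClosed
	DoneReasonError // the new reason described in the PR
)

// sequence is a hypothetical stand-in for one in-flight request.
type sequence struct {
	id     int
	sample func() (int32, error) // stands in for seq.sampler.Sample(logits)
}

// server is a hypothetical stand-in for the runner's Server type.
type server struct {
	seqs []*sequence
	done map[int]DoneReason
}

// removeSequence frees slot i and records why that sequence ended.
func (s *server) removeSequence(i int, reason DoneReason) {
	s.done[s.seqs[i].id] = reason
	s.seqs[i] = nil
}

// step samples one token per active sequence. A sampling error ends only
// the failing sequence instead of panicking and taking the process down.
func (s *server) step() map[int]int32 {
	tokens := make(map[int]int32)
	for i, seq := range s.seqs {
		if seq == nil {
			continue
		}
		token, err := seq.sample()
		if err != nil {
			slog.Error("failed to sample token", "seq", seq.id, "error", err)
			s.removeSequence(i, DoneReasonError)
			continue
		}
		tokens[seq.id] = token
	}
	return tokens
}

// newDemoServer builds three sequences; the second one always fails to sample.
func newDemoServer() *server {
	ok := func(t int32) func() (int32, error) {
		return func() (int32, error) { return t, nil }
	}
	bad := func() (int32, error) { return 0, errors.New("sampling failed") }
	return &server{
		seqs: []*sequence{
			{id: 1, sample: ok(11)},
			{id: 2, sample: bad},
			{id: 3, sample: ok(33)},
		},
		done: make(map[int]DoneReason),
	}
}

func main() {
	s := newDemoServer()
	tokens := s.step()
	fmt.Println("tokens produced:", len(tokens))
	fmt.Println("sequence 2 ended with error:", s.done[2] == DoneReasonError)
}
```

With a `panic` in place of the error branch, the two healthy sequences in this sketch would never receive their tokens, which is the behaviour the PR sets out to avoid.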

Test plan

  • Verify go vet ./llm/... passes (confirmed locally)
  • Verify existing tests pass
  • Manual: trigger a sample error and confirm the runner continues serving other sequences

🤖 Generated with Claude Code

Changed files

  • llm/server.go (modified, +4/-0)
  • runner/ollamarunner/runner.go (modified, +6/-2)

What is the issue?

An almost identical MSI laptop runs without problems, but this machine produces the error below.

Relevant log output

sudo systemctl status ollama
● ollama.service - Server for local large language models
     Loaded: loaded (/etc/systemd/system/ollama.service; enabled; preset: ignored)
     Active: active (running) since Sun 2026-03-08 11:24:06 CET; 5min ago
 Invocation: 5a47e212b4e04c15ab6df791437be417
   Main PID: 1288 (.ollama-wrapped)
         IP: 37.4K in, 181.4K out
         IO: 3.4G read, 7.1M written
      Tasks: 22 (limit: 18392)
     Memory: 3.5G (peak: 4.7G)
        CPU: 31.095s
     CGroup: /system.slice/ollama.service
             └─1288 /nix/store/mf0nd1azczdzqkmihjllagcfq51ayi4l-ollama-0.12.11/bin/ollama serve

Mär 08 11:28:15 nixos ollama[1288]: time=2026-03-08T11:28:15.194+01:00 level=INFO source=server.go:1332 msg="llama runner started in 11.99 seconds"
Mär 08 11:28:25 nixos ollama[1288]: panic: failed to sample token
Mär 08 11:28:25 nixos ollama[1288]: goroutine 937 [running]:
Mär 08 11:28:25 nixos ollama[1288]: github.com/ollama/ollama/runner/ollamarunner.(*Server).computeBatch(0xc000240f00, {0x0, {0x63a2b0, 0xc00014a280}, {0x646b68, 0xc00157ef48}, {0xc000714380, 0xd, 0x10}, {{0x646b68, ...}, ...}, ...})
Mär 08 11:28:25 nixos ollama[1288]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:763 +0x1aa7
Mär 08 11:28:25 nixos ollama[1288]: created by github.com/ollama/ollama/runner/ollamarunner.(*Server).run in goroutine 51
Mär 08 11:28:25 nixos ollama[1288]:         github.com/ollama/ollama/runner/ollamarunner/runner.go:458 +0x2cd
Mär 08 11:28:25 nixos ollama[1288]: time=2026-03-08T11:28:25.321+01:00 level=ERROR source=server.go:1539 msg="post predict" error="Post \"http://127.0.0.1:35495/completion\": EOF"

OS

Linux

GPU

AMD

CPU

Intel

Ollama version

0.12.11

extent analysis

Fix Plan

The EOF in the "post predict" log line is a symptom, not the cause: the runner process panicked with "failed to sample token" in computeBatch (runner.go:763), which most likely cut off the server's request to http://127.0.0.1:35495/completion. The fix is a code change (PR #14773, which touches only llm/server.go and runner/ollamarunner/runner.go), not a configuration setting. Here are the steps:

  • Update ollama to a version that includes PR #14773. The report was filed against 0.12.11.
  • Restart the ollama service:
sudo systemctl restart ollama
  • If requests still fail, note that the PR changes how a sampling failure is handled, not why sampling fails: the affected sequence ends with DoneReasonError, while the runner keeps serving the other sequences.

Verification

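To verify that the fix worked, check the ollama service logs for the panic:

sudo journalctl -u ollama

The line "panic: failed to sample token" and the goroutine stack trace that follows it should no longer appear. Then run a test query to confirm that the service responds. If sampling still fails, expect a logged error for that one sequence rather than a runner crash.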

Extra Tips

  • Check the ollama documentation and issue tracker for known issues with your hardware; the reporter saw the failure on only one of two almost identical laptops.
  • The report lists an AMD GPU. Running the model without the GPU can show whether the failure is GPU-related.
  • Consider upgrading to a newer version of ollama to ensure you have the latest fixes and features.
