ollama - Qwen2.5 14B models crash llama runner with exit status 2 on GTX TITAN X (Ubuntu, CUDA 13) while 7B works

ollama · 2026-05-11 17:38:09


Error Message

Error: 500 Internal Server Error: llama runner process has terminated: signal arrived during cgo execution

The server log shows llamarunner.(*Server).run ending with error="exit status 2", followed by a "Load failed" entry that carries the same "signal arrived during cgo execution" message. Every Qwen2.5 14B variant tested fails this way, within about 3 seconds of load and without high GPU utilization.

Root Cause

No root cause has been identified in the issue. A port conflict was ruled out: no other process was listening on the Ollama port when the models were started, and the “port in use” messages seen earlier were a side effect of the runner crashing, not the cause.

Fix / Workaround

No fix or workaround is confirmed in the issue. The open question from the reporter: is there a way to force a specific backend version (e.g., a stable CUDA 12 backend), or a recommended workaround until a fixed build is available?

Code Example

From journalctl -u ollama --no-pager | tail -n 100 during a failed qwen2.5:14b load:

text
...
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0x7bca8c514a0, {0x5bb7feb83f50, 0x7bca8b8b4a0})
        github.com/ollama/ollama/runner/llamarunner/runner.go:360 +0x4b fp=0x7bca8a8dfb8 sp=0x7bca8a8dee8 pc=0x5bb7fcc118eb
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
        github.com/ollama/ollama/runner/llamarunner/runner.go:981 +0x1f fp=0x7bca8a8dfe0 sp=0x7bca8a8dfb8 pc=0x5bb7fcc16d9f
runtime.goexit({})
        runtime/asm_amd64.s:1771 +0x1 fp=0x7bca8a8dfe8 sp=0x7bca8a8dfe0 pc=0x5bb7fc67e501
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
        github.com/ollama/ollama/runner/llamarunner/runner.go:981 +0x4c5
...
time=2026-05-11T13:20:42.368-04:00 level=INFO source=server.go:1428 msg="waiting for server to become available" status="llm server not responding"
time=2026-05-11T13:20:42.422-04:00 level=ERROR source=server.go:316 msg="llama runner terminated" error="exit status 2"
time=2026-05-11T13:20:42.618-04:00 level=INFO source=sched.go:511 msg="Load failed" model=/usr/share/ollama/.ollama/models/blobs/sha256-6535211f6554fe87fd9f9d28539b2809db2edb58c7a5c2dd2c2e62564e6fdba6 error="llama runner process has terminated: signal arrived during cgo execution"
[GIN] 2026/05/11 - 13:20:42 | 500 |  2.901848837s |       127.0.0.1 | POST     "/api/generate"
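
To pull just the crash markers out of a long journal, a small filter can help. This is a minimal sketch: the function name `filter_runner_crash` is made up for illustration, and the three patterns are simply the messages that appear in the excerpt above.

```shell
# Keep only the lines that mark a runner crash in Ollama's server log.
# Reads log text on stdin and prints matching lines.
# filter_runner_crash is a hypothetical helper name, not part of Ollama.
filter_runner_crash() {
    grep -E 'llama runner terminated|Load failed|signal arrived during cgo execution'
}

# Example (not run here):
#   journalctl -u ollama --no-pager | filter_runner_crash
```

The filter drops the Go stack trace, so it is best used to find the timestamp of a crash and then read the full log around that time.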


What is the issue?

On my Ubuntu machine with an NVIDIA GeForce GTX TITAN X (12 GB VRAM) and a modern CUDA 13 driver, all Qwen2.5 14B models fail to load and immediately crash the llama runner with:

Error: 500 Internal Server Error: llama runner process has terminated: signal arrived during cgo execution

The same machine runs qwen2.5-coder:7b without issues.

The 14B models never appear in ollama ps.

nvidia-smi shows GPU memory only reaching ~1 GB before the crash (so they are not fully loading onto GPU).

The logs show llamarunner.(*Server).run ending with error="exit status 2" followed by "Load failed ... llama runner process has terminated: signal arrived during cgo execution".

This looks like a llama runner/backend bug for Qwen2.5 14B on this environment, not a resource limit.

Ollama version is 0.23.2

Environment

OS: Ubuntu (X99 platform, Intel i7-5930K, 32 GB RAM)

GPU: NVIDIA GeForce GTX TITAN X (Maxwell, 12 GB VRAM)

nvidia-smi header:

text
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.142                Driver Version: 580.142        CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX TITAN X     Off |   00000000:01:00.0  On |                  N/A |
| 22%   47C    P5             37W /  250W |    1009MiB /  12288MiB |      0%      Default |
+-----------------------------------------+------------------------+----------------------+

Installation method: official curl https://ollama.com/install.sh | sh script

Binary path: /usr/local/bin/ollama

Backends path: /usr/local/lib/ollama/ (initially contained cuda_v13; I also tested with that directory renamed to force fallback)

Models/commands that work

bash
ollama run qwen2.5-coder:7b
ollama run llama3:8b

Works normally, responds to prompts.

Uses GPU as expected.

Models/commands that fail

All of the following result in the same 500 error:

bash
ollama run qwen2.5-coder:14b
ollama run qwen2.5:14b
ollama run qwen2.5-coder:14b-instruct-q4_0

Error shown to the client:

text
Error: 500 Internal Server Error: llama runner process has terminated: signal arrived during cgo execution

Behavior:

Model never shows up in ollama ps.

nvidia-smi shows VRAM usage peaking around ~1 GB then dropping back, with no high GPU utilization.

So the crash happens during startup / load, not during heavy inference.
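
The VRAM peak can be recorded rather than eyeballed. The following is a sketch under stated assumptions: `peak_vram_mib` is a hypothetical helper, and it expects the two-column CSV that `nvidia-smi` emits when queried for `memory.used,memory.total` without header or units.

```shell
# Print the highest "memory.used" value (MiB) found in nvidia-smi CSV samples.
# Expects lines such as "1009, 12288" on stdin (used, total).
# peak_vram_mib is a hypothetical helper name.
peak_vram_mib() {
    awk -F', *' 'NF >= 2 && $1 + 0 > max { max = $1 + 0 } END { print max + 0 }'
}

# Example (not run here): sample once per second while the model loads,
# stop with Ctrl-C after the crash, then read the peak from the file.
#   nvidia-smi --query-gpu=memory.used,memory.total \
#              --format=csv,noheader,nounits -l 1 | tee vram.csv
#   peak_vram_mib < vram.csv
```

A peak far below the model size, as reported here, supports the reading that the runner dies before the weights are loaded.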

What I already tried

Verified GPU/driver support

GTX TITAN X (Maxwell) with driver 580.142, which reports CUDA 13.0. With 12 GB of VRAM, I expected this card to handle 12–14B models at Q4.
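
A back-of-the-envelope check of the memory side of that expectation, as a sketch only. It assumes a nominal 14 billion parameters and roughly 4.5 bits per weight (the figure usually quoted for Q4_0); it counts weights alone and ignores the KV cache and runtime overhead. `estimate_weights_mb` is a hypothetical helper.

```shell
# Estimate weight storage in MB (10^6 bytes).
# $1 = parameter count in millions
# $2 = bits per weight multiplied by 10 (45 means 4.5 bits)
# millions * 1e6 * (bits_x10 / 10) / 8 bytes  ==  millions * bits_x10 / 80 MB
estimate_weights_mb() {
    echo $(( $1 * $2 / 80 ))
}

estimate_weights_mb 14000 45   # 14B model, prints 7875
estimate_weights_mb 7000 45    # 7B model, prints 3937
```

Under these assumptions the 14B weights come to roughly 7.9 GB, which is below the 12288 MiB the card reports, so a plain capacity shortfall does not obviously explain the crash.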

Disabled CUDA 13 backend to force fallback

Found CUDA 13 backend at /usr/local/lib/ollama/cuda_v13.

Stopped service, renamed it:

bash
sudo systemctl stop ollama
sudo mv /usr/local/lib/ollama/cuda_v13 /usr/local/lib/ollama/cuda_v13.disabled
sudo systemctl start ollama

Retested 14B models. Same behavior and same error (exit status 2 / cgo signal).
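
Before and after a rename like this, it may help to list which backend directories are actually present, since the fallback can only pick from what is installed. A minimal sketch; `list_backends` is a hypothetical helper, and only the `cuda_v13` directory is confirmed by this report.

```shell
# List the subdirectories of an Ollama library path, one name per line.
# $1 = library directory (this report uses /usr/local/lib/ollama).
# list_backends is a hypothetical helper name.
list_backends() {
    for d in "$1"/*/; do
        [ -d "$d" ] && basename "$d"
    done
    return 0
}

# Example (not run here):
#   list_backends /usr/local/lib/ollama
# To undo the rename from the step above:
#   sudo systemctl stop ollama
#   sudo mv /usr/local/lib/ollama/cuda_v13.disabled /usr/local/lib/ollama/cuda_v13
#   sudo systemctl start ollama
```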

Clean reinstall of Ollama

Stopped service, removed /usr/local/lib/ollama and /usr/local/bin/ollama, reinstalled via official script, restarted service, re‑downloaded models.

7B still works. 14B variants still fail exactly as before.

Port conflict checks

Confirmed no other process is listening on the Ollama port when starting models; the “port in use” messages I saw earlier were a side effect of the runner crashing, not the root cause.
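
The port check can be scripted along these lines. This is a sketch that assumes Ollama's default port 11434 and the column layout of `ss -ltn` (state in the first column, local address and port in the fourth); `port_is_listening` is a hypothetical helper.

```shell
# Succeed (exit 0) if any listening socket in `ss -ltn` output uses the port.
# Reads the ss output on stdin. $1 = port number.
# port_is_listening is a hypothetical helper name.
port_is_listening() {
    awk -v port="$1" '
        $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Example (not run here), with the service stopped:
#   ss -ltn | port_is_listening 11434 && echo "something else holds the port"
```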

I see similar traces for other Qwen2.5 14B variants (including qwen2.5-coder:14b-instruct-q4_0) with different blob hashes, but the same exit status 2 and “signal arrived during cgo execution.”

Expected behavior

Qwen2.5 14B models (and Qwen2.5‑coder 14B variants) should load and serve like other models on this machine, or at least fail gracefully with a clear error (e.g., “not enough VRAM”, “model not supported”) rather than crashing the llama runner with exit status 2.

Actual behavior

Qwen2.5 14B models consistently crash the llama runner within ~3 seconds of load, without high GPU utilization, returning a 500 error to the client.

qwen2.5-coder:7b and other smaller models work fine on the same environment.

Questions

Is this a known issue with Qwen2.5 14B on CUDA 13 / TITAN X / current Ollama builds?

Is there a way to force a specific backend version (e.g., a stable CUDA 12 backend) or a recommended workaround until a fixed build is available?

Happy to test any debug builds or provide more log detail if needed.
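
One diagnostic that might narrow this down in the meantime, offered as a sketch and not as a confirmed workaround: ask the API to load the model with zero GPU layers through the `num_gpu` option. If the 14B model then loads on CPU, the crash is more likely tied to the GPU backend than to the model file. `cpu_only_payload` is a hypothetical helper, and the prompt text is arbitrary.

```shell
# Build a /api/generate request body that requests zero GPU layers.
# $1 = model name. num_gpu is the number of layers to offload to the GPU.
# cpu_only_payload is a hypothetical helper name.
cpu_only_payload() {
    printf '{"model":"%s","prompt":"hello","stream":false,"options":{"num_gpu":0}}\n' "$1"
}

# Example (not run here), assuming the default address localhost:11434:
#   curl -s http://localhost:11434/api/generate -d "$(cpu_only_payload qwen2.5:14b)"
```

CPU-only inference on a 14B model will be slow, so this is meant as a one-off test.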


OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.23.2
