ollama - 💡(How to fix) Fix [0.17.5][macOS Apple Silicon] model runner unexpectedly stopped (EOF/exit status 2) on /api/generate [8 comments, 3 participants]

ollama/ollama#14611 · Fetched 2026-04-08 00:33:52

Error Message

  • {"error":"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error"}
  • {"error":"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details"}
  • time=2026-03-04T17:10:17.873+09:00 level=ERROR source=server.go:1611 msg="post predict" error="Post \"http://127.0.0.1:53947/completion\": EOF"
  • time=2026-03-04T17:10:17.873+09:00 level=ERROR source=server.go:303 msg="llama runner terminated" error="exit status 2"

Expected: /api/generate should return a normal completion or a stable, actionable error without runner termination.

Code Example

pkill -9 -f 'ollama runner' || true
pkill -9 -f '/usr/local/bin/ollama serve' || true
pkill -9 -f '/opt/homebrew/bin/ollama serve' || true
ollama serve

---

curl -sS http://127.0.0.1:11434/api/version

---

curl -sS http://127.0.0.1:11434/api/generate -d '{
  "model":"gpt-oss:20b",
  "prompt":"hi",
  "stream":false,
  "options":{"num_ctx":512,"num_predict":8}
}'

---

{"error":"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details"}

---

time=2026-03-04T17:10:17.873+09:00 level=ERROR source=server.go:1611 msg="post predict" error="Post \"http://127.0.0.1:53947/completion\": EOF"
[GIN] 2026/03/04 - 17:10:17 | 500 |  6.983919458s |       127.0.0.1 | POST     "/api/generate"
time=2026-03-04T17:10:17.873+09:00 level=ERROR source=server.go:303 msg="llama runner terminated" error="exit status 2"

What happened

On macOS (Apple Silicon), Ollama API health endpoints are reachable, but inference calls intermittently fail with:

  • {"error":"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error"}
  • HTTP 500 from /api/generate
  • server log shows post predict ... EOF and llama runner terminated: exit status 2

This happened repeatedly during local automation workloads.

Environment

  • Ollama: 0.17.5
  • OS: macOS 26.2 (arm64)
  • Hardware: Apple M4 Pro, 64GB RAM
  • API endpoint: http://127.0.0.1:11434
  • Example model: gpt-oss:20b

Reproduction

  1. Start a clean server:
    pkill -9 -f 'ollama runner' || true
    pkill -9 -f '/usr/local/bin/ollama serve' || true
    pkill -9 -f '/opt/homebrew/bin/ollama serve' || true
    ollama serve
  2. Confirm API health:
    curl -sS http://127.0.0.1:11434/api/version
  3. Call generate:
    curl -sS http://127.0.0.1:11434/api/generate -d '{
      "model":"gpt-oss:20b",
      "prompt":"hi",
      "stream":false,
      "options":{"num_ctx":512,"num_predict":8}
    }'
  4. Sometimes the response is:
    {"error":"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details"}
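Because the failure is intermittent, a single curl call says little about how often it happens. The following is a minimal sketch that repeats the same request and records each failing attempt. The endpoint, model, and options are taken from the steps above; the function names, the timeout, and the injectable `send` parameter are assumptions made for illustration.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"  # endpoint from the report

# Same request body as the curl call in the reproduction steps.
PAYLOAD = {
    "model": "gpt-oss:20b",
    "prompt": "hi",
    "stream": False,
    "options": {"num_ctx": 512, "num_predict": 8},
}


def call_generate(payload, url=OLLAMA_URL, timeout=120):
    """POST one request and return (http_status, parsed_json_body)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # The report shows the error JSON arriving in the body of the HTTP 500.
        text = err.read().decode("utf-8", errors="replace")
        try:
            return err.code, json.loads(text)
        except json.JSONDecodeError:
            return err.code, {"error": text}


def measure_failures(attempts, send=call_generate, payload=PAYLOAD):
    """Run the request `attempts` times; return (index, status, error) per failure."""
    failures = []
    for i in range(attempts):
        status, body = send(payload)
        if status != 200 or "error" in body:
            failures.append((i, status, body.get("error", "")))
    return failures
```

Calling `measure_failures(20)` against a running server returns one tuple per failed attempt, so an empty list means all 20 calls succeeded. A server that is not reachable at all raises `urllib.error.URLError` instead of being counted as a failure.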

Relevant logs

From runner/server logs around failure:

time=2026-03-04T17:10:17.873+09:00 level=ERROR source=server.go:1611 msg="post predict" error="Post \"http://127.0.0.1:53947/completion\": EOF"
[GIN] 2026/03/04 - 17:10:17 | 500 |  6.983919458s |       127.0.0.1 | POST     "/api/generate"
time=2026-03-04T17:10:17.873+09:00 level=ERROR source=server.go:303 msg="llama runner terminated" error="exit status 2"

There is also a native crash dump section with register dump in the same failure window.
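The `time=… level=… msg=…` lines above are key=value pairs with optionally quoted values, so they can be split mechanically when looking for runner failures in a longer log. The parser below is a sketch based only on the three lines shown here. It is not an official Ollama log format specification, and the function names are hypothetical.

```python
import re

# key=value, where value is either a double-quoted string (with \" escapes)
# or a run of non-space characters.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_log_line(line):
    """Return the key=value fields of one log line as a dict."""
    fields = {}
    for key, raw in PAIR.findall(line):
        if raw.startswith('"'):
            # Drop the surrounding quotes and unescape embedded quotes.
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields


def error_entries(lines):
    """Keep only the parsed entries logged at level=ERROR."""
    parsed = (parse_log_line(line) for line in lines)
    return [entry for entry in parsed if entry.get("level") == "ERROR"]
```

Lines without key=value pairs, such as the `[GIN]` access-log line, parse to an empty dict and are skipped by `error_entries`.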

Expected behavior

/api/generate should return a normal completion or a stable, actionable error without runner termination.

Notes

  • This still occurred after restarting Ollama and rebooting macOS.
  • We also reproduced with conservative options (num_ctx low, num_predict low).

extent analysis

Fix Plan

To address the intermittent failure of inference calls, we will focus on adjusting the resource allocation and handling of the Ollama API. The steps below aim to stabilize the /api/generate endpoint.

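  1. Check Resource Settings:

    • Ollama is not configured through an ollama.yaml file and has no max_memory or max_cpu settings. The server reads environment variables such as OLLAMA_NUM_PARALLEL, OLLAMA_MAX_LOADED_MODELS, OLLAMA_CONTEXT_LENGTH, and OLLAMA_KEEP_ALIVE.
    • To reduce memory pressure on the model runner, try setting OLLAMA_NUM_PARALLEL=1 and OLLAMA_MAX_LOADED_MODELS=1 before starting ollama serve. The reporter already reproduced the failure with low num_ctx and num_predict on a 64GB machine, so resource limits may not be the cause; exit status 2 together with a native crash dump may instead point to a crash inside the runner process.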
  2. Implement Retry Mechanism:

    • Modify the client-side code to retry with exponential backoff, so that a single runner crash does not fail the whole job. This is a workaround and does not fix the crash itself.
    • Example Python code using the requests and tenacity libraries:
      import requests
      from tenacity import retry, stop_after_attempt, wait_exponential

      # reraise=True re-raises the last requests exception instead of
      # tenacity.RetryError, so the except clause below can catch it.
      @retry(stop=stop_after_attempt(3),
             wait=wait_exponential(multiplier=1, min=4, max=10),
             reraise=True)
      def generate_text(prompt, model="gpt-oss:20b"):
          url = "http://127.0.0.1:11434/api/generate"
          data = {
              "model": model,
              "prompt": prompt,
              "stream": False,
              "options": {"num_ctx": 512, "num_predict": 8}
          }
          # A timeout keeps a hung runner from blocking the caller indefinitely.
          response = requests.post(url, json=data, timeout=120)
          # HTTP 500 raises requests.HTTPError, which triggers the next retry.
          response.raise_for_status()
          return response.json()

      # Example usage
      try:
          result = generate_text("Hello, world!")
          print(result)
      except requests.RequestException as e:
          print(f"Request failed: {e}")
  3. Monitor Server Logs:

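    • Inspect the server log around each failure for the post predict ... EOF and llama runner terminated lines, and for the native crash dump that precedes them. On macOS the Ollama app writes to ~/.ollama/logs/server.log; when ollama serve is started in a terminal, as in the reproduction steps, the log goes to that terminal.
    • Set OLLAMA_DEBUG=1 before starting the server to capture more detailed information around failures.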
  4. Update Ollama Version:

    • If possible, update Ollama to the latest version to ensure any known issues related to resource management or stability have been addressed.
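For step 3, the server can be started with debug logging and its output written to a file, so that the crash dump is preserved for the issue report. This sketch assumes the `ollama` binary is on PATH and relies on the OLLAMA_DEBUG environment variable. The helper names and the log file name are hypothetical.

```python
import os
import subprocess


def debug_env(base=None):
    """Return a copy of the environment with Ollama debug logging enabled."""
    env = dict(os.environ if base is None else base)
    env["OLLAMA_DEBUG"] = "1"  # enables verbose server and runner logging
    return env


def start_debug_server(log_path="ollama-debug.log"):
    """Start `ollama serve`, appending stdout and stderr to log_path.

    log_path is a hypothetical file name chosen for this example.
    """
    with open(log_path, "ab") as log_file:
        # The child process keeps its own handle after this block closes ours.
        return subprocess.Popen(
            ["ollama", "serve"],
            env=debug_env(),
            stdout=log_file,
            stderr=subprocess.STDOUT,
        )
```

`start_debug_server()` returns the `Popen` handle, so the caller can call `terminate()` on it once the failure has been captured.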

Verification

  • After implementing these changes, repeat the reproduction steps to verify that the /api/generate endpoint no longer intermittently fails.
  • Monitor server logs for any error messages related to resource limitations or internal errors.

Extra Tips

  • Regularly review and adjust resource allocations based on workload demands to prevent similar issues.
  • Consider implementing health checks for the model runner to proactively detect and restart it if necessary.
  • For long-term stability, explore options for horizontal scaling or load balancing to distribute the workload across multiple instances.
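The health-check tip can be sketched as a small probe that sends a one-token request and classifies the outcome. This is an illustration under assumptions, not an Ollama feature: the crash is recognised only by the error text quoted in this report, and the state names, function names, and timeout are made up for the example. Restarting the server is left to the caller.

```python
import json
import urllib.error
import urllib.request

# Error text quoted in the report; used here as the crash signature.
CRASH_MARKER = "model runner has unexpectedly stopped"


def classify(status, body):
    """Map an HTTP status and JSON body to a coarse health state."""
    if status == 200 and "error" not in body:
        return "healthy"
    if CRASH_MARKER in str(body.get("error", "")):
        return "runner_crashed"
    return "error"


def probe(model="gpt-oss:20b", base_url="http://127.0.0.1:11434", timeout=60):
    """Send a minimal generate request and return its health state."""
    payload = {
        "model": model,
        "prompt": "hi",
        "stream": False,
        "options": {"num_predict": 1},
    }
    req = urllib.request.Request(
        base_url + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return classify(resp.status, json.loads(resp.read()))
    except urllib.error.HTTPError as err:
        # HTTPError must be handled before URLError, which is its parent class.
        try:
            body = json.loads(err.read())
        except ValueError:
            body = {}
        return classify(err.code, body)
    except OSError:
        # Covers URLError (connection refused) and socket timeouts.
        return "unreachable"
```

A supervisor loop could call `probe()` on a schedule and restart the server only on `"runner_crashed"` or `"unreachable"`, which keeps ordinary request errors (for example an unknown model name) from triggering restarts.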
