ollama - 💡(How to fix) Fix Ollama Cloud Pro: 95% failure rate across all cloud models — service is unusable

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Code Example

for model in glm-5.1:cloud kimi-k2.5:cloud qwen3.5:cloud deepseek-v3.2:cloud; do
  echo "=== $model ==="
  for i in 1 2 3 4 5; do
    curl -s --max-time 20 http://localhost:11434/api/chat \
      -d "{\"model\":\"$model\",\"messages\":[{\"role\":\"user\",\"content\":\"hi\"}],\"stream\":false}" \
      | python3 -c "import sys,json; d=json.load(sys.stdin); print(f'#{'"$i"'} OK {d[\"total_duration\"]/1e9:.1f}s')" 2>/dev/null \
      || echo "#$i FAIL (empty/timeout)"
  done
done
RAW_BUFFERClick to expand / collapse

Environment

  • Plan: Ollama Pro ($20/month, subscribed 2026-04-09)
  • OS: macOS (Darwin 25.4.0)
  • Ollama version: Latest (via brew)
  • Connection: Stable internet, 0% packet loss to ollama.com
  • Models tested: glm-5.1:cloud, kimi-k2.5:cloud, qwen3.5:cloud, deepseek-v3.2:cloud

Problem

Ollama Cloud is effectively unusable. Both /api/chat and /api/generate endpoints return empty responses or timeout for all cloud models. This is not model-specific — every single cloud model exhibits the same behavior.

Reproduction

Simple test — 5 sequential requests per model, 20-second timeout:

for model in glm-5.1:cloud kimi-k2.5:cloud qwen3.5:cloud deepseek-v3.2:cloud; do
  echo "=== $model ==="
  for i in 1 2 3 4 5; do
    curl -s --max-time 20 http://localhost:11434/api/chat \
      -d "{\"model\":\"$model\",\"messages\":[{\"role\":\"user\",\"content\":\"hi\"}],\"stream\":false}" \
      | python3 -c "import sys,json; d=json.load(sys.stdin); print(f'#{'"$i"'} OK {d[\"total_duration\"]/1e9:.1f}s')" 2>/dev/null \
      || echo "#$i FAIL (empty/timeout)"
  done
done

Results (2026-04-09, ~21:00 UTC+8)

ModelSuccess RateNotes
glm-5.1:cloud0/5All empty/timeout
kimi-k2.5:cloud1/51 success (2.6s), 4 failures
qwen3.5:cloud0/5All empty/timeout
deepseek-v3.2:cloud0/5All empty/timeout
Total1/20 (5%)

Earlier in the day, glm-5.1:cloud worked intermittently (2/3 success), so this appears to be a degrading situation.

Both endpoints affected

Tested /api/generate as well — same 0/5 failure rate for glm-5.1:cloud. This rules out a /api/chat-specific bug.

Expected behavior

As a paying Pro subscriber ($20/month), I expect a reasonable success rate (>95%) for cloud model inference. A 5% success rate is not a degraded service — it is a broken service.

What I've ruled out

  • ✅ Local Ollama service is running (localhost:11434 responds, ollama list shows all cloud models)
  • ✅ Network is stable (non-cloud local models work fine)
  • ✅ Not a single-model issue (all 4 cloud models fail)
  • ✅ Not an endpoint issue (/api/chat and /api/generate both fail)
  • ✅ Tested with minimal payloads ("hi") — not a token limit issue

Related issues

This aligns with multiple existing reports:

  • #15419 — Frequent 503 errors on cloud models (2026-04-08, 7+ confirmations)
  • #14673 — 29.7% failure rate documented, support tickets ignored 2+ weeks
  • #15290 — EOF errors and socket closures on cloud models

Requests

  1. Acknowledge the outage — There is no status page, no incident communication, and no response on existing issues
  2. Provide a status page for Ollama Cloud service health
  3. Add Retry-After headers on 503/502 responses so clients can implement proper backoff
  4. Consider pro-rating or extending subscriptions for periods of sustained outage — charging $20/month for a 5% success rate is not acceptable

extent analysis

TL;DR

The issue can be mitigated by implementing a retry mechanism with exponential backoff for the affected Ollama Cloud API endpoints.

Guidance

  • Verify the issue by running the provided reproduction script to confirm the empty responses or timeouts for all cloud models.
  • Implement a retry mechanism with exponential backoff for the /api/chat and /api/generate endpoints to handle temporary failures.
  • Consider adding a circuit breaker pattern to detect and prevent further requests when the service is deemed unavailable.
  • Monitor the Ollama Cloud service health and adjust the retry strategy accordingly.

Example

# Example retry script using curl
for model in glm-5.1:cloud kimi-k2.5:cloud qwen3.5:cloud deepseek-v3.2:cloud; do
  echo "=== $model ==="
  for i in 1 2 3 4 5; do
    retry_count=0
    max_retries=3
    while [ $retry_count -lt $max_retries ]; do
      response=$(curl -s --max-time 20 http://localhost:11434/api/chat \
        -d "{\"model\":\"$model\",\"messages\":[{\"role\":\"user\",\"content\":\"hi\"}],\"stream\":false}")
      if [ -n "$response" ]; then
        echo "#$i OK"
        break
      else
        retry_count=$((retry_count + 1))
        sleep $((2 ** retry_count))
      fi
    done
    if [ $retry_count -eq $max_retries ]; then
      echo "#$i FAIL (empty/timeout)"
    fi
  done
done

Notes

The provided example is a basic retry script and may need to be adapted to the specific use case. The retry mechanism should be adjusted based on the observed failure rates and the desired level of resilience.

Recommendation

Apply a workaround by implementing a retry mechanism with exponential backoff, as the root cause of the issue appears to be related to the Ollama Cloud service health, which is outside of the user's control. This will help mitigate the effects of the outage until the service is restored or a more permanent fix is implemented.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

As a paying Pro subscriber ($20/month), I expect a reasonable success rate (>95%) for cloud model inference. A 5% success rate is not a degraded service — it is a broken service.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING