Ollama Cloud Pro: 95% failure rate across all cloud models (service is unusable)

Environment

Plan: Ollama Pro ($20/month, subscribed 2026-04-09)
OS: macOS (Darwin 25.4.0)
Ollama version: Latest (via brew)
Connection: Stable internet, 0% packet loss to ollama.com
Models tested: glm-5.1:cloud, kimi-k2.5:cloud, qwen3.5:cloud, deepseek-v3.2:cloud

Problem

Ollama Cloud is effectively unusable. Both the /api/chat and /api/generate endpoints return empty responses or time out for all cloud models. This is not model-specific: every cloud model tested exhibits the same behavior.

Reproduction

Simple test — 5 sequential requests per model, 20-second timeout:

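for model in glm-5.1:cloud kimi-k2.5:cloud qwen3.5:cloud deepseek-v3.2:cloud; do
  echo "=== $model ==="
  for i in 1 2 3 4 5; do
    # Non-streaming chat request. python3 prints OK only if the reply is JSON
    # containing total_duration; an empty body, a timeout, or an error object
    # makes python3 exit non-zero and falls through to the FAIL branch.
    # $i is expanded by the shell inside the f-string, which avoids nested
    # same-type quotes (a syntax error before Python 3.12).
    curl -s --max-time 20 http://localhost:11434/api/chat \
      -d "{\"model\":\"$model\",\"messages\":[{\"role\":\"user\",\"content\":\"hi\"}],\"stream\":false}" \
      | python3 -c "import sys,json; d=json.load(sys.stdin); print(f'#$i OK {d[\"total_duration\"]/1e9:.1f}s')" 2>/dev/null \
      || echo "#$i FAIL (empty/timeout)"
  done
done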
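
The script above reports every failure as "empty/timeout", so it cannot tell a timeout from an HTTP error or an error body. The following is a minimal sketch of a probe that separates those cases, assuming bash and curl. The function names `classify` and `probe` are hypothetical and not part of Ollama; the check for `total_duration` mirrors the field the original script reads.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map curl's exit code, the HTTP status, and the body
# to a short label. Names are illustrative, not part of Ollama.
classify() {
  local curl_exit=$1 http_code=$2 body=$3
  if [ "$curl_exit" -eq 28 ]; then
    echo "TIMEOUT"                      # curl exit code 28: operation timed out
  elif [ "$curl_exit" -ne 0 ]; then
    echo "CURL_ERROR($curl_exit)"       # connection refused, reset, and so on
  elif [ "$http_code" != "200" ]; then
    echo "HTTP_$http_code"
  elif [ -z "$body" ]; then
    echo "EMPTY"
  elif printf '%s' "$body" | grep -q '"total_duration"'; then
    echo "OK"
  else
    echo "UNEXPECTED_BODY"
  fi
}

# Send one non-streaming chat request and print its classification.
probe() {
  local model=$1 body_file http_code curl_exit
  body_file=$(mktemp)
  http_code=$(curl -s --max-time 20 -o "$body_file" -w '%{http_code}' \
    http://localhost:11434/api/chat \
    -d "{\"model\":\"$model\",\"messages\":[{\"role\":\"user\",\"content\":\"hi\"}],\"stream\":false}")
  curl_exit=$?
  classify "$curl_exit" "$http_code" "$(cat "$body_file")"
  rm -f "$body_file"
}
```

Calling `probe glm-5.1:cloud` inside the inner loop, in place of the curl and python3 pipeline, would give one label per request that can be tallied per failure type.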

Results (2026-04-09, ~21:00 UTC+8)

| Model | Success Rate | Notes |
|---|---|---|
| `glm-5.1:cloud` | 0/5 | All empty/timeout |
| `kimi-k2.5:cloud` | 1/5 | 1 success (2.6s), 4 failures |
| `qwen3.5:cloud` | 0/5 | All empty/timeout |
| `deepseek-v3.2:cloud` | 0/5 | All empty/timeout |
| Total | 1/20 (5%) | |

Earlier in the day, glm-5.1:cloud worked intermittently (2/3 success), so this appears to be a degrading situation.

Both endpoints affected

Tested /api/generate as well, with the same result for glm-5.1:cloud: 0/5 requests succeeded. This rules out a /api/chat-specific bug.
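
The report does not show the /api/generate request itself. A minimal sketch of an equivalent probe might look like the following, assuming the endpoint takes a single `prompt` string where /api/chat takes a `messages` array, and that a completed non-streaming reply carries `total_duration` as it does for /api/chat. The function names are hypothetical.

```shell
# Hypothetical helper: build a non-streaming /api/generate payload.
# Assumes model and prompt contain no characters that need JSON escaping.
generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

# Send one request and report OK only if the reply carries total_duration.
probe_generate() {
  local response
  response=$(curl -s --max-time 20 http://localhost:11434/api/generate \
    -d "$(generate_payload "$1" "hi")")
  if printf '%s' "$response" | grep -q '"total_duration"'; then
    echo "OK"
  else
    echo "FAIL (empty/timeout)"
  fi
}
```

Running `probe_generate glm-5.1:cloud` five times would correspond to the 0/5 test described above.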

Expected behavior

As a paying Pro subscriber ($20/month), I expect a reasonable success rate (>95%) for cloud model inference. A 5% success rate is not a degraded service — it is a broken service.

What I've ruled out

✅ Local Ollama service is running (localhost:11434 responds, ollama list shows all cloud models)
✅ Network is stable (0% packet loss to ollama.com), and non-cloud local models work fine
✅ Not a single-model issue (all 4 cloud models fail)
✅ Not an endpoint issue (/api/chat and /api/generate both fail)
✅ Tested with minimal payloads ("hi") — not a token limit issue

Related issues

This aligns with multiple existing reports:

#15419 — Frequent 503 errors on cloud models (2026-04-08, 7+ confirmations)
#14673 — 29.7% failure rate documented, support tickets ignored 2+ weeks
#15290 — EOF errors and socket closures on cloud models

Requests

Acknowledge the outage — There is no status page, no incident communication, and no response on existing issues
Provide a status page for Ollama Cloud service health
Add Retry-After headers on 503/502 responses so clients can implement proper backoff
Consider pro-rating or extending subscriptions for periods of sustained outage — charging $20/month for a 5% success rate is not acceptable
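
To illustrate the Retry-After request: the header is not sent today according to this report, so the following is only a sketch of how a client could honor it if it were added. It assumes the value is a plain number of seconds (HTTP also allows a date, which this ignores) and that the response headers were saved with `curl -D <file>`. The function name `retry_after_seconds` is hypothetical.

```shell
# Hypothetical helper: print the Retry-After delay in seconds from a file of
# response headers (as written by `curl -D <file>`), or a default if the
# header is absent or not a plain integer. HTTP-date values are not handled.
retry_after_seconds() {
  local headers_file=$1 default=${2:-5} value
  value=$(awk -F': *' 'tolower($1) == "retry-after" {
    gsub(/[ \r]/, "", $2); print $2; exit
  }' "$headers_file")
  case $value in
    ''|*[!0-9]*) echo "$default" ;;   # missing or non-numeric: use the default
    *)           echo "$value" ;;
  esac
}
```

After a 502 or 503, a client could then run `sleep "$(retry_after_seconds headers.txt)"` before the next attempt instead of guessing a delay.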

extent analysis

TL;DR

The failures originate in the Ollama Cloud service, so they cannot be fixed on the client. A retry mechanism with exponential backoff for the affected Ollama Cloud API endpoints can only partially mitigate them.

Guidance

Verify the issue by running the provided reproduction script to confirm the empty responses or timeouts for all cloud models.
Implement a retry mechanism with exponential backoff for the /api/chat and /api/generate endpoints to handle temporary failures.
Consider adding a circuit breaker that stops sending requests while the service is deemed unavailable.
Monitor the Ollama Cloud service health and adjust the retry strategy accordingly.
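
The circuit breaker mentioned in the guidance above could look roughly like this in bash. This is a minimal sketch: the function names, the threshold of 3 consecutive failures, and the 60-second cooldown are all illustrative assumptions, not values taken from the report.

```shell
# Hypothetical circuit breaker state; thresholds are illustrative.
cb_failures=0        # consecutive failures seen so far
cb_threshold=3       # open the circuit after this many consecutive failures
cb_cooldown=60       # seconds to wait before allowing a trial request
cb_open_until=0      # epoch seconds until which requests are blocked

# Return 0 if a request may be sent now, 1 if the circuit is open.
# The optional argument overrides the current time (useful for testing).
cb_allow() {
  local now=${1:-$(date +%s)}
  [ "$now" -ge "$cb_open_until" ]
}

# Record the outcome of a request: "ok" closes the circuit, anything else
# counts as a failure and opens it once the threshold is reached.
cb_record() {
  local outcome=$1 now=${2:-$(date +%s)}
  if [ "$outcome" = "ok" ]; then
    cb_failures=0
    cb_open_until=0
  else
    cb_failures=$((cb_failures + 1))
    if [ "$cb_failures" -ge "$cb_threshold" ]; then
      cb_open_until=$((now + cb_cooldown))
    fi
  fi
}
```

A caller would check `cb_allow` before each request and call `cb_record ok` or `cb_record fail` afterwards. Both functions must run in the same shell rather than in a subshell, because the state lives in plain variables. Once the cooldown passes, a single trial request is allowed; if it fails, the circuit opens again.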

Example

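# Example retry script using curl (bash): up to 3 attempts per request,
# sleeping 2s and then 4s between attempts
for model in glm-5.1:cloud kimi-k2.5:cloud qwen3.5:cloud deepseek-v3.2:cloud; do
  echo "=== $model ==="
  for i in 1 2 3 4 5; do
    retry_count=0
    max_retries=3
    while [ $retry_count -lt $max_retries ]; do
      response=$(curl -s --max-time 20 http://localhost:11434/api/chat \
        -d "{\"model\":\"$model\",\"messages\":[{\"role\":\"user\",\"content\":\"hi\"}],\"stream\":false}")
      # A non-empty body can still be an error object, so require a field
      # that only a completed response carries.
      if printf '%s' "$response" | grep -q '"total_duration"'; then
        echo "#$i OK"
        break
      fi
      retry_count=$((retry_count + 1))
      # Back off exponentially, but do not sleep after the final attempt.
      if [ $retry_count -lt $max_retries ]; then
        sleep $((2 ** retry_count))
      fi
    done
    if [ $retry_count -eq $max_retries ]; then
      echo "#$i FAIL (empty/timeout)"
    fi
  done
done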

Notes

The provided example is a basic retry script and may need to be adapted to the specific use case. The retry mechanism should be adjusted based on the observed failure rates and the desired level of resilience.
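
One rough way to size the retry budget from an observed failure rate is to treat attempts as independent: with per-attempt success probability p, at least one of n attempts succeeds with probability 1 - (1 - p)^n. Independence is an assumption made here for illustration. During a sustained outage, failures are likely correlated, so the figures below are optimistic. The function name is hypothetical.

```shell
# Probability (in percent) that at least one of n attempts succeeds, given a
# per-attempt success probability p and assuming independent attempts.
success_after_retries() {
  awk -v p="$1" -v n="$2" 'BEGIN { printf "%.1f\n", (1 - (1 - p) ^ n) * 100 }'
}

success_after_retries 0.05 3    # 3 attempts, as in the example retry script
success_after_retries 0.05 59   # a much larger retry budget at the same p
```

With the 5% success rate measured in this report, three attempts would succeed only about 14% of the time under this model, and roughly 59 attempts would be needed to pass 95%. Retries therefore help with occasional failures but cannot compensate for an outage of this severity.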

Recommendation

Apply a workaround by implementing a retry mechanism with exponential backoff, as the root cause of the issue appears to be related to the Ollama Cloud service health, which is outside of the user's control. This will help mitigate the effects of the outage until the service is restored or a more permanent fix is implemented.
