ollama - Ollama Cloud: Frequent 503 errors making cloud models unreliable [6 comments, 7 participants]

GitHub stats
ollama/ollama#15419 · Fetched 2026-04-09 07:51:22
Comments: 6 · Participants: 7 · Timeline: 8 · Reactions: 2
Timeline (top): commented ×6, labeled ×1, subscribed ×1

Hey Ollama team,

I've been running autonomous agents on Ollama Cloud models and keep hitting frequent 503 Service Unavailable errors. Thought I'd report this since it's making the cloud models pretty unreliable for production use.

What's happening

Requests to cloud models randomly fail with a 503 Service Unavailable response.

This happens with:

  • glm-5:cloud
  • glm-5.1:cloud
  • minimax-m2.7:cloud
  • kimi-k2.5:cloud

How often

Multiple times per hour. Sometimes a single request works, the next one fails with 503, then it works again. It's intermittent but frequent enough to break agent workflows.

Impact

  • Agent requests time out mid-conversation
  • Cron jobs fail unexpectedly
  • Users experience broken interactions in Telegram/Discord bots

What I've tried

  • Retry logic with exponential backoff
  • Falling back to different models
  • None of it really helps when the 503s are this frequent
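
For readers trying to reproduce this, the two workarounds listed above (exponential backoff and model fallback) might be combined roughly as follows. This is a sketch, not the reporter's actual code: `send`, `ServiceUnavailable`, `backoff_delays`, and `call_with_fallback` are hypothetical names, and `send(model)` stands in for whatever client call issues the request and raises on a 503.

```python
import random
import time


class ServiceUnavailable(Exception):
    """Hypothetical error raised by `send` when the server answers 503."""


def backoff_delays(attempts, base=1.0, cap=30.0):
    """Yield full-jitter delays: uniform in [0, min(cap, base * 2**n)]."""
    for n in range(attempts):
        yield random.uniform(0, min(cap, base * 2 ** n))


def call_with_fallback(send, models, attempts=4, sleep=time.sleep):
    """Try each model in order, retrying 503s with backoff before moving on."""
    last_error = None
    for model in models:
        for delay in backoff_delays(attempts):
            try:
                return send(model)
            except ServiceUnavailable as err:
                last_error = err
                sleep(delay)  # wait before the next attempt or the next model
    if last_error is None:
        raise ValueError("no models given")
    raise last_error
```

Jitter spreads retries out so that many agents failing at the same moment do not all retry in lockstep. As the report notes, this only smooths over short outages and does not help when 503s arrive several times per hour.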

Setup

  • Ollama running in Docker
  • Using the model suffix to hit ollama.com:443
  • Fairly standard setup otherwise

Would be helpful

  1. Some visibility into when/why 503s happen (rate limiting vs capacity issues?)
  2. A retry-after header so we can back off properly
  3. Maybe a status page for Ollama Cloud uptime?
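
To illustrate request 2: if the service did send a Retry-After header (per this report it currently does not), a client could turn it into a wait time along these lines. This is a sketch under that assumption; `retry_after_seconds` and its `default` fallback are hypothetical, and the header is assumed to follow the standard HTTP form (either a number of seconds or an HTTP date).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, default=5.0, now=None):
    """Turn a Retry-After value (delta-seconds or HTTP-date) into a wait in seconds."""
    if not header_value:
        return default  # header absent: fall back to a fixed wait
    value = header_value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default  # unparseable value: fall back to a fixed wait
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

With a header like this, the client waits exactly as long as the server asks instead of guessing with exponential backoff.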

Thanks for the great work on Ollama! Happy to provide more details if needed.

extent analysis

TL;DR

Implement a more robust retry mechanism with a longer backoff period and consider using a circuit breaker pattern to handle frequent 503 Service Unavailable errors.

Guidance

  • Investigate the possibility of rate limiting or capacity issues on the Ollama Cloud side, as the frequent 503 errors could be indicative of one of these problems.
  • Consider implementing a circuit breaker pattern to detect when the service is not responding and prevent further requests until it becomes available again.
  • Add logging to track the frequency and timing of 503 errors to better understand the issue and identify potential patterns.
  • Reach out to the Ollama team to request a status page for Ollama Cloud uptime and more visibility into when and why 503 errors occur.
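
The logging suggestion above could be sketched as a small sliding-window tracker. This is an illustration only: `ErrorRateTracker`, the logger name, and the one-hour default window are hypothetical choices, and `record` is assumed to be called by the client each time a 503 comes back.

```python
import logging
import time
from collections import deque

log = logging.getLogger("ollama_cloud_503")


class ErrorRateTracker:
    """Keep timestamps of recent 503s and report how many fall in a window."""

    def __init__(self, window_seconds=3600.0, clock=time.time):
        self.window = window_seconds
        self.clock = clock
        self.events = deque()  # (timestamp, model) pairs, oldest first

    def record(self, model):
        """Log one 503 for `model` and update the rolling count."""
        now = self.clock()
        self.events.append((now, model))
        self._prune(now)
        log.warning("503 from %s (%d in last %.0fs)",
                    model, len(self.events), self.window)

    def count(self):
        """Return the number of 503s seen within the window."""
        self._prune(self.clock())
        return len(self.events)

    def _prune(self, now):
        # Drop events older than the window from the front of the queue.
        while self.events and now - self.events[0][0] > self.window:
            self.events.popleft()
```

Timestamps and per-model counts collected this way would also give the Ollama team concrete data on whether failures cluster by model or by time of day.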

Example

Without more detail on the client implementation, no tailored example is possible, but a basic circuit breaker in Python might look like this:

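import time
import requests

class CircuitBreaker:
    def __init__(self, timeout, threshold):
        self.timeout = timeout      # seconds to keep the circuit open
        self.threshold = threshold  # consecutive failures that open it
        self.failures = 0
        self.opened_at = None       # when the circuit opened; None while closed

    def request(self, url):
        if self.opened_at is not None:
            # Fail fast until the cool-down period has elapsed.
            if time.monotonic() - self.opened_at < self.timeout:
                raise RuntimeError("Circuit open")
            # Cool-down over: let one trial request through (half-open).
        try:
            response = requests.get(url)
            response.raise_for_status()
        except requests.exceptions.RequestException:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()  # open (or re-open) the circuit
            raise
        # Success: reset the failure count and close the circuit.
        self.failures = 0
        self.opened_at = None
        return response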

# Usage
breaker = CircuitBreaker(timeout=60, threshold=5)
try:
    response = breaker.request("https://ollama.com:443/model")
except Exception as e:
    print(f"Request failed: {e}")

Notes

The provided guidance is based on the assumption that the 503 errors are due to rate limiting or capacity issues on the Ollama Cloud side. However, without more information from the Ollama team, it's difficult to provide a more specific solution.

Recommendation

Apply a workaround by implementing a more robust retry mechanism and circuit breaker pattern, as the root cause of the issue is unclear and may require changes on the Ollama Cloud side.
