ollama: [MEDIUM] No rate limiting on API endpoints - resource exhaustion risk

None of the API endpoints have rate limiting. The gin router only has CORS and host middleware:

r.Use(
    cors.New(corsConfig),
    allowedHostsMiddleware(s.addr),
)
// No rate limiting middleware

Endpoints affected:

  • POST /api/generate -- inference requests (most expensive)
  • POST /api/chat -- chat completions
  • POST /api/embed / POST /api/embeddings -- embedding generation
  • POST /api/pull -- model downloads (bandwidth)
  • POST /api/create -- model creation (CPU/disk intensive)

Severity: MEDIUM -- CVSS 5.3

Location: server/routes.go (GenerateRoutes), all inference handlers
Category: Missing Rate Limiting / Resource Exhaustion
Confidence: Certain
CWE: CWE-770
MITRE ATT&CK: T1499.004

Impact

  • An attacker can flood the inference endpoints with requests, exhausting GPU resources and denying service to legitimate users
  • GPU memory is finite -- rapid model switching driven by many concurrent requests can crash the runner
  • The model pull endpoint can be abused to consume bandwidth and disk space
  • There is no mechanism to slow down or block abusive clients

Remediation

  1. Add rate limiting middleware (e.g., gin-contrib/limiter or custom)
  2. Different rate limits for different endpoints:
    • /api/generate, /api/chat: lower limit (compute-bound)
    • /api/tags, /api/show: higher limit (lightweight)
  3. Configurable limits via environment variables (e.g., OLLAMA_RATE_LIMIT)
  4. Consider IP-based and token-based rate limiting

Found as part of a broader security audit of ollama/ollama.
