litellm - 💡(How to fix) Fix CrashLoopBackOff in air-gapped OpenShift due to tiktoken trying to download cl100k_base.tiktoken in main-v1.81.3-stable [1 participant]

GitHub stats: BerriAI/litellm#23218 (fetched 2026-04-08 00:38:01)
Comments: 0 · Participants: 1 · Timeline events: 4 (labeled ×3, cross-referenced ×1) · Reactions: 0

Error Message

File "/usr/lib/python3.13/site-packages/tiktoken/load.py", line 24, in read_file
  resp = requests.get(blobpath)
...
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='openaipublic.blob.core.windows.net', port=443): Max retries exceeded with url: /encodings/cl100k_base.tiktoken (Caused by NameResolutionError("... [Errno -2] Name or service not known"))

Root Cause

The failure happens during Python module import (not during actual model calls): importing litellm eagerly loads the cl100k_base encoding through tiktoken, notably via the import chain litellm.images.utils → token_counter → default_encoding.
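The difference between import-time and first-use loading can be sketched in plain Python. This is an illustration, not litellm's code: `load_encoding` is a hypothetical stand-in for `tiktoken.get_encoding("cl100k_base")`, and `get_default_encoding` is an illustrative name. With the lazy pattern, importing the module runs nothing, so a missing cache or network only matters once something actually needs token counts.

```python
# Sketch: eager (import-time) vs lazy (first-use) loading of a tokenizer.
# `load_encoding` is a HYPOTHETICAL stand-in for tiktoken.get_encoding("cl100k_base").
import functools

load_calls = []  # records each time the expensive loader actually runs


def load_encoding(name: str) -> str:
    """Stand-in for a loader that may hit the network on a cache miss."""
    load_calls.append(name)
    return f"<encoding {name}>"


# Eager pattern (what the traceback points to): runs as soon as the module is imported.
#   encoding = load_encoding("cl100k_base")


# Lazy pattern: nothing runs at import time; the first caller pays the cost once
# and later callers get the cached result.
@functools.lru_cache(maxsize=None)
def get_default_encoding() -> str:
    return load_encoding("cl100k_base")
```

The trade-off is that a missing cache now surfaces on the first request rather than at pod startup, which keeps the pod out of CrashLoopBackOff but moves the failure later.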

Fix Action

Fix / Workaround

Workarounds tried

  • Preloading via python -c "import tiktoken; tiktoken.get_encoding('cl100k_base')" in Dockerfile → still fails (tiktoken re-attempts fetch on runtime import)
  • Downgrade to pre-regression tag (e.g. main-v1.80.8-stable or earlier) → often resolves it (but loses newer features)
  • Mounting the cache dir as a volume with pre-downloaded file → same issue if import chain bypasses cache check

Code Example

FROM ghcr.io/berriai/litellm-non_root:main-v1.81.3-stable

WORKDIR /app

ENV TIKTOKEN_CACHE_DIR=/app/tiktoken-cache \
    DATA_GYM_CACHE_DIR=/app/tiktoken-cache

RUN mkdir -p "${TIKTOKEN_CACHE_DIR}" && \
    python -c "import tiktoken; tiktoken.get_encoding('cl100k_base')"

# (your other layers: chmod entrypoint, expose, etc.)
RUN chmod +x ./docker/entrypoint.sh

EXPOSE 4000/tcp

CMD ["--port", "4000", "--config", "/app/proxy_config.yaml", "--detailed_debug"]


Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Describe the bug
In a fully air-gapped/disconnected OpenShift cluster (no outbound internet, no DNS resolution to external domains like openaipublic.blob.core.windows.net), the LiteLLM pod enters CrashLoopBackOff immediately on startup.

The failure happens during Python module import (not during actual model calls), because importing litellm eagerly loads cl100k_base through tiktoken (especially through litellm.images.utils → token_counter → default_encoding).

This is a known regression in v1.81.x (see #19852 – "tiktoken eager loading regression in v1.81.x - startup fails without internet access").

Expected behavior

  • Preloading the encoding during image build (with internet) + setting TIKTOKEN_CACHE_DIR should allow fully offline startup.
  • litellm should not trigger eager tiktoken loading at import time in air-gapped setups (or respect cache before attempting network fetch).
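The Dockerfile sets both TIKTOKEN_CACHE_DIR and DATA_GYM_CACHE_DIR because, in recent tiktoken releases, tiktoken checks them in that order. The sketch below reproduces that lookup order under the assumption that the installed tiktoken matches its current load.py; `resolve_tiktoken_cache_dir` is a hypothetical helper, not a tiktoken API.

```python
# Sketch of how tiktoken picks its cache directory (ASSUMED to mirror tiktoken/load.py;
# verify against the tiktoken version installed in your image).
import os
import tempfile
from typing import Mapping, Optional


def resolve_tiktoken_cache_dir(env: Mapping[str, str]) -> Optional[str]:
    """Return the cache directory tiktoken would use, or None if caching is disabled."""
    if "TIKTOKEN_CACHE_DIR" in env:  # presence wins, even if a later var is also set
        cache_dir = env["TIKTOKEN_CACHE_DIR"]
    elif "DATA_GYM_CACHE_DIR" in env:
        cache_dir = env["DATA_GYM_CACHE_DIR"]
    else:
        # Default location under the system temp directory
        cache_dir = os.path.join(tempfile.gettempdir(), "data-gym-cache")
    # An empty string disables caching, so every load would go to the network
    return cache_dir or None


print(resolve_tiktoken_cache_dir(os.environ))
```

Because TIKTOKEN_CACHE_DIR wins whenever it is present, anything in the process that reassigns it before the encoding loads will send tiktoken to a different directory than the one preloaded at build time.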

Steps to Reproduce

  1. Use a disconnected OpenShift 4.x cluster (no egress to Azure blob storage or the internet)
  2. Build/deploy a custom image based on ghcr.io/berriai/litellm-non_root:main-v1.81.3-stable with this Dockerfile to preload the tiktoken cache during build (when internet is available):
FROM ghcr.io/berriai/litellm-non_root:main-v1.81.3-stable

WORKDIR /app

ENV TIKTOKEN_CACHE_DIR=/app/tiktoken-cache \
    DATA_GYM_CACHE_DIR=/app/tiktoken-cache

RUN mkdir -p "${TIKTOKEN_CACHE_DIR}" && \
    python -c "import tiktoken; tiktoken.get_encoding('cl100k_base')"

# (your other layers: chmod entrypoint, expose, etc.)
RUN chmod +x ./docker/entrypoint.sh

EXPOSE 4000/tcp

CMD ["--port", "4000", "--config", "/app/proxy_config.yaml", "--detailed_debug"]
  3. Build succeeds (cache file 9b5ad71b2ce5302211f9c61530b329a4922fc6a4 is created in /app/tiktoken-cache)
  4. Push to the internal registry and deploy in the air-gapped cluster
  5. Pod crashes with:
File "/usr/lib/python3.13/site-packages/tiktoken/load.py", line 24, in read_file
  resp = requests.get(blobpath)
...
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='openaipublic.blob.core.windows.net', port=443): Max retries exceeded with url: /encodings/cl100k_base.tiktoken (Caused by NameResolutionError("... [Errno -2] Name or service not known"))

Even though the file exists in TIKTOKEN_CACHE_DIR, tiktoken still attempts the HTTP fetch during early import.
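To narrow down why the cached file is ignored, it helps to recompute where tiktoken expects it. In recent tiktoken releases the cache file name is the SHA-1 hex digest of the download URL (which is why the build produced 9b5ad71b2ce5302211f9c61530b329a4922fc6a4), and the cache directory is read from the environment when the encoding is loaded, not when the image is built. The helpers `cache_key` and `cached_file_status` below are hypothetical, meant for running inside the container.

```python
# Sketch: compute where tiktoken is ASSUMED to look for the cached cl100k_base file
# (SHA-1 of the download URL, as in recent tiktoken/load.py) and check that it exists.
import hashlib
import os

CL100K_URL = "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken"


def cache_key(blobpath: str) -> str:
    """File name tiktoken uses for a cached blob: SHA-1 hex digest of its URL."""
    return hashlib.sha1(blobpath.encode()).hexdigest()


def cached_file_status(cache_dir: str, blobpath: str = CL100K_URL) -> tuple[str, bool]:
    """Return (expected path, whether a regular file exists there)."""
    path = os.path.join(cache_dir, cache_key(blobpath))
    return path, os.path.isfile(path)
```

For example, run `print(cached_file_status(os.environ["TIKTOKEN_CACHE_DIR"]))` in the pod as the same user as the proxy. If the file is present but the fetch still happens, compare the TIKTOKEN_CACHE_DIR value in effect at the moment of the load with the directory set in the Dockerfile.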

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

non root main-v1.81.3-stable

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

To resolve the issue with tiktoken eager loading in air-gapped OpenShift clusters, follow these steps:

  • Patch the tiktoken library: make it consult the cache directory before attempting to fetch the encoding from the network.
  • Update the Dockerfile: install the patched tiktoken library and make sure the cache directory is configured and populated at build time.

Example code changes:

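# Patched tiktoken/load.py (simplified sketch): check the on-disk cache before any network call.
# tiktoken names cached files by the SHA-1 hex digest of the download URL, not by its basename
# (for cl100k_base this is 9b5ad71b2ce5302211f9c61530b329a4922fc6a4).
import hashlib
import os

import requests


def read_file(blobpath: str) -> bytes:
    cache_dir = os.environ.get("TIKTOKEN_CACHE_DIR") or os.environ.get("DATA_GYM_CACHE_DIR")
    if cache_dir:
        cache_file = os.path.join(cache_dir, hashlib.sha1(blobpath.encode()).hexdigest())
        if os.path.exists(cache_file):
            with open(cache_file, "rb") as f:
                return f.read()
    # Fall back to the network only if no cached copy exists
    resp = requests.get(blobpath)
    resp.raise_for_status()
    return resp.content
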
# Updated Dockerfile
FROM ghcr.io/berriai/litellm-non_root:main-v1.81.3-stable

WORKDIR /app

ENV TIKTOKEN_CACHE_DIR=/app/tiktoken-cache \
    DATA_GYM_CACHE_DIR=/app/tiktoken-cache

# Copy patched tiktoken library
COPY patched_tiktoken /usr/lib/python3.13/site-packages/tiktoken

RUN mkdir -p "${TIKTOKEN_CACHE_DIR}" && \
    python -c "import tiktoken; tiktoken.get_encoding('cl100k_base')"

# (your other layers: chmod entrypoint, expose, etc.)
RUN chmod +x ./docker/entrypoint.sh

EXPOSE 4000/tcp

CMD ["--port", "4000", "--config", "/app/proxy_config.yaml", "--detailed_debug"]

Verification

To verify that the fix worked, build and deploy the updated image in the air-gapped cluster and check that the pod starts successfully without attempting to fetch the encoding from the network.
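Before pushing to the disconnected cluster, an offline start can be approximated locally by making DNS resolution fail inside the Python process, which mimics the NameResolutionError in the traceback. This is a sketch with a hypothetical helper `no_dns`; it only affects lookups made through Python's socket module in the current process and does not replace testing in the cluster itself.

```python
# Sketch: simulate an air-gapped host inside one Python process by making
# socket.getaddrinfo fail the way an unresolvable hostname does.
import contextlib
import socket
from typing import Iterator


@contextlib.contextmanager
def no_dns() -> Iterator[None]:
    """Temporarily replace socket.getaddrinfo with a function that always fails."""
    original = socket.getaddrinfo

    def refuse(*args, **kwargs):
        raise socket.gaierror(socket.EAI_NONAME, "Name or service not known (simulated air gap)")

    socket.getaddrinfo = refuse
    try:
        yield
    finally:
        socket.getaddrinfo = original  # always restore, even if the body raised


# Example (inside the image, where tiktoken is installed):
#   with no_dns():
#       import tiktoken
#       tiktoken.get_encoding("cl100k_base")  # should succeed from the cache alone
```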

Extra Tips

  • Ensure that the cache directory is properly configured and that the cache file exists in the expected location.
  • Consider implementing a fallback mechanism to handle cases where the cache file is missing or corrupted.
  • Keep in mind that this fix is specific to the tiktoken library and may need to be adapted for other libraries or dependencies that exhibit similar behavior.
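For the fallback suggested above, one option is to accept a cached file only if it matches a known digest and otherwise fail fast with a clear error, since an air-gapped pod cannot re-download it anyway. A minimal sketch, assuming you record the expected SHA-256 of the preloaded file at build time; `read_verified_cache` is a hypothetical helper and no real digest is hard-coded here.

```python
# Sketch: treat a cached encoding file as usable only if it exists and matches an
# expected SHA-256 digest; otherwise return None so the caller can fall back or fail.
import hashlib
import os
from typing import Optional


def read_verified_cache(path: str, expected_sha256: Optional[str]) -> Optional[bytes]:
    """Return the cached bytes, or None if the file is absent or fails the hash check."""
    if not os.path.isfile(path):
        return None
    with open(path, "rb") as f:
        data = f.read()
    if expected_sha256 is not None and hashlib.sha256(data).hexdigest() != expected_sha256:
        return None  # corrupted or truncated copy; caller decides whether to re-fetch or fail
    return data
```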
