litellm - 💡(How to fix) Fix CrashLoopBackOff in air-gapped OpenShift due to tiktoken trying to download cl100k_base.tiktoken in main-v1.81.3-stable [1 participants]

litellm · 2026-03-10 01:44:44


BerriAI/litellm#23218•Fetched 2026-04-08 00:38:01


Author

Elvincth


Error Message

File "/usr/lib/python3.13/site-packages/tiktoken/load.py", line 24, in read_file
  resp = requests.get(blobpath)
...
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='openaipublic.blob.core.windows.net', port=443): Max retries exceeded with url: /encodings/cl100k_base.tiktoken (Caused by NameResolutionError("... [Errno -2] Name or service not known"))

Root Cause

The failure happens during Python module import (not during actual model calls), because tiktoken eagerly loads cl100k_base via imports in litellm (especially through litellm.images.utils → token_counter → default_encoding).
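To make the import-time versus call-time distinction concrete, here is a minimal Python sketch of deferring a load until first use. `load_encoding` is a hypothetical stand-in for a network-dependent call such as `tiktoken.get_encoding`, and none of this is litellm's actual code.

```python
from functools import lru_cache

# Records every load so the effect of laziness is observable.
LOAD_CALLS = []


def load_encoding(name):
    """Hypothetical stand-in for a slow, network-dependent loader."""
    LOAD_CALLS.append(name)
    return {"name": name}


# Eager style (the pattern that fails offline): the load would run as soon
# as the module is imported, before any request is served.
# default_encoding = load_encoding("cl100k_base")


@lru_cache(maxsize=None)
def get_default_encoding():
    """Lazy style: the load runs on the first call and is cached afterwards."""
    return load_encoding("cl100k_base")
```

With the lazy form, importing the module never touches the loader, so a missing network only matters once something actually asks for the encoding.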

Fix / Workaround

Workarounds tried

Preloading via python -c "import tiktoken; tiktoken.get_encoding('cl100k_base')" in Dockerfile → still fails (tiktoken re-attempts fetch on runtime import)
Downgrade to pre-regression tag (e.g. main-v1.80.8-stable or earlier) → often resolves it (but loses newer features)
Mounting the cache dir as a volume with pre-downloaded file → same issue if import chain bypasses cache check
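One thing worth ruling out when a preloaded cache appears to be ignored is that the running process resolves a different cache directory than the one used at build time, for example because an entrypoint script or another library changed or unset the variable. The sketch below mirrors the lookup order that tiktoken applies as far as I understand it (`TIKTOKEN_CACHE_DIR`, then `DATA_GYM_CACHE_DIR`, then a temp-dir default). The function name is hypothetical.

```python
import os
import tempfile


def resolve_tiktoken_cache_dir(env=None):
    """Return (cache_dir, source) following the assumed tiktoken precedence."""
    env = os.environ if env is None else env
    for var in ("TIKTOKEN_CACHE_DIR", "DATA_GYM_CACHE_DIR"):
        if var in env:
            # Note: an empty string here is understood to disable caching.
            return env[var], var
    # Neither variable set: fall back to a directory under the system temp dir.
    return os.path.join(tempfile.gettempdir(), "data-gym-cache"), "default"


if __name__ == "__main__":
    cache_dir, source = resolve_tiktoken_cache_dir()
    print(f"cache dir: {cache_dir!r} (from {source})")
```

Running this inside the container, in the same environment the proxy starts in, shows whether the value still points at `/app/tiktoken-cache`.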

Code Example

FROM ghcr.io/berriai/litellm-non_root:main-v1.81.3-stable

WORKDIR /app

ENV TIKTOKEN_CACHE_DIR=/app/tiktoken-cache \
    DATA_GYM_CACHE_DIR=/app/tiktoken-cache

RUN mkdir -p "${TIKTOKEN_CACHE_DIR}" && \
    python -c "import tiktoken; tiktoken.get_encoding('cl100k_base')"

# (your other layers: chmod entrypoint, expose, etc.)
RUN chmod +x ./docker/entrypoint.sh

EXPOSE 4000/tcp

CMD ["--port", "4000", "--config", "/app/proxy_config.yaml", "--detailed_debug"]


Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Describe the bug
In a fully air-gapped/disconnected OpenShift cluster (no outbound internet, no DNS resolution to external domains like openaipublic.blob.core.windows.net), the LiteLLM pod enters CrashLoopBackOff immediately on startup.

This is a known regression in v1.81.x (see #19852 – "tiktoken eager loading regression in v1.81.x - startup fails without internet access").

Expected behavior

Preloading the encoding during image build (with internet) + setting TIKTOKEN_CACHE_DIR should allow fully offline startup.
litellm should not trigger eager tiktoken loading at import time in air-gapped setups (or respect cache before attempting network fetch).

Workarounds tried

Preloading via python -c "import tiktoken; tiktoken.get_encoding('cl100k_base')" in Dockerfile → still fails (tiktoken re-attempts fetch on runtime import)
Downgrade to pre-regression tag (e.g. main-v1.80.8-stable or earlier) → often resolves it (but loses newer features)
Mounting the cache dir as a volume with pre-downloaded file → same issue if import chain bypasses cache check

Steps to Reproduce

Use disconnected OpenShift 4.x cluster (no egress to Azure blob or internet)
Build/deploy a custom image based on ghcr.io/berriai/litellm-non_root:main-v1.81.3-stable with this Dockerfile to preload the tiktoken cache during build (when internet is available):

FROM ghcr.io/berriai/litellm-non_root:main-v1.81.3-stable

WORKDIR /app

ENV TIKTOKEN_CACHE_DIR=/app/tiktoken-cache \
    DATA_GYM_CACHE_DIR=/app/tiktoken-cache

RUN mkdir -p "${TIKTOKEN_CACHE_DIR}" && \
    python -c "import tiktoken; tiktoken.get_encoding('cl100k_base')"

# (your other layers: chmod entrypoint, expose, etc.)
RUN chmod +x ./docker/entrypoint.sh

EXPOSE 4000/tcp

CMD ["--port", "4000", "--config", "/app/proxy_config.yaml", "--detailed_debug"]

Build succeeds (cache file 9b5ad71b2ce5302211f9c61530b329a4922fc6a4 is created in /app/tiktoken-cache)
Push to internal registry and deploy in air-gapped cluster
Pod crashes with:

File "/usr/lib/python3.13/site-packages/tiktoken/load.py", line 24, in read_file
  resp = requests.get(blobpath)
...
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='openaipublic.blob.core.windows.net', port=443): Max retries exceeded with url: /encodings/cl100k_base.tiktoken (Caused by NameResolutionError("... [Errno -2] Name or service not known"))

Even though the file exists in TIKTOKEN_CACHE_DIR, tiktoken still attempts the HTTP fetch during early import.
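A quick way to narrow this down is to check, from inside the running container, that the cache entry is present, readable, and non-empty under the name tiktoken is expected to look for. The sketch assumes the cache file name is the SHA-1 hex digest of the download URL, which should correspond to the file name reported in the build step above. The helper names are hypothetical. OpenShift typically runs containers under an arbitrary UID, so the read-permission check is included deliberately.

```python
import hashlib
import os

# URL assembled from the host and path shown in the traceback.
CL100K_URL = "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken"


def cache_key(blob_url):
    """SHA-1 hex digest of the URL, assumed to be the cache file name."""
    return hashlib.sha1(blob_url.encode()).hexdigest()


def cached_file_status(cache_dir, blob_url=CL100K_URL):
    """Describe whether a usable cache entry for blob_url exists in cache_dir."""
    path = os.path.join(cache_dir, cache_key(blob_url))
    if not os.path.isfile(path):
        return "missing"
    if not os.access(path, os.R_OK):
        return "unreadable"  # e.g. file owned by a build-time UID
    if os.path.getsize(path) == 0:
        return "empty"
    return "ok"


if __name__ == "__main__":
    directory = os.environ.get("TIKTOKEN_CACHE_DIR", "")
    print(cache_key(CL100K_URL), cached_file_status(directory))
```

Any result other than "ok" points at the image or volume rather than at the import chain.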

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

non root main-v1.81.3-stable


extent analysis

Fix Plan

To resolve the issue with tiktoken eager loading in air-gapped OpenShift clusters, follow these steps:

Patch the tiktoken library: Modify the tiktoken library to respect the cache directory before attempting to fetch the encoding from the network.
Update the Dockerfile: Update the Dockerfile to include the patched tiktoken library and ensure that the cache directory is properly configured.

Example code changes:

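# Patched tiktoken/load.py (illustrative sketch; the signature is simplified)
import hashlib
import os

import requests

def read_file(blobpath, cache_dir):
    # The cache file is named after the SHA-1 of the source URL
    # (e.g. 9b5ad71b... for cl100k_base), not after the URL's basename.
    cache_key = hashlib.sha1(blobpath.encode()).hexdigest()
    cache_file = os.path.join(cache_dir, cache_key)
    if os.path.exists(cache_file):
        with open(cache_file, 'rb') as f:
            return f.read()
    # Attempt to fetch from the network only if the cache file does not exist
    resp = requests.get(blobpath)
    resp.raise_for_status()
    return resp.content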

# Updated Dockerfile
FROM ghcr.io/berriai/litellm-non_root:main-v1.81.3-stable

WORKDIR /app

ENV TIKTOKEN_CACHE_DIR=/app/tiktoken-cache \
    DATA_GYM_CACHE_DIR=/app/tiktoken-cache

# Copy patched tiktoken library
COPY patched_tiktoken /usr/lib/python3.13/site-packages/tiktoken

RUN mkdir -p "${TIKTOKEN_CACHE_DIR}" && \
    python -c "import tiktoken; tiktoken.get_encoding('cl100k_base')"

# (your other layers: chmod entrypoint, expose, etc.)
RUN chmod +x ./docker/entrypoint.sh

EXPOSE 4000/tcp

CMD ["--port", "4000", "--config", "/app/proxy_config.yaml", "--detailed_debug"]

Verification

To verify that the fix worked, build and deploy the updated image in the air-gapped cluster and check that the pod starts successfully without attempting to fetch the encoding from the network.
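Before pushing to the disconnected cluster, the same check can be approximated locally by making outgoing connections fail inside the image and then importing the package. The following is a minimal sketch under that assumption. `no_network` is a hypothetical helper, it only blocks `socket.connect` (DNS lookups are not intercepted), and it is no substitute for testing in the real cluster.

```python
import contextlib
import socket


@contextlib.contextmanager
def no_network():
    """Make every outgoing socket connection fail while the block runs."""
    original_connect = socket.socket.connect

    def blocked_connect(self, address):
        raise ConnectionError(f"network disabled for test, tried {address!r}")

    socket.socket.connect = blocked_connect
    try:
        yield
    finally:
        # Restore normal behaviour even if the block raised.
        socket.socket.connect = original_connect


# Usage inside the built image (not executed here):
# with no_network():
#     import litellm  # should import cleanly if the cache is honoured
```

If the import raises a connection error under `no_network`, the image is still trying to download the encoding and will likely crash in the air-gapped cluster as well.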

Extra Tips

Ensure that the cache directory is properly configured and that the cache file exists in the expected location.
Consider implementing a fallback mechanism to handle cases where the cache file is missing or corrupted.
Keep in mind that this fix is specific to the tiktoken library and may need to be adapted for other libraries or dependencies that exhibit similar behavior.
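The fallback for a missing or corrupted cache file could look like the sketch below, which fails with a clear message instead of a network timeout. It assumes the operator records a SHA-256 digest of the encoding file at image build time. Both function names and the `expected_sha256` parameter are hypothetical and not part of tiktoken or litellm.

```python
import hashlib


def read_verified_cache(path, expected_sha256):
    """Return the cached bytes if present and intact, otherwise None."""
    try:
        with open(path, "rb") as f:
            data = f.read()
    except OSError:
        return None  # missing or unreadable
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        return None  # corrupted or truncated
    return data


def load_offline(path, expected_sha256):
    """Load from cache only; raise a descriptive error instead of going online."""
    data = read_verified_cache(path, expected_sha256)
    if data is None:
        raise RuntimeError(
            f"offline cache entry {path!r} is missing or corrupted; "
            "rebuild the image with the encoding preloaded"
        )
    return data
```

Failing fast this way turns a CrashLoopBackOff with a DNS error into a message that names the actual problem.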
