litellm: LiteLLM keeps throwing 404 Error with Modal Endpoint

GitHub stats
BerriAI/litellm#25129 · Fetched 2026-04-08 02:44:59
Comments: 0
Participants: 1
Timeline events: 2 (labeled ×1, renamed ×1)
Reactions: 0

My Modal endpoint is accessible via my Modal client code (without LiteLLM) and Postman, but through LiteLLM it keeps throwing a 404 error. Below is my setup.

Commands to start LiteLLM

docker compose up -v
docker compose restart litellm

litellm_config.yaml

# ----------------------------
# LiteLLM Configuration
# ----------------------------
model_list:
  # Single model entry, LoRA toggle controlled by server
  - model_name: LoRAfrica
    litellm_params:
      model: openai/LoRAfrica         # The model identifier used by the server
      api_base: https://modal_username--v1.modal.run/v1  # Replace with your Modal endpoint; /v1 is added for OpenAI endpoint compatibility
      api_key: os.environ/MODAL_API_KEY
      input_cost_per_token: 0.000000224
      output_cost_per_token: 0.000000576

general_settings:
  master_key: sk-dummy-key
  database_url: os.environ/DATABASE_URL
  store_model_in_db: true
  store_prompts_in_spend_logs: true
  budget_blocking: true

  alerting:
    - slack
  alerting_threshold: 70
  slack_webhook_url: os.environ/SLACK_WEBHOOK_URL

litellm_settings:
  set_verbose: true
  success_callback: ["langsmith"]
  # drop_params: true

docker-compose.yaml

version: "3.9"

services:
  # ===========================================
  # PostgreSQL for Spend Tracking
  # ===========================================
  postgres:
    image: postgres:15
    container_name: litellm-postgres
    environment:
      POSTGRES_USER: litellm
      POSTGRES_PASSWORD: litellm
      POSTGRES_DB: litellm
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U litellm"]
      interval: 5s
      timeout: 5s
      retries: 10

  # ===========================================
  # LiteLLM Proxy
  # ===========================================
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    container_name: litellm-proxy
    ports:
      - "4000:4000"
    volumes:
      - ./litellm_config.yaml:/app/config.yaml
    environment:
     #  - OPENAI_API_KEY=${OPENAI_API_KEY}
      - MODAL_API_KEY=${MODAL_API_KEY}  # Uncomment if using Modal
      - DATABASE_URL=postgresql://litellm:litellm@postgres:5432/litellm
      # Admin UI access
      - LITELLM_MASTER_KEY=sk-dummy-key
      - UI_USERNAME=admin
      - UI_PASSWORD=admin
      # Slack alerts
      - SLACK_WEBHOOK_URL=${SLACK_WEBHOOK_URL:-}
      # LangSmith tracing
      - LANGSMITH_API_KEY=${LANGSMITH_API_KEY:-}
      - LANGSMITH_PROJECT=${LANGSMITH_PROJECT:-litellm-proxy}
      - LANGSMITH_TRACING=true
      # # Optional: Connect to your local Langfuse
      # - LANGFUSE_PUBLIC_KEY=${LANGFUSE_PUBLIC_KEY:-}
      # - LANGFUSE_SECRET_KEY=${LANGFUSE_SECRET_KEY:-}
      # - LANGFUSE_HOST=${LANGFUSE_HOST:-http://host.docker.internal:3000}
    command:
      - "--config"
      - "/app/config.yaml"
      - "--port"
      - "4000"
      - "--detailed_debug"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    depends_on:
      postgres:
        condition: service_healthy

volumes:
  postgres_data:

Client code

"""
LiteLLM chat client with LoRA toggle and server-side token usage.

Usage Examples:

# Non-streaming
python client.py "Briefly detail the significance story of the Igbo god Amadioha?" --model lora
python client.py "Briefly detail the significance story of the Igbo god Amadioha?" --model base

# Streaming mode
python client.py --stream "Briefly detail the significance story of the Igbo god Amadioha?" --model lora
python client.py --stream "Briefly detail the significance story of the Igbo god Amadioha?" --model base

# Interactive mode (streaming by default)
python client.py --model lora
python client.py --model base
"""

import argparse
from openai import OpenAI

# ---------------------
# Configuration
# ---------------------
LITELLM_URL = "http://localhost:4000"  # LiteLLM proxy URL
LITELLM_KEY = "sk-dummy-key"   # Master key from LiteLLM (this key changes when I create a virtual key)
MODEL_NAME = "LoRAfrica"               # Single model registered in litellm_config.yaml

SYSTEM_PROMPT = (
    "You are a helpful AI assistant specialised in African history "
    "which gives concise answers to questions asked."
)

# ---------------------
# Initialize client
# ---------------------
client = OpenAI(base_url=LITELLM_URL, api_key=LITELLM_KEY)

# ---------------------
# Helpers
# ---------------------
def send_request(prompt: str, use_lora: bool = True, max_tokens: int = 128):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": prompt}
    ]

    response = client.chat.completions.create(
        model=MODEL_NAME,
        messages=messages,
        max_tokens=max_tokens,
        temperature=0.1,
        extra_body={"use_lora": use_lora}
    )

    if response.usage:
        print(
            f"Token usage -> Prompt: {response.usage.prompt_tokens}, "
            f"Completion: {response.usage.completion_tokens}, "
            f"Total: {response.usage.total_tokens}"
        )

    return response.choices[0].message.content


def send_request_stream(prompt: str, use_lora: bool = True, max_tokens: int = 128):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": prompt}
    ]

    stream = client.chat.completions.create(
        model=MODEL_NAME,
        messages=messages,
        max_tokens=max_tokens,
        temperature=0.1,
        stream=True,
        extra_body={"use_lora": use_lora}
    )

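    for chunk in stream:
        # Some chunks (for example a trailing usage chunk) carry no choices; skip them.
        if not chunk.choices:
            continue
        # The SDK returns typed objects, so read the attribute instead of indexing.
        content = chunk.choices[0].delta.content
        if content:
            yield content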

# ---------------------
# Main
# ---------------------
def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("input_text", nargs="?", default=None)
    parser.add_argument("--model", choices=["base", "lora"], default="lora")
    parser.add_argument("--stream", action="store_true")
    args = parser.parse_args()

    use_lora = args.model == "lora"

    print(f"Connected to {LITELLM_URL}")
    print(f"Using LoRA: {use_lora}")
    print("-" * 50)

    if args.input_text:
        prompt = args.input_text
        if args.stream:
            print("Bot: ", end="", flush=True)
            for token in send_request_stream(prompt, use_lora):
                print(token, end="", flush=True)
            print()
        else:
            response = send_request(prompt, use_lora)
            print(f"\nResponse:\n{response}")

    else:
        # Interactive mode
        print("Interactive mode (streaming). Type 'quit' to exit.\n")
        while True:
            try:
                user_input = input("You: ").strip()
                if user_input.lower() in ["quit", "exit", "q"]:
                    print("Goodbye!")
                    break
                if not user_input:
                    continue

                print("Bot: ", end="", flush=True)
                for token in send_request_stream(user_input, use_lora):
                    print(token, end="", flush=True)
                print("\n")

            except KeyboardInterrupt:
                print("\nGoodbye!")
                break


if __name__ == "__main__":
    main()

Error

(llm_deep)'~\modal\litellm>python client.py "Briefly detail the significance story of the Igbo god Amadioha?" --model lora
Connected to http://localhost:4000
Using LoRA: True
--------------------------------------------------
Traceback (most recent call last):
  File "~\deployment\modal\litellm\client.py", line 136, in <module>
    main()
  File "~\deployment\modal\litellm\client.py", line 110, in main
    response = send_request(prompt, use_lora)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~\deployment\modal\litellm\client.py", line 48, in send_request
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~\llmai\llm_deep\Lib\site-packages\openai\_utils\_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "~\llmai\llm_deep\Lib\site-packages\openai\resources\chat\completions\completions.py", line 1192, in create
    return self._post(
           ^^^^^^^^^^^
  File "~\llmai\llm_deep\Lib\site-packages\openai\_base_client.py", line 1297, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~\llmai\llm_deep\Lib\site-packages\openai\_base_client.py", line 1070, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'error': {'message': "litellm.NotFoundError: NotFoundError: OpenAIException - Error code: 404 - {'detail': 'Not Found'}. Received Model Group=LoRAfrica\nAvailable Model Group Fallbacks=None", 'type': None, 'param': None, 'code': '404'}}

extent analysis

TL;DR

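The 404 body, {'detail': 'Not Found'}, arrives wrapped in an OpenAIException together with "Received Model Group=LoRAfrica". That suggests the proxy resolved the model group and the 404 came back from the Modal endpoint it called, most likely because the path LiteLLM builds from api_base does not match a route the Modal app serves.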

Guidance

  1. Verify Model Name: With model: openai/LoRAfrica, LiteLLM drops the openai/ prefix and sends LoRAfrica as the model field to the upstream server. Confirm the Modal server accepts that exact name.
  2. Check Endpoint Setup: An OpenAI-compatible client appends /chat/completions to api_base, so with the config shown the request should go to https://modal_username--v1.modal.run/v1/chat/completions. Compare this with the exact URL that works in Postman; if the working URL has no /v1 prefix or uses a different route, adjust api_base or the Modal app.
  3. Review LiteLLM Logs: The proxy is already started with --detailed_debug, so the container logs (docker compose logs litellm) should show the outgoing request. Check the URL and model name it actually sends.
  4. Test with Different Models: Point a second model_list entry at a known-good OpenAI-compatible endpoint to tell whether the problem is specific to the Modal deployment or general to the proxy setup.
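
The endpoint check in step 2 can be tried outside the proxy. The following is a minimal sketch, not LiteLLM's own code: it assumes the openai/ provider posts to api_base plus /chat/completions, as the OpenAI SDK does, and that the Modal app expects a Bearer token. The helper names build_chat_url and probe are hypothetical, and the URL in the usage comment is the placeholder from the config.

```python
import json
import urllib.error
import urllib.request


def build_chat_url(api_base: str) -> str:
    """Return the URL an OpenAI-compatible client would POST to for this api_base."""
    return api_base.rstrip("/") + "/chat/completions"


def probe(api_base: str, api_key: str, model: str, timeout: float = 30.0) -> int:
    """POST a tiny chat request straight to the upstream and return the HTTP status.

    Connection-level failures (DNS, TLS, timeouts) are not caught and will raise.
    """
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 1,
    }).encode("utf-8")
    request = urllib.request.Request(
        build_chat_url(api_base),
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumption: Bearer auth
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses land here; a 404 means the route is missing upstream.
        return err.code


# Usage (performs a real network call, so it is left commented out):
# import os
# base = "https://modal_username--v1.modal.run/v1"  # placeholder from the config
# print(probe(base, os.environ["MODAL_API_KEY"], "LoRAfrica"))
```

If the probe returns 404 for the /v1 base but succeeds without it, the mismatch is in api_base rather than in LiteLLM.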

Example

The issue appears to be a configuration problem rather than a code-syntax one, so the fix is expected in litellm_config.yaml or in the Modal deployment, not in the request logic of client.py.

Notes

A NotFoundError can originate in the proxy (for example, an unknown model group) or in the upstream server (an unknown route or model). Here the message names the model group LoRAfrica and carries the body {'detail': 'Not Found'}, which is the default 404 body of a FastAPI application, so the upstream Modal app is the more likely source.
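
To make that reading of the error string repeatable, the relevant parts can be pulled out with a small parser. This is a sketch under assumptions: the function name summarize_proxy_404 is hypothetical, the patterns are fitted to the single message shown in this issue, and the "likely upstream" flag is a heuristic rather than documented LiteLLM behaviour.

```python
import re


def summarize_proxy_404(message: str) -> dict:
    """Extract the provider wrapper, upstream body and model group from an error string."""
    provider = re.search(
        r"(\w+Exception) - Error code: (\d+) - (\{.*?\})\.", message
    )
    group = re.search(r"Received Model Group=(\S+)", message)
    return {
        "provider_exception": provider.group(1) if provider else None,
        "upstream_status": int(provider.group(2)) if provider else None,
        "upstream_body": provider.group(3) if provider else None,
        "model_group": group.group(1) if group else None,
        # Heuristic (assumption): a provider wrapper means the proxy resolved the
        # model group and the error came back from the api_base it called.
        "likely_upstream": provider is not None,
    }


message = (
    "litellm.NotFoundError: NotFoundError: OpenAIException - Error code: 404 - "
    "{'detail': 'Not Found'}. Received Model Group=LoRAfrica\n"
    "Available Model Group Fallbacks=None"
)
summary = summarize_proxy_404(message)
print(summary["model_group"], summary["upstream_status"], summary["likely_upstream"])
```

For the message in this issue, the summary reports the OpenAIException wrapper, status 404, and model group LoRAfrica, which is consistent with an upstream routing problem.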

Recommendation

Start by comparing the URL that works in Postman with api_base plus /chat/completions. If they differ, correct api_base (or the route served by the Modal app), then recheck the model name and the proxy logs before retrying the "LoRAfrica" model through LiteLLM.
