litellm - 💡(How to fix) Fix [Feature Request]: Support for NVIDIA NeMo Guardrails (Python Library & Proxy API Server) [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
BerriAI/litellm#25255Fetched 2026-04-08 03:02:28
View on GitHub
Comments
0
Participants
1
Timeline
1
Reactions
0
Participants
Timeline (top)
labeled ×1

Error Message

  1. If blocked by the input guardrails, LiteLLM returns an immediate standard error/blocked response.

Root Cause

Describe alternatives you've considered

  1. Using NeMo Guardrails as the entry point: Running the NeMo Guardrails server and configuring it to call the LiteLLM Proxy as its LLM provider. Drawback: We lose the ability to use LiteLLM's API key management, rate limiting, and user-level spend tracking because LiteLLM only sees a single user (the NeMo server).
  2. Custom LiteLLM Middleware: Writing custom callbacks (custom_callbacks) to route requests through NeMo Guardrails. Drawback: Requires maintaining custom Python code rather than relying on a standardized, containerized LiteLLM proxy deployment.

Code Example

model_list:
  - model_name: my-secure-gpt
    litellm_params:
      model: openai/gpt-4
      
litellm_settings:
  guardrails:
    - nemo_guardrails:
        config_path: "./nemo_config/" # Path to the Colang and YAML files
        # Optional: Run on input, output, or both
        mode: "both"
RAW_BUFFERClick to expand / collapse

Is your feature request related to a problem? Please describe. As enterprise adoption of LLMs grows, so does the need for robust, programmable guardrails to ensure output safety, prevent hallucinations, and restrict topic deviation. While LiteLLM currently supports some guardrail providers (like Lakera, Prompt Injection, and Presidio), it lacks native support for NVIDIA NeMo Guardrails, which has become an industry standard for open-source, highly customizable guardrails.

Currently, if we want to use NeMo Guardrails with LiteLLM, we have to run NeMo Guardrails as a separate upstream service or wrap LiteLLM calls in custom code. This adds architectural complexity, increases latency, and prevents us from utilizing LiteLLM's native Proxy Server features (like spend tracking, load balancing, and API key management) as the primary entry point.

Describe the solution you'd like I would like to see native integration of NVIDIA NeMo Guardrails into LiteLLM, specifically within the Proxy API Server and the standard Python library.

Ideally, it would function similarly to existing LiteLLM guardrail integrations, where we can specify a path to the NeMo Guardrails configuration directory in the config.yaml.

Proposed Implementation Idea for Proxy config.yaml:

model_list:
  - model_name: my-secure-gpt
    litellm_params:
      model: openai/gpt-4
      
litellm_settings:
  guardrails:
    - nemo_guardrails:
        config_path: "./nemo_config/" # Path to the Colang and YAML files
        # Optional: Run on input, output, or both
        mode: "both"

Workflow:

  1. User sends a request to the LiteLLM Proxy.
  2. LiteLLM intercepts the request and passes it to the NeMo Guardrails instance (initialized via the provided config_path).
  3. If blocked by the input guardrails, LiteLLM returns an immediate standard error/blocked response.
  4. If passed, LiteLLM routes the request to the upstream LLM.
  5. (If configured) Output is evaluated by NeMo Guardrails before being returned to the user.

Describe alternatives you've considered

  1. Using NeMo Guardrails as the entry point: Running the NeMo Guardrails server and configuring it to call the LiteLLM Proxy as its LLM provider. Drawback: We lose the ability to use LiteLLM's API key management, rate limiting, and user-level spend tracking because LiteLLM only sees a single user (the NeMo server).
  2. Custom LiteLLM Middleware: Writing custom callbacks (custom_callbacks) to route requests through NeMo Guardrails. Drawback: Requires maintaining custom Python code rather than relying on a standardized, containerized LiteLLM proxy deployment.

Additional context

  • NVIDIA NeMo Guardrails Repo: https://github.com/NVIDIA/NeMo-Guardrails
  • Because NeMo Guardrails has a Python API (LLMRails), it should be relatively straightforward to integrate it into LiteLLM's async pre_call_hook and post_call_hook architecture.
  • This feature would make LiteLLM incredibly attractive to enterprise teams looking for a unified routing, logging, and safety layer.

extent analysis

TL;DR

Integrate NVIDIA NeMo Guardrails into LiteLLM by implementing a native guardrail provider that utilizes the NeMo Guardrails Python API (LLMRails) within the Proxy API Server and standard Python library.

Guidance

  • Investigate the LLMRails API to understand how to initialize and use NeMo Guardrails instances programmatically.
  • Modify the LiteLLM Proxy API Server to accept a config_path parameter for NeMo Guardrails configuration, similar to existing guardrail integrations.
  • Implement pre_call_hook and post_call_hook functions to integrate NeMo Guardrails into the LiteLLM request workflow, allowing for input and output evaluation.
  • Consider adding support for optional mode parameter to control when NeMo Guardrails are applied (input, output, or both).

Example

import LLMRails

# Initialize NeMo Guardrails instance
nemo_guardrails = LLMRails.init(config_path="./nemo_config/")

# Pre-call hook to evaluate input
def pre_call_hook(request):
    if not nemo_guardrails.evaluate_input(request):
        return "Input blocked by NeMo Guardrails"

# Post-call hook to evaluate output
def post_call_hook(response):
    if not nemo_guardrails.evaluate_output(response):
        return "Output blocked by NeMo Guardrails"

Notes

  • The proposed implementation idea for the Proxy config.yaml file provides a clear direction for integrating NeMo Guardrails into LiteLLM.
  • The LLMRails API documentation should be consulted to ensure correct usage and initialization of NeMo Guardrails instances.

Recommendation

Apply workaround by implementing a custom integration of NeMo Guardrails into LiteLLM using the LLMRails API, as a native integration is not currently available. This will allow for a more streamlined and standardized deployment of LiteLLM with NeMo Guardrails.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

litellm - 💡(How to fix) Fix [Feature Request]: Support for NVIDIA NeMo Guardrails (Python Library & Proxy API Server) [1 participants]