hermes - Feature Request: First-class support for preset routers, split endpoints, and serverless inference providers




Is your feature request related to a problem? Please describe.

Hermes's current custom_providers system works well for standard OpenAI-compatible endpoints but has friction with several modern inference patterns:

  1. OpenRouter Presets: Private @preset/* models trigger model-discovery warnings, and discovery pollutes the model list with 300+ public models unless discover_models: false is added manually (undocumented for this use case).

  2. Split Endpoints (same base_url, different keys): Providers like Wafer AI have multiple product tiers (pass, serverless) that share one base_url but require different API keys. The current deduplication logic (seen_name_url_pairs) silently drops entries with a duplicate base_url, forcing users to inline keys in model_aliases.

  3. NVIDIA NIM: No native provider profile; users must use a custom provider with base_url: https://integrate.api.nvidia.com/v1. Model catalog discovery works, but NIM-specific features (e.g., nvidia/ namespace handling) are missing.

  4. Nebius Token Factory: Token-based auth (long-lived service tokens) works, but the docs don't clearly distinguish it from standard API keys.

  5. Modal Direct: No built-in support for Modal's direct inference endpoints (https://api.<region>.modal.direct/v1/chat/completions).

Describe the solution you'd like

1. Native openrouter_presets provider type

custom_providers:
  - name: my-openrouter-presets
    type: openrouter_presets   # new type
    api_key: sk-or-v1-...
    presets:
      - "@preset/some-name"
      - "@preset/some-other-name"

  • Automatically sets discover_models: false
  • Validates preset slug format (@preset/*)
  • Suppresses "not found in model listing" warnings for preset models
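
The three behaviors above could be sketched roughly as follows. This is only an illustration, not Hermes code: the function name normalize_presets_provider, the exact slug pattern, and the default base_url fallback are assumptions made for the example (the URL itself is the one used in the production config in this issue).

```python
import re

# Assumed slug rule for illustration: "@preset/" followed by letters,
# digits, ".", "_" or "-". The real OpenRouter rule may differ.
PRESET_SLUG = re.compile(r"^@preset/[A-Za-z0-9][A-Za-z0-9._-]*$")


def normalize_presets_provider(entry: dict) -> dict:
    """Return a copy of an openrouter_presets entry with defaults applied.

    Hypothetical helper: validates preset slugs and forces
    discover_models to False so the public catalog is never fetched.
    """
    if entry.get("type") != "openrouter_presets":
        return dict(entry)  # other provider types pass through untouched

    bad = [p for p in entry.get("presets", []) if not PRESET_SLUG.match(p)]
    if bad:
        raise ValueError(f"invalid preset slug(s): {bad}")

    normalized = dict(entry)
    normalized.setdefault("base_url", "https://openrouter.ai/api/v1")
    normalized["discover_models"] = False
    return normalized
```

With discover_models forced off, the "not found in model listing" check has no catalog to compare against, which is one way the warning could be suppressed for preset models.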

2. Explicit provider_id for split endpoints

custom_providers:
  - name: wafer-pass
    provider_id: wafer-pass    # new field
    base_url: https://pass.wafer.ai/v1
    api_key: wfr_...
  - name: wafer-serverless
    provider_id: wafer-serverless
    base_url: https://pass.wafer.ai/v1
    api_key: wfr_...

  • Deduplication keys on (provider_id, base_url) instead of (name, base_url, model)
  • Allows same base_url with different keys
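
A minimal sketch of the proposed deduplication, assuming provider entries are plain dicts loaded from YAML. The function dedupe_providers and the fallback to name when provider_id is absent are hypothetical choices for this example; they do not describe the existing seen_name_url_pairs logic.

```python
def dedupe_providers(entries: list[dict]) -> list[dict]:
    """Drop repeated provider entries, keyed on (provider_id, base_url).

    Hypothetical helper: two entries that share a base_url are both kept
    as long as their provider_id differs, so each keeps its own api_key.
    """
    seen: set[tuple[str, str]] = set()
    kept: list[dict] = []
    for entry in entries:
        # Assumed fallback: entries without provider_id are keyed by name.
        provider_id = entry.get("provider_id") or entry["name"]
        key = (provider_id, entry["base_url"].rstrip("/"))
        if key in seen:
            continue  # true duplicate: same id and same endpoint
        seen.add(key)
        kept.append(entry)
    return kept
```

Applied to the two Wafer entries above, both survive because their provider_id values differ, while an exact repeat of either entry would still be dropped.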

3. Native nvidia_nim provider

providers:
  nvidia_nim:
    api_key: nvapi-...
    default_model: deepseek

  • Built-in profile with correct models_url, headers, and nvidia/ namespace handling

4. Native modal provider

providers:
  modal:
    api_key: modalresearch_...
    base_url: https://api.us-west-2.modal.direct/v1
    default_model: zai-org/GLM-5.1-FP8
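
A built-in modal provider could derive its URLs from a region instead of requiring a full base_url. The sketch below assumes the https://api.<region>.modal.direct/v1 pattern quoted earlier in this issue; the helper names and the region format check are hypothetical.

```python
import re

# Assumed region shape for illustration, e.g. "us-west-2".
REGION = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")


def modal_base_url(region: str) -> str:
    """Build the Modal direct-inference base URL for a region."""
    if not REGION.match(region):
        raise ValueError(f"unexpected region format: {region!r}")
    return f"https://api.{region}.modal.direct/v1"


def modal_chat_url(region: str) -> str:
    """Build the chat completions endpoint under the base URL."""
    return modal_base_url(region) + "/chat/completions"
```

For the config above, modal_base_url("us-west-2") yields the same base_url that is written out by hand.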

5. Better discover_models documentation

Document discover_models: false for:

  • Private endpoints
  • Preset-only providers
  • Dedicated single-model endpoints
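
To make the documented behavior concrete, here is a rough sketch of how a discover_models flag might gate catalog fetching. It assumes discovery is on by default (consistent with the workaround of adding discover_models: false by hand); list_models and the fetch_catalog callback are hypothetical names, not Hermes APIs.

```python
from typing import Callable


def list_models(entry: dict, fetch_catalog: Callable[[str], list[str]]) -> list[str]:
    """Return the models to show for one provider entry.

    Hypothetical helper: when discover_models is False the remote
    catalog is never queried and only the configured model is listed.
    """
    if entry.get("discover_models", True):
        return fetch_catalog(entry["base_url"])
    # Discovery disabled: expose only what the config names explicitly.
    return [entry["model"]] if "model" in entry else []
```

All three cases in the list reduce to the same branch: the endpoint either has no useful catalog or has one the user does not want merged into the model list.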

Describe alternatives you've considered

Current workarounds:

  • OpenRouter presets: discover_models: false + custom provider (works but undocumented)
  • Split endpoints: Inline api_key in model_aliases (fragile, repeats key)
  • NIM/Modal: Generic custom provider (loses type-specific optimizations)

Additional context

Our production custom_providers config for reference:

custom_providers:
  - name: wafer-pass
    base_url: https://pass.wafer.ai/v1
    api_key: wfr_...
    model: GLM-5.1
  - name: wafer-serverless
    base_url: https://pass.wafer.ai/v1
    api_key: wfr_...
    model: deepseek-v4-flash
  - name: Openrouter Presets
    base_url: https://openrouter.ai/api/v1
    api_key: sk-or-v1-...
    discover_models: false
    model: '@preset/some-name'

Checklist for maintainers

  • openrouter_presets provider type
  • provider_id field for split endpoints
  • nvidia_nim built-in provider
  • modal built-in provider
  • Document discover_models: false patterns
