hermes - ✅(Solved) Fix [Feature]: Custom OpenAI-compatible chat completions backend for web_search (Perplexity Sonar / ChatGPT browsing / etc.) [1 pull requests, 1 comments, 1 participants]

NousResearch/hermes-agent#12832 · Fetched 2026-04-20 12:16:46

Error Message

Reuse the same chat endpoint: send one message per URL asking the model to fetch + summarise the page into markdown. This works because search-augmented models (Sonar, browsing-enabled ChatGPT) can fetch URLs directly. Each URL becomes one document in the standard web_extract output shape (url, title, content, raw_content, metadata), with per-URL error isolation.

Fix Action

Fix / Workaround

  • Draft implementation: PR #8858 (feat: add custom OpenAI-compatible search backend). The PR does not carry a "Fixes" reference to this issue, so the two can be reviewed / merged independently; the issue documents the design and motivation separately from the patch.
  • Related but separate feature requests:
    • #10284 — custom JSON search backend (4get / SearXNG shape)
    • #10644 — native Brave Search backend

PR fix notes

PR #8858: feat: add custom OpenAI-compatible search backend

Description (problem / solution / changelog)

Add a new custom web search backend that works with any OpenAI-compatible chat completions endpoint with built-in web search (e.g. Perplexity Sonar, ChatGPT with browsing).

Configuration via config.yaml:

    web:
      backend: custom
      custom_base_url: https://api.example.com/v1
      custom_model: model-name
      custom_api_key: sk-xxx

Or via environment variables: CUSTOM_SEARCH_BASE_URL, CUSTOM_SEARCH_MODEL, CUSTOM_SEARCH_API_KEY

The backend extracts structured search results from the response's search_results / citations fields, falling back to the answer text when neither is present. It also supports web_extract via chat completion.

What does this PR do?

Adds a fifth web_search / web_extract backend — custom — for any OpenAI-compatible /chat/completions endpoint that returns search citations inline. Concrete targets: Perplexity Sonar, ChatGPT with browsing, self-hosted LLMs with web access, LiteLLM or vLLM wrapping a search-augmented model.

Why this belongs in tools/web_tools.py next to Exa / Tavily / Firecrawl / Parallel rather than as a skill:

  • web_search / web_extract are first-class tools invoked by every agent and every platform toolset — a skill would be invisible unless explicitly loaded.
  • The web: config block is already the canonical place users expect to configure search; adding a skill would fragment the mental model.
  • The response-parsing logic (search_results[] → citations[] → answer text) is proxy-shape normalization, which is the same responsibility the other backends already discharge inside web_tools.py.

This is deliberately distinct from #10284 / #10414 (a custom JSON backend for 4get / SearXNG) — that path uses GET with url_template + results_path field mapping, this path uses POST /chat/completions with citation extraction. Different request shape, different response parser, different extract semantics; they want to coexist.

Related Issue

#12832

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

tools/web_tools.py (+219/-4) — new Custom Chat Completions Backend section with:

  • _get_custom_base_url(), _get_custom_model(), _custom_headers() — resolution order is CUSTOM_SEARCH_* env → web.custom_* config → (model only) default sonar.
  • _custom_chat(prompt) — single httpx.post to {base_url}/chat/completions with OpenAI-shaped payload.
  • _custom_search(query, limit) — three-tier response normalization: search_results[] (Perplexity native) → citations[] (string or dict) → choices[0].message.content as "Search Answer". Always returns the same {success, data: {web: [...]}} shape as the other backends so web_search_tool doesn't branch downstream.
  • _custom_extract(urls) — one chat call per URL with per-URL exception isolation; returns the same document shape (url, title, content, raw_content, metadata) as Firecrawl/Tavily/Parallel extract.
  • _get_backend(), _is_backend_available(), _web_requires_env(), check_web_api_key(), get_debug_session_info(), web_search_tool(), web_extract_tool() — threaded "custom" through every backend-aware dispatch point. _is_backend_available("custom") uniquely accepts either CUSTOM_SEARCH_API_KEY env or web.custom_api_key config, matching the helper resolution order.
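
The resolution order and request shape described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `resolve_setting` and `build_request` are hypothetical names, the `web_config` dict stands in for the parsed `web:` block, and the error message is an assumption. Only the env var names, config keys, the `sonar` default, the trailing-slash stripping, and the Bearer header come from the description.

```python
import os
from typing import Any, Dict, Mapping, Optional

DEFAULT_MODEL = "sonar"  # default model named in the PR description


def resolve_setting(env_name: str, config_key: str,
                    web_config: Mapping[str, Any],
                    env: Mapping[str, str] = os.environ,
                    default: Optional[str] = None) -> str:
    """Hypothetical helper: env var first, then web.<config_key>, then default."""
    value = env.get(env_name) or web_config.get(config_key) or default
    if not value:
        # The PR's tests mention a missing-config ValueError; the wording is assumed.
        raise ValueError(f"Set {env_name} or web.{config_key}")
    return str(value)


def build_request(prompt: str, web_config: Mapping[str, Any],
                  env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Assemble URL, headers, and JSON body for one chat-completions call."""
    base_url = resolve_setting("CUSTOM_SEARCH_BASE_URL", "custom_base_url",
                               web_config, env).rstrip("/")
    model = resolve_setting("CUSTOM_SEARCH_MODEL", "custom_model",
                            web_config, env, default=DEFAULT_MODEL)
    api_key = resolve_setting("CUSTOM_SEARCH_API_KEY", "custom_api_key",
                              web_config, env)
    return {
        "url": f"{base_url}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "json": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }
```

The returned dict could be passed to an HTTP client (the PR uses a single httpx.post); keeping request construction free of network calls makes the resolution order easy to test in isolation.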

hermes_cli/tools_config.py (+33/-0) — adds a "Custom (OpenAI-compatible)" entry to the web-backend provider list in hermes tools, plus a small generic extra_config mechanism in _configure_provider() that prompts for custom_base_url / custom_model (with sonar as the default) and writes them under web:. The mechanism is generic so future backends with non-env config can reuse it without a bespoke block.

tests/tools/test_web_tools_config.py (+363) — 20 new tests across three areas:

  1. TestBackendSelection parity (4 tests): config → "custom", case-insensitive, env-only fallback, priority ordering when Firecrawl also has a key.
  2. TestCheckWebApiKey parity (3 tests): env-only, backend: custom configured but no key returns False, web.custom_api_key via config only still returns True.
  3. Custom-backend-specific classes (13 tests): TestCustomBackendHelpers covers env vs config priority, trailing-slash stripping, default model, missing-config ValueError, Bearer auth construction. TestCustomSearch covers all three response-shape paths and their priority, limit enforcement, and the empty-response branch. TestCustomExtract covers multi-URL success, per-URL exception isolation, and empty-list short-circuit.

Also extended two existing _ENV_KEYS tuples to include CUSTOM_SEARCH_API_KEY / CUSTOM_SEARCH_BASE_URL / CUSTOM_SEARCH_MODEL so setup_method cleans them — without this, Custom env vars set in one test could leak into another.

website/docs/user-guide/configuration.md, website/docs/integrations/index.md, website/docs/reference/tools-reference.md — added the fifth backend to every enumerating table / comment / env-var list. The configuration guide gets a dedicated subsection explaining the resolution order and the extraction priority.

How to Test

  1. Configure Hermes against a real Perplexity Sonar account:

    # ~/.hermes/config.yaml
    web:
      backend: custom
      custom_base_url: https://api.perplexity.ai
      custom_model: sonar
    # ~/.hermes/.env
    CUSTOM_SEARCH_API_KEY=<your-perplexity-api-key>
  2. Verify the debug banner picks it up:

    hermes doctor
    # → "Using custom backend: https://api.perplexity.ai (model: sonar)"
  3. Exercise both tools end-to-end:

    hermes -q "Use web_search to find the latest llama.cpp release notes, then web_extract the top result."

    Expected: agent gets normalized {title, url, description, position} items from search_results[] (no Search Answer fallback, no empty list). web_extract returns a markdown document per URL with per-URL error isolation when a URL is unreachable.

  4. Run the test suite for the touched area:

    cd ~/.hermes/hermes-agent && source venv/bin/activate
    pytest tests/tools/test_web_tools_config.py tests/tools/test_web_tools_tavily.py -q
    # → 93 passed (78 in test_web_tools_config.py, 15 in test_web_tools_tavily.py)
  5. For contributors without a Sonar account: the added unit tests patch _custom_chat directly, so no real network call or key is needed to verify the response-shape contracts.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: Ubuntu 24.04

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — updated configuration.md, integrations/index.md, tools-reference.md
  • I've updated cli-config.yaml.example if I added/changed config keys — N/A (the example file does not document the web: block, so there's no existing shape to extend)
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A (no architecture change)
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — N/A (pure httpx.post + os.getenv + string processing, no termios/fcntl/os.setsid/path separators/subprocess)
  • I've updated tool descriptions/schemas if I changed tool behavior — the module-level docstring enumerates Custom alongside Exa/Tavily/Firecrawl/Parallel; web_search / web_extract schemas are unchanged (backend is transparent to the tool contract)

Screenshots / Logs

$ hermes doctor | head -20
...
✅ Web Search & Extract: configured
   Using custom backend: https://api.perplexity.ai (model: sonar)
...
$ pytest tests/tools/test_web_tools_config.py -q
78 passed in 2.18s

Changed files

  • hermes_cli/tools_config.py (modified, +33/-0)
  • tests/tools/test_web_tools_config.py (modified, +363/-0)
  • tools/web_tools.py (modified, +219/-4)
  • website/docs/integrations/index.md (modified, +4/-3)
  • website/docs/reference/tools-reference.md (modified, +2/-2)
  • website/docs/user-guide/configuration.md (modified, +15/-2)

Code Example

# ~/.hermes/config.yaml
web:
  backend: custom
  custom_base_url: https://api.perplexity.ai
  custom_model: sonar
  custom_api_key: sk-xxx    # optional; falls back to env

---

CUSTOM_SEARCH_BASE_URL=https://api.perplexity.ai
CUSTOM_SEARCH_MODEL=sonar
CUSTOM_SEARCH_API_KEY=sk-xxx

Problem or Use Case

Hermes's web_search currently only supports a fixed set of backends: Exa, Parallel, Firecrawl, Tavily. Each is a dedicated search API with its own SDK / REST shape.

A large and growing class of search providers don't fit that shape — they're OpenAI-compatible chat completions endpoints where the model itself performs web search and returns citations alongside its answer. Examples:

  • Perplexity Sonar: POST /chat/completions returns choices[].message.content plus search_results[] and citations[]
  • ChatGPT / OpenAI models with the browsing tool enabled — same chat-completions envelope, citations returned inline
  • Self-hosted / proxy deployments that expose an OpenAI-compatible surface wrapping any search-augmented LLM (corporate search gateways, LiteLLM + Sonar, vLLM serving a web-augmented model, etc.)

Today there is no way to plug any of these into web_search / web_extract. Users either:

  1. Pay for a second search backend (Exa/Tavily) even when they already have a Sonar / search-augmented subscription, or
  2. Write a custom skill and lose the ergonomics of the first-class web_search tool and web: config block.

This is orthogonal to #10284 ("configurable custom JSON search backend"), which targets providers like 4get / SearXNG that speak a plain JSON search API — not chat completions with embedded citations. The two use different request shapes, response parsers, and auth conventions; they want to coexist as separate backends, not collapse into one.

Proposed Solution

Add a new custom backend under the existing web: config that speaks the OpenAI /chat/completions protocol and extracts structured citations from the response.

Configuration

# ~/.hermes/config.yaml
web:
  backend: custom
  custom_base_url: https://api.perplexity.ai
  custom_model: sonar
  custom_api_key: sk-xxx    # optional; falls back to env

Or entirely via env:

CUSTOM_SEARCH_BASE_URL=https://api.perplexity.ai
CUSTOM_SEARCH_MODEL=sonar
CUSTOM_SEARCH_API_KEY=sk-xxx

Behaviour

web_search(query, limit)

  1. POST {base_url}/chat/completions with {"model": custom_model, "messages": [{"role": "user", "content": query}]} and Authorization: Bearer {api_key}.
  2. Parse the response in priority order:
    • search_results[] (Perplexity native shape) → map each item's title / url / snippet|content into the standard result schema, return up to limit.
    • citations[] fallback — accept both plain-string URLs and {title, url, snippet} dicts.
    • Answer text fallback — if neither structured field is present, wrap choices[0].message.content as a single result titled "Search Answer" so the agent still gets useful output.
  3. Return the normalised {success, data: {web: [...]}} shape used by every other backend, so downstream tools (web_search_tool) don't branch on backend.
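
The priority order in step 2 could look something like the sketch below. It assumes the response has already been decoded into a dict; `normalize_search_response` is a hypothetical name, and the choices made for string citations (URL reused as title) and for an empty response (an empty list with success set to True) are assumptions rather than confirmed behaviour. The item fields follow the {title, url, description, position} shape mentioned in the PR's test instructions.

```python
from typing import Any, Dict, List


def normalize_search_response(response: Dict[str, Any],
                              limit: int = 5) -> Dict[str, Any]:
    """Map a chat-completions response onto {success, data: {web: [...]}}."""
    results: List[Dict[str, Any]] = []
    search_results = response.get("search_results") or []
    citations = response.get("citations") or []

    if search_results:
        # Tier 1: Perplexity-style structured results.
        for item in search_results:
            results.append({
                "title": item.get("title", ""),
                "url": item.get("url", ""),
                "description": item.get("snippet") or item.get("content") or "",
            })
    elif citations:
        # Tier 2: citations may be plain URL strings or dicts.
        for cite in citations:
            if isinstance(cite, str):
                results.append({"title": cite, "url": cite, "description": ""})
            elif isinstance(cite, dict):
                results.append({
                    "title": cite.get("title", ""),
                    "url": cite.get("url", ""),
                    "description": cite.get("snippet", ""),
                })
    else:
        # Tier 3: wrap the answer text as a single "Search Answer" result.
        choices = response.get("choices") or []
        content = ""
        if choices:
            content = (choices[0].get("message") or {}).get("content") or ""
        if content:
            results.append({"title": "Search Answer", "url": "",
                            "description": content})

    web = [dict(r, position=i + 1) for i, r in enumerate(results[:limit])]
    return {"success": True, "data": {"web": web}}
```

Because every tier produces the same item shape, the caller never needs to know which field the provider actually populated.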

web_extract(urls)

Reuse the same chat endpoint: send one message per URL asking the model to fetch + summarise the page into markdown. This works because search-augmented models (Sonar, browsing-enabled ChatGPT) can fetch URLs directly. Each URL becomes one document in the standard web_extract output shape (url, title, content, raw_content, metadata), with per-URL error isolation.
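
A rough sketch of the per-URL isolation, under stated assumptions: `extract_urls` is a hypothetical name, `chat` is any callable that sends one prompt and returns the decoded response dict, and the prompt wording, the use of the URL as title, and the `error` field on failed documents are illustrative choices not taken from the PR.

```python
from typing import Any, Callable, Dict, Iterable, List


def extract_urls(urls: Iterable[str],
                 chat: Callable[[str], Dict[str, Any]]) -> List[Dict[str, Any]]:
    """One chat call per URL; a failure on one URL never aborts the others."""
    documents: List[Dict[str, Any]] = []
    for url in urls:
        # Prompt wording is an assumption; the issue only says "fetch + summarise".
        prompt = f"Fetch {url} and summarise the page content as markdown."
        try:
            response = chat(prompt)
            content = response["choices"][0]["message"]["content"]
            documents.append({
                "url": url,
                "title": url,  # assumption: no separate title is returned
                "content": content,
                "raw_content": content,
                "metadata": {},
            })
        except Exception as exc:  # isolate the failure to this URL
            documents.append({
                "url": url,
                "title": "",
                "content": "",
                "raw_content": "",
                "metadata": {},
                "error": str(exc),  # hypothetical field name
            })
    return documents
```

An empty URL list falls straight through the loop and returns an empty list, so no request is made in that case.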

Integration touchpoints (same surface as existing backends)

  • _get_backend() — add "custom" to the allowed set and to env-var auto-detection (CUSTOM_SEARCH_API_KEY).
  • _is_backend_available("custom") — checks env or web.custom_api_key.
  • _web_requires_env() — include CUSTOM_SEARCH_API_KEY so dependency checks surface it.
  • check_web_api_key() — add "custom" to both the configured-backend branch and the auto-detect fallback.
  • get_debug_session_info() — show custom backend with base URL + model for debuggability.
  • hermes_cli/tools_config.py interactive setup — add a "Custom (OpenAI-compatible)" entry alongside Exa/Tavily/Firecrawl/Parallel, prompting for CUSTOM_SEARCH_API_KEY and the custom_base_url / custom_model extra config. (This also motivates a small generic extra_config mechanism in _configure_provider so future backends with non-env config don't each need a bespoke block.)
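
The availability check and backend selection listed above might be sketched as follows. The function names are hypothetical, and the auto-detect ordering is deliberately left to the caller because the source does not state where "custom" ranks relative to the other backends' keys.

```python
import os
from typing import Any, Mapping, Optional, Sequence, Tuple

# Backend names taken from the issue text; "custom" is the proposed addition.
ALLOWED_BACKENDS = {"exa", "parallel", "firecrawl", "tavily", "custom"}


def is_custom_available(web_config: Mapping[str, Any],
                        env: Mapping[str, str] = os.environ) -> bool:
    """Custom backend is usable if a key is in the env or in web.custom_api_key."""
    return bool(env.get("CUSTOM_SEARCH_API_KEY")
                or web_config.get("custom_api_key"))


def select_backend(web_config: Mapping[str, Any],
                   env: Mapping[str, str],
                   auto_detect: Sequence[Tuple[str, str]]) -> Optional[str]:
    """Prefer the configured backend (case-insensitive), else auto-detect.

    auto_detect is an ordered list of (backend_name, env_var) pairs; the
    real priority order is not given in the source, so it is a parameter.
    """
    configured = str(web_config.get("backend") or "").strip().lower()
    if configured in ALLOWED_BACKENDS:
        return configured
    for name, env_var in auto_detect:
        if env.get(env_var):
            return name
    return None
```

For example, `select_backend({"backend": "Custom"}, {}, [])` resolves to "custom" regardless of which API keys happen to be set.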

Why this doesn't belong as a skill

  • web_search / web_extract are first-class tools invoked by every agent and every platform toolset. A skill would be invisible to agents that don't explicitly load it.
  • The web: config block is already the canonical place users expect to configure search. Adding a skill would fragment the mental model.
  • Every other search backend lives in tools/web_tools.py — this belongs there too, next to Exa / Tavily / Firecrawl / Parallel.

Alternatives Considered

  1. Collapse with #10284 under a single custom backend with a type: field (type: json vs type: llm_chat). Rejected: the two have nothing in common beyond the word "custom" — different auth, different request body, different response parser, different extract semantics. A single code path with a big if type == ... would be harder to maintain than two independent backends.
  2. Only expose Perplexity explicitly (e.g. backend: perplexity). Rejected: the value is precisely that any OpenAI-compatible model with search works — ChatGPT browsing, self-hosted wrappers, corporate gateways. Hardcoding Perplexity loses that generality.
  3. Keep it as a skill. Rejected per above.

Additional Context

  • Draft implementation: PR #8858 (feat: add custom OpenAI-compatible search backend). The PR does not carry a "Fixes" reference to this issue, so the two can be reviewed / merged independently; the issue documents the design and motivation separately from the patch.
  • Related but separate feature requests:
    • #10284 — custom JSON search backend (4get / SearXNG shape)
    • #10644 — native Brave Search backend

Happy to iterate on config shape, field priority order, or extract semantics in review.

Extended analysis

TL;DR

To support OpenAI-compatible chat completions endpoints in Hermes's web_search, add a new custom backend that speaks the OpenAI /chat/completions protocol and extracts structured citations from the response.

Guidance

  • Implement a new custom backend under the existing web: config that handles the OpenAI /chat/completions protocol.
  • Define configuration options for custom_base_url, custom_model, and custom_api_key to support various OpenAI-compatible providers.
  • Update the web_search and web_extract functions to work with the new custom backend, including parsing responses and extracting citations.
  • Integrate the new backend with existing tools and config mechanisms, such as _get_backend(), _is_backend_available(), and hermes_cli/tools_config.py.

Example

# ~/.hermes/config.yaml
web:
  backend: custom
  custom_base_url: https://api.perplexity.ai
  custom_model: sonar
  custom_api_key: sk-xxx

Notes

The proposed solution adds support for OpenAI-compatible chat completions endpoints. It is orthogonal to the custom JSON search backend requested in #10284, which is a separate proposal rather than an existing backend. The new backend is meant to coexist with the current Exa / Parallel / Firecrawl / Tavily backends.

Recommendation

Implement the proposed custom backend (draft in PR #8858) and configure it against an OpenAI-compatible provider. Users who already have a search-augmented subscription can then use web_search / web_extract without paying for a second search backend.
