hermes - ✅(Solved) Fix [Feature]: Custom OpenAI-compatible chat completions backend for web_search (Perplexity Sonar / ChatGPT browsing / etc.) [1 pull requests, 1 comments, 1 participants]

yuanqingz · 2026-04-20T04:55:10Z




GitHub stats

NousResearch/hermes-agent#12832•Fetched 2026-04-20 12:16:46


Author

yuanqingz

Participants

yuanqingz

Timeline (top)

commented ×1, cross-referenced ×1

Error Message

Reuse the same chat endpoint: send one message per URL asking the model to fetch + summarise the page into markdown. This works because search-augmented models (Sonar, browsing-enabled ChatGPT) can fetch URLs directly. Each URL becomes one document in the standard web_extract output shape (url, title, content, raw_content, metadata), with per-URL error isolation.

Fix / Workaround

Draft implementation: PR #8858 (feat: add custom OpenAI-compatible search backend). The PR deliberately does not carry a "Fixes" link to this issue, so the two can be reviewed and merged independently; the issue documents the design and motivation separately from the patch.
Related but separate feature requests:
- #10284 — custom JSON search backend (4get / SearXNG shape)
- #10644 — native Brave Search backend

PR fix notes

PR #8858: feat: add custom OpenAI-compatible search backend

Repository: NousResearch/hermes-agent
Author: yuanqingz
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/8858

Description (problem / solution / changelog)

Add a new custom web search backend that works with any OpenAI-compatible chat completions endpoint with built-in web search (e.g. Perplexity Sonar, ChatGPT with browsing).

Configuration via config.yaml:

web:
  backend: custom
  custom_base_url: https://api.example.com/v1
  custom_model: model-name
  custom_api_key: sk-xxx

Or via environment variables: CUSTOM_SEARCH_BASE_URL, CUSTOM_SEARCH_MODEL, CUSTOM_SEARCH_API_KEY

The backend extracts structured search results from the model citations/search_results fields, falling back to the answer text. Also supports web_extract via chat completion.

What does this PR do?

Adds a fifth web_search / web_extract backend — custom — for any OpenAI-compatible /chat/completions endpoint that returns search citations inline. Concrete targets: Perplexity Sonar, ChatGPT with browsing, self-hosted LLMs with web access, LiteLLM or vLLM wrapping a search-augmented model.

Why this belongs in tools/web_tools.py next to Exa / Tavily / Firecrawl / Parallel rather than as a skill:

web_search / web_extract are first-class tools invoked by every agent and every platform toolset — a skill would be invisible unless explicitly loaded.
The web: config block is already the canonical place users expect to configure search; adding a skill would fragment the mental model.
The response-parsing logic (search_results[] → citations[] → answer text) is proxy-shape normalization, which is the same responsibility the other backends already discharge inside web_tools.py.

This is deliberately distinct from #10284 / #10414 (a custom JSON backend for 4get / SearXNG). That path uses GET with url_template + results_path field mapping; this path uses POST /chat/completions with citation extraction. The two differ in request shape, response parser, and extract semantics, so they are meant to coexist as separate backends.

Related Issue

#12832

Type of Change

[ ] 🐛 Bug fix (non-breaking change that fixes an issue)
[x] ✨ New feature (non-breaking change that adds functionality)
[ ] 🔒 Security fix
[ ] 📝 Documentation update
[ ] ✅ Tests (adding or improving test coverage)
[ ] ♻️ Refactor (no behavior change)
[ ] 🎯 New skill (bundled or hub)

Changes Made

tools/web_tools.py (+219/-4) — new Custom Chat Completions Backend section with:

_get_custom_base_url(), _get_custom_model(), _custom_headers() — resolution order is CUSTOM_SEARCH_* env → web.custom_* config → (model only) default sonar.
_custom_chat(prompt) — single httpx.post to {base_url}/chat/completions with OpenAI-shaped payload.
_custom_search(query, limit) — three-tier response normalization: search_results[] (Perplexity native) → citations[] (string or dict) → choices[0].message.content as "Search Answer". Always returns the same {success, data: {web: [...]}} shape as the other backends so web_search_tool doesn't branch downstream.
_custom_extract(urls) — one chat call per URL with per-URL exception isolation; returns the same document shape (url, title, content, raw_content, metadata) as Firecrawl/Tavily/Parallel extract.
_get_backend(), _is_backend_available(), _web_requires_env(), check_web_api_key(), get_debug_session_info(), web_search_tool(), web_extract_tool() — threaded "custom" through every backend-aware dispatch point. _is_backend_available("custom") uniquely accepts either CUSTOM_SEARCH_API_KEY env or web.custom_api_key config, matching the helper resolution order.
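
The helper behaviour listed above (env first, then config, then the sonar default for the model only; trailing-slash stripping; Bearer auth; OpenAI-shaped payload) can be sketched roughly as follows. This is an illustrative sketch, not the code in the PR: the names resolve_setting and build_chat_request are hypothetical, and treating a missing API key as a ValueError is an assumption. No network call is made; the function only returns what would be posted.

```python
import os
from typing import Dict, Mapping, Optional, Tuple

DEFAULT_MODEL = "sonar"  # default named in the PR; applies to the model only


def resolve_setting(name: str, web_cfg: Mapping[str, str],
                    env: Mapping[str, str],
                    default: Optional[str] = None) -> str:
    """Resolve one setting: CUSTOM_SEARCH_<NAME> env, then web.custom_<name>, then default."""
    value = (env.get(f"CUSTOM_SEARCH_{name.upper()}")
             or web_cfg.get(f"custom_{name}")
             or default)
    if not value:
        # Assumption: every missing setting is reported as a ValueError.
        raise ValueError(f"custom search backend: {name} is not configured")
    return value


def build_chat_request(query: str, web_cfg: Mapping[str, str],
                       env: Mapping[str, str] = os.environ
                       ) -> Tuple[str, Dict[str, str], Dict[str, object]]:
    """Return (url, headers, payload) for a single chat-completions call."""
    base_url = resolve_setting("base_url", web_cfg, env).rstrip("/")
    model = resolve_setting("model", web_cfg, env, default=DEFAULT_MODEL)
    api_key = resolve_setting("api_key", web_cfg, env)
    headers = {"Authorization": f"Bearer {api_key}",
               "Content-Type": "application/json"}
    payload = {"model": model,
               "messages": [{"role": "user", "content": query}]}
    return f"{base_url}/chat/completions", headers, payload
```

The resulting triple could then be passed to httpx.post(url, headers=headers, json=payload); timeout and error handling are left out of the sketch.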

hermes_cli/tools_config.py (+33/-0) — adds a "Custom (OpenAI-compatible)" entry to the web-backend provider list in hermes tools, plus a small generic extra_config mechanism in _configure_provider() that prompts for custom_base_url / custom_model (with sonar as the default) and writes them under web:. The mechanism is generic so future backends with non-env config can reuse it without a bespoke block.

tests/tools/test_web_tools_config.py (+363) — 20 new tests across three areas:

TestBackendSelection parity (4 tests): config → "custom", case-insensitive, env-only fallback, priority ordering when Firecrawl also has a key.
TestCheckWebApiKey parity (3 tests): env-only, backend: custom configured but no key returns False, web.custom_api_key via config only still returns True.
Custom-backend-specific classes (13 tests): TestCustomBackendHelpers covers env vs config priority, trailing-slash stripping, default model, missing-config ValueError, Bearer auth construction. TestCustomSearch covers all three response-shape paths and their priority, limit enforcement, and the empty-response branch. TestCustomExtract covers multi-URL success, per-URL exception isolation, and empty-list short-circuit.

Also extended two existing _ENV_KEYS tuples to include CUSTOM_SEARCH_API_KEY / CUSTOM_SEARCH_BASE_URL / CUSTOM_SEARCH_MODEL so setup_method cleans them — without this, Custom env vars set in one test could leak into another.

website/docs/user-guide/configuration.md, website/docs/integrations/index.md, website/docs/reference/tools-reference.md — added the fifth backend to every enumerating table / comment / env-var list. The configuration guide gets a dedicated subsection explaining the resolution order and the extraction priority.

How to Test

Configure Hermes against a real Perplexity Sonar account:

# ~/.hermes/config.yaml
web:
  backend: custom
  custom_base_url: https://api.perplexity.ai
  custom_model: sonar

# ~/.hermes/.env
CUSTOM_SEARCH_API_KEY=<your-perplexity-api-key>

Verify the debug banner picks it up:

hermes doctor
# → "Using custom backend: https://api.perplexity.ai (model: sonar)"

Exercise both tools end-to-end:
```
hermes -q "Use web_search to find the latest llama.cpp release notes, then web_extract the top result."
```
Expected: agent gets normalized {title, url, description, position} items from search_results[] (no Search Answer fallback, no empty list). web_extract returns a markdown document per URL with per-URL error isolation when a URL is unreachable.

Run the test suite for the touched area:

cd ~/.hermes/hermes-agent && source venv/bin/activate
pytest tests/tools/test_web_tools_config.py tests/tools/test_web_tools_tavily.py -q
# → 93 passed (78 in test_web_tools_config.py, 15 in test_web_tools_tavily.py)

For contributors without a Sonar account: the added unit tests patch _custom_chat directly, so no real network call or key is needed to verify the response-shape contracts.

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature (no unrelated commits)
I've run pytest tests/ -q and all tests pass
I've added tests for my changes (required for bug fixes, strongly encouraged for features)
I've tested on my platform: Ubuntu 24.04

Documentation & Housekeeping

I've updated relevant documentation (README, docs/, docstrings) — updated configuration.md, integrations/index.md, tools-reference.md
I've updated cli-config.yaml.example if I added/changed config keys — N/A (the example file does not document the web: block, so there's no existing shape to extend)
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A (no architecture change)
I've considered cross-platform impact (Windows, macOS) per the compatibility guide — N/A (pure httpx.post + os.getenv + string processing, no termios/fcntl/os.setsid/path separators/subprocess)
I've updated tool descriptions/schemas if I changed tool behavior — the module-level docstring enumerates Custom alongside Exa/Tavily/Firecrawl/Parallel; web_search / web_extract schemas are unchanged (backend is transparent to the tool contract)

Screenshots / Logs

$ hermes doctor | head -20
...
✅ Web Search & Extract: configured
   Using custom backend: https://api.perplexity.ai (model: sonar)
...

$ pytest tests/tools/test_web_tools_config.py -q
78 passed in 2.18s

Changed files

hermes_cli/tools_config.py (modified, +33/-0)
tests/tools/test_web_tools_config.py (modified, +363/-0)
tools/web_tools.py (modified, +219/-4)
website/docs/integrations/index.md (modified, +4/-3)
website/docs/reference/tools-reference.md (modified, +2/-2)
website/docs/user-guide/configuration.md (modified, +15/-2)

Code Example

# ~/.hermes/config.yaml
web:
  backend: custom
  custom_base_url: https://api.perplexity.ai
  custom_model: sonar
  custom_api_key: sk-xxx    # optional; falls back to env

---

CUSTOM_SEARCH_BASE_URL=https://api.perplexity.ai
CUSTOM_SEARCH_MODEL=sonar
CUSTOM_SEARCH_API_KEY=sk-xxx


Problem or Use Case

Hermes's web_search currently only supports a fixed set of backends: Exa, Parallel, Firecrawl, Tavily. Each is a dedicated search API with its own SDK / REST shape.

A large and growing class of search providers don't fit that shape — they're OpenAI-compatible chat completions endpoints where the model itself performs web search and returns citations alongside its answer. Examples:

Perplexity Sonar — POST /chat/completions returns choices[].message.content plus search_results[] and citations[]
ChatGPT / OpenAI models with the browsing tool enabled — same chat-completions envelope, citations returned inline
Self-hosted / proxy deployments that expose an OpenAI-compatible surface wrapping any search-augmented LLM (corporate search gateways, LiteLLM + Sonar, vLLM serving a web-augmented model, etc.)

Today there is no way to plug any of these into web_search / web_extract. Users either:

Pay for a second search backend (Exa/Tavily) even when they already have a Sonar / search-augmented subscription, or
Write a custom skill and lose the ergonomics of the first-class web_search tool and web: config block.

This is orthogonal to #10284 ("configurable custom JSON search backend"), which targets providers like 4get / SearXNG that speak a plain JSON search API — not chat completions with embedded citations. The two use different request shapes, response parsers, and auth conventions; they want to coexist as separate backends, not collapse into one.

Proposed Solution

Add a new custom backend under the existing web: config that speaks the OpenAI /chat/completions protocol and extracts structured citations from the response.

Configuration

# ~/.hermes/config.yaml
web:
  backend: custom
  custom_base_url: https://api.perplexity.ai
  custom_model: sonar
  custom_api_key: sk-xxx    # optional; falls back to env

Or entirely via env:

CUSTOM_SEARCH_BASE_URL=https://api.perplexity.ai
CUSTOM_SEARCH_MODEL=sonar
CUSTOM_SEARCH_API_KEY=sk-xxx

Behaviour

web_search(query, limit)

POST {base_url}/chat/completions with {"model": custom_model, "messages": [{"role": "user", "content": query}]} and Authorization: Bearer {api_key}.
Parse the response in priority order:
- search_results[] (Perplexity native shape) → map each item's title / url / snippet|content into the standard result schema, return up to limit.
- citations[] fallback — accept both plain-string URLs and {title, url, snippet} dicts.
- Answer text fallback — if neither structured field is present, wrap choices[0].message.content as a single result titled "Search Answer" so the agent still gets useful output.
Return the normalised {success, data: {web: [...]}} shape used by every other backend, so downstream tools (web_search_tool) don't branch on backend.
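
The priority order above can be sketched as a single function. This is a hedged illustration rather than the PR's _custom_search: the function name normalize_search_response is hypothetical, and several details are assumptions (using the URL as the title for plain-string citations, an empty url for the answer fallback, positions starting at 1).

```python
from typing import Any, Dict, List


def normalize_search_response(resp: Dict[str, Any], limit: int = 5) -> Dict[str, Any]:
    """Map a chat-completions response to {success, data: {web: [...]}}."""
    web: List[Dict[str, Any]] = []
    if resp.get("search_results"):
        # Tier 1: Perplexity-style search_results[]
        for item in resp["search_results"]:
            web.append({"title": item.get("title", ""),
                        "url": item.get("url", ""),
                        "description": item.get("snippet") or item.get("content") or ""})
    elif resp.get("citations"):
        # Tier 2: citations[] as plain URL strings or {title, url, snippet} dicts
        for cite in resp["citations"]:
            if isinstance(cite, str):
                web.append({"title": cite, "url": cite, "description": ""})
            else:
                web.append({"title": cite.get("title", ""),
                            "url": cite.get("url", ""),
                            "description": cite.get("snippet", "")})
    else:
        # Tier 3: wrap the answer text as a single "Search Answer" result
        choices = resp.get("choices") or []
        content = choices[0].get("message", {}).get("content", "") if choices else ""
        if content:
            web.append({"title": "Search Answer", "url": "", "description": content})
    web = web[:limit]
    for position, item in enumerate(web, start=1):
        item["position"] = position
    return {"success": True, "data": {"web": web}}
```

Because every tier ends in the same shape, a caller such as web_search_tool never needs to know which tier produced the results.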

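web_extract(urls)

Reuse the same chat endpoint: send one message per URL asking the model to fetch + summarise the page into markdown. This works because search-augmented models (Sonar, browsing-enabled ChatGPT) can fetch URLs directly. Each URL becomes one document in the standard web_extract output shape (url, title, content, raw_content, metadata), with per-URL error isolation.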
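
Per the PR notes, web_extract issues one chat call per URL and isolates failures per URL, returning documents with url, title, content, raw_content, and metadata. A rough sketch under stated assumptions: extract_urls and the chat callable are hypothetical names, the prompt wording is invented, and the "error" key plus the empty metadata dict are assumptions about how a failed URL is reported.

```python
from typing import Any, Callable, Dict, List


def extract_urls(urls: List[str],
                 chat: Callable[[str], Dict[str, Any]]) -> List[Dict[str, Any]]:
    """One chat call per URL; a failure on one URL never aborts the others."""
    documents: List[Dict[str, Any]] = []
    for url in urls:
        try:
            resp = chat(f"Fetch {url} and summarise the page as markdown.")
            content = resp["choices"][0]["message"]["content"]
            documents.append({"url": url, "title": url, "content": content,
                              "raw_content": content, "metadata": {}})
        except Exception as exc:  # broad on purpose: isolate any per-URL failure
            documents.append({"url": url, "title": "", "content": "",
                              "raw_content": "", "metadata": {},
                              "error": str(exc)})
    return documents
```

An empty urls list returns an empty list without calling chat at all, which mirrors the empty-list short-circuit the PR's tests mention.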

Integration touchpoints (same surface as existing backends)

_get_backend() — add "custom" to the allowed set and to env-var auto-detection (CUSTOM_SEARCH_API_KEY).
_is_backend_available("custom") — checks env or web.custom_api_key.
_web_requires_env() — include CUSTOM_SEARCH_API_KEY so dependency checks surface it.
check_web_api_key() — add "custom" to both the configured-backend branch and the auto-detect fallback.
get_debug_session_info() — show custom backend with base URL + model for debuggability.
hermes_cli/tools_config.py interactive setup — add a "Custom (OpenAI-compatible)" entry alongside Exa/Tavily/Firecrawl/Parallel, prompting for CUSTOM_SEARCH_API_KEY and the custom_base_url / custom_model extra config. (This also motivates a small generic extra_config mechanism in _configure_provider so future backends with non-env config don't each need a bespoke block.)
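
The availability rule for the custom backend (key from env or from web.custom_api_key) is small enough to sketch directly. The function name is_custom_available is hypothetical and the real _is_backend_available handles the other backends as well; this only illustrates the custom branch.

```python
from typing import Mapping


def is_custom_available(web_cfg: Mapping[str, str], env: Mapping[str, str]) -> bool:
    """True when an API key is present in either the environment or the web: config."""
    return bool(env.get("CUSTOM_SEARCH_API_KEY") or web_cfg.get("custom_api_key"))


# The three cases the PR's TestCheckWebApiKey parity tests describe:
env_only = is_custom_available({}, {"CUSTOM_SEARCH_API_KEY": "sk-xxx"})
no_key = is_custom_available({"backend": "custom"}, {})
config_only = is_custom_available({"backend": "custom", "custom_api_key": "sk-xxx"}, {})
```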

Why this doesn't belong as a skill

web_search / web_extract are first-class tools invoked by every agent and every platform toolset. A skill would be invisible to agents that don't explicitly load it.
The web: config block is already the canonical place users expect to configure search. Adding a skill would fragment the mental model.
Every other search backend lives in tools/web_tools.py — this belongs there too, next to Exa / Tavily / Firecrawl / Parallel.

Alternatives Considered

Collapse with #10284 under a single custom backend with a type: field (type: json vs type: llm_chat). Rejected: the two have nothing in common beyond the word "custom" — different auth, different request body, different response parser, different extract semantics. A single code path with a big if type == ... would be harder to maintain than two independent backends.
Only expose Perplexity explicitly (e.g. backend: perplexity). Rejected: the value is precisely that any OpenAI-compatible model with search works — ChatGPT browsing, self-hosted wrappers, corporate gateways. Hardcoding Perplexity loses that generality.
Keep it as a skill. Rejected per above.

Additional Context

Draft implementation: PR #8858 (feat: add custom OpenAI-compatible search backend). The PR deliberately does not carry a "Fixes" link to this issue, so the two can be reviewed and merged independently; the issue documents the design and motivation separately from the patch.
Related but separate feature requests:
- #10284 — custom JSON search backend (4get / SearXNG shape)
- #10644 — native Brave Search backend

Happy to iterate on config shape, field priority order, or extract semantics in review.

extent analysis

TL;DR

To support OpenAI-compatible chat completions endpoints in Hermes's web_search, add a new custom backend that speaks the OpenAI /chat/completions protocol and extracts structured citations from the response.

Guidance

Implement a new custom backend under the existing web: config that handles the OpenAI /chat/completions protocol.
Define configuration options for custom_base_url, custom_model, and custom_api_key to support various OpenAI-compatible providers.
Update the web_search and web_extract functions to work with the new custom backend, including parsing responses and extracting citations.
Integrate the new backend with existing tools and config mechanisms, such as _get_backend(), _is_backend_available(), and hermes_cli/tools_config.py.

Example

# ~/.hermes/config.yaml
web:
  backend: custom
  custom_base_url: https://api.perplexity.ai
  custom_model: sonar
  custom_api_key: sk-xxx

Notes

The proposed solution aims to add support for OpenAI-compatible chat completions endpoints, which is orthogonal to the existing custom JSON search backend. The new backend will coexist with the existing backends, providing a flexible solution for users with different search provider needs.

Recommendation

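Implement the proposed custom backend (draft: PR #8858, still open and unmerged) and point it at an OpenAI-compatible provider via the web: config block or the CUSTOM_SEARCH_* environment variables. This lets users with an existing Sonar or other search-augmented subscription use web_search / web_extract without paying for a second search backend.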


#api #ssr #retrieval issue #search optimization #API routing
