hermes - 💡(How to fix) Fix Bug: web-search-grounding skill routes models to browser/curl SERP scraping instead of the registered web

StepCodex · 2026-05-26T12:36:06Z

[hermes] Bug Description The bundled skills/research/web-search-grounding/SKILL.md steers models toward browser-based SERP scraping navigating to google.com /… ### Bug Description The bundled `skills/research/web-search-grounding/SKILL.md` steers models toward browser-based SERP scraping (navigating to google.com / duckduckgo.com / bing.com and parsing result HTML) instead of calling the registered `web_search` tool. The skill text frames the registered tool as a *fallback* rather than the primary action. This affects every `web_search` backend (Firecrawl, Tavily, Exa, Parallel, SearXNG, DDGS, Brave-Free, xAI). The skill is provider-agnostic; the misdirection happens regardless of which provider is configured. ### Steps to Reproduce 1. Configure any `web_search` backend with a valid key (e.g. `TAVILY_API_KEY`). 2. Start a Hermes session. 3. Ask: > Use web_search to find pages on example.com about widgets. Return the top 5 results. ### Expected Behavior Model calls `web_search` first. One tool call, results returned. ### Actual Behavior (observed across multiple sessions) ``` 📚 skill web-search-grounding 🌐 navigate www.google.com 🌐 navigate duckduckgo.com 🌐 navigate www.bing.com 📸 snapshot full 💻 $ python - <<PY ... requests.get('https://html.duckduckgo.com/html/?q=...') PY 💻 $ curl -L -A 'Mozilla/5.0' 'https://www.bing.com/search?q=...' 🌐 navigate target-site.com ... ``` Models burn 60-120 seconds on browser SERP pages (where CAPTCHA challenges interrupt), ad-hoc `requests` / `urllib` / `curl` scrapers, and on-site search probing before either giving up or falling back to the registered `web_search` tool. Provider-side budget tracking and credit attribution is bypassed entirely. ### Affected Component Tools (terminal, file ops, web, code execution, etc.) ### Root Cause Analysis `skills/research/web-search-grounding/SKILL.md`: 1. **Frontmatter `description`** says "use configured search providers **when requested**" — softer than "always." The skill router consults this `description` to decide whether the skill is relevant; the conditional framing weakens the policy at the load stage. 2. **Step 1 of Core Workflow:** "Use an actual search provider/tool **first when the user explicitly asks for `web_search`** or search results." Reads as conditional ("only when the user names the tool by name") rather than imperative ("always for any web-search-shaped request"). 3. **Step 4:** "If a public search page presents bot challenges, do not burn time on CAPTCHA interaction. Switch strategy: use another search backend, the site's own search/category pages, or the configured Hermes web search provider." Mentions "the normal browser/search path" elsewhere — implies browser navigation to search engines is the default and the registered tool is one of several fallbacks. 4. **"Provider/tool fallback pattern" section:** "Prefer the configured Hermes `web_search` provider if available." Frames the registered tool as a fallback ("Prefer ... if available") rather than the primary action. Models interpret this as "use it only after the normal path fails." The skill predates the maturation of the plugin-based `web_search` provider system. When it was written, browser SERP scraping was a more legitimate path. ### Impact - CAPTCHA waste on every web-search-shaped request. - Provider-side budget tracking bypassed (credits don't tick down when models scrape SERPs directly). - Ad-hoc `requests` / `urllib` scrapers that handle redirects, encoding, and rate-limiting inconsistently. - Time waste — 60-120 seconds per turn on browser pages. ### Proposed Fix Rewrite the skill markdown to make the registered `web_search` tool the imperative first action: - **Frontmatter `description`** leads with "call the registered `web_search` tool directly" and explicitly forbids browser-SERP navigation and `curl`/`urllib`-SERP scraping. This is what the skill router reads to decide relevance — the line directly shapes model behavior. - **Step 1** becomes imperative: *"Call the `web_search` tool. That is the action."* Followed by explicit prohibitions on the observed failure modes. - **Fallback hierarchy inverted:** registered tool is primary; browser navigation only to the *target site itself* after `web_search` errors or returns empty for both exact and canonical-domain queries. - **New Anti-patterns section** enumerates the exact failure modes (navigating to google/ddg/bing, SERP scraping with `requests`/`urllib`/`curl`, treating `web_search` as last resort). The fallback to the *target site's own search/category pages* via the browser is preserved as a legitimate escalation when `web_search` itself returns empty — this matters for sites where the canonical-domain redirect logic alone isn't enough. A full draft of the rewrite is ready locally; happy to submit as a PR. ### Notes - Surfaced while testing a new Oxylabs AI Studio web provider plugin, but the misdirection happens on every backend equally. The skill text doesn't reference any specif

Root Cause

skills/research/web-search-grounding/SKILL.md:

Frontmatter description says "use configured search providers when requested" — softer than "always." The skill router consults this description to decide whether the skill is relevant; the conditional framing weakens the policy at the load stage.
Step 1 of Core Workflow: "Use an actual search provider/tool first when the user explicitly asks for web_search or search results." Reads as conditional ("only when the user names the tool by name") rather than imperative ("always for any web-search-shaped request").
Step 4: "If a public search page presents bot challenges, do not burn time on CAPTCHA interaction. Switch strategy: use another search backend, the site's own search/category pages, or the configured Hermes web search provider." Mentions "the normal browser/search path" elsewhere — implies browser navigation to search engines is the default and the registered tool is one of several fallbacks.
"Provider/tool fallback pattern" section: "Prefer the configured Hermes web_search provider if available." Frames the registered tool as a fallback ("Prefer ... if available") rather than the primary action. Models interpret this as "use it only after the normal path fails."

The skill predates the maturation of the plugin-based web_search provider system. When it was written, browser SERP scraping was a more legitimate path.

Code Example

📚 skill     web-search-grounding
🌐 navigate  www.google.com
🌐 navigate  duckduckgo.com
🌐 navigate  www.bing.com
📸 snapshot  full
💻 $ python - <<PY ... requests.get('https://html.duckduckgo.com/html/?q=...') PY
💻 $ curl -L -A 'Mozilla/5.0' 'https://www.bing.com/search?q=...'
🌐 navigate  target-site.com
...

Bug Description

The bundled skills/research/web-search-grounding/SKILL.md steers models toward browser-based SERP scraping (navigating to google.com / duckduckgo.com / bing.com and parsing result HTML) instead of calling the registered web_search tool. The skill text frames the registered tool as a fallback rather than the primary action.

This affects every web_search backend (Firecrawl, Tavily, Exa, Parallel, SearXNG, DDGS, Brave-Free, xAI). The skill is provider-agnostic; the misdirection happens regardless of which provider is configured.

Steps to Reproduce

Configure any web_search backend with a valid key (e.g. TAVILY_API_KEY).
Start a Hermes session.
Ask:

Use web_search to find pages on example.com about widgets. Return the top 5 results.

Expected Behavior

Model calls web_search first. One tool call, results returned.

Actual Behavior (observed across multiple sessions)

📚 skill     web-search-grounding
🌐 navigate  www.google.com
🌐 navigate  duckduckgo.com
🌐 navigate  www.bing.com
📸 snapshot  full
💻 $ python - <<PY ... requests.get('https://html.duckduckgo.com/html/?q=...') PY
💻 $ curl -L -A 'Mozilla/5.0' 'https://www.bing.com/search?q=...'
🌐 navigate  target-site.com
...

Models burn 60-120 seconds on browser SERP pages (where CAPTCHA challenges interrupt), ad-hoc requests / urllib / curl scrapers, and on-site search probing before either giving up or falling back to the registered web_search tool. Provider-side budget tracking and credit attribution is bypassed entirely.

Affected Component

Tools (terminal, file ops, web, code execution, etc.)

Root Cause Analysis

skills/research/web-search-grounding/SKILL.md:

Frontmatter description says "use configured search providers when requested" — softer than "always." The skill router consults this description to decide whether the skill is relevant; the conditional framing weakens the policy at the load stage.
Step 1 of Core Workflow: "Use an actual search provider/tool first when the user explicitly asks for web_search or search results." Reads as conditional ("only when the user names the tool by name") rather than imperative ("always for any web-search-shaped request").
Step 4: "If a public search page presents bot challenges, do not burn time on CAPTCHA interaction. Switch strategy: use another search backend, the site's own search/category pages, or the configured Hermes web search provider." Mentions "the normal browser/search path" elsewhere — implies browser navigation to search engines is the default and the registered tool is one of several fallbacks.
"Provider/tool fallback pattern" section: "Prefer the configured Hermes web_search provider if available." Frames the registered tool as a fallback ("Prefer ... if available") rather than the primary action. Models interpret this as "use it only after the normal path fails."

The skill predates the maturation of the plugin-based web_search provider system. When it was written, browser SERP scraping was a more legitimate path.

Impact

CAPTCHA waste on every web-search-shaped request.
Provider-side budget tracking bypassed (credits don't tick down when models scrape SERPs directly).
Ad-hoc requests / urllib scrapers that handle redirects, encoding, and rate-limiting inconsistently.
Time waste — 60-120 seconds per turn on browser pages.

Proposed Fix

Rewrite the skill markdown to make the registered web_search tool the imperative first action:

Frontmatter description leads with "call the registered web_search tool directly" and explicitly forbids browser-SERP navigation and curl/urllib-SERP scraping. This is what the skill router reads to decide relevance — the line directly shapes model behavior.
Step 1 becomes imperative: "Call the web_search tool. That is the action." Followed by explicit prohibitions on the observed failure modes.
Fallback hierarchy inverted: registered tool is primary; browser navigation only to the target site itself after web_search errors or returns empty for both exact and canonical-domain queries.
New Anti-patterns section enumerates the exact failure modes (navigating to google/ddg/bing, SERP scraping with requests/urllib/curl, treating web_search as last resort).

The fallback to the target site's own search/category pages via the browser is preserved as a legitimate escalation when web_search itself returns empty — this matters for sites where the canonical-domain redirect logic alone isn't enough.

A full draft of the rewrite is ready locally; happy to submit as a PR.

Notes

Surfaced while testing a new Oxylabs AI Studio web provider plugin, but the misdirection happens on every backend equally. The skill text doesn't reference any specific provider.

Are you willing to submit a PR for this?

I'd like to fix this myself and submit a PR

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix Bug: web-search-grounding skill routes models to browser/curl SERP scraping instead of the registered web_search tool

Recommended Tools

GitHub issue graph ai analysis