hermes - 💡(How to fix) Fix Bug: web-search-grounding skill routes models to browser/curl SERP scraping instead of the registered web_search tool

Official PRs (…)
ON THIS PAGE

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Root Cause

skills/research/web-search-grounding/SKILL.md:

  1. Frontmatter description says "use configured search providers when requested" — softer than "always." The skill router consults this description to decide whether the skill is relevant; the conditional framing weakens the policy at the load stage.

  2. Step 1 of Core Workflow: "Use an actual search provider/tool first when the user explicitly asks for web_search or search results." Reads as conditional ("only when the user names the tool by name") rather than imperative ("always for any web-search-shaped request").

  3. Step 4: "If a public search page presents bot challenges, do not burn time on CAPTCHA interaction. Switch strategy: use another search backend, the site's own search/category pages, or the configured Hermes web search provider." Mentions "the normal browser/search path" elsewhere — implies browser navigation to search engines is the default and the registered tool is one of several fallbacks.

  4. "Provider/tool fallback pattern" section: "Prefer the configured Hermes web_search provider if available." Frames the registered tool as a fallback ("Prefer ... if available") rather than the primary action. Models interpret this as "use it only after the normal path fails."

The skill predates the maturation of the plugin-based web_search provider system. When it was written, browser SERP scraping was a more legitimate path.

Code Example

📚 skill     web-search-grounding
🌐 navigate  www.google.com
🌐 navigate  duckduckgo.com
🌐 navigate  www.bing.com
📸 snapshot  full
💻 $ python - <<PY ... requests.get('https://html.duckduckgo.com/html/?q=...') PY
💻 $ curl -L -A 'Mozilla/5.0' 'https://www.bing.com/search?q=...'
🌐 navigate  target-site.com
...
RAW_BUFFERClick to expand / collapse

Bug Description

The bundled skills/research/web-search-grounding/SKILL.md steers models toward browser-based SERP scraping (navigating to google.com / duckduckgo.com / bing.com and parsing result HTML) instead of calling the registered web_search tool. The skill text frames the registered tool as a fallback rather than the primary action.

This affects every web_search backend (Firecrawl, Tavily, Exa, Parallel, SearXNG, DDGS, Brave-Free, xAI). The skill is provider-agnostic; the misdirection happens regardless of which provider is configured.

Steps to Reproduce

  1. Configure any web_search backend with a valid key (e.g. TAVILY_API_KEY).

  2. Start a Hermes session.

  3. Ask:

    Use web_search to find pages on example.com about widgets. Return the top 5 results.

Expected Behavior

Model calls web_search first. One tool call, results returned.

Actual Behavior (observed across multiple sessions)

📚 skill     web-search-grounding
🌐 navigate  www.google.com
🌐 navigate  duckduckgo.com
🌐 navigate  www.bing.com
📸 snapshot  full
💻 $ python - <<PY ... requests.get('https://html.duckduckgo.com/html/?q=...') PY
💻 $ curl -L -A 'Mozilla/5.0' 'https://www.bing.com/search?q=...'
🌐 navigate  target-site.com
...

Models burn 60-120 seconds on browser SERP pages (where CAPTCHA challenges interrupt), ad-hoc requests / urllib / curl scrapers, and on-site search probing before either giving up or falling back to the registered web_search tool. Provider-side budget tracking and credit attribution is bypassed entirely.

Affected Component

Tools (terminal, file ops, web, code execution, etc.)

Root Cause Analysis

skills/research/web-search-grounding/SKILL.md:

  1. Frontmatter description says "use configured search providers when requested" — softer than "always." The skill router consults this description to decide whether the skill is relevant; the conditional framing weakens the policy at the load stage.

  2. Step 1 of Core Workflow: "Use an actual search provider/tool first when the user explicitly asks for web_search or search results." Reads as conditional ("only when the user names the tool by name") rather than imperative ("always for any web-search-shaped request").

  3. Step 4: "If a public search page presents bot challenges, do not burn time on CAPTCHA interaction. Switch strategy: use another search backend, the site's own search/category pages, or the configured Hermes web search provider." Mentions "the normal browser/search path" elsewhere — implies browser navigation to search engines is the default and the registered tool is one of several fallbacks.

  4. "Provider/tool fallback pattern" section: "Prefer the configured Hermes web_search provider if available." Frames the registered tool as a fallback ("Prefer ... if available") rather than the primary action. Models interpret this as "use it only after the normal path fails."

The skill predates the maturation of the plugin-based web_search provider system. When it was written, browser SERP scraping was a more legitimate path.

Impact

  • CAPTCHA waste on every web-search-shaped request.
  • Provider-side budget tracking bypassed (credits don't tick down when models scrape SERPs directly).
  • Ad-hoc requests / urllib scrapers that handle redirects, encoding, and rate-limiting inconsistently.
  • Time waste — 60-120 seconds per turn on browser pages.

Proposed Fix

Rewrite the skill markdown to make the registered web_search tool the imperative first action:

  • Frontmatter description leads with "call the registered web_search tool directly" and explicitly forbids browser-SERP navigation and curl/urllib-SERP scraping. This is what the skill router reads to decide relevance — the line directly shapes model behavior.
  • Step 1 becomes imperative: "Call the web_search tool. That is the action." Followed by explicit prohibitions on the observed failure modes.
  • Fallback hierarchy inverted: registered tool is primary; browser navigation only to the target site itself after web_search errors or returns empty for both exact and canonical-domain queries.
  • New Anti-patterns section enumerates the exact failure modes (navigating to google/ddg/bing, SERP scraping with requests/urllib/curl, treating web_search as last resort).

The fallback to the target site's own search/category pages via the browser is preserved as a legitimate escalation when web_search itself returns empty — this matters for sites where the canonical-domain redirect logic alone isn't enough.

A full draft of the rewrite is ready locally; happy to submit as a PR.

Notes

  • Surfaced while testing a new Oxylabs AI Studio web provider plugin, but the misdirection happens on every backend equally. The skill text doesn't reference any specific provider.

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING