hermes - Bug: web_crawl is implemented but never registered as a model-callable tool


Bug Description

web_crawl_tool() is defined in tools/web_tools.py (~line 1133) as an async Python function, and agent.web_search_registry.get_active_crawl_provider() resolves an active crawl provider correctly. However, no registry.register(name="web_crawl", ...) call exists in the file: web_search and web_extract are registered at lines 1540 and 1550, but web_crawl is not registered anywhere.

toolsets.py _HERMES_CORE_TOOLS (line 33) also omits "web_crawl", so even if a schema were added the core tool list wouldn't surface it.

The net effect: every crawl-capable provider (Firecrawl, Tavily, and any future plugin advertising supports_crawl()) is unreachable from the model. Crawl is only callable programmatically from Python.
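The failure mode can be reproduced in miniature. The sketch below uses a hypothetical ToyRegistry and stub tool functions (none of them are the real Hermes classes or signatures) to show how a function that exists in the module stays unreachable by name until it is registered:

```python
import asyncio


class ToyRegistry:
    """Hypothetical stand-in for the tool registry, not the real Hermes class."""

    def __init__(self):
        self._handlers = {}

    def register(self, name, handler):
        self._handlers[name] = handler

    def names(self):
        return sorted(self._handlers)

    async def dispatch(self, name, args):
        # The model can only reach handlers that were registered by name.
        if name not in self._handlers:
            raise KeyError(f"tool not registered: {name}")
        return await self._handlers[name](args)


async def web_search_tool(args):
    return {"tool": "web_search", "args": args}


async def web_extract_tool(args):
    return {"tool": "web_extract", "args": args}


async def web_crawl_tool(args):
    # Implemented and importable, but never passed to register() below.
    return {"tool": "web_crawl", "args": args}


registry = ToyRegistry()
registry.register("web_search", web_search_tool)
registry.register("web_extract", web_extract_tool)


def try_dispatch(name, args):
    """Return the tool result, or the error message if the name is unknown."""
    try:
        return asyncio.run(registry.dispatch(name, args))
    except KeyError as exc:
        return exc.args[0]
```

Calling web_crawl_tool directly from Python works in this sketch, while try_dispatch("web_crawl", ...) only yields the "tool not registered" message, which matches the "callable programmatically, unreachable from the model" split described above.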

Steps to Reproduce

  1. Configure a crawl-capable backend with a valid key (e.g. Firecrawl with FIRECRAWL_API_KEY).

  2. Start a Hermes session.

  3. Ask:

    Use web_crawl on https://example.com with instructions "list product pages with name and price"

Expected Behavior

The model calls web_crawl, the dispatcher routes the call through the configured provider's crawl() method, and the results are returned.

Actual Behavior

The model has no web_crawl tool in its available toolset, so it improvises: it shells out through terminal with curl/requests, uses browser_navigate, or falls back to web_extract on a guessed category URL. Provider-side crawl is never invoked, so no credits are spent at the crawl provider.

Affected Component

Tools (terminal, file ops, web, code execution, etc.)

Root Cause Analysis

tools/web_tools.py registers two web tools:

  • line 1540: registry.register(name="web_search", ...)
  • line 1550: registry.register(name="web_extract", ...)

There is no third registry.register(name="web_crawl", ...) call. WEB_CRAWL_SCHEMA doesn't exist next to WEB_SEARCH_SCHEMA and WEB_EXTRACT_SCHEMA.

In toolsets.py:

  • line 33: "web_search", "web_extract", in _HERMES_CORE_TOOLS — no web_crawl.
  • line 92: TOOLSETS["web"]["tools"] is ["web_search", "web_extract"] — no web_crawl.
  • line 344: TOOLSETS["hermes-acp"]["tools"] lists web_search, web_extract — no web_crawl.

The dispatcher and provider infrastructure is fully built out (web_search_registry.get_active_crawl_provider() works correctly), but the tool surface is missing.
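The four gaps listed above can be summarized as a small audit. This is a sketch only: surfaces_missing is a hypothetical helper, and CURRENT_SURFACES is stand-in data containing just the web entries quoted in this issue, not the full lists from the repository.

```python
def surfaces_missing(tool_name, surfaces):
    """Return the labels of every surface that does not list tool_name."""
    return [label for label, names in surfaces.items() if tool_name not in names]


# Stand-in data: only the web entries quoted in the issue; other tools omitted.
CURRENT_SURFACES = {
    "registry.register calls": ["web_search", "web_extract"],
    "_HERMES_CORE_TOOLS": ["web_search", "web_extract"],
    'TOOLSETS["web"]["tools"]': ["web_search", "web_extract"],
    'TOOLSETS["hermes-acp"]["tools"]': ["web_search", "web_extract"],
}

# Every surface is reported for web_crawl; none is reported for web_extract.
missing = surfaces_missing("web_crawl", CURRENT_SURFACES)
```

A check of this shape could run in CI against the real structures, so that a tool function added without a registration or toolset entry is flagged early.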

Proposed Fix

Strictly additive. Three changes:

  1. Add WEB_CRAWL_SCHEMA next to the existing schemas. Minimal shape — url (required), instructions (optional):

    WEB_CRAWL_SCHEMA = {
        "name": "web_crawl",
        "description": (
            "Crawl a website starting from a seed URL with natural-language "
            "instructions for what content to extract from each crawled page. "
            "Returns one entry per crawled page with url, title, and content "
            "(markdown). Use this for multi-page traversal of a single site "
            "(catalog/docs/blog); for single-URL extraction use web_extract."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": (
                        "Seed URL to crawl from. The crawler will follow "
                        "links from this page."
                    ),
                },
                "instructions": {
                    "type": "string",
                    "description": (
                        "Natural-language description of what to extract "
                        "from each crawled page (e.g. 'list all product "
                        "pages with name and price')."
                    ),
                },
            },
            "required": ["url"],
        },
    }

    This mirrors how WEB_EXTRACT_SCHEMA hides post-processing toggles. Other web_crawl_tool() params (depth, use_llm_processing, model, min_length) have sensible defaults and stay internal.

  2. Add a matching registration after the web_extract block:

    registry.register(
        name="web_crawl",
        toolset="web",
        schema=WEB_CRAWL_SCHEMA,
        handler=lambda args, **kw: web_crawl_tool(
            args.get("url", ""),
            instructions=args.get("instructions"),
        ),
        check_fn=check_web_api_key,
        requires_env=_web_requires_env(),
        is_async=True,
        emoji="🕸️",
        max_result_size_chars=100_000,
    )

  3. Add "web_crawl" to _HERMES_CORE_TOOLS, TOOLSETS["web"]["tools"], and TOOLSETS["hermes-acp"]["tools"] in toolsets.py.
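To illustrate how the two-field schema from step 1 and the handler from step 2 fit together, here is a minimal sketch. web_crawl_tool_stub and check_args are hypothetical: the stub borrows the parameter names mentioned in this issue, but its default values are invented for the example and the real web_crawl_tool() signature may differ.

```python
import asyncio

# Parameter section of the proposed schema, with descriptions trimmed.
CRAWL_PARAMETERS = {
    "type": "object",
    "properties": {
        "url": {"type": "string"},
        "instructions": {"type": "string"},
    },
    "required": ["url"],
}


async def web_crawl_tool_stub(url, instructions=None, depth="basic",
                              use_llm_processing=True, model=None,
                              min_length=0):
    # Stub only: parameter names come from the issue, defaults are invented.
    return {
        "url": url,
        "instructions": instructions,
        "depth": depth,
        "use_llm_processing": use_llm_processing,
    }


def check_args(args, parameters):
    """Return (missing required keys, keys the schema does not declare)."""
    missing = [key for key in parameters["required"] if key not in args]
    unknown = [key for key in args if key not in parameters["properties"]]
    return missing, unknown


def handler(args, **kw):
    # Same shape as the proposed lambda: only url and instructions are forwarded,
    # so the internal parameters keep their defaults whatever the model sends.
    return web_crawl_tool_stub(
        args.get("url", ""),
        instructions=args.get("instructions"),
    )


result = asyncio.run(handler({"url": "https://example.com", "depth": "deep"}))
```

In this sketch the extra "depth" key is reported as unknown by check_args and ignored by the handler, which is the "stay internal" behavior the proposal describes for the remaining parameters.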

Notes

  • Surfaced while testing a new Oxylabs AI Studio web provider plugin, but the bug pre-exists Oxylabs and affects every crawl-capable provider equally (Firecrawl, Tavily).
  • Verified end-to-end locally: with the three changes applied, web_crawl becomes model-callable, dispatches through the registered provider, and returns crawl results. No regression in web_search / web_extract.
  • Happy to submit a PR.

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR
