hermes - Bug: web_crawl is implemented but never registered as a model-callable tool

hermes, 2026-05-26 12:35:59




Bug Description

web_crawl_tool() is defined in tools/web_tools.py (~line 1133) as an async Python function and the registry resolves an active crawl provider correctly via agent.web_search_registry.get_active_crawl_provider(). But no registry.register(name="web_crawl", ...) call exists in the file. web_search and web_extract are registered at lines 1540 and 1550; web_crawl is not registered anywhere.

toolsets.py _HERMES_CORE_TOOLS (line 33) also omits "web_crawl", so even if a schema were added the core tool list wouldn't surface it.

The net effect: every crawl-capable provider (Firecrawl, Tavily, and any future plugin advertising supports_crawl()) is unreachable from the model. Crawl is only callable programmatically from Python.

Steps to Reproduce

1. Configure a crawl-capable backend with a valid key (e.g. Firecrawl with FIRECRAWL_API_KEY).
2. Start a Hermes session.
3. Ask:

   Use web_crawl on https://example.com with instructions "list product pages with name and price"

Expected Behavior

The model calls web_crawl, the dispatcher routes the call through the configured provider's crawl() method, and the results are returned.

Actual Behavior

The model has no web_crawl tool in its available toolset, so it improvises: it runs curl/requests in the terminal, calls browser_navigate, or falls back to web_extract on a guessed category URL. Provider-side crawl is never invoked, so no credits are spent at the crawl provider.

Affected Component

Tools (terminal, file ops, web, code execution, etc.)

Root Cause Analysis

tools/web_tools.py registers two web tools:

line 1540: registry.register(name="web_search", ...)
line 1550: registry.register(name="web_extract", ...)

There is no third registry.register(name="web_crawl", ...) call. WEB_CRAWL_SCHEMA doesn't exist next to WEB_SEARCH_SCHEMA and WEB_EXTRACT_SCHEMA.

In toolsets.py:

line 33: "web_search", "web_extract", in _HERMES_CORE_TOOLS — no web_crawl.
line 92: TOOLSETS["web"]["tools"] is ["web_search", "web_extract"] — no web_crawl.
line 344: TOOLSETS["hermes-acp"]["tools"] lists web_search, web_extract — no web_crawl.

The dispatcher and provider infrastructure is fully built out (web_search_registry.get_active_crawl_provider() works correctly), but the tool surface is missing.
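
The gap described above can be illustrated with a toy registry. This is only a sketch: ToyRegistry and the three stub functions are hypothetical stand-ins, not the Hermes registry or the real tool functions. It shows that a function can exist and work while staying invisible to anything that only looks at registered names.

```python
from typing import Callable, Dict, List


class ToyRegistry:
    """Hypothetical stand-in: only registered names are model-visible."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[dict], str]] = {}

    def register(self, name: str, handler: Callable[[dict], str]) -> None:
        self._handlers[name] = handler

    def visible_tools(self) -> List[str]:
        return sorted(self._handlers)


def web_search(args: dict) -> str:
    return f"search:{args.get('query', '')}"


def web_extract(args: dict) -> str:
    return f"extract:{args.get('url', '')}"


def web_crawl(args: dict) -> str:
    # Implemented and callable from Python, but never registered below.
    return f"crawl:{args.get('url', '')}"


registry = ToyRegistry()
registry.register("web_search", web_search)
registry.register("web_extract", web_extract)
# Note: there is no registry.register("web_crawl", web_crawl) call.

print(registry.visible_tools())  # ['web_extract', 'web_search']
```

Calling web_crawl({"url": "https://example.com"}) directly still works in this sketch, which matches the report: crawl is reachable programmatically but not through the tool surface.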

Proposed Fix

Strictly additive. Three changes:

Add WEB_CRAWL_SCHEMA next to the existing schemas. Minimal shape — url (required), instructions (optional):

WEB_CRAWL_SCHEMA = {
    "name": "web_crawl",
    "description": (
        "Crawl a website starting from a seed URL with natural-language "
        "instructions for what content to extract from each crawled page. "
        "Returns one entry per crawled page with url, title, and content "
        "(markdown). Use this for multi-page traversal of a single site "
        "(catalog/docs/blog); for single-URL extraction use web_extract."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "url": {
                "type": "string",
                "description": (
                    "Seed URL to crawl from. The crawler will follow "
                    "links from this page."
                ),
            },
            "instructions": {
                "type": "string",
                "description": (
                    "Natural-language description of what to extract "
                    "from each crawled page (e.g. 'list all product "
                    "pages with name and price')."
                ),
            },
        },
        "required": ["url"],
    },
}

This mirrors how WEB_EXTRACT_SCHEMA hides post-processing toggles. Other web_crawl_tool() params (depth, use_llm_processing, model, min_length) have sensible defaults and stay internal.
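
To make the "url required, instructions optional" shape concrete, here is a minimal sketch that checks model-supplied arguments against a trimmed copy of the schema. The check_args helper is hypothetical and written for illustration only; whether Hermes validates arguments this way is not stated in this issue.

```python
from typing import Any, Dict, List

# Trimmed copy of the proposed schema (descriptions omitted for brevity).
WEB_CRAWL_SCHEMA: Dict[str, Any] = {
    "name": "web_crawl",
    "parameters": {
        "type": "object",
        "properties": {
            "url": {"type": "string"},
            "instructions": {"type": "string"},
        },
        "required": ["url"],
    },
}


def check_args(schema: Dict[str, Any], args: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return a list of problems, empty if args fit."""
    params = schema["parameters"]
    problems: List[str] = []
    for key in params.get("required", []):
        if key not in args:
            problems.append(f"missing required argument: {key}")
    for key, value in args.items():
        spec = params["properties"].get(key)
        if spec is None:
            # Internal params such as depth are not exposed by the schema.
            problems.append(f"unexpected argument: {key}")
        elif spec["type"] == "string" and not isinstance(value, str):
            problems.append(f"argument {key} must be a string")
    return problems


print(check_args(WEB_CRAWL_SCHEMA, {"url": "https://example.com"}))  # []
print(check_args(WEB_CRAWL_SCHEMA, {"instructions": "list products"}))
```

Under this sketch, an argument like depth is reported as unexpected, which reflects the design choice of keeping the internal web_crawl_tool() parameters out of the model-facing schema.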

Add a matching registration after the web_extract block:

registry.register(
    name="web_crawl",
    toolset="web",
    schema=WEB_CRAWL_SCHEMA,
    handler=lambda args, **kw: web_crawl_tool(
        args.get("url", ""),
        instructions=args.get("instructions"),
    ),
    check_fn=check_web_api_key,
    requires_env=_web_requires_env(),
    is_async=True,
    emoji="🕸️",
    max_result_size_chars=100_000,
)
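
Because web_crawl_tool() is async, the handler lambda returns a coroutine rather than a result, which is presumably why the registration sets is_async=True. The sketch below illustrates that behaviour with stand-ins: the web_crawl_tool stub and the dispatch function are hypothetical, and how the real registry consumes is_async is an assumption.

```python
import asyncio
from typing import Any, Callable, Dict, Optional


async def web_crawl_tool(url: str, instructions: Optional[str] = None) -> str:
    """Hypothetical stub standing in for the real async web_crawl_tool()."""
    await asyncio.sleep(0)  # placeholder for the provider's crawl call
    return f"crawled {url} (instructions={instructions!r})"


# Same shape as the proposed handler: maps the args dict onto the function.
handler = lambda args, **kw: web_crawl_tool(
    args.get("url", ""),
    instructions=args.get("instructions"),
)


def dispatch(fn: Callable[..., Any], args: Dict[str, Any], is_async: bool) -> Any:
    """Hypothetical dispatcher: await the handler's result when it is async."""
    result = fn(args)
    if is_async:
        # The handler returned a coroutine; run it to completion.
        return asyncio.run(result)
    return result


out = dispatch(handler, {"url": "https://example.com"}, is_async=True)
print(out)  # crawled https://example.com (instructions=None)
```

If is_async were False in this sketch, dispatch would hand back an un-awaited coroutine object instead of crawl results.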

Add "web_crawl" to _HERMES_CORE_TOOLS, TOOLSETS["web"]["tools"], and TOOLSETS["hermes-acp"]["tools"] in toolsets.py.
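
Since the third change touches three separate lists, a small consistency check can catch a list that was missed. This is a sketch under assumptions: the lists below are abridged to the web entries named in this issue (the real lists in toolsets.py contain other tools), and lists_missing is a hypothetical helper.

```python
from typing import Dict, List


def lists_missing(tool: str, named_lists: Dict[str, List[str]]) -> List[str]:
    """Hypothetical helper: labels of the lists that do not contain tool."""
    return [label for label, tools in named_lists.items() if tool not in tools]


# Abridged view of the three lists before the fix (web entries only).
before: Dict[str, List[str]] = {
    "_HERMES_CORE_TOOLS": ["web_search", "web_extract"],
    'TOOLSETS["web"]["tools"]': ["web_search", "web_extract"],
    'TOOLSETS["hermes-acp"]["tools"]': ["web_search", "web_extract"],
}

# The proposed change appends "web_crawl" to each list.
after = {label: tools + ["web_crawl"] for label, tools in before.items()}

print(lists_missing("web_crawl", before))  # all three labels
print(lists_missing("web_crawl", after))   # []
```

A check of this kind could run as a unit test so that a future web tool is not added to the registry while being left out of a toolset list.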

Notes

Surfaced while testing a new Oxylabs AI Studio web provider plugin, but the bug pre-exists Oxylabs and affects every crawl-capable provider equally (Firecrawl, Tavily).
Verified end-to-end locally: with the three changes applied, web_crawl becomes model-callable, dispatches through the registered provider, and returns crawl results. No regression in web_search / web_extract.
Happy to submit a PR.

Are you willing to submit a PR for this?

I'd like to fix this myself and submit a PR
