hermes - 💡(How to fix) Fix Bug: web_crawl schema lets models auto-guess "instructions" instead of asking the user via clarify [1 pull requests]

Code Example

"instructions": {
    "type": "string",
    "description": (
        "Natural-language description of what to extract "
        "from each crawled page (e.g. 'list all product "
        "pages with name and price')."
    ),
},

---

plugins.web.oxylabs.provider: Oxylabs crawl: ... user_prompt='Crawl the site
  and extract the main visible page content, page title, important links,
  event/product listings, dates, prices, contact details, and any
  navigation/category structure.' ...

---

"instructions": {
    "type": "string",
    "description": (
        "What to extract from each crawled page (e.g. 'list all "
        "product pages with name and price'). Must reflect the "
        "user's stated goal. If the user did not say what to extract, "
        "ASK them via the `clarify` tool before calling — do not "
        "guess on their behalf."
    ),
},

---

# firecrawl.v2.FirecrawlClient.crawl signature
def crawl(self, url: str, *,
    prompt: Optional[str] = None,   # ← natural-language guidance
    exclude_paths: Optional[List[str]] = None,
    ...
)

---

# in plugins/web/firecrawl/provider.py, where crawl_params is built
if instructions:
    crawl_params["prompt"] = instructions

Bug Description

The web_crawl tool's schema (added in #32608 via tools/web_tools.py:1566) marks instructions as optional with a neutral description:

"instructions": {
    "type": "string",
    "description": (
        "Natural-language description of what to extract "
        "from each crawled page (e.g. 'list all product "
        "pages with name and price')."
    ),
},

Observed model behavior (gpt-5.5, fresh session, message: crawl https://tamsta.lt): the model auto-formulates a value for instructions from the URL alone, without asking the user what they want extracted. Example log line from agent.log:

plugins.web.oxylabs.provider: Oxylabs crawl: ... user_prompt='Crawl the site
  and extract the main visible page content, page title, important links,
  event/product listings, dates, prices, contact details, and any
  navigation/category structure.' ...

The user said "crawl" — three syllables. The model invented an 18-clause instruction without ever asking what the user actually wanted to extract. The crawl runs, produces results, but those results may have nothing to do with what the user had in mind.

Why the current behavior is suboptimal

instructions is the goal-shaping parameter for crawl. Wrong instructions → wrong crawl. The model's auto-formulated instruction is a guess about the user's intent — sometimes right, sometimes wrong, always taken without confirmation.

For crawl specifically, getting instructions right matters more than for web_search (where query is usually the user's verbatim ask) or web_extract (where the user typically passes URLs). A crawl is a multi-page traversal — the cost of running the wrong one is multiplied across every page returned.

Steps to Reproduce

Start a fresh Hermes session with any crawl-capable backend configured.
Send a minimal message: crawl <url> (no instructions, no extraction goal).
Check agent.log for the actual instructions passed to the crawl provider.

Observed: model invents a multi-clause instruction. The user is never asked.

Expected Behavior

When the user's message doesn't specify what to extract, the model should call clarify to ask the user — not guess. The schema is the natural place to push this behavior, because it's what the model reads at tool-selection time.

Proposed Fix

Tighten the description of the instructions field to be directive about asking the user when their goal isn't explicit:

"instructions": {
    "type": "string",
    "description": (
        "What to extract from each crawled page (e.g. 'list all "
        "product pages with name and price'). Must reflect the "
        "user's stated goal. If the user did not say what to extract, "
        "ASK them via the `clarify` tool before calling — do not "
        "guess on their behalf."
    ),
},

Schema-level wording is more effective than a runtime error envelope because the model reads it during tool selection (before deciding whether to call), not just after a failed call.

Cross-backend impact assessment

Three providers support crawl: Firecrawl, Tavily, Oxylabs.

Provider	API behavior for `instructions`	Effect of schema directive
Tavily	Optional pass-through (`provider.py:251-252`). If present, sent to Tavily API; if absent, vendor defaults apply.	Strict improvement — user-supplied instructions give Tavily a sharper target than its default.
Oxylabs	Required by API. Without it: the plugin's `crawl()` short-circuits with a typed hint.	Strict improvement — no wasted credits on instructions-less calls; user in control.
Firecrawl	The current `plugins/web/firecrawl/provider.py:610-619` logs and drops `instructions`, based on a comment claiming `/crawl` doesn't accept natural-language guidance. That assumption is stale — see sidenote below.	Asking the user a question Firecrawl currently ignores is a mild UX cost (one wasted turn). Becomes strict improvement if the Firecrawl plugin is updated to forward `instructions`.

Sidenote: Firecrawl plugin could honor `instructions` today

Not in scope for this issue (this is a schema change, not a Firecrawl plugin change), but worth flagging since it removes the "asks a useless question" UX cost from the Firecrawl row above.

The pinned SDK (firecrawl-py==4.17.0) exposes a prompt parameter on Firecrawl.crawl() (the v2 client):

# firecrawl.v2.FirecrawlClient.crawl signature
def crawl(self, url: str, *,
    prompt: Optional[str] = None,   # ← natural-language guidance
    exclude_paths: Optional[List[str]] = None,
    ...
)

The SDK also ships a crawl_params_preview(url, prompt) helper whose docstring is "Derive crawl parameters from natural-language prompt." — explicit confirmation that /crawl accepts prompts in the current API. Per-version: v2 has prompt; v1 (firecrawl.v1.V1FirecrawlApp) does not. Hermes uses v2 (from firecrawl import Firecrawl at provider.py:88 resolves to the v2 client).

The "log + drop" behavior in the plugin was likely correct against an older SDK; the comment hasn't been updated. The one-line fix would be:

# in plugins/web/firecrawl/provider.py, where crawl_params is built
if instructions:
    crawl_params["prompt"] = instructions

…instead of the current log-and-discard. With that change, all three crawl backends would uniformly honor instructions, and the schema directive proposed in this issue becomes a strict improvement everywhere.

Happy to bundle the Firecrawl fix with the schema PR if maintainers prefer, or leave it as a follow-up.

Impact

Without this change: crawl results reflect the model's guess about user intent, not the user's actual goal. Costly per-page (and per-credit for paid backends) when the guess is wrong.
With this change: one extra round-trip (model asks via clarify, user answers, model calls with the answer) — but every crawl is intentional. Matches what clarify was built for.
Adjacent benefit: if/when the Firecrawl plugin forwards prompt (see sidenote), the schema directive benefits all three crawl backends uniformly.

Notes

Surfaced while integrating a new Oxylabs plugin (#32600). The Oxylabs API rejects empty user_prompt with HTTP 400; the plugin's short-circuit returns a typed hint when instructions is missing. But because the model auto-formulates instructions on the first call, the short-circuit rarely fires — the model just guesses. Plugin-level fixes can't override what the model reads at tool-selection time; only the schema can.
This is purely a schema text change — no new code, no behavior changes for callers who already pass instructions. Strictly a model-direction improvement.

Are you willing to submit a PR for this?

I'd like to fix this myself and submit a PR

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix Bug: web_crawl schema lets models auto-guess "instructions" instead of asking the user via clarify [1 pull requests]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fixed

Code Example

Bug Description

Why the current behavior is suboptimal

Steps to Reproduce

Expected Behavior

Proposed Fix

Cross-backend impact assessment

Sidenote: Firecrawl plugin could honor `instructions` today

Impact

Notes

Are you willing to submit a PR for this?

Still need to ship something?

TRENDING

hermes - 💡(How to fix) Fix Bug: web_crawl schema lets models auto-guess "instructions" instead of asking the user via clarify [1 pull requests]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fixed

Code Example

Bug Description

Why the current behavior is suboptimal

Steps to Reproduce

Expected Behavior

Proposed Fix

Cross-backend impact assessment

Sidenote: Firecrawl plugin could honor instructions today

Impact

Notes

Are you willing to submit a PR for this?

Still need to ship something?

TRENDING

Sidenote: Firecrawl plugin could honor `instructions` today