hermes: feat: add sources/categories parameter to web_search_tool for targeted search results



Feature Request: Add categories parameter to web_search_tool

The Firecrawl v2 SDK, SearXNG, Exa, Brave, and Tavily all support server-side result filtering by source type or content category. Hermes' web_search_tool currently has no way to pass these through, forcing agents to search broadly and discard irrelevant results from context.

Current behavior

# web_tools.py:765
def web_search_tool(query: str, limit: int = 5) -> str:

An agent searching for "neuromorphic computing" gets a mix of news articles, research papers, GitHub repos, and blog posts — even when it only wanted research papers.

Proposed data flow

flowchart TD
    A["Agent: web_search(query='...', categories=['science'])"] 
    B["tool schema: web_search_tool(query, limit, categories=None)"]
    C["dispatch: routes to configured backend"]
    D1["firecrawl provider: client.search(sources=..., categories=...)"]
    D2["searxng provider: params['categories'] = ','.join(categories)"]
    D3["exa provider: params['category'] = categories[0]"]
    D4["brave-free provider: params['result_filter'] = ..."]
    D5["tavily provider: params['topic'] = categories[0]"]
    D6["other providers: ignore categories param"]
    E["response normalizer: already handles grouped shapes"]

    A --> B --> C
    C --> D1 & D2 & D3 & D4 & D5 & D6
    D1 --> E
    D2 --> E
    D3 --> E
    D4 --> E
    D5 --> E
    D6 --> E
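
The flow above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Hermes' actual dispatch code: the two provider functions, the `PROVIDERS` registry, and `dispatch_search` are hypothetical names, and each provider returns the request parameters it would send instead of making a network call.

```python
from typing import Callable, Optional

# Hypothetical provider type: builds the request parameters a backend would send.
Provider = Callable[[str, int, Optional[list[str]]], dict]


def category_aware_provider(query: str, limit: int,
                            categories: Optional[list[str]] = None) -> dict:
    """Stand-in for a backend that supports server-side filtering."""
    params = {"query": query, "limit": limit}
    if categories:
        params["categories"] = list(categories)
    return params


def plain_provider(query: str, limit: int,
                   categories: Optional[list[str]] = None) -> dict:
    """Stand-in for a backend without category support: the argument is ignored."""
    return {"query": query, "limit": limit}


# Hypothetical registry keyed by the configured backend name.
PROVIDERS: dict[str, Provider] = {
    "searxng": category_aware_provider,
    "ddgs": plain_provider,
}


def dispatch_search(backend: str, query: str, limit: int = 5,
                    categories: Optional[list[str]] = None) -> dict:
    """Route the call to the configured backend, forwarding categories unchanged."""
    return PROVIDERS[backend](query, limit, categories)
```

With this shape, a backend that does not understand categories needs no special handling in the dispatcher; it simply drops the argument.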

Requested behavior

def web_search_tool(
    query: str,
    limit: int = 5,
    categories: Optional[list[str]] = None,
) -> str:
    ...

# Agent usage examples:
web_search(query="neuromorphic computing", categories=["science"])
# → SearXNG: arxiv, pubmed, semantic scholar

web_search(query="OpenAI funding", categories=["news"])
# → Backend news-specific indices

web_search(query="attention mechanism", categories=["research"])
# → Firecrawl: research papers, PDFs

Backend support matrix

| Backend | API parameter | Category values supported |
|---|---|---|
| firecrawl | sources and categories | sources: web, news, images / categories: research, github, pdf |
| searxng | categories | general, news, science, it, images, files, social media, map, music, videos |
| exa | category | company, people, news, code |
| brave-free | result_filter | web, news, images, video |
| tavily | topic | general, news, finance |
| parallel | Not supported | |
| ddgs | Not supported | |
| xai | Not supported | |

The parameter defaults to None — non-supporting backends ignore it with no behavior change.

Cross-provider naming mismatch and mapping strategy

There is no universal category taxonomy across backends:

  • "research" means something to Firecrawl but not to SearXNG
  • "science" means something to SearXNG but not to Firecrawl
  • Firecrawl has BOTH sources (web/news/images) AND categories (research/github/pdf) as separate dimensions

Recommendation: The Hermes-level categories parameter passes values through to each backend as-is, rather than trying to normalize them. The agent is responsible for knowing which values its configured backend supports. This avoids creating a leaky abstraction where Hermes tries to map "science" to every provider's different concept of it.

An alternative would be a normalized taxonomy (web, news, research, code, images) that each provider maps internally — but this requires a lookup table per backend and adds complexity. The pass-through approach is simpler, more transparent, and avoids stale mappings when providers add new categories.

Per-backend implementation notes

Firecrawl uses two separate dimensions, sources and categories, that can be combined. A single Hermes-level categories list therefore mixes both: ["news"] maps to sources=["news"], while ["research"] maps to categories=["research"]. The provider needs a small mapping table to route each value to the right dimension.
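
A possible shape for that mapping is sketched below. The helper name `split_firecrawl_filters` and the `FIRECRAWL_SOURCES` set are hypothetical, and the source values are copied from the support matrix in this issue, not verified against the Firecrawl SDK. The function only builds keyword arguments; how they are passed to the search call is left to the provider.

```python
from typing import Optional

# Values taken from the support matrix in this issue; treat them as
# assumptions about the Firecrawl API, not a verified list.
FIRECRAWL_SOURCES = {"web", "news", "images"}


def split_firecrawl_filters(categories: Optional[list[str]]) -> dict:
    """Split a mixed list into keyword arguments for a search call.

    Values in FIRECRAWL_SOURCES go to `sources`; everything else,
    including unknown values, is passed through as `categories`.
    """
    if not categories:
        return {}  # backward compatible: no extra arguments
    sources = [c for c in categories if c in FIRECRAWL_SOURCES]
    cats = [c for c in categories if c not in FIRECRAWL_SOURCES]
    kwargs: dict = {}
    if sources:
        kwargs["sources"] = sources
    if cats:
        kwargs["categories"] = cats
    return kwargs
```

Unknown values fall through to `categories`, which keeps the pass-through behavior and leaves rejection to the backend.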

SearXNG accepts comma-separated multiple categories (categories=news,science). The provider joins the list.
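
As a rough sketch, the join could look like this. The function name `build_searxng_params` is hypothetical; the `q`, `format`, and `categories` query parameters are assumed to match the SearXNG search API as deployed.

```python
from typing import Optional


def build_searxng_params(query: str,
                         categories: Optional[list[str]] = None) -> dict:
    """Build query parameters for a SearXNG JSON search request."""
    params = {"q": query, "format": "json"}
    if categories:
        # SearXNG takes multiple categories as one comma-separated string.
        params["categories"] = ",".join(categories)
    return params
```

When `categories` is None or empty, the key is omitted entirely, so existing requests are unchanged.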

Exa accepts a single string (category=company), not a list. If multiple values are provided, only the first is used.

Brave uses result_filter with prefixed values or separate endpoint URLs. The provider maps ["news"] → result_filter=news.

Tavily uses topic with a single value. This is semantically different (topic vs. source type) but included for completeness since news is a common use case.
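
For the single-value backends described above (Exa with category, Tavily with topic), one shared helper might be enough. The names `first_category` and `add_single_value_filter` are hypothetical, and the sketch only shapes the request parameters; it does not call either API.

```python
from typing import Optional


def first_category(categories: Optional[list[str]]) -> Optional[str]:
    """Return the first requested value, or None when nothing was requested."""
    return categories[0] if categories else None


def add_single_value_filter(params: dict, key: str,
                            categories: Optional[list[str]]) -> dict:
    """Return a copy of params with `key` set to the first category, if any.

    Extra values beyond the first are dropped, matching the
    "only the first is used" rule described for Exa.
    """
    out = dict(params)
    value = first_category(categories)
    if value is not None:
        out[key] = value
    return out
```

An Exa provider would call this with key="category" and a Tavily provider with key="topic". Silently dropping extra values is simple, though logging a warning may be worth considering so agents notice the truncation.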

Implementation scope

Four changes:

  1. Tool schema (tools/web_tools.py:765): Add categories as an optional list[str] param
  2. Firecrawl provider (plugins/web/firecrawl/provider.py:388): Map and pass to client.search()
  3. SearXNG provider (plugins/web/searxng/provider.py:55): Join and pass to JSON API param
  4. Brave-free provider (plugins/web/brave_free/provider.py): Map to result_filter

The response normalizer (_extract_web_search_results) already handles grouped response shapes — no change needed.

Provider capability discovery

For firecrawl, brave-free, tavily, and exa, the supported categories are documented API constants, fixed per spec. SearXNG, however, is self-hosted, and users may enable or disable engines, so the available categories vary per deployment. SearXNG exposes GET /config, which lists active categories and engines. Capability discovery is additive and can follow later: the initial implementation passes values through and lets the backend reject unsupported ones.
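
If discovery is added later, a check against the instance config might look like the sketch below. It assumes the parsed JSON body of GET /config contains a top-level "categories" list of strings, which should be verified against the target deployment. The function name `filter_supported_categories` is hypothetical, and fetching the config is left out so the example runs offline.

```python
from typing import Optional


def filter_supported_categories(
    config: dict, requested: Optional[list[str]]
) -> tuple[list[str], list[str]]:
    """Split requested categories into (supported, unsupported).

    `config` is assumed to be the parsed JSON body of GET /config,
    with a top-level "categories" list. Verify for your deployment.
    """
    available = set(config.get("categories", []))
    wanted = requested or []
    supported = [c for c in wanted if c in available]
    unsupported = [c for c in wanted if c not in available]
    return supported, unsupported


# Example payload shaped like the assumed /config response.
sample_config = {"categories": ["general", "news", "science"]}
supported, unsupported = filter_supported_categories(
    sample_config, ["science", "music"]
)
```

The unsupported list could feed a warning back to the agent instead of letting the request fail at the backend.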

Testing strategy

The change spans four files (tool schema, three providers), each independently testable:

  1. Tool schema: Unit test that web_search_tool() accepts and forwards categories to the active provider. Mock the provider to verify the parameter arrives.
  2. Firecrawl provider: Unit test that the mapping table (news → sources, research → categories) works correctly. Test with known Firecrawl categories, unknown categories (should pass through as-is), and None (backward compat).
  3. SearXNG provider: Unit test that a list of categories is comma-joined in the HTTP params. Test single, multiple, and None.
  4. Brave-free provider: Unit test that result_filter is set correctly.
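
The first test in the list above could be sketched as follows. The `web_search_tool` here is a simplified stand-in that takes the provider as an argument; the real tool resolves its provider from configuration, and the provider's `search` method name is an assumption.

```python
import unittest
from typing import Optional
from unittest.mock import MagicMock


def web_search_tool(provider, query: str, limit: int = 5,
                    categories: Optional[list[str]] = None):
    """Simplified stand-in: forwards all arguments to the provider."""
    return provider.search(query=query, limit=limit, categories=categories)


class ForwardingTest(unittest.TestCase):
    def test_categories_reach_provider(self):
        provider = MagicMock()
        web_search_tool(provider, "test", categories=["science"])
        provider.search.assert_called_once_with(
            query="test", limit=5, categories=["science"]
        )

    def test_default_is_none(self):
        # Backward compatibility: omitting categories forwards None.
        provider = MagicMock()
        web_search_tool(provider, "test")
        provider.search.assert_called_once_with(
            query="test", limit=5, categories=None
        )
```

The same mock-and-assert pattern should carry over to the provider tests, with the HTTP client mocked in place of the provider.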

Integration test: Set up a SearXNG test fixture or mock, configure backend as searxng with SEARXNG_URL, call web_search_tool(query="test", categories=["science"]), assert the response contains only science-categorized results.

Prior art

GroktoCrawl (open-source Firecrawl-compatible stack at github.com/groktopus/groktocrawl — PR #84) already supports categories on /v2/search, backed by SearXNG's native category filtering.

Notes

  • Backward compatible: categories defaults to None, preserving existing behavior
  • Unaffected backends silently ignore the parameter
  • This enables a zero-round-trip improvement: agent scopes search server-side instead of searching broadly and discarding irrelevant results from context

Filed on behalf of Magnus Hedemark by Jasper (AI agent). See also groktopus/groktocrawl#84 for the server-side implementation.
