hermes: feat(tools/browser): expose Browser Use cloud agent-mode (run-task) as browser_task tool

Summary

Expose Browser Use's cloud agent-mode endpoint (POST /api/v3/run-task) as a first-class browser_task tool so the LLM can hand off a high-level goal and receive only the final structured result, instead of doing N rounds of browser_navigate → browser_snapshot → browser_click that flood the agent's context window.

Motivation — current efficiency story

The Browser Use provider that landed in v0.10 (tools/browser_providers/browser_use.py) is a CDP session provider only. When a user enables browser.use_gateway: true, the provider hands a Browserbase-equivalent CDP URL back to local agent-browser, which then drives it the same way it drives local Chromium. The interface to the LLM is unchanged: browser_navigate, browser_snapshot, browser_click, etc. Each call still serializes a full a11y snapshot back into context — typically 2–5K tokens per page, more for SPA-heavy sites.

This means the gateway swap (local Chromium → cloud Chromium) is a reliability and cold-start win, not a token-cost win. Users who turn it on for that reason are surprised when their token bills don't drop. The skill library in this repo even calls this out (see lightpanda-scrape and the various web-tool skills). The advertised "5–10× efficiency" of the Browser Use product comes specifically from its server-side LLM agent loop — not from the CDP session shape.
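As a back-of-envelope illustration of why the gateway swap alone does not change token cost, the sketch below multiplies the 2–5K tokens-per-snapshot figure quoted above by a page count. The 8-page flow is a made-up example, not a measurement, and `snapshot_context_cost` is a hypothetical helper.

```python
# Per-page a11y snapshot cost quoted in this issue: roughly 2-5K tokens.
SNAPSHOT_TOKENS_LOW = 2_000
SNAPSHOT_TOKENS_HIGH = 5_000


def snapshot_context_cost(pages: int) -> tuple[int, int]:
    """Return (low, high) tokens added to context by `pages` snapshots."""
    return pages * SNAPSHOT_TOKENS_LOW, pages * SNAPSHOT_TOKENS_HIGH


# Hypothetical 8-page flow. The estimate is identical for local and cloud
# Chromium, because the snapshots are serialized into context either way.
low, high = snapshot_context_cost(8)
print(f"8 pages: {low:,}-{high:,} tokens of snapshots")
# prints "8 pages: 16,000-40,000 tokens of snapshots"
```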

Proposal — browser_task tool

Add a new tool that wraps Browser Use's run-task endpoint:

def browser_task(
    goal: str,                    # e.g. "Go to news.ycombinator.com, return top 3 stories as JSON"
    schema: dict | None = None,   # JSON Schema for structured output
    timeout_minutes: int = 5,
    starting_url: str | None = None,
    allowed_domains: list[str] | None = None,
) -> dict: ...

Behavior:

  1. POST to https://api.browser-use.com/api/v3/run-task with the goal + optional schema.
  2. Poll the task ID at the documented interval until status == "finished" (or failed/stopped).
  3. Return only the output field (and a task_id for debugging) to the agent.
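
The three steps above could be sketched roughly as follows. This is a minimal illustration, not the proposed implementation: the HTTP layer is injected as two callables (`post`, `get_status`) so the loop can be exercised without network access, and the payload keys (`goal`, `schema`), the `id` field on the create response, and the 2-second poll interval are all assumptions that would need checking against the Browser Use API reference. Raising on failed/stopped is one possible design; returning an error dict would work too.

```python
from __future__ import annotations

import time
from typing import Callable

RUN_TASK_URL = "https://api.browser-use.com/api/v3/run-task"  # from this issue
FAILURE_STATUSES = {"failed", "stopped"}


class BrowserTaskError(RuntimeError):
    """Raised when the cloud task ends without finishing successfully."""


def run_browser_task(
    goal: str,
    post: Callable[[str, dict], dict],   # hypothetical: POSTs JSON, returns parsed JSON
    get_status: Callable[[str], dict],   # hypothetical: fetches the task by ID
    schema: dict | None = None,
    timeout_minutes: float = 5,
    poll_seconds: float = 2.0,           # assumption: use the documented interval
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    # Step 1: submit the goal (payload field names are assumptions).
    payload: dict = {"goal": goal}
    if schema is not None:
        payload["schema"] = schema
    task_id = post(RUN_TASK_URL, payload)["id"]  # "id" key is an assumption

    # Step 2: poll until a terminal status or the local deadline.
    deadline = time.monotonic() + timeout_minutes * 60
    while True:
        task = get_status(task_id)
        status = task.get("status")
        if status == "finished":
            # Step 3: return only the output plus the task ID for debugging;
            # intermediate steps never reach the agent's context.
            return {"task_id": task_id, "output": task.get("output")}
        if status in FAILURE_STATUSES:
            raise BrowserTaskError(f"task {task_id} ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status!r} at the deadline")
        sleep(poll_seconds)
```

In a real tool, `post` and `get_status` would be thin wrappers around the HTTP client the provider already uses, with credentials attached by the existing resolver.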

Key property: only the final structured result lands in the agent's context window. The intermediate browse loop runs server-side on Browser Use's infra, not in the local Hermes context.

Auth

Reuse the existing BrowserUseProvider._get_config_or_none() resolver — it already handles both:

  • Direct BROWSER_USE_API_KEY env var
  • Managed Nous Tool Gateway credentials (resolve_managed_tool_gateway("browser-use"))

So Nous Portal subscribers get this for free with their existing subscription.

Toolset / config wiring

  • Add browser_task to the browser toolset (or a new browser_agent toolset for cleaner opt-in).
  • Gate on browser.use_gateway: true or a present BROWSER_USE_API_KEY — same check the provider already does.
  • Surface in hermes tools listing alongside the existing browser tools.
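
The gating rule in the list above might look something like the sketch below. `browser_task_enabled` is a hypothetical stand-in written for illustration; the actual check should reuse whatever the provider already does, and the nested `browser.use_gateway` config shape is assumed from the key name.

```python
import os
from typing import Mapping


def browser_task_enabled(config: Mapping, env: Mapping[str, str] = os.environ) -> bool:
    """Return True when browser_task should be registered.

    Mirrors the rule in this issue: browser.use_gateway is true,
    or a non-empty BROWSER_USE_API_KEY is present.
    """
    use_gateway = config.get("browser", {}).get("use_gateway") is True
    has_key = bool(env.get("BROWSER_USE_API_KEY", "").strip())
    return use_gateway or has_key


# Example: gateway off, but a direct API key is set, so the tool is enabled.
print(browser_task_enabled({"browser": {"use_gateway": False}},
                           {"BROWSER_USE_API_KEY": "bu_example"}))
```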

Why this complements, not replaces, the existing browser tools

  • Existing low-level browser_* tools remain the right choice for: surgical UI flows the user wants to inspect step-by-step, debugging a stuck flow, or anything where the agent needs to reason about intermediate state.
  • browser_task is the right choice for: "go fetch this data and come back with JSON." The agent doesn't care how the cloud agent navigated; it just wants the result.

This mirrors the existing web_extract (Firecrawl) vs browser_navigate split — fire-and-forget for bulk data, low-level for interactive flows.

Out of scope for this issue

  • Local equivalent (running Browser Use's open-source agent loop locally against Chromium). That's a much bigger lift; this issue is just about exposing the existing cloud endpoint.
  • Streaming intermediate progress (status: "running" updates). v1 can poll silently and only surface the final result.

Optional follow-ups (separate issues)

  • A browser_task_status tool for very-long-running tasks where the user wants to check progress mid-flight.
  • A subagent-style streaming variant that surfaces step summaries (not full snapshots) as the cloud agent works.

Filed in response to a real user smoke test on v0.10 where the user expected a token-efficiency win from browser.use_gateway: true and was surprised to find the in-context tool surface unchanged. Skill memory and the lightpanda-scrape skill both correctly note that the gateway is for reliability/cold-start, not token cost. That also means the token-efficient agent-mode endpoint Browser Use is known for isn't reachable from Hermes today.
