hermes - 💡(How to fix) Fix [Bug]: _get_cloud_provider() caches transient None, pinning process to local mode forever [3 pull requests]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Error Message

def _get_cloud_provider() -> Optional[CloudBrowserProvider]: global _cached_cloud_provider, _cloud_provider_resolved if _cloud_provider_resolved: return _cached_cloud_provider

_cloud_provider_resolved = True   # ← set BEFORE resolution; sticks even if result is None
try:
    ...
except Exception as e:
    logger.debug("Could not read cloud_provider from config: %s", e)
# If neither explicit nor auto-detect found a configured provider,
# _cached_cloud_provider stays None — but _cloud_provider_resolved == True
# so we never retry.
return _cached_cloud_provider

Root Cause

Symptoms in the agent's view:

  • Every browser_navigate reports features: {"local": true} and the "Running WITHOUT residential proxies. Bot detection may be more aggressive." stealth warning.
  • The local agent-browser-linux-* binary is spawned and used despite explicit cloud config.
  • hermes status correctly shows Browser automation ✓ active via Nous subscription because the credential resolver works fine when probed fresh — but the gateway's already-imported tools.browser_tool module has the stale cache.

Fix Action

Fixed

Code Example

def _get_cloud_provider() -> Optional[CloudBrowserProvider]:
    global _cached_cloud_provider, _cloud_provider_resolved
    if _cloud_provider_resolved:
        return _cached_cloud_provider

    _cloud_provider_resolved = True   # ← set BEFORE resolution; sticks even if result is None
    try:
        ...
    except Exception as e:
        logger.debug("Could not read cloud_provider from config: %s", e)
    # If neither explicit nor auto-detect found a configured provider,
    # _cached_cloud_provider stays None — but _cloud_provider_resolved == True
    # so we never retry.
    return _cached_cloud_provider
RAW_BUFFERClick to expand / collapse

Bug Description

tools/browser_tool.py _get_cloud_provider() caches its first result for the life of the process — including transient-None outcomes. If credentials are not yet available at the very first browser call, the resolver pins the entire process to local mode forever, even after credentials self-heal seconds later.

This regresses the affordance of issue #10883 (#10883's PR added _get_session_info() runtime fallback, but the resolver itself still poisons its cache).

Code location

tools/browser_tool.py, _get_cloud_provider():

def _get_cloud_provider() -> Optional[CloudBrowserProvider]:
    global _cached_cloud_provider, _cloud_provider_resolved
    if _cloud_provider_resolved:
        return _cached_cloud_provider

    _cloud_provider_resolved = True   # ← set BEFORE resolution; sticks even if result is None
    try:
        ...
    except Exception as e:
        logger.debug("Could not read cloud_provider from config: %s", e)
    # If neither explicit nor auto-detect found a configured provider,
    # _cached_cloud_provider stays None — but _cloud_provider_resolved == True
    # so we never retry.
    return _cached_cloud_provider

Reproduction

The failure mode I hit: managed Nous Portal Browser-Use gateway, with the access token expiring shortly after gateway startup.

  1. browser.cloud_provider: browser-use set in ~/.hermes/config.yaml.
  2. Nous Portal access token has T-1m remaining at gateway boot.
  3. First browser_navigate call lands during the brief window where the token is mid-refresh and resolve_managed_tool_gateway("browser-use") returns None.
  4. BrowserUseProvider.is_configured() returns False; auto-detect chain falls through; _cached_cloud_provider stays None; _cloud_provider_resolved is True.
  5. Token rotates 30 seconds later. Subsequent calls still get the cached None for the rest of the gateway's lifetime.

Symptoms in the agent's view:

  • Every browser_navigate reports features: {"local": true} and the "Running WITHOUT residential proxies. Bot detection may be more aggressive." stealth warning.
  • The local agent-browser-linux-* binary is spawned and used despite explicit cloud config.
  • hermes status correctly shows Browser automation ✓ active via Nous subscription because the credential resolver works fine when probed fresh — but the gateway's already-imported tools.browser_tool module has the stale cache.

This is also reachable via:

  • A config read that raises during the first browser call (silently logged at debug level).
  • An explicit provider's __init__ momentarily raising (the path silently returns the existing None).

Expected Behavior

Cache-only-on-success: a None result should NOT be cached unless the user explicitly opted in to cloud_provider: local. Transient credential gaps should self-heal on the next browser call.

Proposed Fix

Invert the bookkeeping so _cloud_provider_resolved = True is set only when we actually picked a provider (or the user explicitly chose local). Surface explicit-provider instantiation failures at warning level (with exc_info=True) so operators can see why their config didn't take effect.

PR coming separately with 5 new tests:

  • explicit_local_caches_permanently (regression guard for the explicit case)
  • successful_cloud_resolution_caches_permanently (regression guard for the happy path)
  • no_credentials_yet_does_not_cache_none (primary regression guard for the bug above)
  • config_read_failure_does_not_cache_none
  • explicit_provider_instantiation_failure_does_not_cache

138 existing browser tests still green (verified locally against origin/main).

Affected Component

Tools (browser).

Operating System

Any. The bug is in the tool layer.

Hermes Version

Current main (a7e7921db). The buggy resolver shape predates #10883 and was untouched by it.

Are you willing to submit a PR for this?

  • Yes, opening shortly.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - 💡(How to fix) Fix [Bug]: _get_cloud_provider() caches transient None, pinning process to local mode forever [3 pull requests]