hermes - 💡(How to fix) Fix [Bug]: `agent.models_dev.get_provider_info()` blocks when `models.dev` is unreachable and local cache is stale

Root Cause

The apparent issue is in the models.dev provider metadata path.

When the local cache is stale, get_provider_info(...) appears to synchronously refresh from:

https://models.dev/api.json

If that request times out, provider resolution blocks even though the stale cache already contains the requested provider.

This explains why hermes doctor appears to stall after the visible configuration checks: the next step performs provider/model validation, which reaches the blocking metadata path.

Code Example

resolve_provider_full(...)
→ agent.models_dev.get_provider_info(...)

---

get_provider_info("openai")
get_provider_info("deepseek")

---

/opt/data/models_dev_cache.json

---

from agent.models_dev import get_provider_info

get_provider_info("openai")

---

hermes doctor
hermes model

---

https://models.dev/api.json

---

touch /opt/data/models_dev_cache.json

---

I can provide `hermes debug share` output if needed.

I have not pasted it here because the issue appears reproducible through direct
`agent.models_dev.get_provider_info(...)` calls, and I want to avoid accidentally
sharing local config or secrets.

---

3.13.5

---

v0.15.1

---

Manual timing:

config: /opt/data/config.yaml

START import yaml
OK    import yaml: 0.000s

START read+safe_load config
OK    read+safe_load config: 0.011s

provider: openai-codex
default_model: gpt-5.4-mini

START import hermes_cli.auth
OK    import hermes_cli.auth: 0.056s

START import hermes_cli.providers
OK    import hermes_cli.providers: 0.001s

START resolve_provider
OK    resolve_provider: 0.000s

START get_compatible_custom_providers
OK    get_compatible_custom_providers: 0.000s

START resolve_provider_full
# blocks here

---

START import agent.models_dev
OK import agent.models_dev 0.094 <module 'agent.models_dev' from '/opt/hermes/agent/models_dev.py'>

START get_provider_info(openai)
# blocks / timeout 124

START get_provider_info(deepseek)
# blocks / timeout 124

---

curl -4 -I --max-time 10 https://models.dev/
curl: (28) Connection timed out after 10005 milliseconds

curl -6 -I --max-time 10 https://models.dev/
curl: (6) Could not resolve host: models.dev

---

exists: True
stat: os.stat_result(
  st_mode=33152,
  st_ino=30936,
  st_dev=42,
  st_nlink=1,
  st_uid=0,
  st_gid=0,
  st_size=2141313,
  st_atime=1780207255,
  st_mtime=1780207253,
  st_ctime=1780207253
)

age seconds: 19144.97229242325
loaded type: dict
provider count: 137
has openai: True
has deepseek: True

---

touch /opt/data/models_dev_cache.json

---

https://models.dev/api.json

---

- Use stale-while-revalidate semantics for models_dev_cache.json.
- Add a shorter timeout around https://models.dev/api.json.
- Add an offline/local-cache-only mode for provider validation.
- Add an environment variable such as HERMES_MODELS_DEV_OFFLINE=1.
- Log a warning when models.dev is unreachable instead of blocking.

---

get_provider_info("openai")
get_provider_info("deepseek")

Bug Description

agent.models_dev.get_provider_info(...) can block when https://models.dev/api.json is unreachable and the local models_dev_cache.json exists but is considered stale.

This is not specific to hermes doctor. hermes doctor exposes the issue because it performs provider/model validation, but the lower-level blocking path appears to be:

resolve_provider_full(...)
→ agent.models_dev.get_provider_info(...)

Any command or feature that depends on this provider/model metadata path may become slow or block.

In my case, both of these blocked until timeout:

get_provider_info("openai")
get_provider_info("deepseek")

So the issue does not appear to be provider-specific.

Steps to Reproduce

Have a readable local cache at:

/opt/data/models_dev_cache.json

Let the cache become stale according to the current freshness policy.
Make models.dev unreachable, or reproduce during an outage / network timeout.
Run any code path that resolves provider metadata, for example:

from agent.models_dev import get_provider_info

get_provider_info("openai")

or run a Hermes command that performs provider/model validation, such as:

hermes doctor
hermes model

Expected Behavior

If models.dev is unreachable but a readable local cache exists and contains the requested provider, get_provider_info(...) should return from the stale cache with a warning instead of blocking provider/model resolution.

Expected behavior could be:

Load the local cache first.
Use it immediately if fresh.
If stale, try to refresh https://models.dev/api.json with a short timeout.
If refresh fails, fall back to the stale cache with a warning.
Optionally refresh in the background.

For commands such as hermes doctor, stale provider metadata is preferable to blocking the whole health check.

Actual Behavior

When models_dev_cache.json is stale, get_provider_info(...) attempts to refresh from:

https://models.dev/api.json

If that endpoint is unreachable, the call blocks before returning provider metadata, even though the stale local cache is readable and contains the requested provider.

Touching the cache file immediately fixed the issue:

touch /opt/data/models_dev_cache.json

After that, both direct get_provider_info(...) calls and hermes doctor worked again.

Affected Component

Agent Core / provider metadata resolution.

CLI commands such as hermes doctor and hermes model are affected indirectly because they call the same provider/model metadata path.

Messaging Platform

Not gateway-specific.

Debug Report

I can provide `hermes debug share` output if needed.

I have not pasted it here because the issue appears reproducible through direct
`agent.models_dev.get_provider_info(...)` calls, and I want to avoid accidentally
sharing local config or secrets.

Operating System

Docker container on macOS host.

Host OS: macOS Tahoe 26.5

Python Version

3.13.5

Hermes Version

v0.15.1

Additional Logs / Traceback

Manual timing:

config: /opt/data/config.yaml

START import yaml
OK    import yaml: 0.000s

START read+safe_load config
OK    read+safe_load config: 0.011s

provider: openai-codex
default_model: gpt-5.4-mini

START import hermes_cli.auth
OK    import hermes_cli.auth: 0.056s

START import hermes_cli.providers
OK    import hermes_cli.providers: 0.001s

START resolve_provider
OK    resolve_provider: 0.000s

START get_compatible_custom_providers
OK    get_compatible_custom_providers: 0.000s

START resolve_provider_full
# blocks here

Further narrowed down:

START import agent.models_dev
OK import agent.models_dev 0.094 <module 'agent.models_dev' from '/opt/hermes/agent/models_dev.py'>

START get_provider_info(openai)
# blocks / timeout 124

START get_provider_info(deepseek)
# blocks / timeout 124

Network test:

curl -4 -I --max-time 10 https://models.dev/
curl: (28) Connection timed out after 10005 milliseconds

curl -6 -I --max-time 10 https://models.dev/
curl: (6) Could not resolve host: models.dev

Cache inspection:

exists: True
stat: os.stat_result(
  st_mode=33152,
  st_ino=30936,
  st_dev=42,
  st_nlink=1,
  st_uid=0,
  st_gid=0,
  st_size=2141313,
  st_atime=1780207255,
  st_mtime=1780207253,
  st_ctime=1780207253
)

age seconds: 19144.97229242325
loaded type: dict
provider count: 137
has openai: True
has deepseek: True

Workaround:

touch /opt/data/models_dev_cache.json

Root Cause Analysis

The apparent issue is in the models.dev provider metadata path.

When the local cache is stale, get_provider_info(...) appears to synchronously refresh from:

https://models.dev/api.json

If that request times out, provider resolution blocks even though the stale cache already contains the requested provider.

This explains why hermes doctor appears to stall after the visible configuration checks: the next step performs provider/model validation, which reaches the blocking metadata path.

Proposed Fix

Use stale-while-revalidate behavior for models_dev_cache.json.

Suggested behavior:

Load readable local cache first.
If the cache is fresh, use it directly.
If the cache is stale, keep it available as an immediate fallback.
Attempt remote refresh with a short timeout.
If refresh fails, return provider metadata from stale cache and emit a warning.
Optionally refresh models.dev in the background.

Possible implementation options:

- Use stale-while-revalidate semantics for models_dev_cache.json.
- Add a shorter timeout around https://models.dev/api.json.
- Add an offline/local-cache-only mode for provider validation.
- Add an environment variable such as HERMES_MODELS_DEV_OFFLINE=1.
- Log a warning when models.dev is unreachable instead of blocking.

Expected result:

If models.dev is unreachable but models_dev_cache.json exists and contains the requested provider, calls such as:

get_provider_info("openai")
get_provider_info("deepseek")

should return from stale cache with a warning instead of timing out or blocking provider/model validation.

Are you willing to submit a PR for this?

I'd like to fix this myself and submit a PR

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix [Bug]: `agent.models_dev.get_provider_info()` blocks when `models.dev` is unreachable and local cache is stale

Recommended Tools

GitHub issue graph ai analysis

Error Message

Additional Logs / Traceback

Root Cause

Fix Action

Fix / Workaround

Code Example

Bug Description

Steps to Reproduce

Expected Behavior

Actual Behavior

Affected Component

Messaging Platform

Debug Report

Operating System

Python Version

Hermes Version

Additional Logs / Traceback

Root Cause Analysis

Proposed Fix

Are you willing to submit a PR for this?

Still need to ship something?

TRENDING