hermes - ✅(Solved) Fix Cron jobs fail silently when llama.cpp model changes — should query /v1/models at runtime [1 pull requests, 1 comments, 2 participants]

pmaho · 2026-05-05T09:16:36Z



GitHub stats

NousResearch/hermes-agent#20125 • Fetched 2026-05-06 06:38:37

Author: pmaho
Participants: pmaho, vominh1919
Timeline (top): labeled ×3, commented ×1, cross-referenced ×1

Error Message

Cron jobs without an explicit model and provider pinned fail with a generic RuntimeError: Connection error when llama.cpp is running a different model than the one in config.yaml, or when llama.cpp is not running at all. The message gives no diagnostic information, such as which server could not be reached.

Root Cause

The scheduler takes the model name from the job, the HERMES_MODEL environment variable, or config.yaml, and never checks what the local server has actually loaded. Local llama.cpp users frequently swap models:

- The server crashes and the user loads a different model
- The user switches models mid-session via /model
- The user has multiple profiles with different models

Cron jobs should be resilient to this: they are scheduled background tasks, not interactive sessions. The current behavior requires manually pinning every cron job to a specific model, which defeats the purpose of local inference flexibility.

Fix Action

Fixed

Fixed by PR: fix: auto-discover model from local inference server in cron scheduler (https://github.com/NousResearch/hermes-agent/pull/20150)

PR fix notes

PR #20150: fix: auto-discover model from local inference server in cron scheduler

Repository: NousResearch/hermes-agent
Author: vominh1919
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/20150

Description (problem / solution / changelog)

Problem

Cron jobs without an explicit model and provider pinned fail with a generic RuntimeError: Connection error when:

1. The llama.cpp / Ollama / vLLM server is running a different model than what's in config.yaml
2. The local inference server is not running at all

The current code hardcodes the model from config.yaml at cron job creation time (scheduler.py ~line 1075-1091). If the user swaps models in llama.cpp (very common — crashing, swapping, multiple profiles), cron jobs break silently with no actionable error message.

Fix

Two changes in cron/scheduler.py, inserted after the runtime provider resolution block:

1. Auto-discover model from `/v1/models`

When no explicit model is pinned by the job (job.get("model") is empty), the scheduler now queries {base_url}/v1/models to discover which model is actually loaded on the inference server and uses the first one listed. This is a best-effort check: failures are logged at debug level and the existing fallback to config.yaml still applies.
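A minimal sketch of this best-effort discovery is shown here, not the code from the PR. The names `discover_model`, `resolve_model`, and `config_default` are hypothetical. It uses the standard library's `urllib` rather than `requests`, and assumes `base_url` does not already end in `/v1`.

```python
import json
import logging
import urllib.request

logger = logging.getLogger(__name__)


def discover_model(base_url, timeout=5.0):
    """Return the id of the first model listed by the server, or None."""
    url = f"{base_url.rstrip('/')}/v1/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
        return payload["data"][0]["id"]
    except (OSError, ValueError, LookupError, TypeError) as exc:
        # Best effort: log quietly and let the caller fall back.
        logger.debug("model discovery failed for %s: %s", url, exc)
        return None


def resolve_model(job, base_url, config_default):
    """A pinned job model wins; otherwise discover, then fall back to config."""
    if job.get("model"):
        return job["model"]
    return discover_model(base_url) or config_default
```

Returning `None` instead of raising keeps discovery optional, so a job with a valid config.yaml default still runs when the probe fails.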

2. Clear error for unreachable local servers

When the resolved base_url points to a local server (localhost, 127.0.0.1, [::1]), the scheduler probes /v1/models with a 5-second timeout. If the server is unreachable, it raises a clear RuntimeError:

Local inference server not reachable at http://localhost:8080. Is llama.cpp / Ollama / vLLM running?

This replaces the generic SDK "Connection error", which gives no diagnostic information.
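The reachability check could look roughly like the sketch below. The helper names `is_local` and `ensure_reachable` are hypothetical. Treating an HTTP error status as "reachable" is an assumption of this sketch, not something the PR description states.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_local(base_url):
    """True when base_url points at this machine."""
    # urlsplit strips the brackets from IPv6 literals such as [::1].
    return urlsplit(base_url).hostname in LOCAL_HOSTS


def ensure_reachable(base_url, timeout=5.0):
    """Raise a descriptive RuntimeError if a local server does not answer."""
    if not is_local(base_url):
        return
    url = f"{base_url.rstrip('/')}/v1/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            pass
    except urllib.error.HTTPError:
        # Assumption: the server answered, even with an error status,
        # so it counts as reachable.
        return
    except OSError as exc:
        raise RuntimeError(
            f"Local inference server not reachable at {base_url}. "
            "Is llama.cpp / Ollama / vLLM running?"
        ) from exc
```

Remote providers skip the probe entirely, so the extra request and its timeout only affect local setups.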

Why this matters

Local inference users frequently swap models:

- Server crashes → user loads a different model
- User switches models via /model command
- Multiple profiles with different models
- Ollama users pull new models regularly

Cron jobs should be resilient to this — they are scheduled background tasks, not interactive sessions.

Fixes #20125

Changed files

cron/scheduler.py (modified, +46/-0)



Bug Report

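What happened

Cron jobs without an explicit model and provider pinned fail with a generic RuntimeError: Connection error when llama.cpp is running a different model than what is in config.yaml, or when llama.cpp is not running at all.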

What should happen

Cron jobs without explicit model config should:

1. Query the /v1/models endpoint at runtime to discover what is actually loaded on the local llama.cpp server
2. Use whatever model is available (or pick the first one)
3. Give a clear error message if the server is unreachable (e.g., "llama.cpp server not reachable at localhost:8080") instead of the generic "Connection error"

Current behavior (scheduler.py line ~946-960)

model = job.get("model") or os.getenv("HERMES_MODEL") or ""
# Falls back to config.yaml model.default
if not job.get("model"):
    model = _model_cfg.get("default", model)

This hardcodes the config.yaml model at cron job creation time. If the user changes their llama.cpp model (very common with local inference — crashing, swapping models, etc.), cron jobs break silently.
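To make the quoted logic easier to follow, the sketch below restates it as a pure function. The name `resolve_current` and the sample values are hypothetical. The environment is passed in as a dict instead of being read with `os.getenv`, purely for illustration. The rest of the scheduler is not shown in the issue, so this mirrors only the quoted lines.

```python
def resolve_current(job, env, model_cfg):
    """Mirror of the quoted snippet with its inputs passed in explicitly."""
    model = job.get("model") or env.get("HERMES_MODEL") or ""
    # Falls back to config.yaml model.default
    if not job.get("model"):
        model = model_cfg.get("default", model)
    return model


cfg = {"default": "model-from-config"}
print(resolve_current({}, {}, cfg))                              # model-from-config
print(resolve_current({"model": "pinned-model"}, {}, cfg))       # pinned-model
print(resolve_current({}, {"HERMES_MODEL": "env-model"}, cfg))   # model-from-config
```

The inference server is never consulted on any of these paths. In the snippet as quoted, a config.yaml default also replaces HERMES_MODEL whenever the job pins no model.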

Expected behavior

model = job.get("model") or os.getenv("HERMES_MODEL") or ""
if not model:
    # Try to auto-discover from running server
    available = requests.get(f"{base_url}/v1/models", timeout=5).json()
    if available.get("data"):
        model = available["data"][0]["id"]  # Use whatever is loaded
    else:
        model = _model_cfg.get("default", "")

Why this matters

Local llama.cpp users frequently swap models:

- Server crashes, user loads a different model
- User switches models mid-session via /model
- User has multiple profiles with different models

Reproduction

1. Create a cron job without specifying model/provider
2. Change llama.cpp to load a different model (or restart it)
3. Run the cron job → gets RuntimeError: Connection error

Environment

Hermes Agent 0.12.0
Custom llama.cpp provider (localhost:8080)

extent analysis

TL;DR

Modify the cron scheduler's model resolution to query the /v1/models endpoint at runtime and use the model actually loaded on the local llama.cpp server.

Guidance

- Update scheduler.py to use the proposed expected-behavior code, which queries the /v1/models endpoint to auto-discover the loaded model.
- Verify that base_url is set to the llama.cpp server URL (e.g., http://localhost:8080).
- Test the updated scheduler logic by reproducing the issue, then check that the loaded model is used and that a clear error message appears when the server is unreachable.
- Add error handling for the cases where /v1/models returns an error or lists no models.
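One way to test this without a real llama.cpp instance is a throwaway HTTP server that answers /v1/models. The sketch below uses only the standard library. `FakeModelsHandler`, `start_fake_server`, and the model id `test-model` are hypothetical names for illustration.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeModelsHandler(BaseHTTPRequestHandler):
    """Serves a fixed /v1/models listing, standing in for llama.cpp."""

    def do_GET(self):
        if self.path != "/v1/models":
            self.send_error(404)
            return
        body = json.dumps({"object": "list", "data": [{"id": "test-model"}]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep test output quiet


def start_fake_server():
    """Start the fake server on a free local port; return (server, base_url)."""
    server = HTTPServer(("127.0.0.1", 0), FakeModelsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}"


server, base_url = start_fake_server()
try:
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
        listed = json.load(resp)
    assert listed["data"][0]["id"] == "test-model"
finally:
    server.shutdown()
    server.server_close()
```

Pointing the scheduler's base_url at the fake server exercises the discovery path. Shutting the server down and re-running exercises the unreachable-server path.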

Example

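import os
import requests

# job, base_url and _model_cfg come from the surrounding scheduler code
model = job.get("model") or os.getenv("HERMES_MODEL") or ""
if not model:
    try:
        # Try to auto-discover from the running server
        available = requests.get(f"{base_url}/v1/models", timeout=5).json()
        model = available["data"][0]["id"]  # Use whatever is loaded
    except (requests.RequestException, ValueError, LookupError, TypeError):
        # Server unreachable or unexpected response: fall back to config.yaml
        model = _model_cfg.get("default", "")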

Notes

This solution assumes that the /v1/models endpoint returns a JSON response with a data key containing a list of available models. If the endpoint returns a different format, the code may need to be adjusted accordingly.
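The assumed response shape can be checked defensively before indexing into it. The sketch below is one way to do that. `extract_model_ids` is a hypothetical helper and the model ids are placeholders.

```python
def extract_model_ids(payload):
    """Return model ids from a /v1/models-style payload, or [] if the shape is unexpected."""
    if not isinstance(payload, dict):
        return []
    data = payload.get("data")
    if not isinstance(data, list):
        return []
    return [
        entry["id"]
        for entry in data
        if isinstance(entry, dict) and isinstance(entry.get("id"), str) and entry["id"]
    ]


sample = {"object": "list", "data": [{"id": "model-a", "object": "model"}]}
print(extract_model_ids(sample))             # ['model-a']
print(extract_model_ids({"data": []}))       # []
print(extract_model_ids({"models": ["x"]}))  # []
```

An empty list tells the caller to fall back to the config.yaml default rather than fail on a KeyError or IndexError.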

Recommendation

Apply the proposed workaround by updating the scheduler.py file to use the auto-discovery logic, as it provides a more resilient and flexible solution for local inference users.


