hermes - ✅(Solved) Fix Cron jobs fail silently when llama.cpp model changes — should query /v1/models at runtime [1 pull requests, 1 comments, 2 participants]

GitHub stats
NousResearch/hermes-agent#20125 · Fetched 2026-05-06 06:38:37
Comments: 1 · Participants: 2 · Timeline: 5 · Reactions: 0
Timeline (top): labeled ×3, commented ×1, cross-referenced ×1

Error Message

Cron jobs without an explicit model and provider pinned fail with a generic RuntimeError: Connection error when llama.cpp is running a different model than what is in config.yaml, or when llama.cpp is not running at all. The issue asks for a clear error message when the server is unreachable (e.g., "llama.cpp server not reachable at localhost:8080") instead of the generic "Connection error".

Root Cause

Local llama.cpp users frequently swap models:

  • Server crashes, user loads a different model
  • User switches models mid-session via /model
  • User has multiple profiles with different models

Cron jobs should be resilient to this — they are scheduled background tasks, not interactive sessions. The current behavior requires manually pinning every cron job to a specific model, which defeats the purpose of local inference flexibility.

Fix Action

Fixed

PR fix notes

PR #20150: fix: auto-discover model from local inference server in cron scheduler

Description (problem / solution / changelog)

Problem

Cron jobs without an explicit model and provider pinned fail with a generic RuntimeError: Connection error when:

  1. The llama.cpp / Ollama / vLLM server is running a different model than what's in config.yaml
  2. The local inference server is not running at all

The current code hardcodes the model from config.yaml at cron job creation time (scheduler.py ~line 1075-1091). If the user swaps models in llama.cpp (very common — crashing, swapping, multiple profiles), cron jobs break silently with no actionable error message.

Fix

Two changes in cron/scheduler.py, inserted after the runtime provider resolution block:

1. Auto-discover model from /v1/models

When no explicit model is pinned by the job (job.get("model") is empty), the scheduler now queries {base_url}/v1/models to discover what model is actually loaded on the inference server. Uses the first available model. This is a best-effort check — failures are logged at debug level and the existing fallback to config.yaml still applies.
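The resolution order described above can be sketched as follows. This is a minimal illustration, not the code from PR #20150: the names `fetch_models` and `resolve_model` are hypothetical, the standard library's `urllib` stands in for whatever HTTP client the scheduler uses, and the response is assumed to follow the OpenAI-style shape with a `data` list of objects carrying an `id`.

```python
import json
import logging
import urllib.request
from typing import Callable

logger = logging.getLogger(__name__)


def fetch_models(base_url: str, timeout: float = 5.0) -> dict:
    """GET {base_url}/v1/models and decode the JSON body (hypothetical helper)."""
    url = f"{base_url.rstrip('/')}/v1/models"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def resolve_model(
    job: dict,
    base_url: str,
    config_default: str = "",
    fetch: Callable[[str], dict] = fetch_models,
) -> str:
    """Pick the pinned job model, else the discovered model, else the config default."""
    pinned = job.get("model")
    if pinned:
        return pinned
    try:
        models = fetch(base_url).get("data") or []
        if models:
            return models[0]["id"]  # first model the server reports
    except Exception as exc:  # best effort: discovery must never fail the job
        logger.debug("Model auto-discovery failed for %s: %s", base_url, exc)
    return config_default
```

Passing `fetch` as a parameter is only a convenience for testing the fallback paths without a running server; the real scheduler may structure this differently.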

2. Clear error for unreachable local servers

When the resolved base_url points to a local server (localhost, 127.0.0.1, [::1]), the scheduler probes /v1/models with a 5-second timeout. If the server is unreachable, it raises a clear RuntimeError:

Local inference server not reachable at http://localhost:8080. Is llama.cpp / Ollama / vLLM running?

This replaces the generic SDK "Connection error", which gives no diagnostic information.
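A rough sketch of the reachability check, under stated assumptions: `is_local_base_url`, `ensure_local_server_reachable`, and the injectable `probe` parameter are hypothetical names, not the ones used in the PR, and `urllib` is used here only to keep the example dependency-free. An HTTP error status is treated as "reachable", since the server did answer.

```python
import urllib.error
import urllib.request
from urllib.parse import urlparse

# Loopback hosts named in the PR description
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_local_base_url(base_url: str) -> bool:
    """True when the URL's host is one of the loopback names above."""
    return (urlparse(base_url).hostname or "") in LOCAL_HOSTS


def _http_probe(url: str, timeout: float) -> None:
    """Default probe: open the URL and discard the response."""
    with urllib.request.urlopen(url, timeout=timeout):
        pass


def ensure_local_server_reachable(base_url, probe=_http_probe, timeout=5.0):
    """Raise a descriptive RuntimeError if a local server does not answer."""
    if not is_local_base_url(base_url):
        return  # remote providers are left to the SDK's own error handling
    try:
        probe(f"{base_url.rstrip('/')}/v1/models", timeout)
    except urllib.error.HTTPError:
        return  # the server responded, so it is reachable
    except OSError as exc:  # refused connection, timeout, DNS failure, ...
        raise RuntimeError(
            f"Local inference server not reachable at {base_url}. "
            "Is llama.cpp / Ollama / vLLM running?"
        ) from exc
```

Chaining with `from exc` keeps the underlying socket error in the traceback, which may help when the cause is a timeout rather than a refused connection.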

Why this matters

Local inference users frequently swap models:

  • Server crashes → user loads a different model
  • User switches models via /model command
  • Multiple profiles with different models
  • Ollama users pull new models regularly

Cron jobs should be resilient to this — they are scheduled background tasks, not interactive sessions.

Fixes #20125

Changed files

  • cron/scheduler.py (modified, +46/-0)

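Code Example

Current behavior:

model = job.get("model") or os.getenv("HERMES_MODEL") or ""
# Falls back to config.yaml model.default
if not job.get("model"):
    model = _model_cfg.get("default", model)

---

Proposed behavior from the issue, with the request guarded so that a failed probe falls back to config.yaml:

import os
import requests

model = job.get("model") or os.getenv("HERMES_MODEL") or ""
if not model:
    # Try to auto-discover from the running server (best effort)
    try:
        resp = requests.get(f"{base_url}/v1/models", timeout=5)
        resp.raise_for_status()
        data = resp.json().get("data") or []
    except (requests.RequestException, ValueError):
        data = []  # server unreachable or invalid response
    # Use whatever is loaded, else fall back to config.yaml model.default
    model = data[0]["id"] if data else _model_cfg.get("default", "")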

Bug Report

What happened

Cron jobs without an explicit model and provider pinned fail with a generic RuntimeError: Connection error when llama.cpp is running a different model than what is in config.yaml, or when llama.cpp is not running at all.

What should happen

Cron jobs without explicit model config should:

  1. Query the /v1/models endpoint at runtime to discover what is actually loaded on the local llama.cpp server
  2. Use whatever model is available (or pick the first one)
  3. Give a clear error message if the server is unreachable (e.g., "llama.cpp server not reachable at localhost:8080") instead of the generic "Connection error"

Current behavior (scheduler.py line ~946-960)

model = job.get("model") or os.getenv("HERMES_MODEL") or ""
# Falls back to config.yaml model.default
if not job.get("model"):
    model = _model_cfg.get("default", model)

This hardcodes the config.yaml model at cron job creation time. If the user changes their llama.cpp model (very common with local inference — crashing, swapping models, etc.), cron jobs break silently.

Expected behavior

model = job.get("model") or os.getenv("HERMES_MODEL") or ""
if not model:
    # Try to auto-discover from running server
    available = requests.get(f"{base_url}/v1/models", timeout=5).json()
    if available.get("data"):
        model = available["data"][0]["id"]  # Use whatever is loaded
    else:
        model = _model_cfg.get("default", "")

Why this matters

Local llama.cpp users frequently swap models:

  • Server crashes, user loads a different model
  • User switches models mid-session via /model
  • User has multiple profiles with different models

Cron jobs should be resilient to this — they are scheduled background tasks, not interactive sessions. The current behavior requires manually pinning every cron job to a specific model, which defeats the purpose of local inference flexibility.

Reproduction

  1. Create a cron job without specifying model/provider
  2. Change llama.cpp to load a different model (or restart it)
  3. Run the cron job → gets RuntimeError: Connection error
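To exercise the model-swap scenario without a real llama.cpp instance, a small stub server can stand in for the `/v1/models` endpoint. The sketch below is only a test aid under assumptions: `make_stub_server` and `loaded_model` are hypothetical helpers, the model names are placeholders, and the payload mimics the OpenAI-style `data` list.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_stub_server(state: dict) -> ThreadingHTTPServer:
    """Serve GET /v1/models on a free loopback port, reporting state['model']."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/v1/models":
                self.send_error(404)
                return
            body = json.dumps({"data": [{"id": state["model"]}]}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):  # silence per-request logging
            pass

    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def loaded_model(base_url: str) -> str:
    """Ask the server which model it reports first (proxies disabled for loopback)."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"{base_url}/v1/models", timeout=5) as resp:
        return json.load(resp)["data"][0]["id"]
```

Changing `state["model"]` between two calls simulates step 2 of the reproduction; calling `server.shutdown()` and then probing simulates a server that is not running.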

Environment

  • Hermes Agent 0.12.0
  • Custom llama.cpp provider (localhost:8080)

extent analysis

TL;DR

Have the cron scheduler query the /v1/models endpoint at run time, instead of relying on the model recorded in config.yaml, to discover which model the local llama.cpp server has loaded.

Guidance

  • Update scheduler.py to use the expected-behavior code proposed in the issue, which queries the /v1/models endpoint to auto-discover the loaded model.
  • Verify that the base_url variable is set to the llama.cpp server URL (e.g., http://localhost:8080).
  • Test the updated scheduler by reproducing the issue, then check that the loaded model is used and that a clear error message appears when the server is unreachable.
  • Add error handling for cases where the /v1/models endpoint returns an error or lists no models, so the job falls back to the config.yaml default.

Notes

This solution assumes that the /v1/models endpoint returns a JSON response with a data key containing a list of available models. If the endpoint returns a different format, the code may need to be adjusted accordingly.
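One way to guard against an unexpected response format is to validate the payload before indexing into it. The helper below is a hedged sketch: `first_model_id` is a hypothetical name, and the accepted shape (a dict with a `data` list of objects that each have a string `id`) is the assumption stated in the note above, not something verified against every server.

```python
from typing import Any, Optional


def first_model_id(payload: Any) -> Optional[str]:
    """Return the first usable model id from a /v1/models payload, else None."""
    if not isinstance(payload, dict):
        return None
    data = payload.get("data")
    if not isinstance(data, list):
        return None
    for entry in data:
        # Skip malformed entries instead of raising KeyError or TypeError
        if isinstance(entry, dict):
            model_id = entry.get("id")
            if isinstance(model_id, str) and model_id:
                return model_id
    return None
```

Returning `None` rather than raising lets the caller fall back to the config.yaml default when the server answers with a shape it does not recognize.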

Recommendation

Update scheduler.py to use the auto-discovery logic (the approach taken in PR #20150), which is more resilient and flexible for local inference users than pinning a model on every cron job.
