[Bug]: Hermes-Agent keeps probing non-existent google/gemini-3-flash-preview model when using LM Studio OpenAI-compatible API




Bug Description

When using Hermes-Agent with LM Studio's OpenAI-compatible local API, Hermes-Agent repeatedly attempts to query a model that does not exist on the server:

google/gemini-3-flash-preview

even though LM Studio only exposes locally loaded models such as:

google/gemma-4-26b-a4b

This causes repeated model probe failures and prevents stable initialization / usage.

The issue appears to come from Hermes-Agent internally defaulting to a Gemini provider model name instead of respecting the actual /v1/models response returned by LM Studio.

Steps to Reproduce

1. Install and launch LM Studio.
2. Load a local model: google/gemma-4-26b-a4b
3. Enable the LM Studio OpenAI-compatible API server.
4. Configure Hermes-Agent to use the LM Studio endpoint.
5. Start Hermes-Agent.

LM Studio logs then show repeated requests like:

GET /api/v1/models
GET /v1/models/google/gemini-3-flash-preview

followed by:

Error: Model with identifier 'google/gemini-3-flash-preview' not found
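
The two lookups seen in the logs can be compared outside Hermes-Agent with a small standard-library script. This is only a diagnostic sketch: BASE_URL assumes LM Studio's usual local address, and http_get and check_model are hypothetical helper names, not part of Hermes-Agent or LM Studio.

```python
import json
import urllib.error
import urllib.request

# Assumption: LM Studio's local server address; adjust to your setup.
BASE_URL = "http://localhost:1234"


def http_get(url, timeout=5.0):
    """Return (status_code, body_text) for a GET, including HTTP error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", errors="replace")


def check_model(base_url, model_id, get=http_get):
    """Compare the server's model list with a per-model lookup for model_id."""
    status, body = get(f"{base_url}/v1/models")
    listed = []
    if status == 200:
        listed = [entry["id"] for entry in json.loads(body).get("data", [])]
    probe_status, _ = get(f"{base_url}/v1/models/{model_id}")
    return {
        "listed": listed,               # IDs the server says it serves
        "in_list": model_id in listed,  # is the probed ID among them?
        "probe_status": probe_status,   # HTTP status of the per-model lookup
    }
```

With the LM Studio server running, calling check_model(BASE_URL, "google/gemini-3-flash-preview") and check_model(BASE_URL, "google/gemma-4-26b-a4b") shows whether the server itself treats the two IDs differently, independent of Hermes-Agent.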

Expected Behavior

Hermes-Agent should:

  • Respect the model IDs returned by GET /v1/models
  • Use the actually available local model, google/gemma-4-26b-a4b
  • Avoid probing hardcoded Gemini cloud model names unless explicitly configured
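
One way to read the expected behavior is as a small selection rule over the IDs the server reports. The sketch below is an illustration of that rule only; resolve_model is a hypothetical name and does not reflect how Hermes-Agent is actually implemented.

```python
def resolve_model(configured, available):
    """Pick a model ID that the server actually serves.

    configured: model ID from the user's configuration, or None if unset.
    available:  IDs returned by GET /v1/models, in server order.
    """
    if not available:
        raise LookupError("server reported no models")
    if configured is None:
        # Nothing configured: use what the server offers, never a built-in default.
        return available[0]
    if configured in available:
        return configured
    # Explicitly configured but not served: fail loudly instead of probing repeatedly.
    raise LookupError(
        f"configured model {configured!r} is not served; available: {available}"
    )
```

Under this rule a cloud model name such as google/gemini-3-flash-preview is only ever considered when the user configured it, and a mismatch surfaces as a single clear error rather than repeated probe failures.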

Actual Behavior

Hermes-Agent repeatedly probes:

google/gemini-3-flash-preview

even though:

GET /v1/models

returns:

{
  "data": [
    {
      "id": "google/gemma-4-26b-a4b",
      "object": "model",
      "owned_by": "organization_owner"
    }
  ],
  "object": "list"
}

This leads to repeated initialization errors and model lookup failures.

Affected Component

CLI (interactive chat)

Messaging Platform (if gateway-related)

N/A (CLI only)

Debug Report

Environment:

Hermes-Agent: latest version
Backend: LM Studio OpenAI-compatible API
OS: macOS

Local model:

google/gemma-4-26b-a4b

Observed LM Studio logs:

2026-05-11 22:23:21 [DEBUG]
Received request: GET to /api/v1/models

2026-05-11 22:23:21 [INFO]
Returning 3 models from v1 API

2026-05-11 22:23:21 [DEBUG]
Received request: GET to /v1/models/google/gemini-3-flash-preview

2026-05-11 22:23:21 [ERROR]
Error: Model with identifier 'google/gemini-3-flash-preview' not found

2026-05-11 22:23:21 [DEBUG]
Received request: GET to /v1/models

2026-05-11 22:23:21 [INFO]
Returning {
  "data": [
    {
      "id": "google/gemma-4-26b-a4b",
      "object": "model",
      "owned_by": "organization_owner"
    }
  ],
  "object": "list"
}

Even after LM Studio has loaded the model and is ready, and in the cases where the configured google/gemma-4-26b-a4b model does work, sending a message in the Hermes CLI takes three minutes before LM Studio starts processing it. The LM Studio logs at that point are as follows:

2026-05-12 09:39:33 [DEBUG]

warmup: warmup with image size = 768 x 768

2026-05-12 09:39:33 [DEBUG]

alloc_compute_meta: MTL0 compute buffer size = 150.63 MiB

alloc_compute_meta: CPU compute buffer size = 6.77 MiB

alloc_compute_meta: graph splits = 1, nodes = 1569

2026-05-12 09:39:33 [DEBUG]

warmup: flash attention is enabled

srv load_model: loaded multimodal

Operating System

Ubuntu 24.04

Python Version

3.11.15

Hermes Version

0.13.0

Additional Logs / Traceback (optional)

No Python traceback observed.

The issue appears during model probing / provider initialization.

Root Cause Analysis (optional)

Possible causes:

  • Hermes-Agent may contain a hardcoded default Gemini model: google/gemini-3-flash-preview
  • Hermes-Agent may be mixing Google Gemini provider logic with OpenAI-compatible provider logic.
  • Hermes-Agent may ignore actual /v1/models responses and probe a fallback/default model instead.
  • Model auto-discovery logic may not correctly support LM Studio local model IDs.
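
To check the first hypothesis, one option is to search a local Hermes-Agent checkout or config directory for the model name. The helper below is a generic text search sketch; find_references is a hypothetical name, and the directory to scan and the list of file suffixes are assumptions to adapt to your setup.

```python
from pathlib import Path


def find_references(root, needle, suffixes=(".py", ".yaml", ".yml", ".json", ".toml")):
    """Return (path, line_number, line) for each line under root containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file; skip it
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), number, line.strip()))
    return hits
```

Running find_references on the source or config directory with the needle "gemini-3-flash-preview" lists every place the name appears, which helps separate a built-in default from a leftover entry in a user config file.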

Proposed Fix (optional)

No response

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR
