hermes - 💡(How to fix) Fix Hermes Agent + llama.cpp + Qwen/DeepSeek Local Models Produce Corrupted Output(but Gemma works with same config)

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Error Message

Full Error Output

Code Example

http://localhost:8080/v1

---

http://localhost:8000/v1

---

llama-server.exe ^
  -m "qwen2.5-coder-7b-instruct-q4_k_m.gguf" ^
  -c 64000 ^
  -ngl 35 ^
  -t 6 ^
  -b 128 ^
  --flash-attn on ^
  --jinja ^
  --host 127.0.0.1 ^
  --port 8000

---

2O)..
.O:Otest her).
文件:):O her

---

"script": "script" "script"

---

GET /v1/models -> 200
GET /health -> 200
POST /v1/chat/completions -> 200

---

GET /v1/chat/completions -> 404

---



---

Welcome to Hermes Agent! Type your message or /help for commands.
 Tip: /profile shows which profile is active and its home directory.

   tirith security scanner enabled but not available — command scanning will use pattern matching only

────────────────────────────────────────
● create a new tool which displays a message box on windows 10 screen  when hermes completes answering the user query
Initializing agent...

────────────────────────────────────────

╭─ ⚕ Hermes ───────────────────────────────────────────────────────────────────────────────────────────────────────────╮
    the script: "test": " in." "" " "script" " ":
    "script " "script" " "n" " " "":: " "auto" '"" ' ' "script" " " " "" " "previous":` " " "" "
    " "": "previous" " " "
    "auto` "auto "auto" " " " " " " " "
     "auto" " " script" " "".g
     ""
     " " " " "current" " "" "e

    "current""" " script" """: "script "" " "script" " " " " " ""script"
    ": "": " ""


╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
 ⚕ qwen2.5-coder-7b-instru...19.8K/64K │ [███░░░░░░░] 31% │ 1m │ ⏲ 38s
RAW_BUFFERClick to expand / collapse

What's Going Wrong?

Hermes Agent + llama.cpp + Qwen/DeepSeek Local Models Produce Corrupted Output (but Gemma works with same config).

Environment

  • OS: Windows 10

  • GPU: RTX 3050 6GB

  • Backend: llama.cpp (llama-server.exe)

  • Hermes Agent: latest

  • API Mode: OpenAI-compatible endpoint

  • Context tested:

    • 4096
    • 8192
    • 16384
    • 64000

Working Configuration

Model

Gemma4-4b-E4B-it-Q4_K_M.gguf

Endpoint

http://localhost:8080/v1

Result

Hermes works correctly.


Broken Configurations

Models

  • DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf
  • Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf

Endpoint

http://localhost:8000/v1

llama.cpp launch

llama-server.exe ^
  -m "qwen2.5-coder-7b-instruct-q4_k_m.gguf" ^
  -c 64000 ^
  -ngl 35 ^
  -t 6 ^
  -b 128 ^
  --flash-attn on ^
  --jinja ^
  --host 127.0.0.1 ^
  --port 8000

hermes config.yaml

context_window: 8192 max_tokens: 2048

Symptoms

Hermes initializes successfully but generated output becomes corrupted/gibberish.

Example outputs:

2O)..
.O:Otest her).
文件:):O her

and:

"script": "script" "script"

The issue occurs consistently with:

  • Qwen2.5-Coder-7B
  • DeepSeek-R1-Distill-Qwen-7B

but NOT with Gemma4-4B.


Additional Observations

llama.cpp server appears healthy

These endpoints work:

GET /v1/models -> 200
GET /health -> 200
POST /v1/chat/completions -> 200

Hermes sometimes sends:

GET /v1/chat/completions -> 404

Possibly harmless, but mentioning for completeness.


Tested Fixes

Tried:

  • reducing context from 64K to 8K
  • using /v1 endpoint correctly
  • enabling --jinja
  • disabling giant context
  • changing batch sizes
  • changing quantization
  • switching models

Issue still persists only for Qwen/DeepSeek models inside Hermes.


Suspected Causes

Possibly one of:

  • prompt template incompatibility
  • tokenizer mismatch
  • malformed system prompt formatting
  • OpenAI compatibility parsing issue
  • Hermes orchestration prompt incompatibility with Qwen/DeepSeek GGUFs
  • llama.cpp chat template handling

Request

Would appreciate guidance on:

  • recommended llama.cpp settings for Hermes
  • officially supported local model formats
  • whether Qwen/DeepSeek require specific chat template settings
  • whether Hermes currently assumes Gemma/Llama-style formatting internally
  • whether there are known issues with Qwen-based GGUF models

Steps Taken

1.Tested hermes with qwen2.5-coder--7b and deepseek-r1-7b 2.Changed context from 4096, 8192, 16384, 64000 in config.yaml

Installation Method

PowerShell installer (Windows)

Operating System

Windows 10 64b

Python Version

No response

Hermes Version

No response

Debug Report

Full Error Output

Welcome to Hermes Agent! Type your message or /help for commands.
✦ Tip: /profile shows which profile is active and its home directory.

  ⚠ tirith security scanner enabled but not available — command scanning will use pattern matching only

────────────────────────────────────────
● create a new tool which displays a message box on windows 10 screen  when hermes completes answering the user query
Initializing agent...

────────────────────────────────────────

╭─ ⚕ Hermes ───────────────────────────────────────────────────────────────────────────────────────────────────────────╮
    the script: "test": " in." "" " "script" " ":
    "script " "script" " "n" " " "":: " "auto" '"" ' ' "script" " " " "" " "previous":` " " "" "
    " "": "previous" " " "
    "auto` "auto "auto" " " " " " " " "
     "auto" " " script" " "".g
     ""
     " " " " "current" " "" "e

    "current""" " script" """: "script "" " "script" " " " " " ""script"
    ": "": " ""


╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
 ⚕ qwen2.5-coder-7b-instru... │ 19.8K/64K │ [███░░░░░░░] 31% │ 1m │ ⏲ 38s

What I've Already Tried

No response

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - 💡(How to fix) Fix Hermes Agent + llama.cpp + Qwen/DeepSeek Local Models Produce Corrupted Output(but Gemma works with same config)