Code Example

http://localhost:8080/v1

---

http://localhost:8000/v1

---

llama-server.exe ^
  -m "qwen2.5-coder-7b-instruct-q4_k_m.gguf" ^
  -c 64000 ^
  -ngl 35 ^
  -t 6 ^
  -b 128 ^
  --flash-attn on ^
  --jinja ^
  --host 127.0.0.1 ^
  --port 8000

---

2O)..
.O:Otest her).
文件:):O her

---

"script": "script" "script"

---

GET /v1/models -> 200
GET /health -> 200
POST /v1/chat/completions -> 200

---

GET /v1/chat/completions -> 404

---



---

Welcome to Hermes Agent! Type your message or /help for commands.
✦ Tip: /profile shows which profile is active and its home directory.

  ⚠ tirith security scanner enabled but not available — command scanning will use pattern matching only

────────────────────────────────────────
● create a new tool which displays a message box on windows 10 screen  when hermes completes answering the user query
Initializing agent...

────────────────────────────────────────

╭─ ⚕ Hermes ───────────────────────────────────────────────────────────────────────────────────────────────────────────╮
    the script: "test": " in." "" " "script" " ":
    "script " "script" " "n" " " "":: " "auto" '"" ' ' "script" " " " "" " "previous":` " " "" "
    " "": "previous" " " "
    "auto` "auto "auto" " " " " " " " "
     "auto" " " script" " "".g
     ""
     " " " " "current" " "" "e

    "current""" " script" """: "script "" " "script" " " " " " ""script"
    ": "": " ""


╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
 ⚕ qwen2.5-coder-7b-instru... │ 19.8K/64K │ [███░░░░░░░] 31% │ 1m │ ⏲ 38s

What's Going Wrong?

Hermes Agent + llama.cpp + Qwen/DeepSeek Local Models Produce Corrupted Output (but Gemma works with same config).

Environment

OS: Windows 10
GPU: RTX 3050 6GB
Backend: llama.cpp (llama-server.exe)
Hermes Agent: latest
API Mode: OpenAI-compatible endpoint
Context tested:
- 4096
- 8192
- 16384
- 64000

Working Configuration

Model

Gemma4-4b-E4B-it-Q4_K_M.gguf

Endpoint

http://localhost:8080/v1

Result

Hermes works correctly.

Broken Configurations

Models

DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf
Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf

Endpoint

http://localhost:8000/v1

llama.cpp launch

llama-server.exe ^
  -m "qwen2.5-coder-7b-instruct-q4_k_m.gguf" ^
  -c 64000 ^
  -ngl 35 ^
  -t 6 ^
  -b 128 ^
  --flash-attn on ^
  --jinja ^
  --host 127.0.0.1 ^
  --port 8000

hermes config.yaml

context_window: 8192 max_tokens: 2048

Symptoms

Hermes initializes successfully but generated output becomes corrupted/gibberish.

Example outputs:

2O)..
.O:Otest her).
文件:):O her

and:

"script": "script" "script"

The issue occurs consistently with:

Qwen2.5-Coder-7B
DeepSeek-R1-Distill-Qwen-7B

but NOT with Gemma4-4B.

Additional Observations

llama.cpp server appears healthy

These endpoints work:

GET /v1/models -> 200
GET /health -> 200
POST /v1/chat/completions -> 200

Hermes sometimes sends:

GET /v1/chat/completions -> 404

Possibly harmless, but mentioning for completeness.

Tested Fixes

Tried:

reducing context from 64K to 8K
using /v1 endpoint correctly
enabling --jinja
disabling giant context
changing batch sizes
changing quantization
switching models

Issue still persists only for Qwen/DeepSeek models inside Hermes.

Suspected Causes

Possibly one of:

prompt template incompatibility
tokenizer mismatch
malformed system prompt formatting
OpenAI compatibility parsing issue
Hermes orchestration prompt incompatibility with Qwen/DeepSeek GGUFs
llama.cpp chat template handling

Request

Would appreciate guidance on:

recommended llama.cpp settings for Hermes
officially supported local model formats
whether Qwen/DeepSeek require specific chat template settings
whether Hermes currently assumes Gemma/Llama-style formatting internally
whether there are known issues with Qwen-based GGUF models

Steps Taken

1.Tested hermes with qwen2.5-coder--7b and deepseek-r1-7b 2.Changed context from 4096, 8192, 16384, 64000 in config.yaml

Installation Method

PowerShell installer (Windows)

Operating System

Windows 10 64b

Python Version

No response

Hermes Version

No response

Debug Report

Full Error Output

Welcome to Hermes Agent! Type your message or /help for commands.
✦ Tip: /profile shows which profile is active and its home directory.

  ⚠ tirith security scanner enabled but not available — command scanning will use pattern matching only

────────────────────────────────────────
● create a new tool which displays a message box on windows 10 screen  when hermes completes answering the user query
Initializing agent...

────────────────────────────────────────

╭─ ⚕ Hermes ───────────────────────────────────────────────────────────────────────────────────────────────────────────╮
    the script: "test": " in." "" " "script" " ":
    "script " "script" " "n" " " "":: " "auto" '"" ' ' "script" " " " "" " "previous":` " " "" "
    " "": "previous" " " "
    "auto` "auto "auto" " " " " " " " "
     "auto" " " script" " "".g
     ""
     " " " " "current" " "" "e

    "current""" " script" """: "script "" " "script" " " " " " ""script"
    ": "": " ""


╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
 ⚕ qwen2.5-coder-7b-instru... │ 19.8K/64K │ [███░░░░░░░] 31% │ 1m │ ⏲ 38s

What I've Already Tried

No response

hermes - 💡(How to fix) Fix Hermes Agent + llama.cpp + Qwen/DeepSeek Local Models Produce Corrupted Output(but Gemma works with same config)

Recommended Tools

GitHub issue graph ai analysis

Error Message

Full Error Output

Code Example

What's Going Wrong?

Hermes Agent + llama.cpp + Qwen/DeepSeek Local Models Produce Corrupted Output (but Gemma works with same config).

Environment

Working Configuration

Model

Endpoint

Result

Broken Configurations

Models

Endpoint

llama.cpp launch

hermes config.yaml

context_window: 8192 max_tokens: 2048

Symptoms

Additional Observations

llama.cpp server appears healthy

Hermes sometimes sends:

Tested Fixes

Suspected Causes

Request

Steps Taken

Installation Method

Operating System

Python Version

Hermes Version

Debug Report

Full Error Output

What I've Already Tried

Still need to ship something?

TRENDING