hermes - Question: 24GB VRAM local models for Kanban workers - real-world results? [1 participant]

hermes · 2026-05-13 14:21:41

GitHub stats

NousResearch/hermes-agent#25041•Fetched 2026-05-14 03:49:31

Author

kxkayser

tl;dr: Has anyone successfully run Hermes Kanban workers on local models with 24GB VRAM? I'm looking for real-world data before a GPU upgrade.

Background

I currently run Hermes with 12GB VRAM. I have tested these local models as Kanban workers:

qwen3.5:9b (6.6GB): Crashes after 28-91s, hallucinates bash commands instead of using tools
qwen2.5:14b (9GB): Crashes after 12-33s, overwhelmed by system prompt
gemma4:e4b (9.6GB): Same failures

All runs used the context_length patch from Issue #24072. The patch applies, but model capability is the blocker.

Root cause: Hermes's large system prompt (10K+ tokens, 30+ tool schemas, memory injection) overwhelms small models. Cloud models work fine.
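The 10K+ token figure can be sanity-checked before blaming the model. Below is a minimal sketch, assuming the assembled system prompt and tool schemas can be dumped to a string and a list of dicts. The names `estimate_tokens` and `prompt_overhead` and the 4-characters-per-token ratio are illustrative assumptions, not part of Hermes; a real tokenizer for the target model will give different counts.

```python
import json


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate; the ratio is a rule of thumb, not a tokenizer."""
    return int(len(text) / chars_per_token)


def prompt_overhead(system_prompt: str, tool_schemas: list[dict]) -> dict:
    """Split the fixed per-request overhead into prompt text and tool schemas."""
    schema_text = "".join(json.dumps(schema) for schema in tool_schemas)
    prompt_tokens = estimate_tokens(system_prompt)
    schema_tokens = estimate_tokens(schema_text)
    return {
        "system_prompt": prompt_tokens,
        "tool_schemas": schema_tokens,
        "total": prompt_tokens + schema_tokens,
    }


if __name__ == "__main__":
    # Hypothetical inputs: substitute the real prompt and schemas.
    demo_schemas = [{"name": "kanban_show"}, {"name": "write_file"}]
    print(prompt_overhead("You are a Kanban worker. " * 100, demo_schemas))
```

Comparing the `tool_schemas` share against the `system_prompt` share shows whether trimming the tool list or trimming the prompt text is the more promising lever for small models.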

The Question

I'm considering a 24GB GPU upgrade specifically for local Kanban agents. Based on #523 (Local Model Setup Skill), these fit:

Model	Size	Context	Recommendation
Qwen3-30B-A3B	~17GB	128K	"Best accuracy/efficiency balance"
Qwen3-32B	~20GB	128K	"Dense model, highest accuracy"
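The sizes in the table cover weights only; the KV cache for a long context comes on top. Below is a rough sketch of that arithmetic using the standard estimate for a transformer KV cache (keys plus values, per layer, per KV head). The architecture numbers in the example are illustrative placeholders, not confirmed specs for either model, and the function names are hypothetical. Quantized KV caches and runtime overhead change the result.

```python
def kv_cache_gib(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_tokens: int,
    bytes_per_value: int = 2,  # 2 bytes = fp16/bf16 cache entries
) -> float:
    """Estimate KV-cache size in GiB for one sequence at full context."""
    # Factor 2 accounts for storing both keys and values.
    per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token_bytes * context_tokens / 2**30


def fits_in_vram(weights_gib: float, cache_gib: float, vram_gib: float) -> bool:
    """Ignore activations and runtime overhead; treat this as a lower bound."""
    return weights_gib + cache_gib <= vram_gib


if __name__ == "__main__":
    # Placeholder architecture: 64 layers, 8 KV heads, head dimension 128.
    for ctx in (32_768, 128_000):
        cache = kv_cache_gib(64, 8, 128, ctx)
        print(ctx, round(cache, 2), fits_in_vram(20.0, cache, 24.0))
```

Under these placeholder numbers, a full 128K-token fp16 cache alone would exceed 24GB, so a smaller effective context or a quantized cache may be needed even if the weights fit.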

Has anyone tested these with Kanban workers specifically?

What I need to know:

Do Qwen3-30B-A3B and Qwen3-32B reliably call tools (kanban_show, terminal, write_file, etc.) in complex chains?
Do they handle multi-step Kanban tasks, or do they crash like the 9B/14B models?
Are timeout patches needed (Issue #3404: hardcoded 30s aux and 45s compression timeouts)?
Are any specific config tweaks needed beyond context_length: 128000?
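To make "reliably call tools" measurable, each model turn can be classified and a success rate computed per model. Below is a minimal sketch, assuming responses arrive in the OpenAI-style chat format (a `tool_calls` list whose entries carry `function.name` and a JSON string in `function.arguments`). Whether Hermes exposes turns in this shape is an assumption, and `classify_turn`, `success_rate`, and `KNOWN_TOOLS` are hypothetical names.

```python
import json

# Tool names taken from the question; extend with the real tool list.
KNOWN_TOOLS = {"kanban_show", "terminal", "write_file"}


def classify_turn(message: dict, known_tools: set[str] = KNOWN_TOOLS) -> str:
    """Label one assistant message as ok, no_tool_call, unknown_tool, or bad_arguments."""
    calls = message.get("tool_calls") or []
    if not calls:
        # Typical failure from the report: shell text in content, no tool call.
        return "no_tool_call"
    for call in calls:
        function = call.get("function", {})
        if function.get("name") not in known_tools:
            return "unknown_tool"
        try:
            arguments = json.loads(function.get("arguments") or "{}")
        except (json.JSONDecodeError, TypeError):
            return "bad_arguments"
        if not isinstance(arguments, dict):
            return "bad_arguments"
    return "ok"


def success_rate(messages: list[dict]) -> float:
    """Fraction of turns whose tool calls were all well-formed."""
    if not messages:
        return 0.0
    return sum(classify_turn(m) == "ok" for m in messages) / len(messages)
```

Running the same task list against each candidate model and comparing `success_rate` values gives a like-for-like number that can be posted alongside crash times.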

Environment

Goal: Async Kanban agents with local models
Current: 12GB VRAM → considering 24GB upgrade
Use case: Research/data analysis tasks via analyst, researcher profiles
Alternative: Keep cloud models (working, but they cost money)

Would appreciate any real-world experience before spending $1500 on a new GPU.

Related Issues

#523: Local Model Setup Skill - model recommendations
#3404: Configurable timeouts for local models
#24072: Context length override (patch works for loading but not tool reliability)
#2074: Ollama models not recognizing environment (similar problem)

If no one has tested this, I can volunteer to be the guinea pig and document results if there's interest.
