Question: 24GB VRAM local models for Kanban workers - real-world results?

NousResearch/hermes-agent#25041 (fetched 2026-05-14 03:49:31)


tl;dr: Has anyone successfully run Hermes Kanban workers on local models with 24GB VRAM? I'm looking for real-world data before a GPU upgrade.

Background

I currently run Hermes on 12GB of VRAM and have been testing local models for Kanban workers:

  • qwen3.5:9b (6.6GB): Crashes after 28-91s, hallucinates bash commands instead of using tools
  • qwen2.5:14b (9GB): Crashes after 12-33s, overwhelmed by system prompt
  • gemma4:e4b (9.6GB): Same failures

All three were run with the context_length patch from Issue #24072. The patch applies, but model capability is the blocker.

Root cause: Hermes's massive system prompt (10K+ tokens, 30+ tool schemas, memory injection) overwhelms small models. Cloud models work fine.
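To put the prompt size in perspective, here is a minimal sketch of how much of a context window is left once a fixed system prompt is loaded. The function name `prompt_budget`, the 2048-token output reserve, and the smaller context sizes are illustrative assumptions, not Hermes settings; only the 10K-token lower bound and the 128000 figure come from this issue. It covers the window-size side only and says nothing about whether a model can actually follow 30+ tool schemas.

```python
def prompt_budget(system_tokens: int, context_length: int,
                  reserve_for_output: int = 2048) -> dict:
    """Return the share of the window used by the system prompt and
    the tokens left for the task (0 if the prompt does not even fit)."""
    used = system_tokens + reserve_for_output  # reserve is a hypothetical default
    remaining = max(context_length - used, 0)
    return {
        "system_share": system_tokens / context_length,
        "remaining_for_task": remaining,
    }


if __name__ == "__main__":
    # 10_000 is the lower bound quoted in the issue; 8_192 and 32_768
    # are assumed example window sizes, 128_000 is the value from the issue.
    for ctx in (8_192, 32_768, 128_000):
        print(ctx, prompt_budget(10_000, ctx))
```

With a small default window the prompt does not fit at all, which is presumably why the context_length override is needed just to load; with 128000 the prompt is under a tenth of the window, so remaining failures point at capability rather than space.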

The Question

I'm considering a 24GB GPU upgrade specifically for local Kanban agents. Based on #523 (Local Model Setup Skill), these fit:

| Model | Size | Context | Recommendation |
|---|---|---|---|
| Qwen3-30B-A3B | ~17GB | 128K | "Best accuracy/efficiency balance" |
| Qwen3-32B | ~20GB | 128K | "Dense model, highest accuracy" |
Has anyone tested these with Kanban workers specifically?

What I need to know:

  1. Do Qwen3-30B/32B reliably call tools (kanban_show, terminal, write_file, etc.) in complex chains?
  2. Do they handle multi-step Kanban tasks or crash like 9B/14B models?
  3. Are timeout patches needed (Issue #3404 - hardcoded 30s aux/45s compression timeout)?
  4. Any specific config tweaks beyond context_length: 128000?
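For anyone willing to report results, one way to make answers to questions 1 and 2 comparable is to score each assistant turn on whether it produced a structured tool call with a known tool name, as opposed to plain text (for example, a hallucinated bash snippet). The sketch below assumes OpenAI-style chat messages with a `tool_calls` list, where `arguments` is either a JSON string or an already parsed dict; that shape and the helper names are assumptions, not Hermes internals. The tool names are the ones listed in the question.

```python
import json

# Tool names taken from the question; extend with the real tool list.
KNOWN_TOOLS = {"kanban_show", "terminal", "write_file"}


def classify_turn(message: dict, known_tools=KNOWN_TOOLS) -> str:
    """Classify one assistant message as 'valid_call', 'bad_call', or 'no_call'."""
    calls = message.get("tool_calls") or []
    if not calls:
        return "no_call"  # plain text only, e.g. a bash block in `content`
    for call in calls:
        fn = call.get("function", {})
        if fn.get("name") not in known_tools:
            return "bad_call"  # invented or misspelled tool name
        args = fn.get("arguments", "{}")
        if isinstance(args, str):
            try:
                args = json.loads(args)
            except json.JSONDecodeError:
                return "bad_call"  # arguments are not valid JSON
        if not isinstance(args, dict):
            return "bad_call"
    return "valid_call"


def success_rate(messages: list) -> float:
    """Fraction of turns that contain only valid tool calls."""
    if not messages:
        return 0.0
    ok = sum(classify_turn(m) == "valid_call" for m in messages)
    return ok / len(messages)
```

Reporting this rate per model, together with wall-clock time per turn, would also show whether the 30s/45s timeouts from Issue #3404 are being hit.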

Environment

  • Goal: Async Kanban agents with local models
  • Current: 12GB VRAM → considering 24GB upgrade
  • Use case: Research/data analysis tasks via analyst, researcher profiles
  • Alternative: Keep cloud models (working, has costs)

Would appreciate any real-world experience before spending $1500 on a new GPU.

Related Issues

  • #523: Local Model Setup Skill - model recommendations
  • #3404: Configurable timeouts for local models
  • #24072: Context length override (patch works for loading but not tool reliability)
  • #2074: Ollama models not recognizing environment (similar problem)

If no one has tested this and there's interest, I can volunteer to be the guinea pig and document the results.
