hermes - 💡(How to fix) Fix [i18n] Thai Translation: Features Part 1c - Delegation, Fallback Providers [1 participants]

hermes2026-04-23 17:21:16

GitHub stats

NousResearch/hermes-agent#14661•Fetched 2026-04-24 06:15:32

Author

nanobro



📄 user-guide/features/delegation.md

sidebar_position: 7
title: "Subagent Delegation"
description: "Spawn isolated child agents for parallel workstreams with delegate_task"

Subagent Delegation

The delegate_task tool spawns child AIAgent instances with isolated context, restricted toolsets, and their own terminal sessions. Each child gets a fresh conversation and works independently; only its final summary enters the parent's context.

Single Task

delegate_task(
    goal="Debug why tests fail",
    context="Error: assertion in test_foo.py line 42",
    toolsets=["terminal", "file"]
)

Parallel Batch

By default, up to 3 subagents can run concurrently (configurable, with no hard upper limit):

delegate_task(tasks=[
    {"goal": "Research topic A", "toolsets": ["web"]},
    {"goal": "Research topic B", "toolsets": ["web"]},
    {"goal": "Fix the build", "toolsets": ["terminal", "file"]}
])

How Subagent Context Works

:::warning Critical: Subagents Know Nothing Subagents start with a completely fresh conversation. They have no knowledge of the parent's conversation history, prior tool calls, or anything discussed before delegation. The only context a subagent has comes from the goal and context fields the parent agent fills in when calling delegate_task. :::

This means the parent agent must pass everything the subagent needs in the call:

# BAD - subagent has no idea what "the error" is
delegate_task(goal="Fix the error")

# GOOD - subagent has all context it needs
delegate_task(
    goal="Fix the TypeError in api/handlers.py",
    context="""The file api/handlers.py has a TypeError on line 47:
    'NoneType' object has no attribute 'get'.
    The function process_request() receives a dict from parse_body(),
    but parse_body() returns None when Content-Type is missing.
    The project is at /home/user/myproject and uses Python 3.11."""
)

The subagent receives a focused system prompt built from your goal and context, instructing it to complete the task and return a structured summary of what it did, what it found, which files were modified, and which issues it encountered.
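As a rough illustration of why the goal and context fields matter, the sketch below composes a child prompt from those two fields alone. The helper name `build_child_prompt` and the prompt wording are hypothetical; the real prompt template used by Hermes is not shown in this document.

```python
def build_child_prompt(goal: str, context: str = "") -> str:
    """Compose a child prompt from goal and context only (hypothetical helper)."""
    sections = [f"Task: {goal.strip()}"]
    if context.strip():
        sections.append(f"Context:\n{context.strip()}")
    # Ask for the four summary items the documentation lists.
    sections.append(
        "When done, return a structured summary: what you did, what you found, "
        "files modified, issues encountered."
    )
    return "\n\n".join(sections)


prompt = build_child_prompt(
    goal="Fix the TypeError in api/handlers.py",
    context="parse_body() returns None when Content-Type is missing.",
)
print(prompt)
```

Nothing from the parent conversation appears in the result unless it was passed in explicitly, which is the point of the BAD/GOOD comparison.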

Practical Examples

Parallel Research

Research several topics at the same time and collect the summaries:

delegate_task(tasks=[
    {
        "goal": "Research the current state of WebAssembly in 2025",
        "context": "Focus on: browser support, non-browser runtimes, language support",
        "toolsets": ["web"]
    },
    {
        "goal": "Research the current state of RISC-V adoption in 2025",
        "context": "Focus on: server chips, embedded systems, software ecosystem",
        "toolsets": ["web"]
    },
    {
        "goal": "Research quantum computing progress in 2025",
        "context": "Focus on: error correction breakthroughs, practical applications, key players",
        "toolsets": ["web"]
    }
])

Code Review + Fix

Delegate a review-and-fix workflow to a fresh context:

delegate_task(
    goal="Review the authentication module for security issues and fix any found",
    context="""Project at /home/user/webapp.
    Auth module files: src/auth/login.py, src/auth/jwt.py, src/auth/middleware.py.
    The project uses Flask, PyJWT, and bcrypt.
    Focus on: SQL injection, JWT validation, password handling, session management.
    Fix any issues found and run the test suite (pytest tests/auth/).""",
    toolsets=["terminal", "file"]
)

Multi-File Refactoring

Delegate a large refactoring task that would otherwise flood the parent's context:

delegate_task(
    goal="Refactor all Python files in src/ to replace print() with proper logging",
    context="""Project at /home/user/myproject.
    Use the 'logging' module with logger = logging.getLogger(__name__).
    Replace print() calls with appropriate log levels:
    - print(f"Error: ...") -> logger.error(...)
    - print(f"Warning: ...") -> logger.warning(...)
    - print(f"Debug: ...") -> logger.debug(...)
    - Other prints -> logger.info(...)
    Don't change print() in test files or CLI output.
    Run pytest after to verify nothing broke.""",
    toolsets=["terminal", "file"]
)

Batch Mode Details

When you provide a tasks array, subagents run in parallel using a thread pool:

Maximum concurrency: 3 tasks by default (configurable via delegation.max_concurrent_children or the DELEGATION_MAX_CONCURRENT_CHILDREN env var; the minimum is 1 and there is no hard upper limit). A batch larger than the limit returns a tool error instead of being silently truncated.
Thread pool: uses ThreadPoolExecutor with the configured concurrency limit as max workers
Progress display: in CLI mode, a tree view shows tool calls from each subagent in real time, with a completion line per task. In gateway mode, progress is batched and relayed to the parent's progress callback.
Result ordering: results are sorted by task index to match the input order, regardless of when each task finishes
Interrupt propagation: interrupting the parent (for example, by sending a new message) interrupts all active children

Single-task delegation runs directly, without thread pool overhead.
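The batch behavior described above can be sketched with the standard library. This is a minimal illustration, not the Hermes implementation: `run_child` is a stand-in for a real child agent, and `run_batch` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_CHILDREN = 3  # mirrors the documented default


def run_child(task: dict) -> str:
    """Stand-in for a real child agent run; returns a fake summary."""
    return f"summary for: {task['goal']}"


def run_batch(tasks: list, limit: int = MAX_CONCURRENT_CHILDREN) -> list:
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if len(tasks) > limit:
        # Oversized batches are rejected rather than silently truncated.
        raise ValueError(f"batch of {len(tasks)} exceeds limit of {limit}")
    if len(tasks) == 1:
        # A single task runs directly, with no thread pool overhead.
        return [run_child(tasks[0])]
    with ThreadPoolExecutor(max_workers=limit) as pool:
        # Executor.map yields results in input order, whatever the finish order.
        return list(pool.map(run_child, tasks))


results = run_batch([{"goal": "Research topic A"}, {"goal": "Fix the build"}])
print(results)
```

Using `Executor.map` is one simple way to get input-ordered results; sorting completed futures by task index would work equally well.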

Model Override

You can configure a different model for subagents via config.yaml, which is useful for delegating simple tasks to a cheaper or faster model:

# In ~/.hermes/config.yaml
delegation:
  model: "google/gemini-flash-2.0"    # Cheaper model for subagents
  provider: "openrouter"              # Optional: route subagents to a different provider

If omitted, subagents use the same model as the parent.

Toolset Selection Tips

The toolsets parameter controls which tools the subagent can access. Choose based on the task:

Toolset Pattern	Use Case
`["terminal", "file"]`	Code work, debugging, file editing, builds
`["web"]`	Research, fact-checking, documentation lookup
`["terminal", "file", "web"]`	Full-stack tasks (default)
`["file"]`	Read-only analysis, code review without execution
`["terminal"]`	System administration, process management

Certain toolsets are blocked for subagents regardless of what you specify:

delegation - blocked for leaf subagents (the default). Reserved for role="orchestrator" children and bounded by max_spawn_depth; see Depth Limit and Nested Orchestration below.
clarify - subagents cannot interact with the user
memory - no writes to shared persistent memory
code_execution - children should reason step by step
send_message - no cross-platform side effects (for example, sending Telegram messages)
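The blocking rules in the list above amount to a filter over the requested toolsets. The sketch below is an assumption about how such a filter could look; `effective_toolsets` is a hypothetical name, and the depth check that also governs delegation is left out for brevity.

```python
# Toolsets the list above says are blocked for every subagent.
ALWAYS_BLOCKED = {"clarify", "memory", "code_execution", "send_message"}


def effective_toolsets(requested: list, role: str = "leaf") -> list:
    """Drop blocked toolsets; only orchestrator children keep 'delegation'."""
    blocked = set(ALWAYS_BLOCKED)
    if role != "orchestrator":
        blocked.add("delegation")
    return [name for name in requested if name not in blocked]


print(effective_toolsets(["terminal", "file", "memory", "delegation"]))
print(effective_toolsets(["terminal", "delegation"], role="orchestrator"))
```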

Max Iterations

Each subagent has an iteration limit (default: 50) that caps how many tool-calling turns it can take:

delegate_task(
    goal="Quick file check",
    context="Check if /etc/nginx/nginx.conf exists and print its first 10 lines",
    max_iterations=10  # Simple task, don't need many turns
)

Depth Limit and Nested Orchestration

By default, delegation is flat: the parent (depth 0) spawns children (depth 1), and those children cannot delegate further. This prevents runaway recursive delegation.

For multi-stage workflows (research -> synthesis, or parallel orchestration over sub-problems), a parent can spawn orchestrator children that can delegate to their own workers:

delegate_task(
    goal="Survey three code review approaches and recommend one",
    role="orchestrator",  # Allows this child to spawn its own workers
    context="...",
)

role="leaf" (default): the child cannot delegate further, which matches flat-delegation behavior.
role="orchestrator": the child keeps the delegation toolset. Bounded by delegation.max_spawn_depth (default 1 = flat, so role="orchestrator" has no effect at the default). Raise max_spawn_depth to 2 to let orchestrator children spawn leaf grandchildren; 3 allows three levels (the cap).
delegation.orchestrator_enabled: false: a global kill switch that forces every child to the leaf role regardless of the role parameter.
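One way to read the three rules above is as a single decision function. The sketch below is an interpretation, not Hermes source code: `effective_role` is a hypothetical name, and it assumes a child may orchestrate only if its own children would still fit inside max_spawn_depth.

```python
def effective_role(
    requested_role: str,
    child_depth: int,
    max_spawn_depth: int = 1,
    orchestrator_enabled: bool = True,
) -> str:
    """Return the role a child would actually get (child_depth is 1 for direct children)."""
    if requested_role != "orchestrator":
        return "leaf"
    if not orchestrator_enabled:
        # Global kill switch: every child is forced to leaf.
        return "leaf"
    if child_depth + 1 > max_spawn_depth:
        # No room left in the tree for grandchildren, so the role has no effect.
        return "leaf"
    return "orchestrator"


print(effective_role("orchestrator", child_depth=1))                     # default depth
print(effective_role("orchestrator", child_depth=1, max_spawn_depth=2))  # raised depth
```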

Cost warning: with max_spawn_depth: 3 and max_concurrent_children: 3, the tree can reach 3×3×3 = 27 concurrent leaf agents. Each additional level multiplies cost, so raise max_spawn_depth deliberately.
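The arithmetic behind the warning is a simple power: every level fans out by max_concurrent_children. The helper names below are hypothetical, and the total assumes every non-leaf level spawns a full batch.

```python
def max_concurrent_leaves(max_concurrent_children: int, max_spawn_depth: int) -> int:
    """Worst-case leaf count when every level fans out fully."""
    return max_concurrent_children ** max_spawn_depth


def max_total_agents(max_concurrent_children: int, max_spawn_depth: int) -> int:
    """Leaves plus the orchestrators above them (the parent itself is not counted)."""
    return sum(max_concurrent_children ** level for level in range(1, max_spawn_depth + 1))


for depth in (1, 2, 3):
    print(depth, max_concurrent_leaves(3, depth), max_total_agents(3, depth))
```

At depth 3 the 27 leaves sit under 9 + 3 orchestrators, so 39 child agents in total may be running LLM loops.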

Key Properties

Each subagent gets its own terminal session (separate from the parent)
Nested delegation is opt-in: only role="orchestrator" children can delegate further, and only when max_spawn_depth is raised above the default of 1 (flat). Disable it globally with orchestrator_enabled: false.
Leaf subagents cannot call delegate_task, clarify, memory, send_message, or execute_code. Orchestrator subagents keep delegate_task but still cannot use the other four.
Interrupt propagation: interrupting the parent interrupts all active children (including grandchildren under orchestrators)
Only the final summary enters the parent's context, which keeps token usage efficient
Subagents inherit the parent's API key, provider configuration, and credential pool (which enables key rotation on rate limits)

Delegation vs execute_code

Factor	delegate_task	execute_code
Reasoning	Full LLM reasoning loop	Just Python code execution
Context	Fresh isolated conversation	No conversation, just script
Tool access	All non-blocked tools with reasoning	7 tools via RPC, no reasoning
Parallelism	3 concurrent subagents by default (configurable)	Single script
Best for	Complex tasks needing judgment	Mechanical multi-step pipelines
Token cost	Higher (full LLM loop)	Lower (only stdout returned)
User interaction	None (subagents can't clarify)	None

Rule of thumb: use delegate_task when the subtask needs reasoning, judgment, or multi-step problem solving. Use execute_code when you need mechanical data processing or a scripted workflow.

Configuration

# In ~/.hermes/config.yaml
delegation:
  max_iterations: 50                        # Max turns per child (default: 50)
  # max_concurrent_children: 3              # Parallel children per batch (default: 3)
  # max_spawn_depth: 1                      # Tree depth (1-3, default 1 = flat). Raise to 2 to allow orchestrator children to spawn leaves; 3 for three levels.
  # orchestrator_enabled: true              # Disable to force all children to leaf role.
  model: "google/gemini-3-flash-preview"             # Optional provider/model override
  provider: "openrouter"                             # Optional built-in provider

# Or use a direct custom endpoint instead of provider:
delegation:
  model: "qwen2.5-coder"
  base_url: "http://localhost:1234/v1"
  api_key: "local-key"

:::tip The agent handles delegation automatically based on task complexity. You don't need to ask it to delegate explicitly; it will do so when it makes sense. :::

📄 user-guide/features/fallback-providers.md

title: Fallback Providers
description: Configure automatic failover to backup LLM providers when your primary model is unavailable.
sidebar_label: Fallback Providers
sidebar_position: 8

Fallback Providers

Hermes Agent has three layers of resilience that keep your sessions running when providers have problems:

Credential pools - rotate across multiple API keys for the same provider (tried first)
Primary model fallback - automatically switch to a different provider:model when your main model fails
Auxiliary task fallback - independent provider resolution for side tasks such as vision, compression, and web extraction

Credential pools handle rotation within a single provider (for example, multiple OpenRouter keys). This page covers cross-provider fallback. Both are optional and work independently.

Primary Model Fallback

When your main LLM provider hits errors such as rate limits, server overload, auth failures, or connection drops, Hermes can automatically switch to a backup provider:model mid-session without losing your conversation.

Configuration

Add a fallback_model section to ~/.hermes/config.yaml:

fallback_model:
  provider: openrouter
  model: anthropic/claude-sonnet-4

Both provider and model are required. If either is missing, fallback is disabled.
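The "both fields required" rule can be checked before starting a session. The sketch below works on an already-parsed config dict; `fallback_enabled` is a hypothetical helper, not a Hermes function.

```python
def fallback_enabled(config: dict) -> bool:
    """True only when fallback_model carries both a provider and a model."""
    section = config.get("fallback_model")
    if not isinstance(section, dict):
        return False
    return bool(section.get("provider")) and bool(section.get("model"))


complete = {"fallback_model": {"provider": "openrouter", "model": "anthropic/claude-sonnet-4"}}
missing_model = {"fallback_model": {"provider": "openrouter"}}

print(fallback_enabled(complete))       # both keys present
print(fallback_enabled(missing_model))  # model missing, so fallback is off
```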

Supported Providers

Provider	Value	Requirements
AI Gateway	`ai-gateway`	`AI_GATEWAY_API_KEY`
OpenRouter	`openrouter`	`OPENROUTER_API_KEY`
Nous Portal	`nous`	`hermes auth` (OAuth)
OpenAI Codex	`openai-codex`	`hermes model` (ChatGPT OAuth)
GitHub Copilot	`copilot`	`COPILOT_GITHUB_TOKEN`, `GH_TOKEN`, or `GITHUB_TOKEN`
GitHub Copilot ACP	`copilot-acp`	External process (editor integration)
Anthropic	`anthropic`	`ANTHROPIC_API_KEY` or Claude Code credentials
z.ai / GLM	`zai`	`GLM_API_KEY`
Kimi / Moonshot	`kimi-coding`	`KIMI_API_KEY`
MiniMax	`minimax`	`MINIMAX_API_KEY`
MiniMax (China)	`minimax-cn`	`MINIMAX_CN_API_KEY`
DeepSeek	`deepseek`	`DEEPSEEK_API_KEY`
NVIDIA NIM	`nvidia`	`NVIDIA_API_KEY` (optional: `NVIDIA_BASE_URL`)
Ollama Cloud	`ollama-cloud`	`OLLAMA_API_KEY`
Google Gemini (OAuth)	`google-gemini-cli`	`hermes model` (Google OAuth; optional: `HERMES_GEMINI_PROJECT_ID`)
Google AI Studio	`gemini`	`GOOGLE_API_KEY` (alias: `GEMINI_API_KEY`)
xAI (Grok)	`xai` (alias `grok`)	`XAI_API_KEY` (optional: `XAI_BASE_URL`)
AWS Bedrock	`bedrock`	Standard boto3 auth (`AWS_REGION` + `AWS_PROFILE` or `AWS_ACCESS_KEY_ID`)
Qwen Portal (OAuth)	`qwen-oauth`	`hermes model` (Qwen Portal OAuth; optional: `HERMES_QWEN_BASE_URL`)
OpenCode Zen	`opencode-zen`	`OPENCODE_ZEN_API_KEY`
OpenCode Go	`opencode-go`	`OPENCODE_GO_API_KEY`
Kilo Code	`kilocode`	`KILOCODE_API_KEY`
Xiaomi MiMo	`xiaomi`	`XIAOMI_API_KEY`
Arcee AI	`arcee`	`ARCEEAI_API_KEY`
Alibaba / DashScope	`alibaba`	`DASHSCOPE_API_KEY`
Hugging Face	`huggingface`	`HF_TOKEN`
Custom endpoint	`custom`	`base_url` + `key_env` (see below)

Custom Endpoint Fallback

For a custom OpenAI-compatible endpoint, add base_url and, optionally, key_env:

fallback_model:
  provider: custom
  model: my-local-model
  base_url: http://localhost:8000/v1
  key_env: MY_LOCAL_KEY              # env var name containing the API key

When Fallback Triggers

Fallback activates automatically when the main model fails with:

Rate limits (HTTP 429) - after retry attempts are exhausted
Server errors (HTTP 500, 502, 503) - after retry attempts are exhausted
Auth failures (HTTP 401, 403) - immediately (no point retrying)
Not found (HTTP 404) - immediately
Invalid responses - when the API repeatedly returns malformed or empty responses
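The status-code rules in the list above can be expressed as a small lookup. This is a sketch under assumptions: `fallback_timing` is a hypothetical name, the "none" result for unlisted codes is a guess, and the invalid-response trigger is not modeled because it is not tied to a status code.

```python
RETRY_FIRST = {429, 500, 502, 503}  # fall back only after retries are exhausted
IMMEDIATE = {401, 403, 404}         # retrying cannot help, so fall back at once


def fallback_timing(status: int) -> str:
    """Classify an HTTP status from the primary provider."""
    if status in IMMEDIATE:
        return "immediate"
    if status in RETRY_FIRST:
        return "after_retries"
    return "none"  # assumption: codes the list does not mention never trigger fallback


for code in (429, 401, 404, 503, 400):
    print(code, fallback_timing(code))
```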

When triggered, Hermes:

Resolves credentials for the fallback provider
Builds a new API client
Swaps the model, provider, and client in place
Resets the retry counter and continues the conversation

The switch is seamless: conversation history, tool calls, and context are preserved. The agent continues exactly where it left off, just on a different model.

:::info One-Shot Fallback activates at most once per session. If the fallback provider also fails, normal error handling takes over (retries, then an error message). This prevents cascading failover loops. :::
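The one-shot rule is easy to picture as a flag on the session. The class below is an illustrative model only; `Session` and its attribute names are hypothetical and do not come from the Hermes codebase.

```python
class Session:
    """Toy session that can switch to a fallback model at most once."""

    def __init__(self, primary: tuple, fallback: tuple = None):
        self.active = primary          # (provider, model) currently in use
        self.fallback = fallback
        self.fallback_used = False
        self.retries = 0
        self.history = []              # conversation survives the switch

    def on_primary_failure(self) -> bool:
        """Switch to the fallback if allowed; return True when a switch happened."""
        if self.fallback is None or self.fallback_used:
            return False               # normal error handling takes over
        self.active = self.fallback
        self.fallback_used = True
        self.retries = 0               # retry counter restarts on the new model
        return True


session = Session(("anthropic", "claude-sonnet-4-6"), ("openrouter", "anthropic/claude-sonnet-4"))
session.history.append("user: hello")
print(session.on_primary_failure(), session.active)
print(session.on_primary_failure())  # second failure: no further switch
```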

Examples

OpenRouter as fallback for Anthropic native:

model:
  provider: anthropic
  default: claude-sonnet-4-6

fallback_model:
  provider: openrouter
  model: anthropic/claude-sonnet-4

Nous Portal as fallback for OpenRouter:

model:
  provider: openrouter
  default: anthropic/claude-opus-4

fallback_model:
  provider: nous
  model: nous-hermes-3

Local model as fallback for cloud:

fallback_model:
  provider: custom
  model: llama-3.1-70b
  base_url: http://localhost:8000/v1
  key_env: LOCAL_API_KEY

Codex OAuth as fallback:

fallback_model:
  provider: openai-codex
  model: gpt-5.3-codex

Where Fallback Works

Context	Fallback Supported
CLI sessions	✔
Messaging gateway (Telegram, Discord, etc.)	✔
Subagent delegation	✘ (subagents do not inherit fallback config)
Cron jobs	✘ (run with a fixed provider)
Auxiliary tasks (vision, compression)	✘ (use their own provider chain - see below)

:::tip There are no environment variables for fallback_model; it is configured exclusively through config.yaml. This is intentional: fallback configuration is a deliberate choice, not something a stale shell export should override. :::

Auxiliary Task Fallback

Hermes uses separate lightweight models for side tasks. Each task has its own provider resolution chain, which acts as a built-in fallback system.

Tasks with Independent Provider Resolution

Task	What It Does	Config Key
Vision	Image analysis, browser screenshots	`auxiliary.vision`
Web Extract	Web page summarization	`auxiliary.web_extract`
Compression	Context compression summaries	`auxiliary.compression`
Session Search	Past session summarization	`auxiliary.session_search`
Skills Hub	Skill search and discovery	`auxiliary.skills_hub`
MCP	MCP helper operations	`auxiliary.mcp`
Memory Flush	Memory consolidation	`auxiliary.flush_memories`
Approval	Smart command-approval classification	`auxiliary.approval`
Title Generation	Session title summaries	`auxiliary.title_generation`

Auto-Detection Chain

When a task's provider is set to "auto" (the default), Hermes tries providers in order until one works:

For text tasks (compression, web extract, etc.):

OpenRouter → Nous Portal → Custom endpoint → Codex OAuth →
API-key providers (z.ai, Kimi, MiniMax, Xiaomi MiMo, Hugging Face, Anthropic) → give up

For vision tasks:

Main provider (if vision-capable) → OpenRouter → Nous Portal →
Codex OAuth → Anthropic → Custom endpoint → give up

If the resolved provider fails at call time, Hermes also has an internal retry: if the provider is not OpenRouter and no explicit base_url is set, it tries OpenRouter as a last-resort fallback.
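The two chains and the call-time retry can be modeled as an ordered search. The sketch below is a simplification under assumptions: the chain entries are shorthand labels rather than real provider identifiers, the vision-capability check on the main provider is omitted, and both function names are hypothetical.

```python
from typing import Optional

TEXT_CHAIN = ["openrouter", "nous", "custom", "codex", "api-key-providers"]
VISION_CHAIN = ["main", "openrouter", "nous", "codex", "anthropic", "custom"]


def resolve_auto(chain: list, available: set) -> Optional[str]:
    """Return the first provider in the chain that is usable, or None to give up."""
    for name in chain:
        if name in available:
            return name
    return None


def call_time_retry_target(resolved: str, base_url: str = "") -> Optional[str]:
    """OpenRouter is the last resort unless it was already used or base_url is explicit."""
    if resolved != "openrouter" and not base_url:
        return "openrouter"
    return None


print(resolve_auto(TEXT_CHAIN, {"codex", "nous"}))
print(resolve_auto(VISION_CHAIN, {"custom", "anthropic"}))
print(call_time_retry_target("nous"))
```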

Configuring Auxiliary Providers

Each task can be configured independently in config.yaml:

auxiliary:
  vision:
    provider: "auto"              # auto | openrouter | nous | codex | main | anthropic
    model: ""                     # e.g. "openai/gpt-4o"
    base_url: ""                  # direct endpoint (takes precedence over provider)
    api_key: ""                   # API key for base_url

  web_extract:
    provider: "auto"
    model: ""

  compression:
    provider: "auto"
    model: ""

  session_search:
    provider: "auto"
    model: ""
    timeout: 30
    max_concurrency: 3
    extra_body: {}

  skills_hub:
    provider: "auto"
    model: ""

  mcp:
    provider: "auto"
    model: ""

  flush_memories:
    provider: "auto"
    model: ""

Every task above follows the same provider / model / base_url pattern. Context compression is configured under auxiliary.compression:

auxiliary:
  compression:
    provider: main                                    # Same provider options as other auxiliary tasks
    model: google/gemini-3-flash-preview
    base_url: null                                   # Custom OpenAI-compatible endpoint

And the fallback model uses:

fallback_model:
  provider: openrouter
  model: anthropic/claude-sonnet-4
  # base_url: http://localhost:8000/v1               # Optional custom endpoint

For auxiliary.session_search, Hermes also supports:

max_concurrency to limit how many session summaries run at once
extra_body to pass provider-specific OpenAI-compatible request fields through on summarization calls

Example:

auxiliary:
  session_search:
    provider: main
    model: glm-4.5-air
    max_concurrency: 2
    extra_body:
      enable_thinking: false

If your provider does not natively support an OpenAI-compatible reasoning-control field, extra_body will not help with that part; in that case max_concurrency is still useful for reducing 429s caused by concurrent requests.

All three (auxiliary, compression, fallback) work the same way: set provider to pick who handles the request, model to pick which model, and base_url to point at a custom endpoint (which overrides provider).

Provider Options for Auxiliary Tasks

These options apply to auxiliary:, compression:, and fallback_model: configs only. "main" is not a valid value for the top-level model.provider. For custom endpoints, use provider: custom in your model: section (see AI Providers).

Provider	Description	Requirements
`"auto"`	Try providers in order until one works (default)	At least one provider configured
`"openrouter"`	Force OpenRouter	`OPENROUTER_API_KEY`
`"nous"`	Force Nous Portal	`hermes auth`
`"codex"`	Force Codex OAuth	`hermes model` → Codex
`"main"`	Use whatever provider the main agent uses (auxiliary tasks only)	Active main provider configured
`"anthropic"`	Force Anthropic native	`ANTHROPIC_API_KEY` or Claude Code credentials

Direct Endpoint Override

For any auxiliary task, setting base_url bypasses provider resolution entirely and sends requests directly to that endpoint:

auxiliary:
  vision:
    base_url: "http://localhost:1234/v1"
    api_key: "local-key"
    model: "qwen2.5-vl"

base_url takes precedence over provider. Hermes uses the configured api_key for authentication and falls back to OPENAI_API_KEY if it is not set. It does not reuse OPENROUTER_API_KEY for custom endpoints.
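The precedence rules for a direct endpoint can be sketched as a small resolver. `resolve_endpoint` and the returned dict shape are hypothetical; only the ordering (base_url first, then api_key, then OPENAI_API_KEY, never OPENROUTER_API_KEY) comes from the text.

```python
import os


def resolve_endpoint(task_cfg: dict, env: dict = None) -> dict:
    """Decide between a direct endpoint and provider resolution for one auxiliary task."""
    env = os.environ if env is None else env
    base_url = task_cfg.get("base_url") or ""
    if base_url:
        # base_url wins over provider; OPENROUTER_API_KEY is deliberately not consulted.
        api_key = task_cfg.get("api_key") or env.get("OPENAI_API_KEY", "")
        return {"mode": "direct", "base_url": base_url, "api_key": api_key}
    return {"mode": "provider", "provider": task_cfg.get("provider") or "auto"}


fake_env = {"OPENAI_API_KEY": "sk-openai", "OPENROUTER_API_KEY": "sk-or"}
print(resolve_endpoint({"base_url": "http://localhost:1234/v1", "provider": "nous"}, fake_env))
print(resolve_endpoint({"provider": "main"}, fake_env))
```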

Context Compression Fallback

Context compression uses the auxiliary.compression config block to control which model and provider handle summarization:

auxiliary:
  compression:
    provider: "auto"                              # auto | openrouter | nous | main
    model: "google/gemini-3-flash-preview"

:::info Legacy migration Older configs with compression.summary_model / compression.summary_provider / compression.summary_base_url are migrated to auxiliary.compression.* automatically on first load (config version 17). :::
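A migration of this kind could look roughly like the sketch below. It is not the Hermes migration code: the key mapping (summary_model to model, and so on) is inferred from the key names, the rule that existing new-style values win is an assumption, and the `threshold` key in the example is purely illustrative.

```python
import copy

# Assumed mapping from legacy keys to their auxiliary.compression counterparts.
LEGACY_KEYS = {
    "summary_model": "model",
    "summary_provider": "provider",
    "summary_base_url": "base_url",
}


def migrate_compression(config: dict) -> dict:
    """Return a copy of config with legacy compression.summary_* keys moved."""
    migrated = copy.deepcopy(config)
    legacy = migrated.get("compression")
    if not isinstance(legacy, dict):
        return migrated
    target = migrated.setdefault("auxiliary", {}).setdefault("compression", {})
    for old_key, new_key in LEGACY_KEYS.items():
        if old_key in legacy:
            value = legacy.pop(old_key)
            target.setdefault(new_key, value)  # keep any value already set in the new location
    return migrated


old = {"compression": {"summary_model": "m", "summary_provider": "openrouter", "threshold": 0.8}}
print(migrate_compression(old))
```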

If no provider is available for compression, Hermes drops the middle conversation turns without generating a summary rather than failing the session.
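The degraded mode can be pictured as trimming the middle of the history with or without a summary in its place. This is only a sketch: `compress_history`, `keep_head`, and `keep_tail` are hypothetical, and how many turns Hermes actually keeps at each end is not stated in this document.

```python
def compress_history(messages: list, summarize=None, keep_head: int = 2, keep_tail: int = 4) -> list:
    """Shrink a conversation by replacing or dropping its middle turns."""
    if len(messages) <= keep_head + keep_tail:
        return list(messages)  # nothing to compress
    cut = len(messages) - keep_tail
    head, middle, tail = messages[:keep_head], messages[keep_head:cut], messages[cut:]
    if summarize is None:
        # No compression provider available: drop the middle, keep the session alive.
        return head + tail
    return head + [summarize(middle)] + tail


turns = [f"turn {i}" for i in range(10)]
print(compress_history(turns))
print(compress_history(turns, summarize=lambda mid: f"[summary of {len(mid)} turns]"))
```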

Delegation Provider Override

Subagents spawned by delegate_task do not use the primary fallback model. They can, however, be routed to a different provider:model pair for cost optimization:

delegation:
  provider: "openrouter"                      # override provider for all subagents
  model: "google/gemini-3-flash-preview"      # override model
  # base_url: "http://localhost:1234/v1"      # or use a direct endpoint
  # api_key: "local-key"

See Subagent Delegation for full configuration details.

Cron Job Providers

Cron jobs run with whichever provider is configured at execution time. They do not support the fallback model. To use a different provider for cron jobs, set provider and model overrides on the cron job itself:

cronjob(
    action="create",
    schedule="every 2h",
    prompt="Check server status",
    provider="openrouter",
    model="google/gemini-3-flash-preview"
)

See Scheduled Tasks (Cron) for full configuration details.

Summary

Feature	Fallback Mechanism	Config Location
Main agent model	`fallback_model` in config.yaml - one-shot failover on errors	`fallback_model:` (top-level)
Vision	Auto-detection chain + internal OpenRouter retry	`auxiliary.vision`
Web extraction	Auto-detection chain + internal OpenRouter retry	`auxiliary.web_extract`
Context compression	Auto-detection chain, degrades to no-summary if unavailable	`auxiliary.compression`
Session search	Auto-detection chain	`auxiliary.session_search`
Skills hub	Auto-detection chain	`auxiliary.skills_hub`
MCP helpers	Auto-detection chain	`auxiliary.mcp`
Memory flush	Auto-detection chain	`auxiliary.flush_memories`
Approval classification	Auto-detection chain	`auxiliary.approval`
Title generation	Auto-detection chain	`auxiliary.title_generation`
Delegation	Provider override only (no automatic fallback)	`delegation.provider` / `delegation.model`
Cron jobs	Per-job provider override only (no automatic fallback)	Per-job `provider` / `model`

extent analysis

TL;DR

To resolve the issue, ensure that the fallback_model is correctly configured in the config.yaml file with a valid provider and model, and that the necessary API keys or credentials are set up.

Guidance

Verify fallback_model configuration: Check the config.yaml file for the fallback_model section and ensure it has a valid provider and model specified.
Check API key setup: Confirm that the necessary API keys or credentials are set up for the fallback provider, such as OPENROUTER_API_KEY for OpenRouter.
Test with a simple task: Run a simple prompt in a CLI session to check that the fallback model takes over when the primary fails. Do not test with delegate_task, because subagents do not use the primary fallback model.
Review logs for errors: Check the logs for any error messages related to the fallback model or API key setup.

Example

fallback_model:
  provider: openrouter
  model: anthropic/claude-sonnet-4

Notes

The fallback_model configuration applies only to CLI sessions and the messaging gateway, not to subagent delegation or cron jobs.
The fallback model triggers at most once per session. If the fallback also fails, regular error handling takes over.
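The one-shot behavior described in these notes can be sketched as below. This is an illustration under stated assumptions, not the Hermes implementation: the class `OneShotFallback` and the `send` callable are hypothetical, and staying on the fallback model for the rest of the session is an assumption of this sketch.

```python
from typing import Callable, Optional


class OneShotFallback:
    """Switch to a fallback model at most once per session (hypothetical sketch)."""

    def __init__(self, primary: str, fallback: Optional[str]) -> None:
        self.active = primary
        self.fallback = fallback
        self.used = False

    def call(self, send: Callable[[str, str], str], prompt: str) -> str:
        try:
            return send(self.active, prompt)
        except Exception:
            if self.fallback is None or self.used:
                raise  # no fallback left: regular error handling takes over
            self.used = True
            # Assumption: the session stays on the fallback after switching.
            self.active = self.fallback
            return send(self.active, prompt)
```

A second failure after the switch is re-raised unchanged, which mirrors the "triggers once, then normal error handling" rule.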

Recommendation

Apply the workaround by configuring the fallback_model in the config.yaml file with a valid provider and model, and ensure the necessary API keys or credentials are set up. This will allow the agent to automatically switch to the fallback model in case of errors with the primary model.


