hermes - 💡(How to fix) Fix [i18n] Thai Translation: Features Part 2d - Tools, TTS, Vision, Voice, Dashboard [1 participants]

hermes 2026-04-24 08:52:06

GitHub stats

NousResearch/hermes-agent#15005•Fetched 2026-04-25 06:25:14

Author

nanobro

Participants

nanobro

Timeline (top)

labeled ×2

Feature	Platform	Description
Interactive Voice	CLI	Press Ctrl+B to record; the agent automatically detects silence and responds
Auto Voice Reply	Telegram, Discord	The agent sends spoken audio alongside its text reply
Voice Channel	Discord	The bot joins a VC, listens to users speaking, and speaks its replies back

Error Message

View the agent, gateway, and error log files, with filtering and live tailing.

File - switch between the agent, errors, and gateway log files
Level - filter by log level: ALL, DEBUG, INFO, WARNING, or ERROR
Component - filter by source component: all, gateway, agent, tools, cli, or cron
Lines - choose how many lines to display (50, 100, 200, or 500)
Auto-refresh - toggle live tailing, which polls for new log lines every 5 seconds
Color-coded - log lines are colored by severity (red for errors, yellow for warnings, dim for debug)
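
The Level and Lines filters above can be pictured with a small sketch. It assumes each log line carries its level as a whitespace-delimited token and that the Level filter is an exact match; neither detail is stated in the source, and `filter_log` is a hypothetical helper, not part of Hermes.

```python
from typing import Iterable, List

LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR")


def filter_log(lines: Iterable[str], level: str = "ALL", limit: int = 100) -> List[str]:
    """Keep lines matching `level`, then return at most the last `limit` of them."""
    if level != "ALL" and level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    # Assumption: the level appears as its own space-separated token in the line.
    kept = [ln for ln in lines if level == "ALL" or f" {level} " in f" {ln} "]
    return kept[-limit:] if limit > 0 else []


sample = [
    "08:00:01 INFO agent: started",
    "08:00:02 ERROR gateway: connection refused",
    "08:00:03 INFO cli: ready",
]
print(filter_log(sample, level="ERROR"))
```

Re-running the same filter on a timer (every 5 seconds in the dashboard) is what turns this into live tailing.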

Fix Action

Fix / Workaround

Category	Examples	Description
Web	`web_search`, `web_extract`	Search the web and extract page content
Terminal & Files	`terminal`, `process`, `read_file`, `patch`	Run commands and manage files
Browser	`browser_navigate`, `browser_snapshot`, `browser_vision`	Interactive browser automation with text and vision support
Media	`vision_analyze`, `image_generate`, `text_to_speech`	Multimodal analysis and generation
Agent orchestration	`todo`, `clarify`, `execute_code`, `delegate_task`	Planning, clarification, code execution, and subagent delegation
Memory & recall	`memory`, `session_search`	Persistent memory and session search
Automation & delivery	`cronjob`, `send_message`	Scheduled tasks with create/list/update/pause/resume/run/remove actions, plus outbound messaging
Integrations	`ha_`, MCP server tools, `rl_`	Home Assistant, MCP, RL training, and other integrations

Code Example

hermes status

---

Your Nous subscription includes the Tool Gateway.

  The Tool Gateway gives you access to web search, image generation,
  text-to-speech, and browser automation through your Nous subscription.
  No need to sign up for separate API keys - just pick the tools you want.

  ○ Web search & extract (Firecrawl) - not configured
  ○ Image generation (FAL) - not configured
  ○ Text-to-speech (OpenAI TTS) - not configured
  ○ Browser automation (Browser Use) - not configured

  ● Enable Tool Gateway
  ○ Skip

---

hermes tools

---

web:
  backend: firecrawl
  use_gateway: true

image_gen:
  use_gateway: true

tts:
  provider: openai
  use_gateway: true

browser:
  cloud_provider: browser-use
  use_gateway: true

---

hermes tools    # Select the tool -> choose a direct provider

---

web:
  backend: firecrawl
  use_gateway: false  # Now uses FIRECRAWL_API_KEY from .env

---

hermes status

---

◆ Nous Tool Gateway
  Nous Portal   ✓ managed tools available
  Web tools       ✓ active via Nous subscription
  Image gen       ✓ active via Nous subscription
  TTS             ✓ active via Nous subscription
  Browser         ○ active via Browser Use key
  Modal           ○ available via subscription (optional)

---

TOOL_GATEWAY_DOMAIN=nousresearch.com     # Base domain for gateway routing
TOOL_GATEWAY_SCHEME=https                 # http or https (default: https)
TOOL_GATEWAY_USER_TOKEN=your-token        # Auth token (normally auto-populated)
FIRECRAWL_GATEWAY_URL=https://...         # Override for the Firecrawl endpoint specifically

---

# Use specific toolsets
hermes chat --toolsets "web,terminal"

# See all available tools
hermes tools

# Configure tools per platform (interactive)
hermes tools

---

# In ~/.hermes/config.yaml
terminal:
  backend: local    # or: docker, ssh, singularity, modal, daytona
  cwd: "."          # Working directory
  timeout: 180      # Command timeout in seconds

---

terminal:
  backend: docker
  docker_image: python:3.11-slim

---

terminal:
  backend: ssh

---

# Set credentials in ~/.hermes/.env
TERMINAL_SSH_HOST=my-server.example.com
TERMINAL_SSH_USER=myuser
TERMINAL_SSH_KEY=~/.ssh/id_rsa

---

# Pre-build SIF for parallel workers
apptainer build ~/python.sif docker://python:3.11-slim

# Configure
hermes config set terminal.backend singularity
hermes config set terminal.singularity_image ~/python.sif

---

uv pip install modal
modal setup
hermes config set terminal.backend modal

---

terminal:
  backend: docker  # or singularity, modal, daytona
  container_cpu: 1              # CPU cores (default: 1)
  container_memory: 5120        # Memory in MB (default: 5GB)
  container_disk: 51200         # Disk in MB (default: 50GB)
  container_persistent: true    # Persist filesystem across sessions (default: true)

---

terminal(command="pytest -v tests/", background=true)
# Returns: {"session_id": "proc_abc123", "pid": 12345}

# Then manage with the process tool:
process(action="list")       # List all running processes
process(action="poll", session_id="proc_abc123")   # Check status
process(action="wait", session_id="proc_abc123")   # Block until finished
process(action="log", session_id="proc_abc123")    # Full output
process(action="kill", session_id="proc_abc123")   # Terminate
process(action="write", session_id="proc_abc123", data="y")  # Send input

---

# In ~/.hermes/config.yaml
tts:
  provider: "edge"              # "edge" | "elevenlabs" | "openai" | "minimax" | "mistral" | "gemini" | "xai" | "neutts" | "kittentts"
  speed: 1.0                    # Global speed multiplier (provider-specific settings override this)
  edge:
    voice: "en-US-AriaNeural"   # 322 voices, 74 languages
    speed: 1.0                  # Converted to rate percentage (+/-%)
  elevenlabs:
    voice_id: "pNInz6obpgDQGcFmaJgB"  # Adam
    model_id: "eleven_multilingual_v2"
  openai:
    model: "gpt-4o-mini-tts"
    voice: "alloy"              # alloy, echo, fable, onyx, nova, shimmer
    base_url: "https://api.openai.com/v1"  # Override for OpenAI-compatible TTS endpoints
    speed: 1.0                  # 0.25 - 4.0
  minimax:
    model: "speech-2.8-hd"     # speech-2.8-hd (default), speech-2.8-turbo
    voice_id: "English_Graceful_Lady"  # See https://platform.minimax.io/faq/system-voice-id
    speed: 1                    # 0.5 - 2.0
    vol: 1                      # 0 - 10
    pitch: 0                    # -12 - 12
  mistral:
    model: "voxtral-mini-tts-2603"
    voice_id: "c69964a6-ab8b-4f8a-9465-ec0925096ec8"  # Paul - Neutral (default)
  gemini:
    model: "gemini-2.5-flash-preview-tts"  # or gemini-2.5-pro-preview-tts
    voice: "Kore"               # 30 prebuilt voices: Zephyr, Puck, Kore, Enceladus, Gacrux, etc.
  xai:
    voice_id: "eve"             # xAI TTS voice (see https://docs.x.ai/docs/api-reference#tts)
    language: "en"              # ISO 639-1 code
    sample_rate: 24000          # 22050 / 24000 (default) / 44100 / 48000
    bit_rate: 128000            # MP3 bitrate; only applies when codec=mp3
    # base_url: "https://api.x.ai/v1"   # Override via XAI_BASE_URL env var
  neutts:
    ref_audio: ''
    ref_text: ''
    model: neuphonic/neutts-air-q4-gguf
    device: cpu
  kittentts:
    model: KittenML/kitten-tts-nano-0.8-int8   # 25MB int8; also: kitten-tts-micro-0.8 (41MB), kitten-tts-mini-0.8 (80MB)
    voice: Jasper                               # Jasper, Bella, Luna, Bruno, Rosie, Hugo, Kiki, Leo
    speed: 1.0                                  # 0.5 - 2.0
    clean_text: true                            # Expand numbers, currencies, units

---

# Ubuntu/Debian
sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Fedora
sudo dnf install ffmpeg

---

# In ~/.hermes/config.yaml
stt:
  provider: "local"           # "local" | "groq" | "openai" | "mistral"
  local:
    model: "base"             # tiny, base, small, medium, large-v3
  openai:
    model: "whisper-1"        # whisper-1, gpt-4o-mini-transcribe, gpt-4o-transcribe
  mistral:
    model: "voxtral-mini-latest"  # voxtral-mini-latest, voxtral-mini-2602

---

/paste

---

/terminal-setup

---

brew install pngpaste

---

# Ubuntu/Debian
sudo apt install xclip

# Fedora
sudo dnf install xclip

# Arch
sudo pacman -S xclip

---

# Ubuntu/Debian
sudo apt install wl-clipboard

# Fedora
sudo dnf install wl-clipboard

# Arch
sudo pacman -S wl-clipboard

---

echo $XDG_SESSION_TYPE
# "wayland" = Wayland, "x11" = X11, "tty" = no display server
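
The session type printed above decides which clipboard utility is relevant. A minimal Python sketch, assuming the usual pairing (xclip is an X11 tool, wl-clipboard its Wayland counterpart); `clipboard_package` is a hypothetical helper, not part of Hermes.

```python
import os
from typing import Mapping, Optional


def clipboard_package(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Map XDG_SESSION_TYPE to the clipboard package to install, or None."""
    env = os.environ if env is None else env
    session = env.get("XDG_SESSION_TYPE", "").lower()
    # "tty" or an unset variable means no display server, so no clipboard tool applies.
    return {"wayland": "wl-clipboard", "x11": "xclip"}.get(session)


print(clipboard_package({"XDG_SESSION_TYPE": "wayland"}))
```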

---

# 1. Check WSL detection
grep -i microsoft /proc/version

# 2. Check that PowerShell is reachable
which powershell.exe

# 3. Copy an image, then check
powershell.exe -NoProfile -Command "Add-Type -AssemblyName System.Windows.Forms; [System.Windows.Forms.Clipboard]::ContainsImage()"
# Should print "True"

---

{
  "type": "image_url",
  "image_url": {
    "url": "data:image/png;base64,..."
  }
}

---

# CLI voice mode (microphone + audio playback)
pip install "hermes-agent[voice]"

# Discord + Telegram messaging (includes discord.py[voice] for VC support)
pip install "hermes-agent[messaging]"

# Premium TTS (ElevenLabs)
pip install "hermes-agent[tts-premium]"

# Local TTS (NeuTTS, optional)
python -m pip install -U "neutts[all]"

# Everything at once
pip install "hermes-agent[all]"

---

# macOS
brew install portaudio ffmpeg opus
brew install espeak-ng   # for NeuTTS

# Ubuntu/Debian
sudo apt install portaudio19-dev ffmpeg libopus0
sudo apt install espeak-ng   # for NeuTTS

---

# Speech-to-Text — local provider needs NO key at all
# pip install faster-whisper          # Free, runs locally, recommended
GROQ_API_KEY=your-key                 # Groq Whisper — fast, free tier (cloud)
VOICE_TOOLS_OPENAI_KEY=your-key       # OpenAI Whisper — paid (cloud)

# Text-to-Speech (optional — Edge TTS and NeuTTS work without any key)
ELEVENLABS_API_KEY=***           # ElevenLabs — premium quality
# VOICE_TOOLS_OPENAI_KEY above also enables OpenAI TTS

---

hermes                # Start the interactive CLI

---

/voice          Toggle voice mode on/off
/voice on       Enable voice mode
/voice off      Disable voice mode
/voice tts      Toggle TTS output
/voice status   Show current state

---

hermes gateway        # Start the gateway (connects to configured platforms)
hermes gateway setup  # Interactive setup wizard for first-time configuration

---

DISCORD_REQUIRE_MENTION=false

---

DISCORD_FREE_RESPONSE_CHANNELS=123456789,987654321

---

/voice          Toggle voice mode on/off
/voice on       Voice replies only when you send a voice message
/voice tts      Voice replies for ALL messages
/voice off      Disable voice replies
/voice status   Show current setting

---

https://discord.com/oauth2/authorize?client_id=YOUR_APP_ID&scope=bot+applications.commands&permissions=274881432640

---

# macOS (Homebrew)
brew install opus

# Ubuntu/Debian
sudo apt install libopus0

---

# ~/.hermes/.env

# Discord bot (already configured for text)
DISCORD_BOT_TOKEN=your-bot-token
DISCORD_ALLOWED_USERS=your-user-id

# STT — local provider needs no key (pip install faster-whisper)
# GROQ_API_KEY=your-key            # Alternative: cloud-based, fast, free tier

# TTS — optional. Edge TTS and NeuTTS need no key.
# ELEVENLABS_API_KEY=***      # Premium quality
# VOICE_TOOLS_OPENAI_KEY=***  # OpenAI TTS / Whisper

---

hermes gateway        # Start with existing configuration

---

/voice join      Bot joins your current voice channel
/voice channel   Alias for /voice join
/voice leave     Bot disconnects from voice channel
/voice status    Show voice mode and connected channel

---

# ~/.hermes/.env
DISCORD_ALLOWED_USERS=284102345871466496

---

# Voice recording (CLI)
voice:
  record_key: "ctrl+b"            # Key to start/stop recording
  max_recording_seconds: 120       # Maximum recording length
  auto_tts: false                  # Auto-enable TTS when voice mode starts
  beep_enabled: true               # Play record start/stop beeps
  silence_threshold: 200           # RMS level (0-32767) below which counts as silence
  silence_duration: 3.0            # Seconds of silence before auto-stop

# Speech-to-Text
stt:
  provider: "local"                  # "local" (free) | "groq" | "openai"
  local:
    model: "base"                    # tiny, base, small, medium, large-v3
  # model: "whisper-1"              # Legacy: used when provider is not set

# Text-to-Speech
tts:
  provider: "edge"                 # "edge" (free) | "elevenlabs" | "openai" | "neutts" | "minimax"
  edge:
    voice: "en-US-AriaNeural"      # 322 voices, 74 languages
  elevenlabs:
    voice_id: "pNInz6obpgDQGcFmaJgB"    # Adam
    model_id: "eleven_multilingual_v2"
  openai:
    model: "gpt-4o-mini-tts"
    voice: "alloy"                 # alloy, echo, fable, onyx, nova, shimmer
    base_url: "https://api.openai.com/v1"  # optional: override for self-hosted or OpenAI-compatible endpoints
  neutts:
    ref_audio: ''
    ref_text: ''
    model: neuphonic/neutts-air-q4-gguf
    device: cpu
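
The `silence_threshold` and `silence_duration` settings above describe an RMS-based auto-stop. The sketch below shows one way such a check could work on 16-bit samples; it is an illustration under assumptions (fixed-length chunks, hypothetical function names), not the actual Hermes recorder.

```python
import math
from typing import Iterable, Sequence


def rms(samples: Sequence[int]) -> float:
    """Root-mean-square level of signed 16-bit samples (0 to 32767)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_stop(chunks: Iterable[Sequence[int]], chunk_seconds: float,
                silence_threshold: float = 200, silence_duration: float = 3.0) -> bool:
    """Return True once consecutive quiet chunks add up to `silence_duration`."""
    quiet = 0.0
    for chunk in chunks:
        if rms(chunk) < silence_threshold:
            quiet += chunk_seconds
            if quiet >= silence_duration:
                return True
        else:
            quiet = 0.0  # any loud chunk resets the silence timer
    return False


loud = [1000, -1000] * 10
hush = [10, -10] * 10
print(should_stop([loud, hush, hush, hush], chunk_seconds=1.0))
```

Raising `silence_threshold` makes the recorder treat more background noise as silence; raising `silence_duration` tolerates longer pauses before stopping.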

---

# Speech-to-Text providers (local needs no key)
# pip install faster-whisper        # Free local STT — no API key needed
GROQ_API_KEY=...                    # Groq Whisper (fast, free tier)
VOICE_TOOLS_OPENAI_KEY=...         # OpenAI Whisper (paid)

# STT advanced overrides (optional)
STT_GROQ_MODEL=whisper-large-v3-turbo    # Override default Groq STT model
STT_OPENAI_MODEL=whisper-1               # Override default OpenAI STT model
GROQ_BASE_URL=https://api.groq.com/openai/v1     # Custom Groq endpoint
STT_OPENAI_BASE_URL=https://api.openai.com/v1    # Custom OpenAI STT endpoint

# Text-to-Speech providers (Edge TTS and NeuTTS need no key)
ELEVENLABS_API_KEY=***             # ElevenLabs (premium quality)
# VOICE_TOOLS_OPENAI_KEY above also enables OpenAI TTS

# Discord voice channel
DISCORD_BOT_TOKEN=...
DISCORD_ALLOWED_USERS=...

---

brew install portaudio    # macOS
sudo apt install portaudio19-dev  # Ubuntu

---

hermes dashboard

---

# Custom port
hermes dashboard --port 8080

# Bind to all interfaces (use with caution on shared networks)
hermes dashboard --host 0.0.0.0

# Start without opening browser
hermes dashboard --no-open

---

pip install "hermes-agent[web]"

---

You → /reload
  Reloaded .env (3 var(s) updated)

---

# Terminal 1: start the backend API
hermes dashboard --no-open

# Terminal 2: start the Vite dev server with HMR
cd web/
npm install
npm run dev

---

# ~/.hermes/dashboard-themes/ocean.yaml
name: ocean
label: Ocean
description: Deep sea blues with coral accents

colors:
  background: "#0a1628"
  foreground: "#e0f0ff"
  card: "#0f1f35"
  card-foreground: "#e0f0ff"
  primary: "#ff6b6b"
  primary-foreground: "#0a1628"
  secondary: "#152540"
  secondary-foreground: "#e0f0ff"
  muted: "#1a2d4a"
  muted-foreground: "#7899bb"
  accent: "#1f3555"
  accent-foreground: "#e0f0ff"
  destructive: "#fb2c36"
  destructive-foreground: "#fff"
  success: "#4ade80"
  warning: "#fbbf24"
  border: "color-mix(in srgb, #ff6b6b 15%, transparent)"
  input: "color-mix(in srgb, #ff6b6b 15%, transparent)"
  ring: "#ff6b6b"
  popover: "#0f1f35"
  popover-foreground: "#e0f0ff"

overlay:
  noiseOpacity: 0.08
  noiseBlendMode: color-dodge
  warmGlowOpacity: 0.15
  warmGlowColor: "rgba(255,107,107,0.2)"


📄 user-guide/features/tool-gateway.md

title: "Nous Tool Gateway"
description: "Route web search, image generation, text-to-speech, and browser automation through your Nous subscription - no extra API keys needed"
sidebar_label: "Tool Gateway"
sidebar_position: 2

Nous Tool Gateway

:::tip Get Started
The Tool Gateway is included with paid Nous Portal subscriptions. Manage your subscription →
:::

The Tool Gateway lets paid Nous Portal subscribers use web search, image generation, text-to-speech, and browser automation through their existing subscription, with no need to register separate API keys with Firecrawl, FAL, OpenAI, or Browser Use.

What's Included

Tool	What It Does	Direct Alternative
Web search & extract	Search the web and extract page content via Firecrawl	`FIRECRAWL_API_KEY`, `EXA_API_KEY`, `PARALLEL_API_KEY`, `TAVILY_API_KEY`
Image generation	Generate images via FAL (8 models: FLUX 2 Klein/Pro, GPT-Image, Nano Banana Pro, Ideogram, Recraft V4 Pro, Qwen, Z-Image)	`FAL_KEY`
Text-to-speech	Convert text to speech via OpenAI TTS	`VOICE_TOOLS_OPENAI_KEY`, `ELEVENLABS_API_KEY`
Browser automation	Control cloud browsers via Browser Use	`BROWSER_USE_API_KEY`, `BROWSERBASE_API_KEY`

All four tools are billed to your Nous subscription. You can enable any combination; for example, use the gateway for web and image generation while keeping your own ElevenLabs key for TTS.

Eligibility

The Tool Gateway is available only to paid Nous Portal subscribers. Free-tier accounts do not have access; upgrade your subscription to unlock it.

To check your status:

hermes status

Look for the Nous Tool Gateway section. It shows which tools are active via the gateway, which use direct keys, and which are not configured.

Enabling the Tool Gateway

During model setup

When you run `hermes model` and select Nous Portal as the provider, Hermes automatically offers to enable the Tool Gateway:

Your Nous subscription includes the Tool Gateway.

  The Tool Gateway gives you access to web search, image generation,
  text-to-speech, and browser automation through your Nous subscription.
  No need to sign up for separate API keys - just pick the tools you want.

  ○ Web search & extract (Firecrawl) - not configured
  ○ Image generation (FAL) - not configured
  ○ Text-to-speech (OpenAI TTS) - not configured
  ○ Browser automation (Browser Use) - not configured

  ● Enable Tool Gateway
  ○ Skip

Select Enable Tool Gateway and you are done.

If you already have direct API keys for some tools, the prompt adapts: you can enable the gateway for all tools (your existing keys stay in `.env` but are not used at runtime), enable it only for tools that are not yet configured, or skip it entirely.

Via `hermes tools`

You can also enable the gateway one tool at a time through the interactive tool configuration:

hermes tools

Select a tool category (Web, Browser, Image Generation, or TTS), then choose Nous Subscription as the provider. This sets `use_gateway: true` for that tool in your config.

Manual configuration

Set the `use_gateway` flag directly in `~/.hermes/config.yaml`:

web:
  backend: firecrawl
  use_gateway: true

image_gen:
  use_gateway: true

tts:
  provider: openai
  use_gateway: true

browser:
  cloud_provider: browser-use
  use_gateway: true

How It Works

When `use_gateway: true` is set for a tool, the runtime routes its API calls through the Nous Tool Gateway instead of using direct API keys:

Web tools - `web_search` and `web_extract` use the gateway's Firecrawl endpoint
Image generation - `image_generate` uses the gateway's FAL endpoint
TTS - `text_to_speech` uses the gateway's OpenAI Audio endpoint
Browser - `browser_navigate` and the other browser tools use the gateway's Browser Use endpoint

The gateway authenticates with your Nous Portal credentials (stored in `~/.hermes/auth.json` after you run `hermes model`).

Precedence

Each tool checks `use_gateway` first:

`use_gateway: true` → route through the gateway, even if direct API keys exist in `.env`
`use_gateway: false` (or unset) → use direct API keys if present, falling back to the gateway only when no direct keys exist

This means you can switch between the gateway and direct keys at any time without deleting credentials from your `.env`.
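
The precedence rule above can be sketched as a small decision function. This is an illustration only: `resolve_route`, its parameters, and the `"unconfigured"` outcome are hypothetical names chosen for the example, not the Hermes runtime's API.

```python
import os
from typing import Mapping, Optional


def resolve_route(tool_cfg: Mapping[str, object], direct_key_name: str,
                  env: Optional[Mapping[str, str]] = None,
                  gateway_available: bool = True) -> str:
    """Return "gateway", "direct", or "unconfigured" for one tool."""
    env = os.environ if env is None else env
    has_direct_key = bool(env.get(direct_key_name))
    # 1. An explicit use_gateway: true wins, even when a direct key exists.
    if tool_cfg.get("use_gateway") is True:
        return "gateway"
    # 2. Otherwise prefer the direct key when one is present.
    if has_direct_key:
        return "direct"
    # 3. With no direct key, fall back to the gateway (assumed to need a subscription).
    return "gateway" if gateway_available else "unconfigured"


web_cfg = {"backend": "firecrawl", "use_gateway": False}
print(resolve_route(web_cfg, "FIRECRAWL_API_KEY", env={"FIRECRAWL_API_KEY": "fc-example"}))
```

Because the key is only consulted in step 2, it can stay in `.env` while the flag decides which path is taken.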

Switching Back to Direct Keys

To stop using the gateway for a tool:

hermes tools    # Select the tool -> choose a direct provider

Or set `use_gateway: false` in the config:

web:
  backend: firecrawl
  use_gateway: false  # Now uses FIRECRAWL_API_KEY from .env

When you select a non-gateway provider in `hermes tools`, the `use_gateway` flag is automatically set to `false` to prevent conflicting config.

Checking Status

hermes status

The Nous Tool Gateway section shows:

◆ Nous Tool Gateway
  Nous Portal   ✓ managed tools available
  Web tools       ✓ active via Nous subscription
  Image gen       ✓ active via Nous subscription
  TTS             ✓ active via Nous subscription
  Browser         ○ active via Browser Use key
  Modal           ○ available via subscription (optional)

Tools marked "active via Nous subscription" are routed through the gateway. Tools with their own keys show which provider is active.

Advanced: Self-Hosted Gateway

For self-hosted or custom gateway deployments, you can override the gateway endpoints through environment variables in `~/.hermes/.env`:

TOOL_GATEWAY_DOMAIN=nousresearch.com     # Base domain for gateway routing
TOOL_GATEWAY_SCHEME=https                 # http or https (default: https)
TOOL_GATEWAY_USER_TOKEN=your-token        # Auth token (normally auto-populated)
FIRECRAWL_GATEWAY_URL=https://...         # Override for the Firecrawl endpoint specifically

These env vars are always visible in the configuration regardless of subscription status, which is useful for custom infrastructure setups.

FAQ

Do I need to delete my existing API keys?

No. When `use_gateway: true` is set, the runtime skips direct API keys and routes through the gateway. Your keys stay in `.env` untouched. If you disable the gateway later, they are used again automatically.

Can I use the gateway for some tools and direct keys for others?

Yes. The `use_gateway` flag is per tool. You can mix and match; for example, the gateway for web and image generation, your own ElevenLabs key for TTS, and Browserbase for browser automation.

What if my subscription expires?

Tools that were routed through the gateway stop working until you renew your subscription or switch to direct API keys via `hermes tools`.

Does the gateway work with the messaging gateway?

Yes. The Tool Gateway routes tool API calls whether you use the CLI, Telegram, Discord, or another messaging platform. It operates at the tool runtime level, not at the entry point level.

Is Modal included?

Modal (serverless terminal backend) is available as an optional add-on through the Nous subscription. It is not enabled by the Tool Gateway prompt; configure it separately via `hermes setup terminal` or in `config.yaml`.

📄 user-guide/features/tools.md

sidebar_position: 1
title: "Tools & Toolsets"
description: "Overview of Hermes Agent's tools - what is available, how toolsets work, and terminal backends"

Tools & Toolsets

Tools are functions that extend the agent's capabilities. They are organized into logical toolsets that can be enabled or disabled per platform.

Available Tools

Hermes ships with a built-in tool registry covering web search, browser automation, terminal command execution, file editing, memory, delegation, RL training, messaging, Home Assistant, and more.

:::note
Honcho cross-session memory is available as a memory provider plugin (`plugins/memory/honcho/`), not as a built-in toolset. See Plugins for installation.
:::

High-level categories:

Category	Examples	Description
Web	`web_search`, `web_extract`	Search the web and extract page content
Terminal & Files	`terminal`, `process`, `read_file`, `patch`	Run commands and manage files
Browser	`browser_navigate`, `browser_snapshot`, `browser_vision`	Interactive browser automation with text and vision support
Media	`vision_analyze`, `image_generate`, `text_to_speech`	Multimodal analysis and generation
Agent orchestration	`todo`, `clarify`, `execute_code`, `delegate_task`	Planning, clarification, code execution, and subagent delegation
Memory & recall	`memory`, `session_search`	Persistent memory and session search
Automation & delivery	`cronjob`, `send_message`	Scheduled tasks with create/list/update/pause/resume/run/remove actions, plus outbound messaging
Integrations	`ha_`, MCP server tools, `rl_`	Home Assistant, MCP, RL training, and other integrations

For the authoritative, code-derived registry, see the Built-in Tools Reference and the Toolsets Reference.

:::tip Nous Tool Gateway
Paid Nous Portal subscribers can use web search, image generation, TTS, and browser automation through the Tool Gateway without separate API keys. Run `hermes model` to enable it, or configure individual tools with `hermes tools`.
:::

Using Toolsets

# Use specific toolsets
hermes chat --toolsets "web,terminal"

# See all available tools
hermes tools

# Configure tools per platform (interactive)
hermes tools

Common toolsets include web, terminal, file, browser, vision, image_gen, moa, skills, tts, todo, memory, session_search, cronjob, code_execution, delegation, clarify, homeassistant, and rl.

See the Toolsets Reference for the full set, including platform presets such as hermes-cli and hermes-telegram, and dynamic MCP toolsets such as mcp-<server>.

Terminal Backends

The terminal tool can run commands in different environments:

Backend	Description	Use Case
`local`	Runs on your machine (default)	Development, trusted tasks
`docker`	Isolated containers	Security, reproducibility
`ssh`	Remote server	Sandboxing, keeping the agent away from its own code
`singularity`	HPC containers	Cluster computing, rootless
`modal`	Cloud execution	Serverless, scaling
`daytona`	Cloud sandbox workspace	Persistent remote dev environments

Configuration

# In ~/.hermes/config.yaml
terminal:
  backend: local    # or: docker, ssh, singularity, modal, daytona
  cwd: "."          # Working directory
  timeout: 180      # Command timeout in seconds

Docker Backend

terminal:
  backend: docker
  docker_image: python:3.11-slim

SSH Backend

Recommended for security, since the agent cannot modify its own code:

terminal:
  backend: ssh

# Set credentials in ~/.hermes/.env
TERMINAL_SSH_HOST=my-server.example.com
TERMINAL_SSH_USER=myuser
TERMINAL_SSH_KEY=~/.ssh/id_rsa

Singularity/Apptainer

# Pre-build SIF for parallel workers
apptainer build ~/python.sif docker://python:3.11-slim

# Configure
hermes config set terminal.backend singularity
hermes config set terminal.singularity_image ~/python.sif

Modal (Serverless Cloud)

uv pip install modal
modal setup
hermes config set terminal.backend modal

Container Resources

Configure CPU, memory, disk, and persistence for all container backends:

terminal:
  backend: docker  # or singularity, modal, daytona
  container_cpu: 1              # CPU cores (default: 1)
  container_memory: 5120        # Memory in MB (default: 5GB)
  container_disk: 51200         # Disk in MB (default: 50GB)
  container_persistent: true    # Persist filesystem across sessions (default: true)

With `container_persistent: true`, installed packages, files, and config persist across sessions.

Container Security

All container backends run with security hardening:

Read-only root filesystem (Docker)
All Linux capabilities dropped
No privilege escalation
PID limits (256 processes)
Full namespace isolation
Persistent workspace via volumes, not writable root layer

Docker can take an explicit env allowlist via `terminal.docker_forward_env`, but forwarded variables are visible to commands inside the container and should be treated as exposed to that session.

Background Process Management

Start a process in the background and manage it:

terminal(command="pytest -v tests/", background=true)
# Returns: {"session_id": "proc_abc123", "pid": 12345}

# Then manage with the process tool:
process(action="list")       # List all running processes
process(action="poll", session_id="proc_abc123")   # Check status
process(action="wait", session_id="proc_abc123")   # Block until finished
process(action="log", session_id="proc_abc123")    # Full output
process(action="kill", session_id="proc_abc123")   # Terminate
process(action="write", session_id="proc_abc123", data="y")  # Send input

PTY mode (`pty=true`) enables interactive CLI tools such as Codex and Claude Code.

Sudo Support

If a command needs sudo, you are prompted for your password (cached for the session), or you can set `SUDO_PASSWORD` in `~/.hermes/.env`.

:::warning
On messaging platforms, if sudo fails, the output includes a hint to add `SUDO_PASSWORD` to `~/.hermes/.env`.
:::

📄 user-guide/features/tts.md

sidebar_position: 9
title: "Voice & TTS"
description: "Text-to-speech and voice message transcription across all platforms"

Voice & TTS

Hermes Agent supports both text-to-speech and voice message transcription across all messaging platforms.

:::tip Nous Subscribers
If you have a paid Nous Portal subscription, you can use OpenAI TTS through the Tool Gateway without a separate OpenAI API key. Run `hermes model` or `hermes tools` to enable it.
:::

Text-to-Speech

Convert text to speech with nine providers:

Provider	Quality	Cost	API Key
Edge TTS (default)	Good	Free	None needed
ElevenLabs	Excellent	Paid	`ELEVENLABS_API_KEY`
OpenAI TTS	Good	Paid	`VOICE_TOOLS_OPENAI_KEY`
MiniMax TTS	Excellent	Paid	`MINIMAX_API_KEY`
Mistral (Voxtral TTS)	Excellent	Paid	`MISTRAL_API_KEY`
Google Gemini TTS	Excellent	Free tier	`GEMINI_API_KEY`
xAI TTS	Excellent	Paid	`XAI_API_KEY`
NeuTTS	Good	Free (local)	None needed
KittenTTS	Good	Free (local)	None needed

Platform Delivery

Platform	Delivery	Format
Telegram	Voice bubble (plays inline)	Opus `.ogg`
Discord	Voice bubble (Opus/OGG), falls back to file attachment	Opus/MP3
WhatsApp	Audio file attachment	MP3
CLI	Saved to `~/.hermes/audio_cache/`	MP3

Configuration

# In ~/.hermes/config.yaml
tts:
  provider: "edge"              # "edge" | "elevenlabs" | "openai" | "minimax" | "mistral" | "gemini" | "xai" | "neutts" | "kittentts"
  speed: 1.0                    # Global speed multiplier (provider-specific settings override this)
  edge:
    voice: "en-US-AriaNeural"   # 322 voices, 74 languages
    speed: 1.0                  # Converted to rate percentage (+/-%)
  elevenlabs:
    voice_id: "pNInz6obpgDQGcFmaJgB"  # Adam
    model_id: "eleven_multilingual_v2"
  openai:
    model: "gpt-4o-mini-tts"
    voice: "alloy"              # alloy, echo, fable, onyx, nova, shimmer
    base_url: "https://api.openai.com/v1"  # Override for OpenAI-compatible TTS endpoints
    speed: 1.0                  # 0.25 - 4.0
  minimax:
    model: "speech-2.8-hd"     # speech-2.8-hd (default), speech-2.8-turbo
    voice_id: "English_Graceful_Lady"  # See https://platform.minimax.io/faq/system-voice-id
    speed: 1                    # 0.5 - 2.0
    vol: 1                      # 0 - 10
    pitch: 0                    # -12 - 12
  mistral:
    model: "voxtral-mini-tts-2603"
    voice_id: "c69964a6-ab8b-4f8a-9465-ec0925096ec8"  # Paul - Neutral (default)
  gemini:
    model: "gemini-2.5-flash-preview-tts"  # or gemini-2.5-pro-preview-tts
    voice: "Kore"               # 30 prebuilt voices: Zephyr, Puck, Kore, Enceladus, Gacrux, etc.
  xai:
    voice_id: "eve"             # xAI TTS voice (see https://docs.x.ai/docs/api-reference#tts)
    language: "en"              # ISO 639-1 code
    sample_rate: 24000          # 22050 / 24000 (default) / 44100 / 48000
    bit_rate: 128000            # MP3 bitrate; only applies when codec=mp3
    # base_url: "https://api.x.ai/v1"   # Override via XAI_BASE_URL env var
  neutts:
    ref_audio: ''
    ref_text: ''
    model: neuphonic/neutts-air-q4-gguf
    device: cpu
  kittentts:
    model: KittenML/kitten-tts-nano-0.8-int8   # 25MB int8; also: kitten-tts-micro-0.8 (41MB), kitten-tts-mini-0.8 (80MB)
    voice: Jasper                               # Jasper, Bella, Luna, Bruno, Rosie, Hugo, Kiki, Leo
    speed: 1.0                                  # 0.5 - 2.0
    clean_text: true                            # Expand numbers, currencies, units

Speed control: The global tts.speed value applies to all providers by default. Each provider can set its own speed to override it (e.g. tts.openai.speed: 1.5). A provider-specific speed takes precedence over the global value. The default is 1.0 (normal speed).
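The precedence rule can be sketched in a few lines. This is a minimal illustration, assuming the config has already been loaded into a plain dict; `resolve_tts_speed` is a hypothetical helper name, not a function from the Hermes codebase.

```python
def resolve_tts_speed(tts_config: dict, provider: str) -> float:
    """Return the effective speed: provider-specific value, else global, else 1.0."""
    provider_cfg = tts_config.get(provider) or {}
    if "speed" in provider_cfg:
        # A provider-level setting overrides the global multiplier.
        return float(provider_cfg["speed"])
    return float(tts_config.get("speed", 1.0))


# Example config mirroring the YAML layout above (values are illustrative).
config = {"provider": "openai", "speed": 1.2, "openai": {"speed": 1.5}, "edge": {}}
print(resolve_tts_speed(config, "openai"))
print(resolve_tts_speed(config, "edge"))
```

With this config, the OpenAI provider uses its own 1.5 while Edge TTS inherits the global 1.2.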

Telegram Voice Bubbles & ffmpeg

Telegram voice bubbles require Opus/OGG audio:

OpenAI, ElevenLabs, and Mistral produce Opus natively - no extra setup needed
Edge TTS (default) outputs MP3 and needs ffmpeg to convert it
MiniMax TTS outputs MP3 and needs ffmpeg to convert it for Telegram voice bubbles
Google Gemini TTS outputs raw PCM and uses ffmpeg to encode Opus directly for Telegram voice bubbles
xAI TTS outputs MP3 and needs ffmpeg to convert it for Telegram voice bubbles
NeuTTS outputs WAV and needs ffmpeg to convert it for Telegram voice bubbles
KittenTTS outputs WAV and needs ffmpeg to convert it for Telegram voice bubbles

# Ubuntu/Debian
sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Fedora
sudo dnf install ffmpeg

Without ffmpeg, audio from Edge TTS, MiniMax TTS, NeuTTS, and KittenTTS is sent as a regular audio file (playable, but shown as a rectangular player instead of a voice bubble).
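For reference, the MP3/WAV to Opus conversion can be reproduced by hand. The sketch below is an assumption about how such a step might look, not the code Hermes actually runs: `opus_command` and `to_voice_bubble` are hypothetical names, and the 64k bitrate is an arbitrary choice. The ffmpeg options used (`-y`, `-i`, `-c:a libopus`, `-b:a`) are standard.

```python
import shutil
import subprocess
from pathlib import Path


def opus_command(src: Path, dst: Path, bitrate: str = "64k") -> list:
    """Build an ffmpeg command that re-encodes src as Opus in an OGG container."""
    return ["ffmpeg", "-y", "-i", str(src), "-c:a", "libopus", "-b:a", bitrate, str(dst)]


def to_voice_bubble(src: Path) -> Path:
    """Convert src to .ogg when ffmpeg is installed; otherwise return src unchanged."""
    if shutil.which("ffmpeg") is None:
        # Mirrors the documented fallback: the file is sent as regular audio.
        return src
    dst = src.with_suffix(".ogg")
    subprocess.run(opus_command(src, dst), check=True, capture_output=True)
    return dst


print(" ".join(opus_command(Path("reply.mp3"), Path("reply.ogg"))))
```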

:::tip If you want voice bubbles without installing ffmpeg, switch to the OpenAI, ElevenLabs, or Mistral provider. :::

Voice Message Transcription (STT)

Voice messages sent via Telegram, Discord, WhatsApp, Slack, or Signal are transcribed automatically and inserted into the conversation as text. The agent sees the transcript as a normal message.

Provider	Quality	Cost	API Key
Local Whisper (default)	Good	Free	None needed
Groq Whisper API	Good–Best	Free tier	`GROQ_API_KEY`
OpenAI Whisper API	Good–Best	Paid	`VOICE_TOOLS_OPENAI_KEY` or `OPENAI_API_KEY`

:::info Zero Config Local transcription works out of the box when faster-whisper is installed. If it is unavailable, Hermes can also use a local whisper CLI from common install locations (such as /opt/homebrew/bin) or a custom command via HERMES_LOCAL_STT_COMMAND. :::

Configuration

# In ~/.hermes/config.yaml
stt:
  provider: "local"           # "local" | "groq" | "openai" | "mistral"
  local:
    model: "base"             # tiny, base, small, medium, large-v3
  openai:
    model: "whisper-1"        # whisper-1, gpt-4o-mini-transcribe, gpt-4o-transcribe
  mistral:
    model: "voxtral-mini-latest"  # voxtral-mini-latest, voxtral-mini-2602

Provider Details

Local (faster-whisper) - Runs Whisper locally via faster-whisper. Uses the CPU by default and the GPU when available. Model sizes:

Model	Size	Speed	Quality
`tiny`	~75 MB	Fastest	Basic
`base`	~150 MB	Fast	Good (default)
`small`	~500 MB	Medium	Better
`medium`	~1.5 GB	Slower	Great
`large-v3`	~3 GB	Slowest	Best

Groq API - Requires GROQ_API_KEY. A good cloud option when you want a free hosted STT provider.

OpenAI API - Tries VOICE_TOOLS_OPENAI_KEY first and falls back to OPENAI_API_KEY. Supports whisper-1, gpt-4o-mini-transcribe, and gpt-4o-transcribe.

Mistral API (Voxtral Transcribe) - Requires MISTRAL_API_KEY. Uses Mistral's Voxtral Transcribe model. Supports 13 languages, speaker diarization, and word-level timestamps. Install with pip install hermes-agent[mistral].

Custom local CLI fallback - Set HERMES_LOCAL_STT_COMMAND if you want Hermes to call a local transcription command directly. The command template supports the placeholders {input_path}, {output_dir}, {language}, and {model}.
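As an illustration of how the four placeholders might be filled in, here is a small sketch. The template string is only an example (its flags are those of the openai-whisper CLI), and `render_stt_command` is a hypothetical helper; how Hermes itself expands the template is not shown in this document.

```python
import shlex


def render_stt_command(template, input_path, output_dir, language="en", model="base"):
    """Fill the placeholders and split the result into an argv list."""
    filled = template.format(
        input_path=shlex.quote(input_path),   # quote so paths with spaces survive
        output_dir=shlex.quote(output_dir),
        language=shlex.quote(language),
        model=shlex.quote(model),
    )
    return shlex.split(filled)


# Example value for HERMES_LOCAL_STT_COMMAND (an assumption, adjust to your CLI).
template = "whisper {input_path} --model {model} --language {language} --output_dir {output_dir}"
argv = render_stt_command(template, "/tmp/my voice.ogg", "/tmp/out")
print(argv)
```

Quoting each value before substitution keeps a path such as `/tmp/my voice.ogg` as a single argument.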

Fallback Behavior

If your configured provider is unavailable, Hermes falls back automatically:

Local faster-whisper unavailable → tries a local whisper CLI or HERMES_LOCAL_STT_COMMAND before cloud providers
Groq key not set → falls back to local transcription, then OpenAI
OpenAI key not set → falls back to local transcription, then Groq
Mistral key/SDK not set → skipped during auto-detection; falls back to the next available provider
Nothing available → the voice message is passed through with an accurate note to the user
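The fallback rules above amount to "try the configured provider, then local, then the remaining cloud providers". A minimal sketch of that ordering follows; `pick_stt_provider` is a hypothetical name, and the position of Mistral's fallbacks is an assumption since the list above only says "next available provider".

```python
DEFAULT_ORDER = ("local", "groq", "openai")


def pick_stt_provider(configured, available):
    """Return the first usable provider, or None when nothing is available."""
    # Configured provider first, then the default priority without duplicates.
    candidates = [configured] + [p for p in DEFAULT_ORDER if p != configured]
    for name in candidates:
        if name in available:
            return name
    return None  # caller passes the voice message through with a note


print(pick_stt_provider("groq", {"local", "openai"}))
print(pick_stt_provider("openai", {"groq"}))
```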

📄 user-guide/features/vision.md

title: Vision & Image Paste
description: Paste images from your clipboard into the Hermes CLI for multimodal vision analysis.
sidebar_label: Vision & Image Paste
sidebar_position: 7

Vision & Image Paste

Hermes Agent supports multimodal vision - you can paste images from your clipboard directly into the CLI and ask the agent to analyze, describe, or work with them. Images are sent to the model as base64-encoded content blocks, so any vision-capable model can process them.

How It Works

Copy an image to your clipboard (a screenshot, an image from a browser, etc.)
Attach the image using one of the methods below
Type your question and press Enter
The image appears as a [📎 Image #1] badge above the input
On submit, the image is sent to the model as a vision content block

You can attach multiple images before sending - each gets its own badge. To clear all attached images, press Ctrl+C.

Images are saved to ~/.hermes/images/ as PNG files with timestamped filenames.

Paste Methods

How you attach an image depends on your terminal environment. Not every method works everywhere - here is the full breakdown:

`/paste` Command

The most reliable fallback for attaching an image explicitly.

/paste

Type /paste and press Enter. Hermes checks your clipboard for an image and attaches it. This is the safest option when your terminal overrides Cmd+V/Ctrl+V, or when you copied only an image and there is no bracketed-paste text payload to inspect.

Ctrl+V / Cmd+V

Hermes currently treats paste as a layered process:

Normal text paste first
Clipboard / OSC52 text fallback if the terminal did not deliver the text cleanly
Image attachment when the clipboard or pasted payload resolves to an image or image path

This means a pasted macOS screenshot temp path or a file://... image URI can be attached immediately instead of sitting in the composer as raw text.

:::warning If your clipboard holds only an image (no text), the terminal still cannot send binary image bytes directly. Use /paste as the explicit image-attach fallback. :::

`/terminal-setup` for VS Code / Cursor / Windsurf

If you run the TUI inside a VS Code-family integrated terminal on macOS, Hermes can install the recommended workbench.action.terminal.sendSequence bindings for better multiline and undo/redo support:

/terminal-setup

This is especially useful when Cmd+Enter, Cmd+Z, or Shift+Cmd+Z is intercepted by the IDE. Run this command on your local machine only - never inside an SSH session.

Platform Compatibility

Environment	`/paste`	Cmd/Ctrl+V	`/terminal-setup`	Notes
macOS Terminal / iTerm2	✅	✅	n/a	Best experience - native clipboard + screenshot-path recovery
Apple Terminal	✅	✅	n/a	If Cmd+←/→/⌫ are overridden, use the Ctrl+A / Ctrl+E / Ctrl+U fallbacks
Linux X11 desktop	✅	✅	n/a	Requires `xclip` (`apt install xclip`)
Linux Wayland desktop	✅	✅	n/a	Requires `wl-paste` (`apt install wl-clipboard`)
WSL2 (Windows Terminal)	✅	✅	n/a	Uses `powershell.exe` - no extra install needed
VS Code / Cursor / Windsurf (local)	✅	✅	✅	Recommended for better Cmd+Enter / undo / redo parity
VS Code / Cursor / Windsurf (SSH)	❌²	❌²	❌³	Run `/terminal-setup` on your local machine instead
SSH terminal (any)	❌²	❌²	n/a	Remote clipboard not accessible

² See SSH & Remote Sessions below
³ The command writes local IDE keybindings and should not be run from a remote host

Platform-Specific Setup

macOS

No setup required. Hermes uses osascript (built into macOS) to read the clipboard. For faster performance, you can optionally install pngpaste:

brew install pngpaste

Linux (X11)

Install xclip:

# Ubuntu/Debian
sudo apt install xclip

# Fedora
sudo dnf install xclip

# Arch
sudo pacman -S xclip

Linux (Wayland)

Modern Linux desktops (Ubuntu 22.04+, Fedora 34+) often use Wayland by default. Install wl-clipboard:

# Ubuntu/Debian
sudo apt install wl-clipboard

# Fedora
sudo dnf install wl-clipboard

# Arch
sudo pacman -S wl-clipboard

:::tip How to check whether you are using Wayland

echo $XDG_SESSION_TYPE
# "wayland" = Wayland, "x11" = X11, "tty" = no display server

:::

WSL2

No extra setup required. Hermes detects WSL2 automatically (via /proc/version) and uses powershell.exe to access the Windows clipboard through .NET's System.Windows.Forms.Clipboard. This is part of WSL2's Windows interop, where powershell.exe is available by default.

Clipboard data is transferred as base64-encoded PNG over stdout, so no file path conversion or temp files are needed.

:::info WSLg Note If you are running WSLg (WSL2 with GUI support), Hermes tries the PowerShell path first, then falls back to wl-paste. The WSLg clipboard bridge supports only BMP for images - Hermes converts BMP to PNG automatically using Pillow (if installed) or ImageMagick's convert command. :::

Verify WSL2 clipboard access

# 1. Check WSL detection
grep -i microsoft /proc/version

# 2. Check that PowerShell is reachable
which powershell.exe

# 3. Copy an image, then check
powershell.exe -NoProfile -Command "Add-Type -AssemblyName System.Windows.Forms; [System.Windows.Forms.Clipboard]::ContainsImage()"
# Should print "True"

SSH & Remote Sessions

Clipboard image paste does not fully work over SSH. When you SSH into a remote machine, the Hermes CLI runs on the remote host. The clipboard tools (xclip, wl-paste, powershell.exe, osascript) read the clipboard of the machine they run on - which is the remote server, not your local machine. Your local clipboard image is therefore unreachable from the remote side.

Text can still sometimes come through via terminal paste or OSC52, but image clipboard access and local screenshot temp paths remain tied to the machine running Hermes.

SSH Workarounds

Upload the image file - Save the image locally, upload it to the remote server via scp, VSCode's file explorer (drag-and-drop), or any other file transfer method, then reference it by path. (An /attach <filepath> command is planned for a future release.)
Use a URL - If the image is reachable online, paste the URL in your message. The agent can use vision_analyze to view any image URL directly.
X11 forwarding - Connect with ssh -X to forward X11. This lets xclip on the remote machine reach your local X11 clipboard. It requires an X server running locally (XQuartz on macOS, built in on Linux X11 desktops) and is slow for large images.
Use a messaging platform - Send images to Hermes via Telegram, Discord, Slack, or WhatsApp. These platforms handle image upload natively and are not affected by clipboard/terminal limitations.

Why Terminals Can't Paste Images

This is a common source of confusion, so here is the technical explanation:

Terminals are text-based interfaces. When you press Ctrl+V (or Cmd+V), the terminal emulator:

Reads the clipboard for text content
Wraps it in bracketed-paste escape sequences
Sends it to the application through the terminal's text stream

If the clipboard holds only an image (no text), the terminal has nothing to send. There is no standard terminal escape sequence for binary image data, so the terminal does nothing.

This is why Hermes uses a separate clipboard check - instead of receiving image data through a terminal paste event, it calls OS-level tools (osascript, powershell.exe, xclip, wl-paste) directly via subprocess to read the clipboard independently.
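To make the per-platform choice concrete, the selection of an OS-level clipboard tool could look roughly like this. It is a simplified sketch based only on the platform notes in this document: `clipboard_backend` is a hypothetical function, and the real detection logic (including the WSLg and pngpaste paths) is not reproduced here.

```python
def clipboard_backend(system, proc_version="", session_type=""):
    """Pick the clipboard tool to call for a given environment.

    system:       value of platform.system(), e.g. "Darwin" or "Linux"
    proc_version: contents of /proc/version (used to spot WSL2)
    session_type: value of $XDG_SESSION_TYPE ("wayland", "x11", "tty")
    """
    if system == "Darwin":
        return "osascript"
    if system == "Linux":
        if "microsoft" in proc_version.lower():
            return "powershell.exe"   # WSL2 reaches the Windows clipboard
        if session_type == "wayland":
            return "wl-paste"
        return "xclip"
    raise ValueError(f"unsupported platform: {system}")


print(clipboard_backend("Linux", session_type="wayland"))
```

Passing the environment in as arguments, rather than reading it inside the function, keeps the decision easy to test on any machine.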

Supported Models

Image paste works with any vision-capable model. Images are sent as base64-encoded data URLs in the OpenAI vision content format:

{
  "type": "image_url",
  "image_url": {
    "url": "data:image/png;base64,..."
  }
}

Most modern models support this format, including GPT-4 Vision, Claude (with vision), Gemini, and open-source multimodal models served through OpenRouter.
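Building such a content block from raw PNG bytes takes only the standard library. The helper name `image_content_block` below is hypothetical; the dict layout follows the JSON shown above.

```python
import base64


def image_content_block(png_bytes: bytes) -> dict:
    """Wrap PNG bytes in an OpenAI-style image_url content block."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{encoded}"},
    }


# The first four bytes of a PNG signature stand in for a real image here.
block = image_content_block(b"\x89PNG")
print(block["image_url"]["url"])
```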

📄 user-guide/features/voice-mode.md

sidebar_position: 10
title: "Voice Mode"
description: "Real-time voice conversations with Hermes Agent - CLI, Telegram, Discord (DMs, text channels, and voice channels)"

Voice Mode

Hermes Agent supports full voice interaction in both the CLI and messaging platforms. Talk to the agent using your microphone, hear spoken replies, and hold live voice conversations in Discord voice channels.

If you want a hands-on setup walkthrough with recommended configurations and real usage patterns, see Use Voice Mode with Hermes.

Prerequisites

Before using voice features, make sure you have:

Hermes Agent installed - pip install hermes-agent (see Installation)
An LLM provider configured - run hermes model or set your preferred provider credentials in ~/.hermes/.env
A working base setup - run hermes to verify the agent responds to text before enabling voice

:::tip The ~/.hermes/ directory and a default config.yaml are created automatically the first time you run hermes. You only need to create ~/.hermes/.env manually for API keys. :::

Overview

Feature	Platform	Description
Interactive Voice	CLI	Press Ctrl+B to record; the agent auto-detects silence and responds
Auto Voice Reply	Telegram, Discord	The agent sends spoken audio alongside its text reply
Voice Channel	Discord	The bot joins a VC, listens to users speaking, and speaks its replies back

Requirements

Python Packages

# CLI voice mode (microphone + audio playback)
pip install "hermes-agent[voice]"

# Discord + Telegram messaging (includes discord.py[voice] for VC support)
pip install "hermes-agent[messaging]"

# Premium TTS (ElevenLabs)
pip install "hermes-agent[tts-premium]"

# Local TTS (NeuTTS, optional)
python -m pip install -U neutts[all]

# Everything at once
pip install "hermes-agent[all]"

Extra	Packages	Required For
`voice`	`sounddevice`, `numpy`	CLI voice mode
`messaging`	`discord.py[voice]`, `python-telegram-bot`, `aiohttp`	Discord & Telegram bots
`tts-premium`	`elevenlabs`	ElevenLabs TTS provider

For the local TTS provider, install neutts separately with python -m pip install -U neutts[all]. The model is downloaded automatically on first use.

:::info discord.py[voice] installs PyNaCl (for voice encryption) and opus bindings automatically. This is required for Discord voice channel support. :::

System Dependencies

# macOS
brew install portaudio ffmpeg opus
brew install espeak-ng   # for NeuTTS

# Ubuntu/Debian
sudo apt install portaudio19-dev ffmpeg libopus0
sudo apt install espeak-ng   # for NeuTTS

Dependency	Purpose	Required For
PortAudio	Microphone input and audio playback	CLI voice mode
ffmpeg	Audio format conversion (MP3 → Opus, PCM → WAV)	All platforms
Opus	Discord voice codec	Discord voice channels
espeak-ng	Phonemizer backend	Local NeuTTS provider

API Keys

Add to ~/.hermes/.env:

# Speech-to-Text — local provider needs NO key at all
# pip install faster-whisper          # Free, runs locally, recommended
GROQ_API_KEY=your-key                 # Groq Whisper — fast, free tier (cloud)
VOICE_TOOLS_OPENAI_KEY=your-key       # OpenAI Whisper — paid (cloud)

# Text-to-Speech (optional — Edge TTS and NeuTTS work without any key)
ELEVENLABS_API_KEY=***           # ElevenLabs — premium quality
# VOICE_TOOLS_OPENAI_KEY above also enables OpenAI TTS

:::tip If faster-whisper is installed, voice mode works with zero API keys for STT. The model (~150 MB for base) downloads automatically on first use. :::

CLI Voice Mode

Quick Start

Start the CLI and enable voice mode:

hermes                # Start the interactive CLI

Then use these commands inside the CLI:

/voice          Toggle voice mode on/off
/voice on       Enable voice mode
/voice off      Disable voice mode
/voice tts      Toggle TTS output
/voice status   Show current state

How It Works

Start the CLI with hermes and enable voice mode with /voice on
Press Ctrl+B - a beep plays (880Hz) and recording starts
Speak - a live audio level bar shows your input: ● [ ▂▃▅▇▇▅▂] ❯
Stop speaking - after 3 seconds of silence, recording stops automatically
Two beeps play (660Hz) to confirm that the recording ended
The audio is transcribed via Whisper and sent to the agent
If TTS is enabled, the agent's reply is spoken aloud
Recording restarts automatically - speak again without pressing any key

This loop continues until you press Ctrl+B during recording (which exits continuous mode) or 3 consecutive recordings detect no speech.

:::tip The record key is configurable via voice.record_key in ~/.hermes/config.yaml (default: ctrl+b). :::

Silence Detection

A two-stage algorithm detects when you have finished speaking:

Speech confirmation - waits for audio above the RMS threshold (200) for at least 0.3 seconds, tolerating brief dips between syllables
End detection - once speech is confirmed, triggers after 3.0 seconds of continuous silence

If no speech is detected at all for 15 seconds, recording stops automatically.

Both silence_threshold and silence_duration are configurable in config.yaml. You can also disable the record start/stop beeps with voice.beep_enabled: false.
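The two stages can be sketched over a stream of per-frame RMS levels. This is an illustrative reconstruction, not the Hermes implementation: the 100 ms frame size and the 200 ms dip tolerance are assumptions, while the threshold (200), 0.3 s confirmation, 3.0 s silence, and 15 s timeout come from the description above. Integer milliseconds are used to avoid floating-point drift.

```python
def find_stop_ms(levels, frame_ms=100, threshold=200, min_speech_ms=300,
                 silence_ms=3000, no_speech_timeout_ms=15000, dip_tolerance_ms=200):
    """Return the elapsed time (ms) at which recording should stop, or None."""
    elapsed = speech = dip = silence = 0
    confirmed = False
    for level in levels:
        elapsed += frame_ms
        loud = level > threshold
        if not confirmed:
            # Stage 1: speech confirmation, tolerating short dips.
            if loud:
                speech += frame_ms
                dip = 0
                confirmed = speech >= min_speech_ms
            else:
                dip += frame_ms
                if dip > dip_tolerance_ms:
                    speech = 0  # the pause was too long to be a syllable gap
            if not confirmed and elapsed >= no_speech_timeout_ms:
                return elapsed  # nobody spoke at all
        else:
            # Stage 2: end detection on continuous silence.
            silence = 0 if loud else silence + frame_ms
            if silence >= silence_ms:
                return elapsed
    return None  # stream ended while the user was still speaking


# 0.5 s of room noise, 0.4 s of speech, then silence.
print(find_stop_ms([50] * 5 + [800] * 4 + [50] * 30))
```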

Streaming TTS

When TTS is enabled, the agent speaks its reply sentence by sentence as it generates text - you don't have to wait for the full response:

Buffers text deltas into complete sentences (minimum 20 characters)
Strips markdown formatting and <think> blocks
Generates and plays audio per sentence in real time
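The buffering step can be sketched as a generator that consumes text deltas and yields speakable sentences. This is a rough approximation under stated assumptions: the sentence-boundary and markdown regexes are deliberately crude, and `sentences_from_deltas` is a hypothetical name. Only the 20-character minimum and the stripping of markdown and `<think>` blocks come from the list above.

```python
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
MARKDOWN = re.compile(r"[*_`#]+")            # crude: drops common markup characters
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")  # whitespace following end punctuation


def sentences_from_deltas(deltas, min_chars=20):
    """Yield cleaned sentences as soon as enough text has been buffered."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        buffer = THINK_BLOCK.sub("", buffer)
        if "<think>" in buffer:
            continue  # a reasoning block is still open; wait for it to close
        while True:
            boundary = next(
                (m for m in SENTENCE_END.finditer(buffer) if m.start() >= min_chars),
                None,
            )
            if boundary is None:
                break
            sentence = MARKDOWN.sub("", buffer[:boundary.start()]).strip()
            buffer = buffer[boundary.end():]
            if sentence:
                yield sentence
    # Flush whatever is left, ignoring an unterminated <think> block.
    tail = MARKDOWN.sub("", buffer.split("<think>")[0]).strip()
    if tail:
        yield tail


deltas = ["<think>plan", " it</think>Hello **world**. ", "This is a longer sentence. ", "Bye."]
spoken = list(sentences_from_deltas(deltas))
print(spoken)
```

Each yielded string would then be handed to the TTS provider while the model keeps generating.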

Hallucination Filter

Whisper sometimes produces phantom text from silence or background noise ("Thank you for watching", "Subscribe", etc.). The agent filters these out using a set of 26 known hallucination phrases across multiple languages, plus a regex pattern that catches repetitive variations.
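A filter of this kind might be sketched as follows. The phrase set here is a tiny illustrative subset rather than the real 26-phrase list, and the repetition regex is an assumption about what "repetitive variations" means; `is_hallucination` is a hypothetical name.

```python
import re

# Illustrative subset only; the actual phrase list is not reproduced here.
KNOWN_PHRASES = {"thank you for watching", "thanks for watching", "subscribe"}

# A short chunk (up to 30 characters) repeated three or more times in a row.
REPEATED = re.compile(r"^(.{1,30}?)(?:\s*\1){2,}\s*$")


def normalize(text: str) -> str:
    """Lowercase and trim surrounding whitespace and punctuation."""
    return text.strip().lower().strip(".!?, ")


def is_hallucination(transcript: str) -> bool:
    text = normalize(transcript)
    return text in KNOWN_PHRASES or REPEATED.match(text) is not None


for sample in ("Thank you for watching.", "thank you thank you thank you",
               "What is the weather today?"):
    print(sample, "->", is_hallucination(sample))
```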

Gateway Voice Reply (Telegram & Discord)

If you haven't set up your messaging bots yet, see the platform-specific guides.

Start the gateway to connect to your messaging platforms:

hermes gateway        # Start the gateway (connects to configured platforms)
hermes gateway setup  # Interactive setup wizard for first-time configuration

Discord: Channels vs DMs

The bot supports two interaction modes on Discord:

Mode	How to Talk	Mention Required	Setup
Direct Message (DM)	Open the bot's profile → "Message"	No	Works immediately
Server Channel	Type in a text channel where the bot is present	Yes (`@botname`)	Bot must be invited to the server

DM (recommended for personal use): Just open a DM with the bot and type - no @mention needed. Voice replies and all commands work the same as in channels.

Server channels: The bot responds only when you @mention it (e.g. @hermesbyt4 hello). Make sure you select the bot user from the mention pop-up, not a role with the same name.

:::tip To disable the @mention requirement in server channels, add to ~/.hermes/.env:

DISCORD_REQUIRE_MENTION=false

Or set specific channels as free-response (no mention needed):

DISCORD_FREE_RESPONSE_CHANNELS=123456789,987654321

:::

Commands

These commands work in both Telegram and Discord (DMs and text channels):

/voice          Toggle voice mode on/off
/voice on       Voice replies only when you send a voice message
/voice tts      Voice replies for ALL messages
/voice off      Disable voice replies
/voice status   Show current setting

Modes

Mode	Command	Behavior
`off`	`/voice off`	Text only (default)
`voice_only`	`/voice on`	Speaks reply only when you send a voice message
`all`	`/voice tts`	Speaks reply to every message

Voice mode settings persist across gateway restarts.
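The three modes in the table reduce to a small decision function. The sketch below uses the mode names from the table; `should_speak` is a hypothetical helper, not part of the gateway's API.

```python
def should_speak(mode: str, incoming_is_voice: bool) -> bool:
    """Decide whether a reply should be spoken, given the current voice mode."""
    if mode == "all":          # set by /voice tts
        return True
    if mode == "voice_only":   # set by /voice on
        return incoming_is_voice
    return False               # "off" (default): text only


print(should_speak("voice_only", incoming_is_voice=True))
print(should_speak("voice_only", incoming_is_voice=False))
```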

Platform Delivery

Platform	Format	Notes
Telegram	Voice bubble (Opus/OGG)	Plays inline in chat. ffmpeg converts MP3 → Opus if needed
Discord	Native voice bubble (Opus/OGG)	Plays inline like a user voice message. Falls back to a file attachment if the voice bubble API fails

Discord Voice Channels

The most immersive voice feature: the bot joins a Discord voice channel, listens to users speaking, transcribes their speech, processes it through the agent, and speaks the reply back in the voice channel.

Setup

1. Discord Bot Permissions

If you already have a Discord bot set up for text (see the Discord Setup Guide), you need to add voice permissions.

Go to the Discord Developer Portal → your application → Installation → Default Install Settings → Guild Install:

Add these permissions to the existing text permissions:

Permission	Purpose	Required
Connect	Join voice channels	Yes
Speak	Play TTS audio in voice channels	Yes
Use Voice Activity	Detect when users are speaking	Recommended

Updated Permissions Integer:

Level	Integer	What's Included
Text only	`274878286912`	View Channels, Send Messages, Read History, Embeds, Attachments, Threads, Reactions
Text + Voice	`274881432640`	All above + Connect, Speak

Re-invite the bot with the updated permissions URL:

https://discord.com/oauth2/authorize?client_id=YOUR_APP_ID&scope=bot+applications.commands&permissions=274881432640

Replace YOUR_APP_ID with your Application ID from the Developer Portal.
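The "Text + Voice" integer is the text-only value with two extra permission bits set: Connect (bit 20) and Speak (bit 21) in Discord's permission bitfield. The sketch below checks that arithmetic and builds the invite URL; `invite_url` is a hypothetical helper and the application ID is a placeholder.

```python
from urllib.parse import urlencode

TEXT_ONLY = 274878286912   # value from the table above
CONNECT = 1 << 20          # Discord permission bit: join voice channels
SPEAK = 1 << 21            # Discord permission bit: speak in voice channels

TEXT_AND_VOICE = TEXT_ONLY | CONNECT | SPEAK


def invite_url(app_id: str, permissions: int) -> str:
    """Build the OAuth2 invite URL for a bot with the given permissions."""
    query = urlencode({
        "client_id": app_id,
        "scope": "bot applications.commands",  # encoded as bot+applications.commands
        "permissions": permissions,
    })
    return f"https://discord.com/oauth2/authorize?{query}"


print(TEXT_AND_VOICE)
print(invite_url("YOUR_APP_ID", TEXT_AND_VOICE))
```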

:::warning Re-inviting the bot to a server it is already in updates its permissions without removing the bot. You won't lose any data or configuration. :::

2. Privileged Gateway Intents

In the Developer Portal → your application → Bot → Privileged Gateway Intents, enable all three:

Intent	Purpose
Presence Intent	Detect user online/offline status
Server Members Intent	Map voice SSRC identifiers to Discord user IDs
Message Content Intent	Read text message content in channels

All three are required for full voice channel functionality. Server Members Intent is especially critical - without it, the bot cannot identify who is speaking in the voice channel.

3. Opus Codec

The Opus codec library must be installed on the machine running the gateway:

# macOS (Homebrew)
brew install opus

# Ubuntu/Debian
sudo apt install libopus0

The bot loads the codec automatically from:

macOS: /opt/homebrew/lib/libopus.dylib
Linux: libopus.so.0

4. Environment Variables

# ~/.hermes/.env

# Discord bot (already configured for text)
DISCORD_BOT_TOKEN=your-bot-token
DISCORD_ALLOWED_USERS=your-user-id

# STT — local provider needs no key (pip install faster-whisper)
# GROQ_API_KEY=your-key            # Alternative: cloud-based, fast, free tier

# TTS — optional. Edge TTS and NeuTTS need no key.
# ELEVENLABS_API_KEY=***      # Premium quality
# VOICE_TOOLS_OPENAI_KEY=***  # OpenAI TTS / Whisper

Start the Gateway

hermes gateway        # Start with existing configuration

The bot should come online in Discord within a few seconds.

Commands

Use these commands in a Discord text channel where the bot is present:

/voice join      Bot joins your current voice channel
/voice channel   Alias for /voice join
/voice leave     Bot disconnects from voice channel
/voice status    Show voice mode and connected channel

:::info You must be in a voice channel before running /voice join. The bot joins the same VC you are in. :::

How It Works

When the bot joins a voice channel, it:

Listens to each user's audio stream independently
Detects silence - 1.5s of silence after at least 0.5s of speech triggers processing
Transcribes the audio via Whisper STT (local, Groq, or OpenAI)
Processes through the full agent pipeline (session, tools, memory)
Speaks the reply back in the voice channel via TTS

Text Channel Integration

When the bot is in a voice channel:

Transcripts appear in the text channel: [Voice] @user: what you said
Agent replies are sent as text in the channel and spoken in the VC
The text channel is the one where /voice join was issued

Echo Prevention

The bot automatically pauses its audio listener while playing TTS replies, so it does not hear and re-process its own output.

Access Control

Only users listed in DISCORD_ALLOWED_USERS can interact via voice. Audio from other users is silently ignored.

# ~/.hermes/.env
DISCORD_ALLOWED_USERS=284102345871466496
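An allow-list check of this kind can be sketched as below. It assumes the variable may hold several comma-separated IDs (the document only shows a single ID, so that format is an assumption), and both function names are hypothetical.

```python
def parse_allowed_users(raw: str) -> set:
    """Split a comma-separated ID list, ignoring blanks and stray whitespace."""
    return {part.strip() for part in raw.split(",") if part.strip()}


def is_allowed(user_id, raw_allowed: str) -> bool:
    """True when the speaker's Discord user ID is on the allow-list."""
    return str(user_id) in parse_allowed_users(raw_allowed)


allowed = "284102345871466496"
print(is_allowed(284102345871466496, allowed))
print(is_allowed(111111111111111111, allowed))
```

In practice the raw value would be read from the environment, for example with `os.environ.get("DISCORD_ALLOWED_USERS", "")`.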

Configuration Reference

config.yaml

# Voice recording (CLI)
voice:
  record_key: "ctrl+b"            # Key to start/stop recording
  max_recording_seconds: 120       # Maximum recording length
  auto_tts: false                  # Auto-enable TTS when voice mode starts
  beep_enabled: true               # Play record start/stop beeps
  silence_threshold: 200           # RMS level (0-32767) below which counts as silence
  silence_duration: 3.0            # Seconds of silence before auto-stop

# Speech-to-Text
stt:
  provider: "local"                  # "local" (free) | "groq" | "openai"
  local:
    model: "base"                    # tiny, base, small, medium, large-v3
  # model: "whisper-1"              # Legacy: used when provider is not set

# Text-to-Speech
tts:
  provider: "edge"                 # "edge" (free) | "elevenlabs" | "openai" | "neutts" | "minimax"
  edge:
    voice: "en-US-AriaNeural"      # 322 voices, 74 languages
  elevenlabs:
    voice_id: "pNInz6obpgDQGcFmaJgB"    # Adam
    model_id: "eleven_multilingual_v2"
  openai:
    model: "gpt-4o-mini-tts"
    voice: "alloy"                 # alloy, echo, fable, onyx, nova, shimmer
    base_url: "https://api.openai.com/v1"  # optional: override for self-hosted or OpenAI-compatible endpoints
  neutts:
    ref_audio: ''
    ref_text: ''
    model: neuphonic/neutts-air-q4-gguf
    device: cpu

Environment Variables

# Speech-to-Text providers (local needs no key)
# pip install faster-whisper        # Free local STT — no API key needed
GROQ_API_KEY=...                    # Groq Whisper (fast, free tier)
VOICE_TOOLS_OPENAI_KEY=...         # OpenAI Whisper (paid)

# STT advanced overrides (optional)
STT_GROQ_MODEL=whisper-large-v3-turbo    # Override default Groq STT model
STT_OPENAI_MODEL=whisper-1               # Override default OpenAI STT model
GROQ_BASE_URL=https://api.groq.com/openai/v1     # Custom Groq endpoint
STT_OPENAI_BASE_URL=https://api.openai.com/v1    # Custom OpenAI STT endpoint

# Text-to-Speech providers (Edge TTS and NeuTTS need no key)
ELEVENLABS_API_KEY=***             # ElevenLabs (premium quality)
# VOICE_TOOLS_OPENAI_KEY above also enables OpenAI TTS

# Discord voice channel
DISCORD_BOT_TOKEN=...
DISCORD_ALLOWED_USERS=...

STT Provider Comparison

Provider	Model	Speed	Quality	Cost	API Key
Local	`base`	Fast (depends on CPU/GPU)	Good	Free	No
Local	`small`	Medium	Better	Free	No
Local	`large-v3`	Slow	Best	Free	No
Groq	`whisper-large-v3-turbo`	Very fast (~0.5s)	Good	Free tier	Yes
Groq	`whisper-large-v3`	Fast (~1s)	Better	Free tier	Yes
OpenAI	`whisper-1`	Fast (~1s)	Good	Paid	Yes
OpenAI	`gpt-4o-transcribe`	Medium (~2s)	Best	Paid	Yes

Provider priority (automatic fallback): local > groq > openai

TTS Provider Comparison

Provider	Quality	Cost	Latency	Key Required
Edge TTS	Good	Free	~1s	No
ElevenLabs	Excellent	Paid	~2s	Yes
OpenAI TTS	Good	Paid	~1.5s	Yes
NeuTTS	Good	Free	Depends on CPU/GPU	No

NeuTTS uses the tts.neutts config block above.

Troubleshooting

"No audio device found" (CLI)

PortAudio is not installed:

brew install portaudio    # macOS
sudo apt install portaudio19-dev  # Ubuntu

Bot doesn't respond in Discord server channels

By default, the bot requires an @mention in server channels. Make sure you:

Type @ and select the bot user (with the #discriminator), not a role with the same name
Or use DMs instead - no mention needed
Or set DISCORD_REQUIRE_MENTION=false in ~/.hermes/.env

Bot joins VC but doesn't hear me

Check that your Discord user ID is in DISCORD_ALLOWED_USERS
Make sure you are not muted in Discord
The bot needs a SPEAKING event from Discord before it can map your audio - start speaking within a few seconds of joining

Bot hears me but doesn't respond

Check that STT is available: install faster-whisper (no key needed) or set GROQ_API_KEY / VOICE_TOOLS_OPENAI_KEY
Check that the LLM model is configured and reachable
Check the gateway logs: tail -f ~/.hermes/logs/gateway.log

Bot responds in text but not in voice channel

The TTS provider may be failing - check the API key and quota
Edge TTS (free, no key) is the default fallback
Check the logs for TTS errors

Whisper returns garbage text

The hallucination filter catches most cases automatically. If you still get phantom transcripts:

Use a quieter environment
Adjust silence_threshold in config (higher = less sensitive)
Try a different STT model

📄 user-guide/features/web-dashboard.md

sidebar_position: 15
title: "Web Dashboard"
description: "Browser-based dashboard for managing configuration, API keys, sessions, logs, analytics, cron jobs, and skills"

Web Dashboard

The web dashboard is a browser-based UI for managing your Hermes Agent installation. Instead of editing YAML files or running CLI commands, you can configure settings, manage API keys, and monitor sessions from a clean web interface.

Quick Start

hermes dashboard

This starts a local web server and opens http://127.0.0.1:9119 in your browser. The dashboard runs entirely on your machine - no data leaves localhost.

Options

Flag	Default	Description
`--port`	`9119`	Port to run the web server on
`--host`	`127.0.0.1`	Bind address
`--no-open`	-	Don't open the browser automatically

# Custom port
hermes dashboard --port 8080

# Bind to all interfaces (use with caution on shared networks)
hermes dashboard --host 0.0.0.0

# Start without opening browser
hermes dashboard --no-open

Prerequisites

The web dashboard requires FastAPI and Uvicorn. Install them with:

pip install hermes-agent[web]

If you installed with pip install hermes-agent[all], the web dependencies are already included.

When you run hermes dashboard without the dependencies, it tells you what to install. If the frontend has not been built yet and npm is available, it is built automatically on first launch.

Pages

Status

The landing page shows a live overview of your installation:

Agent version and release date
Gateway status - running/stopped, PID, connected platforms and their state
Active sessions - number of sessions active in the last 5 minutes
Recent sessions - list of the 20 most recent sessions with model, message count, token usage, and a conversation preview

The status page auto-refreshes every 5 seconds.

Config

A form-based editor for config.yaml. All 150+ configuration fields are discovered automatically from DEFAULT_CONFIG and organized into tabbed categories:

model - default model, provider, base URL, reasoning settings
terminal - backend (local/docker/ssh/modal), timeout, shell preferences
display - skin, tool progress, resume display, spinner settings
agent - max iterations, gateway timeout, service tier
delegation - subagent limits, reasoning effort
memory - provider selection, context injection settings
approvals - dangerous command approval mode (ask/yolo/deny)
and more - every section of config.yaml has corresponding form fields

Fields with known valid values (terminal backend, skin, approval mode, etc.) render as dropdowns. Booleans render as toggles. Everything else is a text input.

Actions:

Save - writes changes to config.yaml immediately
Reset to defaults - reverts all fields to their default values (nothing is saved until you click Save)
Export - downloads the current config as JSON
Import - uploads a JSON config file to replace the current values

:::tip Config changes take effect on the next agent session or gateway restart. The web dashboard edits the same config.yaml file that hermes config set and the gateway read from. :::

API Keys

Manage the .env file where API keys and credentials are stored. Keys are grouped by category:

LLM Providers - OpenRouter, Anthropic, OpenAI, DeepSeek, and more
Tool API Keys - Browserbase, Firecrawl, Tavily, ElevenLabs, and more
Messaging Platforms - Telegram, Discord, Slack bot tokens, and more
Agent Settings - non-secret env vars such as API_SERVER_ENABLED

Each key shows:

Whether it is currently set (with a redacted preview of the value)
A description of what it is for
A link to the provider's signup/key page
An input field to set or update the value
A delete button to remove it

Advanced/rarely used keys are hidden by default behind a toggle.

Sessions

Browse and inspect all agent sessions. Each row shows the session title, source platform icon (CLI, Telegram, Discord, Slack, cron), model name, message count, tool call count, and how long ago it was last active. Active sessions are marked with a pulsing badge.

Search - full-text search across all message content using FTS5. Results show highlighted snippets and auto-scroll to the first matching message when expanded.
Expand - click a session to load its full message history. Messages are color-coded by role (user, assistant, system, tool) and rendered as Markdown with syntax highlighting.
Tool calls - assistant messages that contain tool calls are shown as collapsible blocks with the function name and JSON arguments.
Delete - remove a session and its message history with the trash icon.
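To show how FTS5 search with highlighted snippets can work in general, here is a minimal sketch using Python's built-in sqlite3 module. The table layout and the `<mark>` markers are assumptions for illustration; the actual session store schema is not shown in this document, and the sketch requires a SQLite build with FTS5 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table layout; the real session store may look different.
conn.execute("CREATE VIRTUAL TABLE messages USING fts5(session_id UNINDEXED, content)")
conn.executemany(
    "INSERT INTO messages (session_id, content) VALUES (?, ?)",
    [
        ("s1", "please restart the gateway after editing config"),
        ("s2", "generate a voice reply for telegram"),
    ],
)


def search(query: str) -> list:
    """Return (session_id, highlighted snippet) rows for a full-text query."""
    sql = (
        "SELECT session_id, snippet(messages, 1, '<mark>', '</mark>', '...', 8) "
        "FROM messages WHERE messages MATCH ? ORDER BY rank"
    )
    return conn.execute(sql, (query,)).fetchall()


hits = search("gateway")
```

The second argument to `snippet()` is the column index (here `content`), and the last argument caps the snippet length in tokens.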

Logs

View agent, gateway, and error log files with filtering and live tailing.

File - switch between the agent, errors, and gateway log files
Level - filter by log level: ALL, DEBUG, INFO, WARNING, or ERROR
Component - filter by source component: all, gateway, agent, tools, cli, or cron
Lines - choose how many lines to display (50, 100, 200, or 500)
Auto-refresh - toggle live tailing, which polls for new log lines every 5 seconds
Color-coded - log lines are colored by severity (red for errors, yellow for warnings, dim for debug)

Analytics

Usage and cost analytics computed from session history. Select a time period (7, 30, or 90 days) to see:

Summary cards - total tokens (input/output), cache hit percentage, total estimated or actual cost, and total session count with daily averages
Daily token chart - stacked bar chart showing input and output token usage per day, with hover tooltips showing details and cost
Daily breakdown table - date, session count, input tokens, output tokens, cache hit rate, and cost for each day
Per-model breakdown - table showing each model used, its session count, token usage, and estimated cost

Cron

Create and manage scheduled cron jobs that run agent prompts on a recurring schedule.

Create - fill in a name (optional), prompt, cron expression (e.g. 0 9 * * *), and delivery target (local, Telegram, Discord, Slack, or email)
Job list - each job shows its name, prompt preview, schedule expression, state badge (enabled/paused/error), delivery target, last run time, and next run time
Pause / Resume - toggle a job between active and paused states
Trigger now - run a job immediately, outside its normal schedule
Delete - permanently remove a cron job

Skills

Browse, search, and toggle skills and toolsets. Skills are loaded from ~/.hermes/skills/ and grouped by category.

Search - filter skills and toolsets by name, description, or category
Category filter - click category pills to narrow the list (e.g. MLOps, MCP, Red Teaming, AI)
Toggle - enable or disable individual skills with a switch. Changes take effect in the next session.
Toolsets - a separate section showing the built-in toolsets (e.g. file operations, web browsing) with their active/inactive status, setup requirements, and list of included tools

:::warning Security The web dashboard reads and writes your .env file, which contains API keys and secrets. It binds to 127.0.0.1 by default, so it is reachable only from your local machine. If you bind to 0.0.0.0, anyone on your network can view and modify your credentials. The dashboard has no authentication of its own. :::
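As a rough guard before exposing the dashboard, a wrapper script could check whether a bind address is loopback-only. This is a sketch using the standard ipaddress module; `is_local_only` is a hypothetical helper, not part of Hermes.

```python
import ipaddress


def is_local_only(host: str) -> bool:
    """Return True when a bind address is reachable only from this machine."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (some other hostname): treat it as exposed.
        return False


safe = is_local_only("127.0.0.1")
exposed = not is_local_only("0.0.0.0")
```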

/reload Slash Command

The dashboard PR also adds a /reload slash command to the interactive CLI. After changing API keys via the web dashboard (or by editing .env directly), use /reload in an active CLI session to pick up the changes without restarting:

You → /reload
  Reloaded .env (3 var(s) updated)

This re-reads ~/.hermes/.env into the running process's environment. It is useful when you have added a new provider key via the dashboard and want to use it immediately.
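Conceptually, a reload of this kind parses KEY=VALUE lines and copies them into the process environment. The sketch below illustrates that idea under simplifying assumptions (no multi-line values, no `export` prefixes). It is not the actual /reload implementation, and the way it counts updated variables is a guess at what the CLI message reports. The variable names are illustrative.

```python
import os


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def reload_env(text: str, environ=os.environ) -> int:
    """Copy parsed values into environ and return how many entries changed."""
    updated = 0
    for key, value in parse_env(text).items():
        if environ.get(key) != value:
            environ[key] = value
            updated += 1
    return updated


# Use a plain dict here so the example does not touch the real environment.
env = {"OPENAI_API_KEY": "old"}
count = reload_env("# keys\nOPENAI_API_KEY=new\nTAVILY_API_KEY='abc'\n", env)
```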

REST API

The web dashboard exposes a REST API that the frontend consumes. You can also call these endpoints directly for automation:

GET /api/status

Returns the agent version, gateway status, platform states, and active session count.

GET /api/sessions

Returns the 20 most recent sessions with metadata (model, token counts, timestamps, preview).

GET /api/config

Returns the current config.yaml contents as JSON.

GET /api/config/defaults

Returns the default configuration values.

GET /api/config/schema

Returns a schema describing every config field: type, description, category, and select options where applicable. The frontend uses this to render the correct input widget for each field.

PUT /api/config

Saves a new configuration. Body: {"config": {...}}.
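For automation, a request like this can be built with the standard library alone. The sketch below only constructs the request; `api_request` is a hypothetical helper and the `skin` value is a placeholder. The document does not say whether the body must contain the complete configuration, so a cautious script would fetch GET /api/config first and send back the full, modified object.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://127.0.0.1:9119"  # default dashboard address per this document


def api_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request for the dashboard API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


# Placeholder config fragment, for illustration only.
req = api_request("PUT", "/api/config", {"config": {"display": {"skin": "default"}}})
# To send it against a running dashboard: urllib.request.urlopen(req)
```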

GET /api/env

Returns all known environment variables with their set/unset status, masked values, descriptions, and categories.

PUT /api/env

Sets an environment variable. Body: {"key": "VAR_NAME", "value": "secret"}.

DELETE /api/env

Deletes an environment variable. Body: {"key": "VAR_NAME"}.

GET /api/sessions/{session_id}

Returns metadata for a single session.

GET /api/sessions/{session_id}/messages

Returns the full message history for a session, including tool calls and timestamps.

GET /api/sessions/search

Full-text search across message content. Query parameter: q. Returns matching session IDs with highlighted snippets.

DELETE /api/sessions/{session_id}

Deletes a session and its message history.

GET /api/logs

Returns log lines. Query parameters: file (agent/errors/gateway), lines (count), level, component.
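A query URL for this endpoint can be assembled as follows. This is a sketch: `logs_url` is a hypothetical helper, the default values are assumptions, and the accepted spellings for `level` and `component` are taken from the Logs page description rather than from an API reference.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:9119"  # default dashboard address per this document
LOG_FILES = {"agent", "errors", "gateway"}


def logs_url(file: str = "agent", lines: int = 100,
             level: str = "ALL", component: str = "all") -> str:
    """Build a GET /api/logs URL after basic client-side validation."""
    if file not in LOG_FILES:
        raise ValueError(f"unknown log file: {file!r}")
    if lines <= 0:
        raise ValueError("lines must be positive")
    query = urlencode({"file": file, "lines": lines, "level": level, "component": component})
    return f"{BASE_URL}/api/logs?{query}"


url = logs_url(file="gateway", lines=200, level="ERROR", component="cron")
```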

GET /api/analytics/usage

Returns token usage, cost, and session analytics. Query parameter: days (default 30). The response includes daily breakdowns and per-model aggregates.

GET /api/cron/jobs

Returns all configured cron jobs with their state, schedule, and run history.

POST /api/cron/jobs

Creates a new cron job. Body: {"prompt": "...", "schedule": "0 9 * * *", "name": "...", "deliver": "local"}.
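A script that creates jobs might assemble the body like this. The five-field check is an assumption based on the example expression; the server may accept other schedule formats. `cron_job_body` and the sample prompt and name are hypothetical.

```python
import json
from typing import Optional


def cron_job_body(prompt: str, schedule: str,
                  name: Optional[str] = None, deliver: str = "local") -> str:
    """Serialize a POST /api/cron/jobs body after a basic five-field check."""
    if len(schedule.split()) != 5:
        raise ValueError(f"expected 5 cron fields, got: {schedule!r}")
    body = {"prompt": prompt, "schedule": schedule, "deliver": deliver}
    if name:  # the dashboard form treats the name as optional
        body["name"] = name
    return json.dumps(body)


payload = cron_job_body("Summarize yesterday's logs", "0 9 * * *", name="daily-summary")
```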

POST /api/cron/jobs/{job_id}/pause

Pauses a cron job.

POST /api/cron/jobs/{job_id}/resume

Resumes a paused cron job.

POST /api/cron/jobs/{job_id}/trigger

Triggers a cron job immediately, outside its schedule.

DELETE /api/cron/jobs/{job_id}

Deletes a cron job.

GET /api/skills

Returns all skills with their name, description, category, and enabled status.

PUT /api/skills/toggle

Enables or disables a skill. Body: {"name": "skill-name", "enabled": true}.

GET /api/tools/toolsets

Returns all toolsets with their label, description, tools list, and active/configured status.

CORS

The web server restricts CORS to localhost origins only:

http://localhost:9119 / http://127.0.0.1:9119 (production)
http://localhost:3000 / http://127.0.0.1:3000
http://localhost:5173 / http://127.0.0.1:5173 (Vite dev server)

If you run the server on a custom port, that origin is added automatically.
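The origin list above, including the custom-port rule, could be computed along these lines. This is a sketch of the described behavior, not the server's actual code; it assumes the custom port is added alongside the defaults rather than replacing any of them.

```python
DEFAULT_PORT = 9119
DEV_PORTS = (3000, 5173)


def allowed_origins(port: int = DEFAULT_PORT) -> list:
    """List localhost origins permitted for CORS, given the serving port."""
    ports = [DEFAULT_PORT, *DEV_PORTS]
    if port not in ports:
        ports.append(port)  # assumption: a custom port extends the list
    return [f"http://{host}:{p}" for p in ports for host in ("localhost", "127.0.0.1")]


origins = allowed_origins(8080)
```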

Development

If you are contributing to the web dashboard frontend:

# Terminal 1: start the backend API
hermes dashboard --no-open

# Terminal 2: start the Vite dev server with HMR
cd web/
npm install
npm run dev

The Vite dev server at http://localhost:5173 proxies /api requests to the FastAPI backend at http://127.0.0.1:9119.

The frontend is built with React 19, TypeScript, Tailwind CSS v4, and shadcn/ui-style components. Production builds output to hermes_cli/web_dist/, which the FastAPI server serves as a static SPA.

Automatic Build on Update

When you run hermes update, the web frontend is rebuilt automatically if npm is available. This keeps the dashboard in sync with code updates. If npm is not installed, the update skips the frontend build and hermes dashboard builds it on first launch.

Themes

The dashboard supports visual themes that change colors, overlay effects, and overall feel. You can switch themes live from the header bar: click the palette icon next to the language switcher.

Built-in Themes

Theme	Description
Hermes Teal	Classic dark teal (default)
Midnight	Deep blue-violet with cool accents
Ember	Warm crimson and bronze
Mono	Clean grayscale, minimal
Cyberpunk	Neon green on black
Rosé	Soft pink and warm ivory

The theme selection is saved in config.yaml under dashboard.theme and restored on page load.

Custom Themes

Create a YAML file in ~/.hermes/dashboard-themes/:

# ~/.hermes/dashboard-themes/ocean.yaml
name: ocean
label: Ocean
description: Deep sea blues with coral accents

colors:
  background: "#0a1628"
  foreground: "#e0f0ff"
  card: "#0f1f35"
  card-foreground: "#e0f0ff"
  primary: "#ff6b6b"
  primary-foreground: "#0a1628"
  secondary: "#152540"
  secondary-foreground: "#e0f0ff"
  muted: "#1a2d4a"
  muted-foreground: "#7899bb"
  accent: "#1f3555"
  accent-foreground: "#e0f0ff"
  destructive: "#fb2c36"
  destructive-foreground: "#fff"
  success: "#4ade80"
  warning: "#fbbf24"
  border: "color-mix(in srgb, #ff6b6b 15%, transparent)"
  input: "color-mix(in srgb, #ff6b6b 15%, transparent)"
  ring: "#ff6b6b"
  popover: "#0f1f35"
  popover-foreground: "#e0f0ff"

overlay:
  noiseOpacity: 0.08
  noiseBlendMode: color-dodge
  warmGlowOpacity: 0.15
  warmGlowColor: "rgba(255,107,107,0.2)"

The 21 color tokens map directly to the CSS custom properties used throughout the dashboard. All color fields are required for custom themes. The overlay section is optional; it controls the grain texture and ambient glow effects.

Refresh the dashboard after creating the file. Custom themes appear in the theme picker alongside the built-ins.
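Because every color token is required, it can help to check a theme before dropping it into the themes directory. The sketch below works on an already-parsed theme dictionary (loading the YAML is left out, since that needs a third-party parser). The token list is copied from the ocean example above; `missing_tokens` is a hypothetical helper, not part of the dashboard.

```python
REQUIRED_COLOR_TOKENS = (
    "background", "foreground", "card", "card-foreground",
    "primary", "primary-foreground", "secondary", "secondary-foreground",
    "muted", "muted-foreground", "accent", "accent-foreground",
    "destructive", "destructive-foreground", "success", "warning",
    "border", "input", "ring", "popover", "popover-foreground",
)


def missing_tokens(theme: dict) -> list:
    """Return the required color tokens that a parsed theme does not define."""
    colors = theme.get("colors", {})
    return [token for token in REQUIRED_COLOR_TOKENS if token not in colors]


# An incomplete theme, to show what the check reports.
theme = {"name": "ocean", "colors": {"background": "#0a1628", "foreground": "#e0f0ff"}}
missing = missing_tokens(theme)
```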

Theme API

Endpoint	Method	Description
`/api/dashboard/themes`	GET	Lists available themes and the name of the active one
`/api/dashboard/theme`	PUT	Sets the active theme. Body: `{"name": "midnight"}`

extent analysis

TL;DR

To resolve the issue, ensure that the hermes command is properly installed and configured, and then try running hermes dashboard to access the web dashboard.

Guidance

Verify Installation: Make sure hermes-agent is installed, for example by running pip install hermes-agent.
Check Configuration: Ensure that the configuration files, such as config.yaml and .env, are correctly set up and contain the necessary API keys and settings.
Run Hermes Dashboard: Execute hermes dashboard to start the web server and access the dashboard.
Troubleshoot: If issues persist, review the logs for errors and check the documentation for specific troubleshooting guides related to your problem.

Example

No specific code snippet is provided as the issue seems to be related to the setup and configuration of the Hermes Agent rather than a coding problem.

Notes

Ensure all dependencies, including hermes-agent[web], are installed.
The web dashboard is accessible at http://127.0.0.1:9119 by default.
For development, you can start the Vite dev server with npm run dev in the web/ directory.

Recommendation

Apply the workaround by reinstalling hermes-agent and ensuring that all dependencies are correctly installed, then attempt to run hermes dashboard again. If the issue persists, refer to the official documentation or seek support from the Hermes Agent community.

