hermes - ✅(Solved) Fix [Bug]: Possible WSL2 TTS audio routing issue — playback doesn't reach Windows speakers? [1 pull requests, 1 comments, 2 participants]

GitHub stats
NousResearch/hermes-agent#17573 (fetched 2026-04-30 06:46:41)
Comments: 1
Participants: 2
Timeline events: 7
Reactions: 0
Timeline (top): labeled ×4, commented ×1, cross-referenced ×1, referenced ×1

PR fix notes

PR #17608: fix(voice): add WSL2 PowerShell audio fallback for TTS playback

Description (problem / solution / changelog)

Problem

WSL2 does not expose Linux audio devices by default, causing TTS playback to fail silently — files are generated but no sound plays through Windows speakers.

Fix

When running in WSL2 and powershell.exe is available, add a PowerShell SoundPlayer fallback:

  1. Detect Windows %TEMP% dir dynamically via cmd.exe + wslpath (no hardcoded username)
  2. Convert MP3 → WAV via ffmpeg to a Windows-accessible temp path
  3. Play via PowerShell SoundPlayer.PlaySync()
  4. Clean up temp WAV after playback

Falls back to ffplay/aplay if PowerShell is unavailable. No new dependencies required.
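
The four steps in the PR description can be sketched roughly as follows. This is a minimal illustration, not the code merged in PR #17608: the function names (`is_wsl`, `windows_temp_dir`, `build_ffmpeg_cmd`, `build_powershell_cmd`, `play_via_powershell`) are hypothetical, and detecting WSL by looking for "microsoft" in the kernel version string is an assumption about how the check might be done.

```python
import os
import shutil
import subprocess
import uuid


def is_wsl(version_text: str) -> bool:
    """Assumed heuristic: WSL kernels mention 'microsoft' in /proc/version."""
    return "microsoft" in version_text.lower()


def windows_temp_dir() -> str:
    """Step 1: ask Windows for %TEMP%, then translate it to a WSL path."""
    win_temp = subprocess.run(
        ["cmd.exe", "/c", "echo %TEMP%"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return subprocess.run(
        ["wslpath", "-u", win_temp],
        capture_output=True, text=True, check=True,
    ).stdout.strip()


def build_ffmpeg_cmd(src: str, dst_wav: str) -> list[str]:
    """Step 2: convert the source audio to WAV, overwriting quietly."""
    return ["ffmpeg", "-y", "-loglevel", "quiet", "-i", src, "-f", "wav", dst_wav]


def build_powershell_cmd(win_path: str) -> list[str]:
    """Step 3: play synchronously via SoundPlayer; single quotes are doubled."""
    escaped = win_path.replace("'", "''")
    script = f"(New-Object Media.SoundPlayer '{escaped}').PlaySync()"
    return ["powershell.exe", "-NoProfile", "-Command", script]


def play_via_powershell(src: str) -> bool:
    """Run steps 1-4; return False so the caller can fall back to ffplay/aplay."""
    if not shutil.which("powershell.exe"):
        return False
    wav = None
    try:
        # Unique file name (hypothetical scheme) to avoid clobbering concurrent runs.
        wav = os.path.join(windows_temp_dir(), f"hermes-tts-{uuid.uuid4().hex}.wav")
        subprocess.run(build_ffmpeg_cmd(src, wav), check=True)
        win_path = subprocess.run(
            ["wslpath", "-w", wav], capture_output=True, text=True, check=True
        ).stdout.strip()
        subprocess.run(build_powershell_cmd(win_path), check=True)
        return True
    except (OSError, subprocess.CalledProcessError):
        return False
    finally:
        # Step 4: remove the temporary WAV whether or not playback succeeded.
        if wav and os.path.exists(wav):
            os.remove(wav)
```

A caller would presumably gate this on something like `is_wsl(open("/proc/version").read())` and only try the regular players when it returns False. Keeping the command builders separate from the subprocess calls makes them checkable without a Windows host.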

Fixes #17573

Changed files

  • tools/voice_mode.py (modified, +41/-0)

Bug Description

I'm not 100% sure if this is a bug or just expected WSL2 behavior, but /voice tts doesn't seem to produce audible output in my setup. The audio files are generated correctly, but I'm not hearing anything through my Windows speakers/earbuds.

It might be that WSL2 doesn't expose audio devices the same way native Linux does, but I wanted to flag it in case others are experiencing the same thing.

Steps to Reproduce

  1. Running Hermes in WSL2 (Ubuntu) on Windows 11
  2. Enable voice: /voice on → /voice tts
  3. Send a message like "say something to test audio"
  4. The CLI shows the TTS tool ran successfully and files appear in ~/.hermes/audio_cache/, but no sound plays through Windows

Expected Behavior

If TTS is enabled, I would expect to hear the audio through whatever audio device Windows is using (speakers, Bluetooth earbuds, etc.). Though maybe this needs special WSL configuration?

Actual Behavior

Audio files are generated but nothing plays. When I try ffplay manually, I see:

  • ALSA lib pcm.c:2721:(snd_pcm_open_noupdate) Unknown PCM default
  • sounddevice finds 0 audio devices

It looks like WSL2 might not be routing audio to Windows properly, but I'm not certain if this is a Hermes issue or a WSL limitation.
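
One way to narrow down which side is at fault is to check which playback routes are visible from inside WSL. The sketch below uses only the standard library. The function name `audio_diagnostics` is hypothetical, and the `/mnt/wslg/PulseServer` socket path is taken from the Environment Setup section of this report, not verified across WSL versions.

```python
import os
import shutil


def audio_diagnostics(environ=None, pulse_socket="/mnt/wslg/PulseServer"):
    """Collect facts about which playback routes look usable from inside WSL."""
    env = os.environ if environ is None else environ
    return {
        # Players mentioned in this report, plus the Windows bridge.
        "ffplay": shutil.which("ffplay") is not None,
        "aplay": shutil.which("aplay") is not None,
        "powershell.exe": shutil.which("powershell.exe") is not None,
        # PulseAudio bridge: the environment variable and the socket itself.
        "PULSE_SERVER": env.get("PULSE_SERVER"),
        "pulse_socket_exists": os.path.exists(pulse_socket),
    }


if __name__ == "__main__":
    for key, value in audio_diagnostics().items():
        print(f"{key}: {value}")
```

If `powershell.exe` is found but the socket is missing, routing playback through Windows is likely the only option. The reverse case suggests the PulseAudio bridge may be worth trying first.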

Affected Component

CLI (interactive chat), Other

Messaging Platform (if gateway-related)

N/A (CLI only)

Debug Report

Debug report uploaded:
  Report     https://paste.rs/wtryX
  agent.log  https://paste.rs/v0Lae

Operating System

Ubuntu 24.04 via WSL2 on Windows 11

Python Version

3.11.15

Hermes Version

0.11.0

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

I'm guessing the issue is that play_audio_file() in tools/voice_mode.py tries ffplay and aplay, which probably expect native ALSA/PulseAudio devices. WSL2 might not expose those by default?

I'm not super familiar with Linux audio subsystems, but it seems like the playback step doesn't have a fallback for WSL environments where audio needs to route through Windows instead.

Proposed Fix (optional)

Me + AI created a workaround that seems to work for me:

  • Created a script that converts MP3→WAV and plays it via PowerShell's SoundPlayer
  • Saves to a Windows temp folder so PowerShell can access it
  • Audio now routes to my Bluetooth earbuds correctly

I'm not sure if this is the "right" way to handle it, but it got things working on my end, so I'm sharing it in case it helps.

Here's what I changed and what dependencies are needed:

Dependencies

The following packages need to be installed in WSL:

sudo apt install ffmpeg libportaudio2 portaudio19-dev pulseaudio-utils alsa-utils

# Python packages (install in Hermes venv)
pip install sounddevice numpy edge-tts

Note: ffmpeg is already installed by the Hermes installer, but the others need to be added manually for voice mode to work properly in WSL2.

File 1: /usr/local/bin/hermes-play (new script)

#!/bin/bash
# WSL TTS Audio Playback Workaround
# Converts MP3 → WAV and plays via PowerShell SoundPlayer on Windows

WIN_TMP="/mnt/c/Users/Eternal/AppData/Local/Temp/hermes-tts.wav"
ffmpeg -i "$1" -f wav "$WIN_TMP" -loglevel quiet -y 2>/dev/null
WIN_PATH=$(wslpath -w "$WIN_TMP")
powershell.exe -NoProfile -Command "(New-Object Media.SoundPlayer '$WIN_PATH').PlaySync()"
rm -f "$WIN_TMP"

Make it executable:

sudo chmod +x /usr/local/bin/hermes-play

File 2: tools/voice_mode.py (modified play_audio_file() function)

Added hermes-play as the first player option on Linux:

    if system == "Darwin":
        players.append(["afplay", file_path])
    
    # Check for custom Hermes play command first (WSL workaround)
    hermes_play = shutil.which("hermes-play")
    if hermes_play:
        players.insert(0, [hermes_play, file_path])
    
    players.append(["ffplay", "-nodisp", "-autoexit", "-loglevel", "quiet", file_path])
    if system == "Linux":
        players.append(["aplay", "-q", file_path])
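
For context, a player list built this way is typically consumed by trying each command in order until one succeeds. The loop below is an assumption about how that might look, not the actual body of play_audio_file(). `try_players` and its injectable `runner` parameter are hypothetical names.

```python
import subprocess
from typing import Callable, Optional, Sequence


def try_players(
    players: Sequence[Sequence[str]],
    runner: Optional[Callable[[Sequence[str]], int]] = None,
) -> Optional[str]:
    """Run each player command in order; return the first one that exits 0."""
    if runner is None:
        def runner(cmd):
            return subprocess.run(cmd, check=False).returncode
    for cmd in players:
        try:
            if runner(cmd) == 0:
                return cmd[0]
        except OSError:
            # Player binary missing or not executable: try the next one.
            continue
    return None
```

Because hermes-play is inserted at index 0, it would be attempted before afplay, ffplay, and aplay in a loop like this, which matches the "first player option" intent described above.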

File 3: tools/voice_mode.py (modified detect_audio_environment() function)

Updated the device check to not block when WSL has PulseAudio configured:

        try:
            devices = sd.query_devices()
            if not devices:
                if termux_capture:
                    notices.append("No PortAudio devices detected, but Termux:API microphone capture is available")
                elif os.environ.get('PULSE_SERVER'):
                    notices.append("No PortAudio devices detected, but WSL PulseAudio bridge is configured -- TTS playback works")
                else:
                    warnings.append("No audio input/output devices detected")
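
The branch order in this excerpt can be isolated into a small pure function, which makes the precedence easy to check. This is a sketch only: `classify_missing_devices` is a hypothetical helper, the message strings are copied from the excerpt, and the real detect_audio_environment() calls sd.query_devices() and handles exceptions that the excerpt does not show.

```python
def classify_missing_devices(devices, termux_capture, environ):
    """Return (notices, warnings) for the case where PortAudio lists no devices."""
    notices, warnings = [], []
    if devices:
        # Devices are present, so none of the fallback messages apply.
        return notices, warnings
    if termux_capture:
        notices.append(
            "No PortAudio devices detected, but Termux:API microphone capture is available"
        )
    elif environ.get("PULSE_SERVER"):
        notices.append(
            "No PortAudio devices detected, but WSL PulseAudio bridge is configured -- TTS playback works"
        )
    else:
        warnings.append("No audio input/output devices detected")
    return notices, warnings
```

Note that the PULSE_SERVER branch only checks that the variable is set. It does not confirm that the socket exists or that playback actually reaches Windows.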

Environment Setup

Add to ~/.bashrc for PulseAudio bridge detection:

export PULSE_SERVER=unix:/mnt/wslg/PulseServer

How it works:

  1. The script converts MP3 → WAV via ffmpeg and saves to a Windows-accessible temp folder
  2. Uses PowerShell's SoundPlayer to play through Windows audio stack
  3. play_audio_file() checks for hermes-play first, falling back to ffplay/aplay if not found
  4. The environment check now recognizes WSL+PulseAudio as a valid setup

Important notes:

  • This is a workaround, not a proper fix. A real solution would need to handle WSL audio routing more gracefully in the codebase
  • The script uses a hardcoded Windows path — a better implementation would detect the Windows temp folder dynamically
  • Audio playback works on my setup (CMF Buds Pro 2 via Bluetooth), but I haven't tested with other audio devices
  • Voice input (microphone) still doesn't work in WSL2 — this only fixes TTS playback

Happy to help test any official fixes or refine this approach!

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

extent analysis

TL;DR

If /voice tts produces no audible output under WSL2, a workaround script that converts the MP3 to WAV and plays it through PowerShell's SoundPlayer on Windows restores playback. PR #17608 adds an equivalent fallback to tools/voice_mode.py.

Guidance

  1. Install required dependencies: In WSL, run sudo apt install ffmpeg libportaudio2 portaudio19-dev pulseaudio-utils alsa-utils and pip install sounddevice numpy edge-tts in the Hermes virtual environment.
  2. Create a playback script: Add a script like /usr/local/bin/hermes-play that converts MP3 to WAV and plays it via PowerShell's SoundPlayer.
  3. Modify play_audio_file(): Update the play_audio_file() function in tools/voice_mode.py to check for the custom hermes-play command first.
  4. Configure PulseAudio bridge detection: Add export PULSE_SERVER=unix:/mnt/wslg/PulseServer to ~/.bashrc so the environment check recognizes the WSL PulseAudio bridge.

Notes

This workaround may not work for all audio devices or setups, and a proper fix would require handling WSL audio routing more gracefully in the codebase.

Recommendation

Apply the provided workaround
