hermes - ✅(Solved) Fix STT: faster-whisper CUBLAS_STATUS_NOT_SUPPORTED on RTX 5090 (Blackwell sm_120) bypasses CPU fallback [1 pull request, 1 participant]

GitHub stats
NousResearch/hermes-agent#17526 · Fetched 2026-04-30 06:47:01
Comments: 0 · Participants: 1 · Timeline: 4 · Reactions: 0
Timeline (top): labeled ×3, cross-referenced ×1

Error Message

ERROR tools.transcription_tools: Local transcription failed: cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED
  File ".../faster_whisper/transcribe.py", line 1400, in encode
    return self.model.encode(features, to_cpu=to_cpu)
RuntimeError: cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED

Root Cause

Root cause: tools/transcription_tools.py:325-334 defines _CUDA_LIB_ERROR_MARKERS to catch dlopen-style failures and missing-runtime errors (libcublas, libcudnn, libcudart, cannot be loaded, etc.) but doesn't catch architecture-mismatch errors. On Blackwell (sm_120) the bundled CTranslate2 wheel's cuBLAS calls return CUBLAS_STATUS_NOT_SUPPORTED because the GEMM kernels weren't compiled with sm_120 support. The library loads fine, but every actual GPU op fails.
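To make the gap concrete, here is a minimal sketch of substring-based marker matching. The helper name `looks_like_cuda_lib_error`, its `markers` parameter, and the sample dlopen message are illustrative assumptions, not the actual Hermes implementation. The tuple is the marker list quoted in the issue, without the proposed additions.

```python
# Marker list as quoted in the issue, minus the proposed new entries.
PRE_FIX_MARKERS = (
    "libcublas",
    "libcudnn",
    "libcudart",
    "cannot be loaded",
    "cannot open shared object",
    "no kernel image is available",
    "no CUDA-capable device",
    "CUDA driver version is insufficient",
)


def looks_like_cuda_lib_error(exc, markers):
    """Hypothetical stand-in: True if any marker is a substring of the message."""
    message = str(exc)
    return any(marker in message for marker in markers)


# Illustrative missing-library message (assumed wording).
dlopen_error = RuntimeError("Library libcublas.so.12 is not found or cannot be loaded")
# The message reported in this issue.
blackwell_error = RuntimeError("cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED")

print(looks_like_cuda_lib_error(dlopen_error, PRE_FIX_MARKERS))     # True
print(looks_like_cuda_lib_error(blackwell_error, PRE_FIX_MARKERS))  # False
```

Under this assumption the Blackwell message contains none of the existing markers, so a caller that re-raises unrecognized errors never reaches the CPU retry.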

Fix Action

Fix / Workaround

Confirmation: After adding CUBLAS_STATUS_NOT_SUPPORTED to _CUDA_LIB_ERROR_MARKERS locally, a voice message that previously crashed now transcribes correctly with the documented warning:

faster-whisper CUDA runtime failed mid-transcribe (cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED) — evicting cached model and retrying on CPU (int8).

PR fix notes

PR #17559: fix(stt): fall back on CUBLAS_STATUS_NOT_SUPPORTED

Description (problem / solution / changelog)

Summary

  • Treat CUBLAS_STATUS_NOT_SUPPORTED as a CUDA library/runtime error in local STT.
  • Add regression coverage for Blackwell/faster-whisper failures falling back to CPU.

Root cause

_looks_like_cuda_lib_error() only recognized missing CUDA libraries and related runtime messages. On RTX 5090 / Blackwell, faster-whisper can raise cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED, which bypassed the existing CPU fallback path.

Fix

Add the exact cuBLAS status marker to _CUDA_LIB_ERROR_MARKERS so the existing load/transcribe fallback logic retries on CPU int8 instead of surfacing the CUDA runtime error.

Regression coverage

Added test_cublas_status_not_supported_retries_on_cpu, which simulates the runtime cuBLAS error and verifies that _transcribe_local() reloads on CPU and returns the recovered transcript.
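The shape of such a regression test can be sketched with `unittest.mock`. This is not the test from the PR: `transcribe_local`, the injected `load_model` callable, and the abbreviated marker tuple are hypothetical stand-ins so the example runs without faster-whisper or a GPU.

```python
from unittest import mock

# Abbreviated, hypothetical marker tuple including the new entry.
CUDA_LIB_ERROR_MARKERS = ("libcublas", "CUBLAS_STATUS_NOT_SUPPORTED")


def transcribe_local(load_model, audio_path):
    """Hypothetical stand-in: try CUDA first, retry on CPU int8 for CUDA-style errors."""
    model = load_model(device="cuda")
    try:
        return model.transcribe(audio_path)
    except RuntimeError as exc:
        if not any(marker in str(exc) for marker in CUDA_LIB_ERROR_MARKERS):
            raise  # unrelated failure: do not mask it with a CPU retry
        cpu_model = load_model(device="cpu", compute_type="int8")
        return cpu_model.transcribe(audio_path)


def test_cublas_status_not_supported_retries_on_cpu():
    # The first model simulates the Blackwell failure at transcribe time.
    gpu_model = mock.Mock()
    gpu_model.transcribe.side_effect = RuntimeError(
        "cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED"
    )
    # The second model simulates the working CPU reload.
    cpu_model = mock.Mock()
    cpu_model.transcribe.return_value = "hello world"
    load_model = mock.Mock(side_effect=[gpu_model, cpu_model])

    assert transcribe_local(load_model, "voice.ogg") == "hello world"
    assert load_model.call_args_list[-1] == mock.call(device="cpu", compute_type="int8")


test_cublas_status_not_supported_retries_on_cpu()
```

Injecting the loader keeps the test independent of the installed CTranslate2 wheel, which is the component that fails on sm_120.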

Testing

  • scripts/run_tests.sh tests/tools/test_transcription_tools.py tests/tools/test_transcription.py tests/gateway/test_stt_config.py -q — 124 passed

Closes #17526

Changed files

  • tests/tools/test_transcription_tools.py (modified, +36/-0)
  • tools/transcription_tools.py (modified, +1/-0)

Original issue

Bug: Voice transcription crashes with CUBLAS_STATUS_NOT_SUPPORTED on RTX 5090 (Blackwell, sm_120). The _CUDA_LIB_ERROR_MARKERS fallback list in tools/transcription_tools.py doesn't include this Blackwell-specific cuBLAS error, so the model loads on CUDA but the first transcribe call fails outright instead of falling back to CPU as designed.

Environment:

  • Hermes Agent v0.11.0 (v2026.4.23)
  • Ubuntu 24.04, Python 3.11.15
  • RTX 5090 (32 GB, sm_120 / Blackwell)
  • faster-whisper bundled in voice extra (CTranslate2 backend)

Repro:

  1. Fresh Hermes install on a host with a Blackwell GPU
  2. Send any voice message via Telegram → Hermes caches the .ogg
  3. Transcription tries default device="auto" → loads on CUDA → at first transcribe call, CTranslate2 raises RuntimeError: cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED
  4. The mid-transcribe fallback in _transcribe_local calls _looks_like_cuda_lib_error() to decide whether to fall back to CPU, but CUBLAS_STATUS_NOT_SUPPORTED is not in _CUDA_LIB_ERROR_MARKERS, so the exception is re-raised and the voice message dies silently from the user's perspective.
  5. Manually running WhisperModel("base", device="cpu", compute_type="int8") works fine and transcribes correctly.

Stack trace excerpt:

ERROR tools.transcription_tools: Local transcription failed: cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED
  File ".../faster_whisper/transcribe.py", line 1400, in encode
    return self.model.encode(features, to_cpu=to_cpu)
RuntimeError: cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED

Root cause: tools/transcription_tools.py:325-334 defines _CUDA_LIB_ERROR_MARKERS to catch dlopen-style failures and missing-runtime errors (libcublas, libcudnn, libcudart, cannot be loaded, etc.) but doesn't catch architecture-mismatch errors. On Blackwell (sm_120) the bundled CTranslate2 wheel's cuBLAS calls return CUBLAS_STATUS_NOT_SUPPORTED because the GEMM kernels weren't compiled with sm_120 support. The library loads fine, but every actual GPU op fails.

Suggested fix: Add CUBLAS_STATUS_NOT_SUPPORTED (or the broader CUBLAS_STATUS_) to _CUDA_LIB_ERROR_MARKERS. The mid-transcribe fallback at _transcribe_local() will then evict the broken model and reload on CPU, identical to how it currently handles dlopen failures.

_CUDA_LIB_ERROR_MARKERS = (
    "libcublas",
    "libcudnn",
    "libcudart",
    "cannot be loaded",
    "cannot open shared object",
    "no kernel image is available",
    "no CUDA-capable device",
    "CUDA driver version is insufficient",
    "CUBLAS_STATUS_NOT_SUPPORTED",   # NEW: Blackwell sm_120
    "CUBLAS_STATUS_",                 # NEW: catch other arch-mismatch variants
)

Confirmation: Patched locally; voice message that previously crashed now transcribes correctly with the documented warning:

faster-whisper CUDA runtime failed mid-transcribe (cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED) — evicting cached model and retrying on CPU (int8).

Affected users: anyone with a Blackwell GPU (RTX 5090, B100, B200, etc.) and the bundled CTranslate2 wheel. Will become more common as 5090 adoption increases.

extent analysis

TL;DR

Add CUBLAS_STATUS_NOT_SUPPORTED to the _CUDA_LIB_ERROR_MARKERS list in tools/transcription_tools.py to enable fallback to CPU for Blackwell GPUs.

Guidance

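  • Update the _CUDA_LIB_ERROR_MARKERS tuple to include CUBLAS_STATUS_NOT_SUPPORTED, which is the single marker PR #17559 adds. The broader CUBLAS_STATUS_ prefix is optional: if markers are matched as substrings, it makes the exact entry redundant and also sends unrelated cuBLAS failures (for example CUBLAS_STATUS_ALLOC_FAILED) to the CPU fallback.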
  • Verify the fix by testing voice transcription on a Blackwell GPU (e.g., RTX 5090) with the updated _CUDA_LIB_ERROR_MARKERS.
  • Confirm that the transcription falls back to CPU and completes successfully with a warning message indicating the CUDA runtime failure.
  • Be aware that this fix applies to users with Blackwell GPUs and the bundled CTranslate2 wheel, which may become more common as RTX 5090 adoption increases.

Example

_CUDA_LIB_ERROR_MARKERS = (
    "libcublas",
    "libcudnn",
    "libcudart",
    "cannot be loaded",
    "cannot open shared object",
    "no kernel image is available",
    "no CUDA-capable device",
    "CUDA driver version is insufficient",
    "CUBLAS_STATUS_NOT_SUPPORTED",   # NEW: Blackwell sm_120
    "CUBLAS_STATUS_",                 # NEW: catch other arch-mismatch variants
)
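The trade-off between the exact marker and the broader prefix can be checked with a few sample messages. The sketch assumes substring matching. The two extra messages reuse real cuBLAS status names, but their wording is extrapolated from the one observed error rather than taken from real logs.

```python
EXACT = "CUBLAS_STATUS_NOT_SUPPORTED"
PREFIX = "CUBLAS_STATUS_"

# Only the first message is observed in this issue; the others are assumed wording.
messages = [
    "cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED",
    "cuBLAS failed with status CUBLAS_STATUS_ALLOC_FAILED",
    "cuBLAS failed with status CUBLAS_STATUS_EXECUTION_FAILED",
]


def matched_by(marker):
    """Return the status names of all sample messages containing the marker."""
    return [message.rsplit(" ", 1)[-1] for message in messages if marker in message]


print(matched_by(EXACT))
# ['CUBLAS_STATUS_NOT_SUPPORTED']
print(matched_by(PREFIX))
# ['CUBLAS_STATUS_NOT_SUPPORTED', 'CUBLAS_STATUS_ALLOC_FAILED', 'CUBLAS_STATUS_EXECUTION_FAILED']
```

Listing both entries is harmless but redundant under substring matching. The prefix alone widens the fallback to every cuBLAS status, which may hide failures such as out-of-memory conditions behind a slower CPU retry.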

Notes

This fix assumes that the issue is solely due to the missing error marker and that adding it will correctly trigger the fallback to CPU. Further testing may be necessary to ensure that other aspects of the transcription process are not affected.

Recommendation

Apply the workaround by updating the _CUDA_LIB_ERROR_MARKERS list, as this is a targeted fix for the specific error encountered on Blackwell GPUs.
