hermes - ✅(Solved) Fix STT: faster-whisper CUBLAS_STATUS_NOT_SUPPORTED on RTX 5090 (Blackwell sm_120) bypasses CPU fallback [1 pull requests, 1 participants]

hermes • 2026-04-29 15:36:08

GitHub stats

NousResearch/hermes-agent#17526 • Fetched 2026-04-30 06:47:01

Author

070freebird070-ctrl

Participants

070freebird070-ctrl

Timeline (top)

labeled ×3, cross-referenced ×1

Error Message

ERROR tools.transcription_tools: Local transcription failed: cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED
  File ".../faster_whisper/transcribe.py", line 1400, in encode
    return self.model.encode(features, to_cpu=to_cpu)
RuntimeError: cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED

Root Cause

Root cause: tools/transcription_tools.py:325-334 defines _CUDA_LIB_ERROR_MARKERS to catch dlopen-style failures and missing-runtime errors (libcublas, libcudnn, libcudart, cannot be loaded, etc.) but doesn't catch architecture-mismatch errors. On Blackwell (sm_120) the bundled CTranslate2 wheel's cuBLAS calls return CUBLAS_STATUS_NOT_SUPPORTED because the GEMM kernels weren't compiled with sm_120 support. The library loads fine, but every actual GPU op fails.
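To make the gap concrete, here is a minimal sketch of substring-based marker matching. The helper name `looks_like_cuda_lib_error`, its signature, and the example dlopen message are assumptions for illustration; the real helper in tools/transcription_tools.py may differ in detail. The marker strings are the pre-fix entries listed in the issue.

```python
# Hypothetical re-creation of the marker check, not the project's actual code.
OLD_MARKERS = (
    "libcublas",
    "libcudnn",
    "libcudart",
    "cannot be loaded",
    "cannot open shared object",
    "no kernel image is available",
    "no CUDA-capable device",
    "CUDA driver version is insufficient",
)


def looks_like_cuda_lib_error(exc: BaseException, markers=OLD_MARKERS) -> bool:
    """Return True if the exception text contains any known marker."""
    text = str(exc)
    return any(marker in text for marker in markers)


# Illustrative messages: a missing-library failure and the Blackwell failure.
dlopen_error = RuntimeError("Library libcublas.so.12 is not found or cannot be loaded")
blackwell_error = RuntimeError("cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED")

print(looks_like_cuda_lib_error(dlopen_error))     # True
print(looks_like_cuda_lib_error(blackwell_error))  # False
```

The second message mentions cuBLAS but contains none of the old substrings, so a check of this shape would not classify it as a CUDA library error.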

Fix Action

Fix / Workaround

Confirmation: Patched locally; voice message that previously crashed now transcribes correctly with the documented warning:

faster-whisper CUDA runtime failed mid-transcribe (cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED) — evicting cached model and retrying on CPU (int8).

PR fix notes

PR #17559: fix(stt): fall back on CUBLAS_STATUS_NOT_SUPPORTED

Repository: NousResearch/hermes-agent
Author: liuhao1024
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/17559

Description (problem / solution / changelog)

Summary

Treat CUBLAS_STATUS_NOT_SUPPORTED as a CUDA library/runtime error in local STT.
Add regression coverage for Blackwell/faster-whisper failures falling back to CPU.

Root cause

_looks_like_cuda_lib_error() only recognized missing CUDA libraries and related runtime messages. On RTX 5090 / Blackwell, faster-whisper can raise cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED, which bypassed the existing CPU fallback path.

Fix

Add the exact cuBLAS status marker to _CUDA_LIB_ERROR_MARKERS so the existing load/transcribe fallback logic retries on CPU int8 instead of surfacing the CUDA runtime error.

Regression coverage

Added test_cublas_status_not_supported_retries_on_cpu, which simulates the runtime cuBLAS error and verifies that _transcribe_local() reloads on CPU and returns the recovered transcript.
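A regression test of this kind might look roughly like the sketch below. It is not the test from the PR: `transcribe_local` is a hypothetical stand-in for `_transcribe_local()`, and the loader interface (`load_model(device)`) is assumed for illustration. The simulated error message is the one from the stack trace.

```python
from unittest.mock import MagicMock

MARKERS = ("CUBLAS_STATUS_NOT_SUPPORTED",)


def transcribe_local(load_model, path):
    """Hypothetical stand-in: retry on CPU when the error matches a marker."""
    model = load_model("auto")
    try:
        return model.transcribe(path)
    except RuntimeError as exc:
        if not any(marker in str(exc) for marker in MARKERS):
            raise
        return load_model("cpu").transcribe(path)


def test_cublas_not_supported_retries_on_cpu_sketch():
    # The first model simulates the Blackwell failure at transcribe time.
    gpu_model = MagicMock()
    gpu_model.transcribe.side_effect = RuntimeError(
        "cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED"
    )
    # The second model simulates the successful CPU reload.
    cpu_model = MagicMock()
    cpu_model.transcribe.return_value = "hello world"
    load_model = MagicMock(side_effect=[gpu_model, cpu_model])

    assert transcribe_local(load_model, "voice.ogg") == "hello world"
    assert [call.args[0] for call in load_model.call_args_list] == ["auto", "cpu"]


test_cublas_not_supported_retries_on_cpu_sketch()
```

The second assertion checks the order of loads, which is what distinguishes a real CPU retry from a test that merely swallows the exception.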

Testing

scripts/run_tests.sh tests/tools/test_transcription_tools.py tests/tools/test_transcription.py tests/gateway/test_stt_config.py -q — 124 passed

Closes #17526

Changed files

tests/tools/test_transcription_tools.py (modified, +36/-0)
tools/transcription_tools.py (modified, +1/-0)


RAW_BUFFER

Bug: Voice transcription crashes with CUBLAS_STATUS_NOT_SUPPORTED on RTX 5090 (Blackwell, sm_120). The _CUDA_LIB_ERROR_MARKERS fallback list in tools/transcription_tools.py doesn't include this cuBLAS error, so the first transcribe call fails outright instead of falling back to CPU as designed. The model itself loads on CUDA without error.

Environment:

Hermes Agent v0.11.0 (v2026.4.23)
Ubuntu 24.04, Python 3.11.15
RTX 5090 (32 GB, sm_120 / Blackwell)
faster-whisper bundled in voice extra (CTranslate2 backend)

Repro:

1. Fresh Hermes install on a host with a Blackwell GPU
2. Send any voice message via Telegram → Hermes caches the .ogg
3. Transcription tries default device="auto" → loads on CUDA → at first transcribe call, CTranslate2 raises RuntimeError: cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED
4. The mid-transcribe fallback in _transcribe_local calls _looks_like_cuda_lib_error() to decide whether to fall back to CPU, but CUBLAS_STATUS_NOT_SUPPORTED is not in _CUDA_LIB_ERROR_MARKERS, so the exception is re-raised and the voice message dies silently from the user's perspective.
5. Manually running WhisperModel("base", device="cpu", compute_type="int8") works fine and transcribes correctly.
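The manual CPU check mentioned in the repro could be run with something like the sketch below. The function name `transcribe_on_cpu` and the audio path are placeholders. It assumes faster-whisper is installed and that the "base" model can be downloaded or is already cached.

```python
def transcribe_on_cpu(audio_path: str, model_size: str = "base") -> str:
    """Transcribe one file with faster-whisper on CPU (int8), bypassing CUDA."""
    # Imported lazily so this module loads even where faster-whisper is absent.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # transcribe() returns a lazy iterator of segments plus an info object.
    segments, _info = model.transcribe(audio_path)
    return " ".join(segment.text.strip() for segment in segments)
```

Calling `transcribe_on_cpu("voice.ogg")` on the affected host should return text if the CPU path is healthy, which isolates the failure to the CUDA path.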

Stack trace excerpt:

ERROR tools.transcription_tools: Local transcription failed: cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED
  File ".../faster_whisper/transcribe.py", line 1400, in encode
    return self.model.encode(features, to_cpu=to_cpu)
RuntimeError: cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED

Suggested fix: Add CUBLAS_STATUS_NOT_SUPPORTED (or the broader CUBLAS_STATUS_) to _CUDA_LIB_ERROR_MARKERS. The mid-transcribe fallback at _transcribe_local() will then evict the broken model and reload on CPU, identical to how it currently handles dlopen failures.

_CUDA_LIB_ERROR_MARKERS = (
    "libcublas",
    "libcudnn",
    "libcudart",
    "cannot be loaded",
    "cannot open shared object",
    "no kernel image is available",
    "no CUDA-capable device",
    "CUDA driver version is insufficient",
    "CUBLAS_STATUS_NOT_SUPPORTED",   # NEW: Blackwell sm_120
    "CUBLAS_STATUS_",                 # NEW: catch other arch-mismatch variants
)
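The two new entries overlap, since any message containing CUBLAS_STATUS_NOT_SUPPORTED also contains CUBLAS_STATUS_. The sketch below compares what each marker would match. The status names are real cuBLAS status codes, but the assumption that other failures use the same "cuBLAS failed with status ..." wording as the stack trace is unverified.

```python
NARROW = "CUBLAS_STATUS_NOT_SUPPORTED"
BROAD = "CUBLAS_STATUS_"

# Message wording for the last two entries is assumed, not observed.
messages = [
    "cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED",
    "cuBLAS failed with status CUBLAS_STATUS_ALLOC_FAILED",
    "cuBLAS failed with status CUBLAS_STATUS_EXECUTION_FAILED",
]

# Map each message to (matched by narrow marker, matched by broad marker).
matches = {msg: (NARROW in msg, BROAD in msg) for msg in messages}

for msg, (narrow_hit, broad_hit) in matches.items():
    status = msg.split()[-1]
    print(f"{status:<32} narrow={narrow_hit} broad={broad_hit}")
```

The broad prefix would also send allocation and execution failures to the CPU retry, which may or may not be desirable. PR #17559 describes adding only the exact marker.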

Confirmation: Patched locally; voice message that previously crashed now transcribes correctly with the documented warning:

faster-whisper CUDA runtime failed mid-transcribe (cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED) — evicting cached model and retrying on CPU (int8).

Affected users: anyone with a Blackwell GPU (RTX 5090, B100, B200, etc.) and the bundled CTranslate2 wheel. Will become more common as 5090 adoption increases.

extent analysis

TL;DR

Add CUBLAS_STATUS_NOT_SUPPORTED to the _CUDA_LIB_ERROR_MARKERS list in tools/transcription_tools.py to enable fallback to CPU for Blackwell GPUs.

Guidance

Update the _CUDA_LIB_ERROR_MARKERS tuple to include CUBLAS_STATUS_NOT_SUPPORTED. Optionally add the broader CUBLAS_STATUS_ prefix to catch other cuBLAS status errors; the prefix on its own already matches the specific marker.
Verify the fix by testing voice transcription on a Blackwell GPU (e.g., RTX 5090) with the updated _CUDA_LIB_ERROR_MARKERS.
Confirm that the transcription falls back to CPU and completes successfully with a warning message indicating the CUDA runtime failure.
Be aware that this fix applies to users with Blackwell GPUs and the bundled CTranslate2 wheel, which may become more common as RTX 5090 adoption increases.

Example

_CUDA_LIB_ERROR_MARKERS = (
    "libcublas",
    "libcudnn",
    "libcudart",
    "cannot be loaded",
    "cannot open shared object",
    "no kernel image is available",
    "no CUDA-capable device",
    "CUDA driver version is insufficient",
    "CUBLAS_STATUS_NOT_SUPPORTED",   # NEW: Blackwell sm_120
    "CUBLAS_STATUS_",                 # NEW, optional: broader prefix; also matches the entry above
)

Notes

This fix assumes that the issue is solely due to the missing error marker and that adding it will correctly trigger the fallback to CPU. Further testing may be necessary to ensure that other aspects of the transcription process are not affected.

Recommendation

Apply the workaround by updating the _CUDA_LIB_ERROR_MARKERS list, as this is a targeted fix for the specific error encountered on Blackwell GPUs.
