hermes - [vision] torchvision import initializes CUDA context, causing gateway/agent to reserve ~7GB VRAM each

hermes, 2026-05-20 11:28:07


Root Cause

v0.14.0 introduced vision_analyze pixel-through to vision-capable models (#22955), which added torchvision as a dependency. When torchvision is imported, it loads libcudart and initializes a CUDA context, which reserves ~7GB per process on NVIDIA GPUs.
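One way to check this claim on a given machine is to ask PyTorch whether a CUDA context exists before and after the import. The sketch below is a diagnostic under stated assumptions, not part of Hermes: the helper name cuda_state_around_import is made up for illustration, and it relies only on torch.cuda.is_initialized() and importlib.import_module(). It returns None when torch is not installed.

```python
import importlib


def cuda_state_around_import(module_name: str):
    """Return (initialized_before, initialized_after) for importing module_name.

    Hypothetical diagnostic helper. Returns None if torch is unavailable.
    """
    try:
        import torch  # importing torch alone normally does not create a CUDA context
    except ImportError:
        return None

    before = torch.cuda.is_initialized()
    importlib.import_module(module_name)  # the import under suspicion
    after = torch.cuda.is_initialized()
    return before, after


if __name__ == "__main__":
    # Run this in a fresh interpreter inside the hermes venv.
    print(cuda_state_around_import("torchvision"))
```

A result of (False, True) would support the root cause described above. A result of (False, False) would suggest the context is created somewhere else in the startup path, which is worth ruling out before changing the import.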



Describe the bug

After upgrading to v0.14.0, the hermes gateway and hermes agent processes each reserve ~7GB of GPU VRAM at startup, even when idle and not performing any vision tasks. The cause is that torchvision is installed in the hermes venv and imported at startup; importing torchvision initializes a CUDA context that pre-allocates memory.

Impact

On systems with limited GPU memory (e.g., a single GPU shared between inference services and Hermes), the combined ~14GB reservation (~7GB each for the gateway and the agent) significantly reduces the VRAM available to other workloads (sglang, ComfyUI, etc.).
Users who rely on external multimodal LLMs (GPT-4o, Claude, etc.) do not benefit from the local vision_analyze pixel-through feature, so for them the VRAM cost is pure waste.

Reproduction

1. Upgrade to v0.14.0 or later.
2. Start the gateway: hermes gateway run
3. Run: nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader
4. Observe that the gateway process reserves ~7GB of VRAM.
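To compare readings before and after a change, the nvidia-smi query above can be wrapped in a small script. This is a minimal sketch: the helper names parse_compute_apps and query_gpu_usage are hypothetical, and it assumes the csv,noheader output has one "pid, N MiB" row per process, with "[N/A]" where the driver cannot report usage.

```python
import subprocess
from typing import Dict

# Same query as in the reproduction steps.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,used_memory",
    "--format=csv,noheader",
]


def parse_compute_apps(output: str) -> Dict[int, int]:
    """Map each PID to its reported VRAM in MiB, summed across GPUs."""
    usage: Dict[int, int] = {}
    for line in output.splitlines():
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 2:
            continue  # blank or unexpected row
        pid_text, mem_text = parts
        mem_fields = mem_text.split()  # e.g. ["7012", "MiB"] or ["[N/A]"]
        if not pid_text.isdigit() or not mem_fields or not mem_fields[0].isdigit():
            continue  # skip rows without a numeric reading
        pid = int(pid_text)
        usage[pid] = usage.get(pid, 0) + int(mem_fields[0])
    return usage


def query_gpu_usage() -> Dict[int, int]:
    """Run nvidia-smi and parse its output; requires an NVIDIA driver."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    for pid, mib in sorted(query_gpu_usage().items()):
        print(f"pid {pid}: {mib} MiB")
```

Matching the printed PIDs against the gateway and agent processes shows whether each one holds the ~7GB described in the report.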

Environment

Hermes Agent: v0.14.0
GPU: NVIDIA RTX PRO 6000 Blackwell 96GB
PyTorch: 2.11.0+cu130
torchvision: 0.26.0

Suggested fix

Lazy-import torchvision only when vision_analyze is actually called with a local vision model, not at gateway/agent startup.
Or set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True in the environment of the gateway and agent processes to reduce pre-allocation.
Or make torchvision an optional dependency that is only loaded when the active model supports local vision.
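The first and third suggestions can be combined: check whether the package is installed without importing it, and import it only on the code path that needs it. The sketch below illustrates the pattern and is not Hermes code. The names is_installed, load_backend, and analyze_image are hypothetical, and the returned dictionary is a placeholder for real vision logic.

```python
import importlib
import importlib.util
from functools import lru_cache


def is_installed(module_name: str) -> bool:
    """Report whether a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


@lru_cache(maxsize=None)
def load_backend(module_name: str = "torchvision"):
    """Import the backend on first call; later calls reuse the cached module."""
    if not is_installed(module_name):
        raise RuntimeError(
            f"{module_name} is not installed; local vision is unavailable"
        )
    return importlib.import_module(module_name)


def analyze_image(image_path: str, use_local_model: bool,
                  backend: str = "torchvision"):
    """Hypothetical entry point standing in for a vision_analyze-style call."""
    if not use_local_model:
        # External multimodal model: the heavy import never happens.
        return {"image": image_path, "backend": None}
    module = load_backend(backend)  # first local call pays the import cost
    return {"image": image_path, "backend": module.__name__}
```

With this structure, a gateway or agent that only talks to external models never imports the backend, so whatever memory that import reserves is only paid by processes that actually run local vision.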
