hermes - 💡(How to fix) Fix [vision] torchvision import initializes CUDA context, causing gateway/agent to reserve ~7GB VRAM each



Describe the bug

After upgrading to v0.14.0, the hermes gateway and hermes agent processes each reserve ~7GB of GPU VRAM at startup, even when idle and not performing any vision tasks. The cause is torchvision, which is now installed in the hermes venv: importing it initializes a CUDA context that pre-allocates memory.

Impact

  • On systems with limited GPU memory (e.g., a single GPU shared between inference services and Hermes), the combined ~14GB reservation (~7GB each for the gateway and the agent) significantly reduces the VRAM available to other workloads (sglang, ComfyUI, etc.)
  • Users who rely on external multimodal LLMs (GPT-4o, Claude, etc.) do not benefit from the local vision_analyze pixel-through feature, making this VRAM cost purely wasteful

Root cause

v0.14.0 introduced vision_analyze pixel-through to vision-capable models (#22955), which added torchvision as a dependency. When torchvision is imported, it loads libcudart and initializes a CUDA context, which reserves ~7GB per process on NVIDIA GPUs.
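
One way to check this claim on a given machine is to import the module in a fresh interpreter and ask PyTorch whether its CUDA state has been initialized. The sketch below is only an illustration: the helper names `build_probe` and `cuda_initialized_after_import` are hypothetical, and `torch.cuda.is_initialized()` only reports PyTorch's own lazy initialization, so a context created through another path may not show up here. Comparing against `nvidia-smi` remains the more direct measurement.

```python
import subprocess
import sys


def build_probe(module_name):
    """Return Python source that imports torch and `module_name`,
    then prints whether PyTorch's CUDA state has been initialized."""
    return (
        "import torch\n"
        f"import {module_name}\n"
        "print(torch.cuda.is_initialized())\n"
    )


def cuda_initialized_after_import(module_name):
    """Run the probe in a fresh interpreter so that earlier imports in the
    current process cannot influence the result. Requires torch (and the
    probed module) to be installed in the same environment."""
    result = subprocess.run(
        [sys.executable, "-c", build_probe(module_name)],
        capture_output=True,
        text=True,
        check=True,
    )
    # The last stdout line is the printed boolean; earlier lines may be warnings.
    return result.stdout.strip().splitlines()[-1] == "True"
```

Calling `cuda_initialized_after_import("torchvision")` inside the hermes venv would then indicate whether the import alone triggers PyTorch's CUDA initialization in that environment.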

Reproduction

  1. Upgrade to v0.14.0+
  2. Start hermes gateway run
  3. Run nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader
  4. Observe gateway process reserving ~7GB VRAM
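
The measurement in step 3 can also be scripted, for example to compare per-process usage before and after the upgrade. This is a rough sketch that assumes `nvidia-smi` prints lines of the form `12345, 7168 MiB` for this query; the function names are hypothetical and not part of Hermes.

```python
import subprocess

# Same query as in step 3 of the reproduction.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,used_memory",
    "--format=csv,noheader",
]


def parse_compute_apps(csv_text):
    """Map PID -> used GPU memory in MiB from `csv,noheader` output.

    Lines that do not look like '<pid>, <number> MiB' (for example an
    '[N/A]' memory field) are skipped. A PID that appears on several
    GPUs has its usage summed.
    """
    usage = {}
    for line in csv_text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2:
            continue
        pid, mem = parts
        fields = mem.split()
        if not pid.isdigit() or not fields or not fields[0].isdigit():
            continue
        usage[int(pid)] = usage.get(int(pid), 0) + int(fields[0])
    return usage


def query_gpu_usage():
    """Run nvidia-smi and return the parsed PID -> MiB mapping."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_compute_apps(out)
```

With the gateway running, `query_gpu_usage()` should list its PID with a value near 7000 MiB if the issue reproduces.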

Environment

  • Hermes Agent: v0.14.0
  • GPU: NVIDIA RTX PRO 6000 Blackwell 96GB
  • PyTorch: 2.11.0+cu130
  • torchvision: 0.26.0

Suggested fix

  • Lazy-import torchvision only when vision_analyze is actually called with a local vision model, not at gateway/agent startup
  • Or add PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to reduce pre-allocation
  • Or make torchvision an optional dependency that is only loaded when the active model supports local vision
