hermes - [Feature]: Make vision pre-analysis prompt configurable via auxiliary.vision.user_prompt


Problem or Use Case

The prompt used for vision pre-analysis is hardcoded in two places, while every other aspect of auxiliary.vision is configurable:

  • gateway/run.py (auto-analysis when a non-vision main model receives an image from the user)
  • run_agent.py::_describe_image_for_anthropic_fallback (Anthropic Messages fallback path)

Both call sites use the same English string:

    Describe everything visible in this image in thorough detail.
    Include any text, code, data, objects, people, layout, colors,
    and any other notable visual information.

Three concrete problems for users running custom auxiliary vision models:

  1. Hallucination control — vision models like Qwen-VL / MiMo-VL are prone to inventing text content, brand names, and locations. A structured prompt with explicit confidence levels ("verified" vs "likely" vs "uncertain") materially reduces this. Users running their own vision aux models often have a tuned prompt for their specific model.
  2. Output structure — downstream main models benefit from consistent section headers (image type / scene / OCR text / etc.) for reliable extraction. Hardcoded "describe in detail" produces wildly different formats across calls.
  3. DRY violation — the prompt is duplicated in two files. If a maintainer ever wants to improve it, they must remember both. Centralizing into config (or at minimum a single constant) prevents drift.
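
To make points 1 and 2 concrete, here is a rough sketch of what a tuned prompt and its downstream use might look like. Everything in it is hypothetical: the section names, the confidence tags, the `## ` header convention, and the `split_sections` helper are illustrative assumptions, not part of hermes.

```python
import re
from typing import Dict

# Hypothetical structured prompt. The section names and confidence tags are
# illustrative only; a real prompt would be tuned for the specific vision model.
STRUCTURED_PROMPT = """\
Describe this image using exactly these sections:
## Image type
## Scene
## OCR text
Tag every claim as [verified], [likely], or [uncertain].
Do not guess text, brand names, or locations you cannot read clearly."""

# Matches a whole header line such as "## OCR text".
_HEADER = re.compile(r"^## (.+)$", re.MULTILINE)


def split_sections(description: str) -> Dict[str, str]:
    """Split a model reply into {header: body} using '## ' header lines."""
    # re.split with one capture group yields:
    # [preamble, header1, body1, header2, body2, ...]
    parts = _HEADER.split(description)
    return {h.strip(): b.strip() for h, b in zip(parts[1::2], parts[2::2])}
```

With fixed headers, a consumer can pull out a single section (for example the OCR text) rather than parsing free-form prose whose layout changes from call to call.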

Proposed Solution

Add an optional config field:

auxiliary:
  vision:
    provider: ...
    model: ...
    user_prompt: ""    # NEW — empty means use built-in default

Behavior:

  • Empty / unset → fall back to the current hardcoded English string (no breaking change for existing users)
  • Non-empty → use it verbatim as the user_prompt argument to vision_analyze_tool

Both call sites read the same config field via a small helper like _get_vision_user_prompt() in agent/auxiliary_client.py. The helper also centralizes the default string so future maintainers only update one place.

Alternatives Considered

  • Per-call override at the tool level — would require API changes to vision_analyze_tool and threading the option through the gateway path. The single config field covers most customization without API churn.
  • Multiple named prompt presets — premature; nobody has asked for it. Add later if demand emerges.
  • System prompt customization for vision_analyze_tool itself — separate concern; out of scope for this issue.

Feature Type

Configuration option

Scope

Small (single file, < 50 lines)

Contribution

  • I'd like to implement this myself and submit a PR
