hermes - [Feature]: Make vision pre-analysis prompt configurable via auxiliary.vision.user_prompt

StepCodex · 2026-05-08T11:03:37Z



Problem or Use Case

The prompt used for vision pre-analysis is hardcoded in two places, while every other aspect of auxiliary.vision is configurable. Both call sites use the same English string:

- gateway/run.py (auto-analysis when a non-vision main model receives an image from the user)
- run_agent.py::_describe_image_for_anthropic_fallback (Anthropic Messages fallback path)

Describe everything visible in this image in thorough detail.
Include any text, code, data, objects, people, layout, colors,
and any other notable visual information.

Three concrete problems for users running custom auxiliary vision models:

1. Hallucination control: vision models like Qwen-VL / MiMo-VL are prone to inventing text content, brand names, and locations. A structured prompt with explicit confidence levels ("verified" vs "likely" vs "uncertain") materially reduces this. Users running their own vision aux models often have a tuned prompt for their specific model.
2. Output structure: downstream main models benefit from consistent section headers (image type / scene / OCR text / etc.) for reliable extraction. The hardcoded "describe in detail" prompt produces wildly different formats from call to call.
3. DRY violation: the prompt is duplicated in two files. If a maintainer ever wants to improve it, they must remember to update both. Centralizing it in config (or at minimum in a single constant) prevents drift.
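To make problem 2 concrete, suppose a user's tuned prompt asks the vision model to start each part of its answer with a Markdown-style `## <section>` header, and to put a confidence tag such as "(verified)" or "(uncertain)" in the header, as problem 1 suggests. The minimal sketch below assumes that hypothetical output format. The header names and the `split_sections` helper are illustrative only and are not part of hermes. It shows how consistent headers let a downstream consumer pull out one section deterministically:

```python
import re

# Hypothetical format: each section starts with a line like "## OCR text (verified)".
# Nothing in hermes defines this format; it is what a tuned user_prompt might request.
SECTION_HEADER = re.compile(r"^##[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def split_sections(description: str) -> dict[str, str]:
    """Map each '## Header' line to the text between it and the next header.

    Any text before the first header is ignored.
    """
    headers = list(SECTION_HEADER.finditer(description))
    sections: dict[str, str] = {}
    for i, header in enumerate(headers):
        body_start = header.end()
        body_end = headers[i + 1].start() if i + 1 < len(headers) else len(description)
        sections[header.group(1)] = description[body_start:body_end].strip()
    return sections


sample = (
    "## Image type\n"
    "Screenshot\n"
    "## OCR text (verified)\n"
    "def main(): pass\n"
    "## Brand names (uncertain)\n"
    "Possibly an IDE logo\n"
)
sections = split_sections(sample)
print(sections["OCR text (verified)"])  # def main(): pass
```

With the current free-form default prompt, there is no guarantee of any such structure. As a result, the same lookup can succeed on one call and find nothing on the next.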

Proposed Solution

Add an optional config field:

auxiliary:
  vision:
    provider: ...
    model: ...
    user_prompt: ""    # NEW — empty means use built-in default

Behavior:

- Empty / unset → fall back to the current hardcoded English string (no breaking change for existing users)
- Non-empty → use it verbatim as the user_prompt argument to vision_analyze_tool

Both call sites read the same config field via a small helper like _get_vision_user_prompt() in agent/auxiliary_client.py. The helper also centralizes the default string so future maintainers only update one place.

Alternatives Considered

- Per-call override at the tool level: would require API changes to vision_analyze_tool and threading the option through the gateway path. The single config field covers most customization without API churn.
- Multiple named prompt presets: premature; nobody has asked for it. Add later if demand emerges.
- System prompt customization for vision_analyze_tool itself: a separate concern, out of scope for this issue.

Feature Type

Configuration option

Scope

Small (single file, < 50 lines)

Contribution

I'd like to implement this myself and submit a PR
