hermes - ✅(Solved) Fix [Feature]: Allow declaring supports_vision for custom-provider models (vision-only subset of #8731) [1 pull requests, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#17940Fetched 2026-05-01 05:55:00
View on GitHub
Comments
0
Participants
1
Timeline
5
Reactions
0
Participants
Timeline (top)
labeled ×4cross-referenced ×1

Fix Action

Fix / Workaround

There is currently no config-level way to declare "this custom model supports vision." The only options are (a) patch the source, or (b) inject a synthetic entry into the local models.dev cache file.

This is a strict-minimum patch. It only fixes the strip path in run_agent.py. The auto-mode routing decision in agent/image_routing.py:_lookup_supports_vision is not changed, which means in default agent.image_input_mode: auto, images are still pre-processed through vision_analyze even after the override is set. To get the override to drive routing too, the user must also set:

  • #8731 / #8942 propose the broader feature: declare vision + reasoning + tools + streaming together, plus /model UI integration and a synthetic capability path in models_dev.py. PR #8942 has been waiting on review since 2026-04-13 (509 / -62 across 9 files). This issue/PR is the vision-only minimal subset that unblocks the most common pain point without making four design decisions at once. The broader change can land later as a superset.
  • Patching the local models.dev cache file works but doesn't survive cache refresh and isn't discoverable to other users of the same config.
  • Setting agent.image_input_mode: native alone is insufficient on its own — decide_image_input_mode() honors it and attaches images natively, but _model_supports_vision() still returns False and _prepare_messages_for_non_vision_model() strips the images downstream. With this PR, declaring supports_vision: true makes the strip path agree with the routing path.

PR fix notes

PR #17936: feat(agent): allow declaring supports_vision via user config

Description (problem / solution / changelog)

Closes #17940. Refs #8731.

What

Allow users to declare that a custom-provider model supports native image input, via either of:

# Top-level shortcut for the active model
model:
  provider: custom
  default: my-llava
  supports_vision: true

# Or per-model under providers (matches the schema in #8731)
model:
  provider: my-vllm     # name of the custom provider entry
  default: my-llava
providers:
  my-vllm:
    base_url: http://localhost:8000/v1
    models:
      my-llava:
        supports_vision: true

Why

Model capability is currently sourced exclusively from the models.dev catalog. For a custom provider — local vLLM, an internal proxy, an OpenAI-compatible endpoint with a non-public model — get_model_capabilities() returns None, _model_supports_vision() returns False, and _prepare_messages_for_non_vision_model() strips every image part out of the user turn before the request leaves the process. The user uploads a screenshot to a perfectly capable LLaVA fine-tune and the model never sees it.

With the override set, the strip path is bypassed and the existing provider adapters attach the image natively (OpenAI image_url, Anthropic image block, Gemini inlineData, etc.) on whatever transport the user has configured.

Scope (please read before reviewing)

This patch only fixes the strip path. It does not change the auto-mode routing in agent/image_routing.py:_lookup_supports_vision. So the override declared above only takes full effect when the user also pins:

agent:
  image_input_mode: native    # required for the override to drive routing

Without this, default auto mode still pre-processes images through vision_analyze even after the override is set. If reviewers prefer, I'm happy to extend _lookup_supports_vision in this same PR with an identical fallback — let me know.

Narrow scope on purpose — see #8942 for a broader, multi-capability take that's been waiting on review since 2026-04-13. This patch is the smallest change that unblocks the vision-only use case.

Named custom providers

Named entries under providers.<name> work even though _resolve_named_custom_runtime rewrites self.provider to the literal string "custom" at runtime: the lookup tries both self.provider and cfg.model.provider as candidate provider keys, so a config like the second example above resolves correctly. Covered by test_named_custom_provider_resolved_via_config_provider.

How to test

uv run pytest tests/run_agent/test_vision_aware_preprocessing.py -o addopts= -q

Four new tests:

  • test_top_level_model_override_winsmodel.supports_vision: true shortcut
  • test_per_provider_per_model_override_winsproviders.<id>.models.<id>.supports_vision: true
  • test_named_custom_provider_resolved_via_config_provider — runtime self.provider == "custom" while config holds the original name
  • test_override_false_disables_vision_for_models_dev_models — explicit false overrides a vision-capable models.dev entry

Notes

  • No change to _normalize_custom_provider_entry schema — models.<id> dicts already pass through unchanged so per-model supports_vision reaches cfg_get without a warning.
  • bool(override) coerces the YAML value as-is. YAML's native true/false (no quotes) round-trip to Python booleans, which is the supported form. Quoted strings like "false" would coerce to truthy — open to a stricter coercion if reviewers want.

Changed files

  • run_agent.py (modified, +24/-4)
  • tests/run_agent/test_vision_aware_preprocessing.py (modified, +40/-0)

Code Example

# Top-level shortcut (the legacy single-model config style)
model:
  provider: custom
  default: my-llava
  base_url: http://localhost:8000/v1
  supports_vision: true

# Per-model under providers (matches the schema proposed in #8731)
model:
  provider: my-vllm     # name of the custom provider entry
  default: my-llava
providers:
  my-vllm:
    base_url: http://localhost:8000/v1
    models:
      my-llava:
        supports_vision: true

---

agent:
  image_input_mode: native
RAW_BUFFERClick to expand / collapse

Problem or Use Case

For custom/local provider models that aren't in the models.dev catalog (local vLLM, internal proxy, OpenAI-compatible endpoint with a private fine-tune), agent.models_dev.get_model_capabilities() returns None. run_agent.AIAgent._model_supports_vision() then returns False, and _prepare_messages_for_non_vision_model() strips every image part out of the user turn before the request leaves the process.

Result: a vision-capable LLaVA / Qwen-VL / private fine-tune behind a custom provider never sees the user's image. The user gets a degraded text-only response with no warning.

There is currently no config-level way to declare "this custom model supports vision." The only options are (a) patch the source, or (b) inject a synthetic entry into the local models.dev cache file.

Proposed Solution

Accept a supports_vision: true flag in two places, both consulted before falling back to models.dev:

# Top-level shortcut (the legacy single-model config style)
model:
  provider: custom
  default: my-llava
  base_url: http://localhost:8000/v1
  supports_vision: true

# Per-model under providers (matches the schema proposed in #8731)
model:
  provider: my-vllm     # name of the custom provider entry
  default: my-llava
providers:
  my-vllm:
    base_url: http://localhost:8000/v1
    models:
      my-llava:
        supports_vision: true

Resolution order in _model_supports_vision():

  1. model.supports_vision
  2. providers.<self.provider>.models.<model>.supports_vision
  3. providers.<cfg.model.provider>.models.<model>.supports_vision — covers named custom providers, where runtime self.provider is rewritten to "custom" by _resolve_named_custom_runtime while the config still carries the user-declared name
  4. existing models_dev.get_model_capabilities() lookup

No schema change to _normalize_custom_provider_entrymodels.<id> dicts already pass through unchanged, so the new field reaches cfg_get without warnings.

Scope caveat (please read)

This is a strict-minimum patch. It only fixes the strip path in run_agent.py. The auto-mode routing decision in agent/image_routing.py:_lookup_supports_vision is not changed, which means in default agent.image_input_mode: auto, images are still pre-processed through vision_analyze even after the override is set. To get the override to drive routing too, the user must also set:

agent:
  image_input_mode: native

If reviewers prefer, the same fallback can be added to _lookup_supports_vision in this PR or a follow-up. Spelled out so reviewers can decide before merge.

Alternatives Considered

  • #8731 / #8942 propose the broader feature: declare vision + reasoning + tools + streaming together, plus /model UI integration and a synthetic capability path in models_dev.py. PR #8942 has been waiting on review since 2026-04-13 (509 / -62 across 9 files). This issue/PR is the vision-only minimal subset that unblocks the most common pain point without making four design decisions at once. The broader change can land later as a superset.
  • Patching the local models.dev cache file works but doesn't survive cache refresh and isn't discoverable to other users of the same config.
  • Setting agent.image_input_mode: native alone is insufficient on its own — decide_image_input_mode() honors it and attaches images natively, but _model_supports_vision() still returns False and _prepare_messages_for_non_vision_model() strips the images downstream. With this PR, declaring supports_vision: true makes the strip path agree with the routing path.

Feature Type

Configuration option

Scope

Small (single file, < 50 lines) — see #17936.

Contribution

  • I'd like to implement this myself and submit a PR — done in #17936.

Notes

  • Refs #8731. Does NOT close that issue — vision-only is a strict subset of what #8731 asks for.

extent analysis

TL;DR

To fix the issue where custom provider models are not recognized as vision-capable, add a supports_vision: true flag in the model configuration.

Guidance

  • Add the supports_vision flag to the top-level model configuration or per-model under providers, as shown in the proposed solution.
  • Set agent.image_input_mode: native to ensure the override drives routing.
  • Verify that the model is recognized as vision-capable by checking the return value of _model_supports_vision().
  • Note that this fix only addresses the strip path in run_agent.py and does not change the auto-mode routing decision in agent/image_routing.py.

Example

model:
  provider: custom
  default: my-llava
  base_url: http://localhost:8000/v1
  supports_vision: true

Notes

This fix is a strict-minimum patch and only fixes the strip path in run_agent.py. The broader feature proposed in #8731 is not addressed in this fix.

Recommendation

Apply the workaround by adding the supports_vision flag to the model configuration, as this is a simple and effective solution to the immediate problem.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - ✅(Solved) Fix [Feature]: Allow declaring supports_vision for custom-provider models (vision-only subset of #8731) [1 pull requests, 1 participants]