hermes - 💡(How to fix) Fix [Setup]: Clarify whether `AIAgent` supports image payloads for custom/Ollama vision models [2 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#20376Fetched 2026-05-06 06:36:59
View on GitHub
Comments
2
Participants
2
Timeline
10
Reactions
0
Author
Participants
Timeline (top)
labeled ×5commented ×2closed ×1mentioned ×1

Error Message

Full Error Output

Root Cause

I also saw internal support for image_url / input_image blocks, but the image path appears to depend on _model_supports_vision(). For provider="custom" and model="gemma4", that returns False because model capabilities are not found for custom/gemma4. As a result, image content seems to be converted into a text fallback rather than passed through natively to the custom endpoint.

Fix Action

Fix / Workaround

  1. Is AIAgent intended to support native image payloads through chat() or run_conversation()?
  2. If yes, what is the supported public API for sending an image alongside text?
  3. How should custom/OpenAI-compatible providers such as local Ollama declare that a model supports vision?
  4. Is there a recommended workaround for local Ollama vision models, or should callers bypass AIAgent and use the Ollama/OpenAI-compatible API directly for image inputs?

Code Example

from run_agent import AIAgent

agent = AIAgent(
    provider="custom",
    base_url="http://127.0.0.1:11434/v1",
    api_key="ollama",
    model="gemma4",
    quiet_mode=True,
)

---

AIAgent.chat(self, message: str, stream_callback=None) -> str
AIAgent.run_conversation(self, user_message: str, ...)

---

pixi run pip install git+https://github.com/NousResearch/hermes-agent.git

---



---

NA
RAW_BUFFERClick to expand / collapse

What's Going Wrong?

I am trying to understand whether run_agent.AIAgent supports native image message payloads when using an OpenAI-compatible custom endpoint, specifically local Ollama with a vision-capable model.

Example setup:

from run_agent import AIAgent

agent = AIAgent(
    provider="custom",
    base_url="http://127.0.0.1:11434/v1",
    api_key="ollama",
    model="gemma4",
    quiet_mode=True,
)

From local inspection, AIAgent.chat() appears to accept only a string:

AIAgent.chat(self, message: str, stream_callback=None) -> str
AIAgent.run_conversation(self, user_message: str, ...)

I also saw internal support for image_url / input_image blocks, but the image path appears to depend on _model_supports_vision(). For provider="custom" and model="gemma4", that returns False because model capabilities are not found for custom/gemma4. As a result, image content seems to be converted into a text fallback rather than passed through natively to the custom endpoint.

Could you clarify:

  1. Is AIAgent intended to support native image payloads through chat() or run_conversation()?
  2. If yes, what is the supported public API for sending an image alongside text?
  3. How should custom/OpenAI-compatible providers such as local Ollama declare that a model supports vision?
  4. Is there a recommended workaround for local Ollama vision models, or should callers bypass AIAgent and use the Ollama/OpenAI-compatible API directly for image inputs?

Steps Taken

pixi run pip install git+https://github.com/NousResearch/hermes-agent.git

Installation Method

Install script (curl | bash)

Operating System

CentOS

Python Version

3.14.4

Hermes Version

Python library

Debug Report

Full Error Output

NA

What I've Already Tried

No response

extent analysis

TL;DR

The AIAgent class may not natively support image payloads for custom providers, and a workaround might be necessary to use vision-capable models with local Ollama.

Guidance

  • Check the AIAgent documentation and source code for any hidden or undocumented features that might allow native image payload support.
  • Investigate the _model_supports_vision() method to understand how model capabilities are determined and if there's a way to override or extend this for custom providers.
  • Consider using the Ollama/OpenAI-compatible API directly for image inputs as a potential workaround, bypassing the AIAgent class.
  • Look into the image_url / input_image blocks and how they are handled internally to see if there's a way to leverage this functionality for custom providers.

Example

No code example is provided due to the lack of clear information on how to modify the AIAgent class to support native image payloads.

Notes

The AIAgent class seems to have limitations when used with custom providers, and the vision capabilities of models are not properly detected. Without further information or documentation, it's challenging to provide a definitive solution.

Recommendation

Apply workaround: Use the Ollama/OpenAI-compatible API directly for image inputs, as the AIAgent class may not support native image payloads for custom providers. This approach allows leveraging the vision capabilities of local Ollama models without relying on the AIAgent class.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING