hermes - 💡(How to fix) Fix [Bug]: MiniMax VLM (Vision) Not Working — Uses Wrong API Endpoint [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#15715Fetched 2026-04-26 05:25:37
View on GitHub
Comments
0
Participants
1
Timeline
4
Reactions
0
Author
Participants
Timeline (top)
labeled ×4

Error Message

<head></head><h2 cid="n296" mdtype="heading" class="md-end-block md-heading" style="font-style: normal; font-variant-caps: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; break-after: avoid-page; break-inside: avoid; orphans: 4; font-size: 1.75em; margin-top: 1rem; margin-bottom: 1rem; position: relative; font-weight: bold; line-height: 1.225; cursor: text; border-bottom: 1px solid rgb(238, 238, 238); caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">Bug Description</span></h2><p cid="n297" mdtype="paragraph" class="md-end-block md-p" style="font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; line-height: inherit; orphans: 4; margin: 0.8em 0px; position: relative; caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"><span md-inline="strong" class="md-pair-s" style="box-sizing: border-box;"><strong style="box-sizing: border-box;"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">A clear description of what's broken. Include error messages, tracebacks, or screenshots if relevant.</span></strong></span></p><p cid="n298" mdtype="paragraph" class="md-end-block md-p" style="font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; line-height: inherit; orphans: 4; margin: 0.8em 0px; position: relative; caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">When using MiniMax-M2.7 as the vision model (</span><span md-inline="code" spellcheck="false" class="md-pair-s" style="box-sizing: border-box;"><code style="box-sizing: border-box; font-family: var(--monospace); text-align: left; vertical-align: initial; border: 1px solid rgb(231, 234, 237); background-color: rgb(243, 244, 244); border-radius: 3px; padding: 0px 2px; font-size: 0.9em;">imageModel</code></span><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">), Hermes Agent fails to analyze images. The same model works correctly in OpenClaw, which also uses MiniMax-M2.7.</span></p><h3 cid="n299" mdtype="heading" class="md-end-block md-heading" style="font-style: normal; font-variant-caps: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; break-after: avoid-page; break-inside: avoid; orphans: 4; font-size: 1.5em; margin-top: 1rem; margin-bottom: 1rem; position: relative; font-weight: bold; line-height: 1.43; cursor: text; caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">Root Cause</span></h3><p cid="n300" mdtype="paragraph" class="md-end-block md-p" style="font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; line-height: inherit; orphans: 4; margin: 0.8em 0px; position: relative; caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">MiniMax operates </span><span md-inline="strong" class="md-pair-s " style="box-sizing: border-box;"><strong style="box-sizing: border-box;"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">two completely separate API endpoints</span></strong></span><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">:</span></p><figure class="md-table-fig table-figure" cid="n301" mdtype="table" style="font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; margin: 1.2em 0px; overflow-x: auto; max-width: calc(100% + 16px); padding: 0px; cursor: default; caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"> ### Additional Logs / Traceback (optional)

Root Cause

<head></head><h2 cid="n296" mdtype="heading" class="md-end-block md-heading" style="font-style: normal; font-variant-caps: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; break-after: avoid-page; break-inside: avoid; orphans: 4; font-size: 1.75em; margin-top: 1rem; margin-bottom: 1rem; position: relative; font-weight: bold; line-height: 1.225; cursor: text; border-bottom: 1px solid rgb(238, 238, 238); caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">Bug Description</span></h2><p cid="n297" mdtype="paragraph" class="md-end-block md-p" style="font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; line-height: inherit; orphans: 4; margin: 0.8em 0px; position: relative; caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"><span md-inline="strong" class="md-pair-s" style="box-sizing: border-box;"><strong style="box-sizing: border-box;"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">A clear description of what's broken. Include error messages, tracebacks, or screenshots if relevant.</span></strong></span></p><p cid="n298" mdtype="paragraph" class="md-end-block md-p" style="font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; line-height: inherit; orphans: 4; margin: 0.8em 0px; position: relative; caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">When using MiniMax-M2.7 as the vision model (</span><span md-inline="code" spellcheck="false" class="md-pair-s" style="box-sizing: border-box;"><code style="box-sizing: border-box; font-family: var(--monospace); text-align: left; vertical-align: initial; border: 1px solid rgb(231, 234, 237); background-color: rgb(243, 244, 244); border-radius: 3px; padding: 0px 2px; font-size: 0.9em;">imageModel</code></span><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">), Hermes Agent fails to analyze images. The same model works correctly in OpenClaw, which also uses MiniMax-M2.7.</span></p><h3 cid="n299" mdtype="heading" class="md-end-block md-heading" style="font-style: normal; font-variant-caps: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; break-after: avoid-page; break-inside: avoid; orphans: 4; font-size: 1.5em; margin-top: 1rem; margin-bottom: 1rem; position: relative; font-weight: bold; line-height: 1.43; cursor: text; caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">Root Cause</span></h3><p cid="n300" mdtype="paragraph" class="md-end-block md-p" style="font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; line-height: inherit; orphans: 4; margin: 0.8em 0px; position: relative; caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">MiniMax operates </span><span md-inline="strong" class="md-pair-s " style="box-sizing: border-box;"><strong style="box-sizing: border-box;"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">two completely separate API endpoints</span></strong></span><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">:</span></p><figure class="md-table-fig table-figure" cid="n301" mdtype="table" style="font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; margin: 1.2em 0px; overflow-x: auto; max-width: calc(100% + 16px); padding: 0px; cursor: default; caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"> Endpoint | Purpose | Used by Hermes Agent? -- | -- | -- /anthropic/v1/messages | Standard chat (Anthropic-compatible format). Images are silently ignored — the model returns 200 but produces no visual reasoning. | Yes (incorrect) /v1/coding_plan/vlm | Dedicated vision endpoint. Accepts {"prompt": str, "image_url": "data:image/..."} and returns {"content": str}. | No

Code Example

Report:https://paste.rs/t1LtG
agent.log:https://paste.rs/xPd6L
gateway.log:https://paste.rs/UxfUQ

---

Verification using MiniMax VLM endpoint directly succeeds:


POST /v1/coding_plan/vlm
Request:  {"prompt": "What color is this image?", "image_url": "data:image/png;base64,..."}
Response: {"content": "The color of this image is red.", "base_resp": {"status_code": 0}}


The same image sent via `/anthropic/v1/messages` (current Hermes behavior) returns 200 but produces no visual analysis.
RAW_BUFFERClick to expand / collapse

Bug Description

<head></head><h2 cid="n296" mdtype="heading" class="md-end-block md-heading" style="font-style: normal; font-variant-caps: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; break-after: avoid-page; break-inside: avoid; orphans: 4; font-size: 1.75em; margin-top: 1rem; margin-bottom: 1rem; position: relative; font-weight: bold; line-height: 1.225; cursor: text; border-bottom: 1px solid rgb(238, 238, 238); caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">Bug Description</span></h2><p cid="n297" mdtype="paragraph" class="md-end-block md-p" style="font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; line-height: inherit; orphans: 4; margin: 0.8em 0px; position: relative; caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"><span md-inline="strong" class="md-pair-s" style="box-sizing: border-box;"><strong style="box-sizing: border-box;"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">A clear description of what's broken. Include error messages, tracebacks, or screenshots if relevant.</span></strong></span></p><p cid="n298" mdtype="paragraph" class="md-end-block md-p" style="font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; line-height: inherit; orphans: 4; margin: 0.8em 0px; position: relative; caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">When using MiniMax-M2.7 as the vision model (</span><span md-inline="code" spellcheck="false" class="md-pair-s" style="box-sizing: border-box;"><code style="box-sizing: border-box; font-family: var(--monospace); text-align: left; vertical-align: initial; border: 1px solid rgb(231, 234, 237); background-color: rgb(243, 244, 244); border-radius: 3px; padding: 0px 2px; font-size: 0.9em;">imageModel</code></span><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">), Hermes Agent fails to analyze images. The same model works correctly in OpenClaw, which also uses MiniMax-M2.7.</span></p><h3 cid="n299" mdtype="heading" class="md-end-block md-heading" style="font-style: normal; font-variant-caps: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; break-after: avoid-page; break-inside: avoid; orphans: 4; font-size: 1.5em; margin-top: 1rem; margin-bottom: 1rem; position: relative; font-weight: bold; line-height: 1.43; cursor: text; caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">Root Cause</span></h3><p cid="n300" mdtype="paragraph" class="md-end-block md-p" style="font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; line-height: inherit; orphans: 4; margin: 0.8em 0px; position: relative; caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">MiniMax operates </span><span md-inline="strong" class="md-pair-s " style="box-sizing: border-box;"><strong style="box-sizing: border-box;"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">two completely separate API endpoints</span></strong></span><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">:</span></p><figure class="md-table-fig table-figure" cid="n301" mdtype="table" style="font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; margin: 1.2em 0px; overflow-x: auto; max-width: calc(100% + 16px); padding: 0px; cursor: default; caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"> Endpoint | Purpose | Used by Hermes Agent? -- | -- | -- /anthropic/v1/messages | Standard chat (Anthropic-compatible format). Images are silently ignored — the model returns 200 but produces no visual reasoning. | Yes (incorrect) /v1/coding_plan/vlm | Dedicated vision endpoint. Accepts {"prompt": str, "image_url": "data:image/..."} and returns {"content": str}. | No </figure><p cid="n314" mdtype="paragraph" class="md-end-block md-p" style="font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; line-height: inherit; orphans: 4; margin: 0.8em 0px; position: relative; caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">Hermes Agent sends images via the Anthropic-compatible format to </span><span md-inline="code" spellcheck="false" class="md-pair-s" style="box-sizing: border-box;"><code style="box-sizing: border-box; font-family: var(--monospace); text-align: left; vertical-align: initial; border: 1px solid rgb(231, 234, 237); background-color: rgb(243, 244, 244); border-radius: 3px; padding: 0px 2px; font-size: 0.9em;">/anthropic/v1/messages</code></span><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">. The MiniMax server returns 200, but the image blocks are ignored (this endpoint only does text chat). The result is an empty or useless response, causing </span><span md-inline="code" spellcheck="false" class="md-pair-s" style="box-sizing: border-box;"><code style="box-sizing: border-box; font-family: var(--monospace); text-align: left; vertical-align: initial; border: 1px solid rgb(231, 234, 237); background-color: rgb(243, 244, 244); border-radius: 3px; padding: 0px 2px; font-size: 0.9em;">vision_analyze</code></span><span md-inline="plain" class="md-plain" style="box-sizing: border-box;"> to fail.</span></p><p cid="n315" mdtype="paragraph" class="md-end-block md-p" style="font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; line-height: inherit; orphans: 4; margin: 0.8em 0px; position: relative; caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">OpenClaw correctly routes vision requests to </span><span md-inline="code" spellcheck="false" class="md-pair-s" style="box-sizing: border-box;"><code style="box-sizing: border-box; font-family: var(--monospace); text-align: left; vertical-align: initial; border: 1px solid rgb(231, 234, 237); background-color: rgb(243, 244, 244); border-radius: 3px; padding: 0px 2px; font-size: 0.9em;">/v1/coding_plan/vlm</code></span><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">.</span></p><h3 cid="n316" mdtype="heading" class="md-end-block md-heading" style="font-style: normal; font-variant-caps: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; break-after: avoid-page; break-inside: avoid; orphans: 4; font-size: 1.5em; margin-top: 1rem; margin-bottom: 1rem; position: relative; font-weight: bold; line-height: 1.43; cursor: text; caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">Proposed Fix</span></h3><p cid="n317" mdtype="paragraph" class="md-end-block md-p" style="font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; line-height: inherit; orphans: 4; margin: 0.8em 0px; position: relative; caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">In </span><span md-inline="code" spellcheck="false" class="md-pair-s" style="box-sizing: border-box;"><code style="box-sizing: border-box; font-family: var(--monospace); text-align: left; vertical-align: initial; border: 1px solid rgb(231, 234, 237); background-color: rgb(243, 244, 244); border-radius: 3px; padding: 0px 2px; font-size: 0.9em;">agent/auxiliary_client.py</code></span><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">, add a MiniMax VLM code path that bypasses the standard </span><span md-inline="code" spellcheck="false" class="md-pair-s" style="box-sizing: border-box;"><code style="box-sizing: border-box; font-family: var(--monospace); text-align: left; vertical-align: initial; border: 1px solid rgb(231, 234, 237); background-color: rgb(243, 244, 244); border-radius: 3px; padding: 0px 2px; font-size: 0.9em;">client.chat.completions.create</code></span><span md-inline="plain" class="md-plain" style="box-sizing: border-box;"> and calls </span><span md-inline="code" spellcheck="false" class="md-pair-s" style="box-sizing: border-box;"><code style="box-sizing: border-box; font-family: var(--monospace); text-align: left; vertical-align: initial; border: 1px solid rgb(231, 234, 237); background-color: rgb(243, 244, 244); border-radius: 3px; padding: 0px 2px; font-size: 0.9em;">/v1/coding_plan/vlm</code></span><span md-inline="plain" class="md-plain" style="box-sizing: border-box;"> directly via HTTP.</span></p><p cid="n318" mdtype="paragraph" class="md-end-block md-p" style="font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; line-height: inherit; orphans: 4; margin: 0.8em 0px; position: relative; caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">Key implementation notes:</span></p><ul class="ul-list" cid="n319" mdtype="list" data-mark="-" style="font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; margin: 0.8em 0px; padding-left: 30px; position: relative; caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"><li class="md-list-item" cid="n320" mdtype="list_item" style="box-sizing: border-box; margin: 0px; position: relative;"><p cid="n321" mdtype="paragraph" class="md-end-block md-p" style="box-sizing: border-box; line-height: inherit; orphans: 1; margin: 0px 0px 0.5rem; white-space: pre-wrap; position: relative;"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">API key resolution: check </span><span md-inline="code" spellcheck="false" class="md-pair-s" style="box-sizing: border-box;"><code style="box-sizing: border-box; font-family: var(--monospace); text-align: left; vertical-align: initial; border: 1px solid rgb(231, 234, 237); background-color: rgb(243, 244, 244); border-radius: 3px; padding: 0px 2px; font-size: 0.9em;">minimax</code></span><span md-inline="plain" class="md-plain" style="box-sizing: border-box;"> and </span><span md-inline="code" spellcheck="false" class="md-pair-s" style="box-sizing: border-box;"><code style="box-sizing: border-box; font-family: var(--monospace); text-align: left; vertical-align: initial; border: 1px solid rgb(231, 234, 237); background-color: rgb(243, 244, 244); border-radius: 3px; padding: 0px 2px; font-size: 0.9em;">minimax-cn</code></span><span md-inline="plain" class="md-plain" style="box-sizing: border-box;"> providers via </span><span md-inline="code" spellcheck="false" class="md-pair-s" style="box-sizing: border-box;"><code style="box-sizing: border-box; font-family: var(--monospace); text-align: left; vertical-align: initial; border: 1px solid rgb(231, 234, 237); background-color: rgb(243, 244, 244); border-radius: 3px; padding: 0px 2px; font-size: 0.9em;">hermes_cli.auth</code></span></p></li><li class="md-list-item" cid="n322" mdtype="list_item" style="box-sizing: border-box; margin: 0px; position: relative;"><p cid="n323" mdtype="paragraph" class="md-end-block md-p" style="box-sizing: border-box; line-height: inherit; orphans: 1; margin: 0px 0px 0.5rem; white-space: pre-wrap; position: relative;"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">Base URL: </span><span md-inline="code" spellcheck="false" class="md-pair-s" style="box-sizing: border-box;"><code style="box-sizing: border-box; font-family: var(--monospace); text-align: left; vertical-align: initial; border: 1px solid rgb(231, 234, 237); background-color: rgb(243, 244, 244); border-radius: 3px; padding: 0px 2px; font-size: 0.9em;">api.minimaxi.com</code></span><span md-inline="plain" class="md-plain" style="box-sizing: border-box;"> for CN keys (prefix </span><span md-inline="code" spellcheck="false" class="md-pair-s" style="box-sizing: border-box;"><code style="box-sizing: border-box; font-family: var(--monospace); text-align: left; vertical-align: initial; border: 1px solid rgb(231, 234, 237); background-color: rgb(243, 244, 244); border-radius: 3px; padding: 0px 2px; font-size: 0.9em;">sk-cp-</code></span><span md-inline="plain" class="md-plain" style="box-sizing: border-box;"> or </span><span md-inline="code" spellcheck="false" class="md-pair-s" style="box-sizing: border-box;"><code style="box-sizing: border-box; font-family: var(--monospace); text-align: left; vertical-align: initial; border: 1px solid rgb(231, 234, 237); background-color: rgb(243, 244, 244); border-radius: 3px; padding: 0px 2px; font-size: 0.9em;">eyJ</code></span><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">), </span><span md-inline="code" spellcheck="false" class="md-pair-s" style="box-sizing: border-box;"><code style="box-sizing: border-box; font-family: var(--monospace); text-align: left; vertical-align: initial; border: 1px solid rgb(231, 234, 237); background-color: rgb(243, 244, 244); border-radius: 3px; padding: 0px 2px; font-size: 0.9em;">api.minimax.io</code></span><span md-inline="plain" class="md-plain" style="box-sizing: border-box;"> for others</span></p></li><li class="md-list-item" cid="n324" mdtype="list_item" style="box-sizing: border-box; margin: 0px; position: relative;"><p cid="n325" mdtype="paragraph" class="md-end-block md-p" style="box-sizing: border-box; line-height: inherit; orphans: 1; margin: 0px 0px 0.5rem; white-space: pre-wrap; position: relative;"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">Request: </span><span md-inline="code" spellcheck="false" class="md-pair-s" style="box-sizing: border-box;"><code style="box-sizing: border-box; font-family: var(--monospace); text-align: left; vertical-align: initial; border: 1px solid rgb(231, 234, 237); background-color: rgb(243, 244, 244); border-radius: 3px; padding: 0px 2px; font-size: 0.9em;">POST /v1/coding_plan/vlm</code></span><span md-inline="plain" class="md-plain" style="box-sizing: border-box;"> with </span><span md-inline="code" spellcheck="false" class="md-pair-s" style="box-sizing: border-box;"><code style="box-sizing: border-box; font-family: var(--monospace); text-align: left; vertical-align: initial; border: 1px solid rgb(231, 234, 237); background-color: rgb(243, 244, 244); border-radius: 3px; padding: 0px 2px; font-size: 0.9em;">{"prompt": "...", "image_url": "data:image/png;base64,..."}</code></span></p></li><li class="md-list-item" cid="n326" mdtype="list_item" style="box-sizing: border-box; margin: 0px; position: relative;"><p cid="n327" mdtype="paragraph" class="md-end-block md-p" style="box-sizing: border-box; line-height: inherit; orphans: 1; margin: 0px 0px 0.5rem; white-space: pre-wrap; position: relative;"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">Response: </span><span md-inline="code" spellcheck="false" class="md-pair-s" style="box-sizing: border-box;"><code style="box-sizing: border-box; font-family: var(--monospace); text-align: left; vertical-align: initial; border: 1px solid rgb(231, 234, 237); background-color: rgb(243, 244, 244); border-radius: 3px; padding: 0px 2px; font-size: 0.9em;">{"content": "...", "base_resp": {"status_code": 0, "status_msg": "success"}}</code></span></p></li><li class="md-list-item" cid="n328" mdtype="list_item" style="box-sizing: border-box; margin: 0px; position: relative;"><p cid="n329" mdtype="paragraph" class="md-end-block md-p" style="box-sizing: border-box; line-height: inherit; orphans: 1; margin: 0px 0px 0.5rem; white-space: pre-wrap; position: relative;"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">Return a mock wrapper object with </span><span md-inline="code" spellcheck="false" class="md-pair-s" style="box-sizing: border-box;"><code style="box-sizing: border-box; font-family: var(--monospace); text-align: left; vertical-align: initial; border: 1px solid rgb(231, 234, 237); background-color: rgb(243, 244, 244); border-radius: 3px; padding: 0px 2px; font-size: 0.9em;">.choices[0].message.content</code></span><span md-inline="plain" class="md-plain" style="box-sizing: border-box;"> to be compatible with existing </span><span md-inline="code" spellcheck="false" class="md-pair-s" style="box-sizing: border-box;"><code style="box-sizing: border-box; font-family: var(--monospace); text-align: left; vertical-align: initial; border: 1px solid rgb(231, 234, 237); background-color: rgb(243, 244, 244); border-radius: 3px; padding: 0px 2px; font-size: 0.9em;">extract_content_or_reasoning</code></span><span md-inline="plain" class="md-plain" style="box-sizing: border-box;"> logic</span></p></li></ul><h3 cid="n330" mdtype="heading" class="md-end-block md-heading" style="font-style: normal; font-variant-caps: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; break-after: avoid-page; break-inside: avoid; orphans: 4; font-size: 1.5em; margin-top: 1rem; margin-bottom: 1rem; position: relative; font-weight: bold; line-height: 1.43; cursor: text; caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">Severity</span></h3><p cid="n331" mdtype="paragraph" class="md-end-block md-p" style="font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-line: none; text-decoration-thickness: auto; text-decoration-style: solid; box-sizing: border-box; line-height: inherit; orphans: 4; margin: 0.8em 0px; position: relative; caret-color: rgb(0, 122, 255); color: rgb(51, 51, 51); font-family: &quot;Open Sans&quot;, &quot;Clear Sans&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, &quot;Segoe UI Emoji&quot;, &quot;SF Pro&quot;, sans-serif; background-color: rgb(255, 255, 255);"><span md-inline="plain" class="md-plain" style="box-sizing: border-box;">Medium — vision is a core feature, and MiniMax-M2.7 is a capable multimodal model that should work.</span></p>

Steps to Reproduce

  1. Set imageModel: minimax-portal/MiniMax-M2.7 in config.yaml
  2. Send an image to Hermes Agent for vision analysis (e.g. "describe this image")
  3. Agent returns an empty or meaningless response — the image is not analyzed
  4. The MiniMax API returns 200 but the vision推理 does not occur

Expected Behavior

MiniMax-M2.7 should correctly analyze images when configured as the vision model, returning a text description of the image content.

Actual Behavior

Hermes Agent calls /anthropic/v1/messages with image content blocks. MiniMax's Anthropic-compatible endpoint ignores the image blocks and returns a text-only response, so no visual understanding occurs.

Affected Component

Other, Tools (terminal, file ops, web, code execution, etc.)

Messaging Platform (if gateway-related)

N/A (CLI only)

Debug Report

Report:https://paste.rs/t1LtG
agent.log:https://paste.rs/xPd6L
gateway.log:https://paste.rs/UxfUQ

Operating System

macOS 26.4.1 (BuildVersion 25E253)

Python Version

No response

Hermes Version

0.11.0

Additional Logs / Traceback (optional)

Verification using MiniMax VLM endpoint directly succeeds:


POST /v1/coding_plan/vlm
Request:  {"prompt": "What color is this image?", "image_url": "data:image/png;base64,..."}
Response: {"content": "The color of this image is red.", "base_resp": {"status_code": 0}}


The same image sent via `/anthropic/v1/messages` (current Hermes behavior) returns 200 but produces no visual analysis.

Root Cause Analysis (optional)

The issue is architectural: Hermes Agent routes all vision requests through call_llm with task="vision", which eventually calls client.chat.completions.create. For MiniMax, this hits the Anthropic-compatible endpoint which does not support multimodal input. OpenClaw solves this by routing MiniMax vision requests to the dedicated /v1/coding_plan/vlm endpoint instead.

Proposed Fix (optional)

Add a MiniMax VLM path in agent/auxiliary_client.py:

  1. Detect MiniMax as the vision provider in resolve_vision_provider_client
  2. Return a sentinel object instead of an OpenAI client
  3. In call_llm / async_call_llm, detect the sentinel and bypass client.chat.completions.create
  4. Call /v1/coding_plan/vlm directly with {"prompt": ..., "image_url": ...}
  5. Wrap the response in a mock ChatCompletion object for compatibility

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

extent analysis

TL;DR

The issue can be fixed by adding a MiniMax VLM code path in agent/auxiliary_client.py to bypass the standard client.chat.completions.create and call /v1/coding_plan/vlm directly.

Guidance

  • Identify the vision provider in resolve_vision_provider_client and return a sentinel object for MiniMax.
  • Modify call_llm / async_call_llm to detect the sentinel and bypass client.chat.completions.create.
  • Call /v1/coding_plan/vlm directly with the prompt and image URL.
  • Wrap the response in a mock ChatCompletion object for compatibility.

Example

# agent/auxiliary_client.py
def resolve_vision_provider_client(provider):
    if provider == 'minimax':
        return SentinelObject()  # Return a sentinel object for MiniMax

def call_llm(task, prompt, image_url):
    if task == 'vision' and isinstance(client, SentinelObject):
        # Bypass client.chat.completions.create and call /v1/coding_plan/vlm directly
        response = requests.post('/v1/coding_plan/vlm', json={'prompt': prompt, 'image_url': image_url})
        return MockChatCompletion(response.json())

Notes

  • The proposed fix requires modifying the agent/auxiliary_client.py file to add a MiniMax VLM code path.
  • The fix assumes that the /v1/coding_plan/vlm endpoint is available and functional.

Recommendation

Apply the proposed workaround by adding a MiniMax VLM code path in agent/auxiliary_client.py to fix the issue.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - 💡(How to fix) Fix [Bug]: MiniMax VLM (Vision) Not Working — Uses Wrong API Endpoint [1 participants]