hermes - 💡(How to fix) Fix [Feature Request]: Support vision/image input for kimi-k2.6 via Kimi Code CLI endpoint or Moonshot Open Platform

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Kimi Code CLI officially supports capabilities = ["image_in"] in its model configuration (docs):

[models.kimi-for-coding]
provider = "kimi-for-coding"
model = "kimi-for-coding"
max_context_size = 262144
capabilities = ["thinking", "image_in"]

However, the Kimi Code API endpoint (https://api.kimi.com/coding/v1) does not expose image input through the API — this is a client-side feature for the CLI's interactive terminal mode only.

The Moonshot Open Platform (https://api.moonshot.cn/v1) does support vision/multimodal input, as confirmed by the official platform comparison:

PlatformBase URLBest For
Kimi Codehttps://api.kimi.com/coding/v1Terminal/IDE Agent coding
Moonshot Open Platformhttps://api.moonshot.cn/v1Product integration, multimodal app development

Root Cause

Kimi Code CLI officially supports capabilities = ["image_in"] in its model configuration (docs):

[models.kimi-for-coding]
provider = "kimi-for-coding"
model = "kimi-for-coding"
max_context_size = 262144
capabilities = ["thinking", "image_in"]

However, the Kimi Code API endpoint (https://api.kimi.com/coding/v1) does not expose image input through the API — this is a client-side feature for the CLI's interactive terminal mode only.

The Moonshot Open Platform (https://api.moonshot.cn/v1) does support vision/multimodal input, as confirmed by the official platform comparison:

PlatformBase URLBest For
Kimi Codehttps://api.kimi.com/coding/v1Terminal/IDE Agent coding
Moonshot Open Platformhttps://api.moonshot.cn/v1Product integration, multimodal app development

Code Example

_PROVIDERS_WITHOUT_VISION = frozenset({"kimi-coding", "kimi-coding-cn"})

---

[models.kimi-for-coding]
provider = "kimi-for-coding"
model = "kimi-for-coding"
max_context_size = 262144
capabilities = ["thinking", "image_in"]
RAW_BUFFERClick to expand / collapse

Problem

Hermes Agent currently hardcodes kimi-coding provider as vision-disabled in vision_tools.py:

_PROVIDERS_WITHOUT_VISION = frozenset({"kimi-coding", "kimi-coding-cn"})

This prevents users from using image input even when the underlying model (kimi-k2.6) supports vision capabilities.

Context

Kimi Code CLI officially supports capabilities = ["image_in"] in its model configuration (docs):

[models.kimi-for-coding]
provider = "kimi-for-coding"
model = "kimi-for-coding"
max_context_size = 262144
capabilities = ["thinking", "image_in"]

However, the Kimi Code API endpoint (https://api.kimi.com/coding/v1) does not expose image input through the API — this is a client-side feature for the CLI's interactive terminal mode only.

The Moonshot Open Platform (https://api.moonshot.cn/v1) does support vision/multimodal input, as confirmed by the official platform comparison:

PlatformBase URLBest For
Kimi Codehttps://api.kimi.com/coding/v1Terminal/IDE Agent coding
Moonshot Open Platformhttps://api.moonshot.cn/v1Product integration, multimodal app development

Current Behavior

When a user sends an image to Hermes with model.provider: kimi-coding, the agent responds:

"vision 工具未配置" / "No LLM provider configured for task=vision"

Desired Behavior

Option A (preferred): Allow kimi-coding provider to support vision by querying the model's actual capabilities dynamically, rather than hardcoding it as disabled. If the endpoint doesn't support it, fall back gracefully.

Option B: Add native support for Moonshot Open Platform (moonshot provider) with vision enabled, so users can configure it as either:

  • Primary provider (replace kimi-coding)
  • Auxiliary vision provider (auxiliary.vision.provider: moonshot)

Environment

  • Hermes version: latest
  • Provider: kimi-coding
  • Model: kimi-k2.6
  • OS: macOS

Additional Notes

  • The models_dev_cache.json currently marks all kimi-for-coding models with attachment=False
  • Users who only have a Kimi Code CLI API Key cannot access vision capabilities without applying for a separate Moonshot Open Platform Key
  • This creates friction for users who expect the same model (kimi-k2.6) to have consistent capabilities across different endpoints

Would love to see this addressed in a future release. Happy to provide more details or test beta builds.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - 💡(How to fix) Fix [Feature Request]: Support vision/image input for kimi-k2.6 via Kimi Code CLI endpoint or Moonshot Open Platform