hermes - 💡(How to fix) Fix [Feature Request]: Support vision/image input for kimi-k2.6 via Kimi Code CLI endpoint or Moonshot Open Platform

hermes2026-05-16 13:04:46

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

Kimi Code CLI officially supports capabilities = ["image_in"] in its model configuration (docs):

[models.kimi-for-coding]
provider = "kimi-for-coding"
model = "kimi-for-coding"
max_context_size = 262144
capabilities = ["thinking", "image_in"]

However, the Kimi Code API endpoint (https://api.kimi.com/coding/v1) does not expose image input through the API — this is a client-side feature for the CLI's interactive terminal mode only.

The Moonshot Open Platform (https://api.moonshot.cn/v1) does support vision/multimodal input, as confirmed by the official platform comparison:

Platform	Base URL	Best For
Kimi Code	`https://api.kimi.com/coding/v1`	Terminal/IDE Agent coding
Moonshot Open Platform	`https://api.moonshot.cn/v1`	Product integration, multimodal app development

Root Cause

Kimi Code CLI officially supports capabilities = ["image_in"] in its model configuration (docs):

[models.kimi-for-coding]
provider = "kimi-for-coding"
model = "kimi-for-coding"
max_context_size = 262144
capabilities = ["thinking", "image_in"]

However, the Kimi Code API endpoint (https://api.kimi.com/coding/v1) does not expose image input through the API — this is a client-side feature for the CLI's interactive terminal mode only.

The Moonshot Open Platform (https://api.moonshot.cn/v1) does support vision/multimodal input, as confirmed by the official platform comparison:

Platform	Base URL	Best For
Kimi Code	`https://api.kimi.com/coding/v1`	Terminal/IDE Agent coding
Moonshot Open Platform	`https://api.moonshot.cn/v1`	Product integration, multimodal app development

Code Example

_PROVIDERS_WITHOUT_VISION = frozenset({"kimi-coding", "kimi-coding-cn"})

---

[models.kimi-for-coding]
provider = "kimi-for-coding"
model = "kimi-for-coding"
max_context_size = 262144
capabilities = ["thinking", "image_in"]

RAW_BUFFERClick to expand / collapse

Problem

Hermes Agent currently hardcodes kimi-coding provider as vision-disabled in vision_tools.py:

_PROVIDERS_WITHOUT_VISION = frozenset({"kimi-coding", "kimi-coding-cn"})

This prevents users from using image input even when the underlying model (kimi-k2.6) supports vision capabilities.

Context

Kimi Code CLI officially supports capabilities = ["image_in"] in its model configuration (docs):

[models.kimi-for-coding]
provider = "kimi-for-coding"
model = "kimi-for-coding"
max_context_size = 262144
capabilities = ["thinking", "image_in"]

However, the Kimi Code API endpoint (https://api.kimi.com/coding/v1) does not expose image input through the API — this is a client-side feature for the CLI's interactive terminal mode only.

The Moonshot Open Platform (https://api.moonshot.cn/v1) does support vision/multimodal input, as confirmed by the official platform comparison:

Platform	Base URL	Best For
Kimi Code	`https://api.kimi.com/coding/v1`	Terminal/IDE Agent coding
Moonshot Open Platform	`https://api.moonshot.cn/v1`	Product integration, multimodal app development

Current Behavior

When a user sends an image to Hermes with model.provider: kimi-coding, the agent responds:

"vision 工具未配置" / "No LLM provider configured for task=vision"

Desired Behavior

Option A (preferred): Allow kimi-coding provider to support vision by querying the model's actual capabilities dynamically, rather than hardcoding it as disabled. If the endpoint doesn't support it, fall back gracefully.

Option B: Add native support for Moonshot Open Platform (moonshot provider) with vision enabled, so users can configure it as either:

Primary provider (replace kimi-coding)
Auxiliary vision provider (auxiliary.vision.provider: moonshot)

Environment

Hermes version: latest
Provider: kimi-coding
Model: kimi-k2.6
OS: macOS

Additional Notes

The models_dev_cache.json currently marks all kimi-for-coding models with attachment=False
Users who only have a Kimi Code CLI API Key cannot access vision capabilities without applying for a separate Moonshot Open Platform Key
This creates friction for users who expect the same model (kimi-k2.6) to have consistent capabilities across different endpoints

Would love to see this addressed in a future release. Happy to provide more details or test beta builds.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #pipeline error #runtime error #dependency conflict #environment setup

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix [Feature Request]: Support vision/image input for kimi-k2.6 via Kimi Code CLI endpoint or Moonshot Open Platform

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Code Example

Problem

Context

Current Behavior

Desired Behavior

Environment

Additional Notes

Still need to ship something?

TRENDING

hermes - 💡(How to fix) Fix [Feature Request]: Support vision/image input for kimi-k2.6 via Kimi Code CLI endpoint or Moonshot Open Platform

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Code Example

Problem

Context

Current Behavior

Desired Behavior

Environment

Additional Notes

Still need to ship something?

RELATED_DISCOVERY

TRENDING