ollama - 💡(How to fix) Fix Huge difference in image input tokens with local Qwen3.5 versions when format="json" specified [3 comments, 2 participants]

Code Example

IMG=$(base64 < Image_001.jpeg | tr -d '\n')

echo "{\"model\": \"qwen3.5:27b\", \"messages\": [{\"role\": \"user\",  \"content\": \"\",\"images\": [\"$IMG\"] }],\"stream\": false}" | curl -X POST http://0.0.0.0:11434/api/chat -H "Content-Type:
 application/json" -d @-
{"model":"qwen3.5:27b","created_at":"2026-04-14T12:57:14.954993416Z","message":{"..."},"done":true,"done_reason":"stop","total_duration":160473406601,"load_duration":6219169752,"prompt_eval_count":432,"prompt_eval_duration":724455268,"eval_count":2003,"eval_duration":152220636272}


echo "{\"model\": \"qwen3.5:27b-q4_K_M\", \"messages\": [{\"role\": \"user\",  \"content\": \"\",\"images\": [\"$IMG\"] }],\"stream\": false, \"format\": \"json\"}" | curl -X POST http://0.0.0.0:11434/api/chat -H "Content-Type: application/json" -d @-
{"model":"qwen3.5:27b-q4_K_M","created_at":"2026-04-14T21:27:56.795489648Z","message":{"..."},"done":true,"done_reason":"stop","total_duration":156916654288,"load_duration":8222672080,"prompt_eval_count":1789,"prompt_eval_duration":2405578208,"eval_count":300,"eval_duration":26354827648}

echo "{\"model\": \"qwen3.5:397b-cloud\", \"messages\": [{\"role\": \"user\",  \"content\": \"\",\"images\": [\"$IMG\"] }],\"stream\": false}" | curl -X POST http://0.0.0.0:11434/api/chat -H "Content-Type: application/json" -H "Authorization: Bearer $OLLAMA_API_KEY" -d @-
{"model":"qwen3.5:397b","created_at":"2026-04-14T12:53:46.888392524Z","message":{"..."},"done":true,"done_reason":"stop","total_duration":15905907345,"prompt_eval_count":432,"eval_count":775}

echo "{\"model\": \"qwen3.5:397b-cloud\", \"messages\": [{\"role\": \"user\",  \"content\": \"\",\"images\": [\"$IMG\"] }],\"stream\": false, \"format\": \"json\"}" | curl -X POST http://0.0.0.0:11434/api/chat -H "Content-Type: application/json" -H "Authorization: Bearer "$OLLAMA_API_KEY -d @-
{"model":"qwen3.5:397b","created_at":"2026-04-14T21:51:49.78707483Z","message":{"..."},"done":true,"done_reason":"stop","total_duration":15994634694,"prompt_eval_count":432,"eval_count":889}

echo "{\"model\": \"gemma4:31b-it-q4_K_M\", \"messages\": [{\"role\": \"user\",  \"content\": \"\",\"images\": [\"$IMG\"] }],\"stream\": false}" | curl -X POST http://0.0.0.0:11434/api/chat -H "Content-Type: application/json" -H "Authorization: Bearer "$OLLAMA_API_KEY -d @-
{"model":"gemma4:31b-it-q4_K_M","created_at":"2026-04-14T21:56:58.18721552Z","message":{"..."},"done":true,"done_reason":"stop","total_duration":90482110976,"load_duration":285760912,"prompt_eval_count":281,"prompt_eval_duration":99862896,"eval_count":915,"eval_duration":8977992150

echo "{\"model\": \"gemma4:31b-it-q4_K_M\", \"messages\": [{\"role\": \"user\",  \"content\": \"\",\"images\": [\"$IMG\"] }],\"stream\": false, \"format\": \"json\"}" | curl -X POST http://0.0.0.0:11434/api/chat -H "Content-Type: application/json" -H "Authorization: Bearer "$OLLAMA_API_KEY -d @-
{"model":"gemma4:31b-it-q4_K_M","created_at":"2026-04-14T21:59:13.743806688Z","message":{"..."},"done":true,"done_reason":"stop","total_duration":56701847024,"load_duration":220037664,"prompt_eval_count":286,"prompt_eval_duration":116230352,"eval_count":28,"eval_duration":2635753616}

---

What is the issue?

With local versions of Qwen3.5 (9b, 27b, 35b, 122b), using a blank prompt and a single image (attached) as input, I notice a huge difference in input tokens depending whether format is unspecified or assigned to "json" (prompt_eval_count increases from 432 to 1789).

This does not occur with the cloud version (397b-cloud) (prompt_eval_count remains unchanged at 432)

This does not occur either with other models such as gemma4 (prompt_eval_count slightly increases from 281 to 286)

This phenomenon can be easily reproduced using the attached image an the curl commands below.

What is the reason for that?

IMG=$(base64 < Image_001.jpeg | tr -d '\n')

echo "{\"model\": \"qwen3.5:27b\", \"messages\": [{\"role\": \"user\",  \"content\": \"\",\"images\": [\"$IMG\"] }],\"stream\": false}" | curl -X POST http://0.0.0.0:11434/api/chat -H "Content-Type:
 application/json" -d @-
{"model":"qwen3.5:27b","created_at":"2026-04-14T12:57:14.954993416Z","message":{"..."},"done":true,"done_reason":"stop","total_duration":160473406601,"load_duration":6219169752,"prompt_eval_count":432,"prompt_eval_duration":724455268,"eval_count":2003,"eval_duration":152220636272}


echo "{\"model\": \"qwen3.5:27b-q4_K_M\", \"messages\": [{\"role\": \"user\",  \"content\": \"\",\"images\": [\"$IMG\"] }],\"stream\": false, \"format\": \"json\"}" | curl -X POST http://0.0.0.0:11434/api/chat -H "Content-Type: application/json" -d @-
{"model":"qwen3.5:27b-q4_K_M","created_at":"2026-04-14T21:27:56.795489648Z","message":{"..."},"done":true,"done_reason":"stop","total_duration":156916654288,"load_duration":8222672080,"prompt_eval_count":1789,"prompt_eval_duration":2405578208,"eval_count":300,"eval_duration":26354827648}

echo "{\"model\": \"qwen3.5:397b-cloud\", \"messages\": [{\"role\": \"user\",  \"content\": \"\",\"images\": [\"$IMG\"] }],\"stream\": false}" | curl -X POST http://0.0.0.0:11434/api/chat -H "Content-Type: application/json" -H "Authorization: Bearer $OLLAMA_API_KEY" -d @-
{"model":"qwen3.5:397b","created_at":"2026-04-14T12:53:46.888392524Z","message":{"..."},"done":true,"done_reason":"stop","total_duration":15905907345,"prompt_eval_count":432,"eval_count":775}

echo "{\"model\": \"qwen3.5:397b-cloud\", \"messages\": [{\"role\": \"user\",  \"content\": \"\",\"images\": [\"$IMG\"] }],\"stream\": false, \"format\": \"json\"}" | curl -X POST http://0.0.0.0:11434/api/chat -H "Content-Type: application/json" -H "Authorization: Bearer "$OLLAMA_API_KEY -d @-
{"model":"qwen3.5:397b","created_at":"2026-04-14T21:51:49.78707483Z","message":{"..."},"done":true,"done_reason":"stop","total_duration":15994634694,"prompt_eval_count":432,"eval_count":889}

echo "{\"model\": \"gemma4:31b-it-q4_K_M\", \"messages\": [{\"role\": \"user\",  \"content\": \"\",\"images\": [\"$IMG\"] }],\"stream\": false}" | curl -X POST http://0.0.0.0:11434/api/chat -H "Content-Type: application/json" -H "Authorization: Bearer "$OLLAMA_API_KEY -d @-
{"model":"gemma4:31b-it-q4_K_M","created_at":"2026-04-14T21:56:58.18721552Z","message":{"..."},"done":true,"done_reason":"stop","total_duration":90482110976,"load_duration":285760912,"prompt_eval_count":281,"prompt_eval_duration":99862896,"eval_count":915,"eval_duration":8977992150

echo "{\"model\": \"gemma4:31b-it-q4_K_M\", \"messages\": [{\"role\": \"user\",  \"content\": \"\",\"images\": [\"$IMG\"] }],\"stream\": false, \"format\": \"json\"}" | curl -X POST http://0.0.0.0:11434/api/chat -H "Content-Type: application/json" -H "Authorization: Bearer "$OLLAMA_API_KEY -d @-
{"model":"gemma4:31b-it-q4_K_M","created_at":"2026-04-14T21:59:13.743806688Z","message":{"..."},"done":true,"done_reason":"stop","total_duration":56701847024,"load_duration":220037664,"prompt_eval_count":286,"prompt_eval_duration":116230352,"eval_count":28,"eval_duration":2635753616}

(note; the attached image is a fake bank transfer order with synthetic imaginary data)

Relevant log output

OS

Linux

GPU

NVIDIA GB10 (DGX SPARK)

CPU

No response

Ollama version

0.20.4

extent analysis

TL;DR

The issue can be mitigated by removing the "format": "json" parameter from the API request.

Guidance

The large difference in prompt_eval_count is observed only when the "format" is specified as "json" in the API request for local versions of Qwen3.5.
The issue does not occur with the cloud version of Qwen3.5 or with other models like gemma4, suggesting a potential model-specific or version-specific bug.
To verify, try removing the "format": "json" parameter from the API request and check if the prompt_eval_count remains consistent.
If the issue persists, further investigation into the model's implementation or the API's handling of the "format" parameter may be necessary.

Example

No code snippet is provided as the issue seems to be related to the API request parameters rather than code implementation.

Notes

The root cause of the issue is unclear, but it appears to be related to the interaction between the Qwen3.5 model and the "format": "json" parameter. Further debugging or investigation into the model's implementation or the API's handling of this parameter may be necessary to fully resolve the issue.

Recommendation

Apply workaround: Remove the "format": "json" parameter from the API request, as it seems to cause the discrepancy in prompt_eval_count for local versions of Qwen3.5. This change can help mitigate the issue until a more permanent fix is found.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

ollama - 💡(How to fix) Fix Huge difference in image input tokens with local Qwen3.5 versions when format="json" specified [3 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Code Example

What is the issue?

Relevant log output

OS

GPU

CPU

Ollama version

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

ollama - 💡(How to fix) Fix Huge difference in image input tokens with local Qwen3.5 versions when format="json" specified [3 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Code Example

What is the issue?

Relevant log output

OS

GPU

CPU

Ollama version

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING