ollama - 💡(How to fix) Fix /v1/chat/completions not working with qwen3.5 multimodal model [2 comments, 2 participants]

ollama2026-03-09 08:04:53

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

ollama/ollama#14728•Fetched 2026-04-08 00:32:26

View on GitHub

Comments

Participants

Timeline

Reactions

Author

sweihub

Participants

rick-github

sweihub

Timeline (top)

commented ×2closed ×1labeled ×1

Error Message

[2026-03-09 16:03:19.776] [DEBUG] [ureq::stream:395] connecting to localhost:11434 at 127.0.0.1:11434 [2026-03-09 16:03:19.776] [DEBUG] [ureq::stream:202] created stream: Stream(TcpStream { addr: 127.0.0.1:57784, peer: 127.0.0.1:11434, fd: 3 }) [2026-03-09 16:03:19.776] [DEBUG] [ureq::unit:261] sending request POST http://localhost:11434/v1/chat/completions [2026-03-09 16:03:19.776] [DEBUG] [ureq::unit:480] writing prelude: POST /v1/chat/completions HTTP/1.1 Host: localhost:11434 User-Agent: ureq/2.12.1 Accept: / Content-Type: application/json accept-encoding: gzip Content-Length: 3783642 [2026-03-09 16:03:45.965] [DEBUG] [ureq::response:396] Body entirely buffered (length: 206) [2026-03-09 16:03:45.965] [DEBUG] [ureq::pool:130] adding stream to pool: http|localhost|11434 -> Stream(TcpStream { addr: 127.0.0.1:57784, peer: 127.0.0.1:11434, fd: 3 }) [2026-03-09 16:03:45.965] [DEBUG] [ureq::unit:314] response 500 to POST http://localhost:11434/v1/chat/completions [2026-03-09 16:03:45.965] [DEBUG] [ureq::stream:322] dropping stream: Stream(TcpStream { addr: 127.0.0.1:57784, peer: 127.0.0.1:11434, fd: 3 }) Error: Failed to OCR page 1

Caused by: 0: Failed to connect to VML server at http://localhost:11434/v1 1: http://localhost:11434/v1/chat/completions: status code 500

Root Cause

Caused by: 0: Failed to connect to VML server at http://localhost:11434/v1 1: http://localhost:11434/v1/chat/completions: status code 500

Code Example

[2026-03-09 16:03:19.776] [DEBUG] [ureq::stream:395] connecting to localhost:11434 at 127.0.0.1:11434
[2026-03-09 16:03:19.776] [DEBUG] [ureq::stream:202] created stream: Stream(TcpStream { addr: 127.0.0.1:57784, peer: 127.0.0.1:11434, fd: 3 })
[2026-03-09 16:03:19.776] [DEBUG] [ureq::unit:261] sending request POST http://localhost:11434/v1/chat/completions
[2026-03-09 16:03:19.776] [DEBUG] [ureq::unit:480] writing prelude: POST /v1/chat/completions HTTP/1.1
Host: localhost:11434
User-Agent: ureq/2.12.1
Accept: */*
Content-Type: application/json
accept-encoding: gzip
Content-Length: 3783642
[2026-03-09 16:03:45.965] [DEBUG] [ureq::response:396] Body entirely buffered (length: 206)
[2026-03-09 16:03:45.965] [DEBUG] [ureq::pool:130] adding stream to pool: http|localhost|11434 -> Stream(TcpStream { addr: 127.0.0.1:57784, peer: 127.0.0.1:11434, fd: 3 })
[2026-03-09 16:03:45.965] [DEBUG] [ureq::unit:314] response 500 to POST http://localhost:11434/v1/chat/completions
[2026-03-09 16:03:45.965] [DEBUG] [ureq::stream:322] dropping stream: Stream(TcpStream { addr: 127.0.0.1:57784, peer: 127.0.0.1:11434, fd: 3 })
Error: Failed to OCR page 1

Caused by:
    0: Failed to connect to VML server at http://localhost:11434/v1
    1: http://localhost:11434/v1/chat/completions: status code 500

---

RAW_BUFFERClick to expand / collapse

What is the issue?

Example: https://docs.ollama.com/api/openai-compatibility#/v1/chat/completions-with-vision-example

Not working with qwen3.5:27b which is a multimodal LLM.

[2026-03-09 16:03:19.776] [DEBUG] [ureq::stream:395] connecting to localhost:11434 at 127.0.0.1:11434
[2026-03-09 16:03:19.776] [DEBUG] [ureq::stream:202] created stream: Stream(TcpStream { addr: 127.0.0.1:57784, peer: 127.0.0.1:11434, fd: 3 })
[2026-03-09 16:03:19.776] [DEBUG] [ureq::unit:261] sending request POST http://localhost:11434/v1/chat/completions
[2026-03-09 16:03:19.776] [DEBUG] [ureq::unit:480] writing prelude: POST /v1/chat/completions HTTP/1.1
Host: localhost:11434
User-Agent: ureq/2.12.1
Accept: */*
Content-Type: application/json
accept-encoding: gzip
Content-Length: 3783642
[2026-03-09 16:03:45.965] [DEBUG] [ureq::response:396] Body entirely buffered (length: 206)
[2026-03-09 16:03:45.965] [DEBUG] [ureq::pool:130] adding stream to pool: http|localhost|11434 -> Stream(TcpStream { addr: 127.0.0.1:57784, peer: 127.0.0.1:11434, fd: 3 })
[2026-03-09 16:03:45.965] [DEBUG] [ureq::unit:314] response 500 to POST http://localhost:11434/v1/chat/completions
[2026-03-09 16:03:45.965] [DEBUG] [ureq::stream:322] dropping stream: Stream(TcpStream { addr: 127.0.0.1:57784, peer: 127.0.0.1:11434, fd: 3 })
Error: Failed to OCR page 1

Caused by:
    0: Failed to connect to VML server at http://localhost:11434/v1
    1: http://localhost:11434/v1/chat/completions: status code 500

Relevant log output

OS

Ubuntu 22.04.4 LTS

GPU

4 x A100

CPU

No response

Ollama version

ollama version is 0.17.4

extent analysis

Fix Plan

To resolve the issue with the multimodal LLM qwen3.5:27b, we need to adjust the configuration and potentially modify the code handling the POST /v1/chat/completions request. Here are the steps:

Increase the request timeout: The default timeout might be too low for large requests. Increase the timeout value in your ureq configuration.
Check VML server status: Ensure the VML server at http://localhost:11434/v1 is running and accessible.
Modify the request payload: The Content-Length header indicates a large payload. Consider compressing the request body or splitting it into smaller chunks.
Handle 500 status code: Implement error handling for the 500 status code response from the /v1/chat/completions endpoint.

Example code snippet in Rust to increase the request timeout and handle errors:

use ureq::Agent;

let agent = Agent::new();
agent.timeout(30000); // 30 seconds

let res = agent.post("http://localhost:11434/v1/chat/completions")
    .set("Content-Type", "application/json")
    .send_string(&request_body);

match res {
    Ok(response) => {
        if response.status() == 500 {
            // Handle 500 status code error
            println!("Error: {}", response.into_string().unwrap());
        } else {
            // Process successful response
        }
    }
    Err(err) => {
        // Handle request error
        println!("Error: {}", err);
    }
}

Verification

To verify the fix, send a test request to the /v1/chat/completions endpoint with a large payload and check the response status code. If the response is successful, the fix has worked.

Extra Tips

Monitor the VML server logs for any errors or issues.
Consider implementing retries for failed requests with a backoff strategy.
Review the ureq documentation for any configuration options that may help with large requests.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #ssr #installation #tensor shape #autograd error #output truncation #response parsing #generation error #database connection

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

ollama - 💡(How to fix) Fix /v1/chat/completions not working with qwen3.5 multimodal model [2 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Code Example

What is the issue?

Relevant log output

OS

GPU

CPU

Ollama version

extent analysis

Fix Plan

Verification

Extra Tips

Still need to ship something?

TRENDING

ollama - 💡(How to fix) Fix /v1/chat/completions not working with qwen3.5 multimodal model [2 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Code Example

What is the issue?

Relevant log output

OS

GPU

CPU

Ollama version

extent analysis

Fix Plan

Verification

Extra Tips

Still need to ship something?

RELATED_DISCOVERY

TRENDING