litellm - 💡(How to fix) Fix [Bug]: minimax-m2.7 via Ollama Cloud fails on 2nd+ request with Internal Server Error [1 comments, 2 participants]

litellm2026-03-24 18:58:41

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

BerriAI/litellm#24533•Fetched 2026-04-08 01:27:17

View on GitHub

Comments

Participants

Timeline

Reactions

Author

orrinwitt

Participants

ariqpradipa

orrinwitt

Timeline (top)

cross-referenced ×2commented ×1labeled ×1

When using minimax-m2.7:cloud (Ollama's naming — the :cloud suffix denotes the cloud-hosted version) through the Ollama Cloud API, the first request succeeds but every subsequent request fails with Internal Server Error. This includes starting a brand new session — still only one response before it breaks. This rules out conversation history as the cause.

Error Message

litellm.APIConnectionError: Ollama_chatException - {"error":"Internal Server Error (ref: <uuid>)"}

Root Cause

Code Example

import litellm
   litellm.api_base = "https://ollama.com/api"
   litellm.api_key = "your-ollama-cloud-api-key"

---

litellm.APIConnectionError: Ollama_chatException - {"error":"Internal Server Error (ref: <uuid>)"}

---

# thinking is set from reasoning_content in transform_request
if reasoning_content is not None:
    ollama_message["thinking"] = reasoning_content

# response remaps 'thinking' field to 'reasoning_content'
response_json_message["reasoning_content"] = response_json_message.get("thinking")

RAW_BUFFERClick to expand / collapse

Python version: 3.12 LiteLLM version: 1.82.4 OS: Linux

Description

Steps to Reproduce

Configure LiteLLM with the Ollama provider pointing to Ollama Cloud:

import litellm
litellm.api_base = "https://ollama.com/api"
litellm.api_key = "your-ollama-cloud-api-key"

Send a first chat completion request with model minimax-m2.7:cloud — succeeds
Send a second request (same or new conversation) — fails with Internal Server Error

Expected Behavior

Both requests should succeed.

Actual Behavior

First request: ✅ succeeds Second request: ❌ fails with:

litellm.APIConnectionError: Ollama_chatException - {"error":"Internal Server Error (ref: <uuid>)"}

Retries (3/3) also fail. Even starting a brand new session exhibits the same pattern.

Additional Context

Key observation: The same underlying model is available via OpenRouter as minimax/minimax-m2.7 and works correctly there. This suggests the model itself is fine — the bug is in LiteLLM's Ollama adapter handling of this specific cloud model's streaming response.

Working models via Ollama Cloud:

GLM-5 (also a thinking model) — ✅ works perfectly, multiple requests
Non-thinking models — ✅ work fine

Not working:

minimax-m2.7:cloud via Ollama Cloud — ❌ fails after first request
Same model via OpenRouter (minimax/minimax-m2.7) — ✅ works

Hypothesis: minimax-m2.7:cloud may stream its thinking/reasoning content differently from other thinking models (e.g., GLM-5), causing the Ollama chat transformation or streaming iterator to corrupt subsequent requests. This could be related to how the thinking content field is handled differently between the two models.

Relevant Code

The Ollama chat transformation in litellm/llms/ollama/chat/transformation.py has reasoning content handling:

# thinking is set from reasoning_content in transform_request
if reasoning_content is not None:
    ollama_message["thinking"] = reasoning_content

# response remaps 'thinking' field to 'reasoning_content'
response_json_message["reasoning_content"] = response_json_message.get("thinking")

The streaming iterator OllamaChatCompletionResponseIterator tracks started_reasoning_content and finished_reasoning_content flags to strip <think> XML tags from content. If the model streams thinking differently, these flags could get into a bad state.

Related Issues

Similar: litellm#15399 — "Ollama Cloud Models Streaming Chunk Parsing Failure" (closed, about deepseek-v3.1:671b-cloud)

Investigation Needed

A raw streaming log of minimax-m2.7:cloud vs GLM-5 chunks from Ollama Cloud would help identify the difference. The issue likely requires comparing the actual SSE chunk format between the two models to see if minimax-m2.7:cloud uses a non-standard thinking field format or streams it differently.

extent analysis

Fix Plan

To address the issue with minimax-m2.7:cloud failing after the first request, we need to modify the Ollama chat transformation and streaming iterator in litellm to correctly handle the thinking field for this specific model.

Step 1: Modify `transform_request` in `transformation.py`

Update the thinking field handling to accommodate potential differences in streaming formats:

if reasoning_content is not None:
    ollama_message["thinking"] = [reasoning_content]  # Ensure it's a list

Step 2: Update `OllamaChatCompletionResponseIterator`

Modify the flags and parsing logic to handle potential variations in thinking field streaming:

class OllamaChatCompletionResponseIterator:
    def __init__(self, response):
        # ...
        self.thinking_content = []  # Initialize an empty list
        self.in_thinking = False

    def parse_chunk(self, chunk):
        # ...
        if "thinking" in chunk:
            self.in_thinking = True
            self.thinking_content.append(chunk["thinking"])
        # ...
        if self.in_thinking and "end_thinking" in chunk:
            self.in_thinking = False
            # Process the accumulated thinking content
            response_json_message["reasoning_content"] = "\n".join(self.thinking_content)
            self.thinking_content = []

Step 3: Test with `minimax-m2.7:cloud`

Send multiple requests with the updated litellm code to verify that the issue is resolved.

Verification

To verify the fix, run the following test:

import litellm

# Configure LiteLLM with the Ollama provider
litellm.api_base = "https://ollama.com/api"
litellm.api_key = "your-ollama-cloud-api-key"

# Send multiple requests with minimax-m2.7:cloud
for _ in range(5):
    response = litellm.chat_completion("Hello, how are you?", model="minimax-m2.7:cloud")
    print(response)

If all requests succeed without errors, the fix is successful.

Extra Tips

Monitor the thinking field format in the streaming response from minimax-m2.7:cloud to ensure it aligns with the updated parsing logic.
Consider adding additional logging or debugging statements to help identify any future issues with the thinking field handling.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #ssr #conversation history #API versioning #request timeout

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

litellm - 💡(How to fix) Fix [Bug]: minimax-m2.7 via Ollama Cloud fails on 2nd+ request with Internal Server Error [1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Code Example

Description

Steps to Reproduce

Expected Behavior

Actual Behavior

Additional Context

Relevant Code

Related Issues

Investigation Needed

extent analysis

Fix Plan

Step 1: Modify `transform_request` in `transformation.py`

Step 2: Update `OllamaChatCompletionResponseIterator`

Step 3: Test with `minimax-m2.7:cloud`

Verification

Extra Tips

Still need to ship something?

TRENDING

litellm - 💡(How to fix) Fix [Bug]: minimax-m2.7 via Ollama Cloud fails on 2nd+ request with Internal Server Error [1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Code Example

Description

Steps to Reproduce

Expected Behavior

Actual Behavior

Additional Context

Relevant Code

Related Issues

Investigation Needed

extent analysis

Fix Plan

Step 1: Modify transform_request in transformation.py

Step 2: Update OllamaChatCompletionResponseIterator

Step 3: Test with minimax-m2.7:cloud

Verification

Extra Tips

Still need to ship something?

RELATED_DISCOVERY

TRENDING

Step 1: Modify `transform_request` in `transformation.py`

Step 2: Update `OllamaChatCompletionResponseIterator`

Step 3: Test with `minimax-m2.7:cloud`