crewai - ✅(Solved) Fix [BUG] GeminiCompletion: thought output from thinking models is not accessible [2 pull requests, 1 participant]

schuay · 2026-02-28T12:02:34Z

GitHub stats

crewAIInc/crewAI#4647 • Fetched 2026-04-08 00:40:55

Author: schuay
Participants: schuay
Assignees: greysonlalonde
Timeline (top): cross-referenced ×2, assigned ×1, closed ×1, labeled ×1

When using a Gemini thinking model (e.g. gemini-2.5-pro, gemini-3.1-pro-preview) with stream=True, crewai emits a warning and silently discards all thought content. There is currently no supported path to access the model's reasoning output.

PR fix notes

PR #4648: fix: capture thought output from Gemini thinking models (#4647)

Repository: crewAIInc/crewAI
Author: devin-ai-integration[bot]
State: closed | merged: False
Link: https://github.com/crewAIInc/crewAI/pull/4648

Description (problem / solution / changelog)

fix: capture thought output from Gemini thinking models (#4647)

Summary

Gemini thinking models (e.g. gemini-2.5-pro, gemini-2.5-flash) produce "thought" parts alongside text parts in their responses. Previously, these thought parts were silently discarded, and using chunk.text on streaming responses containing non-text parts triggered SDK warnings.

This PR:

Adds a thinking_config parameter to GeminiCompletion (accepts ThinkingConfig or dict), passed through to the generation config
Rewrites _process_stream_chunk to iterate over candidate parts directly instead of calling chunk.text, which avoids SDK warnings when non-text parts are present
Converts _extract_text_from_response from a @staticmethod to an instance method so it can store thought content
Captures thought parts in self.previous_thoughts (both streaming and non-streaming paths)
Adds 11 unit tests covering initialization, config propagation, thought extraction, and streaming behavior

Review & Testing Checklist for Human

Verify _process_stream_chunk rewrite doesn't break non-thinking streaming. This is the highest-risk change — the old code used chunk.text then separately looped over parts for function calls. The new code uses a single loop over parts for everything. Test with regular (non-thinking) models with streaming enabled, especially with tool calling.
Check the text part guard not part.function_call in the streaming loop (line ~980). Is it possible for a real SDK Part to have both .text and .function_call set? If so, this guard is correct; if not, it's harmless but worth confirming.
Confirm previous_thoughts is actually accessible downstream. Thoughts are captured in self.previous_thoughts but there is no consumer shown in this diff that surfaces them via the LLM event system or return values. Verify this is sufficient for the issue reporter's use case, or whether an additional integration point is needed.
Verify no callers of _extract_text_from_response use it as a static/class method. It was changed from @staticmethod to an instance method — any GeminiCompletion._extract_text_from_response(response) calls would break.
previous_thoughts accumulates indefinitely — it is never cleared between call() invocations. Confirm this is acceptable behavior for multi-turn conversations or whether it should be reset per-call.
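The last checklist item is about state lifetime. A minimal sketch of the per-call reset it raises, using a hypothetical ThoughtBuffer class rather than the real GeminiCompletion. Only the attribute name previous_thoughts comes from the PR description; ThoughtBuffer, begin_call, and record are illustrative assumptions:

```python
class ThoughtBuffer:
    """Hypothetical stand-in for the thought-collecting part of a completion class."""

    def __init__(self) -> None:
        self.previous_thoughts: list[str] = []

    def begin_call(self) -> None:
        # Rebind to a fresh list at the start of every call so thoughts from
        # earlier invocations do not leak into this one. Rebinding (instead of
        # .clear()) leaves lists handed out earlier untouched.
        self.previous_thoughts = []

    def record(self, text: str) -> None:
        self.previous_thoughts.append(text)


buf = ThoughtBuffer()
buf.begin_call()
buf.record("first call reasoning")
buf.begin_call()
buf.record("second call reasoning")
```

Whether resetting is the right behavior depends on whether multi-turn callers expect to see earlier reasoning; the sketch only shows what a per-call reset would look like.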

Notes

The one pre-existing test failure (test_gemini_raises_error_when_model_not_supported) is unrelated to these changes
All 11 new tests pass locally; they use mocked Part objects rather than real SDK responses

Requested by: João
Link to Devin run: https://app.devin.ai/sessions/5d1a2a24e1e84fb7b3056281f054fc5c

Changed files

lib/crewai/src/crewai/llms/providers/gemini/completion.py (modified, +64/-17)
lib/crewai/tests/llms/google/test_google.py (modified, +284/-0)

PR #4676: fix(gemini): surface thought output from thinking models

Repository: crewAIInc/crewAI
Author: greysonlalonde
State: closed | merged: True
Link: https://github.com/crewAIInc/crewAI/pull/4676

Description (problem / solution / changelog)

Closes #4647

Iterates response parts directly instead of using chunk.text, enabling thought content to be emitted via LLMThinkingChunkEvent and eliminating the thought_signature warning.
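The PR description says thought content is emitted as events rather than stored on the instance. A rough sketch of that idea follows; it is not crewAI's implementation. ThinkingChunk is a hypothetical payload standing in for LLMThinkingChunkEvent, handlers are plain callables instead of the crewAI event bus, and the parts are SimpleNamespace stand-ins for SDK Part objects:

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Callable


@dataclass
class ThinkingChunk:
    """Hypothetical event payload; not the real LLMThinkingChunkEvent."""
    chunk: str


def emit_thoughts(parts, handlers: list[Callable[[ThinkingChunk], None]]) -> str:
    """Send thought parts to handlers and return the concatenated answer text."""
    answer = []
    for part in parts:
        text = getattr(part, "text", None)
        if not text:
            continue
        if getattr(part, "thought", False):
            event = ThinkingChunk(chunk=text)
            for handler in handlers:
                handler(event)
        else:
            answer.append(text)
    return "".join(answer)


# Stand-in parts shaped like SDK Part objects (text plus a thought flag).
parts = [
    SimpleNamespace(text="Compare both options. ", thought=True),
    SimpleNamespace(text="Option B is cheaper.", thought=False),
]
seen: list[str] = []
answer = emit_thoughts(parts, [lambda event: seen.append(event.chunk)])
```

Routing thoughts through handlers keeps them out of the returned answer text, which is the separation the issue asks for.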

Changed files

lib/crewai/src/crewai/events/types/llm_events.py (modified, +8/-0)
lib/crewai/src/crewai/llms/base_llm.py (modified, +27/-9)
lib/crewai/src/crewai/llms/providers/gemini/completion.py (modified, +32/-10)

Description

Steps to Reproduce

crewai[google-genai] >= 0.28.8
google-genai (native provider path via GeminiCompletion)
model: gemini/gemini-3.1-pro-preview (or any gemini-2.5+ thinking model)
stream=True

Every streaming response involving a tool call produces:

WARNING:google_genai.types:Warning: there are non-text parts in the response: ['function_call', 'thought_signature'], returning concatenated text result from text parts. Check the full candidates.content.parts accessor to get the full model response.

Thought content is never surfaced to the caller — not via events, not via callbacks, not via any public API.
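If the log noise alone is the immediate problem, the warning can be filtered with the standard logging module. This is only a sketch that hides the symptom; it does not recover any thought content. The logger name is read off the "WARNING:google_genai.types:" prefix of the message above, and NonTextPartsFilter is a hypothetical helper name:

```python
import logging


class NonTextPartsFilter(logging.Filter):
    """Drop only the 'non-text parts' warning; let every other record through."""

    def filter(self, record: logging.LogRecord) -> bool:
        return "there are non-text parts in the response" not in record.getMessage()


# Logger name taken from the "WARNING:google_genai.types:" prefix of the warning.
sdk_logger = logging.getLogger("google_genai.types")
sdk_logger.addFilter(NonTextPartsFilter())
```

A filter attached to a logger applies only to records logged on that exact logger, so other SDK warnings are unaffected.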

Expected behavior

Thoughts should be surfaced.

Screenshots/Code snippets

None

Operating System

Ubuntu 20.04

Python Version

3.12

crewAI Version

1.10.0

crewAI Tools Version

1.10.0

Virtual Environment

Venv

Evidence

None

Possible Solution

Two issues in src/crewai/llms/providers/gemini/completion.py:

_prepare_generation_config does not set thinking_config

The method builds a types.GenerateContentConfig but never sets thinking_config. Without thinking_config=ThinkingConfig(include_thoughts=True), the Gemini API does not return thought text parts — only the opaque thought_signature metadata. This means thought content is never requested, let alone captured.

_process_stream_chunk uses chunk.text and ignores thought parts

  # completion.py line ~934
  if chunk.text:                  # <-- calls .text property → triggers warning
      full_response += chunk.text

The .text property on a GenerateContentResponse emits this warning whenever non-text parts (function_call, thought_signature) are present. The method then iterates candidate.content.parts but handles only part.function_call, so any part where part.thought == True is skipped.
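One way to avoid both problems is to walk the parts once and classify each one, never touching chunk.text. The sketch below assumes parts expose text, thought, and function_call attributes as described above. split_parts is a hypothetical helper, not a crewAI or google-genai function, and the chunk is built from SimpleNamespace stand-ins so the example runs without the SDK:

```python
from types import SimpleNamespace


def split_parts(chunk):
    """Return (answer_text, thought_text, function_calls) for one chunk."""
    answer, thoughts, calls = [], [], []
    candidates = getattr(chunk, "candidates", None) or []
    if not candidates:
        return "", "", calls
    content = getattr(candidates[0], "content", None)
    for part in getattr(content, "parts", None) or []:
        function_call = getattr(part, "function_call", None)
        text = getattr(part, "text", None)
        if function_call is not None:
            calls.append(function_call)
        elif text and getattr(part, "thought", False):
            thoughts.append(text)
        elif text:
            answer.append(text)
        # Parts that carry only thought_signature match no branch and are skipped.
    return "".join(answer), "".join(thoughts), calls


# Stand-in chunk; real chunks come from the google-genai streaming API.
chunk = SimpleNamespace(candidates=[SimpleNamespace(content=SimpleNamespace(parts=[
    SimpleNamespace(text="Need the weather tool.", thought=True, function_call=None),
    SimpleNamespace(text=None, thought=False,
                    function_call=SimpleNamespace(name="get_weather", args={"city": "Oslo"})),
    SimpleNamespace(text="Checking now.", thought=False, function_call=None),
]))])
answer, thoughts, calls = split_parts(chunk)
```

Because the loop reads part.text on individual parts instead of the response-level .text property, the concatenation path that logs the warning is never reached.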

Workaround (monkey-patch)

Until this is fixed, both issues can be patched at runtime before any LLM() instantiation:

  from crewai.llms.providers.gemini.completion import GeminiCompletion

  _orig_config  = GeminiCompletion._prepare_generation_config
  _orig_chunk   = GeminiCompletion._process_stream_chunk

  def _patched_config(self, system_instruction=None, tools=None, response_model=None):
      config = _orig_config(self, system_instruction, tools, response_model)
      from google.genai import types
      config.thinking_config = types.ThinkingConfig(include_thoughts=True)
      return config

  def _patched_chunk(self, chunk, full_response, function_calls, usage_data,
                     from_task=None, from_agent=None):
      # Capture thought parts before _orig_chunk calls chunk.text
      if chunk.candidates:
          candidate = chunk.candidates[0]
          if candidate.content and candidate.content.parts:
              for part in candidate.content.parts:
                  if getattr(part, "thought", False) and part.text:
                      # Replace with your preferred sink: logger, event bus, callback, etc.
                      print(f"[thought] {part.text}", end="", flush=True)

      return _orig_chunk(self, chunk, full_response, function_calls,
                         usage_data, from_task, from_agent)

  GeminiCompletion._prepare_generation_config = _patched_config
  GeminiCompletion._process_stream_chunk      = _patched_chunk

Additional context

None

extent analysis

Fix Plan

To surface thought content from a Gemini thinking model with stream=True on a version that does not include the merged fix (PR #4676), patch the GeminiCompletion class at runtime.

Here are the steps:

Patch the _prepare_generation_config method to include thinking_config with include_thoughts=True.
Patch the _process_stream_chunk method to capture thought parts from the response.

Code Changes

from crewai.llms.providers.gemini.completion import GeminiCompletion
from google.genai import types

# Original methods
_orig_config  = GeminiCompletion._prepare_generation_config
_orig_chunk   = GeminiCompletion._process_stream_chunk

# Patched methods
def _patched_config(self, system_instruction=None, tools=None, response_model=None):
    config = _orig_config(self, system_instruction, tools, response_model)
    config.thinking_config = types.ThinkingConfig(include_thoughts=True)
    return config

def _patched_chunk(self, chunk, full_response, function_calls, usage_data,
                   from_task=None, from_agent=None):
    if chunk.candidates:
        candidate = chunk.candidates[0]
        if candidate.content and candidate.content.parts:
            for part in candidate.content.parts:
                if getattr(part, "thought", False) and part.text:
                    print(f"[thought] {part.text}", end="", flush=True)

    return _orig_chunk(self, chunk, full_response, function_calls,
                       usage_data, from_task, from_agent)

# Apply patches
GeminiCompletion._prepare_generation_config = _patched_config
GeminiCompletion._process_stream_chunk      = _patched_chunk

Verification

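To verify that the fix worked, run your application with the patches applied and check that lines prefixed with [thought] are printed to the console. The google_genai.types warning is still expected, because the patched method delegates to the original one, which continues to call chunk.text.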

Extra Tips

Make sure to apply the patches before instantiating any LLM objects.
You can replace the print statement in the _patched_chunk method with your preferred method of handling thought content, such as logging or sending it to an event bus.
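As an illustration of the second tip, here is a small sink that buffers streamed thought fragments and logs them as one message. ThoughtSink and the logger name gemini.thoughts are hypothetical choices for this sketch, not part of crewAI:

```python
import logging

logger = logging.getLogger("gemini.thoughts")  # hypothetical logger name


class ThoughtSink:
    """Collect streamed thought fragments and log them as one message."""

    def __init__(self) -> None:
        self._fragments: list[str] = []

    def __call__(self, fragment: str) -> None:
        self._fragments.append(fragment)

    def flush(self) -> str:
        """Log the accumulated thought at DEBUG level, reset, and return it."""
        thought = "".join(self._fragments)
        self._fragments = []
        if thought:
            logger.debug("thought: %s", thought)
        return thought


sink = ThoughtSink()
sink("First compare ")
sink("the two options.")
full_thought = sink.flush()
```

In _patched_chunk, calling sink(part.text) would take the place of the print statement, with sink.flush() called once the stream has finished.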
