crewai - ✅ (Solved) Fix [BUG] GeminiCompletion: thought output from thinking models is not accessible [2 pull requests, 1 participant]

GitHub stats
crewAIInc/crewAI#4647 · Fetched 2026-04-08 00:40:55
Comments: 0 · Participants: 1 · Timeline: 6 · Reactions: 0
Timeline (top): cross-referenced ×2, assigned ×1, closed ×1, labeled ×1

When using a Gemini thinking model (e.g. gemini-2.5-pro, gemini-3.1-pro-preview) with stream=True, crewai emits a warning and silently discards all thought content. There is currently no supported path to access the model's reasoning output.

Root Cause

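Two problems in src/crewai/llms/providers/gemini/completion.py combine to hide the reasoning output. First, _prepare_generation_config never sets thinking_config, so the Gemini API is not asked for thought text and returns only the opaque thought_signature metadata. Second, _process_stream_chunk reads chunk.text, which triggers the SDK warning whenever non-text parts are present, and its loop over candidate.content.parts handles only part.function_call, so thought parts are skipped.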

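Fix / Workaround

Workaround (monkey-patch)

Until this is fixed, both issues can be patched at runtime before any LLM() instantiation. The configuration half of the patch is shown here; the full workaround, including the _process_stream_chunk patch, is in the issue body below.

  from crewai.llms.providers.gemini.completion import GeminiCompletion
  from google.genai import types

  # Keep a reference to the original method so the patch can delegate to it.
  _orig_config = GeminiCompletion._prepare_generation_config

  def _patched_config(self, system_instruction=None, tools=None, response_model=None):
      config = _orig_config(self, system_instruction, tools, response_model)
      # Ask the API to return thought text parts.
      config.thinking_config = types.ThinkingConfig(include_thoughts=True)
      return config

  GeminiCompletion._prepare_generation_config = _patched_config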

PR fix notes

PR #4648: fix: capture thought output from Gemini thinking models (#4647)

Description (problem / solution / changelog)

fix: capture thought output from Gemini thinking models (#4647)

Summary

Gemini thinking models (e.g. gemini-2.5-pro, gemini-2.5-flash) produce "thought" parts alongside text parts in their responses. Previously, these thought parts were silently discarded, and using chunk.text on streaming responses containing non-text parts triggered SDK warnings.

This PR:

  • Adds a thinking_config parameter to GeminiCompletion (accepts ThinkingConfig or dict), passed through to the generation config
  • Rewrites _process_stream_chunk to iterate over candidate parts directly instead of calling chunk.text, which avoids SDK warnings when non-text parts are present
  • Converts _extract_text_from_response from a @staticmethod to an instance method so it can store thought content
  • Captures thought parts in self.previous_thoughts (both streaming and non-streaming paths)
  • Adds 11 unit tests covering initialization, config propagation, thought extraction, and streaming behavior
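The single-pass approach described in this list can be sketched without the SDK. The Part, Content, Candidate, and Chunk dataclasses below are simplified stand-ins for the google-genai response types (only the attributes named in this description are modelled), and split_chunk is a hypothetical helper written for illustration, not the code from the PR.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Part:
    # Stand-in for an SDK part; only the attributes discussed here.
    text: Optional[str] = None
    thought: bool = False
    function_call: Optional[dict[str, Any]] = None


@dataclass
class Content:
    parts: list[Part] = field(default_factory=list)


@dataclass
class Candidate:
    content: Optional[Content] = None


@dataclass
class Chunk:
    candidates: list[Candidate] = field(default_factory=list)


def split_chunk(chunk: Chunk) -> tuple[str, list[str], list[dict[str, Any]]]:
    """Return (visible_text, thoughts, function_calls) for one streamed chunk."""
    text: str = ""
    thoughts: list[str] = []
    calls: list[dict[str, Any]] = []
    if not chunk.candidates:
        return text, thoughts, calls
    content = chunk.candidates[0].content
    if content is None or not content.parts:
        return text, thoughts, calls
    for part in content.parts:
        if part.function_call is not None:
            calls.append(part.function_call)
        elif part.thought and part.text:
            thoughts.append(part.text)  # reasoning, kept out of the answer
        elif part.text:
            text += part.text           # ordinary answer text
    return text, thoughts, calls


chunk = Chunk([Candidate(Content([
    Part(text="Compare both options first.", thought=True),
    Part(text="Option B is cheaper."),
    Part(function_call={"name": "get_price", "args": {"item": "B"}}),
]))])
print(split_chunk(chunk))
```

Because the loop never touches a convenience accessor such as chunk.text, nothing in this sketch depends on how the SDK treats mixed part types.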

Review & Testing Checklist for Human

  • Verify _process_stream_chunk rewrite doesn't break non-thinking streaming. This is the highest-risk change — the old code used chunk.text then separately looped over parts for function calls. The new code uses a single loop over parts for everything. Test with regular (non-thinking) models with streaming enabled, especially with tool calling.
  • Check the text part guard not part.function_call in the streaming loop (line ~980). Is it possible for a real SDK Part to have both .text and .function_call set? If so, this guard is correct; if not, it's harmless but worth confirming.
  • Confirm previous_thoughts is actually accessible downstream. Thoughts are captured in self.previous_thoughts but there is no consumer shown in this diff that surfaces them via the LLM event system or return values. Verify this is sufficient for the issue reporter's use case, or whether an additional integration point is needed.
  • Verify no callers of _extract_text_from_response use it as a static/class method. It was changed from @staticmethod to an instance method — any GeminiCompletion._extract_text_from_response(response) calls would break.
  • previous_thoughts accumulates indefinitely — it is never cleared between call() invocations. Confirm this is acceptable behavior for multi-turn conversations or whether it should be reset per-call.
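The accumulation question in the last item can be made concrete with a toy model. ThoughtRecorder below is a hypothetical class, not GeminiCompletion; it only contrasts a previous_thoughts list that grows across calls with one that is reset at the start of each call.

```python
class ThoughtRecorder:
    """Toy stand-in for a completion object that stores thought text."""

    def __init__(self, reset_per_call: bool) -> None:
        self.reset_per_call = reset_per_call
        self.previous_thoughts: list[str] = []

    def call(self, thoughts_in_response: list[str]) -> None:
        # Optionally drop thoughts left over from earlier calls.
        if self.reset_per_call:
            self.previous_thoughts = []
        self.previous_thoughts.extend(thoughts_in_response)


accumulating = ThoughtRecorder(reset_per_call=False)
resetting = ThoughtRecorder(reset_per_call=True)
for recorder in (accumulating, resetting):
    recorder.call(["plan step 1"])
    recorder.call(["plan step 2"])

print(accumulating.previous_thoughts)  # ['plan step 1', 'plan step 2']
print(resetting.previous_thoughts)     # ['plan step 2']
```

Which behaviour is right depends on whether callers expect the attribute to describe the latest response or the whole conversation.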

Notes

  • The one pre-existing test failure (test_gemini_raises_error_when_model_not_supported) is unrelated to these changes
  • All 11 new tests pass locally; they use mocked Part objects rather than real SDK responses
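Since the new tests rely on mocked Part objects, one pitfall may be worth checking: a bare unittest.mock.MagicMock auto-creates attributes, and those attributes are truthy, so a getattr(part, "thought", False) check passes unless the test sets thought explicitly. The sketch below uses only the standard library; is_thought is a hypothetical helper that mirrors the check used in the workaround, not a function from crewAI.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def is_thought(part) -> bool:
    # Same shape as the check used in the monkey-patch workaround.
    return bool(getattr(part, "thought", False) and part.text)


bare_mock = MagicMock()  # .thought and .text are auto-created, truthy mocks
explicit_mock = MagicMock(text="hello", thought=False)
plain_text = SimpleNamespace(text="hello", thought=False, function_call=None)
plain_thought = SimpleNamespace(text="hmm", thought=True, function_call=None)

print(is_thought(bare_mock))      # True, although no thought was configured
print(is_thought(explicit_mock))  # False
print(is_thought(plain_text))     # False
print(is_thought(plain_thought))  # True
```

Using SimpleNamespace, or always passing thought explicitly to the mock, keeps the fake parts from looking like thoughts by accident.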

Requested by: João
Link to Devin run: https://app.devin.ai/sessions/5d1a2a24e1e84fb7b3056281f054fc5c

Changed files

  • lib/crewai/src/crewai/llms/providers/gemini/completion.py (modified, +64/-17)
  • lib/crewai/tests/llms/google/test_google.py (modified, +284/-0)

PR #4676: fix(gemini): surface thought output from thinking models

Description (problem / solution / changelog)

Closes #4647

Iterates response parts directly instead of using chunk.text, enabling thought content to be emitted via LLMThinkingChunkEvent and eliminating the thought_signature warning.
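A rough sketch of the idea of emitting thought text as events rather than storing it. ThinkingChunk and EventBus are hypothetical stand-ins written for this example; they are not crewAI's LLMThinkingChunkEvent or its event bus, whose fields and API are not shown in this description.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ThinkingChunk:
    # Hypothetical event payload: one streamed fragment of reasoning text.
    chunk: str


class EventBus:
    """Tiny synchronous publish/subscribe bus."""

    def __init__(self) -> None:
        self._handlers: list[Callable[[ThinkingChunk], None]] = []

    def on_thinking(self, handler: Callable[[ThinkingChunk], None]) -> None:
        self._handlers.append(handler)

    def emit(self, event: ThinkingChunk) -> None:
        for handler in self._handlers:
            handler(event)


def stream_parts(parts: list[dict], bus: EventBus) -> str:
    """Emit thought parts as events and return the concatenated answer text."""
    answer = ""
    for part in parts:
        text = part.get("text")
        if not text:
            continue
        if part.get("thought"):
            bus.emit(ThinkingChunk(chunk=text))
        else:
            answer += text
    return answer


bus = EventBus()
seen: list[str] = []
bus.on_thinking(lambda event: seen.append(event.chunk))

answer = stream_parts(
    [{"text": "Check units. ", "thought": True}, {"text": "42 km"}],
    bus,
)
print(answer, seen)
```

With this shape, the answer text stays free of reasoning while any subscriber (a logger, a UI, a trace collector) can observe the thoughts as they stream.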

Changed files

  • lib/crewai/src/crewai/events/types/llm_events.py (modified, +8/-0)
  • lib/crewai/src/crewai/llms/base_llm.py (modified, +27/-9)
  • lib/crewai/src/crewai/llms/providers/gemini/completion.py (modified, +32/-10)


Description

When using a Gemini thinking model (e.g. gemini-2.5-pro, gemini-3.1-pro-preview) with stream=True, crewai emits a warning and silently discards all thought content. There is currently no supported path to access the model's reasoning output.

Steps to Reproduce

  • crewai[google-genai] >= 0.28.8
  • google-genai (native provider path via GeminiCompletion)
  • model: gemini/gemini-3.1-pro-preview (or any gemini-2.5+ thinking model)
  • stream=True

Every streaming response involving a tool call produces:

WARNING:google_genai.types:Warning: there are non-text parts in the response: ['function_call', 'thought_signature'], returning concatenated text result from text parts. Check the full candidates.content.parts accessor to get the full model response.

Thought content is never surfaced to the caller — not via events, not via callbacks, not via any public API.
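As a stopgap for the log noise only, the warning can be filtered with the standard logging module. The logger name google_genai.types is taken from the log line above; the filter class name and the matched substring are choices made for this sketch. This hides the message and does nothing to recover the thought content.

```python
import logging


class NonTextPartsFilter(logging.Filter):
    """Drop the SDK's 'non-text parts' warning and keep everything else."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        return "there are non-text parts in the response" not in message


sdk_logger = logging.getLogger("google_genai.types")
sdk_logger.addFilter(NonTextPartsFilter())
```

A filter attached to a logger applies only to records logged on that exact logger, so other google_genai warnings are unaffected.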

Expected behavior

Thoughts should be surfaced.

Screenshots/Code snippets

None

Operating System

Ubuntu 20.04

Python Version

3.12

crewAI Version

1.10.0

crewAI Tools Version

1.10.0

Virtual Environment

Venv

Evidence

None

Possible Solution

Two issues in src/crewai/llms/providers/gemini/completion.py:

  1. _prepare_generation_config does not set thinking_config

The method builds a types.GenerateContentConfig but never sets thinking_config. Without thinking_config=ThinkingConfig(include_thoughts=True), the Gemini API does not return thought text parts — only the opaque thought_signature metadata. This means thought content is never requested, let alone captured.

  2. _process_stream_chunk uses chunk.text and ignores thought parts

  # completion.py line ~934
  if chunk.text:                  # <-- calls .text property → triggers warning
      full_response += chunk.text

The .text property on a GenerateContentResponse emits the warning whenever non-text parts (function_call, thought_signature) are present. The method then iterates candidate.content.parts but only handles part.function_call, skipping any part where part.thought is True.

Workaround (monkey-patch)

Until this is fixed, both issues can be patched at runtime before any LLM() instantiation:

  from crewai.llms.providers.gemini.completion import GeminiCompletion

  _orig_config  = GeminiCompletion._prepare_generation_config
  _orig_chunk   = GeminiCompletion._process_stream_chunk

  def _patched_config(self, system_instruction=None, tools=None, response_model=None):
      config = _orig_config(self, system_instruction, tools, response_model)
      from google.genai import types
      config.thinking_config = types.ThinkingConfig(include_thoughts=True)
      return config

  def _patched_chunk(self, chunk, full_response, function_calls, usage_data,
                     from_task=None, from_agent=None):
      # Capture thought parts before _orig_chunk calls chunk.text
      if chunk.candidates:
          candidate = chunk.candidates[0]
          if candidate.content and candidate.content.parts:
              for part in candidate.content.parts:
                  if getattr(part, "thought", False) and part.text:
                      # Replace with your preferred sink: logger, event bus, callback, etc.
                      print(f"[thought] {part.text}", end="", flush=True)

      return _orig_chunk(self, chunk, full_response, function_calls,
                         usage_data, from_task, from_agent)

  GeminiCompletion._prepare_generation_config = _patched_config
  GeminiCompletion._process_stream_chunk      = _patched_chunk

Additional context

None

extent analysis

Fix Plan

To surface thought content when using a Gemini thinking model with stream=True, patch two methods of the GeminiCompletion class at runtime:

  • Patch the _prepare_generation_config method to include thinking_config with include_thoughts=True.
  • Patch the _process_stream_chunk method to capture thought parts from the response.

Code Changes

from crewai.llms.providers.gemini.completion import GeminiCompletion
from google.genai import types

# Original methods
_orig_config  = GeminiCompletion._prepare_generation_config
_orig_chunk   = GeminiCompletion._process_stream_chunk

# Patched methods
def _patched_config(self, system_instruction=None, tools=None, response_model=None):
    config = _orig_config(self, system_instruction, tools, response_model)
    config.thinking_config = types.ThinkingConfig(include_thoughts=True)
    return config

def _patched_chunk(self, chunk, full_response, function_calls, usage_data,
                   from_task=None, from_agent=None):
    if chunk.candidates:
        candidate = chunk.candidates[0]
        if candidate.content and candidate.content.parts:
            for part in candidate.content.parts:
                if getattr(part, "thought", False) and part.text:
                    print(f"[thought] {part.text}", end="", flush=True)

    return _orig_chunk(self, chunk, full_response, function_calls,
                       usage_data, from_task, from_agent)

# Apply patches
GeminiCompletion._prepare_generation_config = _patched_config
GeminiCompletion._process_stream_chunk      = _patched_chunk

Verification

To verify that the fix worked, run your application with the patched GeminiCompletion class and check if thought content is being printed to the console.

Extra Tips

  • Make sure to apply the patches before instantiating any LLM objects.
  • You can replace the print statement in the _patched_chunk method with your preferred method of handling thought content, such as logging or sending it to an event bus.
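One possible replacement for the print call is a small buffering sink, sketched here under the assumption that thought text arrives as fragments across several chunks. ThoughtBuffer and the logger name "thoughts" are hypothetical; in the patched method, buffer.add(part.text) would take the place of print, with flush() called once the stream ends.

```python
import logging

logger = logging.getLogger("thoughts")  # logger name chosen for this sketch


class ThoughtBuffer:
    """Collect streamed thought fragments and log them as one message."""

    def __init__(self) -> None:
        self._fragments: list[str] = []

    def add(self, fragment: str) -> None:
        self._fragments.append(fragment)

    def flush(self) -> str:
        # Join the fragments, log once, and clear the buffer for the next call.
        thought = "".join(self._fragments)
        self._fragments.clear()
        if thought:
            logger.info("[thought] %s", thought)
        return thought


buffer = ThoughtBuffer()
for fragment in ("First, list ", "the constraints."):
    buffer.add(fragment)
print(buffer.flush())
```

Logging one joined message per response tends to read better than one log line per streamed fragment.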
