crewai - ✅(Solved) Fix [Bug] Shared LLM stop words mutation causes cross-agent state pollution [1 pull request, 1 participant]

GitHub stats
crewAIInc/crewAI#5141 (fetched 2026-04-08 01:44:55)
Comments: 0 · Participants: 1 · Timeline events: 4 · Reactions: 0
Timeline (top): cross-referenced ×2, referenced ×2

Error Message

  • Silent behavioral degradation — no error raised, just subtly wrong LLM outputs

Root Cause

In crew_agent_executor.py (lines 164-173):

if self.llm:
    # This may be mutating the shared llm object and needs further evaluation
    existing_stop = getattr(self.llm, "stop", [])
    self.llm.stop = list(
        set(
            existing_stop + self.stop
            if isinstance(existing_stop, list)
            else self.stop
        )
    )

The code itself has a comment acknowledging this: "This may be mutating the shared llm object and needs further evaluation".

Fix Action

Fixed

PR fix notes

PR #5142: fix: prevent shared LLM stop words mutation across agents (#5141)

Description (problem / solution / changelog)

Summary

Fixes #5141. When multiple agents share the same LLM instance, the executor's __init__ was directly mutating self.llm.stop on the shared object. This caused stop words to accumulate across agents and across repeated crew.kickoff() calls, leading to incorrect LLM behavior.

The fix creates a shallow copy of the LLM via copy.copy() before merging stop words, so each executor gets an isolated stop word list. The copy is only made when new stop words are actually being added (to avoid unnecessary allocations). Applied identically to both CrewAgentExecutor and experimental AgentExecutor.
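The copy-on-merge idea described above can be sketched as follows. This is a minimal illustration, not the actual PR diff: FakeLLM and Executor are hypothetical stand-ins, and only the stop attribute is modeled.

```python
import copy


class FakeLLM:
    """Hypothetical stand-in for a shared LLM; only `stop` matters here."""

    def __init__(self, stop=None):
        self.stop = stop


class Executor:
    """Hypothetical executor that isolates stop words via a shallow copy."""

    def __init__(self, llm, stop):
        self.llm = llm
        self.stop = list(stop)
        if self.llm:
            existing_stop = getattr(self.llm, "stop", []) or []
            merged_stop = list(set(existing_stop + self.stop))
            # Copy only when merging actually changes the stop list.
            # The comparison is order-sensitive, as the PR notes point out.
            if merged_stop != existing_stop:
                self.llm = copy.copy(self.llm)  # shallow: internals stay shared
                self.llm.stop = merged_stop


shared = FakeLLM(stop=["X"])
a = Executor(shared, ["Observation:"])  # adds a new stop word, so it gets a copy
b = Executor(shared, ["X"])             # nothing new, so it keeps the original
```

After both executors are built, shared.stop is still ["X"]; only the copy held by the first executor carries the merged list.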

Review & Testing Checklist for Human

  • Verify copy.copy() is safe on your LLM subclasses — shallow copy shares all internal references (connection pools, caches, provider state). Confirm no LLM subclass relies on identity (is) checks or has mutable internal state that would break when shared between original and copy. This is the highest-risk aspect of this change.
  • Test with a real multi-agent crew — create a crew with 2+ agents sharing one LLM(model=..., stop=["X"]) instance, run kickoff() multiple times, and confirm stop words don't grow on the shared LLM and each agent behaves correctly.
  • Order-sensitive comparison may cause unnecessary copies: merged_stop is built from set(), so ["A", "B"] could become ["B", "A"]. The != check against existing_stop is order-sensitive, so a copy is still made when no new stop words are added but the order differs. This is harmless but slightly wasteful. Consider a set() comparison instead if this matters.
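To make the last checklist item concrete, here is a small sketch comparing the two checks. The helper names needs_copy_ordered and needs_copy_set are hypothetical and do not come from the PR.

```python
def needs_copy_ordered(existing_stop, new_stop):
    """List comparison after a set() round trip: sensitive to order and duplicates."""
    merged = list(set(existing_stop + new_stop))
    return merged != existing_stop


def needs_copy_set(existing_stop, new_stop):
    """Subset test: True only when new_stop contributes an unseen stop word."""
    return not set(new_stop) <= set(existing_stop)


# A duplicate in the existing list is enough to make the ordered check
# report a change even though no stop word is new.
ordered_result = needs_copy_ordered(["A", "A"], [])
set_result = needs_copy_set(["A", "A"], [])
```

The subset test ignores both ordering and duplicates, so it triggers a copy only when the effective set of stop words would grow.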

Notes

  • The or [] addition (getattr(self.llm, "stop", []) or []) is a minor behavioral change that converts an explicit None on stop to []. This is likely correct since downstream code expects a list.
  • 7 new regression tests added across both test files covering: single executor isolation, multi-executor isolation, no-copy optimization, and repeated kickoff accumulation.
  • CI note: the tests (3.10) failure (test_custom_llm_within_crew) is a pre-existing flaky test unrelated to this change. The tests (3.11/3.12/3.13) runs were cancelled (not failed) due to CI infrastructure behavior. Lint, type-checker (all versions), and CodeQL all pass.
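The behavioral change from the or [] addition mentioned in the notes can be seen in isolation with plain Python objects. The two classes below are hypothetical stand-ins, not crewAI types.

```python
class LLMWithNoneStop:
    """Stand-in whose stop attribute is explicitly None."""
    stop = None


class LLMWithoutStop:
    """Stand-in with no stop attribute at all."""


def read_stop_old(llm):
    # The default applies only when the attribute is missing,
    # so an explicit None is returned as-is.
    return getattr(llm, "stop", [])


def read_stop_new(llm):
    # `or []` additionally maps None (or any falsy value) to an empty list.
    return getattr(llm, "stop", []) or []


old_value = read_stop_old(LLMWithNoneStop())
new_value = read_stop_new(LLMWithNoneStop())
```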

Link to Devin session: https://app.devin.ai/sessions/0807fced3a7b4e4ab5cbe3a6a0d7ed2e

Changed files

  • lib/crewai/src/crewai/agents/crew_agent_executor.py (modified, +12/-3)
  • lib/crewai/src/crewai/experimental/agent_executor.py (modified, +12/-2)
  • lib/crewai/tests/agents/test_agent.py (modified, +81/-0)
  • lib/crewai/tests/agents/test_agent_executor.py (modified, +147/-0)


Bug Description

When multiple agents in a Crew share the same LLM instance, each CrewAgentExecutor.__init__ mutates the shared LLM's stop attribute, causing stop words to accumulate across agents and pollute each other's generation behavior.

Why this is a problem:

  1. Sequential execution: Agent A adds its stop words to the LLM. Agent B then inherits A's stop words AND adds its own. Agent C gets A+B+C. Stop words grow monotonically — they are never cleaned up after an agent finishes execution.

  2. Concurrent execution (async tasks): Multiple agents mutate self.llm.stop simultaneously on the same object — this is a race condition. The set() deduplication doesn't protect against interleaved reads and writes.

  3. Behavioral impact: Extra stop words cause the LLM to terminate generation prematurely. If Agent A's stop word is "Observation:" and Agent B should not stop on that token, Agent B will nonetheless stop generating when it encounters "Observation:" because A's stop word leaked into the shared LLM config.
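The sequential accumulation in point 1 can be reproduced without any API calls. The sketch below reuses the merging logic quoted from crew_agent_executor.py inside hypothetical stand-in classes (FakeLLM, MutatingExecutor); it is a simulation, not crewAI itself.

```python
class FakeLLM:
    """Hypothetical shared LLM object; only `stop` is modeled."""

    def __init__(self):
        self.stop = []


class MutatingExecutor:
    """Stand-in that applies the original, mutating merge in __init__."""

    def __init__(self, llm, stop):
        self.llm = llm
        self.stop = stop
        if self.llm:
            existing_stop = getattr(self.llm, "stop", [])
            # Writes the merged list back onto the shared object.
            self.llm.stop = list(
                set(
                    existing_stop + self.stop
                    if isinstance(existing_stop, list)
                    else self.stop
                )
            )


shared = FakeLLM()
MutatingExecutor(shared, ["Observation:"])   # "agent A"
after_a = sorted(shared.stop)
MutatingExecutor(shared, ["END"])            # "agent B"
after_b = sorted(shared.stop)
```

Here after_a holds only agent A's stop word, while after_b holds both: agent B's LLM now also stops on "Observation:".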

Reproduction

from crewai import Agent, Crew, Task, LLM

# One shared LLM instance
llm = LLM(model="gpt-4o")

agent_a = Agent(role="Researcher", llm=llm, ...)
agent_b = Agent(role="Writer", llm=llm, ...)

crew = Crew(agents=[agent_a, agent_b], tasks=[...])
crew.kickoff()

# After kickoff, llm.stop contains stop words from BOTH agents,
# and this persists into future crew.kickoff() calls

Suggested Fix

Instead of mutating the shared LLM's stop words, the executor should use a local copy of the stop configuration:

# Option A: Copy the stop words locally, don't mutate the shared LLM
if self.llm:
    existing_stop = getattr(self.llm, "stop", []) or []
    self._effective_stop = list(set(existing_stop + self.stop))
    # Use self._effective_stop when making LLM calls instead of self.llm.stop

# Option B: Reset stop words after execution in a finally block

Alternatively, consider making LLM stop words immutable at the instance level and passing per-request stop words through the API call parameters.
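One way to picture the immutable alternative is a frozen configuration object plus a per-request merge. Everything in this sketch (FrozenLLMConfig, build_request, the request dictionary shape) is hypothetical and is not part of crewAI's API.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Tuple


@dataclass(frozen=True)
class FrozenLLMConfig:
    """Hypothetical immutable LLM settings shared by several agents."""
    model: str
    stop: Tuple[str, ...] = ()


def build_request(config, prompt, extra_stop=()):
    """Merge instance-level and per-request stop words without mutating config."""
    # dict.fromkeys removes duplicates while preserving first-seen order.
    merged = tuple(dict.fromkeys(config.stop + tuple(extra_stop)))
    return {"model": config.model, "prompt": prompt, "stop": list(merged)}


config = FrozenLLMConfig(model="gpt-4o", stop=("X",))
request = build_request(config, "hi", ["Observation:", "X"])

try:
    config.stop = ("Y",)  # any attempt to mutate the shared config fails
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

Because the merged list lives only in the request payload, neither sequential nor concurrent agents can leak stop words to one another through the shared object.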

Impact

  • Affects all crews with 2+ agents sharing an LLM instance (the common/default case)
  • Silent behavioral degradation — no error raised, just subtly wrong LLM outputs
  • Accumulates across multiple crew.kickoff() calls on the same crew
  • Race condition in async execution mode

extent analysis

Fix Plan

To fix the issue of accumulating stop words in the shared LLM instance, we will implement a local copy of the stop configuration in the CrewAgentExecutor class. Here are the steps:

  • Create a new instance variable _effective_stop to store the local copy of stop words.
  • In the __init__ method, copy the stop words from the shared LLM instance and combine them with the agent's stop words.
  • Use the _effective_stop variable when making LLM calls instead of self.llm.stop.

Example code:

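class CrewAgentExecutor:
    def __init__(self, llm, stop=None):
        self.llm = llm
        self.stop = list(stop or [])
        # Always define the attribute so later reads never raise AttributeError
        self._effective_stop = list(self.stop)
        if self.llm:
            existing_stop = getattr(self.llm, "stop", []) or []
            # Merge into a new local list; self.llm.stop is left untouched
            self._effective_stop = sorted(set(existing_stop) | set(self.stop))

    # Use self._effective_stop when making LLM calls
    def make_llm_call(self):
        # Illustrative only: generate() and its stop= parameter are placeholders,
        # not a confirmed method of the crewAI LLM class
        return self.llm.generate(text="Hello", stop=self._effective_stop)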

Verification

To verify that the fix worked, you can:

  • Create a crew with multiple agents sharing the same LLM instance.
  • Call crew.kickoff() and check that the LLM output is correct for each agent.
  • Verify that the stop words are not accumulating across agents by checking the _effective_stop variable for each agent and confirming that llm.stop on the shared instance is unchanged after kickoff().
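The verification steps above could be automated along these lines. This is a sketch with hypothetical stand-ins (FakeLLM, IsolatedExecutor) that model only stop-word handling; a real check would build an actual crew and call kickoff().

```python
class FakeLLM:
    """Hypothetical shared LLM object; only `stop` is modeled."""

    def __init__(self, stop=None):
        self.stop = list(stop or [])


class IsolatedExecutor:
    """Stand-in executor that keeps merged stop words in a local attribute."""

    def __init__(self, llm, stop):
        self.llm = llm
        self.stop = list(stop)
        existing_stop = (getattr(llm, "stop", []) or []) if llm else []
        # Sorted so that assertions do not depend on set iteration order.
        self._effective_stop = sorted(set(existing_stop) | set(self.stop))


def check_isolation(rounds=3):
    """Rebuild executors several times, mimicking repeated kickoff() calls."""
    shared = FakeLLM(stop=["X"])
    for _ in range(rounds):
        a = IsolatedExecutor(shared, ["Observation:"])
        b = IsolatedExecutor(shared, ["END"])
        assert shared.stop == ["X"]  # the shared object never grows
        assert a._effective_stop == ["Observation:", "X"]
        assert b._effective_stop == ["END", "X"]
    return True


isolation_ok = check_isolation()
```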

Extra Tips

  • Consider making the LLM stop words immutable at the instance level and passing per-request stop words through the API call parameters to prevent similar issues in the future.
  • When using async execution mode, keep _effective_stop private to each executor instance and never write it back to the shared LLM, so that concurrent agents have no shared stop list to race on.
