crewai - ✅(Solved) Fix [FEATURE] Implement Process.consensual with a pluggable ConsensusEngine [1 pull requests, 1 participants]

Official PRs (…)
ON THIS PAGE

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
crewAIInc/crewAI#5708Fetched 2026-05-05 05:53:14
View on GitHub
Comments
0
Participants
1
Timeline
2
Reactions
0
Author
Participants
Timeline (top)
cross-referenced ×1labeled ×1

Fix Action

Fix / Workaround

Implement Process.consensual so that tasks without an explicit agent are dispatched by polling agents for a ranked preference and aggregating ballots, instead of by a manager-LLM call as in Process.hierarchical.

  1. Quorum-failure semantics. If fewer than 50% of agents produce a parseable ballot, the proposal raises RuntimeError — refusing to pick a handler from a tiny minority. Alternatives: dispatch to the first agent in the list, or to the agent with the most votes among the parseable ballots. Hard fail or silent fallback?

PR fix notes

PR #5691: feat(crew): add consensual process with pluggable consensus engine

Description (problem / solution / changelog)

Title: feat(crew): add consensual process with pluggable consensus engine

Labels: llm-generated (required — this PR was authored with AI assistance)

Why merge this

Process.consensual has been a TODO in process.py since the original three-process design. This PR ships it, with three properties that matter to maintainers and users:

  1. Removes a single point of failure. The manager-LLM is the only path in CrewAI today for selecting a handler dynamically; if it picks badly, the whole crew degrades. A vote across the agents themselves is auditable (every ranking is logged), resists single-agent error (a majority outvotes one bad ranking), and the aggregation step is deterministic for a given set of inputs — useful for debugging and replay even though the inputs themselves come from stochastic LLM calls.
  2. Opens a plugin ecosystem with zero new CrewAI dependencies. A ConsensusEngine Protocol + entry-point discovery lets third-party libraries plug in via pip install. The reference engine — Snowveil, a probabilistic Borda-CHB protocol — is published on PyPI today; the same pattern accommodates future plugins (Ranked Pairs, weighted voting, capability-based scoring, etc.). CrewAI itself ships only MajorityVoteConsensus — no new runtime imports, no version constraints to maintain.
  3. Removes the manager-LLM dependency. Process.hierarchical requires configuring a separate manager_llm (typically a stronger, costlier model) to dispatch each unowned task. Process.consensual polls the existing agents in parallel instead — no extra model to configure, audit, or pay separately for. Trade-off: more total LLM calls per task (N agent rankings vs 1 manager call), but on the agents' existing model configs and in parallel — net spend depends on which side has the more expensive model.

Backward-compatible. Existing crews are untouched. The new process is opt-in (process=Process.consensual), the new field is opt-in (consensus=... defaults to None), and unmodified Protocol clients continue to work.

What the user sees

from crewai import Crew, Process

# Default — works out of the box
crew = Crew(agents=..., tasks=..., process=Process.consensual)

# Or plug in a richer engine via string shorthand (after `pip install snowveil`)
crew = Crew(agents=..., tasks=..., process=Process.consensual, consensus="snowveil")

# Or pass an instance for custom config
from snowveil.integrations.crewai import SnowveilConsensus
crew = Crew(..., consensus=SnowveilConsensus(config=...))

Snowveil is the reference third-party engine — a probabilistic ranked-preference protocol from arxiv:2512.18444 (Kotsialou). Already published on PyPI (pip install snowveil); works against this PR today.

Summary of changes

  • Process.consensual — implements the third process mode. Tasks without an explicit agent are dispatched by polling every other agent for a ranked preference and aggregating ballots.
  • ConsensusEngine Protocol + MajorityVoteConsensus default@runtime_checkable, typed, pluggable. CrewAI ships only the trivial baseline; richer engines live in third-party packages.
  • Plugin discovery (discover_engines()) — supports both Python entry points (crewai.consensus_engines group) and a small built-in fallback registry. Crew(consensus="snowveil") resolves automatically when Snowveil is installed; broken plugins log a warning and skip rather than crash an unrelated crew.
  • Prompt-injection hardening — task descriptions are wrapped in <task> tags, length-capped at 2000 chars, and explicitly marked as untrusted input. Centralised in build_handler_ranking_prompt() so all consensus engines share the same hardening.
  • Self-promotion bias removal — voters rank only other agents and pin themselves last to keep ballots complete without skewing the aggregator.

What this PR is not about

Cross-host or cross-organisational crews. CrewAI today is a single-process framework — every Crew runs in one Python address space. Process.consensual operates within one crew's agents and uses Snowveil's in-process mode (InMemoryTransport); it does not touch the network. A future integration could use Snowveil's distributed WebSocketTransport to enable federated decision-making across organisations or hosts, but that's a separate design conversation (likely a FederatedCrew primitive, not a Process.consensual configuration) and is intentionally out of scope here.

Files

FileStatusLinesNotes
lib/crewai/src/crewai/consensus.pynew289Protocol, default engine, plugin discovery, parser, prompt builder. Module docstring includes a Snowveil wiring snippet so help(crewai.consensus) surfaces the integration path.
lib/crewai/src/crewai/crew.py+166 / −1New consensus field + validator (instance or string name), _run_consensual_process, _collect_handler_rankings (parallel via ThreadPoolExecutor), _agent_by_role, _require_unique_agent_roles.
lib/crewai/src/crewai/process.py+1 / −1Enables Process.consensual enum value.
lib/crewai/tests/test_consensus.pynew72845 tests covering the consensus module, plugin discovery, and Process.consensual end-to-end.

Total: ~1,184 insertions, 2 deletions across 4 files (uv.lock excluded).

Design notes

  • consensus: Any field type. Pydantic can't generate a schema for a Protocol, so the field is annotated Any and validated structurally at runtime via isinstance against the @runtime_checkable Protocol. Strings are resolved by name first.
  • Two-pass plugin discovery. discover_engines() iterates importlib.metadata.entry_points(group="crewai.consensus_engines") first (the future path for any plugin), then merges in _KNOWN_ENGINE_IMPORT_PATHS (a small dict — currently just snowveil, which was published before adopting the entry-point convention). Entry points always win. Failed loads log a WARNING and are skipped — a broken third-party engine never crashes an unrelated crew. Cached via functools.cache.
  • Quorum. _MIN_RANKING_RATIO = 0.5. If fewer than half of agents return a parseable ballot, _collect_handler_rankings raises rather than pick a handler from a tiny minority.
  • Parallel ranking. Agents are polled concurrently via ThreadPoolExecutor since agent.execute_task is synchronous.
  • Deliberate duplication. parse_role_ranking is algorithmically equivalent to a parser in Snowveil; CrewAI cannot depend on Snowveil, so the duplication is intentional.

Test plan

45 tests in lib/crewai/tests/test_consensus.py, all passing locally (~1s, no network):

  • MajorityVoteConsensus — single voter, majority winner, candidate-order tie-break (and reversed), empty rankings, empty ballot, unknown candidate, runtime_checkable Protocol matching.
  • _validate_ballots — accepts complete ballot, rejects empty rankings / per-voter ballot / unknown candidate.
  • parse_role_ranking — strict JSON, JSON in surrounding text, first-appearance fallback, partial JSON falls through, unparseable raises, partial text match raises.
  • build_handler_ranking_prompt — task and roles included, marked UNTRUSTED, length-capped, empty description handled.
  • Consensus field validator — default is None; accepts engine instance; accepts string name; rejects non-engine, unknown name (with installed-engines list), and empty string (dedicated error); instance path does not call discover_engines.
  • discover_engines() happy paths — built-in majority always present; entry points discovered; fallback registry resolves when module importable; cache returns same dict until cleared; two named plugins coexist; entry points override fallback for the same name.
  • discover_engines() defensive paths — fallback skipped silently when not installed; fallback raising non-ImportError logs warning; fallback missing attribute logs warning; entry-point load raising logs warning; entry point returning a non-class is rejected with a warning; duplicate entry-point names log a collision warning.
  • Process.consensual — unanimous winner assigned, explicit task.agent not overridden, duplicate roles raise, low quorum raises, custom ConsensusEngine honoured over default.
  • uv run ruff check lib/ clean.
  • uv run ruff format --check lib/ clean.
  • uv run mypy lib/crewai/ — not verified locally; uv sync fails on macOS x86_64 because lancedb (pinned >=0.29.2,<0.30.1) ships no x86_64 macOS wheel. Direct mypy on consensus.py reports no issues; relying on CI mypy across 3.10–3.13 for the rest.
  • pytest lib/crewai/tests/test_consensus.py — 45 passed locally.

Open questions / follow-ups

  • Should consensus become a typed field once Pydantic gains better Protocol support, or stay Any with the runtime validator?
  • Should the default be MajorityVoteConsensus, or should Process.consensual require an explicit engine to avoid surprising users with naive plurality?
  • Mintlify docs under docs/en/concepts/processes.mdx (and translations in ar, ko, pt-BR) are not in this PR — tracked separately.
  • A runnable end-to-end sample using Snowveil will follow as a separate submission to crewAIInc/crewAI-examples, matching upstream's separation of core and samples.

Changed files

  • lib/crewai/src/crewai/consensus.py (added, +289/-0)
  • lib/crewai/src/crewai/crew.py (modified, +166/-1)
  • lib/crewai/src/crewai/process.py (modified, +1/-1)
  • lib/crewai/tests/test_consensus.py (added, +728/-0)
  • uv.lock (modified, +1/-1)

Code Example

crew = Crew(agents=..., tasks=..., process=Process.consensual)
# or
crew = Crew(..., consensus="snowveil")  # a third-party engine, after `pip install snowveil`
RAW_BUFFERClick to expand / collapse

Feature Area

Core functionality

Is your feature request related to a an existing bug? Please link it here.

N/A — this is an unimplemented TODO in lib/crewai/src/crewai/process.py (the consensual enum value has been commented out since the original three-process design).

Describe the solution you'd like

Implement Process.consensual so that tasks without an explicit agent are dispatched by polling agents for a ranked preference and aggregating ballots, instead of by a manager-LLM call as in Process.hierarchical.

The proposed shape is small and additive:

  1. A ConsensusEngine Protocol (@runtime_checkable) with one method: aggregate(candidates: Sequence[str], rankings: Mapping[str, Sequence[str]]) -> str
  2. A built-in default, MajorityVoteConsensus — most-common top-1 wins, ties broken by candidate order. Trivial implementation; the point is to give Process.consensual something working out of the box without committing CrewAI to a specific voting theory.
  3. A consensus= field on Crew that accepts either an instance or a string name resolved via plugin discovery (importlib.metadata.entry_points), so third-party engines can plug in via pip install with zero CrewAI-side coupling.

User-facing surface:

crew = Crew(agents=..., tasks=..., process=Process.consensual)
# or
crew = Crew(..., consensus="snowveil")  # a third-party engine, after `pip install snowveil`

CrewAI itself takes no new runtime dependency; richer engines live in third-party packages.

Describe alternatives you've considered

  • Hard-code Process.consensual to majority-vote without a Protocol. Simplest but locks CrewAI into one voting theory and prevents richer engines from plugging in.
  • Hard-code a stronger algorithm (Borda, Ranked Pairs, etc.). CrewAI takes on an opinionated voting model; same lock-in problem; surprises users who expected plurality.
  • Subclass Crew per consensus algorithm. Forces every third-party engine to ship its own Crew subclass; users have to import the right Crew rather than configure a field.
  • Leave Process.consensual unimplemented. Status quo; the TODO persists, and dynamic-handler-selection remains a single-point-of-failure on the manager-LLM.

Additional context

Why preference-order voting, not just plurality. Plurality (each agent votes for their top choice; most votes wins) discards information — voters had to consider every option but only their top pick is counted. With three or more options this routinely picks divisive winners. Concrete example: 5 voters, profile [A>C>B, A>C>B, B>C>A, B>C>A, C>A>B]. Plurality picks A or B by coin flip (each has 2 first-place votes); but every voter ranks C ≥ 2nd, and Borda totals are A=5, B=4, C=6 — C is the broadly-acceptable compromise. Ranked methods also have stronger coalition-resistance properties: Snowveil's CHB rule (reference implementation) has a provable Ω(n) coalition lower bound (a √n-sized coordinated minority cannot reliably flip the outcome). The MajorityVoteConsensus default ships plurality (it's the obvious baseline); the Protocol is what lets users opt up to richer methods when their decisions matter enough.

Strategic positioning. This proposal is also a foundation. The ConsensusEngine Protocol — a synchronous "ballots in, winner out" contract — is the right primitive for a future FederatedCrew-style abstraction (cross-host, cross-organisational decision-making). Richer engines like Snowveil already ship a distributed transport that's ahead of where CrewAI core is today. Accepting this PR doesn't enable cross-org crews on its own, but it puts the right interface in place: when CrewAI eventually grows a federated-crew abstraction, the consensus mechanism it needs is the one this PR adds.

Out of scope for this proposal. Cross-host or cross-organisational crews themselves. CrewAI is single-process today; Process.consensual operates within one crew's agents and uses Snowveil's in-process mode. The federated-crew conversation is intentionally separate — different primitives, different threat model, different review.

Reference third-party engine. Snowveil — a probabilistic Borda-CHB protocol from arxiv:2512.18444 — is published on PyPI today and works against the proposed Protocol. It's the proof point that the plugin pattern is real, not speculative.

Trade-off worth flagging. Process.consensual makes more LLM calls per task than Process.hierarchical (N agent rankings vs 1 manager call), but on the agents' existing model configs and in parallel. Net spend depends on which side runs the more expensive model. The win is operational simplification, not a guaranteed cost cut.

Design questions where maintainer input would shape the implementation:

  1. Default behaviour when consensus is unset. Should Process.consensual instantiate MajorityVoteConsensus automatically, or raise asking the user to choose? Trade-off: zero-config out of the box vs. not surprising users with naive plurality.

  2. Pydantic field typing. Pydantic can't schema-generate for a structural Protocol, so the proposal types Crew.consensus: Any and validates with isinstance against a @runtime_checkable Protocol. Alternatives I considered: a discriminated union of concrete classes (loses extensibility) or a concrete base class with subclass-only registration (loses the Protocol's structural typing). Are you OK with Any + runtime check, or do you have a preferred pattern from elsewhere in the codebase?

  3. Quorum-failure semantics. If fewer than 50% of agents produce a parseable ballot, the proposal raises RuntimeError — refusing to pick a handler from a tiny minority. Alternatives: dispatch to the first agent in the list, or to the agent with the most votes among the parseable ballots. Hard fail or silent fallback?

  4. discover_engines() visibility. Making it part of the public API (importable from crewai.consensus) commits to maintaining the function signature across versions. Keeping it internal avoids that commitment but means third-party tooling (CLIs, dashboards listing available engines) has to reach into private code. Which side of the trade-off?

  5. Prompt-hardening convention. The proposal wraps task descriptions in <task> tags, length-caps at 2000 chars, and marks them as untrusted. Is there an existing convention elsewhere in CrewAI's agent prompts I should follow instead, or is this a fresh problem?

Working prototype available. I have a complete implementation at PR 5691 with 45 passing tests, ruff/format clean, and the wiring against the Snowveil reference engine verified end-to-end against the published PyPI package. Happy to land or rework based on this discussion.

Willingness to Contribute

Yes, I'd be happy to submit a pull request

extent analysis

TL;DR

Implement the ConsensusEngine Protocol and integrate it with the Process.consensual feature to enable dynamic handler selection based on agent preferences.

Guidance

  • Review the proposed implementation in PR 5691 and verify its correctness, focusing on the ConsensusEngine Protocol and its interaction with the Crew class.
  • Discuss and decide on the default behavior when consensus is unset, choosing between instantiating MajorityVoteConsensus automatically or raising an error to prompt user choice.
  • Evaluate the trade-offs of using Any + runtime check for Pydantic field typing versus alternative approaches, such as discriminated unions or concrete base classes.
  • Determine the desired quorum-failure semantics, selecting from options like raising a RuntimeError, dispatching to the first agent, or using the agent with the most votes among parseable ballots.

Example

from crewai.consensus import ConsensusEngine

class MajorityVoteConsensus(ConsensusEngine):
    def aggregate(self, candidates, rankings):
        # Implement majority vote logic
        pass

crew = Crew(agents=..., tasks=..., process=Process.consensual, consensus=MajorityVoteConsensus())

Notes

The implementation should ensure that the ConsensusEngine Protocol is properly integrated with the Crew class and that the default behavior for unset consensus is well-defined. Additionally, the quorum-failure semantics should be carefully considered to ensure robustness and fairness in the consensus process.

Recommendation

Apply the proposed implementation in PR 5691, addressing the discussed design questions and trade-offs to ensure a robust and extensible consensus mechanism.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

crewai - ✅(Solved) Fix [FEATURE] Implement Process.consensual with a pluggable ConsensusEngine [1 pull requests, 1 participants]