crewai - ✅(Solved) Fix [FEATURE] Implement Process.consensual with a pluggable ConsensusEngine [1 pull requests, 1 participants]

crewai2026-05-04 20:48:12

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

crewAIInc/crewAI#5708•Fetched 2026-05-05 05:53:14

View on GitHub

Comments

Participants

Timeline

Reactions

Author

gkotsia

Participants

gkotsia

Timeline (top)

cross-referenced ×1labeled ×1

Fix Action

Fix / Workaround

Implement Process.consensual so that tasks without an explicit agent are dispatched by polling agents for a ranked preference and aggregating ballots, instead of by a manager-LLM call as in Process.hierarchical.

Quorum-failure semantics. If fewer than 50% of agents produce a parseable ballot, the proposal raises RuntimeError — refusing to pick a handler from a tiny minority. Alternatives: dispatch to the first agent in the list, or to the agent with the most votes among the parseable ballots. Hard fail or silent fallback?

PR fix notes

PR #5691: feat(crew): add consensual process with pluggable consensus engine

Repository: crewAIInc/crewAI
Author: gkotsia
State: closed | merged: False
Link: https://github.com/crewAIInc/crewAI/pull/5691

Description (problem / solution / changelog)

Title: feat(crew): add consensual process with pluggable consensus engine

Labels: llm-generated (required — this PR was authored with AI assistance)

Why merge this

Process.consensual has been a TODO in process.py since the original three-process design. This PR ships it, with three properties that matter to maintainers and users:

Removes a single point of failure. The manager-LLM is the only path in CrewAI today for selecting a handler dynamically; if it picks badly, the whole crew degrades. A vote across the agents themselves is auditable (every ranking is logged), resists single-agent error (a majority outvotes one bad ranking), and the aggregation step is deterministic for a given set of inputs — useful for debugging and replay even though the inputs themselves come from stochastic LLM calls.
Opens a plugin ecosystem with zero new CrewAI dependencies. A ConsensusEngine Protocol + entry-point discovery lets third-party libraries plug in via pip install. The reference engine — Snowveil, a probabilistic Borda-CHB protocol — is published on PyPI today; the same pattern accommodates future plugins (Ranked Pairs, weighted voting, capability-based scoring, etc.). CrewAI itself ships only MajorityVoteConsensus — no new runtime imports, no version constraints to maintain.
Removes the manager-LLM dependency. Process.hierarchical requires configuring a separate manager_llm (typically a stronger, costlier model) to dispatch each unowned task. Process.consensual polls the existing agents in parallel instead — no extra model to configure, audit, or pay separately for. Trade-off: more total LLM calls per task (N agent rankings vs 1 manager call), but on the agents' existing model configs and in parallel — net spend depends on which side has the more expensive model.

Backward-compatible. Existing crews are untouched. The new process is opt-in (process=Process.consensual), the new field is opt-in (consensus=... defaults to None), and unmodified Protocol clients continue to work.

What the user sees

from crewai import Crew, Process

# Default — works out of the box
crew = Crew(agents=..., tasks=..., process=Process.consensual)

# Or plug in a richer engine via string shorthand (after `pip install snowveil`)
crew = Crew(agents=..., tasks=..., process=Process.consensual, consensus="snowveil")

# Or pass an instance for custom config
from snowveil.integrations.crewai import SnowveilConsensus
crew = Crew(..., consensus=SnowveilConsensus(config=...))

Snowveil is the reference third-party engine — a probabilistic ranked-preference protocol from arxiv:2512.18444 (Kotsialou). Already published on PyPI (pip install snowveil); works against this PR today.

Summary of changes

Process.consensual — implements the third process mode. Tasks without an explicit agent are dispatched by polling every other agent for a ranked preference and aggregating ballots.
ConsensusEngine Protocol + MajorityVoteConsensus default — @runtime_checkable, typed, pluggable. CrewAI ships only the trivial baseline; richer engines live in third-party packages.
Plugin discovery (discover_engines()) — supports both Python entry points (crewai.consensus_engines group) and a small built-in fallback registry. Crew(consensus="snowveil") resolves automatically when Snowveil is installed; broken plugins log a warning and skip rather than crash an unrelated crew.
Prompt-injection hardening — task descriptions are wrapped in <task> tags, length-capped at 2000 chars, and explicitly marked as untrusted input. Centralised in build_handler_ranking_prompt() so all consensus engines share the same hardening.
Self-promotion bias removal — voters rank only other agents and pin themselves last to keep ballots complete without skewing the aggregator.

What this PR is not about

Cross-host or cross-organisational crews. CrewAI today is a single-process framework — every Crew runs in one Python address space. Process.consensual operates within one crew's agents and uses Snowveil's in-process mode (InMemoryTransport); it does not touch the network. A future integration could use Snowveil's distributed WebSocketTransport to enable federated decision-making across organisations or hosts, but that's a separate design conversation (likely a FederatedCrew primitive, not a Process.consensual configuration) and is intentionally out of scope here.

Files

File	Status	Lines	Notes
`lib/crewai/src/crewai/consensus.py`	new	289	Protocol, default engine, plugin discovery, parser, prompt builder. Module docstring includes a Snowveil wiring snippet so `help(crewai.consensus)` surfaces the integration path.
`lib/crewai/src/crewai/crew.py`	+166 / −1	—	New `consensus` field + validator (instance or string name), `_run_consensual_process`, `_collect_handler_rankings` (parallel via `ThreadPoolExecutor`), `_agent_by_role`, `_require_unique_agent_roles`.
`lib/crewai/src/crewai/process.py`	+1 / −1	—	Enables `Process.consensual` enum value.
`lib/crewai/tests/test_consensus.py`	new	728	45 tests covering the consensus module, plugin discovery, and `Process.consensual` end-to-end.

Total: ~1,184 insertions, 2 deletions across 4 files (uv.lock excluded).

Design notes

consensus: Any field type. Pydantic can't generate a schema for a Protocol, so the field is annotated Any and validated structurally at runtime via isinstance against the @runtime_checkable Protocol. Strings are resolved by name first.
Two-pass plugin discovery. discover_engines() iterates importlib.metadata.entry_points(group="crewai.consensus_engines") first (the future path for any plugin), then merges in _KNOWN_ENGINE_IMPORT_PATHS (a small dict — currently just snowveil, which was published before adopting the entry-point convention). Entry points always win. Failed loads log a WARNING and are skipped — a broken third-party engine never crashes an unrelated crew. Cached via functools.cache.
Quorum. _MIN_RANKING_RATIO = 0.5. If fewer than half of agents return a parseable ballot, _collect_handler_rankings raises rather than pick a handler from a tiny minority.
Parallel ranking. Agents are polled concurrently via ThreadPoolExecutor since agent.execute_task is synchronous.
Deliberate duplication. parse_role_ranking is algorithmically equivalent to a parser in Snowveil; CrewAI cannot depend on Snowveil, so the duplication is intentional.

Test plan

45 tests in lib/crewai/tests/test_consensus.py, all passing locally (~1s, no network):

Open questions / follow-ups

Should consensus become a typed field once Pydantic gains better Protocol support, or stay Any with the runtime validator?
Should the default be MajorityVoteConsensus, or should Process.consensual require an explicit engine to avoid surprising users with naive plurality?
Mintlify docs under docs/en/concepts/processes.mdx (and translations in ar, ko, pt-BR) are not in this PR — tracked separately.
A runnable end-to-end sample using Snowveil will follow as a separate submission to crewAIInc/crewAI-examples, matching upstream's separation of core and samples.

Changed files

lib/crewai/src/crewai/consensus.py (added, +289/-0)
lib/crewai/src/crewai/crew.py (modified, +166/-1)
lib/crewai/src/crewai/process.py (modified, +1/-1)
lib/crewai/tests/test_consensus.py (added, +728/-0)
uv.lock (modified, +1/-1)

Code Example

crew = Crew(agents=..., tasks=..., process=Process.consensual)
# or
crew = Crew(..., consensus="snowveil")  # a third-party engine, after `pip install snowveil`

RAW_BUFFERClick to expand / collapse

Feature Area

Core functionality

Is your feature request related to a an existing bug? Please link it here.

N/A — this is an unimplemented TODO in lib/crewai/src/crewai/process.py (the consensual enum value has been commented out since the original three-process design).

Describe the solution you'd like

The proposed shape is small and additive:

A ConsensusEngine Protocol (@runtime_checkable) with one method: aggregate(candidates: Sequence[str], rankings: Mapping[str, Sequence[str]]) -> str
A built-in default, MajorityVoteConsensus — most-common top-1 wins, ties broken by candidate order. Trivial implementation; the point is to give Process.consensual something working out of the box without committing CrewAI to a specific voting theory.
A consensus= field on Crew that accepts either an instance or a string name resolved via plugin discovery (importlib.metadata.entry_points), so third-party engines can plug in via pip install with zero CrewAI-side coupling.

User-facing surface:

crew = Crew(agents=..., tasks=..., process=Process.consensual)
# or
crew = Crew(..., consensus="snowveil")  # a third-party engine, after `pip install snowveil`

CrewAI itself takes no new runtime dependency; richer engines live in third-party packages.

Describe alternatives you've considered

Hard-code Process.consensual to majority-vote without a Protocol. Simplest but locks CrewAI into one voting theory and prevents richer engines from plugging in.
Hard-code a stronger algorithm (Borda, Ranked Pairs, etc.). CrewAI takes on an opinionated voting model; same lock-in problem; surprises users who expected plurality.
Subclass Crew per consensus algorithm. Forces every third-party engine to ship its own Crew subclass; users have to import the right Crew rather than configure a field.
Leave Process.consensual unimplemented. Status quo; the TODO persists, and dynamic-handler-selection remains a single-point-of-failure on the manager-LLM.

Additional context

Why preference-order voting, not just plurality. Plurality (each agent votes for their top choice; most votes wins) discards information — voters had to consider every option but only their top pick is counted. With three or more options this routinely picks divisive winners. Concrete example: 5 voters, profile [A>C>B, A>C>B, B>C>A, B>C>A, C>A>B]. Plurality picks A or B by coin flip (each has 2 first-place votes); but every voter ranks C ≥ 2nd, and Borda totals are A=5, B=4, C=6 — C is the broadly-acceptable compromise. Ranked methods also have stronger coalition-resistance properties: Snowveil's CHB rule (reference implementation) has a provable Ω(n) coalition lower bound (a √n-sized coordinated minority cannot reliably flip the outcome). The MajorityVoteConsensus default ships plurality (it's the obvious baseline); the Protocol is what lets users opt up to richer methods when their decisions matter enough.

Strategic positioning. This proposal is also a foundation. The ConsensusEngine Protocol — a synchronous "ballots in, winner out" contract — is the right primitive for a future FederatedCrew-style abstraction (cross-host, cross-organisational decision-making). Richer engines like Snowveil already ship a distributed transport that's ahead of where CrewAI core is today. Accepting this PR doesn't enable cross-org crews on its own, but it puts the right interface in place: when CrewAI eventually grows a federated-crew abstraction, the consensus mechanism it needs is the one this PR adds.

Out of scope for this proposal. Cross-host or cross-organisational crews themselves. CrewAI is single-process today; Process.consensual operates within one crew's agents and uses Snowveil's in-process mode. The federated-crew conversation is intentionally separate — different primitives, different threat model, different review.

Reference third-party engine. Snowveil — a probabilistic Borda-CHB protocol from arxiv:2512.18444 — is published on PyPI today and works against the proposed Protocol. It's the proof point that the plugin pattern is real, not speculative.

Trade-off worth flagging. Process.consensual makes more LLM calls per task than Process.hierarchical (N agent rankings vs 1 manager call), but on the agents' existing model configs and in parallel. Net spend depends on which side runs the more expensive model. The win is operational simplification, not a guaranteed cost cut.

Design questions where maintainer input would shape the implementation:

Default behaviour when consensus is unset. Should Process.consensual instantiate MajorityVoteConsensus automatically, or raise asking the user to choose? Trade-off: zero-config out of the box vs. not surprising users with naive plurality.
Pydantic field typing. Pydantic can't schema-generate for a structural Protocol, so the proposal types Crew.consensus: Any and validates with isinstance against a @runtime_checkable Protocol. Alternatives I considered: a discriminated union of concrete classes (loses extensibility) or a concrete base class with subclass-only registration (loses the Protocol's structural typing). Are you OK with Any + runtime check, or do you have a preferred pattern from elsewhere in the codebase?
Quorum-failure semantics. If fewer than 50% of agents produce a parseable ballot, the proposal raises RuntimeError — refusing to pick a handler from a tiny minority. Alternatives: dispatch to the first agent in the list, or to the agent with the most votes among the parseable ballots. Hard fail or silent fallback?
discover_engines() visibility. Making it part of the public API (importable from crewai.consensus) commits to maintaining the function signature across versions. Keeping it internal avoids that commitment but means third-party tooling (CLIs, dashboards listing available engines) has to reach into private code. Which side of the trade-off?
Prompt-hardening convention. The proposal wraps task descriptions in <task> tags, length-caps at 2000 chars, and marks them as untrusted. Is there an existing convention elsewhere in CrewAI's agent prompts I should follow instead, or is this a fresh problem?

Working prototype available. I have a complete implementation at PR 5691 with 45 passing tests, ruff/format clean, and the wiring against the Snowveil reference engine verified end-to-end against the published PyPI package. Happy to land or rework based on this discussion.

Willingness to Contribute

Yes, I'd be happy to submit a pull request

extent analysis

TL;DR

Implement the ConsensusEngine Protocol and integrate it with the Process.consensual feature to enable dynamic handler selection based on agent preferences.

Guidance

Review the proposed implementation in PR 5691 and verify its correctness, focusing on the ConsensusEngine Protocol and its interaction with the Crew class.
Discuss and decide on the default behavior when consensus is unset, choosing between instantiating MajorityVoteConsensus automatically or raising an error to prompt user choice.
Evaluate the trade-offs of using Any + runtime check for Pydantic field typing versus alternative approaches, such as discriminated unions or concrete base classes.
Determine the desired quorum-failure semantics, selecting from options like raising a RuntimeError, dispatching to the first agent, or using the agent with the most votes among parseable ballots.

Example

from crewai.consensus import ConsensusEngine

class MajorityVoteConsensus(ConsensusEngine):
    def aggregate(self, candidates, rankings):
        # Implement majority vote logic
        pass

crew = Crew(agents=..., tasks=..., process=Process.consensual, consensus=MajorityVoteConsensus())

Notes

The implementation should ensure that the ConsensusEngine Protocol is properly integrated with the Crew class and that the default behavior for unset consensus is well-defined. Additionally, the quorum-failure semantics should be carefully considered to ensure robustness and fairness in the consensus process.

Recommendation

Apply the proposed implementation in PR 5691, addressing the discussed design questions and trade-offs to ensure a robust and extensible consensus mechanism.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #installation #tensor shape #autograd error #model save/load

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.