autogen - 💡(How to fix) Feature proposal: Backpressure contract declarations for multi-agent coordination [3 comments, 4 participants]

GitHub stats: microsoft/autogen#7321 (fetched 2026-04-08 00:39:54) · Comments: 3 · Participants: 4 · Timeline: 3 · Reactions: 0


Problem

In multi-agent AutoGen setups, agents that coordinate through message passing or tool calls have no way to express their capacity constraints as part of their definition. This creates a class of cascading failure that's hard to debug: Agent A retries when Agent B is saturated, each retry consumes more of Agent B's capacity and adds to Agent A's load, and the cascade amplifies until something times out or hits a hard resource limit.
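A back-of-the-envelope sketch of that amplification follows. It assumes each caller sends one request and retries immediately up to R times while Agent B stays saturated. The function names are illustrative only. When every layer of a call chain retries on its own, the worst case multiplies per layer.

```python
def worst_case_requests(callers: int, retries: int) -> int:
    """Requests reaching a saturated agent when every attempt fails.

    Each caller makes 1 initial attempt plus `retries` retries.
    """
    return callers * (1 + retries)


def layered_worst_case(retries_per_layer: int, depth: int) -> int:
    """Attempts at the bottom of a chain where every layer retries independently.

    Each layer turns one incoming request into (1 + R) outgoing attempts,
    so the innermost agent sees (1 + R) ** depth attempts in the worst case.
    """
    return (1 + retries_per_layer) ** depth


if __name__ == "__main__":
    print(worst_case_requests(callers=4, retries=5))         # 24
    print(layered_worst_case(retries_per_layer=3, depth=3))  # 64
```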

The current workaround is to implement circuit breakers and backpressure logic inside the calling agent — but this means every agent that calls Agent B must independently discover and encode Agent B's capacity limits. When Agent B's capacity changes, every caller needs to update. The coupling is implicit and fragile.
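The sketch below makes that coupling concrete with a minimal caller-side circuit breaker. Agent B's limits (the failure threshold and the cool-down) are hard-coded inside the caller. The class name and the threshold values are hypothetical. The clock is injectable so the behavior can be tested without waiting.

```python
import time


class CallerSideBreaker:
    """Circuit breaker owned by the *caller*; its thresholds encode guesses about Agent B."""

    # Hard-coded assumptions about Agent B's capacity: these must be kept in
    # sync by hand in every agent that calls Agent B.
    FAILURE_THRESHOLD = 3
    COOL_DOWN_S = 30.0

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.COOL_DOWN_S:
                raise RuntimeError("circuit open: not calling Agent B")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.FAILURE_THRESHOLD:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            raise
        self.failures = 0  # a success closes the circuit fully
        return result
```

Every agent that calls Agent B would carry its own copy of `FAILURE_THRESHOLD` and `COOL_DOWN_S`. That per-caller duplication is exactly what the proposal aims to remove.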

Proposal

Add an optional capacity declaration to agent definitions, specifiable at the agent or team level:

# Agent-level capacity declaration
agent = AssistantAgent(
    name="data_processor",
    capacity=AgentCapacity(
        max_concurrent=3,           # max simultaneous tasks
        backoff_strategy="exponential",  # hint to callers: how to back off
        backoff_initial_ms=1000,
        backoff_ceiling_ms=30000,
        max_caller_retries=5        # callers should give up after N attempts
    )
)

The capacity declaration serves two purposes:

  1. Introspection: Callers can query it via agent.capacity before making requests, and adapt retry behavior without hard-coding assumptions
  2. Documentation: The declaration is visible in team configs, making capacity constraints auditable before deployment
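The introspection purpose can be illustrated with a sketch of a caller that turns a declared contract into a concrete wait schedule, without hard-coding anything about the callee. `AgentCapacity` is redefined here as a stand-in dataclass so the block runs on its own; it is not an existing AutoGen API. The sketch reads `max_caller_retries` as the total number of attempts, following the proposal's "give up after N attempts" comment. It also treats any `backoff_strategy` other than `"exponential"` as a fixed delay, which is an assumption on my part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapacity:
    """Stand-in for the proposed class; not an existing AutoGen API."""
    max_concurrent: int
    backoff_strategy: str
    backoff_initial_ms: int
    backoff_ceiling_ms: int
    max_caller_retries: int


def backoff_schedule(capacity: AgentCapacity) -> list[int]:
    """Waits (ms) between attempts; N attempts means N - 1 waits."""
    delays = []
    delay = capacity.backoff_initial_ms
    for _ in range(max(capacity.max_caller_retries - 1, 0)):
        delays.append(min(delay, capacity.backoff_ceiling_ms))  # respect the ceiling
        if capacity.backoff_strategy == "exponential":
            delay *= 2
    return delays


cap = AgentCapacity(
    max_concurrent=3,
    backoff_strategy="exponential",
    backoff_initial_ms=1000,
    backoff_ceiling_ms=30000,
    max_caller_retries=5,
)
print(backoff_schedule(cap))  # [1000, 2000, 4000, 8000]
```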

Why the caller shouldn't own this

The backpressure spec belongs with the provider, not the consumer. If Agent B defines its capacity, Agent A doesn't need to know anything specific about Agent B; it just reads the contract and follows it. This is the same reasoning that makes HTTP 429 (Too Many Requests) a server-side responsibility: the server knows its limits, so the client shouldn't have to guess.

For teams where multiple agents call the same subordinate agent, a shared contract avoids N implementations of the same retry logic.
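The HTTP 429 analogy can be sketched on the provider side. The agent enforces its own `max_concurrent` and answers excess requests with an explicit "busy" signal rather than letting them queue. `CapacityGate` and `SaturatedError` are hypothetical names used only for illustration. The sketch runs on a single asyncio event loop, where the check-and-increment in `run` cannot be interleaved because no `await` sits between them.

```python
import asyncio


class SaturatedError(Exception):
    """Hypothetical 'busy' signal, analogous to an HTTP 429 response."""


class CapacityGate:
    """Provider-side admission control for a declared max_concurrent."""

    def __init__(self, max_concurrent: int):
        self.max_concurrent = max_concurrent
        self.active = 0

    async def run(self, task_fn):
        # Reject immediately instead of queueing, so callers learn the agent
        # is saturated and can back off according to the published contract.
        if self.active >= self.max_concurrent:
            raise SaturatedError(f"at capacity ({self.max_concurrent} tasks)")
        self.active += 1
        try:
            return await task_fn()
        finally:
            self.active -= 1


async def main():
    gate = CapacityGate(max_concurrent=3)

    async def work():
        await asyncio.sleep(0.01)  # simulate a slow task
        return "ok"

    results = await asyncio.gather(
        *(gate.run(work) for _ in range(5)), return_exceptions=True
    )
    return ["busy" if isinstance(r, SaturatedError) else r for r in results]


print(asyncio.run(main()))  # ['ok', 'ok', 'ok', 'busy', 'busy']
```

Callers that receive `SaturatedError` would then apply the retry policy from the provider's declared contract, so no caller has to encode the limit itself.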

Scope question

Happy to hear if this belongs in AgentChat's Team interface instead of (or in addition to) the individual agent level. Team-level capacity might make more sense for GroupChat patterns where load is distributed across team members.
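Here is one possible reading of team-level capacity, offered as a sketch only. It assumes a team's budget is the sum of its members' declared `max_concurrent`, which fits the "load is distributed across team members" case. It also assumes an explicit team-level declaration, when present, takes precedence. Both are illustrative policies, not something the proposal specifies. `MemberCapacity` and `team_capacity` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MemberCapacity:
    """Hypothetical per-member declaration (only the field this sketch needs)."""
    name: str
    max_concurrent: Optional[int] = None  # None: member declares no limit


def team_capacity(
    members: Sequence[MemberCapacity], team_override: Optional[int] = None
) -> Optional[int]:
    """Effective concurrent-task budget for a team whose load is spread over members.

    An explicit team-level declaration wins; otherwise the members' limits are
    summed. If any member is unbounded, no team limit can be derived (None).
    """
    if team_override is not None:
        return team_override
    limits = [m.max_concurrent for m in members]
    if any(limit is None for limit in limits):
        return None
    return sum(limits)


team = [MemberCapacity("researcher", 2), MemberCapacity("writer", 3)]
print(team_capacity(team))                   # 5
print(team_capacity(team, team_override=4))  # 4
```

Summing is only reasonable when each task lands on one member. If every task passes through every member, the most constrained member would more plausibly bound the team. That ambiguity is one reason the agent-level versus team-level question matters.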

Related: the discussion in #7265 about practical reliability patterns surfaces this as a recurring pain point in production setups.


Fix Plan

To prevent retry cascades against saturated agents in multi-agent AutoGen setups, add an optional capacity declaration to agent definitions. Agents can then publish their capacity constraints, and callers can adapt their retry behavior to them.

Step-by-Step Solution

  1. Add an AgentCapacity class: define a class that represents an agent's capacity constraints.
from dataclasses import dataclass


@dataclass
class AgentCapacity:
    """Capacity contract an agent publishes to its callers."""
    max_concurrent: int       # max simultaneous tasks
    backoff_strategy: str     # hint to callers, e.g. "exponential"
    backoff_initial_ms: int   # first wait between attempts
    backoff_ceiling_ms: int   # upper bound on any single wait
    max_caller_retries: int   # callers should give up after N attempts
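  2. Add a capacity attribute to AssistantAgent: extend the agent class with an optional capacity attribute. The class below is a simplified stand-in for illustration; the real AgentChat AssistantAgent also requires a model client, among other arguments.
from typing import Optional


class AssistantAgent:
    def __init__(self, name: str, capacity: Optional[AgentCapacity] = None):
        self.name = name
        self.capacity = capacity  # None: the agent declares no contract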
  3. Implement capacity introspection: let callers query an agent's capacity before making requests.
def get_agent_capacity(agent):
    # getattr tolerates agents that have no capacity attribute at all
    return getattr(agent, "capacity", None)
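  4. Update caller retry behavior: adapt each caller's retry logic to the called agent's declared constraints, falling back to caller defaults when none are declared.
import time


def retry_with_backoff(agent, call, default_max_attempts=3, default_delay_ms=500):
    capacity = get_agent_capacity(agent)
    if capacity is not None:
        max_attempts = max(1, capacity.max_caller_retries)  # "give up after N attempts"
        delay_ms, ceiling_ms = capacity.backoff_initial_ms, capacity.backoff_ceiling_ms
        exponential = capacity.backoff_strategy == "exponential"
    else:
        # Default retry behavior: fixed delay, caller-chosen attempt count
        max_attempts, delay_ms, ceiling_ms, exponential = max(1, default_max_attempts), default_delay_ms, default_delay_ms, False
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:  # narrow this to the callee's "busy" error in practice
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            time.sleep(min(delay_ms, ceiling_ms) / 1000)
            if exponential:
                delay_ms = min(delay_ms * 2, ceiling_ms)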

Verification

To verify that the fix worked, test the following scenarios:

  • An agent with a declared capacity is called by multiple agents.
  • The callers adapt their retry behavior (number of attempts, backoff delays) to the called agent's declared contract.
  • The called agent's capacity declaration is updated, and the callers pick up the new limits without any change to their own code.
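The second and third scenarios can be checked without real agents or real sleeps. The sketch below assumes a minimal stand-in `Agent` and `AgentCapacity`, both hypothetical and not AutoGen classes. It also assumes the callee signals saturation with `RuntimeError`, and it injects `sleep` so the recorded waits can be asserted instead of actually slept.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentCapacity:
    """Minimal stand-in for the proposed contract (not an AutoGen API)."""
    max_concurrent: int
    backoff_strategy: str
    backoff_initial_ms: int
    backoff_ceiling_ms: int
    max_caller_retries: int


class Agent:
    """Hypothetical minimal agent that only carries a capacity contract."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity


def call_with_contract(agent, call, sleep):
    """Caller that reads the callee's contract at call time; `sleep` is injected."""
    cap = agent.capacity
    delay = cap.backoff_initial_ms
    for attempt in range(1, cap.max_caller_retries + 1):
        try:
            return call()
        except RuntimeError:
            if attempt == cap.max_caller_retries:
                raise
            sleep(delay)
            if cap.backoff_strategy == "exponential":
                delay = min(delay * 2, cap.backoff_ceiling_ms)


def attempts_until_give_up(agent):
    """Drive the caller against an always-busy callee; record attempts and waits."""
    attempts, waits = 0, []

    def always_busy():
        nonlocal attempts
        attempts += 1
        raise RuntimeError("busy")

    try:
        call_with_contract(agent, always_busy, waits.append)
    except RuntimeError:
        pass
    return attempts, waits


b = Agent("data_processor", AgentCapacity(3, "exponential", 1000, 30000, 5))
print(attempts_until_give_up(b))  # (5, [1000, 2000, 4000, 8000])

# Scenario 3: change only the callee's declaration; the caller code is untouched.
b.capacity = replace(b.capacity, max_caller_retries=3)
print(attempts_until_give_up(b))  # (3, [1000, 2000])
```

The first scenario (many concurrent callers) would additionally need concurrent callers against a provider that enforces `max_concurrent`, which this sketch does not cover.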

Extra Tips

  • Consider team-level capacity declarations for GroupChat patterns where load is distributed across team members; a shared contract also avoids N separate implementations of the same retry logic.
  • Use the AgentCapacity class to document capacity constraints in team configurations, making them auditable before deployment.
  • Review the discussion in #7265 about practical reliability patterns to ensure that this solution addresses the recurring pain point in production setups.
