crewai - ✅(Solved) Fix ChatWithCrewFlow.__init__ makes blocking LLM call at module import, crashes containers on any LLM hiccup [1 pull requests, 2 comments, 3 participants]

crewAIInc/crewAI#5510 (fetched 2026-04-17 08:30:31)

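ChatWithCrewFlow.__init__ in ag_ui_crewai.crews triggers synchronous, blocking LLM calls at module import time via crewai.cli.crew_chat.generate_crew_chat_inputs, which in turn calls generate_input_description_with_ai and generate_crew_description_with_ai. Each of these issues a blocking chat_llm.call(...) request.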

For users deploying CrewAI behind a FastAPI server via ag_ui_crewai.endpoint.add_crewai_crew_fastapi_endpoint (the recommended integration for AG-UI / CopilotKit), these LLM calls fire during module import — BEFORE uvicorn binds to its HTTP port.

Error Message

File "/app/agent_server.py", line 27, in <module>
    add_crewai_crew_fastapi_endpoint(app, LatestAiDevelopment(), "/")
File ".../ag_ui_crewai/endpoint.py", line 250, in add_crewai_crew_fastapi_endpoint
    add_crewai_flow_fastapi_endpoint(app, ChatWithCrewFlow(crew=crew), path)
File ".../ag_ui_crewai/crews.py", line 56, in __init__
    self.crew_chat_inputs = crew_chat_generate_crew_chat_inputs(...)
File ".../crewai/cli/crew_chat.py", line 387, in generate_crew_chat_inputs
    description = generate_input_description_with_ai(input_name, crew, chat_llm)
File ".../crewai/cli/crew_chat.py", line 481, in generate_input_description_with_ai
    response = chat_llm.call(messages=[...])
File ".../crewai/llm.py", line 956, in call
    return self._handle_non_streaming_response(...)
...
APIError: <connection failure>

Root Cause

In orchestrated environments (Railway, Kubernetes, AWS ECS, Fly.io) the platform's readiness/health check fails because no process ever binds the port. The platform then marks the deploy failed and rolls back to the previous image, so the service cannot be deployed while the LLM layer is unstable.

Fix / Workaround

We're shipping a defensive monkey-patch in our showcase to patch both functions to return static strings before ag_ui_crewai is imported. PR: https://github.com/CopilotKit/CopilotKit/pull/3974

PR fix notes

PR #3974: fix(crewai-crews): harden against LLM blocking calls at import time

Description (problem / solution / changelog)

Summary

CrewAI's ChatWithCrewFlow.__init__ (invoked from ag_ui_crewai.endpoint.add_crewai_crew_fastapi_endpoint at module import in ag-ui-crewai <= 0.1.5) makes synchronous blocking LLM calls via crewai.cli.crew_chat.generate_input_description_with_ai and generate_crew_description_with_ai. ANY LLM hiccup — aimock regression, OpenAI outage, network blip, DNS failure — crashes the Python process BEFORE uvicorn can bind its port, causing Railway/Kubernetes health checks to fail and deploys to roll back.

This was the direct cause of the crewai-crews Railway crash fixed server-side in #3971. That fix patched the aimock response schema, but the underlying fragility in upstream CrewAI / ag-ui-crewai remained — a future blip would crash us again.

This PR adds a defensive monkey-patch in agent_server.py that replaces both generator functions with static-string returns BEFORE ag_ui_crewai is imported. The AI-generated descriptions are only surfaced in the CrewAI chat UI (which the CopilotKit runtime does not use), so static defaults are functionally equivalent for our showcase.

Upstream issue filed: https://github.com/crewAIInc/crewAI/issues/5510

The long-term fix is deferred construction in ag-ui-crewai, which has landed on ag-ui main but is not yet released. Remove this shim once ag-ui-crewai > 0.1.5 ships.

Why a monkey-patch and not lazy-init

add_crewai_crew_fastapi_endpoint is the entry point and internally constructs ChatWithCrewFlow(crew) synchronously in ag-ui-crewai <= 0.1.5. Deferring that call would require either vendoring the endpoint function or reimplementing it. The monkey-patch is two lines and removes cleanly when the upstream fix ships.

Test plan

Verified locally via Docker build + run with an intentionally broken LLM endpoint (OPENAI_BASE_URL=http://invalid-host/v1):

Unhardened (negative control):

File "/app/agent_server.py", line 27, in <module>
    add_crewai_crew_fastapi_endpoint(app, LatestAiDevelopment(), "/")
  File ".../ag_ui_crewai/endpoint.py", line 250, in add_crewai_crew_fastapi_endpoint
    add_crewai_flow_fastapi_endpoint(app, ChatWithCrewFlow(crew=crew), path)
  File ".../ag_ui_crewai/crews.py", line 56, in __init__
    self.crew_chat_inputs = crew_chat_generate_crew_chat_inputs(...)
  File ".../crewai/cli/crew_chat.py", line 387, in generate_crew_chat_inputs
    description = generate_input_description_with_ai(input_name, crew, chat_llm)
  File ".../crewai/cli/crew_chat.py", line 481, in generate_input_description_with_ai
    response = chat_llm.call(...)
APIError

Container exits with code 1, never binds a port.

Hardened (this PR):

INFO:     Started server process [7]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

curl http://localhost:PORT/api/health -> {"status":"ok","integration":"crewai-crews","agent":"ok","timestamp":"..."} (HTTP 200).

Checklist

  • Docker build succeeds locally
  • Unhardened build crashes on import with broken LLM endpoint (negative control)
  • Hardened build starts cleanly with broken LLM endpoint and responds 200 on /api/health
  • Upstream issue filed on crewAIInc/crewAI

Changed files

  • showcase/packages/crewai-crews/requirements.txt (modified, +1/-1)
  • showcase/packages/crewai-crews/src/agent_server.py (modified, +73/-0)
  • showcase/starters/crewai-crews/agent/requirements.txt (modified, +1/-1)
  • showcase/starters/crewai-crews/agent_server.py (modified, +73/-0)


Summary

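ChatWithCrewFlow.__init__ in ag_ui_crewai.crews triggers synchronous, blocking LLM calls at module import time via crewai.cli.crew_chat.generate_crew_chat_inputs, which in turn calls generate_input_description_with_ai and generate_crew_description_with_ai. Each of these issues a blocking chat_llm.call(...) request.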

For users deploying CrewAI behind a FastAPI server via ag_ui_crewai.endpoint.add_crewai_crew_fastapi_endpoint (the recommended integration for AG-UI / CopilotKit), these LLM calls fire during module import — BEFORE uvicorn binds to its HTTP port.

Failure mode

ANY LLM provider hiccup during container startup causes the Python process to crash before the HTTP server is listening:

  • OpenAI 500 / 503 / rate-limit
  • Network blip, DNS failure, slow cold-start on a mock/proxy server
  • Invalid credentials (even transient)
  • Litellm APIError, Timeout, or APIConnectionError

In orchestrated environments (Railway, Kubernetes, AWS ECS, Fly.io) the platform's readiness/health check fails because no process ever binds the port. The platform then marks the deploy failed and rolls back to the previous image, so the service cannot be deployed while the LLM layer is unstable.

We hit this on our Railway-hosted CopilotKit showcase when our LLM mock (aimock) returned a transient schema error. The mock error was recoverable — the issue is that it shouldn't have been able to crash the entire container before the HTTP server was ready.

Actual stack trace we observed

File "/app/agent_server.py", line 27, in <module>
    add_crewai_crew_fastapi_endpoint(app, LatestAiDevelopment(), "/")
File ".../ag_ui_crewai/endpoint.py", line 250, in add_crewai_crew_fastapi_endpoint
    add_crewai_flow_fastapi_endpoint(app, ChatWithCrewFlow(crew=crew), path)
File ".../ag_ui_crewai/crews.py", line 56, in __init__
    self.crew_chat_inputs = crew_chat_generate_crew_chat_inputs(...)
File ".../crewai/cli/crew_chat.py", line 387, in generate_crew_chat_inputs
    description = generate_input_description_with_ai(input_name, crew, chat_llm)
File ".../crewai/cli/crew_chat.py", line 481, in generate_input_description_with_ai
    response = chat_llm.call(messages=[...])
File ".../crewai/llm.py", line 956, in call
    return self._handle_non_streaming_response(...)
...
APIError: <connection failure>

The container exits with code 1 and never binds a port, so the orchestrator's health check fails and the deploy rolls back.

Why this is a CrewAI concern, not just an ag-ui-crewai concern

While ChatWithCrewFlow lives in ag-ui-crewai, the two functions that block are part of CrewAI's public crewai.cli.crew_chat module. CrewAI is asking users to consume these helpers at import/init time without any of the standard production-server defenses:

  • No timeout
  • No retry/fallback
  • No try/except with a graceful default
  • No opt-out

Any consumer that instantiates a chat flow with them in a serving context inherits this fragility.

Suggested fixes (any or all)

  1. Lazy init at first request. Have ChatWithCrewFlow.__init__ store the crew and LLM but defer generate_crew_chat_inputs until the first actual chat turn. (A similar fix has already landed on ag-ui-protocol/ag-ui main for add_crewai_crew_fastapi_endpoint: it defers ChatWithCrewFlow construction to the first request. The underlying functions in CrewAI, however, still have no defenses.)

  2. Try/except with a static fallback inside the generator functions. If the LLM call fails for any reason, fall back to a generic string like "Input value for the crew's tasks and agents." or "A CrewAI crew.". These descriptions are only surfaced in the CrewAI chat UI — shipping a generic default on LLM failure is strictly better than crashing the process.

  3. Make AI-generated descriptions opt-in. Accept a kwarg generate_descriptions: bool = True (default preserves current behavior), but let production users pass False to skip the LLM calls entirely.

  4. Timeout + bounded retry. At minimum, enforce a short timeout (e.g., 10s) on chat_llm.call in these two functions so a hung LLM can't indefinitely block process startup.
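
Fix 1 can be sketched in isolation as follows. This is an illustration under stated assumptions, not code from CrewAI or ag-ui-crewai: LazyChatInputs and flaky_build are hypothetical names, and the builder callable stands in for a call such as generate_crew_chat_inputs. Construction does no work; the first get() runs the builder, and a failure returns the fallback without caching it so that a later request can retry.

```python
import threading


class LazyChatInputs:
    """Defers an expensive (LLM-backed) computation until first use."""

    def __init__(self, build_inputs, fallback):
        self._build_inputs = build_inputs  # zero-argument callable; may raise
        self._fallback = fallback
        self._value = None
        self._ready = False
        self._lock = threading.Lock()  # serialises concurrent first requests

    def get(self):
        with self._lock:
            if not self._ready:
                try:
                    self._value = self._build_inputs()
                except Exception:
                    # Do not cache the failure: the next call tries again.
                    return self._fallback
                self._ready = True
            return self._value


# Demonstration with a builder that fails once and then succeeds.
calls = []


def flaky_build():
    calls.append(1)
    if len(calls) == 1:
        raise ConnectionError("LLM unreachable")
    return {"crew_description": "generated"}


inputs = LazyChatInputs(flaky_build, fallback={"crew_description": "A CrewAI crew."})
first = inputs.get()   # builder fails, fallback is returned
second = inputs.get()  # builder succeeds, result is cached
third = inputs.get()   # served from cache, builder is not called again
```

Returning the fallback without caching trades an extra LLM attempt per request during an outage for automatic recovery once the provider is healthy; caching the fallback instead would be the cheaper but stickier choice.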

Our workaround

We're shipping a defensive monkey-patch in our showcase to patch both functions to return static strings before ag_ui_crewai is imported. PR: https://github.com/CopilotKit/CopilotKit/pull/3974

This is fragile (depends on private function names) and we'd much prefer an upstream fix so every AG-UI / CopilotKit / direct-CrewAI production deployment doesn't inherit this footgun.

Environment

  • crewai>=0.130.0
  • ag-ui-crewai==0.1.5 (latest released; main already has the deferred-construction fix in endpoint.py but no release yet)
  • Python 3.12

extent analysis

TL;DR

To fix the issue, consider implementing lazy initialization, adding try-except blocks with fallbacks, or making AI-generated descriptions opt-in to prevent synchronous blocking LLM calls from crashing the process during module import.

Guidance

  1. Implement lazy initialization: Defer generate_crew_chat_inputs until the first actual chat turn to prevent LLM calls during module import.
  2. Add try-except blocks with fallbacks: Catch exceptions in generate_input_description_with_ai and generate_crew_description_with_ai and return generic strings as fallbacks.
  3. Make AI-generated descriptions opt-in: Introduce a generate_descriptions parameter to allow production users to skip LLM calls.
  4. Enforce timeouts and retries: Set a short timeout (e.g., 10s) on chat_llm.call and consider implementing bounded retries.
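
Item 4 could be approximated with a generic wrapper like the one below. This is a sketch under assumptions: call_with_timeout and describe_with_defenses are hypothetical helpers, and the thread-based timeout is a stand-in technique. If the LLM client exposes its own timeout setting, using that is likely preferable, because a worker thread that has timed out here keeps running in the background until its call returns.

```python
import threading
import time


def call_with_timeout(fn, timeout_s):
    """Run fn() in a daemon thread and give up waiting after timeout_s seconds."""
    outcome = {}

    def runner():
        try:
            outcome["value"] = fn()
        except Exception as exc:  # captured and re-raised in the caller
            outcome["error"] = exc

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        raise TimeoutError(f"call did not finish within {timeout_s}s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def describe_with_defenses(call_llm, fallback, attempts=2, timeout_s=10.0, backoff_s=0.5):
    """Bounded retry around a blocking call; returns fallback if every attempt fails."""
    for attempt in range(attempts):
        try:
            return call_with_timeout(call_llm, timeout_s)
        except Exception:
            if attempt + 1 < attempts:
                time.sleep(backoff_s)
    return fallback


# Demonstration: a call that takes longer than the timeout.
def hung_call():
    time.sleep(0.3)
    return "never used"


FALLBACK = "Input value for the crew's tasks and agents."
slow_result = describe_with_defenses(hung_call, FALLBACK, attempts=2, timeout_s=0.05, backoff_s=0.0)
fast_result = describe_with_defenses(lambda: "generated", FALLBACK)
```

With these parameters the worst-case startup delay is bounded by attempts × timeout_s plus the backoff sleeps, which is the property the guidance asks for.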

Example

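def generate_input_description_with_ai(input_name, crew, chat_llm):
    # Placeholder prompt for illustration; the real function builds its own.
    prompt = f"Describe the crew input '{input_name}' in one sentence."
    try:
        response = chat_llm.call(messages=[{"role": "user", "content": prompt}])
        # ... (post-process and return the description)
    except Exception:
        # Any LLM failure (APIError, timeout, connection error) yields the fallback.
        return "Input value for the crew's tasks and agents."  # Fallback string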

Notes

The suggested fixes aim to address the issue without modifying the underlying LLM calls. However, the effectiveness of these fixes may depend on the specific requirements and constraints of the CrewAI and ag-ui-crewai projects.

Recommendation

Apply workaround: Implement try-except blocks with fallbacks to prevent process crashes due to LLM call failures, as this is a relatively simple and non-intrusive change that can be made immediately.
