crewai - ✅ (Solved) Fix AnthropicCompletion._handle_completion silently discards stop_reason; no way to detect truncation via hooks or events [2 pull requests, 1 comment, 2 participants]

GitHub stats
crewAIInc/crewAI#5148 (fetched 2026-04-08 01:44:52)
Comments: 1
Participants: 2
Timeline events: 4
Reactions: 0
Timeline (top): commented ×1, cross-referenced ×1, labeled ×1, referenced ×1

Fix / Workaround

  1. after_llm_call hook with token ratio heuristic: The hook receives context.agent.role and context.llm._token_usage, so one could warn when completion_tokens ≥ max_tokens × 0.95. This works as a proxy but is not the same as checking stop_reason directly — it can produce false positives and misses cases where max_tokens is not explicitly set.
  2. Monkey-patching self.client.messages.create: Wrapping the Anthropic SDK client on each AnthropicCompletion instance post-init to intercept the raw response. This works today without any framework changes but is fragile to internal refactoring and not a suitable permanent solution.
  3. Content sentinel / downstream heuristics: Checking whether the response ends mid-sentence or lacks expected structural markers. Too unreliable for production use — the failure mode (silent truncation mid-table) produces syntactically valid but semantically incomplete output that passes all surface checks.
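
The token ratio heuristic from workaround 1 can be sketched as a small pure function. This is only an illustration under stated assumptions: the name `looks_truncated` and its parameters are hypothetical, and wiring it into an after_llm_call hook is left out because the hook's exact signature is not shown here.

```python
from typing import Optional


def looks_truncated(
    completion_tokens: int,
    max_tokens: Optional[int],
    threshold: float = 0.95,
) -> bool:
    """Proxy check: did the completion use nearly all of the token budget?

    Hypothetical helper, not part of crewAI. It mirrors the
    completion_tokens >= max_tokens * 0.95 rule described above.
    """
    if max_tokens is None or max_tokens <= 0:
        # No explicit limit is known, so the heuristic cannot say anything.
        # This is the "misses cases where max_tokens is not set" weakness.
        return False
    return completion_tokens >= max_tokens * threshold


# A response that ends naturally close to the limit is still flagged,
# which is the false-positive weakness noted in the list above.
near_limit = looks_truncated(completion_tokens=4000, max_tokens=4096)
short_reply = looks_truncated(completion_tokens=100, max_tokens=4096)
no_limit = looks_truncated(completion_tokens=4000, max_tokens=None)
```

Because the function only sees token counts, it cannot distinguish a truncated response from one that happened to finish near the limit; checking stop_reason directly avoids that ambiguity.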

PR fix notes

PR #5149: fix: surface Anthropic stop_reason to detect truncation (#5148)

Description (problem / solution / changelog)

Summary

Fixes #5148. Anthropic's Message response includes a stop_reason field that indicates why the API stopped generating (e.g. "max_tokens" means the output was truncated). Previously, AnthropicCompletion silently discarded this field, making it impossible for users to detect truncation via hooks or events.

Changes:

  • Add stop_reason: str | None = None field to LLMCallCompletedEvent
  • Add stop_reason parameter to BaseLLM._emit_call_completed_event
  • Add _extract_stop_reason() static method on AnthropicCompletion that safely extracts stop_reason as str | None (guards against non-string values, e.g. from MagicMock in tests)
  • Add _warn_if_truncated() helper on AnthropicCompletion that logs a warning when stop_reason == "max_tokens"
  • Plumb stop_reason through all 6 Anthropic completion methods (sync + async × regular/streaming/tool-use)
  • Add 7 unit tests covering warning behavior, event field propagation, and edge cases

Non-Anthropic providers are unaffected — they continue to emit stop_reason=None by default.

Review & Testing Checklist for Human

  • Verify all 6 methods are consistently updated: The same pattern (extract → warn → pass to event) is applied across _handle_completion, _handle_streaming_completion, _handle_tool_use_conversation, and their async counterparts. Confirm no code path was missed, especially around early returns (structured output, tool use blocks).
  • Streaming path correctness: In _handle_streaming_completion / _ahandle_streaming_completion, stop_reason is read from stream.get_final_message(). Verify that the reconstructed final_message actually carries the stop_reason from the stream (it should, per Anthropic SDK behavior).
  • Consider whether from_agent.role access is safe: _warn_if_truncated accesses from_agent.role when from_agent is truthy. This should always work since from_agent is typed as Agent | None, but confirm no caller passes a non-Agent truthy value.
  • Decide if other providers should also surface finish reasons: This PR is Anthropic-only. OpenAI has an analogous finish_reason field — consider whether a follow-up is needed.

Suggested manual test: Set max_tokens to a very small value (e.g. 50) on an Anthropic model, run a crew, and verify:

  1. A warning is logged containing stop_reason='max_tokens'
  2. The LLMCallCompletedEvent carries stop_reason="max_tokens" (observable via a custom event handler)
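
As a rough illustration of the second check, the sketch below shows how a subscriber might react once the event carries stop_reason. The event class and the bus are simplified stand-ins written for this example (assumptions, not crewAI's real LLMCallCompletedEvent or event bus), and `on_llm_call_completed` is a hypothetical handler name.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class LLMCallCompletedEvent:
    """Stand-in for the real event; only the fields needed here."""

    response: str
    stop_reason: Optional[str] = None  # additive field, defaults to None


class TinyEventBus:
    """Minimal stand-in bus: subscribers are plain callables."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[LLMCallCompletedEvent], None]] = []

    def subscribe(self, handler: Callable[[LLMCallCompletedEvent], None]) -> None:
        self._handlers.append(handler)

    def emit(self, event: LLMCallCompletedEvent) -> None:
        for handler in self._handlers:
            handler(event)


truncated_events: List[LLMCallCompletedEvent] = []


def on_llm_call_completed(event: LLMCallCompletedEvent) -> None:
    # React programmatically instead of relying on log monitoring.
    if event.stop_reason == "max_tokens":
        truncated_events.append(event)


bus = TinyEventBus()
bus.subscribe(on_llm_call_completed)
bus.emit(LLMCallCompletedEvent(response="| col a | col", stop_reason="max_tokens"))
bus.emit(LLMCallCompletedEvent(response="All done.", stop_reason="end_turn"))
bus.emit(LLMCallCompletedEvent(response="From another provider"))  # stop_reason=None
```

Only the first event is recorded as truncated; events without a stop_reason pass through untouched, which matches the backwards-compatible default described in this PR.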

Notes

  • The stop_reason field on LLMCallCompletedEvent and the BaseLLM parameter are additive and backwards-compatible (default None).
  • _extract_stop_reason uses an isinstance(raw, str) guard so that non-string attribute values (e.g. auto-created MagicMock attributes in existing tests) safely become None rather than causing Pydantic validation errors.
  • No async-specific tests for the warning (only sync), though the code is symmetrical. The event-bus tests do exercise the full emit→capture path.
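
The isinstance guard mentioned in the notes can be sketched as follows. This is a minimal approximation based only on the description above, not the PR's actual code; the function is written as a free function named `extract_stop_reason` for the sake of a self-contained example.

```python
from types import SimpleNamespace
from typing import Any, Optional
from unittest.mock import MagicMock


def extract_stop_reason(response: Any) -> Optional[str]:
    """Return stop_reason only when it is a real string, else None."""
    raw = getattr(response, "stop_reason", None)
    # A MagicMock auto-creates attributes, so raw may be a MagicMock here.
    # Passing that into a str | None event field would fail validation,
    # so anything that is not a str is normalised to None.
    return raw if isinstance(raw, str) else None


real_like = extract_stop_reason(SimpleNamespace(stop_reason="max_tokens"))
mocked = extract_stop_reason(MagicMock())
missing = extract_stop_reason(object())
```

With this guard, existing tests that pass a bare MagicMock as the response keep working because the extracted value falls back to None.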

Link to Devin session: https://app.devin.ai/sessions/7214a66c41b94b07803ad5faacf12270

Changed files

  • lib/crewai/src/crewai/events/types/llm_events.py (modified, +1/-0)
  • lib/crewai/src/crewai/llms/base_llm.py (modified, +2/-0)
  • lib/crewai/src/crewai/llms/providers/anthropic/completion.py (modified, +54/-0)
  • lib/crewai/tests/llms/anthropic/test_anthropic.py (modified, +214/-0)

PR #5405: fix: extract Anthropic stop_reason to detect output truncation

Description (problem / solution / changelog)

Summary

Fixes #5148

Anthropic API responses include a stop_reason field that indicates why generation stopped ("end_turn", "max_tokens", "tool_use", etc.). Previously this field was silently discarded after token usage extraction, making it impossible for downstream hooks or event listeners to detect output truncation.

This PR:

  • Adds stop_reason: str | None field to LLMCallCompletedEvent so event subscribers can react programmatically to truncation
  • Adds _check_and_get_stop_reason() helper on AnthropicCompletion that extracts the field and logs a WARNING when stop_reason == "max_tokens" (includes agent role for context)
  • Propagates stop_reason through all 6 completion code paths: _handle_completion, _handle_streaming_completion, _handle_tool_use_conversation, and their async counterparts
  • Adds stop_reason parameter to BaseLLM._emit_call_completed_event() for provider-agnostic forwarding

The change is fully backward-compatible: stop_reason defaults to None, so existing event listeners are unaffected.
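
A helper that combines extraction and the warning, in the spirit of _check_and_get_stop_reason() described above, might look roughly like this. It is a sketch under assumptions: the free-function form, the logger name, and the `max_tokens` parameter are illustrative, and the PR's real implementation may differ.

```python
import logging
from types import SimpleNamespace
from typing import Any, Optional

# Hypothetical logger name chosen for this sketch.
logger = logging.getLogger("anthropic_stop_reason_sketch")


def check_and_get_stop_reason(
    response: Any,
    from_agent: Any = None,
    max_tokens: Optional[int] = None,
) -> Optional[str]:
    """Read stop_reason and warn when the output hit the token limit."""
    stop_reason = getattr(response, "stop_reason", None)
    if stop_reason == "max_tokens":
        # Include the agent role, when available, to make the log actionable.
        agent_hint = f" [{from_agent.role}]" if from_agent else ""
        logger.warning(
            "Truncated response%s: stop_reason='max_tokens'. "
            "Consider increasing max_tokens (current: %s).",
            agent_hint,
            max_tokens,
        )
    return stop_reason


# Stand-in objects instead of a real Anthropic Message and crewAI Agent.
truncated = check_and_get_stop_reason(
    SimpleNamespace(stop_reason="max_tokens"),
    from_agent=SimpleNamespace(role="Analyst"),
    max_tokens=50,
)
finished = check_and_get_stop_reason(SimpleNamespace(stop_reason="end_turn"))
```

Returning the value from the same helper lets each of the six code paths both warn and forward stop_reason to the completed-call event with a single call.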

Changed files

File | Change
--- | ---
lib/crewai/src/crewai/events/types/llm_events.py | Add stop_reason field to LLMCallCompletedEvent
lib/crewai/src/crewai/llms/base_llm.py | Add stop_reason param to _emit_call_completed_event()
lib/crewai/src/crewai/llms/providers/anthropic/completion.py | Extract stop_reason in all 6 methods, add _check_and_get_stop_reason() helper
lib/crewai/tests/llms/anthropic/test_anthropic_stop_reason.py | Comprehensive tests (event model, helper, sync/async completion, tool-use paths)

Test plan

  • LLMCallCompletedEvent accepts and defaults stop_reason correctly
  • _check_and_get_stop_reason returns correct values for end_turn, max_tokens, and missing attribute
  • Warning logged only when stop_reason == "max_tokens"
  • Agent role included in warning message
  • stop_reason propagated through _handle_completion (sync)
  • stop_reason propagated through _ahandle_completion (async)
  • stop_reason propagated through _handle_tool_use_conversation (sync)
  • stop_reason propagated through _ahandle_tool_use_conversation (async)

🤖 Generated with Claude Code

Note (Medium Risk): Touches the core LLM event emission path (BaseLLM._emit_call_completed_event) and Anthropic provider completion flows; while backwards-compatible (stop_reason=None default), any mismatched event consumers or provider edge cases could surface at runtime.

Overview: Adds a provider-agnostic stop_reason: str | None to LLMCallCompletedEvent and threads it through BaseLLM._emit_call_completed_event().

Updates Anthropic completion handling to extract stop_reason on sync/async, streaming, and tool-use code paths via _check_and_get_stop_reason(), logging a warning when output is truncated (stop_reason == "max_tokens"). Comprehensive tests cover the new event field, warning behavior (including agent role), and propagation across the main Anthropic completion paths.

Reviewed by Cursor Bugbot for commit 1d85cceabf356cca7ec9c98e1015fe6194dd987a. Bugbot is set up for automated code reviews on this repo.

Changed files

  • .editorconfig (removed, +0/-14)
  • .env.test (removed, +0/-160)
  • .github/CONTRIBUTING.md (removed, +0/-173)
  • .github/ISSUE_TEMPLATE/bug_report.yml (removed, +0/-115)
  • .github/ISSUE_TEMPLATE/config.yml (removed, +0/-1)
  • .github/ISSUE_TEMPLATE/feature_request.yml (removed, +0/-65)
  • .github/codeql/codeql-config.yml (removed, +0/-33)
  • .github/dependabot.yml (removed, +0/-16)
  • .github/security.md (removed, +0/-12)
  • .github/workflows/build-uv-cache.yml (removed, +0/-48)
  • .github/workflows/codeql.yml (removed, +0/-103)
  • .github/workflows/docs-broken-links.yml (removed, +0/-35)
  • .github/workflows/generate-tool-specs.yml (removed, +0/-63)
  • .github/workflows/linter.yml (removed, +0/-87)
  • .github/workflows/nightly.yml (removed, +0/-127)
  • .github/workflows/pr-size.yml (removed, +0/-32)
  • .github/workflows/pr-title.yml (removed, +0/-41)
  • .github/workflows/publish.yml (removed, +0/-166)
  • .github/workflows/stale.yml (removed, +0/-29)
  • .github/workflows/tests.yml (removed, +0/-137)
  • .github/workflows/type-checker.yml (removed, +0/-91)
  • .github/workflows/update-test-durations.yml (removed, +0/-71)
  • .github/workflows/vulnerability-scan.yml (removed, +0/-105)
  • .gitignore (removed, +0/-32)
  • .pre-commit-config.yaml (removed, +0/-33)
  • .python-version (removed, +0/-1)
  • LICENSE (removed, +0/-19)
  • README.md (removed, +0/-780)
  • conftest.py (removed, +0/-296)
  • docs/ar/api-reference/inputs.mdx (removed, +0/-8)
  • docs/ar/api-reference/introduction.mdx (removed, +0/-135)
  • docs/ar/api-reference/kickoff.mdx (removed, +0/-8)
  • docs/ar/api-reference/resume.mdx (removed, +0/-6)
  • docs/ar/api-reference/status.mdx (removed, +0/-6)
  • docs/ar/changelog.mdx (removed, +0/-841)
  • docs/ar/concepts/agent-capabilities.mdx (removed, +0/-147)
  • docs/ar/concepts/agents.mdx (removed, +0/-357)
  • docs/ar/concepts/checkpointing.mdx (removed, +0/-229)
  • docs/ar/concepts/cli.mdx (removed, +0/-287)
  • docs/ar/concepts/collaboration.mdx (removed, +0/-363)
  • docs/ar/concepts/crews.mdx (removed, +0/-204)
  • docs/ar/concepts/event-listener.mdx (removed, +0/-236)
  • docs/ar/concepts/files.mdx (removed, +0/-267)
  • docs/ar/concepts/flows.mdx (removed, +0/-1068)
  • docs/ar/concepts/knowledge.mdx (removed, +0/-1095)
  • docs/ar/concepts/llms.mdx (removed, +0/-1464)
  • docs/ar/concepts/memory.mdx (removed, +0/-878)
  • docs/ar/concepts/planning.mdx (removed, +0/-155)
  • docs/ar/concepts/processes.mdx (removed, +0/-67)
  • docs/ar/concepts/production-architecture.mdx (removed, +0/-154)
  • docs/ar/concepts/reasoning.mdx (removed, +0/-148)
  • docs/ar/concepts/skills.mdx (removed, +0/-306)
  • docs/ar/concepts/tasks.mdx (removed, +0/-1085)
  • docs/ar/concepts/testing.mdx (removed, +0/-49)
  • docs/ar/concepts/tools.mdx (removed, +0/-290)
  • docs/ar/concepts/training.mdx (removed, +0/-197)
  • docs/ar/enterprise/features/agent-repositories.mdx (removed, +0/-155)
  • docs/ar/enterprise/features/automations.mdx (removed, +0/-104)
  • docs/ar/enterprise/features/crew-studio.mdx (removed, +0/-88)
  • docs/ar/enterprise/features/flow-hitl-management.mdx (removed, +0/-558)
  • docs/ar/enterprise/features/hallucination-guardrail.mdx (removed, +0/-251)
  • docs/ar/enterprise/features/marketplace.mdx (removed, +0/-45)
  • docs/ar/enterprise/features/pii-trace-redactions.mdx (removed, +0/-342)
  • docs/ar/enterprise/features/rbac.mdx (removed, +0/-256)
  • docs/ar/enterprise/features/tools-and-integrations.mdx (removed, +0/-261)
  • docs/ar/enterprise/features/traces.mdx (removed, +0/-148)
  • docs/ar/enterprise/features/webhook-streaming.mdx (removed, +0/-172)
  • docs/ar/enterprise/guides/automation-triggers.mdx (removed, +0/-321)
  • docs/ar/enterprise/guides/azure-openai-setup.mdx (removed, +0/-54)
  • docs/ar/enterprise/guides/build-crew.mdx (removed, +0/-48)
  • docs/ar/enterprise/guides/capture_telemetry_logs.mdx (removed, +0/-39)
  • docs/ar/enterprise/guides/custom-mcp-server.mdx (removed, +0/-136)
  • docs/ar/enterprise/guides/deploy-to-amp.mdx (removed, +0/-445)
  • docs/ar/enterprise/guides/enable-crew-studio.mdx (removed, +0/-182)
  • docs/ar/enterprise/guides/gmail-trigger.mdx (removed, +0/-97)
  • docs/ar/enterprise/guides/google-calendar-trigger.mdx (removed, +0/-83)
  • docs/ar/enterprise/guides/google-drive-trigger.mdx (removed, +0/-80)
  • docs/ar/enterprise/guides/hubspot-trigger.mdx (removed, +0/-61)
  • docs/ar/enterprise/guides/human-in-the-loop.mdx (removed, +0/-157)
  • docs/ar/enterprise/guides/kickoff-crew.mdx (removed, +0/-178)
  • docs/ar/enterprise/guides/microsoft-teams-trigger.mdx (removed, +0/-70)
  • docs/ar/enterprise/guides/onedrive-trigger.mdx (removed, +0/-69)
  • docs/ar/enterprise/guides/outlook-trigger.mdx (removed, +0/-69)
  • docs/ar/enterprise/guides/prepare-for-deployment.mdx (removed, +0/-311)
  • docs/ar/enterprise/guides/private-package-registry.mdx (removed, +0/-263)
  • docs/ar/enterprise/guides/react-component-export.mdx (removed, +0/-112)
  • docs/ar/enterprise/guides/salesforce-trigger.mdx (removed, +0/-50)
  • docs/ar/enterprise/guides/slack-trigger.mdx (removed, +0/-62)
  • docs/ar/enterprise/guides/team-management.mdx (removed, +0/-91)
  • docs/ar/enterprise/guides/tool-repository.mdx (removed, +0/-154)
  • docs/ar/enterprise/guides/training-crews.mdx (removed, +0/-132)
  • docs/ar/enterprise/guides/update-crew.mdx (removed, +0/-91)
  • docs/ar/enterprise/guides/webhook-automation.mdx (removed, +0/-157)
  • docs/ar/enterprise/guides/zapier-trigger.mdx (removed, +0/-105)
  • docs/ar/enterprise/integrations/asana.mdx (removed, +0/-271)
  • docs/ar/enterprise/integrations/box.mdx (removed, +0/-280)
  • docs/ar/enterprise/integrations/clickup.mdx (removed, +0/-301)
  • docs/ar/enterprise/integrations/github.mdx (removed, +0/-0)
  • docs/ar/enterprise/integrations/gmail.mdx (removed, +0/-0)
  • docs/ar/enterprise/integrations/google_calendar.mdx (removed, +0/-0)


Feature Area

Other (please specify in additional context)

Is your feature request related to an existing bug? Please link it here.

NA

Describe the solution you'd like

The stop_reason field on Anthropic's Message response object is already accessible within _handle_completion() and _handle_tool_use_conversation() immediately after the API call — but is never read. Both methods extract token usage from the response and then discard the object, so the truncation signal is permanently lost before it can reach any hook, callback, or event subscriber.

The minimal fix is a logging.warning() call at two points where the raw response is still in scope:

Location 1 — _handle_completion(), after _track_token_usage_internal(usage):

if getattr(response, "stop_reason", None) == "max_tokens":
    agent_hint = f" [{from_agent.role}]" if from_agent else ""
    logging.warning(
        f"Truncated response{agent_hint}: stop_reason='max_tokens'. "
        f"Consider increasing max_tokens (current: {self.max_tokens})."
    )

Location 2 — _handle_tool_use_conversation(), after _track_token_usage_internal(follow_up_usage):

if getattr(final_response, "stop_reason", None) == "max_tokens":
    agent_hint = f" [{from_agent.role}]" if from_agent else ""
    logging.warning(
        f"Truncated response{agent_hint}: stop_reason='max_tokens'. "
        f"Consider increasing max_tokens (current: {self.max_tokens})."
    )

Location 2 is the more critical path — it handles the final synthesis response after all tool calls complete, which is where silent truncation caused significant downstream data corruption in our use case.

A more complete solution would also add stop_reason: str | None as a field on LLMCallCompletedEvent, allowing downstream subscribers (hooks, event listeners) to react to truncation programmatically rather than only through log monitoring.

The same fix should be applied to the async counterparts: _ahandle_completion() and _ahandle_tool_use_conversation().

Describe alternatives you've considered

  1. after_llm_call hook with token ratio heuristic: The hook receives context.agent.role and context.llm._token_usage, so one could warn when completion_tokens ≥ max_tokens × 0.95. This works as a proxy but is not the same as checking stop_reason directly — it can produce false positives and misses cases where max_tokens is not explicitly set.
  2. Monkey-patching self.client.messages.create: Wrapping the Anthropic SDK client on each AnthropicCompletion instance post-init to intercept the raw response. This works today without any framework changes but is fragile to internal refactoring and not a suitable permanent solution.
  3. Content sentinel / downstream heuristics: Checking whether the response ends mid-sentence or lacks expected structural markers. Too unreliable for production use — the failure mode (silent truncation mid-table) produces syntactically valid but semantically incomplete output that passes all surface checks.
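
For reference, alternative 2 (wrapping messages.create) could be sketched as below. The client here is a stand-in built from SimpleNamespace, and `wrap_messages_create` is a hypothetical helper; whether the same attribute assignment behaves identically on the real Anthropic SDK client, or covers streaming calls, is not verified in this example.

```python
import functools
from types import SimpleNamespace
from typing import Any, Callable, List, Optional


def wrap_messages_create(client: Any, on_response: Callable[[Any], None]) -> None:
    """Replace client.messages.create with a wrapper that taps the response."""
    original_create = client.messages.create

    @functools.wraps(original_create)
    def create_with_tap(*args: Any, **kwargs: Any) -> Any:
        response = original_create(*args, **kwargs)
        on_response(response)  # observe the raw response before it is discarded
        return response

    client.messages.create = create_with_tap


def _fake_create(**kwargs: Any) -> SimpleNamespace:
    # Stand-in for the SDK call; always reports a truncated response.
    return SimpleNamespace(stop_reason="max_tokens", content="partial")


fake_client = SimpleNamespace(messages=SimpleNamespace(create=_fake_create))

seen_stop_reasons: List[Optional[str]] = []
wrap_messages_create(
    fake_client,
    lambda response: seen_stop_reasons.append(getattr(response, "stop_reason", None)),
)

result = fake_client.messages.create(model="stand-in-model", max_tokens=50, messages=[])
```

The wrapper has to be re-applied to every instance after initialisation and depends on the attribute path staying the same, which is why the issue describes it as fragile to internal refactoring.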

Additional context

LLM provider / response observability

re: Willingness to Contribute (below): I'm happy to submit a pull request for the minimal fix (logging warning at both sync locations). Happy to also include the async paths and/or the LLMCallCompletedEvent field addition if that's the preferred direction — just let me know in the issue before I open it. However, this would be my first crewai pull request.

Willingness to Contribute

Yes, I'd be happy to submit a pull request

extent analysis

Fix Plan

To address the issue of silent truncation in Anthropic's Message response object, we will implement the following steps:

  • Add a logging warning at two points where the raw response is still in scope:
    • In _handle_completion(), after _track_token_usage_internal(usage)
    • In _handle_tool_use_conversation(), after _track_token_usage_internal(follow_up_usage)
  • Apply the same fix to the async counterparts: _ahandle_completion() and _ahandle_tool_use_conversation()
  • Consider adding stop_reason: str | None as a field on LLMCallCompletedEvent to allow downstream subscribers to react to truncation programmatically

Example Code

# In _handle_completion()
if getattr(response, "stop_reason", None) == "max_tokens":
    agent_hint = f" [{from_agent.role}]" if from_agent else ""
    logging.warning(
        f"Truncated response{agent_hint}: stop_reason='max_tokens'. "
        f"Consider increasing max_tokens (current: {self.max_tokens})."
    )

# In _handle_tool_use_conversation()
if getattr(final_response, "stop_reason", None) == "max_tokens":
    agent_hint = f" [{from_agent.role}]" if from_agent else ""
    logging.warning(
        f"Truncated response{agent_hint}: stop_reason='max_tokens'. "
        f"Consider increasing max_tokens (current: {self.max_tokens})."
    )

# Async counterparts
# In _ahandle_completion()
if getattr(response, "stop_reason", None) == "max_tokens":
    agent_hint = f" [{from_agent.role}]" if from_agent else ""
    logging.warning(
        f"Truncated response{agent_hint}: stop_reason='max_tokens'. "
        f"Consider increasing max_tokens (current: {self.max_tokens})."
    )

# In _ahandle_tool_use_conversation()
if getattr(final_response, "stop_reason", None) == "max_tokens":
    agent_hint = f" [{from_agent.role}]" if from_agent else ""
    logging.warning(
        f"Truncated response{agent_hint}: stop_reason='max_tokens'. "
        f"Consider increasing max_tokens (current: {self.max_tokens})."
    )

Verification

To verify that the fix worked, check the logs for the warning message indicating truncation. You can also test the fix by intentionally setting a low max_tokens value and verifying that the warning is triggered.

Extra Tips

  • Consider adding additional logging or monitoring to track truncation events and improve response observability.
  • Review the LLMCallCompletedEvent field addition to ensure it aligns with the desired functionality and does not introduce any unintended consequences.
