litellm - [Bug]: Continuous False-Positive Slack Alerts for LLM Hanging Requests

BerriAI/litellm#27855 (fetched 2026-05-14 03:30:05)


Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

The llm_requests_hanging alert fires continuously for requests whose response time is below the 600s threshold.


The llm_requests_hanging Slack alert fires continuously reporting "Requests are hanging - 600s+ request time" for multiple models, even though the actual response times (difference between start_time and end_time) are well below the 600-second threshold.

Slack Alert Details

  • Alert type: llm_requests_hanging (Digest)
  • Level: Medium
  • Message: "Requests are hanging - 600s+ request time"
  • Affected models: gemini-3-pro-preview, azure/gpt-5.1, and others
  • Behavior: Alerts fire continuously with identical start/end timestamps (e.g., Start: 2026-05-13 14:51:55, End: 2026-05-13 14:51:55)

Actual Log Data (max_time_diff_seconds between start and end)

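| Model | max_time_diff_seconds | max_response_time | total_calls |
|---|---|---|---|
| azure/gpt-4o | 453 | 3084263 (fused with total_calls in source) | |
| azure/gpt-5.4 | 365 | 365 | 685 |
| azure/gpt-5.2 | 335 | 336 | 10753 |
| gemini/gemini-3-pro-preview | 334 | 335 | 3526 |
| bedrock/us.anthropic.claude-opus-4-6-v1 | 287 | 44743 (fused with total_calls in source) | |
| azure/gpt-image-2 | 286 | 286 | 24 |
| azure/gpt-5-mini | 242 | 242 | 544 |
| azure/gpt-5.1 | 205 | 205 | 14615 |
| azure/gpt-5.2-codex | 136 | 136 | 22 |
| azure/gpt-image-1 | 71 | 72 | 30 |
| azure/gpt-image-1.5 | 66 | 66 | 7 |
| gemini/gemini-3-pro-image-preview | 63 | 63 | 52 |
| azure/gpt-image-1-mini | 59 | 59 | 52 |
| azure/gemini-embedding-2-preview | 58 | 58 | 16493 |
| gemini/gemini-3.1-pro-preview | 54 | 54 | 18 |
| gemini/gemini-3-flash-preview | 42 | 42 | 27 |
| gemini/gemini-2.5-flash | 37 | 38 | 571 |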

Key finding: The alerted models (gemini-3-pro-preview = 334s, azure/gpt-5.1 = 205s) have max_time_diff_seconds below 600s, yet the alert fires claiming "600s+ request time."
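
The comparison behind the key finding can be sketched in a few lines. This is only an illustration: the two durations are the values quoted above for the alerted models, and `models_over_threshold` is a hypothetical helper, not part of LiteLLM.

```python
THRESHOLD_SECONDS = 600

# max_time_diff_seconds as quoted in the report for the two alerted models.
observed_max_seconds = {
    "gemini/gemini-3-pro-preview": 334,
    "azure/gpt-5.1": 205,
}


def models_over_threshold(observed, threshold=THRESHOLD_SECONDS):
    """Return the models whose longest observed request met or exceeded the threshold."""
    return sorted(model for model, secs in observed.items() if secs >= threshold)


# Neither alerted model has a request at or above the threshold.
print(models_over_threshold(observed_max_seconds))
```

If the alert were consistent with the logged durations, at least one alerted model would appear in the returned list.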


Relevant log output

Alert type: llm_requests_hanging (Digest) Level: Medium
Start: 2026-05-13 14:51:55 End: 2026-05-13 14:51:55
Count: 1
Message: Requests are hanging - 600s+ request time
Request Model: gemini-3-pro-preview API Base: None Key Alias: Team Alias:

Alert type: llm_requests_hanging (Digest) Level: Medium
Start: 2026-05-13 14:51:55 End: 2026-05-13 14:51:55
Count: 2
Message: Requests are hanging - 600s+ request time
Request Model: azure/gpt-5.1 API Base: None Key Alias: Team Alias:

Note: The Start and End timestamps are identical, which suggests the request completed near-instantly or the timestamps are not being set correctly in the alert context.
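
As a quick check of the note above, the Start/End line of an alert can be parsed to confirm that the reported window is zero seconds long. This is a minimal sketch that assumes the line format shown in the log output; `alert_window_seconds` is a hypothetical helper written for this illustration.

```python
from datetime import datetime

ALERT_LINE = "Start: 2026-05-13 14:51:55 End: 2026-05-13 14:51:55"
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def alert_window_seconds(line):
    """Return End minus Start, in seconds, for a 'Start: <ts> End: <ts>' line."""
    start_part, end_part = line.split(" End: ")
    start = datetime.strptime(start_part.removeprefix("Start: "), TIMESTAMP_FORMAT)
    end = datetime.strptime(end_part.strip(), TIMESTAMP_FORMAT)
    return (end - start).total_seconds()


print(alert_window_seconds(ALERT_LINE))
```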


Expected behavior

  1. The llm_requests_hanging alert should only fire for requests that are genuinely still in-flight (no response received yet) and have exceeded 600 seconds without completing.
  2. Once a request completes (its success or failure callback fires), it should be immediately removed from the hanging request tracker.
  3. Alerts should not fire repeatedly/continuously for the same request if the condition has already been evaluated or resolved.
  4. The start and end timestamps in the alert should not be identical — identical timestamps suggest the request completed instantly, contradicting the "hanging" classification.
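
The first expectation can be written as a small predicate. This is a sketch of the intended rule only, not LiteLLM's actual check; `is_hanging` and its parameters are hypothetical names, and treating exactly 600 seconds as hanging (matching the "600s+" wording) is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_hanging(
    start_time: datetime,
    end_time: Optional[datetime],
    now: datetime,
    threshold_seconds: float = 600.0,
) -> bool:
    """Return True only for a request that is still in flight past the threshold."""
    if end_time is not None:
        # A recorded end_time means the request completed, so it cannot be hanging.
        return False
    return (now - start_time).total_seconds() >= threshold_seconds


start = datetime(2026, 5, 13, 14, 51, 55)
# Completed request with identical start and end timestamps: not hanging.
print(is_hanging(start, start, now=start + timedelta(hours=1)))
# Request with no end_time after 601 seconds: hanging.
print(is_hanging(start, None, now=start + timedelta(seconds=601)))
```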

Configuration

general_settings:
  alerting: ["slack"]
  alert_type_config:
    llm_requests_hanging:
      digest: true
      digest_interval: 3600  # 1 hour
    llm_too_slow:
      digest: true
      digest_interval: 3600  # 1 hour
    llm_exceptions:
      digest: true

Additional context

  • Running LiteLLM proxy deployed on AWS ECS with Slack alerting configured
  • The request_hanging_alerting_threshold is set to 600 seconds (default)
  • This causes significant alert fatigue as the team receives continuous false-positive notifications in Slack
  • The issue affects all model providers (Azure OpenAI, Google Gemini, AWS Bedrock)
  • All requests shown in the log data completed successfully (status = success) with response times well under 600s
  • The alerts appear to fire in a digest batch but reference requests that have already completed

Suggested Fix

  • In the hanging request checker, ensure only requests without a recorded end_time (i.e., genuinely in-flight) are evaluated against the threshold.
  • When async_log_success_event or async_log_failure_event fires, immediately remove the request from the in-flight tracking dictionary.
  • Add deduplication logic so the same request ID is not alerted on multiple times.
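
The three suggestions can be combined in one small tracker. The sketch below is a standalone illustration under stated assumptions, not LiteLLM's implementation: `InFlightTracker` and its methods are hypothetical, and `on_complete` stands in for whatever the success and failure callbacks named above would call.

```python
import time
from typing import Callable, Dict, List, Set


class InFlightTracker:
    """Hypothetical tracker: flags only in-flight requests, at most once each."""

    def __init__(self, threshold_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = threshold_seconds
        self._clock = clock
        self._started: Dict[str, float] = {}  # request_id -> start time
        self._alerted: Set[str] = set()       # request_ids already reported

    def on_start(self, request_id: str) -> None:
        self._started[request_id] = self._clock()

    def on_complete(self, request_id: str) -> None:
        # Called on success or failure: the request is no longer in flight.
        self._started.pop(request_id, None)
        self._alerted.discard(request_id)

    def sweep(self) -> List[str]:
        """Return in-flight request IDs past the threshold that were not reported before."""
        now = self._clock()
        newly_hanging = [
            rid for rid, started in self._started.items()
            if now - started >= self._threshold and rid not in self._alerted
        ]
        self._alerted.update(newly_hanging)
        return newly_hanging


# Usage with a fake clock so the example runs instantly.
fake_now = [0.0]
tracker = InFlightTracker(threshold_seconds=600, clock=lambda: fake_now[0])
tracker.on_start("req-a")
tracker.on_start("req-b")
fake_now[0] = 334.0
tracker.on_complete("req-a")  # completed well under the threshold
fake_now[0] = 601.0
print(tracker.sweep())  # only req-b is still in flight past the threshold
print(tracker.sweep())  # empty: req-b was already reported
```

Injecting the clock keeps the threshold logic testable without sleeping; a real integration would also need to handle concurrent access, which this sketch ignores.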

Steps to Reproduce

Configuration

general_settings:
  alerting: ["slack"]
  alert_type_config:
    llm_requests_hanging:
      digest: true
      digest_interval: 3600  # 1 hour
    llm_too_slow:
      digest: true
      digest_interval: 3600  # 1 hour
    llm_exceptions:
      digest: true


What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

1.83.13

Twitter / LinkedIn details

No response
