Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

`llm_requests_hanging` alert fires continuously for requests with response time below 600s threshold

The llm_requests_hanging Slack alert fires continuously, reporting "Requests are hanging - 600s+ request time" for multiple models, even though the actual request durations (the difference between start_time and end_time) are well below the 600-second threshold.

Slack Alert Details

Alert type: llm_requests_hanging (Digest)
Level: Medium
Message: "Requests are hanging - 600s+ request time"
Affected models: gemini-3-pro-preview, azure/gpt-5.1, and others
Behavior: Alerts fire continuously with identical start/end timestamps (e.g., Start: 2026-05-13 14:51:55, End: 2026-05-13 14:51:55)

Actual Log Data (max_time_diff_seconds between start and end)

Model	max_time_diff_seconds	max_response_time	total_calls
azure/gpt-4o	453	308	4263
azure/gpt-5.4	365	365	685
azure/gpt-5.2	335	336	10753
gemini/gemini-3-pro-preview	334	335	3526
bedrock/us.anthropic.claude-opus-4-6-v1	287	44	743
azure/gpt-image-2	286	286	24
azure/gpt-5-mini	242	242	544
azure/gpt-5.1	205	205	14615
azure/gpt-5.2-codex	136	136	22
azure/gpt-image-1	71	72	30
azure/gpt-image-1.5	66	66	7
gemini/gemini-3-pro-image-preview	63	63	52
azure/gpt-image-1-mini	59	59	52
azure/gemini-embedding-2-preview	58	58	16493
gemini/gemini-3.1-pro-preview	54	54	18
gemini/gemini-3-flash-preview	42	42	27
gemini/gemini-2.5-flash	37	38	571

Key finding: The alerted models (gemini-3-pro-preview = 334s, azure/gpt-5.1 = 205s) have max_time_diff_seconds below 600s, yet the alert fires claiming "600s+ request time."
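
The comparison behind this finding can be reproduced with a short Python check. This is only a sketch: the names `THRESHOLD_SECONDS` and `models_over_threshold` are illustrative, and the figures are three rows copied from the table above (the two alerted models plus the slowest model overall).

```python
# Threshold taken from the alert message ("600s+ request time").
THRESHOLD_SECONDS = 600

# max_time_diff_seconds per model, copied from the log data table.
max_time_diff_seconds = {
    "azure/gpt-4o": 453,
    "gemini/gemini-3-pro-preview": 334,
    "azure/gpt-5.1": 205,
}


def models_over_threshold(durations, threshold=THRESHOLD_SECONDS):
    """Return the models whose longest request met or exceeded the threshold."""
    return sorted(model for model, secs in durations.items() if secs >= threshold)


print(models_over_threshold(max_time_diff_seconds))  # prints []
```

An empty result means none of the logged requests should have qualified as hanging.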

Relevant log output

Alert type: llm_requests_hanging (Digest) Level: Medium
Start: 2026-05-13 14:51:55 End: 2026-05-13 14:51:55
Count: 1
Message: Requests are hanging - 600s+ request time
Request Model: gemini-3-pro-preview API Base: None Key Alias: Team Alias:

Alert type: llm_requests_hanging (Digest) Level: Medium
Start: 2026-05-13 14:51:55 End: 2026-05-13 14:51:55
Count: 2
Message: Requests are hanging - 600s+ request time
Request Model: azure/gpt-5.1 API Base: None Key Alias: Team Alias:

Note: The Start and End timestamps are identical, which suggests the request completed near-instantly or the timestamps are not being set correctly in the alert context.
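
To make the zero-length window explicit, the Start/End line from the alert can be parsed and subtracted. A small Python sketch, assuming the timestamp layout shown in the log output; `alert_window_seconds` is a hypothetical helper, not part of LiteLLM.

```python
from datetime import datetime

# Line copied from the alert output above.
ALERT_LINE = "Start: 2026-05-13 14:51:55 End: 2026-05-13 14:51:55"
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def alert_window_seconds(line: str) -> float:
    """Parse 'Start: <ts> End: <ts>' and return the elapsed seconds."""
    start_part, end_part = line.split(" End: ")
    start = datetime.strptime(start_part.replace("Start: ", "", 1), TIMESTAMP_FORMAT)
    end = datetime.strptime(end_part.strip(), TIMESTAMP_FORMAT)
    return (end - start).total_seconds()


print(alert_window_seconds(ALERT_LINE))  # prints 0.0
```

A window of 0 seconds cannot describe a request that has been running for 600 seconds or more.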

Expected behavior

The llm_requests_hanging alert should only fire for requests that are genuinely still in-flight (no response received yet) and have been running for more than 600 seconds.
Once a request completes (its success or failure callback fires), it should be immediately removed from the hanging request tracker.
Alerts should not fire repeatedly for the same request once the condition has been evaluated or resolved.
The start and end timestamps in the alert should not be identical: identical timestamps suggest the request completed instantly, which contradicts the "hanging" classification.

Configuration

general_settings:
  alerting: ["slack"]
  alert_type_config:
    llm_requests_hanging:
      digest: true
      digest_interval: 3600  # 1 hour
    llm_too_slow:
      digest: true
      digest_interval: 3600  # 1 hour
    llm_exceptions:
      digest: true

Additional context

Running LiteLLM proxy deployed on AWS ECS with Slack alerting configured
The request_hanging_alerting_threshold is set to 600 seconds (default)
This causes significant alert fatigue as the team receives continuous false-positive notifications in Slack
The issue affects all model providers (Azure OpenAI, Google Gemini, AWS Bedrock)
All requests shown in the log data completed successfully (status = success) with response times well under 600s
The alerts appear to fire in a digest batch but reference requests that have already completed

Suggested Fix

In the hanging request checker, ensure only requests without a recorded end_time (i.e., genuinely in-flight) are evaluated against the threshold.
When async_log_success_event or async_log_failure_event fires, immediately remove the request from the in-flight tracking dictionary.
Add deduplication logic so the same request ID is not alerted on multiple times.
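
A minimal Python sketch of the three suggestions above. It is not LiteLLM's implementation: the `InFlightTracker` class and its method names are hypothetical, and only the 600-second threshold comes from this report. In the proxy, `complete()` would presumably be called from both the success and the failure callback.

```python
import time


class InFlightTracker:
    """Hypothetical tracker: only requests with no recorded end are candidates."""

    def __init__(self, threshold_seconds: float = 600.0):
        self.threshold_seconds = threshold_seconds
        self._started_at = {}   # request_id -> start time in seconds
        self._alerted = set()   # request ids already reported as hanging

    def start(self, request_id, now=None):
        self._started_at[request_id] = time.monotonic() if now is None else now

    def complete(self, request_id):
        # Called on success OR failure: the request is no longer in flight.
        self._started_at.pop(request_id, None)
        self._alerted.discard(request_id)

    def hanging(self, now=None):
        """Return newly hanging request ids; each id is reported at most once."""
        now = time.monotonic() if now is None else now
        newly_hanging = [
            rid for rid, started in self._started_at.items()
            if now - started >= self.threshold_seconds and rid not in self._alerted
        ]
        self._alerted.update(newly_hanging)
        return newly_hanging


tracker = InFlightTracker()
tracker.start("req-fast", now=0.0)
tracker.start("req-stuck", now=0.0)
tracker.complete("req-fast")        # finished well under the threshold
print(tracker.hanging(now=601.0))   # prints ['req-stuck']
print(tracker.hanging(now=700.0))   # prints [] (already alerted)
```

The `now` parameter exists only so the example can be run without waiting; a real checker would rely on the clock.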

Steps to Reproduce

Run the LiteLLM proxy with Slack alerting enabled and the digest settings listed under Configuration above.

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

1.83.13

Twitter / LinkedIn details

No response

litellm [Bug]: Continuous False-Positive Slack Alerts for LLM Hanging Requests (1 participant)
