litellm - ✅(Solved) Fix [Bug]: Pods get OOM Killed due to continous increase in memory. [1 pull requests, 1 participants]

litellm · 2026-04-06 11:00:41

GitHub stats

BerriAI/litellm#25219 • Fetched 2026-04-08 02:52:57

Author

Blaze-DSP

Participants

Blaze-DSP

Timeline (top)

referenced ×2, cross-referenced ×1, labeled ×1

PR fix notes

PR #25235: fix(ui): use UTC time for request logs start/end time initialization

Repository: BerriAI/litellm
Author: HARSHAVARDHAN-RAJU5
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/25235

Description (problem / solution / changelog)

Fixes #25235

moment() without .utc() creates local time strings which get misinterpreted during UTC conversion, causing logs to appear several hours behind real-time in the Request Logs UI.

Added .utc() to all moment() calls that set startTime/endTime state in view_logs/index.tsx.

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem
I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

50-55 passing tests: main is stable with minor issues.

45-49 passing tests: acceptable but needs attention

<= 40 passing tests: unstable; be careful with your merges and assess the risk.

Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:

Type

🆕 New Feature 🐛 Bug Fix 🧹 Refactoring 📖 Documentation 🚄 Infrastructure ✅ Test

Changes

Changed files

ui/litellm-dashboard/src/components/view_logs/index.tsx (modified, +10/-9)
ui/litellm-dashboard/src/components/view_logs/log_filter_logic.tsx (modified, +2/-2)

Issue description

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

I have been seeing a lot of OOM kills after upgrading the Docker image to ghcr.io/berriai/litellm:main-v1.82.0-stable.

In earlier versions, pods had high memory spikes (OOM kills) during cleanup due to inefficient queries, which was solved by #21930.

Is there a possible solution, or an existing GitHub issue that tracks this?

Steps to Reproduce

...
general_settings:
      # DB Settings
      proxy_batch_write_at: 60
      database_connection_pool_limit: 10
      store_prompts_in_spend_logs: true
      maximum_spend_logs_retention_period: 30d
      maximum_spend_logs_retention_interval: 1d
      store_model_in_db: true
      allow_requests_on_db_unavailable: true

      # Health Check settings
      background_health_checks: true
      use_shared_health_check: true
      health_check_interval: 60
      health_check_details: false
...

Container:

...
containers:
        - args:
            - '--config'
            - /app/proxy_config.yaml
            - '--port'
            - '4000'
            - '--num_workers'
            - '4'
            - '--run_gunicorn'
            - '--max_requests_before_restart'
            - '1000'
          env:
            - name: LITELLM_LOG
              value: INFO
            - name: NODE_ENV
              value: production
            - name: USE_PRISMA_MIGRATE
              value: 'true'
            - name: SPEND_LOG_RUN_LOOPS
              value: '100'
            - name: SPEND_LOG_CLEANUP_BATCH_SIZE
              value: '1000'
            - name: SEPARATE_HEALTH_APP
              value: '1'
            - name: SEPARATE_HEALTH_PORT
              value: '4001'
            - name: DEFAULT_HEALTH_CHECK_PROMPT
              value: This is a test health check prompt.
...

Relevant log output

What part of LiteLLM is this about?

UI Dashboard

What LiteLLM version are you on ?

v1.82.0

Twitter / LinkedIn details

https://www.linkedin.com/in/dev-patel-8bb14a321/

extent analysis

TL;DR

Adjusting the database_connection_pool_limit and SPEND_LOG_CLEANUP_BATCH_SIZE settings may help mitigate the OOM kills in the LiteLLM container.

Guidance

Review the general_settings configuration, specifically the database_connection_pool_limit and proxy_batch_write_at settings, to ensure they are suitable for the current workload.
Consider reducing the SPEND_LOG_CLEANUP_BATCH_SIZE environment variable to a lower value, such as 100, to reduce memory usage during cleanup.
Investigate the store_prompts_in_spend_logs and store_model_in_db settings to determine if they can be optimized or disabled to reduce memory usage.
Monitor the container's memory usage and adjust the num_workers argument to a lower value if necessary to prevent OOM kills.
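
The guidance above could translate into a fragment like the one below. It is a partial sketch, not a complete manifest: every key and variable name is taken from the configuration in the issue, but the values (prompt storage off, two workers, a batch size of 100) are illustrative assumptions rather than tested recommendations, and it is unverified whether any of them affects the memory growth.

```yaml
# Illustrative values only; keys come from the configuration shown in the issue.
general_settings:
  store_prompts_in_spend_logs: false   # assumption: prompts are not needed in spend logs
  database_connection_pool_limit: 10   # unchanged from the issue; review against workload
---
containers:
  - args:
      - '--num_workers'
      - '2'                            # assumed value, lowered from '4'
    env:
      - name: SPEND_LOG_CLEANUP_BATCH_SIZE
        value: '100'                   # lowered from '1000', as the guidance suggests
```

Changing one value at a time and watching the pod's memory between changes makes it easier to tell which setting, if any, makes a difference.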

Example

The workaround involves no code change: adjust the configuration file and environment variables as described in the Guidance section.

Notes

The issue seems to be related to the upgraded Docker image and the configuration settings. However, without more information about the specific queries and workload, it's difficult to provide a more detailed solution.

Recommendation

Apply workaround: adjust the configuration settings and environment variables to reduce memory usage. The root cause has not been identified; the OOM kills were first observed after the upgrade to v1.82.0.


#api #ssr #installation #API versioning #request timeout
