litellm - ✅(Solved) Fix [Bug]: litellm_proxy_total_requests_metric Emits status_code=None for some of failed requests [1 pull request, 1 participant]

litellm · 2026-03-20 16:52:30

BerriAI/litellm#24224 · Fetched 2026-04-08 01:09:17

Author: MiloszJurewicz
Participants: MiloszJurewicz
Timeline: referenced ×2, cross-referenced ×1, labeled ×1

Error Message

When aggregating total requests by HTTP status code (e.g., 4xx and 5xx), the counts do not match litellm_proxy_failed_requests_metric.

Fix Action

Fixed

Fixed by PR: fix(prometheus): default to status_code=500 for exceptions without status code (https://github.com/BerriAI/litellm/pull/24264)

PR fix notes

PR #24264: fix(prometheus): default to status_code=500 for exceptions without status code

Repository: BerriAI/litellm
Author: sourrris
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/24264

Description (problem / solution / changelog)

Summary

_extract_status_code() returned None when an exception lacked status_code/code attributes
str(None) became the literal "None" in Prometheus labels, causing litellm_proxy_total_requests_metric 4xx/5xx aggregations to not match litellm_proxy_failed_requests_metric
Now defaults to 500 (unclassified server error) when an exception is present but carries no extractable status code — covers both direct exception param and kwargs["exception"] paths
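The defaulting rule in the summary above can be sketched as a small standalone helper. This is a hedged illustration, not the PR's code: `extract_status_code` is a hypothetical name, and the attribute order (`status_code`, then `code`) and the `kwargs["exception"]` fallback are assumptions modelled on the summary's wording.

```python
from typing import Any, Optional


def extract_status_code(exception: Optional[BaseException] = None,
                        kwargs: Optional[dict] = None) -> Optional[str]:
    """Hypothetical helper returning a status-code label value as a string.

    Checks the direct `exception` argument first, then kwargs["exception"],
    mirroring the two paths named in the PR summary (an assumption).
    """
    if exception is None and kwargs is not None:
        exception = kwargs.get("exception")
    if exception is None:
        # No exception at all: nothing to classify; the caller decides the label.
        return None
    for attr in ("status_code", "code"):
        value: Any = getattr(exception, attr, None)
        if value is not None:
            return str(value)
    # Exception present but no extractable code: report an unclassified server
    # error instead of letting str(None) become the label value "None".
    return "500"
```

With this shape, an exception carrying `status_code = 401` yields `"401"`, while a bare `ValueError` yields `"500"` rather than `"None"`.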

Test plan

Added 5 regression tests in tests/test_litellm/integrations/test_prometheus_status_code_none.py
Verified no regressions in existing prometheus tests (test_prometheus_invalid_key_filtering.py — 2 pre-existing async failures unrelated to this change)
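The regression tests themselves are not reproduced on this page. As an illustration of the pattern only, the sketch below uses prometheus_client's `CollectorRegistry` and `get_sample_value()` to assert that a failure without a status code is counted under `status_code="500"` rather than `"None"`; `record_request` is a hypothetical stand-in for the proxy's recording path, and the counter is a test double, not LiteLLM's real metric.

```python
from typing import Optional, Union

from prometheus_client import CollectorRegistry, Counter


def record_request(counter: Counter, status_code: Optional[Union[int, str]]) -> None:
    """Hypothetical recorder: apply the default-to-500 rule before labelling."""
    label = "500" if status_code is None else str(status_code)
    counter.labels(status_code=label).inc()


def test_none_status_code_is_recorded_as_500() -> None:
    # A private registry keeps the test isolated from the global default registry.
    registry = CollectorRegistry()
    total = Counter(
        "litellm_proxy_total_requests_metric",
        "Total requests (test double)",
        labelnames=["status_code"],
        registry=registry,
    )
    record_request(total, None)
    record_request(total, 401)

    # Counters are exposed with a _total suffix on the sample name.
    sample = "litellm_proxy_total_requests_metric_total"
    assert registry.get_sample_value(sample, {"status_code": "None"}) is None
    assert registry.get_sample_value(sample, {"status_code": "500"}) == 1.0
    assert registry.get_sample_value(sample, {"status_code": "401"}) == 1.0
```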

Fixes #24224

🤖 Generated with Claude Code

Changed files

litellm/integrations/prometheus.py (modified, +19/-7)
tests/test_litellm/integrations/test_prometheus_status_code_none.py (added, +48/-0)

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

The litellm_proxy_total_requests_metric sometimes includes entries where status_code=None.

Observed Behavior:

While testing LiteLLM and its emitted metrics, I noticed that some requests returning a 4xx or 5xx status code (I am not sure which) cause litellm_proxy_total_requests_metric to include data points with status_code=None.

When aggregating total requests by HTTP status code (e.g., 4xx and 5xx), the counts do not match litellm_proxy_failed_requests_metric.

Some failed requests are not represented with a proper HTTP status code in the total requests metric, as the following PromQL queries show:

# PromQL query 1 - sum of 4xx and 5xx in the total requests metric
sum(litellm_proxy_total_requests_metric_total{cluster="AAA", litellm_identifier="atlas", status_code=~"[4-5].."})

# PromQL query 2 - total failed requests; does not match query 1
sum(litellm_proxy_failed_requests_metric_total{cluster="AAA", litellm_identifier="atlas"})

# PromQL query 3 - also counting status_code="None" makes the total match query 2
sum(litellm_proxy_total_requests_metric_total{
  cluster="AAA",
  litellm_identifier="atlas",
  status_code=~"[4-5]..|None"
})
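Why query 3 matches can be checked offline. PromQL's `=~` matcher is fully anchored, so `[4-5]..` only matches three-character codes starting with 4 or 5 and silently skips the literal label value `None`. The sketch below emulates that matcher with Python's `re.fullmatch` over a hypothetical mapping from status-code label to counter value; it illustrates the label-matching semantics only and is not a PromQL engine.

```python
import re
from typing import Mapping


def sum_matching(counts_by_status: Mapping[str, float], pattern: str) -> float:
    """Sum counter values whose status_code label matches a PromQL-style regex.

    PromQL anchors =~ patterns at both ends, which re.fullmatch reproduces:
    "[4-5].." matches "401" or "500" but never "None".
    """
    regex = re.compile(pattern)
    return sum(v for label, v in counts_by_status.items() if regex.fullmatch(label))
```

Given any mapping that contains a `"None"` key, that key's value is included by `[4-5]..|None` but excluded by `[4-5]..`, which is exactly the gap between query 1 and query 3.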

Expected Behavior:

All requests in litellm_proxy_total_requests_metric should have a valid HTTP status code.
Aggregating 4xx and 5xx responses from total requests should align with litellm_proxy_failed_requests_metric.

Steps to Reproduce

I used Grafana k6 to generate traffic. Running this script, or sending the individual requests it builds, reproduces the invalid metrics:

import http from 'k6/http';
import { check } from 'k6';

// Test configuration
export const options = {
  scenarios: {
    constant_request_rate: {
      executor: 'constant-arrival-rate',
      rate: 2, // 2 requests per second
      timeUnit: '1s',
      duration: '10m', // Run for 10 minutes (adjust as needed)
      preAllocatedVUs: 2, // Pre-allocate 2 virtual users
      maxVUs: 10, // Maximum virtual users if needed
    },
  },
};

// Configuration
const BASE_URL = 'https://....';
const API_KEY = '.....';

// Models to rotate between
const MODELS = ['gpt-4.1-nano', 'gpt-4.1'];

export default function () {
  // Rotate between models
  const model = MODELS[__ITER % MODELS.length];

  // Randomly pick a malformed body (~50%) or an auth error (~50%)
  const errorType = Math.random();
  
  let payload;
  let apiKey = API_KEY;
  
  if (errorType < 0.5) {
    // 50% malformed body - missing required messages field
    payload = JSON.stringify({
      model: model,
      // Missing messages field
    });
    console.log('Sending malformed request (missing messages)');
  } else {
    // 50% auth error - wrong API key
    apiKey = 'sk-invalid-key-12345';
    payload = JSON.stringify({
      model: model,
      messages: [
        {
          role: 'user',
          content: 'What is 1+1?',
        },
      ],
    });
    console.log('Sending auth error request (invalid API key)');
  }

  // Request parameters
  const params = {
    headers: {
      'Content-Type': 'application/json',
      'x-litellm-api-key': apiKey,
    },
    timeout: '60s',
  };

  // Make the request
  const response = http.post(`${BASE_URL}/chat/completions`, payload, params);

  // Check response
  check(response, {
    'status is 4xx or 5xx': (r) => r.status >= 400,
    'response has body': (r) => !!r.body && r.body.length > 0,
    'response is valid JSON': (r) => {
      try {
        JSON.parse(r.body);
        return true;
      } catch (e) {
        return false;
      }
    },
  });

  // Log the model used and response details
  console.log(`Request to ${model} - Status: ${response.status} - Duration: ${response.timings.duration}ms`);
  
  if (response.status !== 200) {
    console.log(`Error response body: ${response.body}`);
  }
}

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.81.12

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

To fix the issue of litellm_proxy_total_requests_metric including entries with status_code=None, we need to ensure that all requests have a valid HTTP status code.

Here are the steps to fix the issue:

Update the metric collection logic to handle cases where the status code is not available or is None.
Modify the code to set a default status code (e.g., 500) when the actual status code is None.

Example code snippet in Python:

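from prometheus_client import Counter

# Illustrative counter only; LiteLLM's real metric (with more labels) is
# defined in litellm/integrations/prometheus.py.
litellm_proxy_total_requests_metric = Counter(
    "litellm_proxy_total_requests_metric", "Total proxy requests", ["status_code"]
)

def collect_metric(response):
    status_code = getattr(response, "status", None)
    if status_code is None:
        status_code = 500  # default when the actual status code is unavailable
    # Label values must be strings; samples are exposed as ..._total
    litellm_proxy_total_requests_metric.labels(status_code=str(status_code)).inc()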

Review the litellm_proxy_total_requests_metric collection logic to ensure it handles all possible scenarios, including cases where the status code is not available.

Verification

To verify that the fix worked:

Run the same test script that reproduced the issue.
Check the litellm_proxy_total_requests_metric metric using PromQL queries to ensure that all requests have a valid HTTP status code.
Compare the counts of 4xx and 5xx responses from litellm_proxy_total_requests_metric with litellm_proxy_failed_requests_metric to ensure they match.

Example PromQL query:

sum(litellm_proxy_total_requests_metric_total { cluster="AAA", litellm_identifier="atlas", status_code=~"[4-5].." })

This should match the count from litellm_proxy_failed_requests_metric.
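The comparison in the verification steps can be scripted against the Prometheus HTTP API (`GET /api/v1/query`). A minimal sketch, assuming Prometheus is reachable at the placeholder `PROM_URL` and that the `cluster`/`litellm_identifier` label values from the issue apply to your setup; both queries are evaluated at the same timestamp so ongoing traffic does not skew the comparison.

```python
import json
import time
import urllib.parse
import urllib.request

# Placeholder (assumption): the Prometheus server that scrapes the proxy.
PROM_URL = "http://localhost:9090"

ERRORS_QUERY = (
    "sum(litellm_proxy_total_requests_metric_total"
    '{cluster="AAA", litellm_identifier="atlas", status_code=~"[4-5].."})'
)
FAILED_QUERY = (
    "sum(litellm_proxy_failed_requests_metric_total"
    '{cluster="AAA", litellm_identifier="atlas"})'
)


def scalar_from_response(body: dict) -> float:
    """Return the value of a sum(...) instant query, or 0.0 if no series matched."""
    if body.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {body!r}")
    result = body["data"]["result"]
    # Each vector element carries "value": [<unix timestamp>, "<value as string>"].
    return float(result[0]["value"][1]) if result else 0.0


def instant_query(query: str, eval_time: float) -> float:
    params = urllib.parse.urlencode({"query": query, "time": eval_time})
    with urllib.request.urlopen(f"{PROM_URL}/api/v1/query?{params}", timeout=10) as resp:
        return scalar_from_response(json.load(resp))


def counts_match() -> bool:
    # Evaluate both queries at one instant so the counters are compared consistently.
    now = time.time()
    errors = instant_query(ERRORS_QUERY, now)
    failed = instant_query(FAILED_QUERY, now)
    print(f"4xx/5xx from total={errors}, failed={failed}")
    return errors == failed
```

Calling `counts_match()` after rerunning the k6 script is expected to return `True` once no failures are labelled `None`; exact equality is reasonable here because both sides are sums of integer counts.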

Extra Tips

Regularly review and update the metric collection logic to handle new scenarios and edge cases.
Consider adding additional logging or monitoring to detect and alert on cases where the status code is None.
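One way to act on the tip about detecting `status_code=None` is to scan the text the proxy serves on its Prometheus metrics endpoint. The sketch below uses `text_string_to_metric_families` from prometheus_client's parser; how you fetch that text (URL, authentication) is left out and depends on your deployment.

```python
from typing import Dict, List, Tuple

from prometheus_client.parser import text_string_to_metric_families


def none_status_samples(exposition: str) -> List[Tuple[str, Dict[str, str]]]:
    """Return (sample name, labels) for every sample labelled status_code="None".

    `exposition` is Prometheus text-format output, e.g. the body of the
    proxy's metrics endpoint.
    """
    hits = []
    for family in text_string_to_metric_families(exposition):
        # Inspect every sample, regardless of how the parser grouped families.
        for sample in family.samples:
            if sample.labels.get("status_code") == "None":
                hits.append((sample.name, dict(sample.labels)))
    return hits
```

A non-empty result is a signal worth alerting on; after the fix it is expected to stay empty.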
