litellm - ✅ (Solved) Fix [Bug]: Spending for Huggingface model is always $0 [2 pull requests, 1 participant]

GitHub stats
BerriAI/litellm#22863 · Fetched 2026-04-08 00:39:32
Comments: 0
Participants: 1
Timeline: 7
Reactions: 0
Timeline (top): referenced ×3, cross-referenced ×2, labeled ×2

Fix Action

Fixed

PR fix notes

PR #22873: fix(cost): respect custom cost settings for HuggingFace models

Description (problem / solution / changelog)

Summary

Fixes #22863

When custom_llm_provider is "huggingface", the _get_model_info_helper function in utils.py was unconditionally returning a hardcoded ModelInfoBase with input_cost_per_token=0 and output_cost_per_token=0. This caused all cost calculations for HuggingFace models to return $0, even when the user explicitly configured custom pricing via output_cost_per_token and input_cost_per_token in their config.yaml.

Root cause

In litellm/utils.py, the _get_model_info_helper function had a branch that checked if custom_llm_provider == "huggingface" and immediately returned zero-cost model info before checking litellm.model_cost for any user-registered custom pricing. When the Router processes user config, it registers the custom pricing under the model's ID in litellm.model_cost via litellm.register_model(), but this registered data was never consulted for HuggingFace models.

Fix

Added the guard condition "and not _is_potential_model_name_in_model_cost(potential_model_names)" to the HuggingFace branch, so the hardcoded zero-cost fallback is used only when the model is not already registered in litellm.model_cost. When custom pricing is registered (by the Router), the standard model_cost lookup path is followed instead, which correctly returns the user-defined costs.

This mirrors the existing pattern already used for ollama/ollama_chat providers on the very next line.
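
The control flow described in this fix can be sketched in isolation. This is a simplified stand-in, not the code in litellm/utils.py: MODEL_COST, is_registered, get_model_info, and the dictionary-shaped return value are hypothetical names chosen for illustration. Only the provider check and the "is the model already registered?" guard follow the PR description.

```python
from typing import Dict, List

# Hypothetical stand-in for litellm.model_cost: pricing registered by the user.
MODEL_COST: Dict[str, Dict[str, float]] = {
    "my-hf-deployment": {
        "input_cost_per_token": 1e-06,
        "output_cost_per_token": 3e-06,
    },
}

ZERO_COST = {"input_cost_per_token": 0.0, "output_cost_per_token": 0.0}


def is_registered(potential_model_names: List[str]) -> bool:
    """Return True if any candidate name has an entry in MODEL_COST."""
    return any(name in MODEL_COST for name in potential_model_names)


def get_model_info(provider: str, potential_model_names: List[str]) -> Dict[str, float]:
    # The zero-cost fallback applies only when nothing is registered for the model.
    if provider == "huggingface" and not is_registered(potential_model_names):
        return dict(ZERO_COST)
    # Otherwise follow the normal lookup path and return the registered pricing.
    for name in potential_model_names:
        if name in MODEL_COST:
            return dict(MODEL_COST[name])
    raise KeyError(f"no pricing registered for {potential_model_names}")


registered = get_model_info("huggingface", ["my-hf-deployment"])
unregistered = get_model_info("huggingface", ["some/other-model"])
```

In this sketch, the registered deployment gets its custom prices back, while an unregistered HuggingFace model still falls through to the zero-cost default, which is the backward-compatible behavior the PR's second test is meant to protect.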

Test plan

  • Added test_huggingface_custom_cost_per_token — verifies that HuggingFace models with registered custom pricing return correct non-zero costs
  • Added test_huggingface_no_custom_cost_returns_zero — verifies that HuggingFace models without custom pricing still return $0 (backward compatibility)
  • All 35 existing tests in tests/test_litellm/test_cost_calculator.py pass
  • All existing test_get_model_info_huggingface_models tests pass

Changed files

  • litellm/utils.py (modified, +4/-1)
  • tests/test_litellm/test_cost_calculator.py (modified, +101/-17)

PR #22912: fix: HuggingFace cost calculation now respects custom input_cost_per_token/output_cost_per_token config

Description (problem / solution / changelog)

Summary

Fixes #22863

When using HuggingFace via router.huggingface.co with custom pricing (input_cost_per_token/output_cost_per_token in model config), cost tracking was always returning $0.

Root Cause

The completion_cost function wasn't extracting custom pricing from litellm_logging_obj.litellm_params.metadata.model_info when custom_pricing=True. For unknown providers (like HuggingFace), the code would fall back to the global model cost map, which doesn't contain custom deployments.

Fix

Added logic to extract custom_cost_per_token from model_info in the metadata when custom_pricing=True and no explicit custom_cost_per_token was passed.

Also fixed a subtle bug: replaced "_input_cost or 0.0" with "_input_cost if _input_cost is not None else 0.0" to correctly handle explicit zero costs (free input tokens).
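
The difference between the two expressions can be illustrated with a small, self-contained sketch. The function names and the FALLBACK value are hypothetical. A non-zero fallback is used here, unlike the 0.0 default quoted from the PR, so that the effect of treating an explicit 0.0 as "unset" is visible in the result.

```python
from typing import Optional


def cost_with_or(custom: Optional[float], fallback: float) -> float:
    # Truthiness check: an explicit 0.0 is falsy, so it is treated as "unset".
    return custom or fallback


def cost_with_none_check(custom: Optional[float], fallback: float) -> float:
    # Identity check: only a missing value (None) triggers the fallback.
    return custom if custom is not None else fallback


FALLBACK = 1e-06  # hypothetical non-zero default, for illustration only

free_with_or = cost_with_or(0.0, FALLBACK)                    # explicit zero is lost
free_with_none_check = cost_with_none_check(0.0, FALLBACK)    # explicit zero is kept
unset_with_none_check = cost_with_none_check(None, FALLBACK)  # missing value falls back
```

The "is not None" form is the safer pattern whenever 0 is a legitimate configured value, because it distinguishes "configured as free" from "not configured".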

Test Plan

  • test_custom_pricing_huggingface_extracts_from_model_info - HuggingFace custom pricing works
  • test_custom_pricing_unknown_provider_extracts_from_model_info - Any custom provider works
  • test_custom_pricing_partial_costs_in_model_info - Partial cost config works
  • All 36 tests in test_cost_calculator.py pass

Example

Before fix:

# response._hidden_params["response_cost"] == 0.0  # BUG

After fix:

# response._hidden_params["response_cost"] == 0.02  # (100 * 0.0001 + 50 * 0.0002)
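
The arithmetic in the comment above (100 prompt tokens and 50 completion tokens) can be reproduced with a minimal sketch. The helper name cost_from_model_info and the flat dictionary shape are assumptions for illustration. The actual change reads the pricing from litellm_logging_obj.litellm_params.metadata.model_info, which is not modeled here.

```python
import math
from typing import Any, Dict, Optional


def cost_from_model_info(
    model_info: Dict[str, Any], prompt_tokens: int, completion_tokens: int
) -> float:
    """Compute a request cost from per-token prices found in model_info."""
    input_cost: Optional[float] = model_info.get("input_cost_per_token")
    output_cost: Optional[float] = model_info.get("output_cost_per_token")
    # Missing prices default to 0.0; an explicit 0.0 is kept as-is.
    input_cost = input_cost if input_cost is not None else 0.0
    output_cost = output_cost if output_cost is not None else 0.0
    return prompt_tokens * input_cost + completion_tokens * output_cost


model_info = {"input_cost_per_token": 0.0001, "output_cost_per_token": 0.0002}
response_cost = cost_from_model_info(model_info, prompt_tokens=100, completion_tokens=50)

# Partial configuration: only the output price is set, so input tokens cost nothing.
partial_cost = cost_from_model_info({"output_cost_per_token": 0.0002}, 100, 50)

assert math.isclose(response_cost, 0.02)  # 100 * 0.0001 + 50 * 0.0002
```

The comparison uses math.isclose rather than == because the per-token prices are floating-point values.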

Changed files

  • litellm/cost_calculator.py (modified, +28/-0)
  • tests/test_litellm/test_cost_calculator.py (modified, +208/-0)


Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

I'm using the following config.yaml:

model_list:
  - model_name: <name>
    litellm_params:
      model: os.environ/HF_MODEL_NAME
      api_key: os.environ/HF_TOKEN
      api_base: os.environ/HF_API_BASE
      extra_headers:
        X-HF-Bill-To: os.environ/HF_BILL_TO
      additional_drop_params: ["max_retries"]
      output_cost_per_token: 3e-06
      input_cost_per_token: 1e-06

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL

litellm_settings:
  default_key_generate_params:
    max_budget: 0.1

with the following .env:

HF_API_BASE=https://router.huggingface.co/v1
HF_MODEL_NAME=huggingface/moonshotai/Kimi-K2-Instruct-0905

No matter which parameters I change, the calculated cost of every request is always 0. In utils.py (around lines 5200-5300), I can see that when custom_llm_provider is huggingface, all costs are set to 0.

Steps to Reproduce

  1. Use the config.yaml and env provided.
  2. Notice that spending is always 0.
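
To make the discrepancy concrete, here is a rough sketch of the spend one would expect from the prices in the config above. The token counts are hypothetical. Only the two per-token prices and max_budget come from the issue.

```python
INPUT_COST_PER_TOKEN = 1e-06   # input_cost_per_token from config.yaml
OUTPUT_COST_PER_TOKEN = 3e-06  # output_cost_per_token from config.yaml
MAX_BUDGET = 0.1               # default_key_generate_params.max_budget

# Hypothetical token usage for a single request.
prompt_tokens = 1_000
completion_tokens = 500

expected_cost = (
    prompt_tokens * INPUT_COST_PER_TOKEN
    + completion_tokens * OUTPUT_COST_PER_TOKEN
)
observed_cost = 0.0  # what the issue reports for every request

# Number of such requests needed to use up the key budget at the expected price.
requests_until_budget = MAX_BUDGET / expected_cost
```

With these assumed token counts, the expected cost is about $0.0025 per request, so the 0.1 budget would be used up after roughly 40 such requests. With the cost stuck at 0, no spend accrues against that budget.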

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

latest

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

The configuration in the issue is already correct: input_cost_per_token and output_cost_per_token under litellm_params are the settings used for custom pricing. The bug is in LiteLLM's cost calculation for HuggingFace models, not in config.yaml, so the fix is a code change in LiteLLM itself, as described in PR #22873 and PR #22912.

Steps to Fix

  • Keep output_cost_per_token and input_cost_per_token in the litellm_params section. No additional cost parameter is needed.
  • Use a LiteLLM version that includes one of the linked fixes: PR #22873 (litellm/utils.py) or PR #22912 (litellm/cost_calculator.py).

Example config.yaml (model entry unchanged from the issue):

model_list:
  - model_name: <name>
    litellm_params:
      model: os.environ/HF_MODEL_NAME
      api_key: os.environ/HF_TOKEN
      api_base: os.environ/HF_API_BASE
      extra_headers:
        X-HF-Bill-To: os.environ/HF_BILL_TO
      additional_drop_params: ["max_retries"]
      output_cost_per_token: 3e-06
      input_cost_per_token: 1e-06

Example utils.py change (a sketch based on the PR #22873 description, not the exact diff):

if (
    custom_llm_provider == "huggingface"
    and not _is_potential_model_name_in_model_cost(potential_model_names)
):
    # Hardcoded zero-cost fallback, now used only when the model has no
    # entry registered in litellm.model_cost.
    ...  # return the zero-cost ModelInfoBase as before

Verification

To verify the fix, send a request through the proxy and check that the recorded cost equals prompt_tokens * input_cost_per_token + completion_tokens * output_cost_per_token instead of 0.

Extra Tips

  • A HuggingFace model with no custom pricing registered should still report a cost of $0. PR #22873 covers this with test_huggingface_no_custom_cost_returns_zero.
  • According to PR #22873, the ollama and ollama_chat providers already use the same guard pattern.
