litellm - ✅(Solved) Fix [Bug]: Spending for Huggingface model is always $0 [2 pull requests, 1 participants]

litellm • 2026-03-05 05:00:42

GitHub stats

BerriAI/litellm#22863•Fetched 2026-04-08 00:39:32

Author

Gauravcg492

Participants

Gauravcg492

Timeline (top)

referenced ×3, cross-referenced ×2, labeled ×2

Fix Action

Fixed

Proposed fix, PR #22873 (closed, not merged): fix(cost): respect custom cost settings for HuggingFace models (https://github.com/BerriAI/litellm/pull/22873)
Proposed fix, PR #22912 (open, not merged): fix: HuggingFace cost calculation now respects custom input_cost_per_token/output_cost_per_token config (https://github.com/BerriAI/litellm/pull/22912)

PR fix notes

PR #22873: fix(cost): respect custom cost settings for HuggingFace models

Repository: BerriAI/litellm
Author: weiguangli-io
State: closed | merged: False
Link: https://github.com/BerriAI/litellm/pull/22873

Description (problem / solution / changelog)

Summary

Fixes #22863

When custom_llm_provider is "huggingface", the _get_model_info_helper function in utils.py was unconditionally returning a hardcoded ModelInfoBase with input_cost_per_token=0 and output_cost_per_token=0. This caused all cost calculations for HuggingFace models to return $0, even when the user explicitly configured custom pricing via output_cost_per_token and input_cost_per_token in their config.yaml.

Root cause

In litellm/utils.py, the _get_model_info_helper function had a branch that checked if custom_llm_provider == "huggingface" and immediately returned zero-cost model info before checking litellm.model_cost for any user-registered custom pricing. When the Router processes user config, it registers the custom pricing under the model's ID in litellm.model_cost via litellm.register_model(), but this registered data was never consulted for HuggingFace models.

Fix

Added a guard condition, "and not _is_potential_model_name_in_model_cost(potential_model_names)", to the HuggingFace branch, so the hardcoded zero-cost fallback is only used when the model is not already registered in litellm.model_cost. When custom pricing is registered (by the Router), the standard model_cost lookup path is followed instead, which correctly returns the user-defined costs.

This mirrors the existing pattern already used for ollama/ollama_chat providers on the very next line.
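
The guard described in this PR can be sketched in isolation. This is a minimal illustration under stated assumptions, not LiteLLM's actual code: MODEL_COST, ZERO_COST, is_registered, and get_model_info are hypothetical stand-ins for litellm.model_cost, the hardcoded ModelInfoBase, _is_potential_model_name_in_model_cost, and _get_model_info_helper.

```python
from typing import Dict, List

# Hypothetical stand-in for litellm.model_cost (user-registered pricing).
MODEL_COST: Dict[str, Dict[str, float]] = {}

# Hypothetical stand-in for the hardcoded zero-cost HuggingFace model info.
ZERO_COST = {"input_cost_per_token": 0.0, "output_cost_per_token": 0.0}


def is_registered(potential_names: List[str]) -> bool:
    """Return True if any candidate name has an entry in the cost map."""
    return any(name in MODEL_COST for name in potential_names)


def get_model_info(provider: str, potential_names: List[str]) -> Dict[str, float]:
    # The zero-cost fallback applies only when nothing is registered for the model.
    if provider == "huggingface" and not is_registered(potential_names):
        return dict(ZERO_COST)
    # Otherwise follow the normal lookup path through the cost map.
    for name in potential_names:
        if name in MODEL_COST:
            return dict(MODEL_COST[name])
    raise KeyError(f"no pricing registered for {potential_names}")
```

Without the "not is_registered(...)" part of the condition, the first branch would always win for HuggingFace and the registered pricing would never be read, which is the behaviour reported in the issue.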

Test plan

Added test_huggingface_custom_cost_per_token — verifies that HuggingFace models with registered custom pricing return correct non-zero costs
Added test_huggingface_no_custom_cost_returns_zero — verifies that HuggingFace models without custom pricing still return $0 (backward compatibility)
All 35 existing tests in tests/test_litellm/test_cost_calculator.py pass
All existing test_get_model_info_huggingface_models tests pass

Changed files

litellm/utils.py (modified, +4/-1)
tests/test_litellm/test_cost_calculator.py (modified, +101/-17)

PR #22912: fix: HuggingFace cost calculation now respects custom input_cost_per_token/output_cost_per_token config

Repository: BerriAI/litellm
Author: doramirdor
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/22912

Description (problem / solution / changelog)

Summary

Fixes #22863

When using HuggingFace via router.huggingface.co with custom pricing (input_cost_per_token/output_cost_per_token in model config), cost tracking was always returning $0.

Root Cause

The completion_cost function wasn't extracting custom pricing from litellm_logging_obj.litellm_params.metadata.model_info when custom_pricing=True. For unknown providers (like HuggingFace), the code would fall back to the global model cost map, which doesn't contain custom deployments.

Fix

Added logic to extract custom_cost_per_token from model_info in the metadata when custom_pricing=True and no explicit custom_cost_per_token was passed.

Also fixed a subtle bug: replaced "_input_cost or 0.0" with "_input_cost if _input_cost is not None else 0.0" to correctly handle explicit zero costs (free input tokens).
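
A rough sketch of the two changes described in this Fix section, assuming model_info is a plain dict. The helper names pick_cost and extract_custom_cost are hypothetical and do not appear in the PR; the real logic lives in completion_cost and may be structured differently.

```python
from typing import Any, Dict, Optional


def pick_cost(configured: Optional[float], fallback: float) -> float:
    """Use the configured cost unless it is absent (None); 0.0 is a valid value."""
    return configured if configured is not None else fallback


def extract_custom_cost(
    model_info: Dict[str, Any], custom_pricing: bool
) -> Optional[Dict[str, float]]:
    """Return per-token costs from model_info, or None when custom pricing is off."""
    if not custom_pricing:
        return None
    return {
        # A missing key yields None from .get(), which falls back to 0.0.
        "input_cost_per_token": pick_cost(model_info.get("input_cost_per_token"), 0.0),
        "output_cost_per_token": pick_cost(model_info.get("output_cost_per_token"), 0.0),
    }
```

The explicit None check matters most when the fallback is non-zero: pick_cost(0.0, 1e-06) keeps the explicit 0.0, whereas the expression "0.0 or 1e-06" evaluates to 1e-06 because 0.0 is falsy in Python.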

Test Plan

test_custom_pricing_huggingface_extracts_from_model_info - HuggingFace custom pricing works
test_custom_pricing_unknown_provider_extracts_from_model_info - Any custom provider works
test_custom_pricing_partial_costs_in_model_info - Partial cost config works
All 36 tests in test_cost_calculator.py pass

Example

Before fix:

# response._hidden_params["response_cost"] == 0.0  # BUG

After fix:

# response._hidden_params["response_cost"] == 0.02  # (100 * 0.0001 + 50 * 0.0002)
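
The arithmetic in the comment above can be reproduced with a small helper. This is only a worked example: request_cost is a hypothetical function, and the token counts in the second call are made up to go with the rates from the issue's config.yaml.

```python
import math


def request_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_cost_per_token: float,
    output_cost_per_token: float,
) -> float:
    """Cost of one request: each token count times its per-token price."""
    return (
        prompt_tokens * input_cost_per_token
        + completion_tokens * output_cost_per_token
    )


# Values from the PR example: 100 input tokens and 50 output tokens.
pr_example = request_cost(100, 50, 0.0001, 0.0002)

# Rates from the issue's config (1e-06 in, 3e-06 out); token counts are illustrative.
issue_example = request_cost(1000, 500, 1e-06, 3e-06)

# Compare with a tolerance, since floating-point products are not exact.
print(math.isclose(pr_example, 0.02), math.isclose(issue_example, 0.0025))
```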

Changed files

litellm/cost_calculator.py (modified, +28/-0)
tests/test_litellm/test_cost_calculator.py (modified, +208/-0)

Issue body

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

I'm using the below config.yaml

model_list:
  - model_name: <name>
    litellm_params:
      model: os.environ/HF_MODEL_NAME
      api_key: os.environ/HF_TOKEN
      api_base: os.environ/HF_API_BASE
      extra_headers:
        X-HF-Bill-To: os.environ/HF_BILL_TO
      additional_drop_params: ["max_retries"]
      output_cost_per_token: 3e-06
      input_cost_per_token: 1e-06

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL

litellm_settings:
  default_key_generate_params:
    max_budget: 0.1

with below .env:

HF_API_BASE=https://router.huggingface.co/v1
HF_MODEL_NAME=huggingface/moonshotai/Kimi-K2-Instruct-0905
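
The config above refers to environment variables through values of the form os.environ/NAME, and the .env file supplies them. As a rough illustration of how such references map to concrete values, here is a hypothetical resolver; resolve_env_refs is not a LiteLLM function, and LiteLLM's own resolution logic may differ.

```python
import os
from typing import Any

PREFIX = "os.environ/"


def resolve_env_refs(value: Any) -> Any:
    """Recursively replace 'os.environ/NAME' strings with the value of NAME."""
    if isinstance(value, str) and value.startswith(PREFIX):
        name = value[len(PREFIX):]
        # Fail loudly instead of passing an unresolved reference downstream.
        if name not in os.environ:
            raise KeyError(f"environment variable {name} is not set")
        return os.environ[name]
    if isinstance(value, dict):
        return {key: resolve_env_refs(item) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env_refs(item) for item in value]
    # Numbers such as the per-token costs pass through unchanged.
    return value
```

With the .env values above, the model field would resolve to huggingface/moonshotai/Kimi-K2-Instruct-0905, which is why the HuggingFace provider branch is taken for this deployment.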

No matter which parameters I change, the calculated cost of every request is always 0. In utils.py (around lines 5200-5300), I can see that when the custom provider is huggingface, every cost is set to 0.

Steps to Reproduce

Use the config.yaml and env provided.
Notice that spending is always 0.

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

latest

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

The zero cost comes from LiteLLM's cost calculation for Hugging Face models, not from the config.yaml. The input_cost_per_token and output_cost_per_token settings in the issue are exactly the parameters that the linked PRs make LiteLLM respect, so the config should stay as it is and the fix belongs in the library code.

Steps to Fix

Keep input_cost_per_token and output_cost_per_token in the litellm_params section. Do not replace them with another parameter.
Run a LiteLLM build that contains one of the linked changes. According to the PR states recorded above, #22873 is closed without being merged and #22912 is still open, so confirm that a release actually contains an equivalent fix before relying on an upgrade.
If you patch locally, apply either the change described in PR #22873 to _get_model_info_helper in litellm/utils.py, or the change described in PR #22912 to completion_cost in litellm/cost_calculator.py.

Example config.yaml (unchanged from the issue):

model_list:
  - model_name: <name>
    litellm_params:
      model: os.environ/HF_MODEL_NAME
      api_key: os.environ/HF_TOKEN
      api_base: os.environ/HF_API_BASE
      extra_headers:
        X-HF-Bill-To: os.environ/HF_BILL_TO
      additional_drop_params: ["max_retries"]
      output_cost_per_token: 3e-06
      input_cost_per_token: 1e-06

Example utils.py update (the guard described in PR #22873; the surrounding code is elided):

if custom_llm_provider == "huggingface" and not _is_potential_model_name_in_model_cost(potential_model_names):
    # ... return the hardcoded zero-cost ModelInfoBase, as before ...

Verification

Send a request through the proxy and check that the recorded spend is non-zero and equals prompt_tokens * input_cost_per_token + completion_tokens * output_cost_per_token. With the rates above, a request with 1,000 prompt tokens and 500 completion tokens should cost 1000 * 1e-06 + 500 * 3e-06 = 0.0025.

Extra Tips

Set input_cost_per_token and output_cost_per_token to the actual per-token prices of your Hugging Face model.
Hugging Face models without custom pricing should still report $0; PR #22873 covers this with a backward-compatibility test.
