litellm - 💡(How to fix) Fix [Bug]: Custom pricing in model_info not applied for anthropic_messages call type with azure_ai provider [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
BerriAI/litellm#23309Fetched 2026-04-08 00:37:36
View on GitHub
Comments
0
Participants
1
Timeline
4
Reactions
0
Author
Participants
Timeline (top)
cross-referenced ×2labeled ×2

Root Cause

The model_info custom pricing (input_cost_per_token, output_cost_per_token, cache_read_input_token_cost, cache_creation_input_token_cost) is only being used for cost calculation when the call type is acompletion (OpenAI format via /chat/completions).

When Claude Code sends requests via the Anthropic native /v1/messages endpoint (call type: anthropic_messages), the custom pricing from model_info is not applied, and the cost defaults to $0.00.

Code Example

model_list:
  - model_name: claude-sonnet-4-5
    litellm_params:
      model: azure_ai/sec-claude-sonnet-4-5
      api_base: https://my-resource.services.ai.azure.com
      api_key: os.environ/AZURE_FOUNDRY_API_KEY
      api_version: "2024-05-01-preview"
      model_version: "20250929"
      timeout: 300
      connect_timeout: 30
      drop_params: true
    model_info:
      base_model: claude-sonnet-4-5-20250929
      litellm_provider: azure_ai
      mode: chat
      input_cost_per_token: 0.000003
      output_cost_per_token: 0.000015
      cache_read_input_token_cost: 0.0000003
      cache_creation_input_token_cost: 0.00000375

  - model_name: claude-haiku-4-5
    litellm_params:
      model: azure_ai/sec-claude-haiku-4-5
      api_base: https://my-resource.services.ai.azure.com
      api_key: os.environ/AZURE_FOUNDRY_API_KEY
      api_version: "2024-05-01-preview"
      model_version: "20251001"
      timeout: 300
      connect_timeout: 30
      drop_params: true
    model_info:
      base_model: claude-haiku-4-5-20251001
      litellm_provider: azure_ai
      mode: chat
      input_cost_per_token: 0.0000008
      output_cost_per_token: 0.000004
      cache_read_input_token_cost: 0.00000008
      cache_creation_input_token_cost: 0.000001

  - model_name: claude-opus-4-6
    litellm_params:
      model: azure_ai/sec-claude-opus-4-6
      api_base: https://my-resource.services.ai.azure.com
      api_key: os.environ/AZURE_FOUNDRY_API_KEY
      api_version: "2024-05-01-preview"
      model_version: "1"
      timeout: 300
      connect_timeout: 30
      drop_params: true
    model_info:
      base_model: claude-opus-4-6-20260101
      litellm_provider: azure_ai
      mode: chat
      input_cost_per_token: 0.000005
      output_cost_per_token: 0.000025
      cache_read_input_token_cost: 0.0000005
      cache_creation_input_token_cost: 0.00000625

litellm_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  drop_params: true
  modify_params: true

general_settings:
  database_url: os.environ/DATABASE_URL
  store_model_in_db: true
  store_prompts_in_spend_logs: true

---

export ANTHROPIC_BASE_URL="http://localhost:4000"
export ANTHROPIC_AUTH_TOKEN="Bearer <LITELLM_MASTER_KEY>"
export CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1
export ANTHROPIC_MODEL="claude-sonnet-4-5"
export ANTHROPIC_DEFAULT_SONNET_MODEL="claude-sonnet-4-5"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="claude-haiku-4-5"
export ANTHROPIC_DEFAULT_OPUS_MODEL="claude-opus-4-6"
RAW_BUFFERClick to expand / collapse

[Bug]: Custom pricing in model_info not applied for anthropic_messages call type with azure_ai provider

What happened?

Custom pricing defined in model_info section of the proxy config is correctly applied for acompletion call type (via /chat/completions) but not applied for anthropic_messages call type (via /v1/messages), resulting in $0.00 spend tracking.

This affects users running Claude Code through LiteLLM proxy with azure_ai provider, since Claude Code uses the Anthropic native /v1/messages endpoint.

LiteLLM Version

v1.81.14 (Docker image: ghcr.io/berriai/litellm:main-latest)

Evidence

✅ Working: acompletion call type (Test Connection / Postman via /chat/completions)

  • Call Type: acompletion
  • Model: azure_ai/sec-claude-opus-4-6
  • Tokens: 42 (13 prompt + 29 completion)
  • Cost: $0.00079000 ✅ (correctly calculated)
  • Cost Breakdown:
    • Input Cost: $0.00006500 (13 prompt tokens)
    • Output Cost: $0.00072500 (29 completion tokens)

❌ Failing: anthropic_messages call type (Claude Code via /v1/messages)

  • Call Type: anthropic_messages
  • Model: azure_ai/sec-claude-sonnet-4-5
  • Tokens: 93,748 (93,333 prompt + 415 completion)
  • Cache Read Tokens: 93,185
  • Cache Creation Tokens: 143
  • Cost: $0.00000000 ❌ (should not be zero)
  • Cost Breakdown:
    • Input Cost: $0.00000000 (93,333 prompt tokens)
    • Output Cost: $0.00000000 (415 completion tokens)

Configuration

litellm_config.yaml

model_list:
  - model_name: claude-sonnet-4-5
    litellm_params:
      model: azure_ai/sec-claude-sonnet-4-5
      api_base: https://my-resource.services.ai.azure.com
      api_key: os.environ/AZURE_FOUNDRY_API_KEY
      api_version: "2024-05-01-preview"
      model_version: "20250929"
      timeout: 300
      connect_timeout: 30
      drop_params: true
    model_info:
      base_model: claude-sonnet-4-5-20250929
      litellm_provider: azure_ai
      mode: chat
      input_cost_per_token: 0.000003
      output_cost_per_token: 0.000015
      cache_read_input_token_cost: 0.0000003
      cache_creation_input_token_cost: 0.00000375

  - model_name: claude-haiku-4-5
    litellm_params:
      model: azure_ai/sec-claude-haiku-4-5
      api_base: https://my-resource.services.ai.azure.com
      api_key: os.environ/AZURE_FOUNDRY_API_KEY
      api_version: "2024-05-01-preview"
      model_version: "20251001"
      timeout: 300
      connect_timeout: 30
      drop_params: true
    model_info:
      base_model: claude-haiku-4-5-20251001
      litellm_provider: azure_ai
      mode: chat
      input_cost_per_token: 0.0000008
      output_cost_per_token: 0.000004
      cache_read_input_token_cost: 0.00000008
      cache_creation_input_token_cost: 0.000001

  - model_name: claude-opus-4-6
    litellm_params:
      model: azure_ai/sec-claude-opus-4-6
      api_base: https://my-resource.services.ai.azure.com
      api_key: os.environ/AZURE_FOUNDRY_API_KEY
      api_version: "2024-05-01-preview"
      model_version: "1"
      timeout: 300
      connect_timeout: 30
      drop_params: true
    model_info:
      base_model: claude-opus-4-6-20260101
      litellm_provider: azure_ai
      mode: chat
      input_cost_per_token: 0.000005
      output_cost_per_token: 0.000025
      cache_read_input_token_cost: 0.0000005
      cache_creation_input_token_cost: 0.00000625

litellm_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  drop_params: true
  modify_params: true

general_settings:
  database_url: os.environ/DATABASE_URL
  store_model_in_db: true
  store_prompts_in_spend_logs: true

Claude Code environment variables

export ANTHROPIC_BASE_URL="http://localhost:4000"
export ANTHROPIC_AUTH_TOKEN="Bearer <LITELLM_MASTER_KEY>"
export CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1
export ANTHROPIC_MODEL="claude-sonnet-4-5"
export ANTHROPIC_DEFAULT_SONNET_MODEL="claude-sonnet-4-5"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="claude-haiku-4-5"
export ANTHROPIC_DEFAULT_OPUS_MODEL="claude-opus-4-6"

Steps to Reproduce

  1. Configure LiteLLM proxy with azure_ai provider and custom pricing in model_info
  2. Start the proxy with Docker
  3. Test via Postman (POST http://localhost:4000/chat/completions) → Cost is correctly calculated ✅
  4. Test via Claude Code (which uses /v1/messages endpoint internally) → Cost is $0.00

Root Cause Analysis

The model_info custom pricing (input_cost_per_token, output_cost_per_token, cache_read_input_token_cost, cache_creation_input_token_cost) is only being used for cost calculation when the call type is acompletion (OpenAI format via /chat/completions).

When Claude Code sends requests via the Anthropic native /v1/messages endpoint (call type: anthropic_messages), the custom pricing from model_info is not applied, and the cost defaults to $0.00.

What I've already tried

  • Setting base_model in model_info (e.g., base_model: anthropic/claude-sonnet-4-5-20250929)
  • Setting litellm_provider: azure_ai and mode: chat in model_info
  • Setting pricing in litellm_params instead of model_info
  • Setting modify_params: true in litellm_settings
  • Using ANTHROPIC_BASE_URL=http://localhost:4000 (without /anthropic)

None of these resolved the issue for the anthropic_messages call type.

Expected Behavior

Custom pricing defined in model_info should be applied consistently for all call types, including anthropic_messages (Anthropic pass-through via /v1/messages), not just acompletion.

Related Issues

  • #8874 - (fix) Anthropic pass through cost tracking (race condition fix, but did not address azure_ai provider)
  • #11789 - Anthropic cost calculations incorrect with streaming and prompt caching
  • #11975 - Custom Pricing in model_info Not Applied for Cost Tracking

Environment

  • LiteLLM Version: v1.81.14
  • Docker Image: ghcr.io/berriai/litellm:main-latest
  • Provider: Azure AI Foundry (azure_ai)
  • Client: Claude Code v2.1.71

extent analysis

Fix Plan

To apply custom pricing for anthropic_messages call type, update the litellm_proxy.py file to include the custom pricing logic for Anthropic native endpoint calls.

  1. Update the handle_anthropic_messages function:

def handle_anthropic_messages(self, request): # ... existing code ... model_info = self.get_model_info(request.model) if model_info: input_cost_per_token = model_info.get('input_cost_per_token', 0) output_cost_per_token = model_info.get('output_cost_per_token', 0) cache_read_input_token_cost = model_info.get('cache_read_input_token_cost', 0) cache_creation_input_token_cost = model_info.get('cache_creation_input_token_cost', 0)

    # Calculate cost based on custom pricing
    input_tokens = request.prompt_length
    output_tokens = response.length
    cache_read_tokens = request.cache_read_tokens
    cache_creation_tokens = request.cache_creation_tokens
    
    input_cost = input_tokens * input_cost_per_token
    output_cost = output_tokens * output_cost_per_token
    cache_read_cost = cache_read_tokens * cache_read_input_token_cost
    cache_creation_cost = cache_creation_tokens * cache_creation_input_token_cost
    
    total_cost = input_cost + output_cost + cache_read_cost + cache_creation_cost
    
    # Update the response with the calculated cost
    response.cost = total_cost
# ... existing code ...

2. **Add custom pricing to the `model_info` dictionary**:
   Ensure that the `model_info` dictionary contains the custom pricing keys:
   ```python
model_info = {
    'base_model': 'claude-sonnet-4-5-20250929',
    'litellm_provider': 'azure_ai',
    'mode': 'chat',
    'input_cost_per_token': 0.000003,
    'output_cost_per_token': 0.000015,
    'cache_read_input_token_cost': 0.0000003,
    'cache_creation_input_token_cost': 0.00000375
}

Verification

To verify the fix, test the anthropic_messages call type using Claude Code and check that the cost is correctly calculated based on the custom pricing defined in model_info.

  1. Restart the LiteLLM proxy with the updated code.
  2. Send a request to the /v1/messages endpoint using Claude Code.
  3. Check the response for the calculated cost.
  4. Verify that the cost matches the expected value based on the custom pricing.

Extra Tips

  • Ensure that the model_info dictionary is correctly populated with the custom pricing keys.
  • Verify that the handle_anthropic_messages function is correctly calculating the cost based on the custom pricing.
  • Test the fix with different models and call types to ensure that the custom pricing

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

litellm - 💡(How to fix) Fix [Bug]: Custom pricing in model_info not applied for anthropic_messages call type with azure_ai provider [1 participants]