litellm - 💡(How to fix) Fix [Bug]: Custom pricing in model_info not applied for anthropic_messages call type with azure_ai provider [1 participants]

litellm2026-03-11 00:52:21

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

BerriAI/litellm#23309•Fetched 2026-04-08 00:37:36

View on GitHub

Comments

Participants

Timeline

Reactions

Author

Oviedo369

Participants

Oviedo369

Timeline (top)

cross-referenced ×2labeled ×2

Root Cause

The model_info custom pricing (input_cost_per_token, output_cost_per_token, cache_read_input_token_cost, cache_creation_input_token_cost) is only being used for cost calculation when the call type is acompletion (OpenAI format via /chat/completions).

When Claude Code sends requests via the Anthropic native /v1/messages endpoint (call type: anthropic_messages), the custom pricing from model_info is not applied, and the cost defaults to $0.00.

Code Example

model_list:
  - model_name: claude-sonnet-4-5
    litellm_params:
      model: azure_ai/sec-claude-sonnet-4-5
      api_base: https://my-resource.services.ai.azure.com
      api_key: os.environ/AZURE_FOUNDRY_API_KEY
      api_version: "2024-05-01-preview"
      model_version: "20250929"
      timeout: 300
      connect_timeout: 30
      drop_params: true
    model_info:
      base_model: claude-sonnet-4-5-20250929
      litellm_provider: azure_ai
      mode: chat
      input_cost_per_token: 0.000003
      output_cost_per_token: 0.000015
      cache_read_input_token_cost: 0.0000003
      cache_creation_input_token_cost: 0.00000375

  - model_name: claude-haiku-4-5
    litellm_params:
      model: azure_ai/sec-claude-haiku-4-5
      api_base: https://my-resource.services.ai.azure.com
      api_key: os.environ/AZURE_FOUNDRY_API_KEY
      api_version: "2024-05-01-preview"
      model_version: "20251001"
      timeout: 300
      connect_timeout: 30
      drop_params: true
    model_info:
      base_model: claude-haiku-4-5-20251001
      litellm_provider: azure_ai
      mode: chat
      input_cost_per_token: 0.0000008
      output_cost_per_token: 0.000004
      cache_read_input_token_cost: 0.00000008
      cache_creation_input_token_cost: 0.000001

  - model_name: claude-opus-4-6
    litellm_params:
      model: azure_ai/sec-claude-opus-4-6
      api_base: https://my-resource.services.ai.azure.com
      api_key: os.environ/AZURE_FOUNDRY_API_KEY
      api_version: "2024-05-01-preview"
      model_version: "1"
      timeout: 300
      connect_timeout: 30
      drop_params: true
    model_info:
      base_model: claude-opus-4-6-20260101
      litellm_provider: azure_ai
      mode: chat
      input_cost_per_token: 0.000005
      output_cost_per_token: 0.000025
      cache_read_input_token_cost: 0.0000005
      cache_creation_input_token_cost: 0.00000625

litellm_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  drop_params: true
  modify_params: true

general_settings:
  database_url: os.environ/DATABASE_URL
  store_model_in_db: true
  store_prompts_in_spend_logs: true

---

export ANTHROPIC_BASE_URL="http://localhost:4000"
export ANTHROPIC_AUTH_TOKEN="Bearer <LITELLM_MASTER_KEY>"
export CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1
export ANTHROPIC_MODEL="claude-sonnet-4-5"
export ANTHROPIC_DEFAULT_SONNET_MODEL="claude-sonnet-4-5"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="claude-haiku-4-5"
export ANTHROPIC_DEFAULT_OPUS_MODEL="claude-opus-4-6"

RAW_BUFFERClick to expand / collapse

[Bug]: Custom pricing in `model_info` not applied for `anthropic_messages` call type with `azure_ai` provider

What happened?

Custom pricing defined in model_info section of the proxy config is correctly applied for acompletion call type (via /chat/completions) but not applied for anthropic_messages call type (via /v1/messages), resulting in $0.00 spend tracking.

This affects users running Claude Code through LiteLLM proxy with azure_ai provider, since Claude Code uses the Anthropic native /v1/messages endpoint.

LiteLLM Version

v1.81.14 (Docker image: ghcr.io/berriai/litellm:main-latest)

Evidence

✅ Working: `acompletion` call type (Test Connection / Postman via `/chat/completions`)

Call Type: acompletion
Model: azure_ai/sec-claude-opus-4-6
Tokens: 42 (13 prompt + 29 completion)
Cost: $0.00079000 ✅ (correctly calculated)
Cost Breakdown:
- Input Cost: $0.00006500 (13 prompt tokens)
- Output Cost: $0.00072500 (29 completion tokens)

❌ Failing: `anthropic_messages` call type (Claude Code via `/v1/messages`)

Call Type: anthropic_messages
Model: azure_ai/sec-claude-sonnet-4-5
Tokens: 93,748 (93,333 prompt + 415 completion)
Cache Read Tokens: 93,185
Cache Creation Tokens: 143
Cost: $0.00000000 ❌ (should not be zero)
Cost Breakdown:
- Input Cost: $0.00000000 (93,333 prompt tokens)
- Output Cost: $0.00000000 (415 completion tokens)

Configuration

`litellm_config.yaml`

model_list:
  - model_name: claude-sonnet-4-5
    litellm_params:
      model: azure_ai/sec-claude-sonnet-4-5
      api_base: https://my-resource.services.ai.azure.com
      api_key: os.environ/AZURE_FOUNDRY_API_KEY
      api_version: "2024-05-01-preview"
      model_version: "20250929"
      timeout: 300
      connect_timeout: 30
      drop_params: true
    model_info:
      base_model: claude-sonnet-4-5-20250929
      litellm_provider: azure_ai
      mode: chat
      input_cost_per_token: 0.000003
      output_cost_per_token: 0.000015
      cache_read_input_token_cost: 0.0000003
      cache_creation_input_token_cost: 0.00000375

  - model_name: claude-haiku-4-5
    litellm_params:
      model: azure_ai/sec-claude-haiku-4-5
      api_base: https://my-resource.services.ai.azure.com
      api_key: os.environ/AZURE_FOUNDRY_API_KEY
      api_version: "2024-05-01-preview"
      model_version: "20251001"
      timeout: 300
      connect_timeout: 30
      drop_params: true
    model_info:
      base_model: claude-haiku-4-5-20251001
      litellm_provider: azure_ai
      mode: chat
      input_cost_per_token: 0.0000008
      output_cost_per_token: 0.000004
      cache_read_input_token_cost: 0.00000008
      cache_creation_input_token_cost: 0.000001

  - model_name: claude-opus-4-6
    litellm_params:
      model: azure_ai/sec-claude-opus-4-6
      api_base: https://my-resource.services.ai.azure.com
      api_key: os.environ/AZURE_FOUNDRY_API_KEY
      api_version: "2024-05-01-preview"
      model_version: "1"
      timeout: 300
      connect_timeout: 30
      drop_params: true
    model_info:
      base_model: claude-opus-4-6-20260101
      litellm_provider: azure_ai
      mode: chat
      input_cost_per_token: 0.000005
      output_cost_per_token: 0.000025
      cache_read_input_token_cost: 0.0000005
      cache_creation_input_token_cost: 0.00000625

litellm_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  drop_params: true
  modify_params: true

general_settings:
  database_url: os.environ/DATABASE_URL
  store_model_in_db: true
  store_prompts_in_spend_logs: true

Claude Code environment variables

export ANTHROPIC_BASE_URL="http://localhost:4000"
export ANTHROPIC_AUTH_TOKEN="Bearer <LITELLM_MASTER_KEY>"
export CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1
export ANTHROPIC_MODEL="claude-sonnet-4-5"
export ANTHROPIC_DEFAULT_SONNET_MODEL="claude-sonnet-4-5"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="claude-haiku-4-5"
export ANTHROPIC_DEFAULT_OPUS_MODEL="claude-opus-4-6"

Steps to Reproduce

Configure LiteLLM proxy with azure_ai provider and custom pricing in model_info
Start the proxy with Docker
Test via Postman (POST http://localhost:4000/chat/completions) → Cost is correctly calculated ✅
Test via Claude Code (which uses /v1/messages endpoint internally) → Cost is $0.00 ❌

Root Cause Analysis

What I've already tried

Setting base_model in model_info (e.g., base_model: anthropic/claude-sonnet-4-5-20250929)
Setting litellm_provider: azure_ai and mode: chat in model_info
Setting pricing in litellm_params instead of model_info
Setting modify_params: true in litellm_settings
Using ANTHROPIC_BASE_URL=http://localhost:4000 (without /anthropic)

None of these resolved the issue for the anthropic_messages call type.

Expected Behavior

Custom pricing defined in model_info should be applied consistently for all call types, including anthropic_messages (Anthropic pass-through via /v1/messages), not just acompletion.

Related Issues

#8874 - (fix) Anthropic pass through cost tracking (race condition fix, but did not address azure_ai provider)
#11789 - Anthropic cost calculations incorrect with streaming and prompt caching
#11975 - Custom Pricing in model_info Not Applied for Cost Tracking

Environment

LiteLLM Version: v1.81.14
Docker Image: ghcr.io/berriai/litellm:main-latest
Provider: Azure AI Foundry (azure_ai)
Client: Claude Code v2.1.71

extent analysis

Fix Plan

To apply custom pricing for anthropic_messages call type, update the litellm_proxy.py file to include the custom pricing logic for Anthropic native endpoint calls.

Update the handle_anthropic_messages function:

def handle_anthropic_messages(self, request): # ... existing code ... model_info = self.get_model_info(request.model) if model_info: input_cost_per_token = model_info.get('input_cost_per_token', 0) output_cost_per_token = model_info.get('output_cost_per_token', 0) cache_read_input_token_cost = model_info.get('cache_read_input_token_cost', 0) cache_creation_input_token_cost = model_info.get('cache_creation_input_token_cost', 0)

    # Calculate cost based on custom pricing
    input_tokens = request.prompt_length
    output_tokens = response.length
    cache_read_tokens = request.cache_read_tokens
    cache_creation_tokens = request.cache_creation_tokens
    
    input_cost = input_tokens * input_cost_per_token
    output_cost = output_tokens * output_cost_per_token
    cache_read_cost = cache_read_tokens * cache_read_input_token_cost
    cache_creation_cost = cache_creation_tokens * cache_creation_input_token_cost
    
    total_cost = input_cost + output_cost + cache_read_cost + cache_creation_cost
    
    # Update the response with the calculated cost
    response.cost = total_cost
# ... existing code ...


2. **Add custom pricing to the `model_info` dictionary**:
   Ensure that the `model_info` dictionary contains the custom pricing keys:
   ```python
model_info = {
    'base_model': 'claude-sonnet-4-5-20250929',
    'litellm_provider': 'azure_ai',
    'mode': 'chat',
    'input_cost_per_token': 0.000003,
    'output_cost_per_token': 0.000015,
    'cache_read_input_token_cost': 0.0000003,
    'cache_creation_input_token_cost': 0.00000375
}

Verification

To verify the fix, test the anthropic_messages call type using Claude Code and check that the cost is correctly calculated based on the custom pricing defined in model_info.

Restart the LiteLLM proxy with the updated code.
Send a request to the /v1/messages endpoint using Claude Code.
Check the response for the calculated cost.
Verify that the cost matches the expected value based on the custom pricing.

Extra Tips

Ensure that the model_info dictionary is correctly populated with the custom pricing keys.
Verify that the handle_anthropic_messages function is correctly calculating the cost based on the custom pricing.
Test the fix with different models and call types to ensure that the custom pricing

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #ssr #installation #tensor shape #environment variable

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

litellm - 💡(How to fix) Fix [Bug]: Custom pricing in model_info not applied for anthropic_messages call type with azure_ai provider [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Code Example

[Bug]: Custom pricing in `model_info` not applied for `anthropic_messages` call type with `azure_ai` provider

What happened?

LiteLLM Version

Evidence

✅ Working: `acompletion` call type (Test Connection / Postman via `/chat/completions`)

❌ Failing: `anthropic_messages` call type (Claude Code via `/v1/messages`)

Configuration

`litellm_config.yaml`

Claude Code environment variables

Steps to Reproduce

Root Cause Analysis

What I've already tried

Expected Behavior

Related Issues

Environment

extent analysis

Fix Plan

Verification

Extra Tips

Still need to ship something?

TRENDING

litellm - 💡(How to fix) Fix [Bug]: Custom pricing in model_info not applied for anthropic_messages call type with azure_ai provider [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Code Example

[Bug]: Custom pricing in model_info not applied for anthropic_messages call type with azure_ai provider

What happened?

LiteLLM Version

Evidence

✅ Working: acompletion call type (Test Connection / Postman via /chat/completions)

❌ Failing: anthropic_messages call type (Claude Code via /v1/messages)

Configuration

litellm_config.yaml

Claude Code environment variables

Steps to Reproduce

Root Cause Analysis

What I've already tried

Expected Behavior

Related Issues

Environment

extent analysis

Fix Plan

Verification

Extra Tips

Still need to ship something?

RELATED_DISCOVERY

TRENDING

[Bug]: Custom pricing in `model_info` not applied for `anthropic_messages` call type with `azure_ai` provider

✅ Working: `acompletion` call type (Test Connection / Postman via `/chat/completions`)

❌ Failing: `anthropic_messages` call type (Claude Code via `/v1/messages`)

`litellm_config.yaml`