litellm - ✅(Solved) Fix [Bug]: url_context tool falsely billed as web_search grounding request [1 pull requests, 1 participants]

junan-n1 · 2026-04-22T14:59:37Z


BerriAI/litellm#26253•Fetched 2026-04-23 07:24:18

Root Cause

litellm/litellm_core_utils/llm_cost_calc/tool_call_cost_tracking.py, response_object_includes_web_search_call() at line ~316:

if isinstance(response_object, ModelResponse):
    # chat completions only include url_citation annotations when a web search call is made
    has_url_citations = (
        StandardBuiltInToolCostTracking.response_includes_annotation_type(
            response_object=response_object, annotation_type="url_citation"
        )
    )
    if has_url_citations:
        return True    # ← false positive: url_context also emits url_citation

The comment "chat completions only include url_citation annotations when a web search call is made" was correct when this code was written, but is no longer true now that Gemini's url_context tool emits the same annotation type for per-claim grounding against user-specified URLs.

Fix Action

Fixed

Fixed by PR: fix(cost): do not bill url_context as web_search grounding (https://github.com/BerriAI/litellm/pull/26254)

PR fix notes

PR #26254: fix(cost): do not bill url_context as web_search grounding

Repository: BerriAI/litellm
Author: junan-n1
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/26254

Description (problem / solution / changelog)

Summary

Fixes url_context being falsely billed as web_search grounding.

Gemini's url_context tool emits url_citation annotations for per-claim grounding, but the detection logic in response_object_includes_web_search_call treated any url_citation annotation as evidence of a web_search call. This caused url_context requests to be charged the $0.035 grounding surcharge when Google's actual pricing for url_context is token-cost only (no per-request fee).

Per Google's url_context docs: "Charged as input tokens per model pricing." Neither the Gemini API pricing page nor the Vertex AI pricing page list a url_context surcharge; only Grounding with Google Search is billed per-request.

Fixes #26253

Changes

litellm/litellm_core_utils/llm_cost_calc/tool_call_cost_tracking.py:
- New helper response_object_includes_url_context_call() that inspects the vertex_ai_url_context_metadata field already attached to ModelResponse by the Gemini adapter (vertex_and_google_ai_studio_gemini.py:2475-2479).
- In response_object_includes_web_search_call, the annotation-based shortcut now skips when url_context metadata is present. Detection falls through to the structured usage.prompt_tokens_details.web_search_requests check, which correctly distinguishes actual web_search calls from url_context calls.
tests/test_litellm/litellm_core_utils/llm_cost_calc/test_tool_call_cost_tracking.py:
- test_url_context_does_not_trigger_web_search_cost — url_context + url_citation annotation + no web_search_requests → no grounding surcharge.
- test_url_citation_without_url_context_still_triggers_web_search_cost — regression guard for #15858: pure-annotation responses (Anthropic-style) are still correctly detected.
- test_response_object_includes_url_context_call — covers both storage paths (top-level attribute and _hidden_params).

Behavior table

| Scenario | `url_citation` annotation | `vertex_ai_url_context_metadata` | `web_search_requests` | Before | After |
|---|---|---|---|---|---|
| web_search only | ✓ | — | > 0 | True ✓ | True ✓ |
| url_context only | ✓ | ✓ | None | True ❌ | False ✓ |
| both | ✓ | ✓ | > 0 | True ✓ | True ✓ (via structured check) |
| neither | — | — | None | False ✓ | False ✓ |
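
The "After" column can be read as a small decision function. The following is a minimal sketch, not LiteLLM's code: `includes_web_search_call` and its three parameters are hypothetical stand-ins for the annotation check, the url_context metadata check, and `usage.prompt_tokens_details.web_search_requests`.

```python
from typing import Optional


def includes_web_search_call(
    has_url_citation: bool,
    has_url_context_metadata: bool,
    web_search_requests: Optional[int],
) -> bool:
    """Hypothetical condensation of the 'After' column of the behavior table."""
    # Annotation shortcut: only trusted when url_context metadata is absent.
    if has_url_citation and not has_url_context_metadata:
        return True
    # Structured check: an explicit, positive web_search_requests count.
    return bool(web_search_requests and web_search_requests > 0)


# (scenario, url_citation present, url_context metadata present, web_search_requests)
SCENARIOS = [
    ("web_search only", True, False, 1),
    ("url_context only", True, True, None),
    ("both", True, True, 1),
    ("neither", False, False, None),
]

RESULTS = {name: includes_web_search_call(c, u, w) for name, c, u, w in SCENARIOS}

if __name__ == "__main__":
    for name, result in RESULTS.items():
        print(f"{name}: {result}")
```

Only the "url_context only" row changes outcome relative to the "Before" column, which is the intent of the PR.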

Test plan

- Added 3 unit tests (see test_tool_call_cost_tracking.py)
- Full file: uv run pytest tests/test_litellm/litellm_core_utils/llm_cost_calc/test_tool_call_cost_tracking.py -v → 17/17 passing (3 new + 14 existing regression)
- uv run black . (no changes required)
- Verified existing test_get_cost_for_vertex_ai_gemini_web_search continues to pass (no regression to #15858 fix)

Related

- Depends on / complements #24369: the hardcoded $0.035 rate is also incorrect for Gemini 3.x (should be $0.014 per current Gemini 3 pricing). That fix is being handled by PR #24448. This PR fixes the trigger (should not fire for url_context); #24448 fixes the rate (should be $0.014, not $0.035). Both are needed to correctly bill Gemini 3 grounded requests.
- #15858: earlier bug (grounding fee was missing entirely) whose fix introduced the annotation-based shortcut this PR refines.

Changed files

litellm/litellm_core_utils/llm_cost_calc/tool_call_cost_tracking.py (modified, +31/-2)
tests/test_litellm/litellm_core_utils/llm_cost_calc/test_tool_call_cost_tracking.py (modified, +154/-0)


Check for existing issues

I have searched existing issues — #15858 and #24369 are related but do not cover this case.

What happened?

When using Gemini's url_context tool (distinct from Grounding with Google Search), LiteLLM incorrectly charges the $0.035 web-search grounding fee per request. Root cause: the detection logic in tool_call_cost_tracking.py treats any url_citation annotation in the response as evidence of a web_search call — but Gemini's url_context tool also emits url_citation annotations (for per-claim grounding against the fetched URL).

Per Google's pricing documentation:

- url_context has no per-request surcharge; it is "Charged as input tokens per model pricing." Sources:
  - https://ai.google.dev/gemini-api/docs/url-context ("URL Context Tool … Charged as input tokens per model pricing")
  - https://ai.google.dev/gemini-api/docs/pricing (no url_context line item)
  - https://cloud.google.com/vertex-ai/generative-ai/pricing (grounding options listed are Google Search, Maps, Your Data; url_context is not a listed grounding product)
- Grounding with Google Search on Gemini 3.x is $14/1,000 queries, with 5,000 free/month. (Addressed separately by #24369 / PR #24448.)

So url_context requests are currently over-billed in LiteLLM's cost ledger. On a 6-URL test probe, LiteLLM reported a total of $0.211, whereas Google's actual billing for pure url_context calls should be a few cents total (token cost only). This creates a systematic overcount of ~1000× for url_context-heavy workloads.

Reproduction

import httpx

# Any Gemini Vertex model alias exposing url_context via the proxy
payload = {
    "model": "gemini-3-flash-preview-vertex",
    "messages": [{
        "role": "user",
        "content": (
            "Use the url_context tool to fetch "
            "https://pmc.ncbi.nlm.nih.gov/articles/PMC10018306/ "
            "and extract: 1) title, 2) publication date, 3) journal, "
            "4) a 2-sentence summary."
        ),
    }],
    "tools": [{"url_context": {}}],  # note: NOT web_search / googleSearch
}

r = httpx.post(
    "http://localhost:4000/v1/chat/completions",
    headers={"Authorization": "Bearer sk-..."},
    json=payload,
    timeout=180.0,
)
print("litellm cost:", r.headers.get("x-litellm-response-cost"))
print("vertex_ai_url_context_metadata populated:",
      bool(r.json().get("vertex_ai_url_context_metadata")))
print("vertex_ai_grounding_metadata populated:",
      bool(r.json().get("vertex_ai_grounding_metadata")))

Observed:

- x-litellm-response-cost: 0.0353 (token cost + $0.035 grounding surcharge)
- vertex_ai_url_context_metadata is populated with URL_RETRIEVAL_STATUS_SUCCESS, which proves url_context fired
- vertex_ai_grounding_metadata is populated with groundingChunks / groundingSupports; these populate when the model grounds an answer, regardless of which tool sourced the grounding
- No vertex_ai_grounding_metadata[].webSearchQueries field, meaning web_search was not called
- message.annotations contains url_citation entries → this is what trips the false positive in the cost calculator
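
The observations above can be checked mechanically on the response body. This is a minimal sketch under stated assumptions: the helper names `web_search_was_called` and `url_context_was_called` are hypothetical, and the sample payloads are illustrative shapes built from the field names mentioned in this issue, not captured responses.

```python
from typing import Any, Dict


def web_search_was_called(response_body: Dict[str, Any]) -> bool:
    """True if any grounding metadata entry carries non-empty webSearchQueries."""
    entries = response_body.get("vertex_ai_grounding_metadata") or []
    if isinstance(entries, dict):  # tolerate a single object instead of a list
        entries = [entries]
    return any(
        isinstance(entry, dict) and bool(entry.get("webSearchQueries"))
        for entry in entries
    )


def url_context_was_called(response_body: Dict[str, Any]) -> bool:
    """True if url_context metadata is present and non-empty."""
    return bool(response_body.get("vertex_ai_url_context_metadata"))


# Illustrative payloads only; the inner structure is an assumption.
URL_CONTEXT_ONLY = {
    "vertex_ai_url_context_metadata": [
        {"urlMetadata": [{"urlRetrievalStatus": "URL_RETRIEVAL_STATUS_SUCCESS"}]}
    ],
    "vertex_ai_grounding_metadata": [
        {"groundingChunks": [{}], "groundingSupports": [{}]}
    ],
}
WITH_WEB_SEARCH = {
    "vertex_ai_grounding_metadata": [
        {"webSearchQueries": ["example query"], "groundingChunks": [{}]}
    ],
}

if __name__ == "__main__":
    for label, body in [("url_context only", URL_CONTEXT_ONLY), ("web search", WITH_WEB_SEARCH)]:
        print(label, url_context_was_called(body), web_search_was_called(body))
```

Grounding metadata alone does not distinguish the two tools; in this sketch only the webSearchQueries field does.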

Expected:

No grounding surcharge for url_context-only calls. Cost should equal prompt_tokens × input_rate + completion_tokens × output_rate per model tier.
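
As a worked illustration of the expected formula, the sketch below compares the token-only cost with the cost after the false-positive surcharge. The token counts and per-token rates are hypothetical values picked for illustration, not real model pricing; only the $0.035 surcharge comes from this issue.

```python
# Grounding surcharge cited in this issue (USD per request).
GROUNDING_SURCHARGE_USD = 0.035


def expected_token_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_rate: float,
    output_rate: float,
) -> float:
    """Token-only cost: prompt_tokens * input_rate + completion_tokens * output_rate."""
    return prompt_tokens * input_rate + completion_tokens * output_rate


# Hypothetical usage and per-token rates (USD), for illustration only.
PROMPT_TOKENS = 1_000
COMPLETION_TOKENS = 100
INPUT_RATE_USD = 2e-7
OUTPUT_RATE_USD = 1e-6

expected = expected_token_cost(
    PROMPT_TOKENS, COMPLETION_TOKENS, INPUT_RATE_USD, OUTPUT_RATE_USD
)
billed_with_false_positive = expected + GROUNDING_SURCHARGE_USD

if __name__ == "__main__":
    print(f"expected (token cost only):    ${expected:.4f}")
    print(f"with false-positive surcharge: ${billed_with_false_positive:.4f}")
```

With small requests like this one, the fixed surcharge dominates the total, which is why the overcount is large relative to the token cost.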

Root cause

litellm/litellm_core_utils/llm_cost_calc/tool_call_cost_tracking.py, response_object_includes_web_search_call() at line ~316:

if isinstance(response_object, ModelResponse):
    # chat completions only include url_citation annotations when a web search call is made
    has_url_citations = (
        StandardBuiltInToolCostTracking.response_includes_annotation_type(
            response_object=response_object, annotation_type="url_citation"
        )
    )
    if has_url_citations:
        return True    # ← false positive: url_context also emits url_citation

Proposed fix

Use the vertex_ai_url_context_metadata field (already populated by LiteLLM's Gemini adapter at litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py:2475-2479) as a negative signal: if url_context metadata is present, the annotation-based shortcut should not fire; fall through to the structured web_search_requests check, which correctly distinguishes the two tools.

Diff sketch:

 if isinstance(response_object, ModelResponse):
-    # chat completions only include url_citation annotations when a web search call is made
     has_url_citations = (
         StandardBuiltInToolCostTracking.response_includes_annotation_type(
             response_object=response_object, annotation_type="url_citation"
         )
     )
-    if has_url_citations:
+    # Gemini's url_context tool also emits url_citation annotations, so annotation
+    # presence alone is not sufficient for Gemini/Vertex. Skip this shortcut when
+    # url_context metadata is present and rely on usage.prompt_tokens_details.web_search_requests
+    # below to detect actual web_search calls.
+    if has_url_citations and not StandardBuiltInToolCostTracking.response_object_includes_url_context_call(response_object):
         return True

…plus a small response_object_includes_url_context_call helper that reads vertex_ai_url_context_metadata off the response object (already attached by the Gemini adapter).
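
A helper of this kind might look like the following. This is a sketch, not the code in the PR: it assumes the metadata can live either as a top-level attribute or inside a dict-like `_hidden_params` (the two storage paths the PR's tests mention), and it uses `SimpleNamespace` objects as stand-ins for `ModelResponse` so it runs without LiteLLM installed.

```python
from types import SimpleNamespace
from typing import Any


def response_object_includes_url_context_call(response_object: Any) -> bool:
    """Sketch: True if non-empty url_context metadata is attached to the response."""
    # Storage path 1: top-level attribute on the response object.
    if getattr(response_object, "vertex_ai_url_context_metadata", None):
        return True
    # Storage path 2: inside _hidden_params (assumed here to be a dict).
    hidden_params = getattr(response_object, "_hidden_params", None)
    if isinstance(hidden_params, dict):
        return bool(hidden_params.get("vertex_ai_url_context_metadata"))
    return False


# Stand-in response objects; real code would receive a ModelResponse.
top_level = SimpleNamespace(vertex_ai_url_context_metadata=[{"status": "ok"}])
hidden = SimpleNamespace(
    _hidden_params={"vertex_ai_url_context_metadata": [{"status": "ok"}]}
)
empty = SimpleNamespace(vertex_ai_url_context_metadata=None, _hidden_params={})

if __name__ == "__main__":
    for label, obj in [("top-level", top_level), ("hidden", hidden), ("empty", empty)]:
        print(label, response_object_includes_url_context_call(obj))
```

The check is on truthiness rather than attribute existence, so a response whose metadata field exists but is `None` or empty is not treated as a url_context call.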

Will open a PR with the fix + regression tests shortly.

Related

- #15858: earlier bug that added the annotation-based detection in the first place (the fix introduced this false positive).
- #24369 / PR #24448: separate bug; the hardcoded $0.035 rate should be $0.014 for Gemini 3.x per current pricing. These cover the rate; this issue covers the trigger.

What part of LiteLLM is this about?

Cost tracking / Vertex AI Gemini / url_context tool.

What LiteLLM version are you on?

main (reproduced on litellm/llms/vertex_ai/gemini/cost_calculator.py + tool_call_cost_tracking.py as of the latest commit on main at time of writing).

extent analysis

TL;DR

Update the tool_call_cost_tracking.py file to correctly distinguish between url_context and web_search calls by checking for the presence of vertex_ai_url_context_metadata.

Guidance

Modify the response_object_includes_web_search_call function to check for vertex_ai_url_context_metadata and only return True if url_citation annotations are present and vertex_ai_url_context_metadata is not.
Add a new helper function response_object_includes_url_context_call to check for the presence of vertex_ai_url_context_metadata.
Update the cost calculation logic to use the new helper function to determine if a web_search call was made.
Verify the fix by running the reproduction code and checking that the x-litellm-response-cost header no longer includes the $0.035 grounding surcharge for url_context-only calls.

Example

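def response_object_includes_url_context_call(response_object):
    # Check truthiness rather than hasattr: the attribute may exist but be
    # None or empty when url_context was not used.
    return bool(getattr(response_object, "vertex_ai_url_context_metadata", None))

def response_object_includes_web_search_call(response_object):
    has_url_citations = StandardBuiltInToolCostTracking.response_includes_annotation_type(
        response_object=response_object, annotation_type="url_citation"
    )
    # Trust the annotation shortcut only when url_context metadata is absent.
    if has_url_citations and not response_object_includes_url_context_call(response_object):
        return True
    # ... rest of the function remains the same (falls through to the
    # usage.prompt_tokens_details.web_search_requests check)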

Notes

The proposed fix assumes that the presence of vertex_ai_url_context_metadata is a reliable indicator that the url_context tool was used. If this assumption is incorrect, additional logic may be needed to correctly distinguish between url_context and web_search calls.

Recommendation

Apply the proposed fix to update the cost tracking logic to correctly handle url_context calls. This fix should prevent the systematic overcounting of costs for url_context-heavy workloads.
