litellm: [Bug] url_context tool falsely billed as web_search grounding request (solved; 1 pull request, 1 participant)

GitHub stats
BerriAI/litellm#26253 · Fetched 2026-04-23 07:24:18
Comments: 0 · Participants: 1 · Timeline: 2 · Reactions: 0
Timeline (top): cross-referenced ×1, labeled ×1

Root Cause

litellm/litellm_core_utils/llm_cost_calc/tool_call_cost_tracking.py, response_object_includes_web_search_call() at line ~316:

if isinstance(response_object, ModelResponse):
    # chat completions only include url_citation annotations when a web search call is made
    has_url_citations = (
        StandardBuiltInToolCostTracking.response_includes_annotation_type(
            response_object=response_object, annotation_type="url_citation"
        )
    )
    if has_url_citations:
        return True    # ← false positive: url_context also emits url_citation

The comment "chat completions only include url_citation annotations when a web search call is made" was correct when this code was written, but is no longer true now that Gemini's url_context tool emits the same annotation type for per-claim grounding against user-specified URLs.

Fix Action

Fixed

PR fix notes

PR #26254: fix(cost): do not bill url_context as web_search grounding

Description (problem / solution / changelog)

Summary

Fixes url_context being falsely billed as web_search grounding.

Gemini's url_context tool emits url_citation annotations for per-claim grounding, but the detection logic in response_object_includes_web_search_call treated any url_citation annotation as evidence of a web_search call. This caused url_context requests to be charged the $0.035 grounding surcharge when Google's actual pricing for url_context is token-cost only (no per-request fee).

Per Google's url_context docs: "Charged as input tokens per model pricing." Neither the Gemini API pricing page nor the Vertex AI pricing page lists a url_context surcharge; only Grounding with Google Search is billed per request.

Fixes #26253

Changes

  • litellm/litellm_core_utils/llm_cost_calc/tool_call_cost_tracking.py:

    • New helper response_object_includes_url_context_call() that inspects the vertex_ai_url_context_metadata field already attached to ModelResponse by the Gemini adapter (vertex_and_google_ai_studio_gemini.py:2475-2479).
    • In response_object_includes_web_search_call, the annotation-based shortcut now skips when url_context metadata is present. Detection falls through to the structured usage.prompt_tokens_details.web_search_requests check, which correctly distinguishes actual web_search calls from url_context calls.
  • tests/test_litellm/litellm_core_utils/llm_cost_calc/test_tool_call_cost_tracking.py:

    • test_url_context_does_not_trigger_web_search_cost — url_context + url_citation annotation + no web_search_requests → no grounding surcharge.
    • test_url_citation_without_url_context_still_triggers_web_search_cost — regression guard for #15858: pure-annotation responses (Anthropic-style) are still correctly detected.
    • test_response_object_includes_url_context_call — covers both storage paths (top-level attribute and _hidden_params).
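
The helper described above can be sketched as follows. This is a minimal illustration, not the code merged in the PR: the function name `includes_url_context_call`, the `SimpleNamespace` stand-ins for ModelResponse, and the sample metadata shape are all assumptions made for the example. It only shows the idea of checking both storage paths (top-level attribute and _hidden_params).

```python
from types import SimpleNamespace
from typing import Any


def includes_url_context_call(response_object: Any) -> bool:
    """Hypothetical stand-in for response_object_includes_url_context_call()."""
    # Storage path 1: metadata attached as a top-level attribute.
    if getattr(response_object, "vertex_ai_url_context_metadata", None):
        return True
    # Storage path 2: metadata stored under _hidden_params (assumed to be a dict).
    hidden_params = getattr(response_object, "_hidden_params", None)
    if isinstance(hidden_params, dict):
        return bool(hidden_params.get("vertex_ai_url_context_metadata"))
    return False


# Illustrative metadata shape only; the real payload may differ.
sample_metadata = [{"urlRetrievalStatus": "URL_RETRIEVAL_STATUS_SUCCESS"}]

top_level = SimpleNamespace(vertex_ai_url_context_metadata=sample_metadata)
hidden = SimpleNamespace(_hidden_params={"vertex_ai_url_context_metadata": sample_metadata})
empty = SimpleNamespace(vertex_ai_url_context_metadata=None, _hidden_params={})

print(includes_url_context_call(top_level))  # True
print(includes_url_context_call(hidden))     # True
print(includes_url_context_call(empty))      # False
```

Checking truthiness rather than mere attribute presence means a field that exists but is None or empty is not treated as a url_context call.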

Behavior table

Scenario | url_citation annotation | vertex_ai_url_context_metadata | web_search_requests | Before | After
web_search only | yes | no | > 0 | True ✓ | True ✓
url_context only | yes | yes | None | True ❌ | False ✓
both | yes | yes | > 0 | True ✓ | True ✓ (via structured check)
neither | no | no | None | False ✓ | False ✓
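
The four scenarios can be checked against a reduced model of the detection logic. The sketch below is an assumption-laden simplification: it collapses the response object into three plain inputs, treats the structured check as "web_search_requests is truthy", and uses hypothetical function names `before_fix` and `after_fix`. It is meant only to make the Before/After columns concrete.

```python
from typing import Optional


def before_fix(has_url_citation: bool, has_url_context: bool,
               web_search_requests: Optional[int]) -> bool:
    # Old behavior: any url_citation annotation counts as a web_search call.
    if has_url_citation:
        return True
    return bool(web_search_requests)


def after_fix(has_url_citation: bool, has_url_context: bool,
              web_search_requests: Optional[int]) -> bool:
    # New behavior: skip the annotation shortcut when url_context metadata exists,
    # then fall through to the structured web_search_requests check.
    if has_url_citation and not has_url_context:
        return True
    return bool(web_search_requests)


# (scenario, url_citation present, url_context metadata present, web_search_requests)
scenarios = [
    ("web_search only", True, False, 1),
    ("url_context only", True, True, None),
    ("both", True, True, 1),
    ("neither", False, False, None),
]

for name, citation, url_context, requests in scenarios:
    print(name, before_fix(citation, url_context, requests),
          after_fix(citation, url_context, requests))
```

Only the url_context-only row changes between the two functions, which is the single cell the fix is intended to flip.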

Test plan

  • Added 3 unit tests (see test_tool_call_cost_tracking.py)
  • Full file: uv run pytest tests/test_litellm/litellm_core_utils/llm_cost_calc/test_tool_call_cost_tracking.py -v → 17/17 passing (3 new + 14 existing regression)
  • uv run black . — no changes required
  • Verified existing test_get_cost_for_vertex_ai_gemini_web_search continues to pass (no regression to #15858 fix)

Related

  • Depends on / complements #24369 — the hardcoded $0.035 rate is also incorrect for Gemini 3.x (should be $0.014 per current Gemini 3 pricing). That fix is being handled by PR #24448. This PR fixes the trigger (should not fire for url_context); #24448 fixes the rate (should be $0.014 not $0.035). Both are needed to correctly bill Gemini 3 grounded requests.
  • #15858 — earlier bug (grounding fee was missing entirely) which introduced the annotation-based shortcut this PR refines.

Changed files

  • litellm/litellm_core_utils/llm_cost_calc/tool_call_cost_tracking.py (modified, +31/-2)
  • tests/test_litellm/litellm_core_utils/llm_cost_calc/test_tool_call_cost_tracking.py (modified, +154/-0)

Check for existing issues

  • I have searched existing issues — #15858 and #24369 are related but do not cover this case.

What happened?

When using Gemini's url_context tool (distinct from Grounding with Google Search), LiteLLM incorrectly charges the $0.035 web-search grounding fee per request. Root cause: the detection logic in tool_call_cost_tracking.py treats any url_citation annotation in the response as evidence of a web_search call — but Gemini's url_context tool also emits url_citation annotations (for per-claim grounding against the fetched URL).

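Per Google's url_context documentation, the tool is "charged as input tokens per model pricing"; neither the Gemini API pricing page nor the Vertex AI pricing page lists a url_context surcharge.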

So url_context requests are currently being over-billed in LiteLLM's cost ledger. On a 6-URL test probe, LiteLLM reported a total of $0.211, whereas Google's actual billing for pure url_context calls should be a few cents in total (token cost only). This amounts to a systematic overcount, estimated at ~1000×, for url_context-heavy workloads.
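
The probe figures can be broken down with a few lines of arithmetic. This is a rough sketch that takes the reported total and the $0.035 fee at face value and assumes one surcharge per URL request; because the reported total is rounded, the implied token cost and the resulting ratio are only approximate.

```python
GROUNDING_FEE = 0.035   # per-request surcharge applied by the cost calculator
requests_made = 6       # one request per URL in the probe (assumption)
reported_total = 0.211  # total reported by LiteLLM for the probe

surcharge_total = requests_made * GROUNDING_FEE
implied_token_cost = reported_total - surcharge_total
overcount_ratio = reported_total / implied_token_cost

print(f"surcharge total:    ${surcharge_total:.3f}")
print(f"implied token cost: ${implied_token_cost:.3f}")
print(f"overcount ratio:    {overcount_ratio:.0f}x")
```

Almost the entire reported total is surcharge. The multiple depends heavily on how many tokens each request consumes, so workloads with smaller prompts would see a larger ratio than this probe.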

Reproduction

import httpx

# Any Gemini Vertex model alias exposing url_context via the proxy
payload = {
    "model": "gemini-3-flash-preview-vertex",
    "messages": [{
        "role": "user",
        "content": (
            "Use the url_context tool to fetch "
            "https://pmc.ncbi.nlm.nih.gov/articles/PMC10018306/ "
            "and extract: 1) title, 2) publication date, 3) journal, "
            "4) a 2-sentence summary."
        ),
    }],
    "tools": [{"url_context": {}}],  # note: NOT web_search / googleSearch
}

r = httpx.post(
    "http://localhost:4000/v1/chat/completions",
    headers={"Authorization": "Bearer sk-..."},
    json=payload,
    timeout=180.0,
)
print("litellm cost:", r.headers.get("x-litellm-response-cost"))
print("vertex_ai_url_context_metadata populated:",
      bool(r.json().get("vertex_ai_url_context_metadata")))
print("vertex_ai_grounding_metadata populated:",
      bool(r.json().get("vertex_ai_grounding_metadata")))

Observed:

  • x-litellm-response-cost: 0.0353 (token cost + $0.035 grounding surcharge)
  • vertex_ai_url_context_metadata populated with URL_RETRIEVAL_STATUS_SUCCESS — proves url_context fired
  • vertex_ai_grounding_metadata populated with groundingChunks / groundingSupports — these populate when the model grounds an answer, regardless of which tool sourced the grounding
  • No vertex_ai_grounding_metadata[].webSearchQueries field — meaning web_search was not called
  • message.annotations contains url_citation entries → this is what trips the false positive in the cost calculator
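
The observations above suggest a simple way to tell the two tools apart from the response body. The sketch below works on a plain dict such as the one returned by `r.json()` in the reproduction; the helper names and the sample body are hypothetical, and the field layout is assumed from the observations rather than taken from a schema.

```python
from typing import Any, Dict


def url_context_fired(body: Dict[str, Any]) -> bool:
    # url_context leaves non-empty metadata on the response.
    return bool(body.get("vertex_ai_url_context_metadata"))


def web_search_fired(body: Dict[str, Any]) -> bool:
    # web_search is assumed to show up as webSearchQueries inside grounding metadata.
    entries = body.get("vertex_ai_grounding_metadata") or []
    return any(isinstance(e, dict) and bool(e.get("webSearchQueries")) for e in entries)


# Hypothetical body mirroring the observed url_context-only response.
sample_body = {
    "vertex_ai_url_context_metadata": [{"status": "URL_RETRIEVAL_STATUS_SUCCESS"}],
    "vertex_ai_grounding_metadata": [{"groundingChunks": [{}], "groundingSupports": [{}]}],
}

print("url_context fired:", url_context_fired(sample_body))  # True
print("web_search fired:", web_search_fired(sample_body))    # False
```

Grounding metadata alone does not separate the two cases, since it is populated whenever the model grounds an answer; the presence of webSearchQueries is the distinguishing signal used here.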

Expected:

  • No grounding surcharge for url_context-only calls. Cost should equal prompt_tokens × input_rate + completion_tokens × output_rate per model tier.
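
As a worked form of the expected cost, the sketch below computes token cost plus an optional per-request grounding fee. The function name, the token counts, and the per-token rates are placeholders chosen for illustration and are not real Gemini prices.

```python
def expected_cost(prompt_tokens: int, completion_tokens: int,
                  input_rate: float, output_rate: float,
                  grounding_requests: int = 0, grounding_fee: float = 0.0) -> float:
    """Token cost plus an optional per-request grounding surcharge."""
    token_cost = prompt_tokens * input_rate + completion_tokens * output_rate
    return token_cost + grounding_requests * grounding_fee


# Placeholder rates (USD per token); substitute the model tier's actual pricing.
INPUT_RATE = 1e-6
OUTPUT_RATE = 2e-6

# url_context-only call: no grounding requests, so token cost only.
url_context_only = expected_cost(1000, 100, INPUT_RATE, OUTPUT_RATE)

# The same call wrongly treated as one web_search grounding request.
misbilled = expected_cost(1000, 100, INPUT_RATE, OUTPUT_RATE,
                          grounding_requests=1, grounding_fee=0.035)

print(f"expected:  ${url_context_only:.4f}")
print(f"misbilled: ${misbilled:.4f}")
```

For a url_context-only call the grounding term is zero, so the two results should differ by exactly the surcharge.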

Proposed fix

Use the vertex_ai_url_context_metadata field (already populated by LiteLLM's Gemini adapter at litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py:2475-2479) as a negative signal: if url_context metadata is present, the annotation-based shortcut should not fire; fall through to the structured web_search_requests check, which correctly distinguishes the two tools.

Diff sketch:

 if isinstance(response_object, ModelResponse):
-    # chat completions only include url_citation annotations when a web search call is made
     has_url_citations = (
         StandardBuiltInToolCostTracking.response_includes_annotation_type(
             response_object=response_object, annotation_type="url_citation"
         )
     )
-    if has_url_citations:
+    # Gemini's url_context tool also emits url_citation annotations, so annotation
+    # presence alone is not sufficient for Gemini/Vertex. Skip this shortcut when
+    # url_context metadata is present and rely on usage.prompt_tokens_details.web_search_requests
+    # below to detect actual web_search calls.
+    if has_url_citations and not StandardBuiltInToolCostTracking.response_object_includes_url_context_call(response_object):
         return True

…plus a small response_object_includes_url_context_call helper that reads vertex_ai_url_context_metadata off the response object (already attached by the Gemini adapter).

Will open a PR with the fix + regression tests shortly.

Related

  • #15858 — earlier bug that added the annotation-based detection in the first place (the fix introduced this false positive).
  • #24369 / PR #24448 — separate bug: hardcoded $0.035 rate should be $0.014 for Gemini 3.x per current pricing. These cover the rate; this issue covers the trigger.

What part of LiteLLM is this about?

Cost tracking / Vertex AI Gemini / url_context tool.

What LiteLLM version are you on?

main (reproduced on litellm/llms/vertex_ai/gemini/cost_calculator.py + tool_call_cost_tracking.py as of the latest commit on main at time of writing).

extent analysis

TL;DR

Update the tool_call_cost_tracking.py file to correctly distinguish between url_context and web_search calls by checking for the presence of vertex_ai_url_context_metadata.

Guidance

  • Modify the response_object_includes_web_search_call function so that the annotation shortcut returns True only when url_citation annotations are present and vertex_ai_url_context_metadata is absent; otherwise fall through to the structured web_search_requests check.
  • Add a new helper function response_object_includes_url_context_call to check for the presence of vertex_ai_url_context_metadata.
  • Update the cost calculation logic to consult the new helper before concluding that a web_search call was made.
  • Verify the fix by running the reproduction code and checking that the x-litellm-response-cost header no longer includes the $0.035 grounding surcharge for url_context-only calls.

Example

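def response_object_includes_url_context_call(response_object):
    # Check truthiness rather than hasattr(): hasattr() is also True when the
    # attribute exists but is None or empty.
    if getattr(response_object, "vertex_ai_url_context_metadata", None):
        return True
    # The PR notes mention _hidden_params as a second storage path (assumed dict).
    hidden_params = getattr(response_object, "_hidden_params", None) or {}
    return bool(hidden_params.get("vertex_ai_url_context_metadata"))

def response_object_includes_web_search_call(response_object):
    # StandardBuiltInToolCostTracking is defined in tool_call_cost_tracking.py.
    has_url_citations = StandardBuiltInToolCostTracking.response_includes_annotation_type(
        response_object=response_object, annotation_type="url_citation"
    )
    # Skip the annotation shortcut when url_context metadata is present.
    if has_url_citations and not response_object_includes_url_context_call(response_object):
        return True
    # ... rest of the function remains the same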

Notes

The proposed fix assumes that the presence of vertex_ai_url_context_metadata is a reliable indicator that the url_context tool was used. If this assumption is incorrect, additional logic may be needed to correctly distinguish between url_context and web_search calls.

Recommendation

Apply the proposed fix to update the cost tracking logic to correctly handle url_context calls. This fix should prevent the systematic overcounting of costs for url_context-heavy workloads.
