litellm - 💡(How to fix) Fix [Bug]: tool_choice silently dropped on cached Gemini requests — affects both gemini/ and vertex_ai/ providers


Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When a Gemini request includes both:

  1. cache_control: {"type": "ephemeral"} markers on system content blocks (triggering LiteLLM's Context Cache integration), AND
  2. An explicit tool_choice value (e.g. "auto", "required", or a function-pin dict)

…the user-specified tool_choice is silently dropped on every cached request. The request reaching Gemini is byte-for-byte identical to one sent without any tool_choice field.

Empirical confirmation: a three-condition comparison against gemini/gemini-2.5-flash on LiteLLM 1.84.0 with litellm.modify_params = True. The setup uses a long system prompt to clear the 1024-token cache minimum, a single function tool the model would not naturally invoke for a greeting, and tool_choice="required" to force a tool call when set:

| Condition   | cache_control | tool_choice | tool_calls returned                        | usage.cached_tokens |
|-------------|---------------|-------------|--------------------------------------------|---------------------|
| C (control) | off           | "required"  | 1 (log_test_event(...)) ✓                  | none                |
| B           | on            | unset       | 0 (text: "I am functioning correctly.")    | 6656 (cache hit)    |
| A (bug)     | on            | "required"  | 0 (text: "I am functioning correctly.") ✗  | 6656 (cache hit)    |

A vs C is the smoking gun: identical tool_choice="required", but adding the cache_control marker silently drops the value. A and B return identical content, proving the request reaching Gemini is the same in both cases — tool_choice was never on the wire.

Root cause

check_and_create_cache correctly pops tools from optional_params (vertex_ai_context_caching.py:339) and bakes them into the CachedContent body (vertex_ai_context_caching.py:404). It does NOT do the equivalent for tool_choice. The value stays in optional_params, gets translated to toolConfig in _transform_request_body, and then the 1.84.0 #26077 guard at gemini/transformation.py:764-767 defensively omits toolConfig from the generate body (correctly — Vertex would 400 otherwise). Net effect: toolConfig ends up in neither the cache nor the generate request. This is the same architectural gap noted in #21969 for Bedrock — tools-related caching is incomplete on LiteLLM's cache-creation path. PR #26077 patches the wire-level symptom (the 400 collision) but not the underlying asymmetry.

Relevant code paths

  • litellm/llms/vertex_ai/context_caching/vertex_ai_context_caching.py:339 — pops tools from optional_params
  • litellm/llms/vertex_ai/context_caching/vertex_ai_context_caching.py:404 — bakes tools into cached_content_request_body
  • No equivalent two lines exist for tool_choice / toolConfig
  • litellm/llms/vertex_ai/gemini/transformation.py:764-767 — the #26077 guard that defensively drops toolConfig from the generate body when cached_content is set
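
The flow described above can be traced with a small toy model. This is only a sketch of the described behaviour, not LiteLLM's code: the function names create_cache_body, build_generate_body and trace, the handle_tool_choice flag, and the "cachedContents/example" identifier are all hypothetical stand-ins chosen for illustration.

```python
def create_cache_body(optional_params, handle_tool_choice=False):
    """Toy stand-in for the cache-creation step (hypothetical, not LiteLLM code)."""
    body = {}
    # Mirrors the described behaviour: tools are popped and stored in the cache body.
    tools = optional_params.pop("tools", None)
    if tools is not None:
        body["tools"] = tools
    # The reported gap: nothing equivalent happens for tool_choice unless enabled here.
    if handle_tool_choice:
        tool_choice = optional_params.pop("tool_choice", None)
        if tool_choice is not None:
            body["toolConfig"] = tool_choice
    return body


def build_generate_body(optional_params, cached_content):
    """Toy stand-in for the generate-body step with a #26077-style guard."""
    body = {"cachedContent": cached_content} if cached_content else {}
    tool_choice = optional_params.get("tool_choice")
    # Guard: when a cache is referenced, toolConfig is left out of the generate body.
    if tool_choice is not None and not cached_content:
        body["toolConfig"] = tool_choice
    return body


def trace(handle_tool_choice):
    """Run both steps on one request and return (cache_body, generate_body)."""
    params = {
        "tools": [{"name": "log_test_event"}],
        "tool_choice": {"functionCallingConfig": {"mode": "ANY"}},
    }
    cache_body = create_cache_body(params, handle_tool_choice)
    generate_body = build_generate_body(params, "cachedContents/example")
    return cache_body, generate_body


for flag in (False, True):
    cache_body, generate_body = trace(flag)
    print(flag, "toolConfig" in cache_body, "toolConfig" in generate_body)
```

With handle_tool_choice=False the setting appears in neither body, which matches the reported symptom. With handle_tool_choice=True it lands in the cache body only, which is the outcome the suggested fix aims for.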

Steps to Reproduce

Save as repro.py, then run with GEMINI_API_KEY=... python repro.py:

import litellm
litellm.modify_params = True   # engage the #26077 guard
MODEL = "gemini/gemini-2.5-flash"
LONG_SYSTEM = (
    "You are a deterministic test fixture. Always respond with one short sentence. "
    "Never volunteer tool calls unless explicitly asked. " * 200
)
TOOL = {
    "type": "function",
    "function": {
        "name": "log_test_event",
        "description": "Internal logging tool.",
        "parameters": {
            "type": "object",
            "properties": {"event_name": {"type": "string"}},
            "required": ["event_name"],
        },
    },
}

def run(label, cache_control, tool_choice):
    """Send one request and print how many tool calls came back."""
    sys_block = {"type": "text", "text": LONG_SYSTEM}
    if cache_control:
        # The ephemeral marker routes the request through LiteLLM's context-cache path.
        sys_block["cache_control"] = {"type": "ephemeral"}
    kwargs = {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": [sys_block]},
            {"role": "user", "content": "Hi, how are you?"},
        ],
        "tools": [TOOL],
    }
    if tool_choice:
        kwargs["tool_choice"] = tool_choice
    resp = litellm.completion(**kwargs)
    msg = resp.choices[0].message
    print(f"{label}: tool_calls={len(msg.tool_calls or [])}, content={(msg.content or '')[:60]!r}")


# C is the control (no cache), B is the cached baseline, A is the failing case.
run("C (no cache, required)", cache_control=False, tool_choice="required")
run("B (cache, no choice)",    cache_control=True,  tool_choice=None)
run("A (cache, required)",     cache_control=True,  tool_choice="required")

Observed output (condition A shows the bug):

C (no cache, required): tool_calls=1, content=''
B (cache, no choice):   tool_calls=0, content='I am functioning correctly.'
A (cache, required):    tool_calls=0, content='I am functioning correctly.'   ← BUG: tool_choice dropped

To prove toolConfig survives without caching, replace tool_choice="required" with tool_choice={"type": "function", "function": {"name": "log_test_event"}} in condition C — Gemini will still call the pinned function. Re-run condition A with the same dict — Gemini ignores the pin and responds in text. Identical bug, different tool_choice shape, same evidence.
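
For readers unfamiliar with the two shapes involved, here is a minimal sketch of how an OpenAI-style tool_choice might correspond to a Gemini toolConfig. It is an illustrative mapping, not LiteLLM's actual translation code: the helper name to_gemini_tool_config is hypothetical, and the field names (functionCallingConfig, mode, allowedFunctionNames) are taken from the Gemini REST API as I understand it.

```python
def to_gemini_tool_config(tool_choice):
    """Illustrative mapping from an OpenAI-style tool_choice to a Gemini-style toolConfig."""
    if tool_choice is None:
        return None
    if isinstance(tool_choice, str):
        # String forms map onto a function-calling mode.
        modes = {"auto": "AUTO", "required": "ANY", "none": "NONE"}
        if tool_choice not in modes:
            raise ValueError(f"unsupported tool_choice: {tool_choice!r}")
        return {"functionCallingConfig": {"mode": modes[tool_choice]}}
    if isinstance(tool_choice, dict) and tool_choice.get("type") == "function":
        # A function pin forces a call and restricts it to the named function.
        name = tool_choice["function"]["name"]
        return {
            "functionCallingConfig": {
                "mode": "ANY",
                "allowedFunctionNames": [name],
            }
        }
    raise ValueError(f"unsupported tool_choice: {tool_choice!r}")


print(to_gemini_tool_config("required"))
print(to_gemini_tool_config({"type": "function", "function": {"name": "log_test_event"}}))
```

Both the "required" string and the function-pin dict end up as a toolConfig entry, which is why both shapes are affected when that entry is dropped on the cached path.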

Suggested fix

Mirror the existing tools handling at the cache-creation step (~5 lines):

# vertex_ai_context_caching.py — three additions:
tools = optional_params.pop("tools", None)
tool_choice = optional_params.pop("tool_choice", None)               # NEW
generated_cache_key = local_cache_obj.get_cache_key(
    messages=cached_messages, tools=tools,
    tool_choice=tool_choice, model=model,                            # NEW: in key
)
cached_content_request_body["tools"] = tools
if tool_choice is not None:                                          # NEW
    cached_content_request_body["toolConfig"] = tool_choice

This puts toolConfig inside the CachedContent body at creation, so cache hits inherit it. The #26077 generate-side guard then continues to work as designed (omits these fields from the follow-up generate body, because they're already in the cache). Cache-key impact: different tool_choice values produce different cache entries, which is desired — pinning "required" vs "auto" is a semantically different cached prefix and shouldn't share storage.
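
The cache-key point can be illustrated with a small stand-alone sketch. The make_cache_key function below is hypothetical and hashes a JSON dump of its inputs; it is not the real get_cache_key implementation, whose internals are not shown in this issue.

```python
import hashlib
import json


def make_cache_key(messages, tools, tool_choice, model):
    """Hypothetical cache key: a stable hash over everything baked into the cached prefix."""
    payload = json.dumps(
        {
            "messages": messages,
            "tools": tools,
            "tool_choice": tool_choice,
            "model": model,
        },
        sort_keys=True,  # stable ordering so equal inputs always hash the same
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


messages = [{"role": "system", "content": "long cached prefix"}]
tools = [{"type": "function", "function": {"name": "log_test_event"}}]
model = "gemini/gemini-2.5-flash"

key_required = make_cache_key(messages, tools, "required", model)
key_auto = make_cache_key(messages, tools, "auto", model)

# Different tool_choice values yield different keys, so they do not share a cache entry.
print(key_required != key_auto)
```

Under this scheme a request with tool_choice="required" never reuses a cache entry created for "auto", at the cost of one extra cache entry per distinct tool_choice value.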

Environment

LiteLLM Version:  1.84.0 (also reproduces on 1.85.0, 1.86.0)
Python Version:   3.11
Providers:        gemini/  AND  vertex_ai/  (shared caching code path)
Model tested:     gemini/gemini-2.5-flash   (also: gemini-2.5-pro,
                  vertex_ai/gemini-3-flash-preview, vertex_ai/gemini-3.1-flash-lite-preview)
modify_params:    True

Relevant log output

What part of LiteLLM is this about?

SDK (litellm Python package)

What LiteLLM version are you on ?

1.84.0

Twitter / LinkedIn details

No response
