litellm: [Bug]: tool_choice silently dropped on cached Gemini requests (affects both gemini/ and vertex_ai/ providers)

litellm, 2026-05-27 23:02:15

Root Cause

check_and_create_cache correctly pops tools from optional_params (vertex_ai_context_caching.py:339) and bakes them into the CachedContent body (vertex_ai_context_caching.py:404). It does NOT do the equivalent for tool_choice. The value stays in optional_params, gets translated to toolConfig in _transform_request_body, and then the 1.84.0 #26077 guard at gemini/transformation.py:764-767 defensively omits toolConfig from the generate body (correctly — Vertex would 400 otherwise). Net effect: toolConfig ends up in neither the cache nor the generate request. This is the same architectural gap noted in #21969 for Bedrock — tools-related caching is incomplete on LiteLLM's cache-creation path. PR #26077 patches the wire-level symptom (the 400 collision) but not the underlying asymmetry.

Relevant code paths

litellm/llms/vertex_ai/context_caching/vertex_ai_context_caching.py:339 — pops tools from optional_params
litellm/llms/vertex_ai/context_caching/vertex_ai_context_caching.py:404 — bakes tools into cached_content_request_body
No equivalent two lines exist for tool_choice / toolConfig
litellm/llms/vertex_ai/gemini/transformation.py:764-767 — the #26077 guard that defensively drops toolConfig from the generate body when cached_content is set
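
The flow described above can be modelled in a few lines. This is a minimal simulation, not LiteLLM's code: `create_cache_body` and `build_generate_body` are hypothetical stand-ins for the cache-creation step and the generate-side guard, and the dictionary shapes are simplified assumptions.

```python
from typing import Any, Dict, Optional


def create_cache_body(optional_params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for cache creation: only `tools` is moved into the cache."""
    body: Dict[str, Any] = {}
    tools = optional_params.pop("tools", None)
    if tools is not None:
        body["tools"] = tools
    # Nothing pops "tool_choice" here, which mirrors the gap described above.
    return body


def build_generate_body(
    optional_params: Dict[str, Any], cached_content: Optional[str]
) -> Dict[str, Any]:
    """Hypothetical stand-in for the generate request, including the guard."""
    body: Dict[str, Any] = {}
    tool_config = optional_params.get("tool_choice")
    if cached_content is None:
        # Uncached path: tool settings travel in the generate body.
        if "tools" in optional_params:
            body["tools"] = optional_params["tools"]
        if tool_config is not None:
            body["toolConfig"] = tool_config
    else:
        # Guarded path: toolConfig is omitted when a cache reference is present.
        body["cachedContent"] = cached_content
    return body


params = {
    "tools": [{"name": "log_test_event"}],
    "tool_choice": {"functionCallingConfig": {"mode": "ANY"}},
}
cache_body = create_cache_body(params)
generate_body = build_generate_body(params, cached_content="cachedContents/example")

# toolConfig appears in neither payload, so the provider never sees it.
print("toolConfig" in cache_body, "toolConfig" in generate_body)
```

In this model the value is still sitting in `params` after both steps, which matches the description of `tool_choice` staying in `optional_params` without ever reaching the wire.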

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When a Gemini request includes both:

- cache_control: {"type": "ephemeral"} markers on system content blocks (triggering LiteLLM's Context Cache integration), AND
- An explicit tool_choice value (e.g. "auto", "required", or a function-pin dict)

…the user-specified tool_choice is silently dropped on every cached request. The request reaching Gemini is byte-for-byte identical to one sent without any tool_choice field.

Empirical confirmation: a three-condition comparison against gemini/gemini-2.5-flash on LiteLLM 1.84.0 with litellm.modify_params = True. The setup uses a long system prompt to clear the 1024-token cache minimum, a single function tool the model would not naturally invoke for a greeting, and tool_choice="required" to force a tool call when set:

| Condition | `cache_control` | `tool_choice` | `tool_calls` returned | `usage.cached_tokens` |
| --- | --- | --- | --- | --- |
| C (control) | off | `"required"` | 1 (`log_test_event(...)`) ✓ | none |
| B | on | unset | 0 (text: `"I am functioning correctly."`) | 6656 (cache hit) |
| A (bug) | on | `"required"` | 0 (text: `"I am functioning correctly."`) ✗ | 6656 (cache hit) |

A vs C is the smoking gun: identical tool_choice="required", but adding the cache_control marker silently drops the value. A and B return identical content, proving the request reaching Gemini is the same in both cases — tool_choice was never on the wire.
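
The comparison logic can be written down as a small check. This is only a sketch of the reasoning above: the `Observation` class and `tool_choice_dropped` function are hypothetical names, and the values are copied from the three conditions reported in this issue.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    cache_control: bool
    tool_choice: Optional[str]
    tool_calls: int
    content: str


def tool_choice_dropped(
    control: Observation, cached_unset: Observation, cached_set: Observation
) -> bool:
    """True when tool_choice works uncached but has no visible effect when cached."""
    honoured_without_cache = control.tool_calls > 0
    ignored_with_cache = cached_set.tool_calls == 0
    # Cached-with-choice is indistinguishable from cached-without-choice.
    same_as_unset = (cached_set.tool_calls, cached_set.content) == (
        cached_unset.tool_calls,
        cached_unset.content,
    )
    return honoured_without_cache and ignored_with_cache and same_as_unset


c = Observation(False, "required", 1, "")
b = Observation(True, None, 0, "I am functioning correctly.")
a = Observation(True, "required", 0, "I am functioning correctly.")

dropped = tool_choice_dropped(control=c, cached_unset=b, cached_set=a)
print(dropped)
```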

Steps to Reproduce

Save as repro.py, then run with GEMINI_API_KEY=... python repro.py:

import litellm

litellm.modify_params = True   # engage the #26077 guard

MODEL = "gemini/gemini-2.5-flash"

# Long system prompt so the cached prefix clears the context-cache token minimum.
LONG_SYSTEM = (
    "You are a deterministic test fixture. Always respond with one short sentence. "
    "Never volunteer tool calls unless explicitly asked. "
) * 200

# A tool the model would not naturally call in response to a greeting.
TOOL = {
    "type": "function",
    "function": {
        "name": "log_test_event",
        "description": "Internal logging tool.",
        "parameters": {
            "type": "object",
            "properties": {"event_name": {"type": "string"}},
            "required": ["event_name"],
        },
    },
}


def run(label, cache_control, tool_choice):
    """Send one request and print how many tool calls came back."""
    sys_block = {"type": "text", "text": LONG_SYSTEM}
    if cache_control:
        # This marker is what routes the request through context caching.
        sys_block["cache_control"] = {"type": "ephemeral"}
    kwargs = {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": [sys_block]},
            {"role": "user", "content": "Hi, how are you?"},
        ],
        "tools": [TOOL],
    }
    if tool_choice:
        kwargs["tool_choice"] = tool_choice
    resp = litellm.completion(**kwargs)
    msg = resp.choices[0].message
    print(f"{label}: tool_calls={len(msg.tool_calls or [])}, content={(msg.content or '')[:60]!r}")


# Control first, then the two cached conditions.
run("C (no cache, required)", cache_control=False, tool_choice="required")
run("B (cache, no choice)",    cache_control=True,  tool_choice=None)
run("A (cache, required)",     cache_control=True,  tool_choice="required")

Output on affected versions (condition A shows the bug):

C (no cache, required): tool_calls=1, content=''
B (cache, no choice):   tool_calls=0, content='I am functioning correctly.'
A (cache, required):    tool_calls=0, content='I am functioning correctly.'   ← BUG: tool_choice dropped

To prove toolConfig survives without caching, replace tool_choice="required" with tool_choice={"type": "function", "function": {"name": "log_test_event"}} in condition C — Gemini will still call the pinned function. Re-run condition A with the same dict — Gemini ignores the pin and responds in text. Identical bug, different tool_choice shape, same evidence.
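
For readers unfamiliar with the two shapes involved, the sketch below shows roughly how an OpenAI-style tool_choice corresponds to a Gemini toolConfig. The helper `to_gemini_tool_config` is hypothetical and is not LiteLLM's translation code; the `functionCallingConfig`, `mode`, and `allowedFunctionNames` field names follow the Gemini REST API, and the string-to-mode mapping is an assumption about the intended semantics.

```python
from typing import Any, Dict, Union

ToolChoice = Union[str, Dict[str, Any]]


def to_gemini_tool_config(tool_choice: ToolChoice) -> Dict[str, Any]:
    """Hypothetical mapping from an OpenAI-style tool_choice to a toolConfig dict."""
    modes = {"auto": "AUTO", "required": "ANY", "none": "NONE"}
    if isinstance(tool_choice, str):
        if tool_choice not in modes:
            raise ValueError(f"unsupported tool_choice: {tool_choice!r}")
        return {"functionCallingConfig": {"mode": modes[tool_choice]}}
    # Function-pin dict: force a call and restrict it to the named function.
    name = tool_choice["function"]["name"]
    return {
        "functionCallingConfig": {"mode": "ANY", "allowedFunctionNames": [name]}
    }


required_cfg = to_gemini_tool_config("required")
pinned_cfg = to_gemini_tool_config(
    {"type": "function", "function": {"name": "log_test_event"}}
)
print(required_cfg)
print(pinned_cfg)
```

Both shapes end up under the same toolConfig key, which is why the string form and the function-pin form are affected identically.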

Suggested fix

Mirror the existing tools handling at the cache-creation step (~5 lines):

# vertex_ai_context_caching.py — three additions:
tools = optional_params.pop("tools", None)
tool_choice = optional_params.pop("tool_choice", None)               # NEW
generated_cache_key = local_cache_obj.get_cache_key(
    messages=cached_messages, tools=tools,
    tool_choice=tool_choice, model=model,                            # NEW: in key
)
cached_content_request_body["tools"] = tools
if tool_choice is not None:                                          # NEW
    cached_content_request_body["toolConfig"] = tool_choice

This puts toolConfig inside the CachedContent body at creation, so cache hits inherit it. The #26077 generate-side guard then continues to work as designed (omits these fields from the follow-up generate body, because they're already in the cache). Cache-key impact: different tool_choice values produce different cache entries, which is desired — pinning "required" vs "auto" is a semantically different cached prefix and shouldn't share storage.
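
The cache-key effect can be illustrated in isolation. The sketch below does not use LiteLLM's `get_cache_key`; `cache_key` and `build_cached_content_body` are hypothetical helpers, and hashing a JSON dump is just one assumed way to derive a key.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional


def cache_key(
    messages: List[Dict[str, Any]],
    tools: Optional[List[Dict[str, Any]]],
    tool_choice: Optional[Dict[str, Any]],
    model: str,
) -> str:
    """Hypothetical key: a stable hash over everything baked into the cache."""
    payload = json.dumps(
        {"messages": messages, "tools": tools, "tool_choice": tool_choice, "model": model},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_cached_content_body(
    messages: List[Dict[str, Any]],
    tools: Optional[List[Dict[str, Any]]],
    tool_config: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Hypothetical cache body that carries toolConfig alongside tools."""
    body: Dict[str, Any] = {"contents": messages, "tools": tools}
    if tool_config is not None:
        body["toolConfig"] = tool_config
    return body


messages = [{"role": "system", "content": "long cached prefix"}]
tools = [{"name": "log_test_event"}]
required = {"functionCallingConfig": {"mode": "ANY"}}
auto = {"functionCallingConfig": {"mode": "AUTO"}}

key_required = cache_key(messages, tools, required, "gemini-2.5-flash")
key_auto = cache_key(messages, tools, auto, "gemini-2.5-flash")
body = build_cached_content_body(messages, tools, required)

# Different tool_choice values map to different cache entries.
print(key_required != key_auto, "toolConfig" in body)
```

One consequence of keying on tool_choice is that callers who alternate between values for the same prefix would create one cache entry per value, which is the trade-off the issue describes as desired.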

Environment

LiteLLM Version:  1.84.0 (also reproduces on 1.85.0, 1.86.0)
Python Version:   3.11
Providers:        gemini/  AND  vertex_ai/  (shared caching code path)
Model tested:     gemini/gemini-2.5-flash   (also: gemini-2.5-pro,
                  vertex_ai/gemini-3-flash-preview, vertex_ai/gemini-3.1-flash-lite-preview)
modify_params:    True

Relevant log output

What part of LiteLLM is this about?

SDK (litellm Python package)

What LiteLLM version are you on ?

1.84.0

Twitter / LinkedIn details

No response
