litellm - ✅(Solved) Fix [Bug]: Vertex AI sends cached_content + system_instruction/tools/toolConfig in same request → 400 INVALID_ARGUMENT (refile of #17304) [1 pull request, 1 participant]

GitHub stats
BerriAI/litellm#26014 · Fetched 2026-04-19 15:06:18
Comments: 0 · Participants: 1 · Timeline events: 1 (labeled ×1) · Reactions: 0

Error Message

Vertex_aiException BadRequestError - { "error": { "code": 400, "message": "Tool config, tools and system instruction should not be set in the request when using cached content.", "status": "INVALID_ARGUMENT" } }

Root Cause

This bug was previously reported in #17304 (2025-12-01, v1.80.7), which also proposed the correct fix ("move those values to CachedContent from GenerateContent request"). That issue was auto-closed as not_planned on 2026-03-10 by the stale bot without any maintainer engagement. A community PR (#23986, 2026-03-18) attempted the fix but was closed because it was over-scoped and introduced a TypeError. The correct 3-line change has never landed.

PR fix notes

PR #26077: fix(vertex_ai): omit system_instruction/tools/toolConfig when cachedContent set

Description (problem / solution / changelog)

Summary

Vertex generateContent rejects cachedContent in the same request as system_instruction, tools, or toolConfig (400 INVALID_ARGUMENT). Those fields must only appear on CachedContent creation, not on the follow-up generate call.

Changes

  • Guard those fields (and server-side tool invocation toolConfig mutation) behind cached_content is None in _transform_request_body.
  • Add regression test test_cached_content_omits_system_instruction_tools_toolconfig.
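
The guard and its regression test can be sketched in isolation. This is a minimal sketch, not the code from PR #26077: `build_request_body` is a hypothetical stand-in for the relevant part of `_transform_request_body`, plain dicts replace LiteLLM's `RequestBody` type, and the cache name is a made-up placeholder.

```python
from typing import Optional


def build_request_body(
    contents: list,
    system_instructions: Optional[dict] = None,
    tools: Optional[list] = None,
    tool_choice: Optional[dict] = None,
    cached_content: Optional[str] = None,
) -> dict:
    """Hypothetical stand-in for the guarded part of _transform_request_body."""
    data: dict = {"contents": contents}
    if cached_content is None:
        # No cache referenced: the three fields may be sent as usual.
        if system_instructions is not None:
            data["system_instruction"] = system_instructions
        if tools is not None:
            data["tools"] = tools
        if tool_choice is not None:
            data["toolConfig"] = tool_choice
    else:
        # Cache referenced: Vertex rejects the three fields, so only the
        # cache reference is added.
        data["cachedContent"] = cached_content
    return data


def test_cached_content_omits_conflicting_fields() -> None:
    body = build_request_body(
        contents=[{"role": "user", "parts": [{"text": "Summarize."}]}],
        system_instructions={"parts": [{"text": "AVAILABLE SKILLS"}]},
        tools=[{"function_declarations": []}],
        tool_choice={"functionCallingConfig": {"mode": "AUTO"}},
        cached_content="projects/p/locations/l/cachedContents/123",  # placeholder
    )
    assert body["cachedContent"] == "projects/p/locations/l/cachedContents/123"
    for key in ("system_instruction", "tools", "toolConfig"):
        assert key not in body


test_cached_content_omits_conflicting_fields()
```

Nesting the three checks under a single `cached_content is None` branch is equivalent to adding the condition to each `if` separately; the actual change in the PR may be laid out differently.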

Related

Fixes #26014 (re-file of #17304).

<img width="1136" height="686" alt="image" src="https://github.com/user-attachments/assets/6f3b11d7-0528-4dae-8c38-653066c6cab7" />

Changed files

  • litellm/llms/vertex_ai/gemini/transformation.py (modified, +5/-4)
  • tests/test_litellm/llms/vertex_ai/gemini/test_vertex_ai_gemini_transformation.py (modified, +47/-0)


Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

This bug was previously reported in #17304 (2025-12-01, v1.80.7), which also proposed the correct fix ("move those values to CachedContent from GenerateContent request"). That issue was auto-closed as not_planned on 2026-03-10 by the stale bot without any maintainer engagement. A community PR (#23986, 2026-03-18) attempted the fix but was closed because it was over-scoped and introduced a TypeError. The correct 3-line change has never landed.

I'm re-filing because the bug is still present on the latest release (1.83.9) and is actively breaking production traffic for us on vertex_ai/gemini-3.1-pro-preview.

What happened?

When an OpenAI-format request contains cache_control: {"type": "ephemeral"} markers on system messages (Anthropic prompt-caching convention), LiteLLM's Vertex AI path translates those into Vertex's Context Cache API. But the generateContent request body is then assembled with both cachedContent AND system_instruction (+ tools / toolConfig), which Vertex explicitly rejects:

Vertex_aiException BadRequestError - {
  "error": {
    "code": 400,
    "message": "Tool config, tools and system instruction should not be set in the request when using cached content.",
    "status": "INVALID_ARGUMENT"
  }
}

Root cause (two interacting issues)

(1) separate_cached_messages only keeps the first continuous block of cache-marked messages. In litellm/llms/vertex_ai/context_caching/transformation.py:

last_continuous_block_idx = get_first_continuous_block_idx(filtered_messages)
if filtered_messages and last_continuous_block_idx is not None:
    first_cached_idx = filtered_messages[0][0]
    last_cached_idx = filtered_messages[last_continuous_block_idx][0]
    cached_messages = messages[first_cached_idx : last_cached_idx + 1]
    non_cached_messages = (
        messages[:first_cached_idx] + messages[last_cached_idx + 1 :]
    )

Any role=system message that has cache_control but sits after an uncached message gets dropped into non_cached_messages. Those leftover system messages then get turned into Vertex's system_instruction field. This is common in real-world agent frameworks that interleave cached and uncached system blocks (e.g. RAG contexts, skill catalogs, tool definitions).
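
The effect of keeping only the first continuous block can be reproduced without LiteLLM. The sketch below is a simplified re-implementation for illustration, not the library's code: `has_cache_control` and `split_first_cached_block` are hypothetical names, and the real `separate_cached_messages` may differ in detail.

```python
from typing import List, Tuple


def has_cache_control(message: dict) -> bool:
    """Simplified check: any content part carrying a cache_control marker."""
    content = message.get("content")
    if not isinstance(content, list):
        return False
    return any(isinstance(p, dict) and "cache_control" in p for p in content)


def split_first_cached_block(messages: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Keep only the first run of adjacent cache-marked messages as cached."""
    marked = [i for i, m in enumerate(messages) if has_cache_control(m)]
    if not marked:
        return [], list(messages)
    first = last = marked[0]
    for idx in marked[1:]:
        if idx != last + 1:  # gap found: the continuous block ends here
            break
        last = idx
    cached = messages[first : last + 1]
    non_cached = messages[:first] + messages[last + 1 :]
    return cached, non_cached


def cached(text: str) -> dict:
    part = {"type": "text", "text": text, "cache_control": {"type": "ephemeral"}}
    return {"role": "system", "content": [part]}


messages = [
    cached("AVAILABLE DATA CONTEXT"),
    {"role": "system", "content": "AVAILABLE SKILLS"},  # breaks the block
    cached("TOOL DEFINITIONS"),                         # marked, but left uncached
    {"role": "user", "content": "Summarize."},
]
cached_msgs, non_cached_msgs = split_first_cached_block(messages)
print(len(cached_msgs), len(non_cached_msgs))  # 1 3
```

In this sketch both remaining system messages, including the second cache-marked one, end up in the uncached list, which is the situation that later produces a `system_instruction` next to `cachedContent`.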

(2) The generateContent body builder unconditionally adds system_instruction / tools / toolConfig even when cachedContent is set. In litellm/llms/vertex_ai/gemini/transformation.py (lines 749–765 in the release inspected):

data = RequestBody(contents=content)
if system_instructions is not None:
    data["system_instruction"] = system_instructions
if tools is not None:
    data["tools"] = tools
if tool_choice is not None:
    data["toolConfig"] = tool_choice
...
if cached_content is not None:
    data["cachedContent"] = cached_content   # <-- collision with the three above

Per Vertex AI docs, cachedContent is mutually exclusive with system_instruction, tools, and toolConfig in the same generateContent call — those must be baked into the cache at creation time, not re-sent.

Steps to Reproduce

Minimal reproduction using the SDK (works against the proxy too). The payload mimics what an agent loop produces: a large system block with cache_control, followed by a plain system block (the thing that breaks it), followed by another cache_control block.

import asyncio
import litellm

litellm.use_litellm_proxy = True  # optional, if routing through your proxy
MODEL = "vertex_ai/gemini-3.1-pro-preview"  # reproduces on gemini-2.5-pro etc. too

BIG = "AVAILABLE DATA CONTEXT\n" + ("glossary_entry_token " * 8000)   # > 32K tokens

def cached(text):
    return {
        "role": "system",
        "content": [
            {"type": "text", "text": text, "cache_control": {"type": "ephemeral"}}
        ],
    }

messages = [
    cached(BIG),                                                         # cached
    {"role": "system", "content": "AVAILABLE SKILLS\n- a\n- b"},         # breaks the block
    cached("TOOL DEFINITIONS\n" + ("tool_def_token " * 8000)),           # NOT cached anymore
    {"role": "user", "content": "Summarize."},
]

tools = [{"type": "function", "function": {
    "name": "get_weather",
    "description": "Get weather",
    "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]},
    "cache_control": {"type": "ephemeral"},
}}]

async def main():
    try:
        resp = await litellm.acompletion(model=MODEL, messages=messages, tools=tools)
        print(resp.choices[0].message.content[:80])
    except Exception as e:
        print(type(e).__name__, str(e)[:600])

asyncio.run(main())

Observed:

BadRequestError litellm.BadRequestError: ... Vertex_aiException BadRequestError - {
  "error": { "code": 400,
    "message": "Tool config, tools and system instruction should not be set in the request when using cached content.",
    "status": "INVALID_ARGUMENT" } }

Confirmed on 1.81.8 (SDK), 1.81.0 / 1.83.3 / 1.83.9 (proxy source inspection — byte-identical generateContent body builder across all three).

Suggested fix (3 lines)

Guard the three conflicting fields in litellm/llms/vertex_ai/gemini/transformation.py so they are only added when cached_content is None. This was exactly the fix in the core diff of #23986 (which should have been merged with only this change):

 data = RequestBody(contents=content)
-if system_instructions is not None:
+if system_instructions is not None and cached_content is None:
     data["system_instruction"] = system_instructions
-if tools is not None:
+if tools is not None and cached_content is None:
     data["tools"] = tools
-if tool_choice is not None:
+if tool_choice is not None and cached_content is None:
     data["toolConfig"] = tool_choice
 ...
 if cached_content is not None:
     data["cachedContent"] = cached_content

This is the simplest correct fix — it matches Vertex AI's documented API contract (those three fields live in the cache, not in the call) and preserves all existing behavior when cached_content is not set.

Optional follow-on fix for the separator issue (root cause (1) above): either include all cache-marked messages in cached_messages (not just the first continuous block), or raise a clear error when the caller mixes cache_control patterns in a way Vertex can't represent. The 3-line guard above is sufficient to stop the 400; without the follow-on fix the separator logic is not broken, only less useful, because fewer messages get cached.
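
Both follow-on options can be sketched in one helper. This is a hedged illustration under assumptions, not a proposed patch: `separate_all_cached` and its `strict` flag are hypothetical, and `has_cache_control` is a simplified marker check rather than LiteLLM's own.

```python
from typing import List, Tuple


def has_cache_control(message: dict) -> bool:
    """Simplified check: any content part carrying a cache_control marker."""
    content = message.get("content")
    if not isinstance(content, list):
        return False
    return any(isinstance(p, dict) and "cache_control" in p for p in content)


def separate_all_cached(
    messages: List[dict], strict: bool = False
) -> Tuple[List[dict], List[dict]]:
    """Collect every cache-marked message, or fail loudly in strict mode."""
    marked = [i for i, m in enumerate(messages) if has_cache_control(m)]
    # The marked indices are contiguous if they form one unbroken range.
    contiguous = not marked or marked == list(
        range(marked[0], marked[0] + len(marked))
    )
    if strict and not contiguous:
        raise ValueError(
            "cache_control markers are interleaved with uncached messages; "
            "group the cached messages into one continuous block"
        )
    marked_set = set(marked)
    cached = [messages[i] for i in marked]
    non_cached = [m for i, m in enumerate(messages) if i not in marked_set]
    return cached, non_cached
```

One caveat on the design choice: the non-strict mode moves all marked messages into the cache, which changes their order relative to interleaved uncached messages, so the strict mode may be the safer default for callers that depend on message order.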

Relevant log output

litellm.BadRequestError: Error code: 400 - ... Vertex_aiException BadRequestError - {
  "error": {
    "code": 400,
    "message": "Tool config, tools and system instruction should not be set in the request when using cached content.",
    "status": "INVALID_ARGUMENT"
  }
}

What part of LiteLLM is this about?

Both SDK and Proxy (shared code path in llms/vertex_ai/).

What LiteLLM version are you on?

Reproduced on 1.81.8 (SDK), confirmed unchanged source in 1.81.0 / 1.83.3 / 1.83.9.

Related

  • #17304 — original bug report, closed as not_planned by stale bot without maintainer reply
  • #23986 — community PR with the correct 3-line fix, closed because the PR also contained unrelated deletions

Twitter / LinkedIn details

No response

extent analysis

TL;DR

The fix is to guard the conflicting fields in the generateContent body builder so that they are added only when cached_content is None, as in the suggested 3-line fix. PR #26077 implements this guard in _transform_request_body and adds a regression test.

Guidance

  • Verify that the issue is indeed caused by the conflicting fields in the generateContent body builder by checking the request payload sent to Vertex AI.
  • Apply the suggested 3-line fix to guard the conflicting fields, ensuring that system_instruction, tools, and toolConfig are only added when cached_content is None.
  • Test the fix using the provided minimal reproduction code to ensure that the BadRequestError is resolved.
  • Consider addressing the optional follow-on fix for the separator issue to improve the caching behavior.
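
The first guidance step, checking the outgoing payload, can be automated once the request body has been captured as a dict. The helper below is a sketch under assumptions: `find_cache_conflicts` is a hypothetical name, how the body is captured is left to the reader, and both snake_case and camelCase spellings are checked because the exact key casing in a captured payload may vary.

```python
from typing import List

# Fields that Vertex rejects alongside cachedContent, in the spellings
# that might appear in a captured generateContent body.
CONFLICTING_KEYS = (
    "system_instruction",
    "systemInstruction",
    "tools",
    "toolConfig",
    "tool_config",
)


def find_cache_conflicts(body: dict) -> List[str]:
    """Return the conflicting top-level keys present next to a cache reference."""
    if "cachedContent" not in body and "cached_content" not in body:
        return []  # no cache referenced, so nothing can collide
    return [key for key in CONFLICTING_KEYS if key in body]


# Example body shaped like the failing request described in this issue.
failing_body = {
    "contents": [{"role": "user", "parts": [{"text": "Summarize."}]}],
    "system_instruction": {"parts": [{"text": "AVAILABLE SKILLS"}]},
    "tools": [{"function_declarations": []}],
    "cachedContent": "projects/p/locations/l/cachedContents/123",  # placeholder
}
print(find_cache_conflicts(failing_body))  # ['system_instruction', 'tools']
```

An empty list after applying the guard indicates that the body no longer mixes a cache reference with the three rejected fields.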

Example

The suggested 3-line fix can be applied as follows:

 data = RequestBody(contents=content)
-if system_instructions is not None:
+if system_instructions is not None and cached_content is None:
     data["system_instruction"] = system_instructions
-if tools is not None:
+if tools is not None and cached_content is None:
     data["tools"] = tools
-if tool_choice is not None:
+if tool_choice is not None and cached_content is None:
     data["toolConfig"] = tool_choice
 ...
 if cached_content is not None:
     data["cachedContent"] = cached_content

Notes

The fix assumes that the cached_content field is correctly set when using cached content. If this is not the case, additional debugging may be required.

Recommendation

Apply the suggested 3-line fix to guard the conflicting fields, as it is a simple and effective solution that matches the Vertex AI documented API contract.
