litellm - ✅(Solved) Fix [Bug]: Vertex AI sends cached_content + system_instruction/tools/toolConfig in same request → 400 INVALID_ARGUMENT (refile of #17304) [1 pull requests, 1 participants]

litellm · 2026-04-18 17:13:02


BerriAI/litellm#26014 • Fetched 2026-04-19 15:06:18


Author

psarma89


Error Message

Vertex_aiException BadRequestError - { "error": { "code": 400, "message": "Tool config, tools and system instruction should not be set in the request when using cached content.", "status": "INVALID_ARGUMENT" } }

Root Cause

This bug was previously reported in #17304 (2025-12-01, v1.80.7), which proposed the correct fix ("move those values to CachedContent from GenerateContent request"). That issue was auto-closed as not_planned on 2026-03-10 by the stale bot without any maintainer engagement. A community PR (#23986, 2026-03-18) attempted the fix but was closed because it was over-scoped and introduced a TypeError. The correct 3-line change has never landed.

PR fix notes

PR #26077: fix(vertex_ai): omit system_instruction/tools/toolConfig when cachedContent set

Repository: BerriAI/litellm
Author: Sameerlite
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/26077

Description (problem / solution / changelog)

Summary

Vertex generateContent rejects cachedContent in the same request as system_instruction, tools, or toolConfig (400 INVALID_ARGUMENT). Those fields must only appear on CachedContent creation, not on the follow-up generate call.

Changes

In _transform_request_body, add those fields (and apply the server-side tool invocation toolConfig mutation) only when cached_content is None.
Add regression test test_cached_content_omits_system_instruction_tools_toolconfig.

Fixes #26014 (re-file of #17304).

Changed files

litellm/llms/vertex_ai/gemini/transformation.py (modified, +5/-4)
tests/test_litellm/llms/vertex_ai/gemini/test_vertex_ai_gemini_transformation.py (modified, +47/-0)


Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

I'm re-filing because the bug is still present on the latest release (1.83.9) and is actively breaking production traffic for us on vertex_ai/gemini-3.1-pro-preview.

What happened?

When an OpenAI-format request contains cache_control: {"type": "ephemeral"} markers on system messages (Anthropic prompt-caching convention), LiteLLM's Vertex AI path translates those into Vertex's Context Cache API. But the generateContent request body is then assembled with both cachedContent AND system_instruction (+ tools / toolConfig), which Vertex explicitly rejects:

Vertex_aiException BadRequestError - {
  "error": {
    "code": 400,
    "message": "Tool config, tools and system instruction should not be set in the request when using cached content.",
    "status": "INVALID_ARGUMENT"
  }
}

Root cause (two interacting issues)

(1) separate_cached_messages only keeps the first continuous block of cache-marked messages. In litellm/llms/vertex_ai/context_caching/transformation.py:

last_continuous_block_idx = get_first_continuous_block_idx(filtered_messages)
if filtered_messages and last_continuous_block_idx is not None:
    first_cached_idx = filtered_messages[0][0]
    last_cached_idx = filtered_messages[last_continuous_block_idx][0]
    cached_messages = messages[first_cached_idx : last_cached_idx + 1]
    non_cached_messages = (
        messages[:first_cached_idx] + messages[last_cached_idx + 1 :]
    )

Any role=system message that has cache_control but sits after an uncached message gets dropped into non_cached_messages. Those leftover system messages then get turned into Vertex's system_instruction field. This is common in real-world agent frameworks that interleave cached and uncached system blocks (e.g. RAG contexts, skill catalogs, tool definitions).
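The splitting behavior described above can be illustrated with a simplified, standalone sketch. This is not LiteLLM's implementation: `has_cache_control` and `split_first_contiguous_block` are hypothetical names, and the sketch only assumes that cache markers live on content parts as in the reproduction below.

```python
from typing import Any

Message = dict[str, Any]


def has_cache_control(message: Message) -> bool:
    """True if any content part of the message carries a cache_control marker."""
    content = message.get("content")
    if not isinstance(content, list):
        return False
    return any(isinstance(part, dict) and "cache_control" in part for part in content)


def split_first_contiguous_block(
    messages: list[Message],
) -> tuple[list[Message], list[Message]]:
    """Return (cached, non_cached), keeping only the first contiguous marked run."""
    marked = [i for i, m in enumerate(messages) if has_cache_control(m)]
    if not marked:
        return [], list(messages)
    first = last = marked[0]
    for idx in marked[1:]:
        if idx != last + 1:
            break  # a gap ends the first block; later marked messages are ignored
        last = idx
    return messages[first : last + 1], messages[:first] + messages[last + 1 :]


def marked_system(text: str) -> Message:
    part = {"type": "text", "text": text, "cache_control": {"type": "ephemeral"}}
    return {"role": "system", "content": [part]}


messages = [
    marked_system("A"),
    {"role": "system", "content": "plain"},  # breaks the block
    marked_system("B"),                      # marked, but lands in non_cached
    {"role": "user", "content": "Summarize."},
]

cached, non_cached = split_first_contiguous_block(messages)
print(len(cached), [m["role"] for m in non_cached])
```

In this sketch the second marked system message ends up in `non_cached`, which is the condition under which a leftover system message would be sent alongside the cache reference.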

(2) The generateContent body builder unconditionally adds system_instruction / tools / toolConfig even when cachedContent is set. In litellm/llms/vertex_ai/gemini/transformation.py (lines 749–765 on [email protected]):

data = RequestBody(contents=content)
if system_instructions is not None:
    data["system_instruction"] = system_instructions
if tools is not None:
    data["tools"] = tools
if tool_choice is not None:
    data["toolConfig"] = tool_choice
...
if cached_content is not None:
    data["cachedContent"] = cached_content   # <-- collision with the three above

Per Vertex AI docs, cachedContent is mutually exclusive with system_instruction, tools, and toolConfig in the same generateContent call — those must be baked into the cache at creation time, not re-sent.

Steps to Reproduce

Minimal reproduction using the SDK (works against the proxy too). The payload mimics what an agent loop produces: a large system block with cache_control, followed by a plain system block (the thing that breaks it), followed by another cache_control block.

import asyncio
import litellm

litellm.use_litellm_proxy = True  # optional, if routing through your proxy
MODEL = "vertex_ai/gemini-3.1-pro-preview"  # reproduces on gemini-2.5-pro etc. too

BIG = "AVAILABLE DATA CONTEXT\n" + ("glossary_entry_token " * 8000)   # > 32K tokens

def cached(text):
    return {
        "role": "system",
        "content": [
            {"type": "text", "text": text, "cache_control": {"type": "ephemeral"}}
        ],
    }

messages = [
    cached(BIG),                                                         # cached
    {"role": "system", "content": "AVAILABLE SKILLS\n- a\n- b"},         # breaks the block
    cached("TOOL DEFINITIONS\n" + ("tool_def_token " * 8000)),           # NOT cached anymore
    {"role": "user", "content": "Summarize."},
]

tools = [{"type": "function", "function": {
    "name": "get_weather",
    "description": "Get weather",
    "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]},
    "cache_control": {"type": "ephemeral"},
}}]

async def main():
    try:
        resp = await litellm.acompletion(model=MODEL, messages=messages, tools=tools)
        print(resp.choices[0].message.content[:80])
    except Exception as e:
        print(type(e).__name__, str(e)[:600])

asyncio.run(main())

Observed:

BadRequestError litellm.BadRequestError: ... Vertex_aiException BadRequestError - {
  "error": { "code": 400,
    "message": "Tool config, tools and system instruction should not be set in the request when using cached content.",
    "status": "INVALID_ARGUMENT" } }

Confirmed on 1.81.8 (SDK), 1.81.0 / 1.83.3 / 1.83.9 (proxy source inspection — byte-identical generateContent body builder across all three).

Suggested fix (3 lines)

Guard the three conflicting fields in litellm/llms/vertex_ai/gemini/transformation.py so they are only added when cached_content is None. This was exactly the fix in the core diff of #23986 (which should have been merged with only this change):

 data = RequestBody(contents=content)
-if system_instructions is not None:
+if system_instructions is not None and cached_content is None:
     data["system_instruction"] = system_instructions
-if tools is not None:
+if tools is not None and cached_content is None:
     data["tools"] = tools
-if tool_choice is not None:
+if tool_choice is not None and cached_content is None:
     data["toolConfig"] = tool_choice
 ...
 if cached_content is not None:
     data["cachedContent"] = cached_content

This is the simplest correct fix — it matches Vertex AI's documented API contract (those three fields live in the cache, not in the call) and preserves all existing behavior when cached_content is not set.
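For readers who want to see the guard in isolation, here is a minimal sketch using plain dicts. `build_generate_content_body` is a hypothetical stand-in for the body builder, not the function in `transformation.py`, and the cache name is a placeholder value.

```python
from typing import Any, Optional


def build_generate_content_body(
    contents: list[dict[str, Any]],
    system_instructions: Optional[dict[str, Any]] = None,
    tools: Optional[list[dict[str, Any]]] = None,
    tool_choice: Optional[dict[str, Any]] = None,
    cached_content: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble a request body, omitting the conflicting fields when a cache is used."""
    body: dict[str, Any] = {"contents": contents}
    use_cache = cached_content is not None
    if system_instructions is not None and not use_cache:
        body["system_instruction"] = system_instructions
    if tools is not None and not use_cache:
        body["tools"] = tools
    if tool_choice is not None and not use_cache:
        body["toolConfig"] = tool_choice
    if use_cache:
        body["cachedContent"] = cached_content
    return body


contents = [{"role": "user", "parts": [{"text": "Summarize."}]}]
system = {"parts": [{"text": "AVAILABLE SKILLS"}]}
tools = [{"function_declarations": [{"name": "get_weather"}]}]

# Placeholder cache name, used for illustration only.
with_cache = build_generate_content_body(
    contents, system, tools, cached_content="cachedContents/example"
)
without_cache = build_generate_content_body(contents, system, tools)

print(sorted(with_cache))     # only contents and cachedContent
print(sorted(without_cache))  # contents, system_instruction, tools
```

The second call shows that behavior is unchanged when no cache is in play, which is the property the suggested fix relies on.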

Optional follow-on fix for the separator issue (root cause (1) above): either include all cache-marked messages in cached_messages (not just the first continuous block), or raise a clear error when the caller mixes cache_control patterns in a way Vertex can't represent. The 3-line guard above is sufficient to stop the 400; the separator logic just becomes less useful (fewer messages get cached) rather than broken.
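The "raise a clear error" option could look roughly like the sketch below. It is an assumption about how such a check might be written, not code from LiteLLM; `check_cache_markers_contiguous` and `is_marked` are hypothetical names.

```python
from typing import Any

Message = dict[str, Any]


def is_marked(message: Message) -> bool:
    """True if any content part of the message carries a cache_control marker."""
    content = message.get("content")
    return isinstance(content, list) and any(
        isinstance(part, dict) and "cache_control" in part for part in content
    )


def check_cache_markers_contiguous(messages: list[Message]) -> None:
    """Raise ValueError if cache-marked messages are split by unmarked ones."""
    marked = [i for i, m in enumerate(messages) if is_marked(m)]
    for prev, cur in zip(marked, marked[1:]):
        if cur != prev + 1:
            raise ValueError(
                f"cache_control markers are not contiguous: message {cur} is "
                f"separated from message {prev} by uncached messages"
            )


def marked_system(text: str) -> Message:
    part = {"type": "text", "text": text, "cache_control": {"type": "ephemeral"}}
    return {"role": "system", "content": [part]}


messages = [
    marked_system("A"),
    {"role": "system", "content": "plain"},
    marked_system("B"),
]

try:
    check_cache_markers_contiguous(messages)
except ValueError as exc:
    print("rejected:", exc)
```

Failing early like this trades convenience for predictability: the caller learns about the unsupported layout before any request is sent, instead of having part of the prompt silently left out of the cache.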

Relevant log output

litellm.BadRequestError: Error code: 400 - ... Vertex_aiException BadRequestError - {
  "error": {
    "code": 400,
    "message": "Tool config, tools and system instruction should not be set in the request when using cached content.",
    "status": "INVALID_ARGUMENT"
  }
}

What part of LiteLLM is this about?

Both SDK and Proxy (shared code path in llms/vertex_ai/).

What LiteLLM version are you on?

Reproduced on 1.81.8 (SDK), confirmed unchanged source in 1.81.0 / 1.83.3 / 1.83.9.

#17304 — original bug report, closed as not_planned by stale bot without maintainer reply
#23986 — community PR with the correct 3-line fix, closed because the PR also contained unrelated deletions

Twitter / LinkedIn details

No response

extent analysis

TL;DR

The most likely fix is the suggested 3-line guard: in the generateContent body builder, add system_instruction, tools, and toolConfig only when cached_content is None.

Guidance

Verify that the issue is indeed caused by the conflicting fields in the generateContent body builder by checking the request payload sent to Vertex AI.
Apply the suggested 3-line fix to guard the conflicting fields, ensuring that system_instruction, tools, and toolConfig are only added when cached_content is None.
Test the fix using the provided minimal reproduction code to ensure that the BadRequestError is resolved.
Consider addressing the optional follow-on fix for the separator issue to improve the caching behavior.
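The first verification step can be sketched as a small check over a captured request body. How the payload is captured is left out here; `find_cached_content_conflicts` is a hypothetical helper, and the field names follow the body builder quoted in the issue.

```python
from typing import Any

# Fields Vertex rejects alongside cachedContent, per the error message above.
CONFLICTING_FIELDS = ("system_instruction", "tools", "toolConfig")


def find_cached_content_conflicts(body: dict[str, Any]) -> list[str]:
    """Return the conflicting fields present in a body that also sets cachedContent."""
    if "cachedContent" not in body:
        return []
    return [field for field in CONFLICTING_FIELDS if field in body]


# Example payloads; the cache name is a placeholder value.
broken = {
    "contents": [],
    "cachedContent": "cachedContents/example",
    "system_instruction": {"parts": [{"text": "AVAILABLE SKILLS"}]},
    "tools": [],
}
fixed = {"contents": [], "cachedContent": "cachedContents/example"}

print(find_cached_content_conflicts(broken))
print(find_cached_content_conflicts(fixed))
```

A non-empty result on a captured payload would point at the body builder as the cause; an empty result suggests the 400 comes from somewhere else.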

Example

The suggested 3-line fix can be applied as follows:

 data = RequestBody(contents=content)
-if system_instructions is not None:
+if system_instructions is not None and cached_content is None:
     data["system_instruction"] = system_instructions
-if tools is not None:
+if tools is not None and cached_content is None:
     data["tools"] = tools
-if tool_choice is not None:
+if tool_choice is not None and cached_content is None:
     data["toolConfig"] = tool_choice
 ...
 if cached_content is not None:
     data["cachedContent"] = cached_content

Notes

The fix assumes that the cached_content field is correctly set when using cached content. If this is not the case, additional debugging may be required.

Recommendation

Apply the suggested 3-line fix to guard the conflicting fields, as it is a simple and effective solution that matches the Vertex AI documented API contract.
