langchain - ✅(Solved) Fix Cache hit total_cost injection breaks downstream cache keys for multi-turn conversations [1 pull requests, 5 comments, 3 participants]

GitHub stats
langchain-ai/langchain#35308 · Fetched 2026-04-08 00:26:44
Comments: 5 · Participants: 3 · Timeline: 10 · Reactions: 0
Timeline (top): commented ×4, mentioned ×2, subscribed ×2, cross-referenced ×1

PR #32437 introduced code in _convert_cached_generations that injects "total_cost": 0 into usage_metadata of AIMessages on cache hits:

# libs/core/langchain_core/language_models/chat_models.py, lines 669-678
if hasattr(gen, "message") and isinstance(gen.message, AIMessage):
    # We zero out cost on cache hits
    gen.message = gen.message.model_copy(
        update={
            "usage_metadata": {
                **(gen.message.usage_metadata or {}),
                "total_cost": 0,
            }
        }
    )

The original API response does NOT include total_cost in usage_metadata. When the modified AIMessage (with total_cost: 0) is added to conversation history, any subsequent cache lookup for a follow-up message in the same conversation will fail because the serialised prompt now differs from what was originally cached.

Error Message

No error — the second call silently misses the cache and makes an unnecessary API call.

Root Cause

The cache key is derived from the serialised prompt, which includes every message in the conversation history. On a cache hit, _convert_cached_generations adds total_cost: 0 to the returned AIMessage's usage_metadata, a key the original API response never contained. Once that modified message enters the history, the serialised prompt for the next turn no longer matches the one stored during the cold run, so the follow-up lookup misses.

PR fix notes

PR #35312: chore(core): Normalization of total_cost for prompt key in llm_cache

Description (problem / solution / changelog)

  • Normalises total_cost when building the prompt key for llm_cache so that it matches the converted cached generation; the cache is then hit for the same message content when a message history is injected.
  • Fixes #35308

Changed files

  • libs/core/langchain_core/language_models/chat_models.py (modified, +28/-16)
  • libs/core/tests/unit_tests/language_models/chat_models/test_cache.py (modified, +176/-0)

Code Example

# SQLiteCache is provided by langchain_community, not langchain_core.caches
from langchain_community.cache import SQLiteCache
from langchain_core.globals import set_llm_cache
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_google_genai import ChatGoogleGenerativeAI

# Set up SQLite cache
set_llm_cache(SQLiteCache(database_path=".cache.db"))

# Build a chain with message history
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
store = {}

def get_session_history(session_id):
    # One in-memory history per session id
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

chain_with_history = RunnableWithMessageHistory(llm, get_session_history)

# Run 1 (cold cache): Both calls go to API and their results are written to the cache
config = {"configurable": {"session_id": "test"}}
chain_with_history.invoke("What is 2+2?", config=config)        # Call 1 - API call, result cached
chain_with_history.invoke("Now multiply by 3", config=config)    # Call 2 - API call, result cached (history includes Call 1 response)

# Run 2 (warm cache): Clear in-memory history, re-run
store.clear()
config = {"configurable": {"session_id": "test"}}
chain_with_history.invoke("What is 2+2?", config=config)        # Call 1 - CACHE HIT, but response now has total_cost: 0
chain_with_history.invoke("Now multiply by 3", config=config)    # Call 2 - CACHE MISS! History differs due to total_cost

System Info

langchain-core==1.2.13
langchain-google-genai==2.1.5
Python 3.10.13
macOS Darwin 24.6.0
Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

The cascade

  1. Run 1 (cold cache): Call 1 goes to API → response has usage_metadata with 5 keys (no total_cost). This AIMessage goes into history. Call 2 is cached with this history.

  2. Run 2 (warm cache): Call 1 → cache hit → _convert_cached_generations injects total_cost: 0 (now 6 keys). Modified AIMessage goes into history. Call 2's prompt now differs by 17 bytes ("total_cost": 0, ) → cache MISS.

  3. Run 3: Same injection, but Call 2 now matches the Run 2 cached entry → cache hit again.
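
The three runs above can be reproduced with a toy model of the cache. This is only a sketch under stated assumptions: call_model, run_conversation, the dict-based cache, and the token counts are hypothetical stand-ins, and json.dumps replaces whatever serialisation LangChain actually uses for its keys. It shows that injecting a field on cache hits makes the second turn miss once and then hit again.

```python
import json

cache = {}        # serialised prompt -> cached AI message (plain dict)
api_calls = []    # one entry per simulated API request

def call_model(history):
    """Toy stand-in for a cached chat model call; not LangChain's real code."""
    key = json.dumps(history)
    if key in cache:
        hit = json.loads(json.dumps(cache[key]))  # deep copy of the cached reply
        hit["usage_metadata"]["total_cost"] = 0   # the injection described above
        return hit
    api_calls.append(key)
    reply = {
        "type": "ai",
        "content": f"reply {len(history)}",
        # Illustrative token counts, not real API output
        "usage_metadata": {"input_tokens": 8, "output_tokens": 2, "total_tokens": 10},
    }
    cache[key] = reply
    return reply

def run_conversation():
    """Two-turn conversation with a fresh history; returns the API calls made."""
    before = len(api_calls)
    history = []
    for question in ("What is 2+2?", "Now multiply by 3"):
        history.append({"type": "human", "content": question})
        history.append(call_model(history))
    return len(api_calls) - before

calls_per_run = [run_conversation() for _ in range(3)]
print(calls_per_run)  # → [2, 1, 0]
```

The single API call in the second run is the silent miss reported in this issue; the third run is fully cached because the polluted history now has its own cache entry.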

Evidence

We confirmed this in a real pipeline processing 100 papers. For the same paper, two refine_coreferences entries exist in the SQLite cache — one from Run 1 (72,301 bytes prompt) and one from Run 2 (72,318 bytes prompt). The only difference is the "total_cost": 0, string (17 bytes).
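
As a quick arithmetic check of the figures quoted above (a sketch that only restates the reported numbers), the gap between the two prompt sizes equals the length of the injected text:

```python
injected = '"total_cost": 0, '   # the extra text quoted in the report
run1_prompt_bytes = 72_301
run2_prompt_bytes = 72_318

delta = run2_prompt_bytes - run1_prompt_bytes
injected_bytes = len(injected.encode("utf-8"))
print(delta, injected_bytes)  # → 17 17
```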

Note on existing id normalisation

The codebase already handles a similar issue with the id field — it strips id from messages before computing cache keys (lines 1151-1158). However, usage_metadata is not similarly normalised, so the total_cost injection pollutes downstream cache keys.

Suggested fix

Either:

  1. Don't inject total_cost into the AIMessage's usage_metadata (track it separately for LangSmith)
  2. Strip/normalise usage_metadata fields (like total_cost) from AIMessages in conversation history before computing cache keys, similar to how id is already handled
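
The second option could look roughly like the following. This is a minimal sketch, not the change merged in the PR: cache_key is a hypothetical helper, messages are modelled as plain dicts rather than AIMessage objects, and json.dumps stands in for LangChain's serialisation. It drops id and total_cost from a copy of the history before the key is built, so cold-run and warm-run histories map to the same key.

```python
import copy
import json

def cache_key(history):
    """Hypothetical key function: drop volatile fields, then serialise."""
    normalised = copy.deepcopy(history)  # never mutate the caller's messages
    for message in normalised:
        message.pop("id", None)  # analogous to the existing id handling
        usage = message.get("usage_metadata")
        if usage is not None:
            usage.pop("total_cost", None)
    return json.dumps(normalised, sort_keys=True)

# Illustrative token counts, not real API output
usage = {"input_tokens": 8, "output_tokens": 2, "total_tokens": 10}
cold = [
    {"type": "human", "content": "What is 2+2?"},
    {"type": "ai", "content": "4", "id": "run-1", "usage_metadata": dict(usage)},
    {"type": "human", "content": "Now multiply by 3"},
]
warm = copy.deepcopy(cold)
warm[1]["id"] = "run-2"
warm[1]["usage_metadata"]["total_cost"] = 0  # what a cache hit would add

same_key = cache_key(cold) == cache_key(warm)
raw_differs = json.dumps(cold) != json.dumps(warm)
inputs_untouched = "total_cost" in warm[1]["usage_metadata"]
print(same_key, raw_differs, inputs_untouched)  # → True True True
```

Normalising a copy keeps total_cost visible to callers while removing it from the key, which is the trade-off that separates this option from removing the injection altogether.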

extent analysis

Fix Plan

Option 1: Remove total_cost injection

# libs/core/langchain_core/language_models/chat_models.py, lines 669-678
if hasattr(gen, "message") and isinstance(gen.message, AIMessage):
    # No need to zero out cost on cache hits
    pass

Option 2: Strip/normalise usage_metadata fields

# libs/core/langchain_core/language_models/chat_models.py, lines 1159-1166
# Sketch only: `message` stands for each history message that is serialised
# into the cache key; the exact variable names in the codebase are not confirmed.
if isinstance(message, AIMessage) and message.usage_metadata:
    # Drop total_cost so cold-run and warm-run histories serialise identically
    message = message.model_copy(
        update={
            "usage_metadata": {
                k: v for k, v in message.usage_metadata.items() if k != "total_cost"
            }
        }
    )

Verification

  1. Run the example code with both options.
  2. Check if the second call to chain_with_history.invoke is cached correctly.
  3. Verify that the total_cost field is not injected into the AIMessage's usage_metadata or is stripped/normalised correctly.

Extra Tips

  • Consider tracking total_cost separately for LangSmith as suggested in the issue description.
  • Review the codebase for similar issues with other fields in usage_metadata.
  • Update the documentation to reflect the changes made to the code.
