langchain - ✅(Solved) Fix Cache hit total_cost injection breaks downstream cache keys for multi-turn conversations [1 pull requests, 5 comments, 3 participants]

langchain · 2026-02-18 13:52:25


langchain-ai/langchain#35308 • Fetched 2026-04-08 00:26:44


PR #32437 introduced code in _convert_cached_generations that injects "total_cost": 0 into usage_metadata of AIMessages on cache hits:

# libs/core/langchain_core/language_models/chat_models.py, lines 669-678
if hasattr(gen, "message") and isinstance(gen.message, AIMessage):
    # We zero out cost on cache hits
    gen.message = gen.message.model_copy(
        update={
            "usage_metadata": {
                **(gen.message.usage_metadata or {}),
                "total_cost": 0,
            }
        }
    )

The original API response does NOT include total_cost in usage_metadata. When the modified AIMessage (with total_cost: 0) is added to conversation history, any subsequent cache lookup for a follow-up message in the same conversation will fail because the serialised prompt now differs from what was originally cached.
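As a rough illustration of why one extra field matters, the sketch below hashes a serialised conversation history with and without the injected field. It uses plain dicts and `json.dumps` rather than LangChain's own serialisation, and `cache_key` and the usage field names are hypothetical stand-ins, not the library's API.

```python
import hashlib
import json


def cache_key(history: list[dict]) -> str:
    """Hypothetical stand-in for a prompt-derived cache key: a hash of the serialised history."""
    serialised = json.dumps(history)
    return hashlib.sha256(serialised.encode("utf-8")).hexdigest()


# AI message as it came back from the API (usage field names are illustrative).
original_ai = {
    "type": "ai",
    "content": "4",
    "usage_metadata": {"input_tokens": 8, "output_tokens": 1, "total_tokens": 9},
}

# The same message after a cache hit, with total_cost injected.
cache_hit_ai = {
    **original_ai,
    "usage_metadata": {**original_ai["usage_metadata"], "total_cost": 0},
}

human = {"type": "human", "content": "What is 2+2?"}
follow_up = {"type": "human", "content": "Now multiply by 3"}

key_run1 = cache_key([human, original_ai, follow_up])
key_run2 = cache_key([human, cache_hit_ai, follow_up])
keys_match = key_run1 == key_run2
print(keys_match)  # False: identical content, different key
```

The message content is identical in both histories; only the metadata differs, which is enough to change any key derived from the full serialised prompt.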

Error Message

Error Message and Stack Trace (if applicable)

No error — the second call silently misses the cache and makes an unnecessary API call.


PR fix notes

PR #35312: chore(core): Normalization of `total_cost` for `prompt` key in `llm_cache`

Repository: langchain-ai/langchain
Author: keenborder786
State: open | merged: False
Link: https://github.com/langchain-ai/langchain/pull/35312

Description (problem / solution / changelog)

Simple normalisation of total_cost to align with converted_generation, so that the cache is hit for the same message content when a message history is injected.
Fixes #35308

Changed files

libs/core/langchain_core/language_models/chat_models.py (modified, +28/-16)
libs/core/tests/unit_tests/language_models/chat_models/test_cache.py (modified, +176/-0)

Code Example

from langchain_community.cache import SQLiteCache  # SQLiteCache lives in langchain_community, not langchain_core.caches
from langchain_core.globals import set_llm_cache
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_google_genai import ChatGoogleGenerativeAI

# Set up SQLite cache
set_llm_cache(SQLiteCache(database_path=".cache.db"))

# Build a chain with message history
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
store = {}

def get_session_history(session_id):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

chain_with_history = RunnableWithMessageHistory(llm, get_session_history)

# Run 1 (cold cache): both calls go to the API and their responses are stored in the cache
config = {"configurable": {"session_id": "test"}}
chain_with_history.invoke("What is 2+2?", config=config)        # Call 1 - API call, response cached
chain_with_history.invoke("Now multiply by 3", config=config)    # Call 2 - API call, cached under a prompt that includes the Call 1 response

# Run 2 (warm cache): Clear in-memory history, re-run
store.clear()
config = {"configurable": {"session_id": "test"}}
chain_with_history.invoke("What is 2+2?", config=config)        # Call 1 - CACHE HIT, but response now has total_cost: 0
chain_with_history.invoke("Now multiply by 3", config=config)    # Call 2 - CACHE MISS! History differs due to total_cost



Checked other resources

I added a very descriptive title to this issue.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.
I am sure that this is a bug in LangChain rather than my code.
The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).


The cascade

Run 1 (cold cache): Call 1 goes to API → response has usage_metadata with 5 keys (no total_cost). This AIMessage goes into history. Call 2 is cached with this history.
Run 2 (warm cache): Call 1 → cache hit → _convert_cached_generations injects total_cost: 0 (now 6 keys). Modified AIMessage goes into history. Call 2's prompt now differs by 17 bytes ("total_cost": 0, ) → cache MISS.
Run 3: Same injection, but Call 2 now matches the Run 2 cached entry → cache hit again.
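The three runs above can be reproduced with a toy model of the cache. This is a minimal sketch under stated assumptions: `fake_api`, `with_zero_cost` and `run_conversation` are hypothetical helpers, the cache is a plain dict keyed on the JSON-serialised history, and none of it is LangChain code.

```python
import json


def fake_api(history):
    """Pretend model call; like the real API, the reply carries no total_cost."""
    return {
        "type": "ai",
        "content": f"reply-{len(history)}",
        "usage_metadata": {"input_tokens": 1, "output_tokens": 1, "total_tokens": 2},
    }


def with_zero_cost(message):
    """Mimics the total_cost injection applied to messages served from the cache."""
    return {**message, "usage_metadata": {**message["usage_metadata"], "total_cost": 0}}


def run_conversation(cache, questions):
    """Runs one conversation from an empty history; returns the number of API calls."""
    history, api_calls = [], 0
    for question in questions:
        history.append({"type": "human", "content": question})
        key = json.dumps(history)  # the key depends on the full history
        if key in cache:
            reply = with_zero_cost(cache[key])
        else:
            reply = fake_api(history)
            cache[key] = reply
            api_calls += 1
        history.append(reply)
    return api_calls


cache = {}
questions = ["What is 2+2?", "Now multiply by 3"]
calls_per_run = [run_conversation(cache, questions) for _ in range(3)]
print(calls_per_run)  # [2, 1, 0]
```

Run 2 makes one avoidable call, and the cache ends up holding two entries for the second question, which mirrors the duplicate entries described in the issue.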

Evidence

We confirmed this in a real pipeline processing 100 papers. For the same paper, two refine_coreferences entries exist in the SQLite cache: one from Run 1 (72,301-byte prompt) and one from Run 2 (72,318-byte prompt). The only difference is the string "total_cost": 0, (17 bytes).
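The byte counts are easy to check by hand. The snippet below only verifies the arithmetic, using the injected fragment exactly as quoted in the issue (with its trailing comma and space):

```python
# The fragment quoted in the issue, including the trailing comma and space.
injected = '"total_cost": 0, '

fragment_bytes = len(injected.encode("utf-8"))
prompt_delta = 72_318 - 72_301

print(fragment_bytes)  # 17
print(prompt_delta)    # 17
```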

Note on existing `id` normalisation

The codebase already handles a similar issue with the id field — it strips id from messages before computing cache keys (lines 1151-1158). However, usage_metadata is not similarly normalised, so the total_cost injection pollutes downstream cache keys.

Suggested fix

Either:

Don't inject total_cost into the AIMessage's usage_metadata (track it separately for LangSmith)
Strip/normalise usage_metadata fields (like total_cost) from AIMessages in conversation history before computing cache keys, similar to how id is already handled
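The second suggestion could look roughly like the following. This is a sketch on plain dicts, not a patch: `VOLATILE_USAGE_KEYS`, `normalise_for_key` and `cache_key` are hypothetical names, and the real change would have to work on LangChain message objects inside `chat_models.py`.

```python
import json

# Hypothetical set of usage_metadata fields that should not influence the cache key.
VOLATILE_USAGE_KEYS = {"total_cost"}


def normalise_for_key(history):
    """Return a copy of the history with volatile usage_metadata fields removed."""
    normalised = []
    for message in history:
        usage = message.get("usage_metadata")
        if message.get("type") == "ai" and usage:
            usage = {k: v for k, v in usage.items() if k not in VOLATILE_USAGE_KEYS}
            message = {**message, "usage_metadata": usage}  # copy, do not mutate
        normalised.append(message)
    return normalised


def cache_key(history):
    """Key computed from the normalised history only."""
    return json.dumps(normalise_for_key(history))


human = {"type": "human", "content": "What is 2+2?"}
follow_up = {"type": "human", "content": "Now multiply by 3"}
api_reply = {"type": "ai", "content": "4", "usage_metadata": {"total_tokens": 9}}
cached_reply = {"type": "ai", "content": "4",
                "usage_metadata": {"total_tokens": 9, "total_cost": 0}}

keys_equal = cache_key([human, api_reply, follow_up]) == cache_key([human, cached_reply, follow_up])
print(keys_equal)  # True
```

Normalising only at key-computation time leaves the message itself untouched, so anything that reads `total_cost` from the returned message keeps working. The first suggestion avoids the problem at the source instead, at the cost of tracking the zeroed cost somewhere else.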

System Info

langchain-core==1.2.13
langchain-google-genai==2.1.5
Python 3.10.13
macOS Darwin 24.6.0

extent analysis

Fix Plan

Option 1: Remove `total_cost` injection

# libs/core/langchain_core/language_models/chat_models.py, lines 669-678
if hasattr(gen, "message") and isinstance(gen.message, AIMessage):
    # Leave the cached message untouched: do not inject total_cost on cache hits
    pass

Option 2: Strip/normalise `usage_metadata` fields

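# libs/core/langchain_core/language_models/chat_models.py, next to the existing id normalisation (lines 1151-1158 per the issue)
# Sketch only: strip total_cost from AIMessages in the prompt before the cache key is computed.
# The names `messages` and `normalized_messages` are assumptions about the surrounding code.
normalized_messages = [
    msg.model_copy(
        update={
            "usage_metadata": {
                k: v for k, v in msg.usage_metadata.items() if k != "total_cost"
            }
        }
    )
    if isinstance(msg, AIMessage) and msg.usage_metadata
    else msg
    for msg in messages
]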

Verification

Run the example code with both options.
Check if the second call to chain_with_history.invoke is cached correctly.
Verify that the total_cost field is not injected into the AIMessage's usage_metadata or is stripped/normalised correctly.
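One way to make the second check observable is to count lookups that hit or miss. The wrapper below is a hypothetical helper, not part of LangChain: it mirrors the `lookup(prompt, llm_string)` and `update(prompt, llm_string, return_val)` method names of `langchain_core.caches.BaseCache`, but it is demonstrated here against a toy dict backend. Plugging it into `set_llm_cache` would presumably require subclassing `BaseCache`, which is not shown.

```python
class DictCache:
    """Toy backend for the demo; stands in for a real cache such as SQLiteCache."""

    def __init__(self):
        self._data = {}

    def lookup(self, prompt, llm_string):
        return self._data.get((prompt, llm_string))

    def update(self, prompt, llm_string, return_val):
        self._data[(prompt, llm_string)] = return_val


class CountingCache:
    """Hypothetical wrapper that delegates to an inner cache and counts hits and misses."""

    def __init__(self, inner):
        self.inner = inner
        self.hits = 0
        self.misses = 0

    def lookup(self, prompt, llm_string):
        result = self.inner.lookup(prompt, llm_string)
        if result is None:
            self.misses += 1
        else:
            self.hits += 1
        return result

    def update(self, prompt, llm_string, return_val):
        self.inner.update(prompt, llm_string, return_val)


cache = CountingCache(DictCache())
cache.lookup("prompt-1", "llm-config")            # miss: nothing stored yet
cache.update("prompt-1", "llm-config", ["gen"])
cache.lookup("prompt-1", "llm-config")            # hit
print(cache.hits, cache.misses)  # 1 1
```

With a fix in place, re-running the example after `store.clear()` should produce two hits and no new misses in the warm-cache run.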

Extra Tips

Consider tracking total_cost separately for LangSmith as suggested in the issue description.
Review the codebase for similar issues with other fields in usage_metadata.
Update the documentation to reflect the changes made to the code.
