
Cache hit total_cost injection breaks downstream cache keys for multi-turn conversations #35308

@aunitt


Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

from langchain_community.cache import SQLiteCache
from langchain_core.globals import set_llm_cache
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_google_genai import ChatGoogleGenerativeAI

# Set up SQLite cache
set_llm_cache(SQLiteCache(database_path=".cache.db"))

# Build a chain with message history
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
store = {}

def get_session_history(session_id):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

chain_with_history = RunnableWithMessageHistory(llm, get_session_history)

# Run 1 (cold cache): Both calls go to API
config = {"configurable": {"session_id": "test"}}
chain_with_history.invoke("What is 2+2?", config=config)        # Call 1 - cached
chain_with_history.invoke("Now multiply by 3", config=config)    # Call 2 - cached (history includes Call 1 response)

# Run 2 (warm cache): Clear in-memory history, re-run
store.clear()
config = {"configurable": {"session_id": "test"}}
chain_with_history.invoke("What is 2+2?", config=config)        # Call 1 - CACHE HIT, but response now has total_cost: 0
chain_with_history.invoke("Now multiply by 3", config=config)    # Call 2 - CACHE MISS! History differs due to total_cost

Error Message and Stack Trace (if applicable)

No error — the second call silently misses the cache and makes an unnecessary API call.

Description

PR #32437 introduced code in _convert_cached_generations that injects "total_cost": 0 into usage_metadata of AIMessages on cache hits:

# libs/core/langchain_core/language_models/chat_models.py, lines 669-678
if hasattr(gen, "message") and isinstance(gen.message, AIMessage):
    # We zero out cost on cache hits
    gen.message = gen.message.model_copy(
        update={
            "usage_metadata": {
                **(gen.message.usage_metadata or {}),
                "total_cost": 0,
            }
        }
    )

The original API response does NOT include total_cost in usage_metadata. When the modified AIMessage (with total_cost: 0) is added to conversation history, any subsequent cache lookup for a follow-up message in the same conversation will fail because the serialised prompt now differs from what was originally cached.

The cascade

  1. Run 1 (cold cache): Call 1 goes to API → response has usage_metadata with 5 keys (no total_cost). This AIMessage goes into history. Call 2 is cached with this history.

  2. Run 2 (warm cache): Call 1 → cache hit; _convert_cached_generations injects total_cost: 0 (now 6 keys). The modified AIMessage goes into history. Call 2's prompt now differs by 17 bytes (the "total_cost": 0, string) → cache MISS (see the sketch after this list).

  3. Run 3: Same injection, but Call 2 now matches the Run 2 cached entry → cache hit again.
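
The divergence can be reproduced in isolation. A minimal sketch, assuming the cache key is built from the dumps() serialisation of the message list, and using model_copy(update=...) exactly as the injection code above does (which bypasses validation, so the extra key survives):

from langchain_core.load import dumps
from langchain_core.messages import AIMessage

fresh = AIMessage(
    content="4",
    usage_metadata={"input_tokens": 5, "output_tokens": 1, "total_tokens": 6},
)
# Same transformation that _convert_cached_generations applies on a hit.
injected = fresh.model_copy(
    update={"usage_metadata": {**(fresh.usage_metadata or {}), "total_cost": 0}}
)

# The serialised histories, and hence any cache key built from them, differ.
print(len(dumps([injected])) - len(dumps([fresh])))  # 17: '"total_cost": 0, '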

Evidence

We confirmed this in a real pipeline processing 100 papers. For the same paper, two refine_coreferences entries exist in the SQLite cache — one from Run 1 (72,301-byte prompt) and one from Run 2 (72,318-byte prompt). The only difference is the 17-byte "total_cost": 0, string.

Note on existing id normalisation

The codebase already handles a similar issue with the id field — it strips id from messages before computing cache keys (lines 1151-1158). However, usage_metadata is not similarly normalised, so the total_cost injection pollutes downstream cache keys.

Suggested fix

Either:

  1. Don't inject total_cost into the AIMessage's usage_metadata (track it separately for LangSmith)
  2. Strip/normalise volatile usage_metadata fields (like total_cost) from AIMessages in conversation history before computing cache keys, similar to how id is already handled (a sketch follows below)
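
A minimal sketch of option 2; the helper name and the set of volatile keys are illustrative, not existing LangChain internals:

from langchain_core.messages import AIMessage, BaseMessage

# Assumption: total_cost is the only volatile key injected today.
VOLATILE_USAGE_KEYS = {"total_cost"}

def normalise_for_cache_key(message: BaseMessage) -> BaseMessage:
    """Copy a message with cache-unstable fields removed, mirroring the
    existing id normalisation, before serialising it into a cache key."""
    update: dict = {"id": None}  # same treatment id already receives
    if isinstance(message, AIMessage) and message.usage_metadata:
        cleaned = {
            k: v
            for k, v in message.usage_metadata.items()
            if k not in VOLATILE_USAGE_KEYS
        }
        update["usage_metadata"] = cleaned or None
    return message.model_copy(update=update)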

System Info

langchain-core==1.2.13
langchain-google-genai==2.1.5
Python 3.10.13
macOS Darwin 24.6.0
