Checked other resources
Example Code
from langchain_community.cache import SQLiteCache
from langchain_core.globals import set_llm_cache
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_google_genai import ChatGoogleGenerativeAI
# Set up SQLite cache
set_llm_cache(SQLiteCache(database_path=".cache.db"))
# Build a chain with message history
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
store = {}
def get_session_history(session_id):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]
chain_with_history = RunnableWithMessageHistory(llm, get_session_history)
# Run 1 (cold cache): Both calls go to API
config = {"configurable": {"session_id": "test"}}
chain_with_history.invoke("What is 2+2?", config=config) # Call 1 - cached
chain_with_history.invoke("Now multiply by 3", config=config) # Call 2 - cached (history includes Call 1 response)
# Run 2 (warm cache): Clear in-memory history, re-run
store.clear()
config = {"configurable": {"session_id": "test"}}
chain_with_history.invoke("What is 2+2?", config=config) # Call 1 - CACHE HIT, but response now has total_cost: 0
chain_with_history.invoke("Now multiply by 3", config=config) # Call 2 - CACHE MISS! History differs due to total_cost
Error Message and Stack Trace (if applicable)
No error — the second call silently misses the cache and makes an unnecessary API call.
Description
PR #32437 introduced code in _convert_cached_generations that injects "total_cost": 0 into usage_metadata of AIMessages on cache hits:
# libs/core/langchain_core/language_models/chat_models.py, lines 669-678
if hasattr(gen, "message") and isinstance(gen.message, AIMessage):
    # We zero out cost on cache hits
    gen.message = gen.message.model_copy(
        update={
            "usage_metadata": {
                **(gen.message.usage_metadata or {}),
                "total_cost": 0,
            }
        }
    )
The original API response does NOT include total_cost in usage_metadata. When the modified AIMessage (with total_cost: 0) is added to conversation history, any subsequent cache lookup for a follow-up message in the same conversation will fail because the serialised prompt now differs from what was originally cached.
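A minimal sketch of why this breaks lookups. This is not LangChain's actual key derivation — it is a simplified stand-in — but the principle is the same: any byte-level change to the serialised history produces a different cache key.

```python
import hashlib
import json

# Simplified stand-in for serialising a cached AIMessage into a cache key.
# The injected "total_cost" field changes the serialised bytes, so the key
# for any follow-up prompt containing this message no longer matches.
original = {"content": "4",
            "usage_metadata": {"input_tokens": 5, "output_tokens": 1}}
modified = {"content": "4",
            "usage_metadata": {"input_tokens": 5, "output_tokens": 1,
                               "total_cost": 0}}

key_a = hashlib.sha256(json.dumps(original, sort_keys=True).encode()).hexdigest()
key_b = hashlib.sha256(json.dumps(modified, sort_keys=True).encode()).hexdigest()
print(key_a == key_b)  # False — the injected field yields a different key
```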
The cascade
- Run 1 (cold cache): Call 1 goes to API → response has usage_metadata with 5 keys (no total_cost). This AIMessage goes into history. Call 2 is cached with this history.
- Run 2 (warm cache): Call 1 → cache hit → _convert_cached_generations injects total_cost: 0 (now 6 keys). The modified AIMessage goes into history. Call 2's prompt now differs by 17 bytes ("total_cost": 0, ) → cache MISS.
- Run 3: Same injection again, but Call 2 now matches the entry cached during Run 2 → cache hit.
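The 17-byte figure in the cascade above can be checked directly — it is exactly the length of the injected substring:

```python
# The injected substring accounts for exactly 17 bytes, matching the
# 72,318 - 72,301 = 17 byte difference between the two cached prompts.
injected = '"total_cost": 0, '
print(len(injected))  # 17
```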
Evidence
We confirmed this in a real pipeline processing 100 papers. For the same paper, two refine_coreferences entries exist in the SQLite cache — one from Run 1 (72,301 bytes prompt) and one from Run 2 (72,318 bytes prompt). The only difference is the "total_cost": 0, string (17 bytes).
Note on existing id normalisation
The codebase already handles a similar issue with the id field — it strips id from messages before computing cache keys (lines 1151-1158). However, usage_metadata is not similarly normalised, so the total_cost injection pollutes downstream cache keys.
Suggested fix
Either:
- Don't inject total_cost into the AIMessage's usage_metadata (track it separately for LangSmith), or
- Strip/normalise usage_metadata fields (like total_cost) from AIMessages in conversation history before computing cache keys, similar to how id is already handled.
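The second option could look something like the sketch below. The helper operates on a plain serialisable message dict for illustration; the function name and field set are assumptions, and the real fix would live alongside the existing id-stripping logic in chat_models.py.

```python
# Hypothetical normalisation step (names are illustrative, not LangChain API):
# drop cache-hit bookkeeping fields from usage_metadata before a history
# message feeds into cache-key computation, mirroring the existing id handling.
_CACHE_ONLY_FIELDS = {"total_cost"}

def normalise_usage_metadata(message_dict: dict) -> dict:
    usage = message_dict.get("usage_metadata")
    if not usage:
        return message_dict
    out = dict(message_dict)
    out["usage_metadata"] = {
        k: v for k, v in usage.items() if k not in _CACHE_ONLY_FIELDS
    }
    return out

msg = {"content": "4", "usage_metadata": {"input_tokens": 5, "total_cost": 0}}
print(normalise_usage_metadata(msg)["usage_metadata"])  # {'input_tokens': 5}
```

With this applied before key computation, the prompts from Run 1 and Run 2 serialise identically, so follow-up calls hit the cache on every run.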
System Info
langchain-core==1.2.13
langchain-google-genai==2.1.5
Python 3.10.13
macOS Darwin 24.6.0