Checked other resources
Example Code
from langchain_community.cache import SQLiteCache
from langchain_core.globals import set_llm_cache
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_google_genai import ChatGoogleGenerativeAI
# Set up SQLite cache
set_llm_cache(SQLiteCache(database_path=".cache.db"))
# Build a chain with message history
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
store = {}
def get_session_history(session_id):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]
chain_with_history = RunnableWithMessageHistory(llm, get_session_history)
# Run 1 (cold cache): Both calls go to API
config = {"configurable": {"session_id": "test"}}
chain_with_history.invoke("What is 2+2?", config=config) # Call 1 - cached
chain_with_history.invoke("Now multiply by 3", config=config) # Call 2 - cached (history includes Call 1 response)
# Run 2 (warm cache): Clear in-memory history, re-run
store.clear()
config = {"configurable": {"session_id": "test"}}
chain_with_history.invoke("What is 2+2?", config=config) # Call 1 - CACHE HIT, but response now has total_cost: 0
chain_with_history.invoke("Now multiply by 3", config=config) # Call 2 - CACHE MISS! History differs due to total_cost
Error Message and Stack Trace (if applicable)
No error — the second call silently misses the cache and makes an unnecessary API call.
Description
PR #32437 introduced code in _convert_cached_generations that injects "total_cost": 0 into usage_metadata of AIMessages on cache hits:
# libs/core/langchain_core/language_models/chat_models.py, lines 669-678
if hasattr(gen, "message") and isinstance(gen.message, AIMessage):
    # We zero out cost on cache hits
    gen.message = gen.message.model_copy(
        update={
            "usage_metadata": {
                **(gen.message.usage_metadata or {}),
                "total_cost": 0,
            }
        }
    )
The original API response does NOT include total_cost in usage_metadata. When the modified AIMessage (with total_cost: 0) is added to conversation history, any subsequent cache lookup for a follow-up message in the same conversation will fail because the serialised prompt now differs from what was originally cached.
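A minimal sketch of why this breaks lookups. This is not LangChain's actual key derivation — it is a simplified stand-in — but the principle is the same: any byte-level change to the serialised history produces a different cache key.

```python
import hashlib
import json

# Simplified stand-in for serialising a cached AIMessage into a cache key.
# The injected "total_cost" field changes the serialised bytes, so the key
# for any follow-up prompt containing this message no longer matches.
original = {"content": "4",
            "usage_metadata": {"input_tokens": 5, "output_tokens": 1}}
modified = {"content": "4",
            "usage_metadata": {"input_tokens": 5, "output_tokens": 1,
                               "total_cost": 0}}

key_a = hashlib.sha256(json.dumps(original, sort_keys=True).encode()).hexdigest()
key_b = hashlib.sha256(json.dumps(modified, sort_keys=True).encode()).hexdigest()
print(key_a == key_b)  # False — the injected field yields a different key
```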
The cascade
- Run 1 (cold cache): Call 1 goes to API → response has usage_metadata with 5 keys (no total_cost). This AIMessage goes into history. Call 2 is cached with this history.
- Run 2 (warm cache): Call 1 → cache hit → _convert_cached_generations injects total_cost: 0 (now 6 keys). The modified AIMessage goes into history. Call 2's prompt now differs by 17 bytes ("total_cost": 0, ) → cache MISS.
- Run 3: Same injection again, but Call 2 now matches the entry cached during Run 2 → cache hit.
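The 17-byte figure in the cascade above can be checked directly — it is exactly the length of the injected substring:

```python
# The injected substring accounts for exactly 17 bytes, matching the
# 72,318 - 72,301 = 17 byte difference between the two cached prompts.
injected = '"total_cost": 0, '
print(len(injected))  # 17
```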
Evidence
We confirmed this in a real pipeline processing 100 papers. For the same paper, two refine_coreferences entries exist in the SQLite cache — one from Run 1 (72,301 bytes prompt) and one from Run 2 (72,318 bytes prompt). The only difference is the "total_cost": 0, string (17 bytes).
Note on existing id normalisation
The codebase already handles a similar issue with the id field — it strips id from messages before computing cache keys (lines 1151-1158). However, usage_metadata is not similarly normalised, so the total_cost injection pollutes downstream cache keys.
Suggested fix
Either:
- Don't inject total_cost into the AIMessage's usage_metadata (track it separately for LangSmith), or
- Strip/normalise usage_metadata fields (like total_cost) from AIMessages in conversation history before computing cache keys, similar to how id is already handled.
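The second option could look something like the sketch below. The helper operates on a plain serialisable message dict for illustration; the function name and field set are assumptions, and the real fix would live alongside the existing id-stripping logic in chat_models.py.

```python
# Hypothetical normalisation step (names are illustrative, not LangChain API):
# drop cache-hit bookkeeping fields from usage_metadata before a history
# message feeds into cache-key computation, mirroring the existing id handling.
_CACHE_ONLY_FIELDS = {"total_cost"}

def normalise_usage_metadata(message_dict: dict) -> dict:
    usage = message_dict.get("usage_metadata")
    if not usage:
        return message_dict
    out = dict(message_dict)
    out["usage_metadata"] = {
        k: v for k, v in usage.items() if k not in _CACHE_ONLY_FIELDS
    }
    return out

msg = {"content": "4", "usage_metadata": {"input_tokens": 5, "total_cost": 0}}
print(normalise_usage_metadata(msg)["usage_metadata"])  # {'input_tokens': 5}
```

With this applied before key computation, the prompts from Run 1 and Run 2 serialise identically, so follow-up calls hit the cache on every run.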
System Info
langchain-core==1.2.13
langchain-google-genai==2.1.5
Python 3.10.13
macOS Darwin 24.6.0