Problem
Current implicit caching for Gemini models doesn't guarantee caching, even when exceeding minimum token thresholds (1,024 tokens for 2.5 Flash, 2,048 tokens for 2.5 Pro). This is the observed behavior after thorough testing.
Solution
Add explicit context caching support using Gemini API's caching feature:
- Cache Creation: Create caches with configurable TTL
- Guaranteed Savings: Predictable caching mechanism
Problem
Current implicit caching for Gemini models doesn't guarantee caching, even when exceeding minimum token thresholds (1,024 tokens for 2.5 Flash, 2,048 tokens for 2.5 Pro). This is the observed behavior after thorough testing.
Solution
Add explicit context caching support using Gemini API's caching feature: