llamaIndex - ✅(Solved) Fix [Feature Request]: Gemini prompt caching [1 pull requests, 4 comments, 2 participants]

llamaIndex · 2026-03-09 10:24:03


GitHub stats

run-llama/llama_index#20924 • Fetched 2026-04-08 00:30:09


Author: mdciri

Participants: bittoby, mdciri

Timeline (top): commented ×4, mentioned ×4, subscribed ×4, labeled ×2


Feature Description

Integrate Gemini prompt caching (context caching) to reduce LLM costs.

Reason

Context caching is a paid feature designed to reduce cost. Billing is based on the following factors:

* Cache token count: The number of input tokens cached, billed at a reduced rate when included in subsequent prompts.
* Storage duration: How long the cached tokens are stored (the TTL), billed according to the cached token count and the TTL. There are no minimum or maximum bounds on the TTL.
* Other factors: Other charges apply, such as for non-cached input tokens and output tokens.
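
To make the three billing factors concrete, here is a minimal Ruby sketch that estimates cost with and without caching. The `Rates` struct, both function names, and every rate value are hypothetical placeholders chosen for illustration; they are not real Gemini prices, and the sketch ignores any charge for creating the cache in the first place.

```ruby
# Hypothetical rates in currency units per million tokens (storage is per hour).
# Real prices vary by model and are NOT taken from the issue.
Rates = Struct.new(:input_per_m, :cached_per_m, :storage_per_m_hour, :output_per_m,
                   keyword_init: true)

PER_MILLION = 1_000_000.0

# Estimate the cost of `requests` calls that all reuse the same cached prefix.
def estimate_cost_with_cache(rates:, cached_tokens:, fresh_tokens:, output_tokens:,
                             requests:, ttl_hours:)
  # Factor 1: cached tokens, billed at the reduced rate on every request.
  cached_reads = requests * cached_tokens / PER_MILLION * rates.cached_per_m
  # Factor 2: storage, billed on cached token count times TTL.
  storage = cached_tokens / PER_MILLION * rates.storage_per_m_hour * ttl_hours
  # Factor 3: everything else (non-cached input tokens and output tokens).
  other = requests * (fresh_tokens / PER_MILLION * rates.input_per_m +
                      output_tokens / PER_MILLION * rates.output_per_m)
  { cached_reads: cached_reads, storage: storage, other: other,
    total: cached_reads + storage + other }
end

# Baseline: the same traffic with the shared prefix resent at the full input rate.
def estimate_cost_without_cache(rates:, cached_tokens:, fresh_tokens:, output_tokens:,
                                requests:)
  input = (cached_tokens + fresh_tokens) / PER_MILLION * rates.input_per_m
  output = output_tokens / PER_MILLION * rates.output_per_m
  requests * (input + output)
end
```

Under these assumptions, caching pays off only when the savings on repeated reads exceed the storage charge, so a long TTL with few requests can cost more than no cache at all.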

Value of Feature

By reducing both the cost and latency of processing large datasets, this integration transforms the "Context-Augmented" experience that LlamaIndex is known for.

extent analysis

Fix: Implement Gemini Prompt Caching

Step-by-Step Solution Plan

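1. **Install the Gemini API client library**: Run `gem install google-cloud-gemini` to install the required library. The gem name and every method used in the following steps appear only in this plan and have not been verified against a published library, so treat the snippets as Ruby-style pseudocode. The fix recorded for this issue is PR #21081.
2. **Import the library and set up credentials**: Add the following code to your Ruby file:
   ```ruby
   require 'google/cloud/gemini'

   # Set up credentials
   gemini = Google::Cloud::Gemini.new
   ```
3. **Create a cache client**: Create a cache client instance to interact with the Gemini API:
   ```ruby
   cache_client = gemini.cache_client
   ```
4. **Cache input tokens**: Before sending a prompt to the LLM, cache the input tokens using the `cache_client`:
   ```ruby
   input_tokens = "This is a sample input"
   cache_client.cache_tokens(input_tokens, ttl: 3600) # Cache for 1 hour (TTL in seconds)
   ```
5. **Use the cached tokens**: When sending a subsequent prompt, use the cached tokens to reduce costs:
   ```ruby
   # As written, this interpolates the original text back into the prompt, which
   # resends the tokens. A working integration would pass a reference to the
   # cached content instead of the text itself.
   prompt = "This is a sample prompt with cached tokens: #{input_tokens}"
   ```
6. **Monitor cache metrics**: Use the `cache_client` to monitor cache metrics, such as cache hit rate and cache size:
   ```ruby
   cache_metrics = cache_client.get_metrics
   ```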


#### Verification

* Verify that the cache is working by checking the cache hit rate and cache size.
* Monitor the LLM costs to ensure that they have decreased after implementing the cache.
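
One way to check the hit rate is to compare cached tokens against total prompt tokens across requests. The sketch below assumes you already have per-request token counts from somewhere (the fix PR title mentions token count extraction); the `UsageRecord` struct and both function names are hypothetical and do not correspond to any library API.

```ruby
# Hypothetical per-request usage summary: total prompt tokens and how many
# of them were served from the cache.
UsageRecord = Struct.new(:prompt_tokens, :cached_tokens, keyword_init: true)

# Fraction of all prompt tokens that came from the cache (0.0 when there is no traffic).
def cache_hit_rate(records)
  total = records.sum(&:prompt_tokens)
  return 0.0 if total.zero?

  records.sum(&:cached_tokens).to_f / total
end

# Number of requests that used the cache at all.
def requests_with_cache_hits(records)
  records.count { |record| record.cached_tokens.positive? }
end
```

A hit rate that stays at zero after the cache has been created suggests that requests are not referencing the cached content at all.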

#### Extra Tips

* Make sure to handle cache expiration and refresh cache tokens as needed.
* Monitor cache performance and adjust the TTL duration accordingly.
* Consider implementing a cache invalidation strategy to ensure cache freshness.
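
The expiration, refresh, and invalidation tips above can be sketched as a small local registry that tracks when each cache handle expires. This is an illustrative design, not part of any Gemini or LlamaIndex API: the `CacheRegistry` class, its method names, and the idea of an opaque "handle" string are all assumptions, and the block passed to `fetch_or_refresh` stands in for whatever call actually creates the remote cache.

```ruby
# Tracks remote cache handles locally so that expired ones are never reused.
class CacheRegistry
  Entry = Struct.new(:handle, :expires_at)

  # `clock` is injectable so that expiry can be tested without sleeping.
  def initialize(clock: -> { Time.now })
    @clock = clock
    @entries = {}
  end

  # Record `handle` under `key`, expiring `ttl` seconds from now.
  def store(key, handle, ttl:)
    @entries[key] = Entry.new(handle, @clock.call + ttl)
  end

  # Return the live handle, or nil if it is missing or expired.
  # Expired entries are dropped as a side effect.
  def fetch(key)
    entry = @entries[key]
    return nil if entry.nil?

    if @clock.call >= entry.expires_at
      @entries.delete(key)
      return nil
    end
    entry.handle
  end

  # Return the live handle, or create a fresh one via the block and store it.
  def fetch_or_refresh(key, ttl:)
    fetch(key) || store(key, yield, ttl: ttl).handle
  end

  # Explicitly drop an entry, e.g. when the underlying documents change.
  # Returns true if something was removed.
  def invalidate(key)
    !@entries.delete(key).nil?
  end
end
```

Keying entries by a hash of the cached content, rather than by a fixed name, would make invalidation automatic whenever the content changes.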


Fix Action

Fixed

PR fix notes

PR #21081: feat: add cache management methods and token count extraction for Gemini prompt caching
