Module 4: LangCache

Users ask the same questions over and over, and agents use 4x more tokens than chat. LangCache caches responses by meaning, so a repeated question is answered from the cache instead of the LLM, saving up to 90% on token costs.


How it works

[Image: LangCache process]

  1. User sends a question
  2. LangCache searches for a semantically similar cached response
  3. HIT — cached response returns instantly, no LLM call
  4. MISS — full pipeline runs, response gets cached for next time
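The four steps above follow a cache-aside pattern keyed on meaning rather than exact text. The sketch below illustrates that flow only; it is not how LangCache is implemented. The word-count "embedding", the `ToySemanticCache` class, and the `answer` helper are all hypothetical stand-ins, where a real semantic cache would use a learned embedding model and a vector index.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors, in the range 0 to 1."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToySemanticCache:
    """Hypothetical in-memory cache that matches prompts by similarity."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def search(self, prompt):
        """Return the best cached response at or above the threshold, else None."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def store(self, prompt, response):
        self.entries.append((embed(prompt), response))


def answer(cache, prompt, run_pipeline):
    """Steps 1-4: search the cache, return on HIT, run and store on MISS."""
    cached = cache.search(prompt)
    if cached is not None:
        return "HIT", cached            # no LLM call
    response = run_pipeline(prompt)     # full pipeline runs
    cache.store(prompt, response)       # cached for next time
    return "MISS", response
```

Calling `answer` twice with differently worded versions of the same question should run `run_pipeline` only once, provided the two prompts score at or above the threshold.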

Cloud setup

1. Create a LangCache instance

  1. In the Redis Cloud console, find LangCache in the left sidebar.
  2. Click Quick Create.

[Image: LangCache quick create]

  3. Name your cache (e.g. iris-workshop) and confirm.
  4. Click Create.

2. Copy your service key

Save the service key now. You won't see it again.

[Image: LangCache service key]

3. Get your Host and Cache ID

After dismissing the service key dialog, find the remaining credentials (Host and Cache ID) on the instance details page.

[Image: LangCache connectivity details]

4. Configure environment

Add to your .env:

LANGCACHE_HOST=<URL List from step 3>
LANGCACHE_CACHE_ID=<cache-id from step 3>
LANGCACHE_API_KEY=<service key from step 2>
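A missing or blank value here usually surfaces later as a confusing connection or authentication error, so it can help to check the three settings up front. The helper below is a hypothetical sketch, not part of the workshop code. It assumes the .env values have already been loaded into the process environment by the workshop tooling or a dotenv loader.

```python
import os

REQUIRED_VARS = ("LANGCACHE_HOST", "LANGCACHE_CACHE_ID", "LANGCACHE_API_KEY")


def load_langcache_settings(env=None):
    """Return the three LangCache settings, or raise if any is missing or blank.

    `env` defaults to os.environ; pass a plain dict to check other sources.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing LangCache settings: " + ", ".join(missing))
    # Strip stray whitespace that often sneaks in when pasting keys.
    return {name: env[name].strip() for name in REQUIRED_VARS}
```

Calling `load_langcache_settings()` at startup fails fast and names the variable that still needs a value.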

5. Seed the cache

macOS/Linux:

make seed-langcache

Windows (PowerShell):

.\workshop.ps1 seed-langcache

This loads one cached response so you can verify immediately.


Exercise

Open exercises/banking/langcache.py

search_request_body(prompt)

This tells LangCache what to search for, how similar a match must be, and what strategy to use.

def search_request_body(self, prompt):
    return {
        "prompt": prompt,                  # what to search for
        "similarityThreshold": 0.82,       # min similarity score (0–1)
        "searchStrategies": ["semantic"],  # match by meaning, not exact text
    }

Fill in the method body with the dict above.
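To see where this dict ends up, here is a rough sketch of turning it into an HTTP search request. The workshop code already does this for you. The `build_search_request` helper is hypothetical, and the endpoint path, the Bearer authorization header, and the assumption that LANGCACHE_HOST is a full https:// URL are all things to verify against the LangCache API reference. The sketch only builds the request and does not send it.

```python
import json
import urllib.request


def build_search_request(host, cache_id, api_key, body):
    """Build (but do not send) a POST request carrying the search body as JSON."""
    # Assumed endpoint path; confirm it in the LangCache API reference.
    url = f"{host.rstrip('/')}/v1/caches/{cache_id}/entries/search"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Raising `similarityThreshold` in the body makes matches stricter (fewer HITs, less risk of a wrong answer). Lowering it does the opposite.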


Try it

Restart with make dev, then open localhost:3040.

  1. Click Redis Iris to open the activity panel.
  2. Notice that LangCache already has 1 cached response: the seeded question about fixed deposit interest rates.
  3. Ask: "Tell me about your fixed deposit rates and terms" — the response comes back instantly. No LLM call, no token cost. Notice the wording is different from the cached question, but LangCache matches on meaning, not exact text.
  4. Check the activity panel — LangCache shows HIT with a similarity score.
  5. Now ask something new: "Can you waive my card annual fee?" — that's a MISS, so the full agent pipeline runs.

Why won't my own answers get cached?

In this workshop, we intentionally skip storing new LLM responses in the cache. That way every question flows through the full pipeline (routing, context, memory) so you can see all the Redis Iris components in action. The seeded cache entry is there just to show the HIT experience.


Verify

  • Cache HIT returns an instant response (no LLM call)
  • Cache MISS falls through to the full pipeline
  • Activity panel shows LangCache HIT/MISS status