Caching that delivers 37% more throughput
Real-world load test against a production tours page backed by Lunexa. One toggle, measurable gains across latency, throughput, and tail percentiles — no code changes.
Throughput: +37% (7.62 → 10.42 req/s)
Median latency (p50): −25% (1.22 s → 0.91 s)
p95 latency: −20% (2.06 s → 1.65 s)
Server TTFB: −26% (−157 ms avg)
Before vs. after the cache
10 concurrent workers hitting the same endpoint for 10 seconds. All other infrastructure held constant — only the Lunexa collection cache toggle changed between runs.
| Metric | Cache off | Cache on | Improvement |
|---|---|---|---|
| Requests completed | 82 | 108 | +31.7% |
| Throughput (req/s) | 7.62 | 10.42 | +36.8% |
| Average latency | 1.308 s | 0.948 s | −27.5% |
| Median (p50) | 1.220 s | 0.913 s | −25.2% |
| p90 latency | 2.035 s | 1.602 s | −21.3% |
| p95 latency | 2.057 s | 1.652 s | −19.7% |
| Fastest response | 0.528 s | 0.308 s | −41.7% |
| Server TTFB (avg) | 0.597 s | 0.440 s | −26.3% |
Measured on 2026-04-19 against a staging frontend backed by Lunexa production infrastructure. TTFB = time from request sent to first byte of response received.
Tail latency, not just averages
Caching helps the worst 10% of requests — the ones your users actually remember. Averages hide outliers; percentiles don't.
Cache off: 1.31 s avg
Cache on: 0.95 s avg
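The gap between averages and percentiles is easy to demonstrate. A small sketch with illustrative latency values (not the measured dataset): when 10% of requests are slow, the average barely moves, but the p90 exposes the tail.

```python
# Why percentiles expose tail latency that averages hide.
# These latencies are illustrative, not the measured dataset.
import statistics

latencies = [0.9] * 90 + [2.1] * 10  # 90% fast requests, 10% slow ones

avg = statistics.mean(latencies)     # pulled up only slightly by the tail
p50 = statistics.median(latencies)   # ignores the tail entirely
p90 = statistics.quantiles(latencies, n=10)[-1]  # 90th percentile: sees it

print(f"avg={avg:.2f}s  p50={p50:.2f}s  p90={p90:.2f}s")
```

Here the median reports 0.9 s and the average about 1.0 s, while the p90 lands near 2 s: the experience of the worst tenth of your users.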
What the cache actually does
Exact-query cache
Identical search payloads (same query, same filters, same facets) skip the search engine and return the previous result within TTL.
Per-collection TTL
Each collection has its own cache_ttl_seconds — product catalogs that change nightly can use 1 hour; fast-moving news indexes 30 seconds.
Natural-Language cache
NL queries have a separate, typically longer TTL (nl_cache_ttl_seconds) because the embedding call is the dominant cost. Default 24 hours.
Per-user key isolation
API-key-scoped caches prevent one tenant's cache state from affecting another's, even when keys sit on top of the same collection.
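The exact-query behavior described above can be sketched as a small TTL cache keyed by a canonical hash of the search payload. The class name, key derivation, and structure below are illustrative assumptions for this sketch, not Lunexa's actual implementation:

```python
# Minimal sketch of an exact-query TTL cache (illustrative, not
# Lunexa's implementation). Identical payloads -- same query, same
# filters, same facets -- hash to the same key regardless of dict
# ordering, so repeat searches skip the search engine within the TTL.
import hashlib
import json
import time


class QueryCache:
    def __init__(self, ttl_seconds: float = 120):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, result)

    def _key(self, payload: dict) -> str:
        # sort_keys gives a canonical serialization, so key order
        # in the payload does not produce distinct cache entries.
        canonical = json.dumps(payload, sort_keys=True)
        return hashlib.sha256(canonical.encode()).hexdigest()

    def get(self, payload: dict):
        entry = self._store.get(self._key(payload))
        if entry and entry[0] > time.monotonic():
            return entry[1]  # hit: return the previous result
        return None          # miss or expired: caller runs the search

    def put(self, payload: dict, result):
        self._store[self._key(payload)] = (time.monotonic() + self.ttl, result)
```

Per-collection TTLs fall out naturally: instantiate one cache per collection with its own `ttl_seconds`, and a longer-lived instance for NL queries where the embedding call dominates cost.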
Enable it in under a minute
1. Open your collection settings. Dashboard → your project → Collections → pick the collection → Settings tab.
2. Toggle Use cache. Set cache_ttl_seconds (the default of 120 s is a sensible starting point) and, if you use NL Search, nl_cache_ttl_seconds (default 86400 s).
3. Save — no deploy required. The change is live on the next request. Roll back any time by toggling it off.
How we measured
Workload: customer product-catalog page (server-rendered, one Lunexa search per request)
Tool: hey (10 concurrent, 10 s duration)
Date: 2026-04-19
Infra: identical between runs; only cache toggle flipped
Sample: 82 requests (off), 108 requests (on)
The frontend page makes a Lunexa search call server-side before rendering, so every page request is a real search request end-to-end. The numbers reflect user-observable latency, not cherry-picked API timings.
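The percentage deltas in the table follow directly from the raw before/after numbers. Recomputing a few of them (note that recomputing from the table's already-rounded values can differ from the headline figures by a tenth of a percent):

```python
# Recompute the table's improvement percentages from its raw numbers.
def pct_change(before: float, after: float) -> float:
    return (after - before) / before * 100

throughput = pct_change(7.62, 10.42)   # req/s, higher is better
p50 = pct_change(1.220, 0.913)         # seconds, lower is better
p95 = pct_change(2.057, 1.652)

print(f"throughput {throughput:+.1f}%  p50 {p50:+.1f}%  p95 {p95:+.1f}%")
```

The p50 and p95 deltas reproduce the table exactly (−25.2% and −19.7%); throughput comes out at +36.7% from the rounded req/s figures versus the table's +36.8% from the unrounded measurements.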
Try it on your workload
Every plan includes the cache layer. Turn it on per collection and keep the TTL that makes sense for your data.
Start free