Benchmark

Caching that delivers 37% more throughput

Real-world load test against a production tours page backed by Lunexa. One toggle, measurable gains across latency, throughput, and tail percentiles — no code changes.

More throughput

+37%

7.62 → 10.42 req/s

Faster median

−25%

1.22 s → 0.91 s

Tighter p95

−20%

2.06 s → 1.65 s

Server TTFB

−26%

−157 ms avg

Before vs. after the cache

10 concurrent workers hitting the same endpoint for 10 seconds. All other infrastructure held constant — only the Lunexa collection cache toggle changed between runs.

Metric                 Cache off   Cache on   Improvement
Requests completed     82          108        +31.7%
Throughput (req/s)     7.62        10.42      +36.8%
Average latency        1.308 s     0.948 s    −27.5%
Median (p50)           1.220 s     0.913 s    −25.2%
p90 latency            2.035 s     1.602 s    −21.3%
p95 latency            2.057 s     1.652 s    −19.7%
Fastest response       0.528 s     0.308 s    −41.7%
Server TTFB (avg)      0.597 s     0.440 s    −26.3%

Measured on 2026-04-19 against a staging frontend backed by Lunexa production infrastructure. TTFB = time from request sent to first byte of response received.
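The headline percentages follow directly from the raw counts in the table. A quick sketch of the arithmetic (small last-digit differences against the table come from the table's own rounded inputs):

```python
# Raw numbers from the table above (cache off vs. cache on).
requests_off, requests_on = 82, 108
throughput_off, throughput_on = 7.62, 10.42
p50_off, p50_on = 1.220, 0.913

def pct_change(before, after):
    """Signed percentage change from before to after."""
    return (after - before) / before * 100

print(f"Requests completed: {pct_change(requests_off, requests_on):+.1f}%")
print(f"Throughput:         {pct_change(throughput_off, throughput_on):+.1f}%")
print(f"Median latency:     {pct_change(p50_off, p50_on):+.1f}%")
```

Note that throughput rises more than request count because `hey` reports requests per second over the actual elapsed time, which can differ slightly from the nominal 10 s window.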

Tail latency, not just averages

Caching helps the worst 10% of requests — the ones your users actually remember. Averages hide outliers; percentiles don't.

Cache off

avg 1.31 s · p50 1.22 s · p75 1.41 s · p90 2.04 s · p95 2.06 s

Cache on

avg 0.95 s · p50 0.91 s · p75 1.19 s · p90 1.60 s · p95 1.65 s

What the cache actually does

Exact-query cache

Identical search payloads (same query, same filters, same facets) skip the search engine and return the previous result within TTL.

Per-collection TTL

Each collection has its own cache_ttl_seconds — product catalogs that change nightly can use 1 hour; fast-moving news indexes 30 seconds.

Natural-Language cache

NL queries have a separate, typically longer TTL (nl_cache_ttl_seconds) because the embedding call is the dominant cost. Default 24 hours.

Per-user key isolation

API-key-scoped caches prevent one tenant's cache state from affecting another's, even when keys sit on top of the same collection.
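The first and last properties above can be combined in one structure: a TTL cache keyed by a hash of the API key plus a canonicalized search payload. This is a minimal illustrative sketch, not Lunexa's actual implementation; class and method names are invented for the example:

```python
import hashlib
import json
import time

class ExactQueryCache:
    """Illustrative TTL cache keyed by API key + canonical search payload."""

    def __init__(self, ttl_seconds=120):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, cached result)

    def _key(self, api_key, payload):
        # Canonicalize the payload so logically identical queries
        # (same query, same filters, same facets) hash to the same key,
        # and include the API key so tenants never share entries.
        canonical = json.dumps(payload, sort_keys=True)
        return hashlib.sha256(f"{api_key}:{canonical}".encode()).hexdigest()

    def get(self, api_key, payload):
        entry = self._store.get(self._key(api_key, payload))
        if entry and entry[0] > time.time():
            return entry[1]  # fresh hit: skip the search engine entirely
        return None          # miss or expired: caller runs the real search

    def put(self, api_key, payload, result):
        self._store[self._key(api_key, payload)] = (time.time() + self.ttl, result)
```

Because the key is built from the sorted payload, `{"q": "tours", "filters": …}` and `{"filters": …, "q": "tours"}` hit the same entry, while two different API keys over the same collection never do.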

Enable it in under a minute

  1. Open your collection settings

     Dashboard → your project → Collections → pick the collection → Settings tab.

  2. Toggle Use cache

     Set cache_ttl_seconds (default 120 s is a sensible starting point) and, if you use NL Search, nl_cache_ttl_seconds (default 86400 s).

  3. Save — no deploy required

     The change is live on the next request. Roll back any time by toggling it off.
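If you keep collection settings as configuration, the relevant fields are the two TTLs named above; the boolean field name and surrounding shape are illustrative, not a documented schema:

```json
{
  "use_cache": true,
  "cache_ttl_seconds": 120,
  "nl_cache_ttl_seconds": 86400
}
```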

How we measured

Workload: customer product-catalog page (server-rendered, one Lunexa search per request)

Tool: hey (10 concurrent, 10 s duration)

Date: 2026-04-19

Infra: identical between runs; only cache toggle flipped

Sample: 82 requests (off), 108 requests (on)

The frontend page makes a Lunexa search call server-side before rendering, so every front-end request is a real search request end-to-end. Numbers reflect the user-observable latency, not cherry-picked API timings.
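The run is reproducible with a single `hey` invocation; `-c` sets the concurrent workers and `-z` the duration, and the URL below is a placeholder for your own page:

```
# 10 concurrent workers for 10 seconds, matching the benchmark setup
hey -c 10 -z 10s https://staging.example.com/tours
```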

Try it on your workload

Every plan includes the cache layer. Turn it on per collection and keep the TTL that makes sense for your data.

Start free