Benchmark

Caching that delivers 37% more throughput

Real-world load test against a production tours page backed by Lunexa. One toggle, measurable gains across latency, throughput, and tail percentiles — no code changes.

More throughput

+37%

7.62 → 10.42 req/s

Faster median

−25%

1.22 s → 0.91 s

Tighter p95

−20%

2.06 s → 1.65 s

Server TTFB

−26%

−157 ms avg

Before vs. after the cache

10 concurrent workers hitting the same endpoint for 10 seconds. All other infrastructure held constant — only the Lunexa collection cache toggle changed between runs.

Metric                 Cache off   Cache on   Improvement
Requests completed     82          108        +31.7%
Throughput (req/s)     7.62        10.42      +36.8%
Average latency        1.308 s     0.948 s    −27.5%
Median (p50)           1.220 s     0.913 s    −25.2%
p90 latency            2.035 s     1.602 s    −21.3%
p95 latency            2.057 s     1.652 s    −19.7%
Fastest response       0.528 s     0.308 s    −41.7%
Server TTFB (avg)      0.597 s     0.440 s    −26.3%

Measured on 2026-04-19 against a staging frontend backed by Lunexa production infrastructure. TTFB = time from request sent to first byte of response received.
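The headline percentages follow directly from the raw counts in the table. A quick sketch of the arithmetic (small last-digit differences against the table come from the table's own rounded inputs):

```python
# Raw numbers from the table above (cache off vs. cache on).
requests_off, requests_on = 82, 108
throughput_off, throughput_on = 7.62, 10.42
p50_off, p50_on = 1.220, 0.913

def pct_change(before, after):
    """Signed percentage change from before to after."""
    return (after - before) / before * 100

print(f"Requests completed: {pct_change(requests_off, requests_on):+.1f}%")
print(f"Throughput:         {pct_change(throughput_off, throughput_on):+.1f}%")
print(f"Median latency:     {pct_change(p50_off, p50_on):+.1f}%")
```

Note that throughput rises more than request count because `hey` reports requests per second over the actual elapsed time, which can differ slightly from the nominal 10 s window.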

Tail latency, not just averages

Caching helps the worst 10% of requests — the ones your users actually remember. Averages hide outliers; percentiles don't.

Cache off

avg 1.31 s · p50 1.22 s · p75 1.41 s · p90 2.04 s · p95 2.06 s

Cache on

avg 0.95 s · p50 0.91 s · p75 1.19 s · p90 1.60 s · p95 1.65 s

What the cache actually does

Exact-query cache

Identical search payloads (same query, same filters, same facets) skip the search engine and return the previous result within TTL.

Per-collection TTL

Each collection has its own cache_ttl_seconds — product catalogs that change nightly can use 1 hour; fast-moving news indexes 30 seconds.

Natural-Language cache

NL queries have a separate, typically longer TTL (nl_cache_ttl_seconds) because the embedding call is the dominant cost. Default 24 hours.

Per-user key isolation

API-key-scoped caches prevent one tenant's cache state from affecting another's, even when keys sit on top of the same collection.
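The first and last properties above can be combined in one structure: a TTL cache keyed by a hash of the API key plus a canonicalized search payload. This is a minimal illustrative sketch, not Lunexa's actual implementation; class and method names are invented for the example:

```python
import hashlib
import json
import time

class ExactQueryCache:
    """Illustrative TTL cache keyed by API key + canonical search payload."""

    def __init__(self, ttl_seconds=120):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, cached result)

    def _key(self, api_key, payload):
        # Canonicalize the payload so logically identical queries
        # (same query, same filters, same facets) hash to the same key,
        # and include the API key so tenants never share entries.
        canonical = json.dumps(payload, sort_keys=True)
        return hashlib.sha256(f"{api_key}:{canonical}".encode()).hexdigest()

    def get(self, api_key, payload):
        entry = self._store.get(self._key(api_key, payload))
        if entry and entry[0] > time.time():
            return entry[1]  # fresh hit: skip the search engine entirely
        return None          # miss or expired: caller runs the real search

    def put(self, api_key, payload, result):
        self._store[self._key(api_key, payload)] = (time.time() + self.ttl, result)
```

Because the key is built from the sorted payload, `{"q": "tours", "filters": …}` and `{"filters": …, "q": "tours"}` hit the same entry, while two different API keys over the same collection never do.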

Enable it in under a minute

  1. Open your collection settings

     Dashboard → your project → Collections → pick the collection → Settings tab.

  2. Toggle Use cache

     Set cache_ttl_seconds (default 120 s is a sensible starting point) and, if you use NL Search, nl_cache_ttl_seconds (default 86400 s).

  3. Save — no deploy required

     The change is live on the next request. Roll back any time by toggling it off.
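If you keep collection settings as configuration, the relevant fields are the two TTLs named above; the boolean field name and surrounding shape are illustrative, not a documented schema:

```json
{
  "use_cache": true,
  "cache_ttl_seconds": 120,
  "nl_cache_ttl_seconds": 86400
}
```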

How we measured

Workload: customer product-catalog page (server-rendered, one Lunexa search per request)

Tool: hey (10 concurrent, 10 s duration)

Date: 2026-04-19

Infra: identical between runs; only cache toggle flipped

Sample: 82 requests (off), 108 requests (on)

The frontend page makes a Lunexa search call server-side before rendering, so every front-end request is a real search request end-to-end. Numbers reflect the user-observable latency, not cherry-picked API timings.
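The run is reproducible with a single `hey` invocation; `-c` sets the concurrent workers and `-z` the duration, and the URL below is a placeholder for your own page:

```
# 10 concurrent workers for 10 seconds, matching the benchmark setup
hey -c 10 -z 10s https://staging.example.com/tours
```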

Try it on your workload

Every plan includes the cache layer. Turn it on per collection and keep the TTL that makes sense for your data.

Start free