Andy Lee d4f5f2896f Faster Update (#148 )

* stash

* stash

* add std err in add and trace progress

* fix.

* docs

* style: format

* docs

* better figs

* better figs

* update results

* fotmat

---------

Co-authored-by: yichuan-w <yichuan-w@users.noreply.github.com>

2025-11-05 13:37:47 -08:00

__init__.py
bench_hnsw_rng_recompute.py
bench_results.csv
bench_update_vs_offline_search.py
offline_vs_update.csv
plot_bench_results.py
README.md

Update Benchmarks

This directory hosts two benchmark suites that exercise LEANN’s HNSW “update + search” pipeline under different assumptions:

RNG recompute latency – measure how random-neighbour pruning and cache settings influence incremental add() latency when embeddings are fetched over the ZMQ embedding server.
Update strategy comparison – compare a fully sequential update pipeline against an offline approach that keeps the graph static and fuses results.

Both suites build a non-compact, is_recompute=True index so that new embeddings are pulled from the embedding server. Benchmark outputs are written under .leann/bench/ by default and appended to CSV files for later plotting.
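The append-to-CSV behaviour described above can be sketched as follows. This is a minimal illustration, not the scripts' actual code: the helper name `append_row` is hypothetical, and the column names are whatever keys the caller passes.

```python
import csv
from pathlib import Path


def append_row(csv_path, row):
    """Append one result row; write the header only if the file is new or empty.

    `append_row` is a hypothetical helper, not part of the benchmark scripts.
    """
    path = Path(csv_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

Appending rather than overwriting lets repeated runs accumulate in one file, which is what the plotting step later consumes.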

Benchmarks

1. HNSW RNG Recompute Benchmark

bench_hnsw_rng_recompute.py evaluates incremental update latency under four random-neighbour (RNG) configurations. Each scenario uses the same dataset but changes the forward / reverse RNG pruning flags and whether the embedding cache is enabled:

| Scenario name | Forward RNG | Reverse RNG | ZMQ embedding cache |
|---|---|---|---|
| `baseline` | Enabled | Enabled | Enabled |
| `no_cache_baseline` | Enabled | Enabled | Disabled |
| `disable_forward_rng` | Disabled | Enabled | Enabled |
| `disable_forward_and_reverse_rng` | Disabled | Disabled | Enabled |

For each scenario the script:

(Re)builds an is_recompute=True index and writes it to .leann/bench/.
Starts leann_backend_hnsw.hnsw_embedding_server for remote embeddings.
Appends the requested updates using the scenario’s RNG flags.
Records total time, latency per passage, ZMQ fetch counts, and stage-level timings before appending a row to the CSV output.
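One way to picture the four scenarios and the per-scenario loop is the sketch below. The scenario names and flag values come from the scenario table; the `Scenario` class, `run_all`, and the `run_scenario` callback are hypothetical and only stand in for the rebuild / start server / add / record steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    forward_rng: bool   # forward RNG pruning enabled?
    reverse_rng: bool   # reverse RNG pruning enabled?
    zmq_cache: bool     # ZMQ embedding cache enabled?


# Flag values copied from the scenario table in this README.
SCENARIOS = [
    Scenario("baseline", True, True, True),
    Scenario("no_cache_baseline", True, True, False),
    Scenario("disable_forward_rng", False, True, True),
    Scenario("disable_forward_and_reverse_rng", False, False, True),
]


def run_all(run_scenario, runs=1):
    """Call run_scenario(scenario) `runs` times per scenario and collect rows.

    `run_scenario` is a hypothetical callback that would rebuild the index,
    start the embedding server, apply the updates, and return a metrics dict.
    """
    rows = []
    for scenario in SCENARIOS:
        for run_idx in range(runs):
            metrics = run_scenario(scenario)
            rows.append({"scenario": scenario.name, "run": run_idx, **metrics})
    return rows
```

Keeping the scenarios as plain data makes it easy to confirm that every run uses the same dataset and differs only in the three flags.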

Run:

LEANN_HNSW_LOG_PATH=.leann/bench/hnsw_server.log \
LEANN_LOG_LEVEL=INFO \
uv run -m benchmarks.update.bench_hnsw_rng_recompute \
  --runs 1 \
  --index-path .leann/bench/test.leann \
  --initial-files data/PrideandPrejudice.txt \
  --update-files data/huawei_pangu.md \
  --max-initial 300 \
  --max-updates 1 \
  --add-timeout 120

Output:

benchmarks/update/bench_results.csv – per-scenario timing statistics (including ms/passage) for each run.
.leann/bench/hnsw_server.log – detailed ZMQ/server logs (path controlled by LEANN_HNSW_LOG_PATH).

The reference CSVs checked into this branch were generated on a workstation with an NVIDIA RTX 4090 GPU; throughput numbers will differ on other hardware.
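To compare scenarios across repeated runs, the CSV can be averaged per scenario. The sketch below assumes column names `scenario` and `ms_per_passage`; these are guesses, so check the header of bench_results.csv and pass the real names via the parameters.

```python
import csv
from collections import defaultdict
from statistics import mean


def mean_by_scenario(csv_path, value_col="ms_per_passage", key_col="scenario"):
    """Return {scenario: mean(value_col)} over all rows of the CSV.

    The default column names are assumptions, not confirmed by the scripts.
    """
    groups = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            groups[row[key_col]].append(float(row[value_col]))
    return {name: mean(values) for name, values in groups.items()}
```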

2. Sequential vs. Offline Update Benchmark

bench_update_vs_offline_search.py compares two end-to-end strategies on the same dataset:

Scenario A – Sequential Update
- Start an embedding server.
- Sequentially call index.add(); each call fetches embeddings via ZMQ and mutates the HNSW graph.
- After all inserts, run a search on the updated graph.
- Metrics recorded: update time (add_total_s), post-update search time (search_time_s), combined total (total_time_s), and per-passage latency.
Scenario B – Offline Embedding + Concurrent Search
- Stop Scenario A’s server and start a fresh embedding server.
- Spawn two threads: one generates embeddings for the new passages offline (graph unchanged); the other computes the query embedding and searches the existing graph.
- Merge offline similarities with the graph search results to emulate late fusion, then report the merged top‑k preview.
- Metrics recorded: embedding time (emb_time_s), search time (search_time_s), concurrent makespan (makespan_s), and scenario total.
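Scenario B's two-thread layout and late-fusion merge can be sketched roughly as below. This is an illustration under stated assumptions, not the script's implementation: `embed_new`, `search_graph`, and `score_offline` are hypothetical stand-ins, and the merge assumes that graph scores and offline similarities are directly comparable with higher meaning better.

```python
import heapq
import time
from concurrent.futures import ThreadPoolExecutor


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def late_fusion(graph_hits, offline_hits, top_k):
    """Merge (passage_id, score) pairs, keeping the best score per id.

    Assumes higher score = more similar, and comparable score scales.
    """
    best = {}
    for pid, score in list(graph_hits) + list(offline_hits):
        if pid not in best or score > best[pid]:
            best[pid] = score
    return heapq.nlargest(top_k, best.items(), key=lambda item: item[1])


def run_offline_scenario(embed_new, search_graph, score_offline, top_k=5):
    """All three callables are hypothetical stand-ins for the real work."""
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=2) as pool:
        emb_future = pool.submit(timed, embed_new)        # offline embeddings
        search_future = pool.submit(timed, search_graph)  # search existing graph
        new_embeddings, emb_time_s = emb_future.result()
        graph_hits, search_time_s = search_future.result()
    makespan_s = time.perf_counter() - wall_start  # wall time of both threads
    offline_hits = score_offline(new_embeddings)
    merged = late_fusion(graph_hits, offline_hits, top_k)
    metrics = {
        "emb_time_s": emb_time_s,
        "search_time_s": search_time_s,
        "makespan_s": makespan_s,
    }
    return merged, metrics
```

Because the two tasks overlap, makespan_s is bounded below by the slower of the two timings rather than by their sum, which is the quantity to compare against Scenario A's total_time_s.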

Run (both scenarios):

uv run -m benchmarks.update.bench_update_vs_offline_search \
  --index-path .leann/bench/offline_vs_update.leann \
  --max-initial 300 \
  --num-updates 1

Pass --only A or --only B to run a single scenario. The script prints timing summaries to stdout and appends the results to the CSV file.

Output:

benchmarks/update/offline_vs_update.csv – per-scenario timing statistics for Scenario A and B.
Console output includes Scenario B’s merged top‑k preview for quick sanity checks.

The sample results committed here come from runs on an RTX 4090-equipped machine; expect different timings on other GPUs.

3. Visualisation

plot_bench_results.py combines the RNG benchmark and the update strategy benchmark into a single two-panel plot.

Run:

uv run -m benchmarks.update.plot_bench_results \
  --csv benchmarks/update/bench_results.csv \
  --csv-right benchmarks/update/offline_vs_update.csv \
  --out benchmarks/update/bench_latency_from_csv.png

Options:

--broken-y – Enable a broken Y-axis (default: true when appropriate).
--csv – RNG benchmark results CSV (left panel).
--csv-right – Update strategy results CSV (right panel).
--out – Output image path (PNG/PDF supported).

Output:

benchmarks/update/bench_latency_from_csv.png – visual comparison of the two suites.
benchmarks/update/bench_latency_from_csv.pdf – PDF version, suitable for slides/papers.

Parameters & Environment

Common CLI Flags

--max-initial – Number of initial passages used to seed the index.
--max-updates / --num-updates – Number of passages to treat as updates.
--index-path – Base path (without extension) where the LEANN index is stored.
--runs – Number of repetitions (RNG benchmark only).

Environment Variables

LEANN_HNSW_LOG_PATH – File to receive embedding-server logs (optional).
LEANN_LOG_LEVEL – Logging verbosity (DEBUG/INFO/WARNING/ERROR).
CUDA_VISIBLE_DEVICES – Set to empty string if you want to force CPU execution of the embedding model.
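If you drive the benchmarks from Python rather than a shell, the environment variables above can be assembled as in this sketch. The helper `bench_env` is hypothetical; the command list simply mirrors the update-strategy "Run" example from this README.

```python
import os


def bench_env(log_path=None, log_level="INFO", force_cpu=False):
    """Return a copy of os.environ with the benchmark variables applied.

    `bench_env` is a hypothetical helper, not part of the benchmark scripts.
    """
    env = dict(os.environ)
    env["LEANN_LOG_LEVEL"] = log_level
    if log_path is not None:
        env["LEANN_HNSW_LOG_PATH"] = str(log_path)
    if force_cpu:
        env["CUDA_VISIBLE_DEVICES"] = ""  # empty string hides all GPUs
    return env


# Command copied from the Run example for the update-strategy benchmark.
CMD = [
    "uv", "run", "-m", "benchmarks.update.bench_update_vs_offline_search",
    "--index-path", ".leann/bench/offline_vs_update.leann",
    "--max-initial", "300",
    "--num-updates", "1",
]

# To launch (not executed here):
#   import subprocess
#   subprocess.run(CMD, env=bench_env(force_cpu=True), check=True)
```

Building a copy of the environment keeps the settings scoped to the child process instead of mutating the parent's os.environ.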

With these scripts you can easily replicate LEANN’s update benchmarks, compare multiple RNG strategies, and evaluate whether sequential updates or offline fusion better match your latency/accuracy trade-offs.
