Andy Lee d4f5f2896f Faster Update (#148)
Co-authored-by: yichuan-w <yichuan-w@users.noreply.github.com>
2025-11-05 13:37:47 -08:00

Update Benchmarks

This directory hosts two benchmark suites that exercise LEANN's HNSW "update + search" pipeline under different assumptions:

  1. RNG recompute latency: measure how random-neighbour pruning and cache settings influence incremental add() latency when embeddings are fetched over the ZMQ embedding server.
  2. Update strategy comparison: compare a fully sequential update pipeline against an offline approach that keeps the graph static and fuses results.

Both suites build a non-compact, is_recompute=True index so that new embeddings are pulled from the embedding server. Benchmark outputs are written under .leann/bench/ by default and appended to CSV files for later plotting.

Benchmarks

1. HNSW RNG Recompute Benchmark

bench_hnsw_rng_recompute.py evaluates incremental update latency under four random-neighbour (RNG) configurations. Each scenario uses the same dataset but changes the forward / reverse RNG pruning flags and whether the embedding cache is enabled:

Scenario name                    Forward RNG  Reverse RNG  ZMQ embedding cache
baseline                         Enabled      Enabled      Enabled
no_cache_baseline                Enabled      Enabled      Disabled
disable_forward_rng              Disabled     Enabled      Enabled
disable_forward_and_reverse_rng  Disabled     Disabled     Enabled
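
The scenario matrix above can be written down as data. The following is a minimal sketch, not the benchmark script's actual code: the Scenario class and its field names are hypothetical, and only the four scenario names and their enabled/disabled settings come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One row of the scenario table (hypothetical container, not LEANN's API)."""
    name: str
    forward_rng: bool      # forward RNG pruning enabled?
    reverse_rng: bool      # reverse RNG pruning enabled?
    embedding_cache: bool  # ZMQ embedding cache enabled?


# Values copied from the table; True means "Enabled".
SCENARIOS = [
    Scenario("baseline", True, True, True),
    Scenario("no_cache_baseline", True, True, False),
    Scenario("disable_forward_rng", False, True, True),
    Scenario("disable_forward_and_reverse_rng", False, False, True),
]

for s in SCENARIOS:
    print(f"{s.name}: forward={s.forward_rng} reverse={s.reverse_rng} cache={s.embedding_cache}")
```

Each scenario differs from baseline in one or two settings, which makes it possible to attribute a latency change to the cache or to a specific pruning direction.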

For each scenario the script:

  1. (Re)builds an is_recompute=True index and writes it to .leann/bench/.
  2. Starts leann_backend_hnsw.hnsw_embedding_server for remote embeddings.
  3. Appends the requested updates using the scenario's RNG flags.
  4. Records total time, latency per passage, ZMQ fetch counts, and stage-level timings before appending a row to the CSV output.
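
The measurement and CSV steps can be sketched as follows. This is an illustration under stated assumptions, not the script's implementation: time_updates and append_row are hypothetical helpers, add_fn stands in for whatever performs one incremental insert, and the row keys are up to the caller.

```python
import csv
import time
from pathlib import Path


def time_updates(add_fn, passages):
    """Call add_fn once per passage; return (total seconds, ms per passage)."""
    start = time.perf_counter()
    for passage in passages:
        add_fn(passage)
    total_s = time.perf_counter() - start
    ms_per_passage = 1000.0 * total_s / max(len(passages), 1)
    return total_s, ms_per_passage


def append_row(csv_path, row):
    """Append one result row, writing the header only if the file is new or empty."""
    path = Path(csv_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

Appending rather than overwriting lets repeated runs accumulate in one file, which is what later plotting relies on.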

Run:

LEANN_HNSW_LOG_PATH=.leann/bench/hnsw_server.log \
LEANN_LOG_LEVEL=INFO \
uv run -m benchmarks.update.bench_hnsw_rng_recompute \
  --runs 1 \
  --index-path .leann/bench/test.leann \
  --initial-files data/PrideandPrejudice.txt \
  --update-files data/huawei_pangu.md \
  --max-initial 300 \
  --max-updates 1 \
  --add-timeout 120

Output:

  • benchmarks/update/bench_results.csv: per-scenario timing statistics (including ms/passage) for each run.
  • .leann/bench/hnsw_server.log: detailed ZMQ/server logs (path controlled by LEANN_HNSW_LOG_PATH).

The reference CSVs checked into this branch were generated on a workstation with an NVIDIA RTX 4090 GPU; throughput numbers will differ on other hardware.
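
Because rows from repeated runs accumulate in the CSV, a per-scenario average is a quick way to compare them. The sketch below uses only the standard library; the column names scenario and ms_per_passage are assumptions for illustration, so check the header of your own bench_results.csv and pass the matching names.

```python
import csv
import statistics
from collections import defaultdict


def mean_by_scenario(csv_path, key="scenario", value="ms_per_passage"):
    """Return {scenario: mean of the value column} over all rows in the CSV.

    The default column names are assumptions, not the script's documented schema.
    """
    groups = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            groups[row[key]].append(float(row[value]))
    return {name: statistics.fmean(vals) for name, vals in groups.items()}
```

For example, mean_by_scenario("benchmarks/update/bench_results.csv") returns one average per scenario name, provided the file has columns with those names.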

2. Sequential vs. Offline Update Benchmark

bench_update_vs_offline_search.py compares two end-to-end strategies on the same dataset:

  • Scenario A: Sequential Update

    • Start an embedding server.
    • Sequentially call index.add(); each call fetches embeddings via ZMQ and mutates the HNSW graph.
    • After all inserts, run a search on the updated graph.
    • Metrics recorded: update time (add_total_s), post-update search time (search_time_s), combined total (total_time_s), and per-passage latency.
  • Scenario B: Offline Embedding + Concurrent Search

    • Stop Scenario A's server and start a fresh embedding server.
    • Spawn two threads: one generates embeddings for the new passages offline (graph unchanged); the other computes the query embedding and searches the existing graph.
    • Merge offline similarities with the graph search results to emulate late fusion, then report the merged top-k preview.
    • Metrics recorded: embedding time (emb_time_s), search time (search_time_s), concurrent makespan (makespan_s), and scenario total.
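
Scenario B's two ingredients, the concurrent makespan and the late-fusion merge, can be sketched as below. This is a simplified illustration, not the benchmark's code: embed_fn and search_fn are placeholder callables, both are assumed to return (id, score) pairs where a higher score means more similar, and merge_topk is a hypothetical helper.

```python
import heapq
import time
from concurrent.futures import ThreadPoolExecutor


def timed(fn):
    """Run fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def run_concurrently(embed_fn, search_fn):
    """Run both tasks in parallel threads; report per-task times and the makespan."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=2) as pool:
        emb_future = pool.submit(timed, embed_fn)
        search_future = pool.submit(timed, search_fn)
        offline_hits, emb_time_s = emb_future.result()
        graph_hits, search_time_s = search_future.result()
    makespan_s = time.perf_counter() - start  # wall-clock time for both tasks together
    return offline_hits, graph_hits, emb_time_s, search_time_s, makespan_s


def merge_topk(graph_hits, offline_hits, k):
    """Late fusion: pool (id, score) pairs, keep the best score per id, return the top k."""
    best = {}
    for doc_id, score in list(graph_hits) + list(offline_hits):
        if doc_id not in best or score > best[doc_id]:
            best[doc_id] = score
    return heapq.nlargest(k, best.items(), key=lambda item: item[1])
```

When the two tasks overlap well, the makespan approaches the slower of the two task times rather than their sum, which is the effect this scenario measures against sequential updates.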

Run (both scenarios):

uv run -m benchmarks.update.bench_update_vs_offline_search \
  --index-path .leann/bench/offline_vs_update.leann \
  --max-initial 300 \
  --num-updates 1

You can pass --only A or --only B to run a single scenario. The script prints timing summaries to stdout and appends the results to the CSV file.

Output:

  • benchmarks/update/offline_vs_update.csv: per-scenario timing statistics for Scenarios A and B.
  • Console output includes Scenario B's merged top-k preview for quick sanity checks.

The sample results committed here come from runs on an RTX 4090-equipped machine; expect variations if you benchmark on different GPUs.

3. Visualisation

plot_bench_results.py combines the RNG benchmark and the update strategy benchmark into a single two-panel plot.

Run:

uv run -m benchmarks.update.plot_bench_results \
  --csv benchmarks/update/bench_results.csv \
  --csv-right benchmarks/update/offline_vs_update.csv \
  --out benchmarks/update/bench_latency_from_csv.png

Options:

  • --broken-y: enable a broken Y-axis (default: true when appropriate).
  • --csv: RNG benchmark results CSV (left panel).
  • --csv-right: update strategy results CSV (right panel).
  • --out: output image path (PNG/PDF supported).

Output:

  • benchmarks/update/bench_latency_from_csv.png: visual comparison of the two suites.
  • benchmarks/update/bench_latency_from_csv.pdf: PDF version, suitable for slides/papers.

Parameters & Environment

Common CLI Flags

  • --max-initial: number of initial passages used to seed the index.
  • --max-updates / --num-updates: number of passages to treat as updates.
  • --index-path: base path (without extension) where the LEANN index is stored.
  • --runs: number of repetitions (RNG benchmark only).

Environment Variables

  • LEANN_HNSW_LOG_PATH: file to receive embedding-server logs (optional).
  • LEANN_LOG_LEVEL: logging verbosity (DEBUG/INFO/WARNING/ERROR).
  • CUDA_VISIBLE_DEVICES: set to an empty string to force CPU execution of the embedding model.
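
If you launch the benchmarks from a Python driver rather than a shell, the variables above can be assembled into an environment mapping. A minimal sketch: benchmark_env is a hypothetical helper, and only the three variable names come from this README.

```python
import os


def benchmark_env(log_path=None, log_level="INFO", force_cpu=False, base=None):
    """Return a copy of the environment with the benchmark variables applied."""
    env = dict(os.environ if base is None else base)
    if log_path is not None:
        env["LEANN_HNSW_LOG_PATH"] = log_path
    env["LEANN_LOG_LEVEL"] = log_level
    if force_cpu:
        # An empty value hides all CUDA devices from the child process.
        env["CUDA_VISIBLE_DEVICES"] = ""
    return env
```

The returned mapping can be passed as the env argument of subprocess.run when invoking one of the uv run commands shown earlier.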

With these scripts you can replicate LEANN's update benchmarks, compare multiple RNG strategies, and evaluate whether sequential updates or offline fusion better matches your latency/accuracy trade-offs.