
# Update Benchmarks

This directory hosts two benchmark suites that exercise LEANN’s HNSW “update +
search” pipeline under different assumptions:

1. **RNG recompute latency** – measure how random-neighbour pruning and cache
   settings influence incremental `add()` latency when embeddings are fetched
   over the ZMQ embedding server.
2. **Update strategy comparison** – compare a fully sequential update pipeline
   against an offline approach that keeps the graph static and fuses results.

Both suites build a non-compact, `is_recompute=True` index so that new
embeddings are pulled from the embedding server. Index files are written under
`.leann/bench/` by default, and timing results are appended to CSV files for
later plotting.

## Benchmarks

### 1. HNSW RNG Recompute Benchmark

`bench_hnsw_rng_recompute.py` evaluates incremental update latency under four
random-neighbour (RNG) configurations. Each scenario uses the same dataset but
changes the forward / reverse RNG pruning flags and whether the embedding cache
is enabled:

| Scenario name                      | Forward RNG | Reverse RNG | ZMQ embedding cache |
| ---------------------------------- | ----------- | ----------- | ------------------- |
| `baseline`                         | Enabled     | Enabled     | Enabled             |
| `no_cache_baseline`                | Enabled     | Enabled     | **Disabled**        |
| `disable_forward_rng`              | **Disabled**| Enabled     | Enabled             |
| `disable_forward_and_reverse_rng`  | **Disabled**| **Disabled**| Enabled             |
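
The table above can be read as a small configuration matrix. The sketch below encodes it as data; the class name, field names, and helper are illustrative assumptions and are not taken from `bench_hnsw_rng_recompute.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RngScenario:
    """One row of the scenario table (field names are illustrative)."""

    name: str
    forward_rng: bool      # forward RNG pruning enabled?
    reverse_rng: bool      # reverse RNG pruning enabled?
    embedding_cache: bool  # ZMQ embedding cache enabled?


# Same order and values as the table rows.
SCENARIOS = [
    RngScenario("baseline", True, True, True),
    RngScenario("no_cache_baseline", True, True, False),
    RngScenario("disable_forward_rng", False, True, True),
    RngScenario("disable_forward_and_reverse_rng", False, False, True),
]


def changed_settings(scenario, baseline=SCENARIOS[0]):
    """List the settings in which a scenario differs from the baseline."""
    fields = ("forward_rng", "reverse_rng", "embedding_cache")
    return [f for f in fields if getattr(scenario, f) != getattr(baseline, f)]
```

Comparing each scenario against `baseline` this way makes it easy to attribute a latency change to the settings that actually changed.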

For each scenario, the script:
1. (Re)builds an `is_recompute=True` index and writes it to `.leann/bench/`.
2. Starts `leann_backend_hnsw.hnsw_embedding_server` for remote embeddings.
3. Appends the requested updates using the scenario’s RNG flags.
4. Records total time, latency per passage, ZMQ fetch counts, and stage-level
   timings before appending a row to the CSV output.
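
The measurement and bookkeeping in steps 3 and 4 can be sketched roughly as follows. This is a minimal illustration, not the benchmark's code: `timed_adds`, `append_row`, and the row keys are hypothetical, and `add_fn` stands in for whatever callable performs one incremental insert.

```python
import csv
import time
from pathlib import Path


def timed_adds(add_fn, passages):
    """Call add_fn once per passage; return (total seconds, ms per passage)."""
    start = time.perf_counter()
    for passage in passages:
        add_fn(passage)
    total_s = time.perf_counter() - start
    # Guard against an empty update list to avoid dividing by zero.
    ms_per_passage = 1000.0 * total_s / max(len(passages), 1)
    return total_s, ms_per_passage


def append_row(csv_path, row):
    """Append one result row, writing the header only when the file is new."""
    path = Path(csv_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(row))
        if is_new:
            writer.writeheader()
        writer.writerow(row)
```

Appending rather than overwriting keeps earlier runs in the same file, which is what makes later plotting across runs possible.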

**Run:**
```bash
LEANN_HNSW_LOG_PATH=.leann/bench/hnsw_server.log \
LEANN_LOG_LEVEL=INFO \
uv run -m benchmarks.update.bench_hnsw_rng_recompute \
  --runs 1 \
  --index-path .leann/bench/test.leann \
  --initial-files data/PrideandPrejudice.txt \
  --update-files data/huawei_pangu.md \
  --max-initial 300 \
  --max-updates 1 \
  --add-timeout 120
```

**Output:**
- `benchmarks/update/bench_results.csv` – per-scenario timing statistics
  (including ms/passage) for each run.
- `.leann/bench/hnsw_server.log` – detailed ZMQ/server logs (path controlled by
  `LEANN_HNSW_LOG_PATH`).

_The reference CSVs checked into this branch were generated on a workstation with an NVIDIA RTX 4090 GPU; timings will differ on other hardware._

### 2. Sequential vs. Offline Update Benchmark

`bench_update_vs_offline_search.py` compares two end-to-end strategies on the
same dataset:

- **Scenario A – Sequential Update**
  - Start an embedding server.
  - Sequentially call `index.add()`; each call fetches embeddings via ZMQ and
    mutates the HNSW graph.
  - After all inserts, run a search on the updated graph.
  - Metrics recorded: update time (`add_total_s`), post-update search time
    (`search_time_s`), combined total (`total_time_s`), and per-passage
    latency.

- **Scenario B – Offline Embedding + Concurrent Search**
  - Stop Scenario A’s server and start a fresh embedding server.
  - Spawn two threads: one generates embeddings for the new passages offline
    (graph unchanged); the other computes the query embedding and searches the
    existing graph.
  - Merge offline similarities with the graph search results to emulate late
    fusion, then report the merged top‑k preview.
  - Metrics recorded: embedding time (`emb_time_s`), search time
    (`search_time_s`), concurrent makespan (`makespan_s`), and scenario total.
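
Scenario B's two-thread timing and late-fusion merge can be sketched as below. This is an illustration under stated assumptions, not the script's implementation: the function names are hypothetical, hits are assumed to be `(id, score)` pairs, and scores from both sources are assumed to be comparable with higher meaning more similar.

```python
import heapq
import time
from concurrent.futures import ThreadPoolExecutor


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def run_concurrently(embed_fn, search_fn):
    """Run both tasks in parallel threads; time each one and the makespan."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=2) as pool:
        emb_future = pool.submit(timed, embed_fn)
        search_future = pool.submit(timed, search_fn)
        offline_hits, emb_time_s = emb_future.result()
        graph_hits, search_time_s = search_future.result()
    # Makespan is the wall-clock time until both threads have finished.
    makespan_s = time.perf_counter() - start
    return offline_hits, graph_hits, emb_time_s, search_time_s, makespan_s


def fuse_top_k(graph_hits, offline_hits, k):
    """Late fusion: pool (id, score) pairs and keep the k highest scores."""
    best = {}
    for doc_id, score in list(graph_hits) + list(offline_hits):
        if doc_id not in best or score > best[doc_id]:
            best[doc_id] = score
    return heapq.nlargest(k, best.items(), key=lambda item: item[1])
```

Because the two tasks overlap, the makespan is at least as large as the slower of `emb_time_s` and `search_time_s`, but can be well below their sum.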

**Run (both scenarios):**
```bash
uv run -m benchmarks.update.bench_update_vs_offline_search \
  --index-path .leann/bench/offline_vs_update.leann \
  --max-initial 300 \
  --num-updates 1
```

You can pass `--only A` or `--only B` to run a single scenario. The script
prints timing summaries to stdout and appends the results to the CSV file.

**Output:**
- `benchmarks/update/offline_vs_update.csv` – per-scenario timing statistics for
  Scenario A and B.
- Console output includes Scenario B’s merged top‑k preview for quick sanity
  checks.

_The sample results committed here come from runs on an RTX 4090-equipped machine; expect different timings on other GPUs._

### 3. Visualisation

`plot_bench_results.py` combines the RNG benchmark and the update strategy
benchmark into a single two-panel plot.

**Run:**
```bash
uv run -m benchmarks.update.plot_bench_results \
  --csv benchmarks/update/bench_results.csv \
  --csv-right benchmarks/update/offline_vs_update.csv \
  --out benchmarks/update/bench_latency_from_csv.png
```

**Options:**
- `--broken-y` – Enable a broken Y-axis (default: true when appropriate).
- `--csv` – RNG benchmark results CSV (left panel).
- `--csv-right` – Update strategy results CSV (right panel).
- `--out` – Output image path (PNG/PDF supported).

**Output:**
- `benchmarks/update/bench_latency_from_csv.png` – visual comparison of the two
  suites.
- `benchmarks/update/bench_latency_from_csv.pdf` – PDF version, suitable for
  slides/papers.
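
If you want to inspect the CSVs without plotting, repeated runs can be averaged per scenario with the standard library alone. The sketch below is independent of `plot_bench_results.py`; the column names `scenario` and `ms_per_passage` are assumptions, so check the actual CSV header before using it.

```python
import csv
from collections import defaultdict
from statistics import mean


def load_rows(csv_path):
    """Read a results CSV into a list of dicts keyed by column name."""
    with open(csv_path, newline="") as fh:
        return list(csv.DictReader(fh))


def mean_by_scenario(rows, key="scenario", value="ms_per_passage"):
    """Average a numeric column per scenario (column names are assumed)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(float(row[value]))
    return {name: mean(values) for name, values in grouped.items()}
```

Calling `mean_by_scenario(load_rows("benchmarks/update/bench_results.csv"))` would then return one averaged value per scenario, provided the assumed columns exist.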

## Parameters & Environment

### Common CLI Flags
- `--max-initial` – Number of initial passages used to seed the index.
- `--max-updates` (RNG benchmark) / `--num-updates` (update strategy
  benchmark) – Number of passages to treat as updates.
- `--index-path` – Base path (without extension) where the LEANN index is stored.
- `--runs` – Number of repetitions (RNG benchmark only).

### Environment Variables
- `LEANN_HNSW_LOG_PATH` – File to receive embedding-server logs (optional).
- `LEANN_LOG_LEVEL` – Logging verbosity (DEBUG/INFO/WARNING/ERROR).
- `CUDA_VISIBLE_DEVICES` – Set to an empty string to force CPU execution of the
  embedding model.
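
When driving the benchmarks from another Python script, these variables can be assembled into an environment mapping before launching the process. The helper below is a hypothetical convenience, not part of LEANN; only the variable names come from the list above.

```python
import os


def benchmark_env(log_path=None, log_level="INFO", force_cpu=False, base=None):
    """Build an environment mapping for a benchmark subprocess."""
    # Start from the current environment unless an explicit base is given.
    env = dict(os.environ if base is None else base)
    if log_path is not None:
        env["LEANN_HNSW_LOG_PATH"] = log_path
    env["LEANN_LOG_LEVEL"] = log_level
    if force_cpu:
        # An empty value hides all GPUs from CUDA.
        env["CUDA_VISIBLE_DEVICES"] = ""
    return env
```

The result can be passed as `env=` to `subprocess.run`, for example together with the command `["uv", "run", "-m", "benchmarks.update.bench_hnsw_rng_recompute", "--runs", "1"]`.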

With these scripts you can reproduce LEANN’s update benchmarks, compare RNG
configurations, and evaluate whether sequential updates or offline fusion
better matches your latency/accuracy trade-offs.