Experiments (#68)

* feat: finance bench
* docs: results
* chore: ignore data README
* feat: fix financebench
* feat: laion, also required idmaps support
* style: format
* style: format
* fix: resolve ruff linting errors
  - Remove unused variables in benchmark scripts
  - Rename unused loop variables to follow convention
* feat: enron email bench
* experiments for running DiskANN & BM25 on Arch 4090
* style: format
* chore(ci): remove paru-bin submodule and config to fix checkout --recurse-submodules
* docs: data
* docs: data updated
* fix: as package
* fix(ci): only run pre-commit
* chore: use http url of astchunk; use group for some dev deps
* fix(ci): should checkout modules as well since `uv sync` checks
* fix(ci): run with lint only
* fix: find links to install wheels available
* CI: force local wheels in uv install step
* CI: install local wheels via file paths
* CI: pick wheels matching current Python tag
* CI: handle python tag mismatches for local wheels
* CI: use matrix python venv and set macOS deployment target
* CI: revert install step to match main
* CI: use uv group install with local wheel selection
* CI: rely on setup-uv for Python and tighten group install
* CI: install build deps with uv python interpreter
* CI: use temporary uv venv for build deps
* CI: add build venv scripts path for wheel repair
2025-09-24 11:19:04 -07:00
parent 01475c10a0
commit fecee94af1
30 changed files with 6869 additions and 1439 deletions
--- a/benchmarks/bm25_diskann_baselines/README.md
+++ b/benchmarks/bm25_diskann_baselines/README.md
@@ -0,0 +1,23 @@
+BM25 vs DiskANN Baselines
+
+```bash
+aws s3 sync s3://powerrag-diskann-rpj-wiki-20250824-224037-194d640c/bm25_rpj_wiki/index_en_only/ benchmarks/data/indices/bm25_index/
+aws s3 sync s3://powerrag-diskann-rpj-wiki-20250824-224037-194d640c/diskann_rpj_wiki/ benchmarks/data/indices/diskann_rpj_wiki/
+```
+
+- Dataset: `benchmarks/data/queries/nq_open.jsonl` (Natural Questions)
+- Results are machine-specific: the numbers below were measured locally against the current state of the repo.
+
+DiskANN (NQ queries, search-only)
+- Command: `uv run --script benchmarks/bm25_diskann_baselines/run_diskann.py`
+- Settings: `recompute_embeddings=False`, embeddings precomputed (excluded from timing), batching off, caching off (`cache_mechanism=2`, `num_nodes_to_cache=0`)
+- Result: avg 0.011093 s/query, QPS 90.15 (p50 0.010731 s, p95 0.015000 s)
+
+BM25
+- Command: `uv run --script benchmarks/bm25_diskann_baselines/run_bm25.py`
+- Settings: `k=10`, `k1=0.9`, `b=0.4`, queries=100
+- Result: avg 0.028589 s/query, QPS 34.97 (p50 0.026060 s, p90 0.043695 s, p95 0.053260 s, p99 0.055257 s)
+
+Notes
+- The DiskANN run measures search-only latency on real NQ queries; query embeddings are computed beforehand and excluded from timing.
+- Run `benchmarks/bm25_diskann_baselines/run_diskann.py` for DiskANN and `benchmarks/bm25_diskann_baselines/run_bm25.py` for BM25.
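
Note on the reported metrics: the README gives avg, QPS, and percentile latencies but does not say how they are derived. The sketch below shows one plausible way to compute them from per-query timings. It is not taken from `run_diskann.py` or `run_bm25.py`; the helper names (`time_queries`, `percentile`, `summarize`), the linear-interpolation percentile, and the definition of QPS as query count divided by summed per-query latency are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, List, Sequence


def time_queries(search: Callable[[str], object], queries: Sequence[str]) -> List[float]:
    """Run each query once, sequentially, and record its wall-clock latency in seconds.

    `search` is a hypothetical callable standing in for the real index search.
    Anything done before this loop (e.g. embedding the queries) is excluded from timing.
    """
    latencies = []
    for query in queries:
        start = time.perf_counter()
        search(query)
        latencies.append(time.perf_counter() - start)
    return latencies


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Linearly interpolated percentile (q in [0, 100]) of an already sorted sequence."""
    if not sorted_vals:
        raise ValueError("no samples")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def summarize(latencies: Sequence[float]) -> Dict[str, float]:
    """Summarize per-query latencies as avg, QPS, and p50/p90/p95/p99."""
    vals = sorted(latencies)
    total = sum(vals)
    return {
        "avg": total / len(vals),
        # Assumption: queries run one at a time, so QPS = count / summed latency.
        "qps": len(vals) / total,
        "p50": percentile(vals, 50),
        "p90": percentile(vals, 90),
        "p95": percentile(vals, 95),
        "p99": percentile(vals, 99),
    }
```

Under this definition QPS is simply the reciprocal of the average latency, which matches the DiskANN line (1 / 0.011093 s ≈ 90.15). The BM25 line is off by about 0.01 QPS from that reciprocal, so that script may compute throughput from total wall-clock time instead.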