docs: Weaken DiskANN emphasis in README

- Change backend description to emphasize HNSW as default
- DiskANN positioned as optional for billion-scale datasets
- Simplify evaluation commands to be more generic
Andy Lee
2025-08-04 17:51:21 -07:00
parent 063c687ff7
commit e872dd1d23

@@ -516,7 +516,7 @@ Options:
 - **Dynamic batching:** Efficiently batch embedding computations for GPU utilization
 - **Two-level search:** Smart graph traversal that prioritizes promising nodes
-**Backends:** DiskANN or HNSW - pick what works for your data size.
+**Backends:** HNSW (default) for most use cases, with optional DiskANN support for billion-scale datasets.
 ## Benchmarks
@@ -536,8 +536,7 @@ Options:
 ```bash
 uv pip install -e ".[dev]" # Install dev dependencies
-python benchmarks/run_evaluation.py data/indices/dpr/dpr_diskann # DPR dataset
-python benchmarks/run_evaluation.py data/indices/rpj_wiki/rpj_wiki.index # Wikipedia
+python benchmarks/run_evaluation.py # Will auto-download evaluation data and run benchmarks
 ```
 The evaluation script downloads data automatically on first run. The last three results were tested with partial personal data, and you can reproduce them with your own data!