chore: sync changes; fix Ruff import order; update examples, benchmarks, and dependencies

- Fix import order in packages/leann-backend-hnsw/leann_backend_hnsw/hnsw_backend.py (Ruff I001)
- Update benchmarks/run_evaluation.py
- Update apps/base_rag_example.py and leann-core API usage
- Add benchmarks/data/README.md
- Update uv.lock
- Misc cleanup
- Note: paru-bin was added as an embedded git repository. If that was unintended, remove it from the index with git rm --cached paru-bin; if it should stay, add it as a submodule instead.
2025-08-18 15:49:16 -07:00
parent be405a5851
commit 0d232021f9
6 changed files with 3630 additions and 3774 deletions
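The `run_evaluation.py` diff in this commit keeps recall computed from passage text (`new_texts = {result.text for result in new_results}`) rather than from passage IDs. The rest of that calculation falls outside the diff, so the following is only a sketch of one plausible shape: recall as the fraction of ground-truth passage texts that appear among the retrieved texts. `text_recall` is a hypothetical helper, and returning `0.0` for a query with no ground truth is a choice made here, not something taken from the script.

```python
from typing import Iterable


def text_recall(retrieved_texts: Iterable[str], ground_truth_texts: Iterable[str]) -> float:
    """Fraction of ground-truth passages whose text appears among the retrieved texts."""
    gt = set(ground_truth_texts)
    if not gt:
        # Assumption: a query with no ground truth counts as 0.0 recall.
        return 0.0
    # Compare by text content, so duplicate retrieved passages are counted once.
    return len(gt & set(retrieved_texts)) / len(gt)


# Hypothetical per-query data: top-3 search results vs. top-3 ground truth.
retrieved = ["passage A", "passage B", "passage X"]
ground_truth = ["passage A", "passage B", "passage C"]
print(f"recall@3 = {text_recall(retrieved, ground_truth):.3f}")  # prints "recall@3 = 0.667"
```

Averaging `text_recall` over all evaluated queries would give a mean recall@k for a given `--top-k` and `--ef-search` setting.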
--- a/benchmarks/data/README.md
+++ b/benchmarks/data/README.md
@@ -0,0 +1,44 @@
+---
+license: mit
+---
+
+# LEANN-RAG Evaluation Data
+
+This repository contains the necessary data to run the recall evaluation scripts for the [LEANN-RAG](https://huggingface.co/LEANN-RAG) project.
+
+## Dataset Components
+
+This dataset is structured into three main parts:
+
+1.  **Pre-built LEANN Indices**:
+    These indices were created with the `leann-core` library and are required by `LeannSearcher`.
+    *   `dpr/`: A pre-built index for the DPR dataset.
+    *   `rpj_wiki/`: A pre-built index for the RPJ-Wiki dataset.
+
+2.  **Ground Truth Data**:
+    *   `ground_truth/`: Contains the ground-truth files (`flat_results_nq_k3.json`) for both the DPR and RPJ-Wiki datasets. These files map each Natural Questions query to its original passage IDs, as computed with the Contriever model.
+
+3.  **Queries**:
+    *   `queries/`: Contains the `nq_open.jsonl` file with the Natural Questions queries used for the evaluation.
+
+## Usage
+
+To use this data, you can download it locally using the `huggingface-hub` library. First, install the library:
+
+```bash
+pip install huggingface-hub
+```
+
+Then, you can download the entire dataset to a local directory (e.g., `data/`) with the following Python script:
+
+```python
+from huggingface_hub import snapshot_download
+
+snapshot_download(
+    repo_id="LEANN-RAG/leann-rag-evaluation-data",
+    repo_type="dataset",
+    local_dir="data"
+)
+```
+
+This will download all the necessary files into a local `data` folder, preserving the repository structure. The evaluation scripts in the main [LEANN-RAG Space](https://huggingface.co/LEANN-RAG) are configured to work with this data structure.
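After `snapshot_download` finishes, a quick sanity check can confirm that the top-level entries the dataset README lists actually exist under `data/`. This is a minimal sketch: `EXPECTED_ENTRIES` and `missing_entries` are hypothetical helpers, not part of `leann-core` or the evaluation scripts, and the file layout inside `ground_truth/` is not checked because the README does not spell it out.

```python
from pathlib import Path

# Top-level entries named in the dataset README. The file layout inside
# ground_truth/ is not spelled out there, so only the directory is checked.
EXPECTED_ENTRIES = (
    "dpr",
    "rpj_wiki",
    "ground_truth",
    "queries/nq_open.jsonl",
)


def missing_entries(data_root: Path) -> list[str]:
    """Return the expected entries that do not exist under data_root."""
    return [entry for entry in EXPECTED_ENTRIES if not (data_root / entry).exists()]


if __name__ == "__main__":
    # "data" matches the local_dir used in the snapshot_download example.
    missing = missing_entries(Path("data"))
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected entries found.")
```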
--- a/benchmarks/run_evaluation.py
+++ b/benchmarks/run_evaluation.py
@@ -12,7 +12,7 @@ import time
 from pathlib import Path

 import numpy as np
-from leann.api import LeannBuilder, LeannSearcher
+from leann.api import LeannBuilder, LeannChat, LeannSearcher


 def download_data_if_needed(data_root: Path, download_embeddings: bool = False):
@@ -197,6 +197,19 @@ def main():
    parser.add_argument(
        "--ef-search", type=int, default=120, help="The 'efSearch' parameter for HNSW."
    )
+    parser.add_argument(
+        "--llm-type",
+        type=str,
+        choices=["ollama", "hf", "openai", "gemini", "simulated"],
+        default="ollama",
+        help="LLM backend type to optionally query during evaluation (default: ollama)",
+    )
+    parser.add_argument(
+        "--llm-model",
+        type=str,
+        default="qwen3:1.7b",
+        help="LLM model identifier for the chosen backend (default: qwen3:1.7b)",
+    )
    args = parser.parse_args()

    # --- Path Configuration ---
@@ -318,9 +331,14 @@ def main():

        for i in range(num_eval_queries):
            start_time = time.time()
-            new_results = searcher.search(queries[i], top_k=args.top_k, ef=args.ef_search)
+            new_results = searcher.search(queries[i], top_k=args.top_k, complexity=args.ef_search)
            search_times.append(time.time() - start_time)

+            # Optional: also call the LLM with configurable backend/model (does not affect recall)
+            llm_config = {"type": args.llm_type, "model": args.llm_model}
+            chat = LeannChat(args.index_path, llm_config=llm_config, searcher=searcher)
+            answer = chat.ask(queries[i], top_k=args.top_k, complexity=args.ef_search)
+            print(f"Answer: {answer}")
            # Correct Recall Calculation: Based on TEXT content
            new_texts = {result.text for result in new_results}