License: MIT

LEANN-RAG Evaluation Data

This repository contains the data needed to run the recall evaluation scripts for the LEANN-RAG project.

Dataset Components

This dataset is structured into three main parts:

  1. Pre-built LEANN Indices:

    • dpr/: A pre-built index for the DPR dataset.
    • rpj_wiki/: A pre-built index for the RPJ-Wiki dataset.

    These indices were created using the leann-core library and are required by the LeannSearcher.
  2. Ground Truth Data:

    • ground_truth/: Contains the ground truth files (flat_results_nq_k3.json) for both the DPR and RPJ-Wiki datasets. These files map queries to the original passage IDs from the Natural Questions benchmark, evaluated using the Contriever model.
  3. Queries:

    • queries/: Contains the nq_open.jsonl file with the Natural Questions queries used for the evaluation.
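
To illustrate how the queries and ground truth fit together, here is a minimal sketch of reading a JSONL file and scoring one query's results. It is not the project's evaluation code. The helper names read_jsonl and recall_at_k are hypothetical, the default k=3 is only inferred from the "k3" in the ground truth filename, and the recall definition (share of ground-truth IDs found in the top k) is one common choice that may differ from what the evaluation scripts compute. The field names inside nq_open.jsonl and the layout of flat_results_nq_k3.json are not documented here, so the sketch does not assume them.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Return one parsed JSON object per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def recall_at_k(retrieved_ids, truth_ids, k=3):
    """Fraction of the ground-truth IDs that appear in the top-k retrieved IDs.

    Assumption: this is a generic recall definition, not necessarily the
    one used by the LEANN-RAG evaluation scripts.
    """
    truth = set(truth_ids)
    if not truth:
        return 0.0
    hits = truth & set(retrieved_ids[:k])
    return len(hits) / len(truth)
```

After downloading the data, read_jsonl("data/queries/nq_open.jsonl") would return the queries as a list of dictionaries. Inspect the first entry to see which keys the file actually uses.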

Usage

To use this data, download it locally with the huggingface-hub library. First, install the library:

pip install huggingface-hub

Then, you can download the entire dataset to a local directory (e.g., data/) with the following Python script:

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="LEANN-RAG/leann-rag-evaluation-data",
    repo_type="dataset",
    local_dir="data"
)

This will download all the necessary files into a local data folder, preserving the repository structure. The evaluation scripts in the main LEANN-RAG Space are configured to work with this data structure.
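
As a quick sanity check after the download, the sketch below reports which of the components listed under "Dataset Components" are missing from the local folder. The function name missing_entries and the EXPECTED tuple are hypothetical helpers, not part of leann-core or the evaluation scripts. Only the top-level directories and queries/nq_open.jsonl are checked, because the exact location of the files inside dpr/, rpj_wiki/, and ground_truth/ is not specified here.

```python
from pathlib import Path

# Entries named in the "Dataset Components" section, relative to the data folder.
EXPECTED = ("dpr", "rpj_wiki", "ground_truth", "queries/nq_open.jsonl")


def missing_entries(data_dir="data"):
    """Return the expected relative paths that do not exist under data_dir."""
    root = Path(data_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

Calling missing_entries("data") returns an empty list when every component is in place. Any path it returns points to a part of the download that did not complete.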