LEANN/.gitignore
Yichuan Wang dde2221513 [EXP] Update the benchmark code (#71)
* chore(hnsw): reorder imports to satisfy ruff I001

* chore: sync changes; fix Ruff import order; update examples, benchmarks, and dependencies

- Fix import order in packages/leann-backend-hnsw/leann_backend_hnsw/hnsw_backend.py (Ruff I001)

- Update benchmarks/run_evaluation.py

- Update apps/base_rag_example.py and leann-core API usage

- Add benchmarks/data/README.md

- Update uv.lock

- Misc cleanup

- Note: added paru-bin as an embedded git repo; consider making it a submodule, or remove it (git rm --cached paru-bin) if unintended

* chore: remove unintended embedded repo paru-bin and ignore it

Fix CI: avoid missing .gitmodules entry by removing gitlink and adding to .gitignore.

* ci: retrigger after removing unintended gitlink (paru-bin)

* feat(benchmarks): add --batch-size option and plumb through to HNSW search (default 0)

* feat(hnsw): add batch_size to LeannSearcher.search and LeannChat.ask; forward only for HNSW backend

* chore(logging): surface recompute and batching params; enable INFO logging in benchmark

* feat(embeddings): add optional manual tokenization path (HF tokenizer+model) with mean pooling; default remains SentenceTransformer.encode

* fix micro bench and pre-commit

* update README

---------

Co-authored-by: yichuan-w <yichuan-w@users.noreply.github.com>
2025-08-20 17:31:46 -07:00

raw_data/
scaling_out/
scaling_out_old/
sanity_check/
demo/indices/
# .vscode/
*.log
*pycache*
outputs/
*.pkl
*.pdf
*.idx
*.map
.history/
lm_eval.egg-info/
demo/experiment_results/**/*.json
*.jsonl
*.eml
*.emlx
*.json
!.vscode/*.json
*.sh
*.txt
!CMakeLists.txt
latency_breakdown*.json
experiment_results/eval_results/diskann/*.json
aws/
.venv/
.cursor/rules/
*.egg-info/
skip_reorder_comparison/
analysis_results/
build/
.cache/
nprobe_logs/
micro/results
micro/contriever-INT8
data/*
!data/2501.14312v1 (1).pdf
!data/2506.08276v1.pdf
!data/PrideandPrejudice.txt
!data/huawei_pangu.md
!data/ground_truth/
!data/indices/
!data/queries/
!data/.gitattributes
*.qdstrm
benchmark_results/
results/
frac_*.png
final_in_*.png
embedding_comparison_results/
*.ind
*.gz
*.fvecs
*.ivecs
*.index
*.bin
*.old
read_graph
analyze_diskann_graph
degree_distribution.png
micro/degree_distribution.png
policy_results_*
results_*/
experiment_results/
.DS_Store
# The entries above are inherited from the old Power RAG repo
# Python-generated files
__pycache__/
*.py[oc]
build/
dist/
wheels/
*.egg-info
# Virtual environments
.venv
.env
test_indices*/
test_*.py
!tests/**
packages/leann-backend-diskann/third_party/DiskANN/_deps/
*.meta.json
*.passages.json
batchtest.py
tests/__pytest_cache__/
tests/__pycache__/
paru-bin/
benchmarks/data/