raw_data/
scaling_out/
scaling_out_old/
sanity_check/
demo/indices/
# .vscode/
*.log
*pycache*
outputs/
*.pkl
*.pdf
*.idx
*.map
.history/
lm_eval.egg-info/
demo/experiment_results/**/*.json
*.jsonl
*.eml
*.emlx
*.json
*.png
!.vscode/*.json
*.sh
*.txt
!CMakeLists.txt
!llms.txt
latency_breakdown*.json
experiment_results/eval_results/diskann/*.json
aws/
.venv/
.cursor/rules/
*.egg-info/
skip_reorder_comparison/
analysis_results/
build/
.cache/
nprobe_logs/
micro/results
micro/contriever-INT8
data/*
!data/2501.14312v1 (1).pdf
!data/2506.08276v1.pdf
!data/PrideandPrejudice.txt
!data/huawei_pangu.md
!data/ground_truth/
!data/indices/
!data/queries/
!data/.gitattributes
*.qdstrm
benchmark_results/
results/
frac_*.png
final_in_*.png
embedding_comparison_results/
*.ind
*.gz
*.fvecs
*.ivecs
*.index
*.bin
*.old

read_graph
analyze_diskann_graph
degree_distribution.png
micro/degree_distribution.png

policy_results_*
results_*/
experiment_results/
.DS_Store

# The above are inherited from old Power RAG repo

# Python-generated files
__pycache__/
*.py[oc]
build/
dist/
wheels/
*.egg-info

# Virtual environments
.venv
.env

test_indices*/
test_*.py
!tests/**
packages/leann-backend-diskann/third_party/DiskANN/_deps/

*.meta.json
*.passages.json
*.npy
*.db
batchtest.py
tests/__pytest_cache__/
tests/__pycache__/
benchmarks/data/

## multi vector
apps/multimodal/vision-based-pdf-multi-vector/multi-vector-colpali-native-weaviate.py

# Ignore all PDFs (keep data exceptions above) and do not track demo PDFs
# If you need to commit a specific demo PDF, remove this negation locally.
# The following line used to force-add a large demo PDF; remove it to satisfy pre-commit:
# !apps/multimodal/vision-based-pdf-multi-vector/pdfs/2004.12832v2.pdf
!apps/multimodal/vision-based-pdf-multi-vector/fig/*

# AUR build directory (Arch Linux)
paru-bin/