Yichuan Wang 76cc798e3e Feat/multi vector timing and dataset improvements (#181)
* Add timing instrumentation and multi-dataset support for multi-vector retrieval

- Add timing measurements for search operations (load and core time)
- Increase embedding batch size from 1 to 32 for better performance
- Add explicit memory cleanup with del all_embeddings
- Support loading and merging multiple datasets with different splits
- Add CLI arguments for search method selection (ann/exact/exact-all)
- Auto-detect image field names across different dataset structures
- Print candidate doc counts for performance monitoring

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* update vidore

* reproduce docvqa results

* reproduce docvqa results and add debug file

* fix: format colqwen_forward.py to pass pre-commit checks

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-12-03 01:10:49 -08:00

raw_data/
scaling_out/
scaling_out_old/
sanity_check/
demo/indices/
# .vscode/
*.log
*pycache*
outputs/
*.pkl
*.pdf
*.idx
*.map
.history/
lm_eval.egg-info/
demo/experiment_results/**/*.json
*.jsonl
*.eml
*.emlx
*.json
*.png
!.vscode/*.json
*.sh
*.txt
!CMakeLists.txt
!llms.txt
latency_breakdown*.json
experiment_results/eval_results/diskann/*.json
aws/
.venv/
.cursor/rules/
*.egg-info/
skip_reorder_comparison/
analysis_results/
build/
.cache/
nprobe_logs/
micro/results
micro/contriever-INT8
data/*
!data/2501.14312v1 (1).pdf
!data/2506.08276v1.pdf
!data/PrideandPrejudice.txt
!data/huawei_pangu.md
!data/ground_truth/
!data/indices/
!data/queries/
!data/.gitattributes
*.qdstrm
benchmark_results/
results/
frac_*.png
final_in_*.png
embedding_comparison_results/
*.ind
*.gz
*.fvecs
*.ivecs
*.index
*.bin
*.old
read_graph
analyze_diskann_graph
degree_distribution.png
micro/degree_distribution.png
policy_results_*
results_*/
experiment_results/
.DS_Store
# The patterns above are inherited from the old Power RAG repo
# Python-generated files
__pycache__/
*.py[oc]
build/
dist/
wheels/
*.egg-info
# Virtual environments
.venv
.env
test_indices*/
test_*.py
!tests/**
packages/leann-backend-diskann/third_party/DiskANN/_deps/
*.meta.json
*.passages.json
*.npy
*.db
batchtest.py
tests/__pytest_cache__/
tests/.pytest_cache/
tests/__pycache__/
benchmarks/data/
## multi vector
apps/multimodal/vision-based-pdf-multi-vector/multi-vector-colpali-native-weaviate.py
# All PDFs are ignored (except the data/ exceptions above), so demo PDFs are not tracked.
# To commit a specific demo PDF, add a negation for it locally.
# The following negation used to force-add a large demo PDF; it was disabled to satisfy pre-commit:
# !apps/multimodal/vision-based-pdf-multi-vector/pdfs/2004.12832v2.pdf
!apps/multimodal/vision-based-pdf-multi-vector/fig/*
# AUR build directory (Arch Linux)
paru-bin/