LEANN/apps/multimodal/vision-based-pdf-multi-vector/multi-vector-leann-similarity-map.py at 76cc798e3e99f2cb456baf5874e4a0e6958e912d

Yichuan Wang 76cc798e3e Feat/multi vector timing and dataset improvements (#181)

* Add timing instrumentation and multi-dataset support for multi-vector retrieval

- Add timing measurements for search operations (load and core time)
- Increase embedding batch size from 1 to 32 for better performance
- Add explicit memory cleanup with del all_embeddings
- Support loading and merging multiple datasets with different splits
- Add CLI arguments for search method selection (ann/exact/exact-all)
- Auto-detect image field names across different dataset structures
- Print candidate doc counts for performance monitoring
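
A minimal sketch of how three of the changes listed above could fit together: a CLI switch for the search method, wall-clock timing around a call, and image-field auto-detection. This is not the code from this commit. The flag name `--search-method`, the helpers `build_parser`, `timed`, and `find_image_field`, and the candidate field names are all assumptions made for illustration; only the three method names (`ann`, `exact`, `exact-all`) come from the commit message.

```python
import argparse
import time
from typing import Any, Callable, Iterable, Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a hypothetical --search-method flag."""
    parser = argparse.ArgumentParser(description="Multi-vector retrieval demo (sketch)")
    parser.add_argument(
        "--search-method",  # hypothetical flag name, not confirmed by the source
        choices=["ann", "exact", "exact-all"],  # method names from the commit message
        default="ann",
        help="Which search strategy to run.",
    )
    return parser


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def find_image_field(
    sample: dict,
    candidates: Iterable[str] = ("image", "page_image", "img"),  # assumed names
) -> Optional[str]:
    """Return the first candidate key present in a dataset record, or None."""
    for name in candidates:
        if name in sample:
            return name
    return None
```

A caller would parse the flag with `build_parser().parse_args()`, wrap the index-loading step and the search call separately in `timed(...)` to report load time and core time, and run `find_image_field` on the first record of each dataset before merging.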

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* update vidore

* reproduce docvqa results

* reproduce docvqa results and add debug file

* fix: format colqwen_forward.py to pass pre-commit checks

---------

Co-authored-by: Claude <noreply@anthropic.com>

2025-12-03 01:10:49 -08:00

27 KiB
