LEANN

Author	SHA1	Message	Date
yichuan-w	9996c29618	format	2025-12-20 01:27:54 +00:00
yichuan-w	a878d2459b	Format code style in leann_multi_vector.py for better readability 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-17 09:02:48 +00:00
yichuan-w	6c39a3427f	Add custom folder support and improve image loading for multi-vector retrieval - Enhanced _load_images_from_dir with recursive search support and better error handling - Added support for WebP format and RGB conversion for all image modes - Added custom folder CLI arguments (--custom-folder, --recursive, --rebuild-index) - Improved documentation and removed completed TODO comment 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-17 08:53:41 +00:00
Yichuan Wang	a0bbf831db	Add ColQwen2.5 model support and improve model selection (#183) - Add ColQwen2.5 and ColQwen2_5_Processor imports - Implement smart model type detection for colqwen2, colqwen2.5, and colpali - Add task name aliases for easier benchmark invocation - Add safe model name handling for file paths and index naming - Support custom model paths including LoRA adapters - Improve model choice validation and error handling 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude <noreply@anthropic.com>	2025-12-05 03:36:55 -08:00
Yichuan Wang	76cc798e3e	Feat/multi vector timing and dataset improvements (#181) * Add timing instrumentation and multi-dataset support for multi-vector retrieval - Add timing measurements for search operations (load and core time) - Increase embedding batch size from 1 to 32 for better performance - Add explicit memory cleanup with del all_embeddings - Support loading and merging multiple datasets with different splits - Add CLI arguments for search method selection (ann/exact/exact-all) - Auto-detect image field names across different dataset structures - Print candidate doc counts for performance monitoring 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * update vidore * reproduce docvqa results * reproduce docvqa results and add debug file * fix: format colqwen_forward.py to pass pre-commit checks --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-12-03 01:10:49 -08:00
Yichuan Wang	d599566fd7	Revert "[Multi-vector]Add timing instrumentation and multi-dataset support fo…" (#180) This reverts commit `00770aebbb`.	2025-12-03 01:09:39 -08:00
Yichuan Wang	00770aebbb	[Multi-vector]Add timing instrumentation and multi-dataset support for multi-vector… (#161) * Add timing instrumentation and multi-dataset support for multi-vector retrieval - Add timing measurements for search operations (load and core time) - Increase embedding batch size from 1 to 32 for better performance - Add explicit memory cleanup with del all_embeddings - Support loading and merging multiple datasets with different splits - Add CLI arguments for search method selection (ann/exact/exact-all) - Auto-detect image field names across different dataset structures - Print candidate doc counts for performance monitoring 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * update vidore * reproduce docvqa results * reproduce docvqa results and add debug file --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-12-03 00:55:42 -08:00
yichuan-w	3766ad1fd2	robust multi-vector	2025-11-09 02:34:53 +00:00
yichuan-w	dc6c9f696e	update some search in colpali	2025-11-08 08:53:03 +00:00
yichuan520030910320	01475c10a0	add img	2025-09-23 23:25:05 -07:00
yichuan520030910320	576beb13db	add doc about multimodal	2025-09-23 23:21:03 -07:00
Yichuan Wang	edde0cdeb2	[Feat] ColQwen integration (#111) * add colqwen stuff * add colqwen stuff and pass ruff * remove ipynb	2025-09-23 17:51:29 -07:00

12 Commits