* Add ty type checker to CI and fix type errors
- Add ty (Astral's fast Python type checker) to GitHub CI workflow
- Fix type annotations across all RAG apps:
- Update load_data return types from list[str] to list[dict[str, Any]]
- Fix base_rag_example.py to properly handle dict format from create_text_chunks
- Fix type errors in leann-core:
- chunking_utils.py: Add explicit type annotations
- cli.py: Fix return type annotations for PDF extraction functions
- interactive_utils.py: Fix readline import type handling
- Fix type errors in apps:
- wechat_history.py: Fix return type annotations
- document_rag.py, code_rag.py: Replace **kwargs with explicit arguments
- Add ty configuration to pyproject.toml
This resolves the bug introduced in PR #157 where create_text_chunks()
changed to return list[dict] but callers were not updated.
* Fix remaining ty type errors
- Fix slack_mcp_reader.py channel parameter can be None
- Fix embedding_compute.py ContextProp type issue
- Fix searcher_base.py method override signatures
- Fix chunking_utils.py chunk_text assignment
- Fix slack_rag.py and twitter_rag.py return types
- Fix email.py and image_rag.py method overrides
* Fix multimodal benchmark scripts type errors
- Fix undefined LeannRetriever -> LeannMultiVector
- Add proper type casts for HuggingFace Dataset iteration
- Cast task config values to correct types
- Add type annotations for dataset row dicts
* Enable ty check for multimodal scripts in CI
All type errors in multimodal scripts have been fixed, so we can now
include them in the CI type checking.
* Fix all test type errors and enable ty check on tests
- Fix test_basic.py: search() takes str not list
- Fix test_cli_prompt_template.py: add type: ignore for Mock assignments
- Fix test_prompt_template_persistence.py: match BaseSearcher.search signature
- Fix test_prompt_template_e2e.py: add type narrowing asserts after skip
- Fix test_readme_examples.py: use explicit kwargs instead of **model_args
- Fix metadata_filter.py: allow Optional[MetadataFilters]
- Update CI to run ty check on tests
* Format code with ruff
* Format searcher_base.py
* Add custom folder support and improve image loading for multi-vector retrieval
- Enhanced _load_images_from_dir with recursive search support and better error handling
- Added support for WebP format and RGB conversion for all image modes
- Added custom folder CLI arguments (--custom-folder, --recursive, --rebuild-index)
- Improved documentation and removed completed TODO comment
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* Format code style in leann_multi_vector.py for better readability
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* docs: polish README performance tip section
- Fix typo: 'matrilize' -> 'materialize'
- Improve clarity and formatting of --no-recompute flag explanation
- Add code block for better readability
* format
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* Add custom folder support and improve image loading for multi-vector retrieval
- Enhanced _load_images_from_dir with recursive search support and better error handling
- Added support for WebP format and RGB conversion for all image modes
- Added custom folder CLI arguments (--custom-folder, --recursive, --rebuild-index)
- Improved documentation and removed completed TODO comment
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* Format code style in leann_multi_vector.py for better readability
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add ColQwen2.5 and ColQwen2_5_Processor imports
- Implement smart model type detection for colqwen2, colqwen2.5, and colpali
- Add task name aliases for easier benchmark invocation
- Add safe model name handling for file paths and index naming
- Support custom model paths including LoRA adapters
- Improve model choice validation and error handling
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <noreply@anthropic.com>
* Add timing instrumentation and multi-dataset support for multi-vector retrieval
- Add timing measurements for search operations (load and core time)
- Increase embedding batch size from 1 to 32 for better performance
- Add explicit memory cleanup with del all_embeddings
- Support loading and merging multiple datasets with different splits
- Add CLI arguments for search method selection (ann/exact/exact-all)
- Auto-detect image field names across different dataset structures
- Print candidate doc counts for performance monitoring
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* update vidore
* reproduce docvqa results
* reproduce docvqa results and add debug file
* fix: format colqwen_forward.py to pass pre-commit checks
---------
Co-authored-by: Claude <noreply@anthropic.com>
* Add timing instrumentation and multi-dataset support for multi-vector retrieval
- Add timing measurements for search operations (load and core time)
- Increase embedding batch size from 1 to 32 for better performance
- Add explicit memory cleanup with del all_embeddings
- Support loading and merging multiple datasets with different splits
- Add CLI arguments for search method selection (ann/exact/exact-all)
- Auto-detect image field names across different dataset structures
- Print candidate doc counts for performance monitoring
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* update vidore
* reproduce docvqa results
* reproduce docvqa results and add debug file
---------
Co-authored-by: Claude <noreply@anthropic.com>