ColQwen Integration Guide
Easy-to-use multimodal PDF retrieval with ColQwen2/ColPali models.
Quick Start
🍎 Mac Users: ColQwen supports Apple Silicon through MPS acceleration, so inference does not have to fall back to the CPU.
1. Install Dependencies
uv pip install colpali_engine pdf2image pillow matplotlib qwen_vl_utils einops seaborn
brew install poppler # macOS only, for PDF processing
2. Basic Usage
# Build index from PDFs
python -m apps.colqwen_rag build --pdfs ./my_papers/ --index research_papers
# Search with text queries
python -m apps.colqwen_rag search research_papers "How does attention mechanism work?"
# Interactive Q&A
python -m apps.colqwen_rag ask research_papers --interactive
Commands
Build Index
python -m apps.colqwen_rag build \
--pdfs ./pdf_directory/ \
--index my_index \
--model colqwen2 \
--pages-dir ./page_images/ # Optional: save page images
Options:
- --pdfs: Directory containing PDF files (or single PDF path)
- --index: Name for the index (required)
- --model: colqwen2 (default) or colpali
- --pages-dir: Directory to save page images (optional)
Search Index
python -m apps.colqwen_rag search my_index "your question here" --top-k 5
Options:
- --top-k: Number of results to return (default: 5)
- --model: Model used for search (should match build model)
Interactive Q&A
python -m apps.colqwen_rag ask my_index --interactive
Commands in interactive mode:
- Type your questions naturally
- help: Show available commands
- quit/exit/q: Exit interactive mode
🧪 Test & Reproduce Results
Run the reproduction test for issue #119:
python test_colqwen_reproduction.py
This will:
- ✅ Check dependencies
- 📥 Download sample PDF (Attention Is All You Need paper)
- 🏗️ Build test index
- 🔍 Run sample queries
- 📊 Show how to generate similarity maps
🎨 Advanced: Similarity Maps
For visual similarity analysis, use the existing advanced script:
cd apps/multimodal/vision-based-pdf-multi-vector/
python multi-vector-leann-similarity-map.py
Edit the script to customize:
- QUERY: Your question
- MODEL: "colqwen2" or "colpali"
- USE_HF_DATASET: Use HuggingFace dataset or local PDFs
- SIMILARITY_MAP: Generate heatmaps
- ANSWER: Enable Qwen-VL answer generation
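For intuition about what a similarity map contains, here is a minimal sketch, not the script's actual code. It assumes the map for one query token is the dot product of that token's embedding with every image-patch embedding, laid out on the page's patch grid. The function name `similarity_map` and the toy 2-D embeddings are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def similarity_map(
    query_token: Vector,
    patch_vecs: Sequence[Vector],
    n_rows: int,
    n_cols: int,
) -> List[List[float]]:
    """Score one query-token embedding against every patch, then reshape to the grid."""
    if len(patch_vecs) != n_rows * n_cols:
        raise ValueError("patch count must equal n_rows * n_cols")
    # One dot product per patch, in row-major patch order.
    scores = [sum(q * p for q, p in zip(query_token, patch)) for patch in patch_vecs]
    # Cut the flat score list into rows of the patch grid.
    return [scores[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


# Toy example: a 2x2 patch grid with 2-D embeddings (hypothetical values).
patches = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (0.0, 0.0)]
heat = similarity_map((1.0, 0.0), patches, n_rows=2, n_cols=2)
print(heat)
```

A plotting library such as matplotlib can then render the grid as a heatmap over the page image; the real script may normalize or upsample the scores first.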
🔧 How It Works
ColQwen2 vs ColPali
- ColQwen2 (vidore/colqwen2-v1.0): Latest vision-language model
- ColPali (vidore/colpali-v1.2): Proven multimodal retriever
Architecture
- PDF → Images: Convert PDF pages to images (150 DPI)
- Vision Encoding: Process images with ColQwen2/ColPali
- Multi-Vector Index: Build LEANN HNSW index with multiple embeddings per page
- Query Processing: Encode text queries with same model
- Similarity Search: Find most relevant pages/regions
- Visual Maps: Generate similarity heatmaps (optional)
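The guide does not spell out how a page with multiple embeddings is scored against a query. ColPali-family models are designed for ColBERT-style late interaction (MaxSim), so the sketch below assumes that scoring rule. It is an exact, brute-force illustration in plain Python, not the LEANN HNSW implementation, and the names `maxsim_score` and `rank_pages` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """For each query token take its best-matching page patch, then sum those maxima."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(
    query_vecs: Sequence[Vector],
    pages: Dict[str, Sequence[Vector]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Score every page and return the top_k (page_id, score) pairs, best first."""
    scored = [(page_id, maxsim_score(query_vecs, vecs)) for page_id, vecs in pages.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy 2-D embeddings: two query tokens, two pages with two patches each.
query = [(1.0, 0.0), (0.0, 1.0)]
pages = {
    "paper.pdf#1": [(1.0, 0.0), (0.0, 1.0)],
    "paper.pdf#2": [(0.5, 0.5), (0.5, 0.0)],
}
print(rank_pages(query, pages, top_k=2))
```

In this toy data the first page scores 2.0 and the second 1.0, because each query token finds an exact match among the first page's patches. An approximate index trades some of this exactness for speed on large collections.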
Device Support
- CUDA: Best performance with GPU acceleration
- MPS: Apple Silicon Mac support
- CPU: Fallback for any system (slower)
Auto-detection: CUDA > MPS > CPU
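The priority order above can be sketched as follows. This is an illustration of the CUDA > MPS > CPU rule using standard PyTorch availability checks, not the code from apps.colqwen_rag; the function names `pick_device` and `detect_device` are hypothetical.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred device name using the CUDA > MPS > CPU order."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for available backends; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds have no torch.backends.mps, so guard the lookup.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)


print(detect_device())
```

Keeping the priority rule in its own function makes it easy to check without any particular hardware present.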
📊 Performance Tips
For Best Performance:
# Use ColQwen2 for latest features
--model colqwen2
# Save page images for reuse
--pages-dir ./cached_pages/
# Adjust batch size based on GPU memory
# (automatically handled)
For Large Document Sets:
- Process PDFs in batches
- Use SSD storage for index files
- Consider using CUDA if available
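One way to process PDFs in batches is to group the files before indexing. The sketch below only does the grouping; the helper name `pdf_batches` and the batch size of 20 are hypothetical. This guide does not say whether build can append to an existing index, so how each batch is indexed (for example, one index per batch) is left as an assumption to verify.

```python
from pathlib import Path
from typing import Iterator, List


def pdf_batches(pdf_dir: str, batch_size: int = 20) -> Iterator[List[Path]]:
    """Yield sorted PDF paths from pdf_dir in groups of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    pdfs = sorted(Path(pdf_dir).expanduser().glob("*.pdf"))
    for start in range(0, len(pdfs), batch_size):
        yield pdfs[start:start + batch_size]


# Report how the directory would be split; a missing directory yields no batches.
for i, batch in enumerate(pdf_batches("./pdf_directory/", batch_size=20)):
    print(f"batch {i}: {len(batch)} PDFs")
```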
🔗 Related Resources
- Fast-PLAID: https://github.com/lightonai/fast-plaid
- Pylate: https://github.com/lightonai/pylate
- ColBERT: https://github.com/stanford-futuredata/ColBERT
- ColPali Paper: ColPali: Efficient Document Retrieval with Vision Language Models
- Issue #119: https://github.com/yichuan-w/LEANN/issues/119
🐛 Troubleshooting
PDF Conversion Issues (macOS)
# Install poppler
brew install poppler
which pdfinfo && pdfinfo -v
Memory Issues
- Reduce batch size (automatically handled)
- Use CPU instead of GPU: export CUDA_VISIBLE_DEVICES=""
- Process fewer PDFs at once
Model Download Issues
- Ensure internet connection for first run
- Models are cached after first download
- Use HuggingFace mirrors if needed
Import Errors
# Ensure all dependencies installed
uv pip install colpali_engine pdf2image pillow matplotlib qwen_vl_utils einops seaborn
# Check PyTorch installation
python -c "import torch; print(torch.__version__)"
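To see which dependency is missing in one pass, a small check like the one below may help. It is a sketch using only the standard library; the module list mirrors the install command above (pillow is imported as PIL), and the helper name `missing_modules` is hypothetical.

```python
import importlib.util
import shutil

# Import names for the packages installed above, plus PyTorch.
REQUIRED_MODULES = [
    "torch", "colpali_engine", "pdf2image", "PIL",
    "matplotlib", "qwen_vl_utils", "einops", "seaborn",
]


def missing_modules(names):
    """Return the module names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing Python modules:", ", ".join(missing) or "none")
# pdf2image relies on poppler's command-line tools; pdfinfo on PATH is a proxy.
print("poppler (pdfinfo) on PATH:", shutil.which("pdfinfo") is not None)
```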
💡 Examples
Research Paper Analysis
# Index your research papers
python -m apps.colqwen_rag build --pdfs ~/Papers/AI/ --index ai_papers
# Ask research questions
python -m apps.colqwen_rag search ai_papers "What are the limitations of transformer models?"
python -m apps.colqwen_rag search ai_papers "How does BERT compare to GPT?"
Document Q&A
# Index business documents
python -m apps.colqwen_rag build --pdfs ~/Documents/Reports/ --index reports
# Interactive analysis
python -m apps.colqwen_rag ask reports --interactive
Visual Analysis
# Generate similarity maps for specific queries
cd apps/multimodal/vision-based-pdf-multi-vector/
# Edit multi-vector-leann-similarity-map.py with your query
python multi-vector-leann-similarity-map.py
# Check ./figures/ for generated heatmaps
🎯 This integration makes ColQwen as easy to use as other LEANN features while maintaining the full power of multimodal document understanding!