# ColQwen Integration Guide

Easy-to-use multimodal PDF retrieval with ColQwen2/ColPali models.

## Quick Start

> **🍎 Mac Users**: ColQwen is optimized for Apple Silicon with MPS acceleration for faster inference!

### 1. Install Dependencies

```bash
uv pip install colpali_engine pdf2image pillow matplotlib qwen_vl_utils einops seaborn
brew install poppler  # macOS only, for PDF processing
```

### 2. Basic Usage

```bash
# Build index from PDFs
python -m apps.colqwen_rag build --pdfs ./my_papers/ --index research_papers

# Search with text queries
python -m apps.colqwen_rag search research_papers "How does the attention mechanism work?"

# Interactive Q&A
python -m apps.colqwen_rag ask research_papers --interactive
```

## Commands

### Build Index

```bash
python -m apps.colqwen_rag build \
  --pdfs ./pdf_directory/ \
  --index my_index \
  --model colqwen2 \
  --pages-dir ./page_images/  # Optional: save page images
```

**Options:**
- `--pdfs`: Directory containing PDF files (or a single PDF path)
- `--index`: Name for the index (required)
- `--model`: `colqwen2` (default) or `colpali`
- `--pages-dir`: Directory to save page images (optional)

### Search Index

```bash
python -m apps.colqwen_rag search my_index "your question here" --top-k 5
```

**Options:**
- `--top-k`: Number of results to return (default: 5)
- `--model`: Model used for search (should match the model used at build time)

### Interactive Q&A

```bash
python -m apps.colqwen_rag ask my_index --interactive
```

**Commands in interactive mode:**
- Type your questions naturally
- `help`: Show available commands
- `quit`/`exit`/`q`: Exit interactive mode

## 🧪 Test & Reproduce Results

Run the reproduction test for issue #119:

```bash
python test_colqwen_reproduction.py
```

This will:
1. ✅ Check dependencies
2. 📥 Download a sample PDF (the "Attention Is All You Need" paper)
3. 🏗️ Build a test index
4. 🔍 Run sample queries
5. 📊 Show how to generate similarity maps

## 🎨 Advanced: Similarity Maps

For visual similarity analysis, use the existing advanced script:

```bash
cd apps/multimodal/vision-based-pdf-multi-vector/
python multi-vector-leann-similarity-map.py
```

Edit the script to customize:
- `QUERY`: Your question
- `MODEL`: "colqwen2" or "colpali"
- `USE_HF_DATASET`: Use a HuggingFace dataset or local PDFs
- `SIMILARITY_MAP`: Generate heatmaps
- `ANSWER`: Enable Qwen-VL answer generation

## 🔧 How It Works

### ColQwen2 vs ColPali

- **ColQwen2** (`vidore/colqwen2-v1.0`): Latest vision-language model
- **ColPali** (`vidore/colpali-v1.2`): Proven multimodal retriever

### Architecture

1. **PDF → Images**: Convert PDF pages to images (150 DPI)
2. **Vision Encoding**: Process images with ColQwen2/ColPali
3. **Multi-Vector Index**: Build a LEANN HNSW index with multiple embeddings per page
4. **Query Processing**: Encode text queries with the same model
5. **Similarity Search**: Find the most relevant pages/regions
6. **Visual Maps**: Generate similarity heatmaps (optional)

### Device Support

- **CUDA**: Best performance with GPU acceleration
- **MPS**: Apple Silicon Mac support
- **CPU**: Fallback for any system (slower)

Auto-detection order: CUDA > MPS > CPU

## 📊 Performance Tips

### For Best Performance:

```bash
# --model colqwen2: use ColQwen2 for the latest features
# --pages-dir: save page images for reuse
# Batch size is adjusted automatically based on GPU memory
python -m apps.colqwen_rag build \
  --pdfs ./my_papers/ \
  --index research_papers \
  --model colqwen2 \
  --pages-dir ./cached_pages/
```

### For Large Document Sets:

- Process PDFs in batches
- Use SSD storage for index files
- Consider using CUDA if available

## 🔗 Related Resources

- **Fast-PLAID**: https://github.com/lightonai/fast-plaid
- **Pylate**: https://github.com/lightonai/pylate
- **ColBERT**: https://github.com/stanford-futuredata/ColBERT
- **ColPali Paper**: ColPali: Efficient Document Retrieval with Vision Language Models
- **Issue #119**: https://github.com/yichuan-w/LEANN/issues/119

## 🐛 Troubleshooting

### PDF Conversion Issues (macOS)

```bash
# Install poppler
brew install poppler

# Verify that the poppler tools are on PATH
which pdfinfo && pdfinfo -v
```

### Memory Issues

- Batch size reduction is handled automatically
- Run without the CUDA GPU: `export CUDA_VISIBLE_DEVICES=""` (this hides CUDA devices, so auto-detection falls back to MPS or CPU)
- Process fewer PDFs at once

### Model Download Issues

- Ensure you have an internet connection for the first run
- Models are cached after the first download
- Use HuggingFace mirrors if needed

### Import Errors

```bash
# Ensure all dependencies are installed
uv pip install colpali_engine pdf2image pillow matplotlib qwen_vl_utils einops seaborn

# Check PyTorch installation
python -c "import torch; print(torch.__version__)"
```

## 💡 Examples

### Research Paper Analysis

```bash
# Index your research papers
python -m apps.colqwen_rag build --pdfs ~/Papers/AI/ --index ai_papers

# Ask research questions
python -m apps.colqwen_rag search ai_papers "What are the limitations of transformer models?"
python -m apps.colqwen_rag search ai_papers "How does BERT compare to GPT?"
```

### Document Q&A

```bash
# Index business documents
python -m apps.colqwen_rag build --pdfs ~/Documents/Reports/ --index reports

# Interactive analysis
python -m apps.colqwen_rag ask reports --interactive
```

### Visual Analysis

```bash
# Generate similarity maps for specific queries
cd apps/multimodal/vision-based-pdf-multi-vector/

# Edit multi-vector-leann-similarity-map.py with your query
python multi-vector-leann-similarity-map.py

# Check ./figures/ for generated heatmaps
```

---

**🎯 This integration makes ColQwen as easy to use as other LEANN features while maintaining the full power of multimodal document understanding!**
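
For readers curious about what "multiple embeddings per page" means for ranking, here is a minimal pure-Python sketch of ColBERT-style late-interaction (MaxSim) scoring, the scheme ColPali-family models are designed around. It is an illustration under stated assumptions, not LEANN's actual implementation. The function names `maxsim_score` and `rank_pages`, the page identifiers, and the toy 2-dimensional vectors are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for each query vector, take the similarity of
    its best-matching page vector, then sum those maxima over the query."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(
    query_vecs: Sequence[Vector],
    pages: Dict[str, Sequence[Vector]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Return up to top_k (page_id, score) pairs, highest score first."""
    scored = [(page_id, maxsim_score(query_vecs, vecs)) for page_id, vecs in pages.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy data (hypothetical): two query-token vectors and two pages,
# each page represented by a few patch vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "paper.pdf#p1": [[0.9, 0.1], [0.2, 0.8], [0.0, 0.0]],
    "paper.pdf#p2": [[0.5, 0.5], [0.4, 0.3]],
}

results = rank_pages(query, pages, top_k=2)
for page_id, score in results:
    print(f"{page_id}: {score:.2f}")
```

In this toy case the first page ranks higher because each query vector finds a closely aligned patch vector on it. A similarity map is conceptually the same computation kept per patch rather than reduced with `max`, which is why it can be drawn as a heatmap over the page image.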