aakash 0175bc9c20 docs: Add ColQwen guide to docs directory

Add COLQWEN_GUIDE.md to docs/ directory for proper documentation structure.
This file is referenced in the README and needs to be tracked in git.

2025-12-07 09:57:14 -08:00

ColQwen Integration Guide

Easy-to-use multimodal PDF retrieval with ColQwen2/ColPali models.

Quick Start

🍎 Mac Users: On Apple Silicon, ColQwen runs with MPS acceleration, which is faster than CPU-only inference.

1. Install Dependencies

uv pip install colpali_engine pdf2image pillow matplotlib qwen_vl_utils einops seaborn
brew install poppler  # macOS only, for PDF processing

2. Basic Usage

# Build index from PDFs
python -m apps.colqwen_rag build --pdfs ./my_papers/ --index research_papers

# Search with text queries
python -m apps.colqwen_rag search research_papers "How does attention mechanism work?"

# Interactive Q&A
python -m apps.colqwen_rag ask research_papers --interactive

Commands

Build Index

python -m apps.colqwen_rag build \
  --pdfs ./pdf_directory/ \
  --index my_index \
  --model colqwen2 \
  --pages-dir ./page_images/  # Optional: save page images

Options:

--pdfs: Directory containing PDF files (or single PDF path)
--index: Name for the index (required)
--model: colqwen2 (default) or colpali
--pages-dir: Directory to save page images (optional)
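
For scripted pipelines, the build command can be assembled in Python and handed to `subprocess`. This is a minimal sketch that mirrors only the flags listed above; the helper name `build_index_command` is hypothetical and not part of LEANN.

```python
import sys
from pathlib import Path
from typing import List, Optional


def build_index_command(
    pdfs: str,
    index: str,
    model: str = "colqwen2",
    pages_dir: Optional[str] = None,
) -> List[str]:
    """Assemble the documented `build` invocation as an argument list.

    Hypothetical helper: it only mirrors the flags listed in this guide.
    """
    if model not in ("colqwen2", "colpali"):
        raise ValueError(f"unsupported model: {model!r}")
    cmd = [
        sys.executable, "-m", "apps.colqwen_rag", "build",
        "--pdfs", str(Path(pdfs).expanduser()),
        "--index", index,
        "--model", model,
    ]
    if pages_dir is not None:
        # --pages-dir is optional, so it is only added when requested.
        cmd += ["--pages-dir", str(Path(pages_dir).expanduser())]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it. To execute, pass the list to
    # subprocess.run(cmd, check=True) from the repository root.
    print(" ".join(build_index_command("./pdf_directory/", "my_index")))
```

Passing an argument list rather than a shell string avoids quoting problems with paths that contain spaces.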

Search Index

python -m apps.colqwen_rag search my_index "your question here" --top-k 5

Options:

--top-k: Number of results to return (default: 5)
--model: Model used to encode the query (must match the model the index was built with, because queries and pages have to be embedded by the same model)

Interactive Q&A

python -m apps.colqwen_rag ask my_index --interactive

Commands in interactive mode:

Type your questions naturally
help: Show available commands
quit/exit/q: Exit interactive mode
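
The command handling above can be pictured as a small dispatch loop. The sketch below is illustrative only and is not the code in `apps.colqwen_rag`; the function names, the case-insensitive matching, and the help text are assumptions.

```python
from typing import Callable, Iterable, List

QUIT_WORDS = {"quit", "exit", "q"}


def classify_input(line: str) -> str:
    """Map one line of user input to 'empty', 'quit', 'help', or 'question'."""
    text = line.strip().lower()
    if not text:
        return "empty"
    if text in QUIT_WORDS:
        return "quit"
    if text == "help":
        return "help"
    return "question"


def run_session(lines: Iterable[str], answer: Callable[[str], str]) -> List[str]:
    """Process input lines until a quit word appears; return the replies."""
    replies: List[str] = []
    for line in lines:
        kind = classify_input(line)
        if kind == "quit":
            break
        if kind == "empty":
            continue
        if kind == "help":
            replies.append("Commands: help, quit/exit/q")
        else:
            # Anything else is treated as a natural-language question.
            replies.append(answer(line.strip()))
    return replies
```

Here `answer` stands in for whatever retrieves pages and generates a reply; in a terminal session, `lines` would be `sys.stdin`.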

🧪 Test & Reproduce Results

Run the reproduction test for issue #119:

python test_colqwen_reproduction.py

This will:

✅ Check dependencies
📥 Download sample PDF (Attention Is All You Need paper)
🏗️ Build test index
🔍 Run sample queries
📊 Show how to generate similarity maps

🎨 Advanced: Similarity Maps

For visual similarity analysis, use the existing advanced script:

cd apps/multimodal/vision-based-pdf-multi-vector/
python multi-vector-leann-similarity-map.py

Edit the script to customize:

QUERY: Your question
MODEL: "colqwen2" or "colpali"
USE_HF_DATASET: Use HuggingFace dataset or local PDFs
SIMILARITY_MAP: Generate heatmaps
ANSWER: Enable Qwen-VL answer generation

🔧 How It Works

ColQwen2 vs ColPali

ColQwen2 (vidore/colqwen2-v1.0): Newer retriever built on the Qwen2-VL vision-language backbone
ColPali (vidore/colpali-v1.2): Earlier retriever built on the PaliGemma backbone

Architecture

PDF → Images: Convert PDF pages to images (150 DPI)
Vision Encoding: Process images with ColQwen2/ColPali
Multi-Vector Index: Build LEANN HNSW index with multiple embeddings per page
Query Processing: Encode text queries with same model
Similarity Search: Find most relevant pages/regions
Visual Maps: Generate query-to-page similarity heatmaps (optional)
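
Because each page is stored as multiple embeddings, a page's relevance is an aggregate over many vector comparisons. This guide does not spell out how the LEANN index combines them; the ColBERT/ColPali family of models scores with late interaction (MaxSim), so the sketch below shows that rule computed exhaustively in plain Python. The function names are hypothetical, and a real index would use approximate search rather than this brute-force loop.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for each query vector take its best-matching
    page vector, then sum those maxima over the query."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(
    query_vecs: Sequence[Vector], pages: Dict[str, Sequence[Vector]]
) -> List[str]:
    """Return page names ordered from most to least relevant."""
    return sorted(
        pages,
        key=lambda name: maxsim_score(query_vecs, pages[name]),
        reverse=True,
    )
```

The per-query-vector maxima are also what a similarity map visualizes: each one points at the page patch that best matches a query token.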

Device Support

CUDA: Best performance with GPU acceleration
MPS: Apple Silicon Mac support
CPU: Fallback for any system (slower)

Auto-detection: CUDA > MPS > CPU
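
The priority order above can be expressed with standard PyTorch availability checks. This is a sketch of the selection logic, not the application's actual code; `pick_device` and `detect_device` are hypothetical names.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Apply the documented priority: CUDA > MPS > CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends; fall back to CPU without it."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps is absent in older PyTorch builds, hence getattr.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print(f"Selected device: {detect_device()}")
```

Keeping the priority rule in a pure function makes it easy to check without any particular hardware.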

📊 Performance Tips

For Best Performance:

# Use ColQwen2 for latest features
--model colqwen2

# Save page images for reuse
--pages-dir ./cached_pages/

# Adjust batch size based on GPU memory
# (automatically handled)
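
Saving page images pays off because rendering PDFs is repeated work on every rebuild. The sketch below shows one way such a cache could behave, using `pdf2image` at the 150 DPI mentioned in the Architecture section. The file naming scheme and both function names are assumptions; the actual layout written by `--pages-dir` may differ.

```python
from pathlib import Path
from typing import List


def page_image_path(pages_dir: Path, pdf_path: Path, page_number: int) -> Path:
    """Hypothetical naming scheme: <pdf stem>_page_<NNNN>.png."""
    return pages_dir / f"{pdf_path.stem}_page_{page_number:04d}.png"


def render_pages(pdf_path: Path, pages_dir: Path, dpi: int = 150) -> List[Path]:
    """Render every page to PNG once and reuse the files on later runs."""
    # Imported lazily so the path helper works without pdf2image/poppler.
    from pdf2image import convert_from_path, pdfinfo_from_path

    pages_dir.mkdir(parents=True, exist_ok=True)
    page_count = pdfinfo_from_path(str(pdf_path))["Pages"]
    paths = [
        page_image_path(pages_dir, pdf_path, n) for n in range(1, page_count + 1)
    ]
    if all(p.exists() for p in paths):
        return paths  # cache hit: skip rendering entirely
    for path, image in zip(paths, convert_from_path(str(pdf_path), dpi=dpi)):
        image.save(path)
    return paths
```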

For Large Document Sets:

Process PDFs in batches
Use SSD storage for index files
Consider using CUDA if available
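
One way to follow the first tip is to group the PDFs before building. This is a minimal sketch; `pdf_batches` is a hypothetical helper, and this guide does not say whether `build` can add to an existing index, so how each batch is indexed (for example, one index per batch) is left open.

```python
from pathlib import Path
from typing import Iterator, List


def pdf_batches(pdf_dir: Path, batch_size: int) -> Iterator[List[Path]]:
    """Yield the PDFs in pdf_dir in sorted order, batch_size at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    pdfs = sorted(pdf_dir.glob("*.pdf"))
    for start in range(0, len(pdfs), batch_size):
        yield pdfs[start:start + batch_size]
```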

Related Projects and References

Fast-PLAID: https://github.com/lightonai/fast-plaid
Pylate: https://github.com/lightonai/pylate
ColBERT: https://github.com/stanford-futuredata/ColBERT
ColPali Paper: "ColPali: Efficient Document Retrieval with Vision Language Models" (arXiv:2407.01449)
Issue #119: https://github.com/yichuan-w/LEANN/issues/119

🐛 Troubleshooting

PDF Conversion Issues (macOS)

# Install poppler
brew install poppler
which pdfinfo && pdfinfo -v
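
The same check can be run from Python, which is handy inside a setup script. This sketch uses only the standard library; `poppler_version` is a hypothetical helper name.

```python
import shutil
import subprocess
from typing import Optional


def poppler_version() -> Optional[str]:
    """Return the first line of `pdfinfo -v`, or None if pdfinfo is not on PATH."""
    exe = shutil.which("pdfinfo")
    if exe is None:
        return None
    result = subprocess.run([exe, "-v"], capture_output=True, text=True)
    # The version banner may arrive on stderr or stdout, so check both.
    banner = (result.stderr or result.stdout).strip()
    return banner.splitlines()[0] if banner else ""


if __name__ == "__main__":
    version = poppler_version()
    print(version if version is not None else "pdfinfo not found: install poppler")
```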

Memory Issues

Batch size is handled automatically
Force CPU on a CUDA machine: export CUDA_VISIBLE_DEVICES="" (this hides CUDA GPUs only; it does not disable MPS)
Process fewer PDFs at once

Model Download Issues

Ensure internet connection for first run
Models are cached after first download
Use HuggingFace mirrors if needed
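
A mirror is usually selected through the `HF_ENDPOINT` environment variable, which the `huggingface_hub` library reads when it is imported. The sketch below sets it from Python; the helper name is hypothetical and the URL in the example is a placeholder, not a real mirror.

```python
import os


def use_hf_mirror(endpoint: str) -> None:
    """Point Hugging Face downloads at a mirror.

    Call this before importing huggingface_hub, transformers, or
    colpali_engine, because the endpoint is read at import time.
    """
    os.environ["HF_ENDPOINT"] = endpoint.rstrip("/")


if __name__ == "__main__":
    use_hf_mirror("https://hf-mirror.example/")  # placeholder URL
    print(os.environ["HF_ENDPOINT"])
```

Setting the variable in the shell with `export HF_ENDPOINT=...` before running the build command has the same effect.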

Import Errors

# Ensure all dependencies installed
uv pip install colpali_engine pdf2image pillow matplotlib qwen_vl_utils einops seaborn

# Check PyTorch installation
python -c "import torch; print(torch.__version__)"
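
To see exactly which packages are missing, the import names can be probed without importing them. This is a small sketch using the standard library; the module list is derived from the install command above (note that `pillow` is imported as `PIL`), and `missing_modules` is a hypothetical helper name.

```python
import importlib.util
from typing import Iterable, List

# Import names for the packages in the install command, plus PyTorch.
REQUIRED_MODULES = [
    "torch",
    "colpali_engine",
    "pdf2image",
    "PIL",  # provided by the pillow package
    "matplotlib",
    "qwen_vl_utils",
    "einops",
    "seaborn",
]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("All dependencies found" if not missing else f"Missing: {missing}")
```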

💡 Examples

Research Paper Analysis

# Index your research papers
python -m apps.colqwen_rag build --pdfs ~/Papers/AI/ --index ai_papers

# Ask research questions
python -m apps.colqwen_rag search ai_papers "What are the limitations of transformer models?"
python -m apps.colqwen_rag search ai_papers "How does BERT compare to GPT?"

Document Q&A

# Index business documents
python -m apps.colqwen_rag build --pdfs ~/Documents/Reports/ --index reports

# Interactive analysis
python -m apps.colqwen_rag ask reports --interactive

Visual Analysis

# Generate similarity maps for specific queries
cd apps/multimodal/vision-based-pdf-multi-vector/
# Edit multi-vector-leann-similarity-map.py with your query
python multi-vector-leann-similarity-map.py
# Check ./figures/ for generated heatmaps

🎯 This integration makes ColQwen as easy to use as other LEANN features while maintaining the full power of multimodal document understanding!
