# LAION Multimodal Benchmark

A multimodal benchmark for evaluating image retrieval performance using LEANN with CLIP embeddings on a LAION dataset subset.

## Overview

This benchmark evaluates:

- **Image retrieval timing** using caption-based queries
- **Recall@K performance** for image search
- **Complexity analysis** across different search parameters
- **Index size and storage efficiency**

## Dataset Configuration

- **Dataset**: LAION-400M subset (10,000 images)
- **Embeddings**: Pre-computed CLIP ViT-B/32 (512 dimensions)
- **Queries**: 200 random captions from the dataset
- **Ground Truth**: Self-recall (query caption → original image)

## Quick Start

### 1. Set up the benchmark

```bash
cd benchmarks/laion
python setup_laion.py --num-samples 10000 --num-queries 200
```

This will:

- Create dummy LAION data (10K samples)
- Generate CLIP embeddings (512-dim)
- Build a LEANN index with the HNSW backend
- Create 200 evaluation queries

### 2. Run evaluation

```bash
# Run all evaluation stages
python evaluate_laion.py --index data/laion_index.leann

# Run specific stages
python evaluate_laion.py --index data/laion_index.leann --stage timing
python evaluate_laion.py --index data/laion_index.leann --stage recall
python evaluate_laion.py --index data/laion_index.leann --stage complexity
```

### 3. Save results

```bash
python evaluate_laion.py --index data/laion_index.leann --output results.json
```

## Configuration Options

### Setup Options

```bash
python setup_laion.py \
  --num-samples 10000 \
  --num-queries 200 \
  --index-path data/laion_index.leann \
  --backend hnsw
```

### Evaluation Options

```bash
python evaluate_laion.py \
  --index data/laion_index.leann \
  --queries data/evaluation_queries.jsonl \
  --complexity 64 \
  --top-k 3 \
  --num-samples 100 \
  --stage all
```

## Evaluation Stages

### Stage 1: Index Analysis
- Analyzes index file sizes and metadata
- Reports storage efficiency

### Stage 2: Search Timing
- Measures average search latency
- Tests with configurable complexity and top-k
- Reports searches per second

### Stage 3: Recall Evaluation
- Evaluates Recall@K using ground truth
- Self-recall: the query caption should retrieve its original image (see the sketch after the stage descriptions)

### Stage 4: Complexity Analysis
- Tests performance across different complexity levels [16, 32, 64, 128]
- Analyzes speed vs. accuracy tradeoffs (see the timing sketch below)
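Stage 3's self-recall check reduces to a small computation: for each caption query, did the image that caption came from appear in the top-k retrieved ids? A minimal sketch of that metric (the dictionary layout is an assumption for illustration, not necessarily the format `evaluate_laion.py` uses internally):

```python
def recall_at_k(retrieved: dict[str, list[str]], ground_truth: dict[str, str], k: int = 3) -> float:
    """Percentage of queries whose ground-truth image id appears in the top-k results.

    retrieved:    query id -> retrieved image ids, best match first
    ground_truth: query id -> id of the image the query caption came from
    """
    scored = [qid for qid in ground_truth if qid in retrieved]
    if not scored:
        return 0.0
    hits = sum(1 for qid in scored if ground_truth[qid] in retrieved[qid][:k])
    return 100.0 * hits / len(scored)


# Example: the caption for img_7 retrieved img_7 within the top 3 -> 100.0
# recall_at_k({"q1": ["img_42", "img_7", "img_3"]}, {"q1": "img_7"}, k=3)
```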
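Stages 2 and 4 are the same measurement repeated at different complexity levels. The sketch below shows that sweep against a generic `search_fn` callable so it stays independent of any particular searcher API; the `complexity` keyword simply mirrors the CLI's `--complexity` flag and is an assumption here.

```python
import statistics
import time


def complexity_sweep(search_fn, queries, complexities=(16, 32, 64, 128), top_k=3):
    """Time the same query set at each complexity level and report latency stats.

    search_fn(query, top_k=..., complexity=...) is a stand-in for whatever
    search call the benchmark wires up (e.g. the LEANN searcher).
    """
    report = {}
    for complexity in complexities:
        latencies = []
        for query in queries:
            start = time.perf_counter()
            search_fn(query, top_k=top_k, complexity=complexity)
            latencies.append(time.perf_counter() - start)
        avg = statistics.mean(latencies)
        report[complexity] = {
            "avg_s": avg,
            "median_s": statistics.median(latencies),
            "std_s": statistics.pstdev(latencies),
            "searches_per_second": 1.0 / avg if avg > 0 else float("inf"),
        }
    return report
```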
## Output Metrics

### Timing Metrics
- Average/median/min/max search time
- Standard deviation
- Searches per second
- Latency in milliseconds

### Recall Metrics
- Recall@K percentage
- Number of queries with ground truth

### Index Metrics
- Total index size (MB)
- Component breakdown (index, passages, metadata)
- Backend and embedding model info

## Example Results

```
🎯 LAION MULTIMODAL BENCHMARK RESULTS
============================================================

📏 Index Information:
   Total size: 145.2 MB
   Backend: hnsw
   Embedding model: clip-vit-b-32
   Total passages: 10000

⚡ Search Performance:
   Total queries: 200
   Average search time: 0.023s
   Median search time: 0.021s
   Min/Max search time: 0.012s / 0.089s
   Std dev: 0.008s
   Complexity: 64
   Top-K: 3

📊 Recall Performance:
   Recall@3: 85.5%
   Queries with ground truth: 200

⚙️ Complexity Analysis:
   Complexity 16: 0.015s avg
   Complexity 32: 0.019s avg
   Complexity 64: 0.023s avg
   Complexity 128: 0.031s avg

🚀 Performance Summary:
   Searches per second: 43.5
   Latency (ms): 23.0ms
```

## Directory Structure

```
benchmarks/laion/
├── setup_laion.py               # Setup script
├── evaluate_laion.py            # Evaluation script
├── README.md                    # This file
└── data/                        # Generated data
    ├── laion_images/            # Image files (placeholder)
    ├── laion_metadata.jsonl     # Image metadata
    ├── laion_passages.jsonl     # LEANN passages
    ├── laion_embeddings.npy     # CLIP embeddings
    ├── evaluation_queries.jsonl # Evaluation queries
    └── laion_index.leann/       # LEANN index files
```

## Notes

- The current implementation uses dummy data for demonstration
- For real LAION data, implement actual download logic in `setup_laion.py`
- CLIP embeddings are randomly generated; replace them with a real CLIP model for production (see the sketch below)
- Adjust `num_samples` and `num_queries` based on available resources
- Consider using `--num-samples` during evaluation for faster testing
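For the note about randomly generated embeddings: any CLIP implementation that produces 512-dimensional ViT-B/32 vectors will do. Below is a minimal sketch using the `sentence-transformers` package; the `clip-ViT-B-32` checkpoint is one readily available option, and the `.jpg` glob and `caption` field name are assumptions about the generated files, not guarantees about what `setup_laion.py` writes.

```python
import json
from pathlib import Path

import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer

# CLIP ViT-B/32 checkpoint that maps images and text into the same 512-dim space.
model = SentenceTransformer("clip-ViT-B-32")

# Encode the benchmark images (paths follow the directory layout above;
# the .jpg extension is an assumption about the placeholder files).
image_paths = sorted(Path("data/laion_images").glob("*.jpg"))
images = [Image.open(p).convert("RGB") for p in image_paths]
image_embeddings = model.encode(images, batch_size=64, show_progress_bar=True)
np.save("data/laion_embeddings.npy", image_embeddings.astype(np.float32))

# Encode caption queries with the same model so they land in the same space
# ("caption" is an assumed field name in evaluation_queries.jsonl).
with open("data/evaluation_queries.jsonl") as f:
    captions = [json.loads(line)["caption"] for line in f]
query_embeddings = model.encode(captions, batch_size=256)
```

Whether these vectors plug straight into the existing index build depends on how `setup_laion.py` writes `laion_embeddings.npy`, so treat this as a starting point rather than a drop-in replacement.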