docs: add a link

docs: Address all configuration guide feedback
- Fix grammar: 'If time is not a constraint' instead of 'time expense is not large'
- Highlight Qwen3-Embedding-0.6B performance (nearly OpenAI API level)
- Add OpenAI quick start section with configuration example
- Fold Cloud vs Local trade-offs into collapsible section
- Update HNSW as 'default and recommended for extreme low storage'
- Add DiskANN beta warning and explain PQ+rerank architecture
- Expand Ollama models: add qwen3:0.6b, 4b, 7b variants
- Note OpenAI as current default but recommend Ollama switch
- Add 'need to install extra software' warning for Ollama
- Remove incorrect latency numbers from search-complexity recommendations
2025-08-04 20:10:14 -07:00 · 2025-08-04 20:01:23 -07:00 · 2025-08-04 19:29:17 -07:00 · 2025-08-04 17:53:27 -07:00 · 2025-08-04 17:51:21 -07:00 · 2025-08-04 17:46:17 -07:00
7 changed files with 249 additions and 12 deletions
--- a/README.md
+++ b/README.md
@@ -170,6 +170,8 @@ ollama pull llama3.2:1b
 LEANN provides flexible parameters for embedding models, search strategies, and data processing to fit your specific needs.
 📚 **Need configuration best practices?** Check our [Configuration Guide](docs/configuration-guide.md) for detailed optimization tips, model selection advice, and solutions to common issues like slow embeddings or poor search quality.
 <details>
 <summary><strong>📋 Click to expand: Common Parameters (Available in All Examples)</strong></summary>
@@ -514,7 +516,7 @@ Options:
 - **Dynamic batching:** Efficiently batch embedding computations for GPU utilization
 - **Two-level search:** Smart graph traversal that prioritizes promising nodes
-**Backends:** DiskANN or HNSW - pick what works for your data size.
+**Backends:** HNSW (default) for most use cases, with optional DiskANN support for billion-scale datasets.
 ## Benchmarks
@@ -534,8 +536,7 @@ Options:
 ```bash
 uv pip install -e ".[dev]"  # Install dev dependencies
-python benchmarks/run_evaluation.py data/indices/dpr/dpr_diskann      # DPR dataset
+python benchmarks/run_evaluation.py    # Will auto-download evaluation data and run benchmarks
-python benchmarks/run_evaluation.py data/indices/rpj_wiki/rpj_wiki.index  # Wikipedia
 ```
 The evaluation script downloads data automatically on first run. The last three results were tested with partial personal data, and you can reproduce them with your own data!
--- a/benchmarks/data/.gitattributes
+++ b/benchmarks/data/.gitattributes
--- a/docs/configuration-guide.md
+++ b/docs/configuration-guide.md
@@ -0,0 +1,236 @@
 # LEANN Configuration Guide
 This guide helps you optimize LEANN for different use cases and understand the trade-offs between various configuration options.
 ## Getting Started: Simple is Better
 When first trying LEANN, start with a small dataset to quickly validate your approach:
 **For document RAG**: The default `data/` directory works out of the box. It includes two AI research papers, the novel *Pride and Prejudice*, and a technical report.
 ```bash
 python -m apps.document_rag --query "What techniques does LEANN use?"
 ```
 **For other data sources**: Limit the dataset size for quick testing
 ```bash
 # WeChat: Test with recent messages only
 python -m apps.wechat_rag --max-items 100 --query "What did we discuss about the project timeline?"
 # Browser history: Last few days
 python -m apps.browser_rag --max-items 500 --query "Find documentation about vector databases"
 # Email: Recent inbox
 python -m apps.email_rag --max-items 200 --query "Who sent updates about the deployment status?"
 ```
 Once validated, scale up gradually:
 - 100 documents → 1,000 → 10,000 → full dataset (`--max-items -1`)
 - This helps identify issues early before committing to long processing times
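The staged scale-up above can be scripted. The following is a minimal sketch, assuming the app module names and the `--max-items` / `--query` flags shown earlier in this guide. The helper name `scale_up_commands` is hypothetical and not part of LEANN; the commands are only built and printed, not executed.

```python
import shlex

# Stages from the guide: 100 -> 1,000 -> 10,000 -> full dataset (-1).
STAGES = [100, 1_000, 10_000, -1]


def scale_up_commands(app_module: str, query: str) -> list[str]:
    """Build one command line per stage (hypothetical helper, not part of LEANN)."""
    return [
        shlex.join(
            ["python", "-m", app_module, "--max-items", str(n), "--query", query]
        )
        for n in STAGES
    ]


if __name__ == "__main__":
    for cmd in scale_up_commands(
        "apps.email_rag", "Who sent updates about the deployment status?"
    ):
        print(cmd)
```

Run each printed command in turn and move to the next stage only after the previous one returns sensible answers in acceptable time.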
 ## Embedding Model Selection: Understanding the Trade-offs
 Based on our experience developing LEANN, embedding models fall into three categories:
 ### Small Models (< 100M parameters)
 **Example**: `sentence-transformers/all-MiniLM-L6-v2` (22M params)
 - **Pros**: Lightweight, fast for both indexing and inference
 - **Cons**: Lower semantic understanding, may miss nuanced relationships
 - **Use when**: Speed is critical, handling simple queries, interactive mode, or just experimenting with LEANN. If time is not a constraint, consider using a larger/better embedding model
 ### Medium Models (100M-500M parameters)
 **Example**: `facebook/contriever` (110M params), `BAAI/bge-base-en-v1.5` (110M params)
 - **Pros**: Balanced performance, good multilingual support, reasonable speed
 - **Cons**: Requires more compute than small models
 - **Use when**: Need quality results without extreme compute requirements, general-purpose RAG applications
 ### Large Models (500M+ parameters)
 **Example**: `Qwen/Qwen3-Embedding-0.6B` (600M params), `intfloat/multilingual-e5-large` (560M params)
 - **Pros**: Best semantic understanding, captures complex relationships, excellent multilingual support. **Qwen3-Embedding-0.6B comes close to the quality of OpenAI's embedding API.**
 - **Cons**: Slower inference, longer index build times
 - **Use when**: Quality is paramount and you have sufficient compute resources. **Highly recommended** for production use
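The three size categories can be expressed as a small lookup. This is an illustrative sketch only: the function name `size_category` is hypothetical, the parameter counts are the ones quoted in this guide, and treating exactly 500M parameters as "large" is an assumption, since the stated ranges overlap at that point.

```python
def size_category(num_params: int) -> str:
    """Map a parameter count to the guide's three categories.

    Assumption: a model with exactly 500M parameters counts as "large",
    because the guide's ranges (100M-500M and 500M+) overlap there.
    """
    if num_params < 100_000_000:
        return "small"
    if num_params < 500_000_000:
        return "medium"
    return "large"


# Parameter counts as quoted in this guide.
MODELS = {
    "sentence-transformers/all-MiniLM-L6-v2": 22_000_000,
    "facebook/contriever": 110_000_000,
    "BAAI/bge-base-en-v1.5": 110_000_000,
    "intfloat/multilingual-e5-large": 560_000_000,
    "Qwen/Qwen3-Embedding-0.6B": 600_000_000,
}

if __name__ == "__main__":
    for name, params in MODELS.items():
        print(f"{name}: {size_category(params)}")
```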
 ### Quick Start: OpenAI Embeddings (Fastest Setup)
 For immediate testing without local model downloads:
 ```bash
 # Set OpenAI embeddings (requires OPENAI_API_KEY)
 --embedding-mode openai --embedding-model text-embedding-3-small
 ```
 <details>
 <summary><strong>Cloud vs Local Trade-offs</strong></summary>
 **OpenAI Embeddings** (`text-embedding-3-small/large`)
 - **Pros**: No local compute needed, consistently fast, high quality
 - **Cons**: Requires API key, costs money, data leaves your system, [known limitations with certain languages](https://yichuan-w.github.io/blog/lessons_learned_in_dev_leann/)
 - **When to use**: Prototyping, non-sensitive data, need immediate results
 **Local Embeddings**
 - **Pros**: Complete privacy, no ongoing costs, full control, can sometimes outperform OpenAI embeddings
 - **Cons**: Slower than cloud APIs, requires local compute resources
 - **When to use**: Production systems, sensitive data, cost-sensitive applications
 </details>
 ## Index Selection: Matching Your Scale
 ### HNSW (Hierarchical Navigable Small World)
 **Best for**: Small to medium datasets (< 10M vectors) - **Default and recommended for extremely low storage**
 - Full recomputation required
 - High memory usage during build phase
 - Excellent recall (95%+)
 ```bash
 # Optimal for most use cases
 --backend-name hnsw --graph-degree 32 --build-complexity 64
 ```
 ### DiskANN
 **Best for**: Large datasets (> 10M vectors, 10GB+ index size) - **⚠️ Beta version, still in active development**
 - Uses Product Quantization (PQ) for coarse filtering during graph traversal
 - Novel approach: stores only PQ codes, performs rerank with exact computation in final step
 - Implements a corner case of double-queue: prunes all neighbors and recomputes at the end
 ```bash
 # For billion-scale deployments
 --backend-name diskann --graph-degree 64 --build-complexity 128
 ```
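The choice between the two backends can be sketched as a function of dataset size. This assumes the 10M-vector threshold and the flag values from the two examples above; `index_flags` is a hypothetical helper, and defaulting to HNSW at exactly 10M vectors is an assumption based on HNSW being the default backend.

```python
def index_flags(num_vectors: int) -> list[str]:
    """Return the guide's suggested index flags for a dataset size.

    Hypothetical helper: the threshold (10M vectors) and flag values are
    taken from this guide; the boundary case falls back to HNSW, the default.
    """
    if num_vectors > 10_000_000:
        # DiskANN is marked as beta in this guide.
        return ["--backend-name", "diskann",
                "--graph-degree", "64", "--build-complexity", "128"]
    return ["--backend-name", "hnsw",
            "--graph-degree", "32", "--build-complexity", "64"]


if __name__ == "__main__":
    print(" ".join(index_flags(250_000)))
    print(" ".join(index_flags(50_000_000)))
```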
 ## LLM Selection: Engine and Model Comparison
 ### LLM Engines
 **OpenAI** (`--llm openai`)
 - **Pros**: Best quality, consistent performance, no local resources needed
 - **Cons**: Costs money ($0.15-2.5 per million tokens), requires internet, data privacy concerns
 - **Models**: `gpt-4o-mini` (fast, cheap), `gpt-4o` (best quality), `o3-mini` (reasoning, moderately priced)
 - **Note**: Our current default, but we recommend switching to Ollama for most use cases
 **Ollama** (`--llm ollama`)
 - **Pros**: Fully local, free, privacy-preserving, good model variety
 - **Cons**: Requires local GPU/CPU resources, slower than cloud APIs, requires installing the separate [Ollama app](https://github.com/ollama/ollama?tab=readme-ov-file#ollama) and pre-downloading models with `ollama pull`
 - **Models**: `qwen3:0.6b` (ultra-fast), `qwen3:1.7b` (balanced), `qwen3:4b` (good quality), `qwen3:8b` (high quality), `deepseek-r1:1.5b` (reasoning)
 **HuggingFace** (`--llm hf`)
 - **Pros**: Free tier available, huge model selection, direct model loading (vs Ollama's server-based approach)
 - **Cons**: More complex initial setup
 - **Models**: `Qwen/Qwen3-1.7B-FP8`
 ## Parameter Tuning Guide
 ### Search Complexity Parameters
 **`--build-complexity`** (index building)
 - Controls thoroughness during index construction
 - Higher = better recall but slower build
 - Recommendations:
  - 32: Quick prototyping
  - 64: Balanced (default)
  - 128: Production systems
  - 256: Maximum quality
 **`--search-complexity`** (query time)
 - Controls search thoroughness
 - Higher = better results but slower
 - Recommendations:
  - 16: Fast/Interactive search
  - 32: High quality with diversity
  - 64+: Maximum accuracy
 ### Top-K Selection
 **`--top-k`** (number of retrieved chunks)
 - More chunks = better context but slower LLM processing
 - Should always be smaller than `--search-complexity`
 - Guidelines:
  - 10-20: General questions (default: 20)
  - 30+: Complex multi-hop reasoning requiring comprehensive context
 **Trade-off formula**:
 - Retrieval time ∝ log(n) × search_complexity
 - LLM processing time ∝ top_k × chunk_size
 - Total context = top_k × chunk_size tokens
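To make these proportionalities concrete, here is a small sketch. The function names are hypothetical, the retrieval figure is a unitless relative value rather than a latency prediction, and the chunk size of 256 tokens in the example is an assumed value, not a LEANN default.

```python
import math


def total_context_tokens(top_k: int, chunk_size: int) -> int:
    """Total context = top_k x chunk_size tokens."""
    return top_k * chunk_size


def relative_retrieval_cost(num_vectors: int, search_complexity: int) -> float:
    """Unitless value proportional to log(n) x search_complexity."""
    return math.log(num_vectors) * search_complexity


def check_top_k(top_k: int, search_complexity: int) -> None:
    """Enforce the guideline that top-k stays below search complexity."""
    if top_k >= search_complexity:
        raise ValueError(
            f"top_k ({top_k}) should be smaller than "
            f"search_complexity ({search_complexity})"
        )


if __name__ == "__main__":
    check_top_k(top_k=20, search_complexity=32)
    # 20 chunks of an assumed 256 tokens each.
    print(total_context_tokens(20, 256))
    # Doubling search complexity doubles the relative retrieval cost.
    base = relative_retrieval_cost(1_000_000, 32)
    print(relative_retrieval_cost(1_000_000, 64) / base)
```

Because the dataset size enters only through the logarithm, raising `--search-complexity` affects retrieval cost far more than growing the dataset does, while the LLM side scales linearly with `--top-k`.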
 ### Graph Degree (HNSW/DiskANN)
 **`--graph-degree`**
 - Number of connections per node in the graph
 - Higher = better recall but more memory
 - HNSW: 16-32 (default: 32)
 - DiskANN: 32-128 (default: 64)
 ## Performance Optimization Checklist
 ### If Embedding is Too Slow
 1. **Switch to a smaller model**:
   ```bash
   # From large model
   --embedding-model Qwen/Qwen3-Embedding-0.6B
   # To small model
   --embedding-model sentence-transformers/all-MiniLM-L6-v2
   ```
 2. **Limit dataset size for testing**:
   ```bash
   --max-items 1000  # Process first 1k items only
   ```
 3. **Use MLX on Apple Silicon** (optional optimization):
   ```bash
   --embedding-mode mlx --embedding-model mlx-community/multilingual-e5-base-mlx
   ```
 ### If Search Quality is Poor
 1. **Increase retrieval count**:
   ```bash
   --top-k 30  # Retrieve more candidates
   ```
 2. **Upgrade embedding model**:
   ```bash
   # For English
   --embedding-model BAAI/bge-base-en-v1.5
   # For multilingual
   --embedding-model intfloat/multilingual-e5-large
   ```
 ## Understanding the Trade-offs
 Every configuration choice involves trade-offs:
 | Factor | Small/Fast | Large/Quality |
 |--------|------------|---------------|
 | Embedding Model | `all-MiniLM-L6-v2` | `Qwen/Qwen3-Embedding-0.6B` |
 | Chunk Size | 512 tokens | 128 tokens |
 | Index Type | HNSW | DiskANN |
 | LLM | `qwen3:1.7b` | `gpt-4o` |
 The key is finding the right balance for your specific use case. Start small and simple, measure performance, then scale up only where needed.
 ## Deep Dive: Critical Configuration Decisions
 ### When to Disable Recomputation
 LEANN's recomputation feature provides exact distance calculations, but it can be disabled when you need extremely high QPS:
 ```bash
 --no-recompute  # Disable selective recomputation
 ```
 **Trade-offs**:
 - **With recomputation** (default): Exact distances, best quality, higher latency, minimal storage (only stores metadata, recomputes embeddings on-demand)
 - **Without recomputation**: Must store full embeddings, significantly higher memory and storage usage (10-100x more), but faster search
 **Disable when**:
 - You have abundant storage and memory
 - You need extremely low latency (< 100ms)
 - You run a read-heavy workload where the storage cost is acceptable
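Before disabling recomputation, it helps to estimate how much space the stored embeddings would need. The sketch below covers raw vectors only and rests on assumptions: float32 values (4 bytes each) and a hypothetical embedding dimension of 768. It does not model LEANN's actual on-disk format or the graph itself.

```python
def embedding_storage_bytes(
    num_vectors: int, dim: int, bytes_per_value: int = 4
) -> int:
    """Raw size of stored embeddings, assuming float32 by default."""
    return num_vectors * dim * bytes_per_value


if __name__ == "__main__":
    # Assumed example: 1M chunks with 768-dimensional float32 embeddings.
    size = embedding_storage_bytes(1_000_000, 768)
    print(f"{size / 1e9:.2f} GB")
```

For one million chunks at these assumed settings the raw vectors alone take about 3.07 GB, which is the cost that the default recomputation mode avoids.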
 ## Further Reading
 - [Lessons Learned Developing LEANN](https://yichuan-w.github.io/blog/lessons_learned_in_dev_leann/)
 - [LEANN Technical Paper](https://arxiv.org/abs/2506.08276)
 - [DiskANN Original Paper](https://papers.nips.cc/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf)
--- a/docs/features.md
+++ b/docs/features.md
@@ -5,7 +5,7 @@
 - **🔄 Real-time Embeddings** - Eliminate heavy embedding storage with dynamic computation using optimized ZMQ servers and highly optimized search paradigm (overlapping and batching) with highly optimized embedding engine
 - **📈 Scalable Architecture** - Handles millions of documents on consumer hardware; the larger your dataset, the more LEANN can save
 - **🎯 Graph Pruning** - Advanced techniques to minimize the storage overhead of vector search to a limited footprint
-- **🏗️ Pluggable Backends** - DiskANN, HNSW/FAISS with unified API
+- **🏗️ Pluggable Backends** - HNSW/FAISS (default), with optional DiskANN for large-scale deployments
 ## 🛠️ Technical Highlights
 - **🔄 Recompute Mode** - Highest accuracy scenarios while eliminating vector storage overhead
--- a/docs/roadmap.md
+++ b/docs/roadmap.md
@@ -2,8 +2,8 @@
 ## 🎯 Q2 2025
-- [X] DiskANN backend with MIPS/L2/Cosine support
 - [X] HNSW backend integration
+- [X] DiskANN backend with MIPS/L2/Cosine support
 - [X] Real-time embedding pipeline
 - [X] Memory-efficient graph pruning
--- a/packages/leann-backend-diskann/third_party/DiskANN
+++ b/packages/leann-backend-diskann/third_party/DiskANN
--- a/uv.lock
+++ b/uv.lock
@@ -1650,7 +1650,7 @@ name = "importlib-metadata"
 version = "8.7.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
-    { name = "zipp" },
+    { name = "zipp", marker = "python_full_version < '3.10'" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/76/66/650a33bd90f786193e4de4b3ad86ea60b53c89b669a5c7be931fac31cdb0/importlib_metadata-8.7.0.tar.gz", hash = "sha256:d13b81ad223b890aa16c5471f2ac3056cf76c5f10f82d6f9292f0b415f389000", size = 56641 }
 wheels = [
@@ -2155,7 +2155,7 @@ wheels = [
 [[package]]
 name = "leann-backend-diskann"
-version = "0.1.15"
+version = "0.2.0"
 source = { editable = "packages/leann-backend-diskann" }
 dependencies = [
    { name = "leann-core" },
@@ -2167,14 +2167,14 @@ dependencies = [
 [package.metadata]
 requires-dist = [
-    { name = "leann-core", specifier = "==0.1.15" },
+    { name = "leann-core", specifier = "==0.2.0" },
    { name = "numpy" },
    { name = "protobuf", specifier = ">=3.19.0" },
 ]
 [[package]]
 name = "leann-backend-hnsw"
-version = "0.1.15"
+version = "0.2.0"
 source = { editable = "packages/leann-backend-hnsw" }
 dependencies = [
    { name = "leann-core" },
@@ -2187,7 +2187,7 @@ dependencies = [
 [package.metadata]
 requires-dist = [
-    { name = "leann-core", specifier = "==0.1.15" },
+    { name = "leann-core", specifier = "==0.2.0" },
    { name = "msgpack", specifier = ">=1.0.0" },
    { name = "numpy" },
    { name = "pyzmq", specifier = ">=23.0.0" },
@@ -2195,7 +2195,7 @@ requires-dist = [
 [[package]]
 name = "leann-core"
-version = "0.1.15"
+version = "0.2.0"
 source = { editable = "packages/leann-core" }
 dependencies = [
    { name = "accelerate" },
Author	SHA1	Message	Date
Andy Lee	8eee90bf80	docs: add a link	2025-08-04 20:10:14 -07:00
Andy Lee	649d4ad03e	docs: Address all configuration guide feedback - Fix grammar: 'If time is not a constraint' instead of 'time expense is not large' - Highlight Qwen3-Embedding-0.6B performance (nearly OpenAI API level) - Add OpenAI quick start section with configuration example - Fold Cloud vs Local trade-offs into collapsible section - Update HNSW as 'default and recommended for extreme low storage' - Add DiskANN beta warning and explain PQ+rerank architecture - Expand Ollama models: add qwen3:0.6b, 4b, 7b variants - Note OpenAI as current default but recommend Ollama switch - Add 'need to install extra software' warning for Ollama - Remove incorrect latency numbers from search-complexity recommendations	2025-08-04 20:01:23 -07:00
Andy Lee	d9b6f195c5	docs: Improve configuration guide based on feedback - List specific files in default data/ directory (2 AI papers, literature, tech report) - Update examples to use English and better RAG-suitable queries - Change full dataset reference to use --max-items -1 - Adjust small model guidance about upgrading to larger models when time allows - Update top-k defaults to reflect actual default of 20 - Ensure consistent use of full model name Qwen/Qwen3-Embedding-0.6B - Reorder optimization steps, move MLX to third position - Remove incorrect chunk size tuning guidance - Change README from 'Having trouble' to 'Need best practices'	2025-08-04 19:29:17 -07:00
Andy Lee	00f506c0bd	docs: Adjust DiskANN positioning in features and roadmap - features.md: Put HNSW/FAISS first as default, DiskANN as optional - roadmap.md: Reorder to show HNSW integration before DiskANN - Consistent with positioning DiskANN as advanced option for large-scale use	2025-08-04 17:53:27 -07:00
Andy Lee	e872dd1d23	docs: Weaken DiskANN emphasis in README - Change backend description to emphasize HNSW as default - DiskANN positioned as optional for billion-scale datasets - Simplify evaluation commands to be more generic	2025-08-04 17:51:21 -07:00
Andy Lee	063c687ff7	chore: move evaluation data .gitattributes to correct location	2025-08-04 17:46:17 -07:00
Andy Lee	bb8ecd54d7	feat: add comprehensive configuration guide and update README - Create docs/configuration-guide.md with detailed guidance on: - Embedding model selection (small/medium/large) - Index selection (HNSW vs DiskANN) - LLM engine and model comparison - Parameter tuning (build/search complexity, top-k) - Performance optimization tips - Deep dive into LEANN's recomputation feature - Update README.md to link to the configuration guide - Include latest 2025 model recommendations (Qwen3, DeepSeek-R1, O3-mini)	2025-08-04 17:41:27 -07:00
Andy Lee	716217ae24	docs: config guidance	2025-08-04 16:21:13 -07:00