docs: config guidance

2025-08-04 16:21:13 -07:00
parent dd71ac8d71
commit 716217ae24
3 changed files with 283 additions and 1 deletions
--- a/README.md
+++ b/README.md
@@ -170,6 +170,13 @@ ollama pull llama3.2:1b

 LEANN provides flexible parameters for embedding models, search strategies, and data processing to fit your specific needs.

+📚 **Having trouble with configuration?** Check our [Configuration Guide](docs/configuration-guide.md) for:
+- Quick start configurations for each use case
+- Solutions for "embedding too slow" issues
+- How to choose the right chat model
+- Fixing poor search quality
+- Performance optimization tips
+
 <details>
 <summary><strong>📋 Click to expand: Common Parameters (Available in All Examples)</strong></summary>

--- a/docs/configuration-guide.md
+++ b/docs/configuration-guide.md
@@ -0,0 +1,275 @@
+# LEANN Configuration Guide
+
+This guide helps you optimize LEANN for different use cases and understand the trade-offs between various configuration options.
+
+## Getting Started: Simple is Better
+
+When first trying LEANN, start with a small dataset to validate your approach quickly. Use the default `data/` directory, which contains just a few files, so you can test the full pipeline in minutes rather than hours.
+
+```bash
+# Quick test with minimal data
+python -m apps.document_rag --max-items 100 --query "What techniques does LEANN use?"
+```
+
+Once validated, scale up gradually:
+- 100 documents → 1,000 → 10,000 → full dataset
+- This helps identify issues early before committing to long processing times
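+
+As a minimal sketch, the scale-up steps above could be run by repeating the quick-test command with a larger `--max-items` each time. Whether each run rebuilds the index from scratch depends on your setup, so treat the loop as illustrative rather than as the recommended workflow:
+
+```bash
+# Re-run the same query at each scale and compare timings and answers
+for n in 100 1000 10000; do
+  python -m apps.document_rag --max-items "$n" --query "What techniques does LEANN use?"
+done
+```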
+
+## Embedding Model Selection: Understanding the Trade-offs
+
+Based on our experience developing LEANN, embedding models fall into three categories:
+
+### Small Models (384-768 dims)
+**Example**: `sentence-transformers/all-MiniLM-L6-v2`
+- **Pros**: Fast inference (10-50ms, 384 dims), good for real-time applications
+- **Cons**: Lower semantic understanding, may miss nuanced relationships
+- **Use when**: Speed is critical, handling simple queries
+
+### Medium Models (768-1024 dims)
+**Example**: `facebook/contriever`
+- **Pros**: Balanced performance, good multilingual support, reasonable speed
+- **Cons**: Requires more compute than small models
+- **Use when**: Need quality results without extreme compute requirements
+
+### Large Models (1024+ dims)
+**Example**: `Qwen/Qwen3-Embedding-0.6B`
+- **Pros**: Best semantic understanding, captures complex relationships, excellent multilingual support
+- **Cons**: Slow inference, high memory usage, may overfit on small datasets
+- **Use when**: Quality is paramount and you have sufficient compute
+
+### Cloud vs Local Trade-offs
+
+**OpenAI Embeddings** (`text-embedding-3-small`, `text-embedding-3-large`)
+- **Pros**: No local compute needed, consistently fast, high quality
+- **Cons**: Requires API key, costs money, data leaves your system, [known limitations with certain languages](https://yichuan-w.github.io/blog/lessons_learned_in_dev_leann/)
+- **When to use**: Prototyping, non-sensitive data, need immediate results
+
+**Local Embeddings**
+- **Pros**: Complete privacy, no ongoing costs, full control
+- **Cons**: Requires GPU for good performance, setup complexity
+- **When to use**: Production systems, sensitive data, cost-sensitive applications
+
+## Index Selection: Matching Your Scale
+
+### HNSW (Hierarchical Navigable Small World)
+**Best for**: Small to medium datasets (< 10M vectors)
+- Fast search (1-10ms latency)
+- Full recomputation required (no double queue optimization)
+- High memory usage during build phase
+- Excellent recall (95%+)
+
+```bash
+# A reasonable default for most use cases
+--backend-name hnsw --graph-degree 32 --build-complexity 64
+```
+
+### DiskANN
+**Best for**: Large datasets (> 10M vectors, 10GB+ index size)
+- Uses Product Quantization (PQ) for coarse filtering in double queue architecture
+- Extremely fast search through selective recomputation
+
+```bash
+# For large-scale deployments (> 10M vectors)
+--backend-name diskann --graph-degree 64 --build-complexity 128
+```
+
+## LLM Selection: Engine and Model Comparison
+
+### LLM Engines
+
+**OpenAI** (`--llm openai`)
+- **Pros**: Best quality, consistent performance, no local resources needed
+- **Cons**: Costs money (roughly $0.15-$2.50 per million input tokens, depending on the model), requires internet, data privacy concerns
+- **Models**: `gpt-4o-mini` (fast, cheap), `gpt-4o` (best quality), `o3-mini` (reasoning, moderately priced)
+
+**Ollama** (`--llm ollama`)
+- **Pros**: Fully local, free, privacy-preserving, good model variety
+- **Cons**: Requires local GPU/CPU resources, slower than cloud
+- **Models**: `qwen3:1.7b` (best general quality), `deepseek-r1:1.5b` (reasoning)
+
+**HuggingFace** (`--llm hf`)
+- **Pros**: Free tier available, huge model selection, direct model loading (vs Ollama's server-based approach)
+- **Cons**: API rate limits, local mode needs significant resources, more complex setup
+- **Models**: `Qwen/Qwen3-1.7B-FP8`
+
+
+### Model Size Trade-offs
+
+| Model Size | Speed | Quality | Memory | Use Case |
+|------------|-------|---------|---------|----------|
+| 1B params | 50-100 tok/s | Basic | 2-4GB | Quick answers, simple queries |
+| 3B params | 20-50 tok/s | Good | 4-8GB | General purpose RAG |
+| 7B params | 10-20 tok/s | Excellent | 8-16GB | Complex reasoning |
+| 13B+ params | 5-10 tok/s | Best | 16-32GB+ | Research, detailed analysis |
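+
+The memory column is roughly what the model weights alone would require. As a back-of-the-envelope check, assuming 16-bit weights (2 bytes per parameter, an assumption the table does not state) and ignoring the KV cache and runtime overhead:
+
+```math
+\begin{aligned}
+M_{\text{weights}} &\approx N_{\text{params}} \times 2\ \text{bytes} \\
+1\text{B}: &\quad 1 \times 10^{9} \times 2\ \text{bytes} = 2\ \text{GB} \\
+3\text{B}: &\quad 3 \times 10^{9} \times 2\ \text{bytes} = 6\ \text{GB} \\
+7\text{B}: &\quad 7 \times 10^{9} \times 2\ \text{bytes} = 14\ \text{GB} \\
+13\text{B}: &\quad 13 \times 10^{9} \times 2\ \text{bytes} = 26\ \text{GB}
+\end{aligned}
+```
+
+Each estimate falls inside the corresponding range in the table. Lower-precision (quantized) weights would need proportionally less memory.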
+
+## Parameter Tuning Guide
+
+### Search Complexity Parameters
+
+**`--build-complexity`** (index building)
+- Controls thoroughness during index construction
+- Higher = better recall but slower build
+- Recommendations:
+  - 32: Quick prototyping
+  - 64: Balanced (default)
+  - 128: Production systems
+  - 256: Maximum quality
+
+**`--search-complexity`** (query time)
+- Controls search thoroughness
+- Higher = better results but slower
+- Recommendations:
+  - 16: Fast/Interactive search (500-1000ms on consumer hardware)
+  - 32: High quality with diversity (1000-2000ms)
+  - 64+: Maximum accuracy (2000ms+)
+
+### Top-K Selection
+
+**`--top-k`** (number of retrieved chunks)
+- More chunks = better context but slower LLM processing
+- Should always be smaller than `--search-complexity`
+- Guidelines:
+  - 3-5: Simple factual queries
+  - 5-10: General questions (default)
+  - 10+: Complex multi-hop reasoning
+
+**Trade-off formula**:
+- Retrieval time ∝ log(n) × search_complexity
+- LLM processing time ∝ top_k × chunk_size
+- Total context = top_k × chunk_size tokens
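+
+As a worked instance of the last relation, assuming `--chunk-size` is measured in tokens and every retrieved chunk is full length, the Production RAG configuration in this guide (`--top-k 20`, `--chunk-size 256`) gives:
+
+```math
+\text{total context} = \text{top\_k} \times \text{chunk\_size} = 20 \times 256 = 5120\ \text{tokens}
+```
+
+Raising `--top-k` to 30 with 512-token chunks would triple that to 15,360 tokens, so check the result against your LLM's context window before increasing either value.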
+
+### Graph Degree (HNSW/DiskANN)
+
+**`--graph-degree`**
+- Number of connections per node in the graph
+- Higher = better recall but more memory
+- HNSW: 16-32 (default: 32)
+- DiskANN: 32-128 (default: 64)
+
+## Common Configurations by Use Case
+
+### 1. Quick Experimentation
+```bash
+python -m apps.document_rag \
+  --max-items 1000 \
+  --embedding-model sentence-transformers/all-MiniLM-L6-v2 \
+  --backend-name hnsw \
+  --llm ollama --llm-model llama3.2:1b
+```
+
+### 2. Personal Knowledge Base
+```bash
+python -m apps.document_rag \
+  --embedding-model facebook/contriever \
+  --chunk-size 512 --chunk-overlap 128 \
+  --backend-name hnsw \
+  --llm ollama --llm-model llama3.2:3b
+```
+
+### 3. Production RAG System
+```bash
+python -m apps.document_rag \
+  --embedding-model BAAI/bge-base-en-v1.5 \
+  --chunk-size 256 --chunk-overlap 64 \
+  --backend-name diskann \
+  --llm openai --llm-model gpt-4o-mini \
+  --top-k 20 --search-complexity 64
+```
+
+### 4. Multilingual Support (e.g., WeChat)
+```bash
+python -m apps.wechat_rag \
+  --embedding-model intfloat/multilingual-e5-base \
+  --chunk-size 192 --chunk-overlap 48 \
+  --backend-name hnsw \
+  --llm ollama --llm-model qwen3:8b
+```
+
+## Performance Optimization Checklist
+
+### If Embedding is Too Slow
+
+1. **Switch to a smaller model**:
+   ```bash
+   # From large model
+   --embedding-model Qwen/Qwen3-Embedding-0.6B
+   # To small model
+   --embedding-model sentence-transformers/all-MiniLM-L6-v2
+   ```
+
+2. **Use MLX on Apple Silicon**:
+   ```bash
+   --embedding-mode mlx --embedding-model mlx-community/multilingual-e5-base-mlx
+   ```
+
+3. **Process in batches**:
+   ```bash
+   --max-items 10000  # Process incrementally
+   ```
+
+### If Search Quality is Poor
+
+1. **Increase retrieval count**:
+   ```bash
+   --top-k 30  # Retrieve more candidates (keep this below --search-complexity)
+   ```
+
+2. **Tune chunk size for your content**:
+   - Technical docs: `--chunk-size 512`
+   - Chat messages: `--chunk-size 128`
+   - Mixed content: `--chunk-size 256`
+
+3. **Upgrade embedding model**:
+   ```bash
+   # For English
+   --embedding-model BAAI/bge-base-en-v1.5
+   # For multilingual
+   --embedding-model intfloat/multilingual-e5-large
+   ```
+
+## Understanding the Trade-offs
+
+Every configuration choice involves trade-offs:
+
+| Factor | Small/Fast | Large/Quality |
+|--------|------------|---------------|
+| Embedding Model | all-MiniLM-L6-v2 | BAAI/bge-large |
+| Chunk Size | 128 tokens | 512 tokens |
+| Index Type | HNSW | DiskANN |
+| LLM | llama3.2:1b | gpt-4o |
+
+The key is finding the right balance for your specific use case. Start small and simple, measure performance, then scale up only where needed.
+
+## Deep Dive: Critical Configuration Decisions
+
+### When to Disable Recomputation
+
+LEANN's recomputation feature provides exact distance calculations but can be disabled for extreme QPS requirements:
+
+```bash
+--no-recompute  # Disable selective recomputation
+```
+
+**Trade-offs**:
+- **With recomputation** (default): Exact distances, best quality, higher latency
+- **Without recomputation**: Approximate distances via PQ, 2-5x faster, significantly lower memory and storage usage
+
+**Disable when**:
+- Throughput requirements exceed 1,000 queries per second
+- Slight accuracy loss is acceptable
+- Running on resource-constrained hardware
+
+## Performance Monitoring
+
+Key metrics to watch:
+- Index build time
+- Query latency (p50, p95, p99)
+- Memory usage during build and search
+- Disk I/O patterns (for DiskANN)
+- Recomputation ratio (% of candidates recomputed)
+
+## Further Reading
+
+- [Lessons Learned Developing LEANN](https://yichuan-w.github.io/blog/lessons_learned_in_dev_leann/)
+- [LEANN Technical Paper](https://arxiv.org/abs/2506.08276)
+- [DiskANN Original Paper](https://papers.nips.cc/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf)
--- a/packages/leann-backend-diskann/third_party/DiskANN
+++ b/packages/leann-backend-diskann/third_party/DiskANN