docs: add a link

docs: Address all configuration guide feedback
- Fix grammar: 'If time is not a constraint' instead of 'time expense is not large'
- Highlight Qwen3-Embedding-0.6B performance (nearly OpenAI API level)
- Add OpenAI quick start section with configuration example
- Fold Cloud vs Local trade-offs into collapsible section
- Update HNSW as 'default and recommended for extreme low storage'
- Add DiskANN beta warning and explain PQ+rerank architecture
- Expand Ollama models: add qwen3:0.6b, 4b, 7b variants
- Note OpenAI as current default but recommend Ollama switch
- Add 'need to install extra software' warning for Ollama
- Remove incorrect latency numbers from search-complexity recommendations
2025-08-04 20:10:14 -07:00 · 2025-08-04 20:01:23 -07:00 · 2025-08-04 19:29:17 -07:00 · 2025-08-04 17:53:27 -07:00 · 2025-08-04 17:51:21 -07:00 · 2025-08-04 17:46:17 -07:00
17 changed files with 367 additions and 42 deletions
--- a/README.md
+++ b/README.md
@@ -170,6 +170,8 @@ ollama pull llama3.2:1b

 LEANN provides flexible parameters for embedding models, search strategies, and data processing to fit your specific needs.

+📚 **Need configuration best practices?** Check our [Configuration Guide](docs/configuration-guide.md) for detailed optimization tips, model selection advice, and solutions to common issues like slow embeddings or poor search quality.
+
 <details>
 <summary><strong>📋 Click to expand: Common Parameters (Available in All Examples)</strong></summary>

@@ -514,7 +516,7 @@ Options:
 - **Dynamic batching:** Efficiently batch embedding computations for GPU utilization
 - **Two-level search:** Smart graph traversal that prioritizes promising nodes

-**Backends:** DiskANN or HNSW - pick what works for your data size.
+**Backends:** HNSW (default) for most use cases, with optional DiskANN support for billion-scale datasets.

 ## Benchmarks

@@ -534,8 +536,7 @@ Options:

 ```bash
 uv pip install -e ".[dev]"  # Install dev dependencies
-python benchmarks/run_evaluation.py data/indices/dpr/dpr_diskann      # DPR dataset
-python benchmarks/run_evaluation.py data/indices/rpj_wiki/rpj_wiki.index  # Wikipedia
+python benchmarks/run_evaluation.py    # Will auto-download evaluation data and run benchmarks
 ```

 The evaluation script downloads data automatically on first run. The last three results were tested with partial personal data, and you can reproduce them with your own data!
--- a/apps/document_rag.py
+++ b/apps/document_rag.py
@@ -99,7 +99,9 @@ if __name__ == "__main__":
    print("- 'What are the main techniques LEANN uses?'")
    print("- 'What is the technique DLPM?'")
    print("- 'Who does Elizabeth Bennet marry?'")
-    print("- 'What is the problem of developing pan gu model? (盘古大模型开发中遇到什么问题?)'")
+    print(
+        "- 'What is the problem of developing pan gu model Huawei meets? (盘古大模型开发中遇到什么问题?)'"
+    )
    print("\nOr run without --query for interactive mode\n")

    rag = DocumentRAG()
--- a/benchmarks/data/.gitattributes
+++ b/benchmarks/data/.gitattributes
--- a/docs/configuration-guide.md
+++ b/docs/configuration-guide.md
@@ -0,0 +1,236 @@
+# LEANN Configuration Guide
+
+This guide helps you optimize LEANN for different use cases and understand the trade-offs between various configuration options.
+
+## Getting Started: Simple is Better
+
+When first trying LEANN, start with a small dataset to quickly validate your approach:
+
+**For document RAG**: The default `data/` directory works out of the box; it includes two AI research papers, the novel Pride and Prejudice, and a technical report
+```bash
+python -m apps.document_rag --query "What techniques does LEANN use?"
+```
+
+**For other data sources**: Limit the dataset size for quick testing
+```bash
+# WeChat: Test with recent messages only
+python -m apps.wechat_rag --max-items 100 --query "What did we discuss about the project timeline?"
+
+# Browser history: Last few days
+python -m apps.browser_rag --max-items 500 --query "Find documentation about vector databases"
+
+# Email: Recent inbox
+python -m apps.email_rag --max-items 200 --query "Who sent updates about the deployment status?"
+```
+
+Once validated, scale up gradually:
+- 100 documents → 1,000 → 10,000 → full dataset (`--max-items -1`)
+- This helps identify issues early before committing to long processing times
+
+## Embedding Model Selection: Understanding the Trade-offs
+
+Based on our experience developing LEANN, embedding models fall into three categories:
+
+### Small Models (< 100M parameters)
+**Example**: `sentence-transformers/all-MiniLM-L6-v2` (22M params)
+- **Pros**: Lightweight, fast for both indexing and inference
+- **Cons**: Lower semantic understanding, may miss nuanced relationships
+- **Use when**: Speed is critical, handling simple queries, interactive mode, or just experimenting with LEANN. If time is not a constraint, consider using a larger/better embedding model
+
+### Medium Models (100M-500M parameters)
+**Example**: `facebook/contriever` (110M params), `BAAI/bge-base-en-v1.5` (110M params)
+- **Pros**: Balanced performance, good multilingual support, reasonable speed
+- **Cons**: Requires more compute than small models
+- **Use when**: Need quality results without extreme compute requirements, general-purpose RAG applications
+
+### Large Models (500M+ parameters)
+**Example**: `Qwen/Qwen3-Embedding-0.6B` (600M params), `intfloat/multilingual-e5-large` (560M params)
+- **Pros**: Best semantic understanding, captures complex relationships, excellent multilingual support. **Qwen3-Embedding-0.6B achieves nearly OpenAI API performance!**
+- **Cons**: Slower inference, longer index build times
+- **Use when**: Quality is paramount and you have sufficient compute resources. **Highly recommended** for production use
+
+### Quick Start: OpenAI Embeddings (Fastest Setup)
+
+For immediate testing without local model downloads:
+```bash
+# Set OpenAI embeddings (requires OPENAI_API_KEY)
+--embedding-mode openai --embedding-model text-embedding-3-small
+```
+
+<details>
+<summary><strong>Cloud vs Local Trade-offs</strong></summary>
+
+**OpenAI Embeddings** (`text-embedding-3-small/large`)
+- **Pros**: No local compute needed, consistently fast, high quality
+- **Cons**: Requires API key, costs money, data leaves your system, [known limitations with certain languages](https://yichuan-w.github.io/blog/lessons_learned_in_dev_leann/)
+- **When to use**: Prototyping, non-sensitive data, need immediate results
+
+**Local Embeddings**
+- **Pros**: Complete privacy, no ongoing costs, full control, can sometimes outperform OpenAI embeddings
+- **Cons**: Slower than cloud APIs, requires local compute resources
+- **When to use**: Production systems, sensitive data, cost-sensitive applications
+
+</details>
+
+## Index Selection: Matching Your Scale
+
+### HNSW (Hierarchical Navigable Small World)
+**Best for**: Small to medium datasets (< 10M vectors) - **Default and recommended for extreme low storage**
+- Full recomputation required
+- High memory usage during build phase
+- Excellent recall (95%+)
+
+```bash
+# Optimal for most use cases
+--backend-name hnsw --graph-degree 32 --build-complexity 64
+```
+
+### DiskANN
+**Best for**: Large datasets (> 10M vectors, 10GB+ index size) - **⚠️ Beta version, still in active development**
+- Uses Product Quantization (PQ) for coarse filtering during graph traversal
+- Novel approach: stores only PQ codes, performs rerank with exact computation in final step
+- Implements a corner case of double-queue: prunes all neighbors and recomputes at the end
+
+```bash
+# For billion-scale deployments
+--backend-name diskann --graph-degree 64 --build-complexity 128
+```
+
+## LLM Selection: Engine and Model Comparison
+
+### LLM Engines
+
+**OpenAI** (`--llm openai`)
+- **Pros**: Best quality, consistent performance, no local resources needed
+- **Cons**: Costs money ($0.15-2.5 per million tokens), requires internet, data privacy concerns
+- **Models**: `gpt-4o-mini` (fast, cheap), `gpt-4o` (best quality), `o3-mini` (reasoning, not so expensive)
+- **Note**: Our current default, but we recommend switching to Ollama for most use cases
+
+**Ollama** (`--llm ollama`)
+- **Pros**: Fully local, free, privacy-preserving, good model variety
+- **Cons**: Requires local GPU/CPU resources, slower than cloud APIs, and requires installing the [Ollama app](https://github.com/ollama/ollama?tab=readme-ov-file#ollama) and pre-downloading models with `ollama pull`
+- **Models**: `qwen3:0.6b` (ultra-fast), `qwen3:1.7b` (balanced), `qwen3:4b` (good quality), `qwen3:8b` (high quality), `deepseek-r1:1.5b` (reasoning)
+
+**HuggingFace** (`--llm hf`)
+- **Pros**: Free tier available, huge model selection, direct model loading (vs Ollama's server-based approach)
+- **Cons**: More complex initial setup
+- **Models**: `Qwen/Qwen3-1.7B-FP8`
+
+## Parameter Tuning Guide
+
+### Search Complexity Parameters
+
+**`--build-complexity`** (index building)
+- Controls thoroughness during index construction
+- Higher = better recall but slower build
+- Recommendations:
+  - 32: Quick prototyping
+  - 64: Balanced (default)
+  - 128: Production systems
+  - 256: Maximum quality
+
+**`--search-complexity`** (query time)
+- Controls search thoroughness
+- Higher = better results but slower
+- Recommendations:
+  - 16: Fast/Interactive search
+  - 32: High quality with diversity
+  - 64+: Maximum accuracy
+
+### Top-K Selection
+
+**`--top-k`** (number of retrieved chunks)
+- More chunks = better context but slower LLM processing
+- Should always be smaller than `--search-complexity`
+- Guidelines:
+  - 10-20: General questions (default: 20)
+  - 30+: Complex multi-hop reasoning requiring comprehensive context
+
+**Trade-off formula**:
+- Retrieval time ∝ log(n) × search_complexity
+- LLM processing time ∝ top_k × chunk_size
+- Total context = top_k × chunk_size tokens
+
+### Graph Degree (HNSW/DiskANN)
+
+**`--graph-degree`**
+- Number of connections per node in the graph
+- Higher = better recall but more memory
+- HNSW: 16-32 (default: 32)
+- DiskANN: 32-128 (default: 64)
+
+
+## Performance Optimization Checklist
+
+### If Embedding is Too Slow
+
+1. **Switch to smaller model**:
+   ```bash
+   # From large model
+   --embedding-model Qwen/Qwen3-Embedding-0.6B
+   # To small model
+   --embedding-model sentence-transformers/all-MiniLM-L6-v2
+   ```
+
+2. **Limit dataset size for testing**:
+   ```bash
+   --max-items 1000  # Process first 1k items only
+   ```
+
+3. **Use MLX on Apple Silicon** (optional optimization):
+   ```bash
+   --embedding-mode mlx --embedding-model mlx-community/multilingual-e5-base-mlx
+   ```
+
+### If Search Quality is Poor
+
+1. **Increase retrieval count**:
+   ```bash
+   --top-k 30  # Retrieve more candidates
+   ```
+
+2. **Upgrade embedding model**:
+   ```bash
+   # For English
+   --embedding-model BAAI/bge-base-en-v1.5
+   # For multilingual
+   --embedding-model intfloat/multilingual-e5-large
+   ```
+
+## Understanding the Trade-offs
+
+Every configuration choice involves trade-offs:
+
+| Factor | Small/Fast | Large/Quality |
+|--------|------------|---------------|
+| Embedding Model | `all-MiniLM-L6-v2` | `Qwen/Qwen3-Embedding-0.6B` |
+| Chunk Size | 512 tokens | 128 tokens |
+| Index Type | HNSW | DiskANN |
+| LLM | `qwen3:1.7b` | `gpt-4o` |
+
+The key is finding the right balance for your specific use case. Start small and simple, measure performance, then scale up only where needed.
+
+## Deep Dive: Critical Configuration Decisions
+
+### When to Disable Recomputation
+
+LEANN's recomputation feature provides exact distance calculations but can be disabled for extreme QPS requirements:
+
+```bash
+--no-recompute  # Disable selective recomputation
+```
+
+**Trade-offs**:
+- **With recomputation** (default): Exact distances, best quality, higher latency, minimal storage (only stores metadata, recomputes embeddings on-demand)
+- **Without recomputation**: Must store full embeddings, significantly higher memory and storage usage (10-100x more), but faster search
+
+**Disable when**:
+- You have abundant storage and memory
+- Need extremely low latency (< 100ms)
+- Running a read-heavy workload where storage cost is acceptable
+
+## Further Reading
+
+- [Lessons Learned Developing LEANN](https://yichuan-w.github.io/blog/lessons_learned_in_dev_leann/)
+- [LEANN Technical Paper](https://arxiv.org/abs/2506.08276)
+- [DiskANN Original Paper](https://papers.nips.cc/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf)
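
The Top-K guidance in the new configuration guide (keep `--top-k` below `--search-complexity`; total context = top_k × chunk_size tokens) can be sketched as a small pre-flight check. This is only an illustration: `context_budget` is a hypothetical helper, not part of LEANN, and raising an error when the guideline is violated is an assumption made for the example. The chunk size of 256 is the document_rag default mentioned in the commit log; the search complexity of 32 is taken from the guide's recommendation list, not from a confirmed default.

```python
def context_budget(top_k: int, chunk_size: int, search_complexity: int) -> int:
    """Return the approximate number of context tokens handed to the LLM.

    Hypothetical helper: enforces the guide's rule of thumb that top_k
    should be smaller than search_complexity, then applies
    total context = top_k * chunk_size.
    """
    if top_k >= search_complexity:
        raise ValueError("top_k should be smaller than search_complexity")
    return top_k * chunk_size


# top-k 20 (guide default), chunk size 256, search complexity 32 (assumed values)
print(context_budget(top_k=20, chunk_size=256, search_complexity=32))  # 5120
```

Raising `--top-k` to 30 with the same chunk size grows the context to 7,680 tokens, which is the LLM-side cost the guide's trade-off formula refers to.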
--- a/docs/features.md
+++ b/docs/features.md
@@ -5,7 +5,7 @@
 - **🔄 Real-time Embeddings** - Eliminate heavy embedding storage with dynamic computation using optimized ZMQ servers and highly optimized search paradigm (overlapping and batching) with highly optimized embedding engine
 - **📈 Scalable Architecture** - Handles millions of documents on consumer hardware; the larger your dataset, the more LEANN can save
 - **🎯 Graph Pruning** - Advanced techniques to minimize the storage overhead of vector search to a limited footprint
-- **🏗️ Pluggable Backends** - DiskANN, HNSW/FAISS with unified API
+- **🏗️ Pluggable Backends** - HNSW/FAISS (default), with optional DiskANN for large-scale deployments

 ## 🛠️ Technical Highlights
 - **🔄 Recompute Mode** - Highest accuracy scenarios while eliminating vector storage overhead
--- a/docs/roadmap.md
+++ b/docs/roadmap.md
@@ -2,8 +2,8 @@

 ## 🎯 Q2 2025

-- [X] DiskANN backend with MIPS/L2/Cosine support
 - [X] HNSW backend integration
+- [X] DiskANN backend with MIPS/L2/Cosine support
 - [X] Real-time embedding pipeline
 - [X] Memory-efficient graph pruning

--- a/packages/leann-backend-diskann/leann_backend_diskann/diskann_backend.py
+++ b/packages/leann-backend-diskann/leann_backend_diskann/diskann_backend.py
@@ -7,6 +7,7 @@ from pathlib import Path
 from typing import Any, Literal

 import numpy as np
+import psutil
 from leann.interface import (
    LeannBackendBuilderInterface,
    LeannBackendFactoryInterface,
@@ -84,6 +85,43 @@ def _write_vectors_to_bin(data: np.ndarray, file_path: Path):
        f.write(data.tobytes())


+def _calculate_smart_memory_config(data: np.ndarray) -> tuple[float, float]:
+    """
+    Calculate smart memory configuration for DiskANN based on data size and system specs.
+
+    Args:
+        data: The embedding data array
+
+    Returns:
+        tuple: (search_memory_maximum, build_memory_maximum) in GB
+    """
+    num_vectors, dim = data.shape
+
+    # Calculate embedding storage size
+    embedding_size_bytes = num_vectors * dim * 4  # float32 = 4 bytes
+    embedding_size_gb = embedding_size_bytes / (1024**3)
+
+    # search_memory_maximum: 1/10 of embedding size for optimal PQ compression
+    # This controls Product Quantization size - smaller means more compression
+    search_memory_gb = max(0.1, embedding_size_gb / 10)  # At least 100MB
+
+    # build_memory_maximum: Based on available system RAM for sharding control
+    # This controls how much memory DiskANN uses during index construction
+    available_memory_gb = psutil.virtual_memory().available / (1024**3)
+    total_memory_gb = psutil.virtual_memory().total / (1024**3)
+
+    # Use 50% of available memory, but at least 2GB and at most 75% of total
+    build_memory_gb = max(2.0, min(available_memory_gb * 0.5, total_memory_gb * 0.75))
+
+    logger.info(
+        f"Smart memory config - Data: {embedding_size_gb:.2f}GB, "
+        f"Search mem: {search_memory_gb:.2f}GB (PQ control), "
+        f"Build mem: {build_memory_gb:.2f}GB (sharding control)"
+    )
+
+    return search_memory_gb, build_memory_gb
+
+
 @register_backend("diskann")
 class DiskannBackend(LeannBackendFactoryInterface):
    @staticmethod
@@ -121,6 +159,16 @@ class DiskannBuilder(LeannBackendBuilderInterface):
                f"Unsupported distance_metric '{build_kwargs.get('distance_metric', 'unknown')}'."
            )

+        # Calculate smart memory configuration if not explicitly provided
+        if (
+            "search_memory_maximum" not in build_kwargs
+            or "build_memory_maximum" not in build_kwargs
+        ):
+            smart_search_mem, smart_build_mem = _calculate_smart_memory_config(data)
+        else:
+            smart_search_mem = build_kwargs.get("search_memory_maximum", 4.0)
+            smart_build_mem = build_kwargs.get("build_memory_maximum", 8.0)
+
        try:
            from . import _diskannpy as diskannpy  # type: ignore

@@ -131,8 +179,8 @@ class DiskannBuilder(LeannBackendBuilderInterface):
                    index_prefix,
                    build_kwargs.get("complexity", 64),
                    build_kwargs.get("graph_degree", 32),
-                    build_kwargs.get("search_memory_maximum", 4.0),
-                    build_kwargs.get("build_memory_maximum", 8.0),
+                    build_kwargs.get("search_memory_maximum", smart_search_mem),
+                    build_kwargs.get("build_memory_maximum", smart_build_mem),
                    build_kwargs.get("num_threads", 8),
                    build_kwargs.get("pq_disk_bytes", 0),
                    "",
--- a/packages/leann-backend-diskann/pyproject.toml
+++ b/packages/leann-backend-diskann/pyproject.toml
@@ -4,8 +4,8 @@ build-backend = "scikit_build_core.build"

 [project]
 name = "leann-backend-diskann"
-version = "0.1.16"
-dependencies = ["leann-core==0.1.16", "numpy", "protobuf>=3.19.0"]
+version = "0.2.0"
+dependencies = ["leann-core==0.2.0", "numpy", "protobuf>=3.19.0"]

 [tool.scikit-build]
 # Key: simplified CMake path
--- a/packages/leann-backend-diskann/third_party/DiskANN
+++ b/packages/leann-backend-diskann/third_party/DiskANN
--- a/packages/leann-backend-hnsw/pyproject.toml
+++ b/packages/leann-backend-hnsw/pyproject.toml
@@ -6,10 +6,10 @@ build-backend = "scikit_build_core.build"

 [project]
 name = "leann-backend-hnsw"
-version = "0.1.16"
+version = "0.2.0"
 description = "Custom-built HNSW (Faiss) backend for the Leann toolkit."
 dependencies = [
-    "leann-core==0.1.16",
+    "leann-core==0.2.0",
    "numpy",
    "pyzmq>=23.0.0",
    "msgpack>=1.0.0",
--- a/packages/leann-core/pyproject.toml
+++ b/packages/leann-core/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "leann-core"
-version = "0.1.16"
+version = "0.2.0"
 description = "Core API and plugin system for LEANN"
 readme = "README.md"
 requires-python = ">=3.9"
--- a/packages/leann-core/src/leann/api.py
+++ b/packages/leann-core/src/leann/api.py
@@ -636,7 +636,10 @@ class LeannChat:
            "Please provide the best answer you can based on this context and your knowledge."
        )

+        ask_time = time.time()
        ans = self.llm.ask(prompt, **llm_kwargs)
+        ask_time = time.time() - ask_time
+        logger.info(f"  Ask time: {ask_time} seconds")
        return ans

    def start_interactive(self):
--- a/packages/leann-core/src/leann/chat.py
+++ b/packages/leann-core/src/leann/chat.py
@@ -358,7 +358,11 @@ def validate_model_and_suggest(model_name: str, llm_type: str) -> str | None:
                error_msg += f"\n\nModel '{model_name}' was not found in Ollama's library."

                if suggestions:
-                    error_msg += "\n\nDid you mean one of these installed models?\n"
+                    error_msg += (
+                        "\n\nDid you mean one of these installed models?\n"
+                        + "\nTry to use ollama pull to install the model you need\n"
+                    )
+
                    for i, suggestion in enumerate(suggestions, 1):
                        error_msg += f"  {i}. {suggestion}\n"
                else:
@@ -542,14 +546,41 @@ class HFChat(LLMInterface):
            self.device = "cpu"
            logger.info("No GPU detected. Using CPU.")

-        # Load tokenizer and model
-        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
-        self.model = AutoModelForCausalLM.from_pretrained(
-            model_name,
-            torch_dtype=torch.float16 if self.device != "cpu" else torch.float32,
-            device_map="auto" if self.device != "cpu" else None,
-            trust_remote_code=True,
-        )
+        # Load tokenizer and model with timeout protection
+        try:
+            import signal
+
+            def timeout_handler(signum, frame):
+                raise TimeoutError("Model download/loading timed out")
+
+            # Set timeout for model loading (60 seconds)
+            old_handler = signal.signal(signal.SIGALRM, timeout_handler)
+            signal.alarm(60)
+
+            try:
+                logger.info(f"Loading tokenizer for {model_name}...")
+                self.tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+                logger.info(f"Loading model {model_name}...")
+                self.model = AutoModelForCausalLM.from_pretrained(
+                    model_name,
+                    torch_dtype=torch.float16 if self.device != "cpu" else torch.float32,
+                    device_map="auto" if self.device != "cpu" else None,
+                    trust_remote_code=True,
+                )
+                logger.info(f"Successfully loaded {model_name}")
+            finally:
+                signal.alarm(0)  # Cancel the alarm
+                signal.signal(signal.SIGALRM, old_handler)  # Restore old handler
+
+        except TimeoutError:
+            logger.error(f"Model loading timed out for {model_name}")
+            raise RuntimeError(
+                f"Model loading timed out for {model_name}. Please check your internet connection or try a smaller model."
+            )
+        except Exception as e:
+            logger.error(f"Failed to load model {model_name}: {e}")
+            raise

        # Move model to device if not using device_map
        if self.device != "cpu" and "device_map" not in str(self.model):
--- a/packages/leann-core/src/leann/embedding_server_manager.py
+++ b/packages/leann-core/src/leann/embedding_server_manager.py
@@ -354,13 +354,21 @@ class EmbeddingServerManager:
        self.server_process.terminate()

        try:
-            self.server_process.wait(timeout=5)
+            self.server_process.wait(timeout=3)
            logger.info(f"Server process {self.server_process.pid} terminated.")
        except subprocess.TimeoutExpired:
            logger.warning(
-                f"Server process {self.server_process.pid} did not terminate gracefully, killing it."
+                f"Server process {self.server_process.pid} did not terminate gracefully within 3 seconds, killing it."
            )
            self.server_process.kill()
+            try:
+                self.server_process.wait(timeout=2)
+                logger.info(f"Server process {self.server_process.pid} killed successfully.")
+            except subprocess.TimeoutExpired:
+                logger.error(
+                    f"Failed to kill server process {self.server_process.pid} - it may be hung"
+                )
+                # Don't hang indefinitely

        # Clean up process resources to prevent resource tracker warnings
        try:
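
The shutdown change above follows a terminate, wait, kill, wait escalation with bounded waits at each step. A minimal standalone sketch of that pattern, assuming a plain `subprocess.Popen` handle (`stop_process` is a hypothetical helper; the 3-second and 2-second defaults mirror the values in the diff):

```python
import subprocess
import sys


def stop_process(
    proc: subprocess.Popen, term_timeout: float = 3.0, kill_timeout: float = 2.0
) -> bool:
    """Ask the process to exit, then force it. Returns True once it has exited."""
    proc.terminate()
    try:
        proc.wait(timeout=term_timeout)
        return True
    except subprocess.TimeoutExpired:
        # Graceful shutdown did not finish in time: escalate to kill
        proc.kill()
    try:
        proc.wait(timeout=kill_timeout)
        return True
    except subprocess.TimeoutExpired:
        # Report failure instead of blocking forever on a hung process
        return False


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_process(child))
```

Returning a boolean rather than only logging lets the caller decide whether a process that survived `kill()` should block later cleanup.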
--- a/packages/leann/README.md
+++ b/packages/leann/README.md
@@ -5,11 +5,8 @@ LEANN is a revolutionary vector database that democratizes personal AI. Transfor
 ## Installation

 ```bash
-# Default installation (HNSW backend, recommended)
+# Default installation (includes both HNSW and DiskANN backends)
 uv pip install leann
-
-# With DiskANN backend (for large-scale deployments)
-uv pip install leann[diskann]
 ```

 ## Quick Start
@@ -19,8 +16,8 @@ from leann import LeannBuilder, LeannSearcher, LeannChat
 from pathlib import Path
 INDEX_PATH = str(Path("./").resolve() / "demo.leann")

-# Build an index
-builder = LeannBuilder(backend_name="hnsw")
+# Build an index (choose backend: "hnsw" or "diskann")
+builder = LeannBuilder(backend_name="hnsw")  # or "diskann" for large-scale deployments
 builder.add_text("LEANN saves 97% storage compared to traditional vector databases.")
 builder.add_text("Tung Tung Tung Sahur called—they need their banana‑crocodile hybrid back")
 builder.build_index(INDEX_PATH)
--- a/packages/leann/pyproject.toml
+++ b/packages/leann/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "leann"
-version = "0.1.16"
+version = "0.2.0"
 description = "LEANN - The smallest vector index in the world. RAG Everything with LEANN!"
 readme = "README.md"
 requires-python = ">=3.9"
@@ -24,16 +24,15 @@ classifiers = [
    "Programming Language :: Python :: 3.12",
 ]

-# Default installation: core + hnsw
+# Default installation: core + hnsw + diskann
 dependencies = [
    "leann-core>=0.1.0",
    "leann-backend-hnsw>=0.1.0",
+    "leann-backend-diskann>=0.1.0",
 ]

 [project.optional-dependencies]
-diskann = [
-    "leann-backend-diskann>=0.1.0",
-]
+# All backends now included by default

 [project.urls]
 Repository = "https://github.com/yichuan-w/LEANN"
--- a/uv.lock
+++ b/uv.lock
@@ -1650,7 +1650,7 @@ name = "importlib-metadata"
 version = "8.7.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
-    { name = "zipp" },
+    { name = "zipp", marker = "python_full_version < '3.10'" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/76/66/650a33bd90f786193e4de4b3ad86ea60b53c89b669a5c7be931fac31cdb0/importlib_metadata-8.7.0.tar.gz", hash = "sha256:d13b81ad223b890aa16c5471f2ac3056cf76c5f10f82d6f9292f0b415f389000", size = 56641 }
 wheels = [
@@ -2155,7 +2155,7 @@ wheels = [

 [[package]]
 name = "leann-backend-diskann"
-version = "0.1.15"
+version = "0.2.0"
 source = { editable = "packages/leann-backend-diskann" }
 dependencies = [
    { name = "leann-core" },
@@ -2167,14 +2167,14 @@ dependencies = [

 [package.metadata]
 requires-dist = [
-    { name = "leann-core", specifier = "==0.1.15" },
+    { name = "leann-core", specifier = "==0.2.0" },
    { name = "numpy" },
    { name = "protobuf", specifier = ">=3.19.0" },
 ]

 [[package]]
 name = "leann-backend-hnsw"
-version = "0.1.15"
+version = "0.2.0"
 source = { editable = "packages/leann-backend-hnsw" }
 dependencies = [
    { name = "leann-core" },
@@ -2187,7 +2187,7 @@ dependencies = [

 [package.metadata]
 requires-dist = [
-    { name = "leann-core", specifier = "==0.1.15" },
+    { name = "leann-core", specifier = "==0.2.0" },
    { name = "msgpack", specifier = ">=1.0.0" },
    { name = "numpy" },
    { name = "pyzmq", specifier = ">=23.0.0" },
@@ -2195,7 +2195,7 @@ requires-dist = [

 [[package]]
 name = "leann-core"
-version = "0.1.15"
+version = "0.2.0"
 source = { editable = "packages/leann-core" }
 dependencies = [
    { name = "accelerate" },
Author	SHA1	Message	Date
Andy Lee	8eee90bf80	docs: add a link	2025-08-04 20:10:14 -07:00
Andy Lee	649d4ad03e	docs: Address all configuration guide feedback - Fix grammar: 'If time is not a constraint' instead of 'time expense is not large' - Highlight Qwen3-Embedding-0.6B performance (nearly OpenAI API level) - Add OpenAI quick start section with configuration example - Fold Cloud vs Local trade-offs into collapsible section - Update HNSW as 'default and recommended for extreme low storage' - Add DiskANN beta warning and explain PQ+rerank architecture - Expand Ollama models: add qwen3:0.6b, 4b, 7b variants - Note OpenAI as current default but recommend Ollama switch - Add 'need to install extra software' warning for Ollama - Remove incorrect latency numbers from search-complexity recommendations	2025-08-04 20:01:23 -07:00
Andy Lee	d9b6f195c5	docs: Improve configuration guide based on feedback - List specific files in default data/ directory (2 AI papers, literature, tech report) - Update examples to use English and better RAG-suitable queries - Change full dataset reference to use --max-items -1 - Adjust small model guidance about upgrading to larger models when time allows - Update top-k defaults to reflect actual default of 20 - Ensure consistent use of full model name Qwen/Qwen3-Embedding-0.6B - Reorder optimization steps, move MLX to third position - Remove incorrect chunk size tuning guidance - Change README from 'Having trouble' to 'Need best practices'	2025-08-04 19:29:17 -07:00
Andy Lee	00f506c0bd	docs: Adjust DiskANN positioning in features and roadmap - features.md: Put HNSW/FAISS first as default, DiskANN as optional - roadmap.md: Reorder to show HNSW integration before DiskANN - Consistent with positioning DiskANN as advanced option for large-scale use	2025-08-04 17:53:27 -07:00
Andy Lee	e872dd1d23	docs: Weaken DiskANN emphasis in README - Change backend description to emphasize HNSW as default - DiskANN positioned as optional for billion-scale datasets - Simplify evaluation commands to be more generic	2025-08-04 17:51:21 -07:00
Andy Lee	063c687ff7	chore: move evaluation data .gitattributes to correct location	2025-08-04 17:46:17 -07:00
Andy Lee	bb8ecd54d7	feat: add comprehensive configuration guide and update README - Create docs/configuration-guide.md with detailed guidance on: - Embedding model selection (small/medium/large) - Index selection (HNSW vs DiskANN) - LLM engine and model comparison - Parameter tuning (build/search complexity, top-k) - Performance optimization tips - Deep dive into LEANN's recomputation feature - Update README.md to link to the configuration guide - Include latest 2025 model recommendations (Qwen3, DeepSeek-R1, O3-mini)	2025-08-04 17:41:27 -07:00
Andy Lee	716217ae24	docs: config guidance	2025-08-04 16:21:13 -07:00
Andy Lee	dd71ac8d71	feat: implement smart memory configuration for DiskANN (#16 ) - Add intelligent memory calculation based on data size and system specs - search_memory_maximum: 1/10 of embedding size (controls PQ compression) - build_memory_maximum: 50% of available RAM (controls sharding) - Provides optimal balance between performance and memory usage - Automatic fallback to default values if parameters are explicitly provided	2025-08-04 14:36:29 -07:00
GitHub Actions	8bee1d4100	chore: release v0.2.0	2025-08-04 21:34:31 +00:00
yichuan520030910320	33521d6d00	add logs	2025-08-04 14:15:52 -07:00
Andy Lee	8899734952	refactor: Unify examples interface with BaseRAGExample (#12 ) * refactor: Unify examples interface with BaseRAGExample - Create BaseRAGExample base class for all RAG examples - Refactor 4 examples to use unified interface: - document_rag.py (replaces main_cli_example.py) - email_rag.py (replaces mail_reader_leann.py) - browser_rag.py (replaces google_history_reader_leann.py) - wechat_rag.py (replaces wechat_history_reader_leann.py) - Maintain 100% parameter compatibility with original files - Add interactive mode support for all examples - Unify parameter names (--max-items replaces --max-emails/--max-entries) - Update README.md with new examples usage - Add PARAMETER_CONSISTENCY.md documenting all parameter mappings - Keep main_cli_example.py for backward compatibility with migration notice All default values, LeannBuilder parameters, and chunking settings remain identical to ensure full compatibility with existing indexes. * fix: Update CI tests for new unified examples interface - Rename test_main_cli.py to test_document_rag.py - Update all references from main_cli_example.py to document_rag.py - Update tests/README.md documentation The tests now properly test the new unified interface while maintaining the same test coverage and functionality. * fix: Fix pre-commit issues and update tests - Fix import sorting and unused imports - Update type annotations to use built-in types (list, dict) instead of typing.List/Dict - Fix trailing whitespace and end-of-file issues - Fix Chinese fullwidth comma to regular comma - Update test_main_cli.py to test_document_rag.py - Add backward compatibility test for main_cli_example.py - Pass all pre-commit hooks (ruff, ruff-format, etc.) * refactor: Remove old example scripts and migration references - Delete old example scripts (mail_reader_leann.py, google_history_reader_leann.py, etc.) 
- Remove migration hints and backward compatibility - Update tests to use new unified examples directly - Clean up all references to old script names - Users now only see the new unified interface * fix: Restore embedding-mode parameter to all examples - All examples now have --embedding-mode parameter (unified interface benefit) - Default is 'sentence-transformers' (consistent with original behavior) - Users can now use OpenAI or MLX embeddings with any data source - Maintains functional equivalence with original scripts * docs: Improve parameter categorization in README - Clearly separate core (shared) vs specific parameters - Move LLM and embedding examples to 'Example Commands' section - Add descriptive comments for all specific parameters - Keep only truly data-source-specific parameters in specific sections * docs: Make example commands more representative - Add default values to parameter descriptions - Replace generic examples with real-world use cases - Focus on data-source-specific features in examples - Remove redundant demonstrations of common parameters * docs: Reorganize parameter documentation structure - Move common parameters to a dedicated section before all examples - Rename sections to 'X-Specific Arguments' for clarity - Remove duplicate common parameters from individual examples - Better information architecture for users * docs: polish applications * docs: Add CLI installation instructions - Add two installation options: venv and global uv tool - Clearly explain when to use each option - Make CLI more accessible for daily use * docs: Clarify CLI global installation process - Explain the transition from venv to global installation - Add upgrade command for global installation - Make it clear that global install allows usage without venv activation * docs: Add collapsible section for CLI installation - Wrap CLI installation instructions in details/summary tags - Keep consistent with other collapsible sections in README - Improve document 
readability and navigation * style: format * docs: Fix collapsible sections - Make Common Parameters collapsible (as it's lengthy reference material) - Keep CLI Installation visible (important for users to see immediately) - Better information hierarchy * docs: Add introduction for Common Parameters section - Add 'Flexible Configuration' heading with descriptive sentence - Create parallel structure with 'Generation Model Setup' section - Improve document flow and readability * docs: nit * fix: Fix issues in unified examples - Add smart path detection for data directory - Fix add_texts -> add_text method call - Handle both running from project root and examples directory * fix: Fix async/await and add_text issues in unified examples - Remove incorrect await from chat.ask() calls (not async) - Fix add_texts -> add_text method calls - Verify search-complexity correctly maps to efSearch parameter - All examples now run successfully * feat: Address review comments - Add complexity parameter to LeannChat initialization (default: search_complexity) - Fix chunk-size default in README documentation (256, not 2048) - Add more index building parameters as CLI arguments: - --backend-name (hnsw/diskann) - --graph-degree (default: 32) - --build-complexity (default: 64) - --no-compact (disable compact storage) - --no-recompute (disable embedding recomputation) - Update README to document all new parameters * feat: Add chunk-size parameters and improve file type filtering - Add --chunk-size and --chunk-overlap parameters to all RAG examples - Preserve original default values for each data source: - Document: 256/128 (optimized for general documents) - Email: 256/25 (smaller overlap for email threads) - Browser: 256/128 (standard for web content) - WeChat: 192/64 (smaller chunks for chat messages) - Make --file-types optional filter instead of restriction in document_rag - Update README to clarify interactive mode and parameter usage - Fix LLM default model documentation (gpt-4o, 
not gpt-4o-mini) * feat: Update documentation based on review feedback - Add MLX embedding example to README - Clarify examples/data content description (two papers, Pride and Prejudice, Chinese README) - Move chunk parameters to common parameters section - Remove duplicate chunk parameters from document-specific section * docs: Emphasize diverse data sources in examples/data description * fix: update default embedding models for better performance - Change WeChat, Browser, and Email RAG examples to use all-MiniLM-L6-v2 - Previous Qwen/Qwen3-Embedding-0.6B was too slow for these use cases - all-MiniLM-L6-v2 is a fast 384-dim model, ideal for large-scale personal data * add response highlight * change rebuild logic * fix some example * feat: check if k is larger than #docs * fix: WeChat history reader bugs and refactor wechat_rag to use unified architecture * fix email wrong -1 to process all file * refactor: reorganize all examples/ and test/ * refactor: reorganize examples and add link checker * fix: add init.py * fix: handle certificate errors in link checker * fix wechat * merge * docs: update README to use proper module imports for apps - Change from 'python apps/xxx.py' to 'python -m apps.xxx' - More professional and pythonic module calling - Ensures proper module resolution and imports - Better separation between apps/ (production tools) and examples/ (demos) --------- Co-authored-by: yichuan520030910320 <yichuan_wang@berkeley.edu>	2025-08-03 23:06:24 -07:00
Andy Lee	54df6310c5	fix: diskann build and prevent termination from hanging - Fix OpenMP library linking in DiskANN CMake configuration - Add timeout protection for HuggingFace model loading to prevent hangs - Improve embedding server process termination with better timeouts - Enable the DiskANN backend by default alongside HNSW - Update documentation to reflect that both backends are included by default	2025-08-03 21:16:52 -07:00