Compare commits: config...refactor-a (38 commits)

Commits: 0877960547, d68af63d05, b844aca968, 85277ba67a, e9562acdc2, 7fd3db1ddb, c1ccc51a75, b0239b6e4d, 58556ef44c, 87c930d705, 86f919a6da, f8d34663b4, 568cf597f4, baf70dc411, 7ad2ec39d6, 31fd3c816a, 1f6c7f2f5a, c1124eb349, 274bbb19ea, 8c152c7a31, ce77eef13a, 9d77175ac8, 7fbb6c98ef, 914a248c28, 55fc5862f9, fd97b8dfa8, 57959947a1, cc0c091ca5, ff389c7d8d, 6780a8eaba, 984056f126, bd4451bf50, 34e313f64a, ddc789b231, ff1b622bdd, 3cde4fc7b3, 4e3bcda5fa, 46f6f76fc3
@@ -170,8 +170,6 @@ ollama pull llama3.2:1b
 
 LEANN provides flexible parameters for embedding models, search strategies, and data processing to fit your specific needs.
 
-📚 **Need configuration best practices?** Check our [Configuration Guide](docs/configuration-guide.md) for detailed optimization tips, model selection advice, and solutions to common issues like slow embeddings or poor search quality.
-
 <details>
 <summary><strong>📋 Click to expand: Common Parameters (Available in All Examples)</strong></summary>
 
@@ -516,7 +514,7 @@ Options:
 - **Dynamic batching:** Efficiently batch embedding computations for GPU utilization
 - **Two-level search:** Smart graph traversal that prioritizes promising nodes
 
-**Backends:** HNSW (default) for most use cases, with optional DiskANN support for billion-scale datasets.
+**Backends:** DiskANN or HNSW - pick what works for your data size.
 
 ## Benchmarks
 
@@ -536,7 +534,8 @@ Options:
 
 ```bash
 uv pip install -e ".[dev]" # Install dev dependencies
-python benchmarks/run_evaluation.py # Will auto-download evaluation data and run benchmarks
+python benchmarks/run_evaluation.py data/indices/dpr/dpr_diskann # DPR dataset
+python benchmarks/run_evaluation.py data/indices/rpj_wiki/rpj_wiki.index # Wikipedia
 ```
 
 The evaluation script downloads data automatically on first run. The last three results were tested with partial personal data, and you can reproduce them with your own data!
@@ -99,9 +99,7 @@ if __name__ == "__main__":
     print("- 'What are the main techniques LEANN uses?'")
     print("- 'What is the technique DLPM?'")
     print("- 'Who does Elizabeth Bennet marry?'")
-    print(
-        "- 'What is the problem of developing pan gu model Huawei meets? (盘古大模型开发中遇到什么问题?)'"
-    )
+    print("- 'What is the problem of developing pan gu model? (盘古大模型开发中遇到什么问题?)'")
     print("\nOr run without --query for interactive mode\n")
 
     rag = DocumentRAG()
@@ -1,236 +0,0 @@
# LEANN Configuration Guide

This guide helps you optimize LEANN for different use cases and understand the trade-offs between various configuration options.

## Getting Started: Simple is Better

When first trying LEANN, start with a small dataset to quickly validate your approach:

**For document RAG**: The default `data/` directory works perfectly - includes 2 AI research papers, Pride and Prejudice literature, and a technical report
```bash
python -m apps.document_rag --query "What techniques does LEANN use?"
```

**For other data sources**: Limit the dataset size for quick testing
```bash
# WeChat: Test with recent messages only
python -m apps.wechat_rag --max-items 100 --query "What did we discuss about the project timeline?"

# Browser history: Last few days
python -m apps.browser_rag --max-items 500 --query "Find documentation about vector databases"

# Email: Recent inbox
python -m apps.email_rag --max-items 200 --query "Who sent updates about the deployment status?"
```

Once validated, scale up gradually:
- 100 documents → 1,000 → 10,000 → full dataset (`--max-items -1`)
- This helps identify issues early before committing to long processing times

## Embedding Model Selection: Understanding the Trade-offs

Based on our experience developing LEANN, embedding models fall into three categories:

### Small Models (< 100M parameters)
**Example**: `sentence-transformers/all-MiniLM-L6-v2` (22M params)
- **Pros**: Lightweight, fast for both indexing and inference
- **Cons**: Lower semantic understanding, may miss nuanced relationships
- **Use when**: Speed is critical, handling simple queries, interactive mode, or just experimenting with LEANN. If time is not a constraint, consider using a larger/better embedding model

### Medium Models (100M-500M parameters)
**Example**: `facebook/contriever` (110M params), `BAAI/bge-base-en-v1.5` (110M params)
- **Pros**: Balanced performance, good multilingual support, reasonable speed
- **Cons**: Requires more compute than small models
- **Use when**: Need quality results without extreme compute requirements, general-purpose RAG applications

### Large Models (500M+ parameters)
**Example**: `Qwen/Qwen3-Embedding-0.6B` (600M params), `intfloat/multilingual-e5-large` (560M params)
- **Pros**: Best semantic understanding, captures complex relationships, excellent multilingual support. **Qwen3-Embedding-0.6B achieves nearly OpenAI API performance!**
- **Cons**: Slower inference, longer index build times
- **Use when**: Quality is paramount and you have sufficient compute resources. **Highly recommended** for production use

### Quick Start: OpenAI Embeddings (Fastest Setup)

For immediate testing without local model downloads:
```bash
# Set OpenAI embeddings (requires OPENAI_API_KEY)
--embedding-mode openai --embedding-model text-embedding-3-small
```

<details>
<summary><strong>Cloud vs Local Trade-offs</strong></summary>

**OpenAI Embeddings** (`text-embedding-3-small/large`)
- **Pros**: No local compute needed, consistently fast, high quality
- **Cons**: Requires API key, costs money, data leaves your system, [known limitations with certain languages](https://yichuan-w.github.io/blog/lessons_learned_in_dev_leann/)
- **When to use**: Prototyping, non-sensitive data, need immediate results

**Local Embeddings**
- **Pros**: Complete privacy, no ongoing costs, full control, can sometimes outperform OpenAI embeddings
- **Cons**: Slower than cloud APIs, requires local compute resources
- **When to use**: Production systems, sensitive data, cost-sensitive applications

</details>

## Index Selection: Matching Your Scale

### HNSW (Hierarchical Navigable Small World)
**Best for**: Small to medium datasets (< 10M vectors) - **Default and recommended for extreme low storage**
- Full recomputation required
- High memory usage during build phase
- Excellent recall (95%+)

```bash
# Optimal for most use cases
--backend-name hnsw --graph-degree 32 --build-complexity 64
```

### DiskANN
**Best for**: Large datasets (> 10M vectors, 10GB+ index size) - **⚠️ Beta version, still in active development**
- Uses Product Quantization (PQ) for coarse filtering during graph traversal
- Novel approach: stores only PQ codes, performs rerank with exact computation in final step
- Implements a corner case of double-queue: prunes all neighbors and recomputes at the end

```bash
# For billion-scale deployments
--backend-name diskann --graph-degree 64 --build-complexity 128
```

## LLM Selection: Engine and Model Comparison

### LLM Engines

**OpenAI** (`--llm openai`)
- **Pros**: Best quality, consistent performance, no local resources needed
- **Cons**: Costs money ($0.15-2.5 per million tokens), requires internet, data privacy concerns
- **Models**: `gpt-4o-mini` (fast, cheap), `gpt-4o` (best quality), `o3-mini` (reasoning, not so expensive)
- **Note**: Our current default, but we recommend switching to Ollama for most use cases

**Ollama** (`--llm ollama`)
- **Pros**: Fully local, free, privacy-preserving, good model variety
- **Cons**: Requires local GPU/CPU resources, slower than cloud APIs, need to install extra [ollama app](https://github.com/ollama/ollama?tab=readme-ov-file#ollama) and pre-download models by `ollama pull`
- **Models**: `qwen3:0.6b` (ultra-fast), `qwen3:1.7b` (balanced), `qwen3:4b` (good quality), `qwen3:7b` (high quality), `deepseek-r1:1.5b` (reasoning)

**HuggingFace** (`--llm hf`)
- **Pros**: Free tier available, huge model selection, direct model loading (vs Ollama's server-based approach)
- **Cons**: More complex initial setup
- **Models**: `Qwen/Qwen3-1.7B-FP8`

## Parameter Tuning Guide

### Search Complexity Parameters

**`--build-complexity`** (index building)
- Controls thoroughness during index construction
- Higher = better recall but slower build
- Recommendations:
  - 32: Quick prototyping
  - 64: Balanced (default)
  - 128: Production systems
  - 256: Maximum quality

**`--search-complexity`** (query time)
- Controls search thoroughness
- Higher = better results but slower
- Recommendations:
  - 16: Fast/Interactive search
  - 32: High quality with diversity
  - 64+: Maximum accuracy

### Top-K Selection

**`--top-k`** (number of retrieved chunks)
- More chunks = better context but slower LLM processing
- Should be always smaller than `--search-complexity`
- Guidelines:
  - 10-20: General questions (default: 20)
  - 30+: Complex multi-hop reasoning requiring comprehensive context

**Trade-off formula**:
- Retrieval time ∝ log(n) × search_complexity
- LLM processing time ∝ top_k × chunk_size
- Total context = top_k × chunk_size tokens

### Graph Degree (HNSW/DiskANN)

**`--graph-degree`**
- Number of connections per node in the graph
- Higher = better recall but more memory
- HNSW: 16-32 (default: 32)
- DiskANN: 32-128 (default: 64)


## Performance Optimization Checklist

### If Embedding is Too Slow

1. **Switch to smaller model**:
```bash
# From large model
--embedding-model Qwen/Qwen3-Embedding-0.6B
# To small model
--embedding-model sentence-transformers/all-MiniLM-L6-v2
```

2. **Limit dataset size for testing**:
```bash
--max-items 1000 # Process first 1k items only
```

3. **Use MLX on Apple Silicon** (optional optimization):
```bash
--embedding-mode mlx --embedding-model mlx-community/multilingual-e5-base-mlx
```

### If Search Quality is Poor

1. **Increase retrieval count**:
```bash
--top-k 30 # Retrieve more candidates
```

2. **Upgrade embedding model**:
```bash
# For English
--embedding-model BAAI/bge-base-en-v1.5
# For multilingual
--embedding-model intfloat/multilingual-e5-large
```

## Understanding the Trade-offs

Every configuration choice involves trade-offs:

| Factor | Small/Fast | Large/Quality |
|--------|------------|---------------|
| Embedding Model | `all-MiniLM-L6-v2` | `Qwen/Qwen3-Embedding-0.6B` |
| Chunk Size | 512 tokens | 128 tokens |
| Index Type | HNSW | DiskANN |
| LLM | `qwen3:1.7b` | `gpt-4o` |

The key is finding the right balance for your specific use case. Start small and simple, measure performance, then scale up only where needed.

## Deep Dive: Critical Configuration Decisions

### When to Disable Recomputation

LEANN's recomputation feature provides exact distance calculations but can be disabled for extreme QPS requirements:

```bash
--no-recompute # Disable selective recomputation
```

**Trade-offs**:
- **With recomputation** (default): Exact distances, best quality, higher latency, minimal storage (only stores metadata, recomputes embeddings on-demand)
- **Without recomputation**: Must store full embeddings, significantly higher memory and storage usage (10-100x more), but faster search

**Disable when**:
- You have abundant storage and memory
- Need extremely low latency (< 100ms)
- Running a read-heavy workload where storage cost is acceptable

## Further Reading

- [Lessons Learned Developing LEANN](https://yichuan-w.github.io/blog/lessons_learned_in_dev_leann/)
- [LEANN Technical Paper](https://arxiv.org/abs/2506.08276)
- [DiskANN Original Paper](https://papers.nips.cc/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf)
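The removed guide's "Trade-off formula" bullets are easy to sanity-check with a few lines of Python. The sketch below is illustrative only: the corpus size, chunk size, and complexity values are assumed examples, not LEANN defaults or measurements.

```python
import math

# Assumed example values -- for illustration, not LEANN defaults.
num_vectors = 1_000_000      # corpus size n
top_k = 20                   # retrieved chunks
chunk_size = 256             # tokens per chunk
search_complexity = 32

# Total context handed to the LLM grows linearly with top_k:
context_tokens = top_k * chunk_size            # 5120 tokens
print(f"LLM context: {context_tokens} tokens")

# Retrieval cost grows roughly with log(n) * search_complexity:
relative_cost = math.log(num_vectors) * search_complexity
print(f"Relative retrieval cost: {relative_cost:.0f}")
```

Doubling `--top-k` doubles the context (and the LLM processing time), while doubling the corpus size only adds a log factor to retrieval, which is why the guide suggests tuning `top_k` and `search_complexity` before reaching for a larger index.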
@@ -5,7 +5,7 @@
 - **🔄 Real-time Embeddings** - Eliminate heavy embedding storage with dynamic computation using optimized ZMQ servers and highly optimized search paradigm (overlapping and batching) with highly optimized embedding engine
 - **📈 Scalable Architecture** - Handles millions of documents on consumer hardware; the larger your dataset, the more LEANN can save
 - **🎯 Graph Pruning** - Advanced techniques to minimize the storage overhead of vector search to a limited footprint
-- **🏗️ Pluggable Backends** - HNSW/FAISS (default), with optional DiskANN for large-scale deployments
+- **🏗️ Pluggable Backends** - DiskANN, HNSW/FAISS with unified API
 
 ## 🛠️ Technical Highlights
 - **🔄 Recompute Mode** - Highest accuracy scenarios while eliminating vector storage overhead
@@ -2,8 +2,8 @@
 
 ## 🎯 Q2 2025
 
-- [X] HNSW backend integration
 - [X] DiskANN backend with MIPS/L2/Cosine support
+- [X] HNSW backend integration
 - [X] Real-time embedding pipeline
 - [X] Memory-efficient graph pruning
 
@@ -7,7 +7,6 @@ from pathlib import Path
 from typing import Any, Literal
 
 import numpy as np
-import psutil
 from leann.interface import (
     LeannBackendBuilderInterface,
     LeannBackendFactoryInterface,
@@ -85,43 +84,6 @@ def _write_vectors_to_bin(data: np.ndarray, file_path: Path):
         f.write(data.tobytes())
 
 
-def _calculate_smart_memory_config(data: np.ndarray) -> tuple[float, float]:
-    """
-    Calculate smart memory configuration for DiskANN based on data size and system specs.
-
-    Args:
-        data: The embedding data array
-
-    Returns:
-        tuple: (search_memory_maximum, build_memory_maximum) in GB
-    """
-    num_vectors, dim = data.shape
-
-    # Calculate embedding storage size
-    embedding_size_bytes = num_vectors * dim * 4  # float32 = 4 bytes
-    embedding_size_gb = embedding_size_bytes / (1024**3)
-
-    # search_memory_maximum: 1/10 of embedding size for optimal PQ compression
-    # This controls Product Quantization size - smaller means more compression
-    search_memory_gb = max(0.1, embedding_size_gb / 10)  # At least 100MB
-
-    # build_memory_maximum: Based on available system RAM for sharding control
-    # This controls how much memory DiskANN uses during index construction
-    available_memory_gb = psutil.virtual_memory().available / (1024**3)
-    total_memory_gb = psutil.virtual_memory().total / (1024**3)
-
-    # Use 50% of available memory, but at least 2GB and at most 75% of total
-    build_memory_gb = max(2.0, min(available_memory_gb * 0.5, total_memory_gb * 0.75))
-
-    logger.info(
-        f"Smart memory config - Data: {embedding_size_gb:.2f}GB, "
-        f"Search mem: {search_memory_gb:.2f}GB (PQ control), "
-        f"Build mem: {build_memory_gb:.2f}GB (sharding control)"
-    )
-
-    return search_memory_gb, build_memory_gb
-
-
 @register_backend("diskann")
 class DiskannBackend(LeannBackendFactoryInterface):
     @staticmethod
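To make the removed heuristic concrete, here is the same arithmetic applied to a hypothetical corpus of 1M vectors at dimension 768 in float32 (the corpus size and dimension are assumptions for illustration, not values from this PR):

```python
import psutil

# Hypothetical corpus: 1,000,000 vectors x 768 dims x 4 bytes (float32).
embedding_size_gb = 1_000_000 * 768 * 4 / (1024**3)   # ~2.86 GB of raw embeddings
search_memory_gb = max(0.1, embedding_size_gb / 10)    # ~0.29 GB PQ budget (1/10 rule)

available_gb = psutil.virtual_memory().available / (1024**3)
total_gb = psutil.virtual_memory().total / (1024**3)
build_memory_gb = max(2.0, min(available_gb * 0.5, total_gb * 0.75))  # clamped to system RAM

print(f"search_memory_maximum ~ {search_memory_gb:.2f} GB")
print(f"build_memory_maximum  ~ {build_memory_gb:.2f} GB")
```

After this change the builder falls back to fixed defaults (4.0 GB search, 8.0 GB build) unless the caller passes `search_memory_maximum` / `build_memory_maximum` explicitly, as the following hunks show.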
@@ -159,16 +121,6 @@ class DiskannBuilder(LeannBackendBuilderInterface):
                 f"Unsupported distance_metric '{build_kwargs.get('distance_metric', 'unknown')}'."
             )
 
-        # Calculate smart memory configuration if not explicitly provided
-        if (
-            "search_memory_maximum" not in build_kwargs
-            or "build_memory_maximum" not in build_kwargs
-        ):
-            smart_search_mem, smart_build_mem = _calculate_smart_memory_config(data)
-        else:
-            smart_search_mem = build_kwargs.get("search_memory_maximum", 4.0)
-            smart_build_mem = build_kwargs.get("build_memory_maximum", 8.0)
-
         try:
             from . import _diskannpy as diskannpy  # type: ignore
 
@@ -179,8 +131,8 @@ class DiskannBuilder(LeannBackendBuilderInterface):
                 index_prefix,
                 build_kwargs.get("complexity", 64),
                 build_kwargs.get("graph_degree", 32),
-                build_kwargs.get("search_memory_maximum", smart_search_mem),
-                build_kwargs.get("build_memory_maximum", smart_build_mem),
+                build_kwargs.get("search_memory_maximum", 4.0),
+                build_kwargs.get("build_memory_maximum", 8.0),
                 build_kwargs.get("num_threads", 8),
                 build_kwargs.get("pq_disk_bytes", 0),
                 "",
@@ -4,8 +4,8 @@ build-backend = "scikit_build_core.build"
 
 [project]
 name = "leann-backend-diskann"
-version = "0.2.0"
-dependencies = ["leann-core==0.2.0", "numpy", "protobuf>=3.19.0"]
+version = "0.1.16"
+dependencies = ["leann-core==0.1.16", "numpy", "protobuf>=3.19.0"]
 
 [tool.scikit-build]
 # Key: simplified CMake path
Submodule packages/leann-backend-diskann/third_party/DiskANN updated: 67a2611ad1...af2a26481e
@@ -6,10 +6,10 @@ build-backend = "scikit_build_core.build"
 
 [project]
 name = "leann-backend-hnsw"
-version = "0.2.0"
+version = "0.1.16"
 description = "Custom-built HNSW (Faiss) backend for the Leann toolkit."
 dependencies = [
-    "leann-core==0.2.0",
+    "leann-core==0.1.16",
     "numpy",
     "pyzmq>=23.0.0",
     "msgpack>=1.0.0",
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "leann-core"
-version = "0.2.0"
+version = "0.1.16"
 description = "Core API and plugin system for LEANN"
 readme = "README.md"
 requires-python = ">=3.9"
@@ -636,10 +636,7 @@ class LeannChat:
             "Please provide the best answer you can based on this context and your knowledge."
         )
 
-        ask_time = time.time()
         ans = self.llm.ask(prompt, **llm_kwargs)
-        ask_time = time.time() - ask_time
-        logger.info(f" Ask time: {ask_time} seconds")
         return ans
 
     def start_interactive(self):
@@ -358,11 +358,7 @@ def validate_model_and_suggest(model_name: str, llm_type: str) -> str | None:
         error_msg += f"\n\nModel '{model_name}' was not found in Ollama's library."
 
         if suggestions:
-            error_msg += (
-                "\n\nDid you mean one of these installed models?\n"
-                + "\nTry to use ollama pull to install the model you need\n"
-            )
-
+            error_msg += "\n\nDid you mean one of these installed models?\n"
             for i, suggestion in enumerate(suggestions, 1):
                 error_msg += f" {i}. {suggestion}\n"
         else:
@@ -546,41 +542,14 @@ class HFChat(LLMInterface):
         self.device = "cpu"
         logger.info("No GPU detected. Using CPU.")
 
-        # Load tokenizer and model with timeout protection
-        try:
-            import signal
-
-            def timeout_handler(signum, frame):
-                raise TimeoutError("Model download/loading timed out")
-
-            # Set timeout for model loading (60 seconds)
-            old_handler = signal.signal(signal.SIGALRM, timeout_handler)
-            signal.alarm(60)
-
-            try:
-                logger.info(f"Loading tokenizer for {model_name}...")
-                self.tokenizer = AutoTokenizer.from_pretrained(model_name)
-
-                logger.info(f"Loading model {model_name}...")
-                self.model = AutoModelForCausalLM.from_pretrained(
-                    model_name,
-                    torch_dtype=torch.float16 if self.device != "cpu" else torch.float32,
-                    device_map="auto" if self.device != "cpu" else None,
-                    trust_remote_code=True,
-                )
-                logger.info(f"Successfully loaded {model_name}")
-            finally:
-                signal.alarm(0)  # Cancel the alarm
-                signal.signal(signal.SIGALRM, old_handler)  # Restore old handler
-
-        except TimeoutError:
-            logger.error(f"Model loading timed out for {model_name}")
-            raise RuntimeError(
-                f"Model loading timed out for {model_name}. Please check your internet connection or try a smaller model."
-            )
-        except Exception as e:
-            logger.error(f"Failed to load model {model_name}: {e}")
-            raise
+        # Load tokenizer and model
+        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
+        self.model = AutoModelForCausalLM.from_pretrained(
+            model_name,
+            torch_dtype=torch.float16 if self.device != "cpu" else torch.float32,
+            device_map="auto" if self.device != "cpu" else None,
+            trust_remote_code=True,
+        )
 
 # Move model to device if not using device_map
 if self.device != "cpu" and "device_map" not in str(self.model):
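The removed block wrapped model loading in a `SIGALRM`-based timeout. For reference, that pattern can be written as a small reusable context manager; this is a minimal sketch (not code from the PR), and it only works on Unix in the main thread, the same constraint the deleted code had:

```python
import signal
from contextlib import contextmanager


@contextmanager
def alarm_timeout(seconds: int):
    """Raise TimeoutError if the wrapped block runs longer than `seconds` (Unix, main thread only)."""

    def _handler(signum, frame):
        raise TimeoutError(f"operation timed out after {seconds}s")

    old_handler = signal.signal(signal.SIGALRM, _handler)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)                              # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)   # restore the previous handler


# Hypothetical usage, mirroring the removed guard:
# with alarm_timeout(60):
#     tokenizer = AutoTokenizer.from_pretrained(model_name)
#     model = AutoModelForCausalLM.from_pretrained(model_name)
```

Dropping the guard simplifies HFChat's model-loading path, at the cost of no longer bounding how long a stalled download can block.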
@@ -354,21 +354,13 @@ class EmbeddingServerManager:
             self.server_process.terminate()
 
             try:
-                self.server_process.wait(timeout=3)
+                self.server_process.wait(timeout=5)
                 logger.info(f"Server process {self.server_process.pid} terminated.")
             except subprocess.TimeoutExpired:
                 logger.warning(
-                    f"Server process {self.server_process.pid} did not terminate gracefully within 3 seconds, killing it."
+                    f"Server process {self.server_process.pid} did not terminate gracefully, killing it."
                 )
                 self.server_process.kill()
-                try:
-                    self.server_process.wait(timeout=2)
-                    logger.info(f"Server process {self.server_process.pid} killed successfully.")
-                except subprocess.TimeoutExpired:
-                    logger.error(
-                        f"Failed to kill server process {self.server_process.pid} - it may be hung"
-                    )
-                    # Don't hang indefinitely
 
             # Clean up process resources to prevent resource tracker warnings
             try:
@@ -5,8 +5,11 @@ LEANN is a revolutionary vector database that democratizes personal AI. Transfor
 ## Installation
 
 ```bash
-# Default installation (includes both HNSW and DiskANN backends)
+# Default installation (HNSW backend, recommended)
 uv pip install leann
+
+# With DiskANN backend (for large-scale deployments)
+uv pip install leann[diskann]
 ```
 
 ## Quick Start
@@ -16,8 +19,8 @@ from leann import LeannBuilder, LeannSearcher, LeannChat
 from pathlib import Path
 INDEX_PATH = str(Path("./").resolve() / "demo.leann")
 
-# Build an index (choose backend: "hnsw" or "diskann")
-builder = LeannBuilder(backend_name="hnsw") # or "diskann" for large-scale deployments
+# Build an index
+builder = LeannBuilder(backend_name="hnsw")
 builder.add_text("LEANN saves 97% storage compared to traditional vector databases.")
 builder.add_text("Tung Tung Tung Sahur called—they need their banana‑crocodile hybrid back")
 builder.build_index(INDEX_PATH)
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "leann"
-version = "0.2.0"
+version = "0.1.16"
 description = "LEANN - The smallest vector index in the world. RAG Everything with LEANN!"
 readme = "README.md"
 requires-python = ">=3.9"
@@ -24,15 +24,16 @@ classifiers = [
     "Programming Language :: Python :: 3.12",
 ]
 
-# Default installation: core + hnsw + diskann
+# Default installation: core + hnsw
 dependencies = [
     "leann-core>=0.1.0",
     "leann-backend-hnsw>=0.1.0",
-    "leann-backend-diskann>=0.1.0",
 ]
 
 [project.optional-dependencies]
-# All backends now included by default
+diskann = [
+    "leann-backend-diskann>=0.1.0",
+]
 
 [project.urls]
 Repository = "https://github.com/yichuan-w/LEANN"
uv.lock (generated)
@@ -1650,7 +1650,7 @@ name = "importlib-metadata"
 version = "8.7.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
-    { name = "zipp", marker = "python_full_version < '3.10'" },
+    { name = "zipp" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/76/66/650a33bd90f786193e4de4b3ad86ea60b53c89b669a5c7be931fac31cdb0/importlib_metadata-8.7.0.tar.gz", hash = "sha256:d13b81ad223b890aa16c5471f2ac3056cf76c5f10f82d6f9292f0b415f389000", size = 56641 }
 wheels = [
@@ -2155,7 +2155,7 @@ wheels = [
 
 [[package]]
 name = "leann-backend-diskann"
-version = "0.2.0"
+version = "0.1.15"
 source = { editable = "packages/leann-backend-diskann" }
 dependencies = [
     { name = "leann-core" },
@@ -2167,14 +2167,14 @@ dependencies = [
 
 [package.metadata]
 requires-dist = [
-    { name = "leann-core", specifier = "==0.2.0" },
+    { name = "leann-core", specifier = "==0.1.15" },
     { name = "numpy" },
     { name = "protobuf", specifier = ">=3.19.0" },
 ]
 
 [[package]]
 name = "leann-backend-hnsw"
-version = "0.2.0"
+version = "0.1.15"
 source = { editable = "packages/leann-backend-hnsw" }
 dependencies = [
     { name = "leann-core" },
|
 
 [package.metadata]
 requires-dist = [
-    { name = "leann-core", specifier = "==0.2.0" },
+    { name = "leann-core", specifier = "==0.1.15" },
     { name = "msgpack", specifier = ">=1.0.0" },
     { name = "numpy" },
     { name = "pyzmq", specifier = ">=23.0.0" },
@@ -2195,7 +2195,7 @@ requires-dist = [
 
 [[package]]
 name = "leann-core"
-version = "0.2.0"
+version = "0.1.15"
 source = { editable = "packages/leann-core" }
 dependencies = [
     { name = "accelerate" },