🚀 LEANN: A Low-Storage Vector Index
⚡ Storage-Saving RAG System on Consumer Devices
Quick Start • Features • Benchmarks • Paper
🌟 What is Leann?
Leann revolutionizes Retrieval-Augmented Generation (RAG) by eliminating the storage bottleneck of traditional vector databases. Instead of pre-computing and storing billions of embeddings, Leann dynamically computes embeddings at query time using optimized graph-based search algorithms.
🎯 Why Leann?
Traditional RAG systems run into three recurring costs:
- 💾 Storage: Storing embeddings for millions of documents requires massive disk space
- 🔄 Memory overhead: Indexes commonly used with frameworks such as LlamaIndex (e.g., in-memory vector databases) often carry high memory overhead
- 💰 Cost: Vector databases are expensive to scale
Leann addresses these costs with graph-based recomputation and system-level optimizations:
- ✅ Zero embedding storage - Only graph structure is persisted, reducing storage by 94-97%
- ✅ Real-time computation - Embeddings computed on-demand with low latency
- ✅ Memory efficient - Runs on consumer hardware with theoretical zero memory overhead
- ✅ Graph-based optimization - Advanced pruning keeps search efficient and storage low, while batching, overlapping, and low-precision search reduce latency
- ✅ Pluggable backends - Support for DiskANN, HNSW, and other ANN algorithms (welcome contributions!)
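To make the idea of graph-based recomputation concrete, here is a minimal toy sketch. It is not Leann's implementation: the corpus, the neighbor lists in `GRAPH`, the bag-of-words `embed` function, and the stopping rule are all assumptions chosen for illustration. The point it shows is that only text and graph edges are kept, and a vector is computed only for nodes the search actually visits.

```python
import heapq
import math

# Toy corpus: only raw text and graph edges are kept, never the vectors.
TEXTS = [
    "python programming language",
    "java programming language",
    "machine learning models",
    "neural network training",
    "deep learning networks",
]
# Hypothetical neighbor lists (node id -> neighbor ids) standing in for a pruned ANN graph.
GRAPH = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3, 4], 3: [2, 4], 4: [2, 3]}
VOCAB = sorted({word for text in TEXTS for word in text.split()})


def embed(text):
    """Stand-in embedding: a normalized bag-of-words vector (a real system would call a model)."""
    words = text.split()
    counts = [words.count(word) for word in VOCAB]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(query, entry=0, top_k=2):
    """Best-first graph traversal that embeds a node only when it is visited."""
    q = embed(query)
    embed_calls = 0

    def dist_to(node):
        nonlocal embed_calls
        embed_calls += 1  # count every on-demand embedding computation
        return l2(q, embed(TEXTS[node]))

    visited = {entry}
    scored = [(dist_to(entry), entry)]
    frontier = list(scored)  # min-heap ordered by distance to the query
    while frontier:
        d, node = heapq.heappop(frontier)
        best = sorted(scored)[:top_k]
        # Simplified stopping rule: the closest unexplored node is already
        # worse than the current k-th best result.
        if len(best) == top_k and d > best[-1][0]:
            break
        for nb in GRAPH[node]:
            if nb not in visited:
                visited.add(nb)
                item = (dist_to(nb), nb)
                scored.append(item)
                heapq.heappush(frontier, item)
    return [node for _, node in sorted(scored)[:top_k]], embed_calls


ids, embed_calls = search("python language")
print(ids, embed_calls)
```

In this toy run the query is answered after embedding three of the five texts; the two nodes the traversal never reaches are never embedded at all.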
🚀 Quick Start
Installation
git clone git@github.com:yichuan520030910320/LEANN-RAG.git leann
cd leann
git submodule update --init --recursive
macOS:
brew install llvm libomp boost protobuf
export CC=$(brew --prefix llvm)/bin/clang
export CXX=$(brew --prefix llvm)/bin/clang++
uv sync
Linux (Ubuntu/Debian):
sudo apt-get install libomp-dev libboost-all-dev protobuf-compiler libabsl-dev libmkl-full-dev libaio-dev
uv sync
30-Second Example
from leann.api import LeannBuilder, LeannSearcher
# 1. Build index (no embeddings stored!)
builder = LeannBuilder(backend_name="diskann")
builder.add_text("Python is a powerful programming language")
builder.add_text("Machine learning transforms industries")
builder.add_text("Neural networks process complex data")
builder.build_index("knowledge.leann")
# 2. Search with real-time embeddings
searcher = LeannSearcher("knowledge.leann")
results = searcher.search("programming languages", top_k=2)
for result in results:
    print(f"Score: {result['score']:.3f} - {result['text']}")
Run the Demo
uv run examples/document_search.py
Or, if you prefer to run it with Python directly:
source .venv/bin/activate
python ./examples/main_cli_example.py
PDF RAG Demo (using LlamaIndex for document parsing and Leann for indexing/search)
This demo showcases how to build a RAG system for PDF documents using Leann.
- Place your PDF files (and other supported formats like .docx, .pptx, .xlsx) into the examples/data/ directory.
- Ensure you have an OPENAI_API_KEY set in your environment variables or in a .env file for the LLM to function.
uv run examples/main_cli_example.py
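If the demo fails at the LLM step, a quick pre-flight check can confirm whether the key is visible. The sketch below is an illustration only: `find_api_key` is a hypothetical helper, not part of Leann, and its `.env` parsing is deliberately simplified (plain `NAME=value` lines, no `export` prefix).

```python
import os
from pathlib import Path


def find_api_key(env=None, dotenv_path=".env", name="OPENAI_API_KEY"):
    """Return where the key was found: 'environment', 'dotenv', or None."""
    env = os.environ if env is None else env
    if env.get(name):
        return "environment"
    path = Path(dotenv_path)
    if path.is_file():
        for line in path.read_text().splitlines():
            line = line.strip()
            # Skip comments and lines that are not NAME=value pairs.
            if line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            if key.strip() == name and value.strip().strip("'\""):
                return "dotenv"
    return None


print(find_api_key())
```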
Regenerating Protobuf Files
If you modify any .proto files (such as embedding.proto), or if you see errors about protobuf version mismatch, regenerate the C++ protobuf files to match your installed version:
cd packages/leann-backend-diskann
protoc --cpp_out=third_party/DiskANN/include --proto_path=third_party embedding.proto
protoc --cpp_out=third_party/DiskANN/src --proto_path=third_party embedding.proto
This ensures the generated files are compatible with your system's protobuf library.
✨ Features
🔥 Core Features
- 🔄 Real-time Embeddings - Eliminate heavy embedding storage by computing embeddings dynamically, using ZMQ servers, an optimized embedding engine, and a search procedure that overlaps and batches work
- 📈 Scalable Architecture - Handles millions of documents on consumer hardware; the larger your dataset, the more LEANN can save
- 🎯 Graph Pruning - Advanced techniques that shrink the storage overhead of vector search to a small footprint
- 🏗️ Pluggable Backends - DiskANN, HNSW/FAISS with unified API
🛠️ Technical Highlights
- 🔄 Recompute Mode - Targets the highest-accuracy scenarios while eliminating vector storage overhead
- ⚡ Zero-copy Operations - Minimize IPC overhead by transferring distances instead of embeddings
- 🚀 High-throughput Embedding Pipeline - Optimized batched processing for maximum efficiency
- 🎯 Two-level Search - Novel coarse-to-fine search overlap for accelerated query processing (optional)
- 💾 Memory-mapped Indices - Fast startup with raw text mapping to reduce memory overhead
- 🚀 MLX Support - Ultra-fast recompute with quantized embedding models, accelerating building and search by 10-100x
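A rough back-of-the-envelope calculation shows why transferring distances instead of embeddings reduces IPC traffic. The numbers are assumptions for illustration: float32 values, 768-dimensional vectors (the output size of all-mpnet-base-v2), and a hypothetical batch of 64 candidates.

```python
import struct


def payload_bytes(num_candidates, dim):
    """Bytes needed to ship float32 embeddings vs. float32 distances for one batch."""
    float_size = struct.calcsize("<f")  # size of one float32 value
    embeddings = num_candidates * dim * float_size
    distances = num_candidates * float_size
    return embeddings, distances


emb_bytes, dist_bytes = payload_bytes(num_candidates=64, dim=768)
print(emb_bytes, dist_bytes, emb_bytes // dist_bytes)
```

Under these assumptions the distance payload is smaller by a factor equal to the embedding dimension, since each candidate contributes one number instead of one full vector.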
🎨 Developer Experience
- Simple Python API - Get started in minutes
- Extensible backend system - Easy to add new algorithms
- Comprehensive examples - From basic usage to production deployment
Applications on your MacBook
Lightweight RAG on your Apple Mail
LEANN can create a searchable index of your Apple Mail emails, allowing you to query your email history using natural language.
Quick Start
📋 Command Examples
# Use default mail path (works for most macOS setups)
python examples/mail_reader_leann.py
# Specify your own mail path
python examples/mail_reader_leann.py --mail-path "/Users/yourname/Library/Mail/V10/..."
# Run with custom index directory
python examples/mail_reader_leann.py --index-dir "./my_mail_index"
# Limit number of emails processed (useful for testing)
python examples/mail_reader_leann.py --max-emails 1000
# Run a single query
python examples/mail_reader_leann.py --query "Find emails about project deadlines"
Finding Your Mail Path
🔍 How to find your mail path
The default mail path is configured for a typical macOS setup. If you need to find your specific mail path:
- Open Terminal
- Run:
find ~/Library/Mail -name "Messages" -type d | head -5
- Use the parent directory of the Messages folder (its name ends with Data) as your --mail-path
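The same lookup can be sketched in Python with `pathlib`. `find_mail_data_dirs` is a hypothetical helper written for this example, not part of the Leann codebase; it mirrors the `find` command above and returns the parent of each Messages directory.

```python
from pathlib import Path


def find_mail_data_dirs(mail_root, limit=5):
    """Return parents of directories named 'Messages' under mail_root (at most `limit`)."""
    found = []
    for path in Path(mail_root).expanduser().rglob("Messages"):
        if path.is_dir():
            found.append(path.parent)
            if len(found) >= limit:
                break
    return found
```

Calling `find_mail_data_dirs("~/Library/Mail")` lists candidate directories; any one of them can be passed as `--mail-path`.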
Example Queries
💬 Example queries you can try
Once the index is built, you can ask questions like:
- "Show me emails about meeting schedules"
- "Find emails from my boss about deadlines"
- "What did John say about the project timeline?"
- "Show me emails about travel expenses"
📊 Benchmarks
How to Reproduce Evaluation Results
Reproducing our benchmarks is straightforward. The evaluation script is designed to be self-contained, automatically downloading all necessary data on its first run.
1. Environment Setup
First, ensure you have followed the installation instructions in the Quick Start section. This will install all core dependencies.
Next, install the optional development dependencies, which include the huggingface-hub library required for automatic data download:
# This command installs all development dependencies
uv pip install -e ".[dev]"
2. Run the Evaluation
Simply run the evaluation script. The first time you run it, it will detect that the data is missing, download it from Hugging Face Hub, and then proceed with the evaluation.
To evaluate the DPR dataset:
python examples/run_evaluation.py data/indices/dpr/dpr_diskann
To evaluate the RPJ-Wiki dataset:
python examples/run_evaluation.py data/indices/rpj_wiki/rpj_wiki.index
The script will print the recall and search time for each query, followed by the average results.
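For readers unfamiliar with the metric, recall for a single query can be sketched as follows. This is a generic definition and only an assumption about what the evaluation script computes; `recall_at_k` and `mean` are illustrative helpers, not functions from the script.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the ground-truth ids that appear among the top-k retrieved ids."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)


def mean(values):
    return sum(values) / len(values) if values else 0.0


per_query = [
    recall_at_k([1, 2, 3, 4], relevant=[2, 4, 9], k=3),  # only id 2 is in the top 3
    recall_at_k([5, 6], relevant=[5, 6], k=2),           # both ground-truth ids found
]
print(per_query, mean(per_query))
```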
Memory Usage Comparison
| System | 1M Documents | 10M Documents | 100M Documents |
|---|---|---|---|
| Traditional Vector DB | 3.1 GB | 31 GB | 310 GB |
| Leann | 180 MB | 1.2 GB | 8.4 GB |
| Reduction | 94.2% | 96.1% | 97.3% |
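The reduction row follows directly from the two rows above it, and the baseline is consistent with storing one 768-dimensional float32 vector per document. That baseline is an assumption on our part (the table does not state the dimension or precision), as is the use of decimal units. A short check:

```python
def raw_embedding_bytes(num_docs, dim=768, bytes_per_value=4):
    """Size of a dense float32 embedding table (assumed layout, one vector per document)."""
    return num_docs * dim * bytes_per_value


def reduction_percent(baseline, leann):
    """Percentage saved relative to the baseline, rounded to one decimal place."""
    return round(100 * (1 - leann / baseline), 1)


baseline_1m_gb = raw_embedding_bytes(1_000_000) / 1e9  # decimal gigabytes
reductions = [
    reduction_percent(3100, 180),  # 1M documents, both values in MB
    reduction_percent(31, 1.2),    # 10M documents, both values in GB
    reduction_percent(310, 8.4),   # 100M documents, both values in GB
]
print(baseline_1m_gb, reductions)
```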
Query Performance
| Backend | Index Size | Query Time | Recall@10 |
|---|---|---|---|
| DiskANN | 1M docs | 12ms | 0.95 |
| DiskANN + Recompute | 1M docs | 145ms | 0.98 |
| HNSW | 1M docs | 8ms | 0.93 |
Benchmarks run on AMD Ryzen 7 with 32GB RAM
🏗️ Architecture
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Query Text    │───▶│    Embedding     │───▶│   Graph-based   │
│                 │    │   Computation    │    │     Search      │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │                       │
                                ▼                       ▼
                        ┌──────────────┐        ┌──────────────┐
                        │  ZMQ Server  │        │ Pruned Graph │
                        │   (Cached)   │        │    Index     │
                        └──────────────┘        └──────────────┘
Key Components
- 🧠 Embedding Engine: Real-time transformer inference with caching
- 📊 Graph Index: Memory-efficient navigation structures
- 🔄 Search Coordinator: Orchestrates embedding + graph search
- ⚡ Backend Adapters: Pluggable algorithm implementations
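The caching idea in the embedding engine can be illustrated with a minimal sketch. This is not the engine's actual code: `expensive_embed` is a hypothetical stand-in for transformer inference, and `functools.lru_cache` is simply one convenient way to avoid recomputing an embedding for text that was seen recently.

```python
from functools import lru_cache

CALLS = {"model": 0}


def expensive_embed(text):
    """Hypothetical stand-in for transformer inference; counts how often it runs."""
    CALLS["model"] += 1
    return tuple(float(ord(ch)) for ch in text[:4])


@lru_cache(maxsize=1024)
def cached_embed(text):
    # Repeated requests for the same text are served from the cache.
    return expensive_embed(text)


cached_embed("hello")
cached_embed("hello")  # cache hit, the model is not called again
cached_embed("world")
print(CALLS["model"], cached_embed.cache_info())
```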
🎓 Supported Models & Backends
🤖 Embedding Models
- sentence-transformers/all-mpnet-base-v2 (default)
- sentence-transformers/all-MiniLM-L6-v2 (lightweight)
- Any HuggingFace sentence-transformer model
- Custom model support via API
🔧 Search Backends
- DiskANN: Microsoft's billion-scale ANN algorithm
- HNSW: Hierarchical Navigable Small World graphs
- Coming soon: ScaNN, Faiss-IVF, NSG
📏 Distance Functions
- L2: Euclidean distance for precise similarity
- Cosine: Angular similarity for normalized vectors
- MIPS: Maximum Inner Product Search for recommendation systems
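The three metrics are closely related, which the following plain-Python sketch illustrates. The function names are ours and do not correspond to Leann's internal API. For unit-length vectors, squared L2 distance equals 2 - 2 * cosine similarity, so L2, cosine, and inner product all produce the same ranking once vectors are normalized.

```python
import math


def l2(a, b):
    """Euclidean distance (smaller is closer)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Dot product (larger is closer); the score maximized by MIPS."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Inner product of the two vectors after scaling each to unit length."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


def normalize(v):
    norm = math.sqrt(inner_product(v, v))
    return [x / norm for x in v]


a, b = normalize([1.0, 2.0, 3.0]), normalize([3.0, 1.0, 0.5])
# Identity for unit vectors: ||a - b||^2 == 2 - 2 * cos(a, b)
print(l2(a, b) ** 2, 2 - 2 * cosine_similarity(a, b))
```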
🔬 Paper
If you find Leann useful, please cite:
LEANN: A Low-Storage Vector Index
@misc{wang2025leannlowstoragevectorindex,
title={LEANN: A Low-Storage Vector Index},
author={Yichuan Wang and Shu Liu and Zhifei Li and Yongji Wu and Ziming Mao and Yilong Zhao and Xiao Yan and Zhiying Xu and Yang Zhou and Ion Stoica and Sewon Min and Matei Zaharia and Joseph E. Gonzalez},
year={2025},
eprint={2506.08276},
archivePrefix={arXiv},
primaryClass={cs.DB},
url={https://arxiv.org/abs/2506.08276},
}
🌍 Use Cases
💼 Enterprise RAG
# Handle millions of documents with limited resources
builder = LeannBuilder(
    backend_name="diskann",
    distance_metric="cosine",
    graph_degree=64,
    memory_budget="4GB",
)
🔬 Research & Experimentation
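# Quick prototyping with different algorithms
# (index_path, queries, ground_truth, and evaluate_recall are placeholders from your own experiment code)
for backend in ["diskann", "hnsw"]:
    searcher = LeannSearcher(index_path, backend=backend)
    evaluate_recall(searcher, queries, ground_truth)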
🚀 Real-time Applications
# Sub-second response times
chat = LeannChat("knowledge.leann")
response = chat.ask("What is quantum computing?")
# Returns in <100ms with recompute mode
🤝 Contributing
We welcome contributions! Leann is built by the community, for the community.
Ways to Contribute
- 🐛 Bug Reports: Found an issue? Let us know!
- 💡 Feature Requests: Have an idea? We'd love to hear it!
- 🔧 Code Contributions: PRs welcome for all skill levels
- 📖 Documentation: Help make Leann more accessible
- 🧪 Benchmarks: Share your performance results
Development Setup
git clone git@github.com:yichuan520030910320/LEANN-RAG.git leann
cd leann
git submodule update --init --recursive
uv sync --dev
uv run pytest tests/
Quick Tests
# Sanity check all distance functions
uv run python tests/sanity_checks/test_distance_functions.py
# Verify L2 implementation
uv run python tests/sanity_checks/test_l2_verification.py
❓ FAQ
Common Issues
NCCL Topology Error
Problem: You encounter an ncclTopoComputePaths error during document processing:
ncclTopoComputePaths (system=<optimized out>, comm=comm@entry=0x5555a82fa3c0) at graph/paths.cc:688
Solution: Set these environment variables before running your script:
export NCCL_TOPO_DUMP_FILE=/tmp/nccl_topo.xml
export NCCL_DEBUG=INFO
export NCCL_DEBUG_SUBSYS=INIT,GRAPH
export NCCL_IB_DISABLE=1
export NCCL_NET_PLUGIN=none
export NCCL_SOCKET_IFNAME=ens5  # replace ens5 with your machine's network interface name
📈 Roadmap
🎯 Q2 2025
- DiskANN backend with MIPS/L2/Cosine support
- HNSW backend integration
- Real-time embedding pipeline
- Memory-efficient graph pruning
🚀 Q3 2025
- Advanced caching strategies
- GPU-accelerated embedding computation
- Add sleep-time compute and a summarization agent to summarize files on your computer
- Add OpenAI recompute API
🌟 Q4 2025
- Integration with LangChain/LlamaIndex
- Visual similarity search
- Query rewriting, reranking, and expansion
📄 License
MIT License - see LICENSE for details.
🙏 Acknowledgments
- Microsoft Research for the DiskANN algorithm
- Meta AI for FAISS and optimization insights
- HuggingFace for the transformer ecosystem
- Our amazing contributors who make this possible
⭐ Star us on GitHub if Leann is useful for your research or applications!
Made with ❤️ by the Leann team