Compare commits

...

6 Commits

Author SHA1 Message Date
Andy Lee
f1aca0f756 fix(core): skip empty/invalid chunks before embedding; guard OpenAI embeddings
Avoid 400 errors from OpenAI when chunker yields empty strings by filtering
invalid texts in LeannBuilder.build_index. Add validation fail-fast in
OpenAI embedding path to surface upstream issues earlier. Keeps passages and
embeddings aligned during build.

Refs #54
2025-08-15 17:28:40 -07:00
Yichuan Wang
bee2167ee3 docs: update READMEs (MCP docs + conclusion polish)
- Polish conclusion in packages/leann-mcp/README.md
- Sync root README wording and links
2025-08-15 17:21:23 -07:00
yichuan520030910320
ef980d70b3 [MCP]update MCP of claude code 2025-08-15 14:29:59 -07:00
Andy Lee
db3c63c441 Docs/Core: Low-Resource Setups, SkyPilot Option, and No-Recompute (#45)
* docs: add SkyPilot template and instructions for running embeddings/index build on cloud GPU

* docs: add low-resource note in README; point to config guide; suggest OpenAI embeddings, SkyPilot remote build, and --no-recompute

* docs: consolidate low-resource guidance into config guide; README points to it

* cli: add --no-recompute and --no-recompute-embeddings flags; docs: clarify HNSW requires --no-compact when disabling recompute

* docs: dedupe recomputation guidance; keep single Low-resource setups section

* sky: expand leann-build.yaml with configurable params and flags (backend, recompute, compact, embedding options)

* hnsw: auto-disable compact when --no-recompute is used; docs: expand SkyPilot with -e overrides and copy-back example

* docs+sky: simplify SkyPilot flow (auto-build on launch, rsync copy-back); clarify HNSW auto non-compact when no-recompute

* feat: auto compact for hnsw when recompute

* reader: non-destructive portability (relative hints + fallback); fix comments; sky: refine yaml

* cli: unify flags to --recompute/--no-recompute for build/search/ask; docs: update references

* chore: remove

* hnsw: move pruned/no-recompute assertion into backend; api: drop global assertion; docs: will adjust after benchmarking

* cli: use argparse.BooleanOptionalAction for paired flags (--recompute/--compact) across build/search/ask

* docs: a real example on recompute

* benchmarks: fix and extend HNSW+DiskANN recompute vs no-recompute; docs: add fresh numbers and DiskANN notes

* benchmarks: unify HNSW & DiskANN into one clean script; isolate groups, fixed ports, warm-up, param complexity

* docs: diskann recompute

* core: auto-cleanup for LeannSearcher/LeannChat (__enter__/__exit__/__del__); ensure server terminate/kill robustness; benchmarks: use searcher.cleanup(); docs: suggest uv run

* fix: hang on warnings

* docs: boolean flags

* docs: leann help
2025-08-15 12:03:19 -07:00
yichuan520030910320
00eeadb9dd upd pkg 2025-08-14 14:39:45 -07:00
yichuan520030910320
42c8370709 add chunk size in leann build& fix batch size in oai& docs 2025-08-14 13:14:14 -07:00
18 changed files with 675 additions and 139 deletions

View File

@@ -31,7 +31,7 @@ LEANN achieves this through *graph-based selective recomputation* with *high-degree
<img src="assets/effects.png" alt="LEANN vs Traditional Vector DB Storage Comparison" width="70%">
</p>

-> **The numbers speak for themselves:** Index 60 million text chunks in just 6GB instead of 201GB. From emails to browser history, everything fits on your laptop. [See detailed benchmarks for different applications below ↓](#storage-comparison)
> **The numbers speak for themselves:** Index 60 million text chunks in just 6GB instead of 201GB. From emails to browser history, everything fits on your laptop. [See detailed benchmarks for different applications below ↓](#-storage-comparison)

🔒 **Privacy:** Your data never leaves your laptop. No OpenAI, no cloud, no "terms of service".

@@ -70,6 +70,8 @@ uv venv
source .venv/bin/activate
uv pip install leann
```

<!--
> Low-resource? See “Low-resource setups” in the [Configuration Guide](docs/configuration-guide.md#low-resource-setups). -->

<details>
<summary>
@@ -184,34 +186,34 @@ All RAG examples share these common parameters. **Interactive mode** is available
```bash
# Core Parameters (General preprocessing for all examples)
--index-dir DIR          # Directory to store the index (default: current directory)
--query "YOUR QUESTION"  # Single query mode. Omit for interactive chat (type 'quit' to exit), and now you can play with your index interactively
--max-items N            # Limit data preprocessing (default: -1, process all data)
--force-rebuild          # Force rebuild index even if it exists

# Embedding Parameters
--embedding-model MODEL  # e.g., facebook/contriever, text-embedding-3-small, mlx-community/Qwen3-Embedding-0.6B-8bit or nomic-embed-text
--embedding-mode MODE    # sentence-transformers, openai, mlx, or ollama

# LLM Parameters (Text generation models)
--llm TYPE               # LLM backend: openai, ollama, or hf (default: openai)
--llm-model MODEL        # Model name (default: gpt-4o) e.g., gpt-4o-mini, llama3.2:1b, Qwen/Qwen2.5-1.5B-Instruct
--thinking-budget LEVEL  # Thinking budget for reasoning models: low/medium/high (supported by o3, o3-mini, GPT-Oss:20b, and other reasoning models)

# Search Parameters
--top-k N                # Number of results to retrieve (default: 20)
--search-complexity N    # Search complexity for graph traversal (default: 32)

# Chunking Parameters
--chunk-size N           # Size of text chunks (default varies by source: 256 for most, 192 for WeChat)
--chunk-overlap N        # Overlap between chunks (default varies: 25-128 depending on source)

# Index Building Parameters
--backend-name NAME      # Backend to use: hnsw or diskann (default: hnsw)
--graph-degree N         # Graph degree for index construction (default: 32)
--build-complexity N     # Build complexity for index construction (default: 64)
---no-compact             # Disable compact index storage (compact storage IS enabled to save storage by default)
--compact / --no-compact     # Use compact storage (default: true). Use --no-compact for a --no-recompute build.
---no-recompute           # Disable embedding recomputation (recomputation IS enabled to save storage by default)
--recompute / --no-recompute # Enable/disable embedding recomputation (default: enabled). Avoid a --no-recompute search on an index built with recomputation enabled.
```

</details>
@@ -424,21 +426,21 @@ Once the index is built, you can ask questions like:
**The future of code assistance is here.** Transform your development workflow with LEANN's native MCP integration for Claude Code. Index your entire codebase and get intelligent code assistance directly in your IDE.

**Key features:**
-- 🔍 **Semantic code search** across your entire project
- 🔍 **Semantic code search** across your entire project, with a fully local, lightweight index
- 📚 **Context-aware assistance** for debugging and development
- 🚀 **Zero-config setup** with automatic language detection

```bash
# Install LEANN globally for MCP integration
-uv tool install leann-core
uv tool install leann-core --with leann
claude mcp add --scope user leann-server -- leann_mcp

# Setup is automatic - just start using Claude Code!
```

Try our fully agentic pipeline with auto query rewriting, semantic search planning, and more:

![LEANN MCP Integration](assets/mcp_leann.png)

-**Ready to supercharge your coding?** [Complete Setup Guide →](packages/leann-mcp/README.md)
**🔥 Ready to supercharge your coding?** [Complete Setup Guide →](packages/leann-mcp/README.md)

## 🖥️ Command Line Interface
@@ -455,7 +457,8 @@ leann --help
**To make it globally available:**

```bash
# Install the LEANN CLI globally using uv tool
-uv tool install leann-core
uv tool install leann-core --with leann

# Now you can use leann from anywhere without activating venv
leann --help
@@ -482,27 +485,29 @@ leann list
```

**Key CLI features:**
-- Auto-detects document formats (PDF, TXT, MD, DOCX)
- Auto-detects document formats (PDF, TXT, MD, DOCX, PPTX, plus code files)
- Smart text chunking with overlap
- Multiple LLM providers (Ollama, OpenAI, HuggingFace)
-- Organized index storage in `~/.leann/indexes/`
- Organized index storage in `.leann/indexes/` (project-local)
- Support for advanced search parameters

<details>
<summary><strong>📋 Click to expand: Complete CLI Reference</strong></summary>

You can run `leann --help`, `leann build --help`, `leann search --help`, or `leann ask --help` for the complete CLI reference.

**Build Command:**
```bash
-leann build INDEX_NAME --docs DIRECTORY [OPTIONS]
leann build INDEX_NAME --docs DIRECTORY|FILE [DIRECTORY|FILE ...] [OPTIONS]

Options:
  --backend {hnsw,diskann}      Backend to use (default: hnsw)
  --embedding-model MODEL       Embedding model (default: facebook/contriever)
  --graph-degree N              Graph degree (default: 32)
  --complexity N                Build complexity (default: 64)
  --force                       Force rebuild existing index
-  --compact                     Use compact storage (default: true)
  --compact / --no-compact      Use compact storage (default: true). Use --no-compact for a --no-recompute build.
-  --recompute                   Enable recomputation (default: true)
  --recompute / --no-recompute  Enable recomputation (default: true)
```

**Search Command:**
@@ -510,9 +515,9 @@ Options:
leann search INDEX_NAME QUERY [OPTIONS]

Options:
  --top-k N                     Number of results (default: 5)
  --complexity N                Search complexity (default: 64)
-  --recompute-embeddings        Use recomputation for highest accuracy
  --recompute / --no-recompute  Enable/disable embedding recomputation (default: enabled). Avoid a --no-recompute search on an index built with recomputation enabled.
  --pruning-strategy {global,local,proportional}
```

View File

@@ -0,0 +1,148 @@
import argparse
import os
import time
from pathlib import Path

from leann import LeannBuilder, LeannSearcher


def _meta_exists(index_path: str) -> bool:
    p = Path(index_path)
    return (p.parent / f"{p.stem}.meta.json").exists()


def ensure_index(index_path: str, backend_name: str, num_docs: int, is_recompute: bool) -> None:
    # if _meta_exists(index_path):
    #     return
    kwargs = {}
    if backend_name == "hnsw":
        kwargs["is_compact"] = is_recompute
    builder = LeannBuilder(
        backend_name=backend_name,
        embedding_model=os.getenv("LEANN_EMBED_MODEL", "facebook/contriever"),
        embedding_mode=os.getenv("LEANN_EMBED_MODE", "sentence-transformers"),
        graph_degree=32,
        complexity=64,
        is_recompute=is_recompute,
        num_threads=4,
        **kwargs,
    )
    for i in range(num_docs):
        builder.add_text(
            f"This is a test document number {i}. It contains some repeated text for benchmarking."
        )
    builder.build_index(index_path)


def _bench_group(
    index_path: str,
    recompute: bool,
    query: str,
    repeats: int,
    complexity: int = 32,
    top_k: int = 10,
) -> float:
    # Independent searcher per group; fixed port when recompute
    searcher = LeannSearcher(index_path=index_path)

    # Warm-up once
    _ = searcher.search(
        query,
        top_k=top_k,
        complexity=complexity,
        recompute_embeddings=recompute,
    )

    def _once() -> float:
        t0 = time.time()
        _ = searcher.search(
            query,
            top_k=top_k,
            complexity=complexity,
            recompute_embeddings=recompute,
        )
        return time.time() - t0

    if repeats <= 1:
        t = _once()
    else:
        vals = [_once() for _ in range(repeats)]
        vals.sort()
        t = vals[len(vals) // 2]  # median

    searcher.cleanup()
    return t


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--num-docs", type=int, default=5000)
    parser.add_argument("--repeats", type=int, default=3)
    parser.add_argument("--complexity", type=int, default=32)
    args = parser.parse_args()

    base = Path.cwd() / ".leann" / "indexes" / f"bench_n{args.num_docs}"
    base.parent.mkdir(parents=True, exist_ok=True)

    # ---------- Build HNSW variants ----------
    hnsw_r = str(base / f"hnsw_recompute_n{args.num_docs}.leann")
    hnsw_nr = str(base / f"hnsw_norecompute_n{args.num_docs}.leann")
    ensure_index(hnsw_r, "hnsw", args.num_docs, True)
    ensure_index(hnsw_nr, "hnsw", args.num_docs, False)

    # ---------- Build DiskANN variants ----------
    diskann_r = str(base / "diskann_r.leann")
    diskann_nr = str(base / "diskann_nr.leann")
    ensure_index(diskann_r, "diskann", args.num_docs, True)
    ensure_index(diskann_nr, "diskann", args.num_docs, False)

    # ---------- Helpers ----------
    def _size_for(prefix: str) -> int:
        p = Path(prefix)
        base_dir = p.parent
        stem = p.stem
        total = 0
        for f in base_dir.iterdir():
            if f.is_file() and f.name.startswith(stem):
                total += f.stat().st_size
        return total

    # ---------- HNSW benchmark ----------
    t_hnsw_r = _bench_group(
        hnsw_r, True, "test document number 42", repeats=args.repeats, complexity=args.complexity
    )
    t_hnsw_nr = _bench_group(
        hnsw_nr, False, "test document number 42", repeats=args.repeats, complexity=args.complexity
    )
    size_hnsw_r = _size_for(hnsw_r)
    size_hnsw_nr = _size_for(hnsw_nr)

    print("Benchmark results (HNSW):")
    print(f"  recompute=True: search_time={t_hnsw_r:.3f}s, size={size_hnsw_r / 1024 / 1024:.1f}MB")
    print(
        f"  recompute=False: search_time={t_hnsw_nr:.3f}s, size={size_hnsw_nr / 1024 / 1024:.1f}MB"
    )
    print("  Expectation: no-recompute should be faster but larger on disk.")

    # ---------- DiskANN benchmark ----------
    t_diskann_r = _bench_group(
        diskann_r, True, "DiskANN R test doc 123", repeats=args.repeats, complexity=args.complexity
    )
    t_diskann_nr = _bench_group(
        diskann_nr,
        False,
        "DiskANN NR test doc 123",
        repeats=args.repeats,
        complexity=args.complexity,
    )
    size_diskann_r = _size_for(diskann_r)
    size_diskann_nr = _size_for(diskann_nr)

    print("\nBenchmark results (DiskANN):")
    print(f"  build(recompute=True, partition): size={size_diskann_r / 1024 / 1024:.1f}MB")
    print(f"  build(recompute=False): size={size_diskann_nr / 1024 / 1024:.1f}MB")
    print(f"  search recompute=True (final rerank): {t_diskann_r:.3f}s")
    print(f"  search recompute=False (PQ only): {t_diskann_nr:.3f}s")


if __name__ == "__main__":
    main()
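For reference, a typical invocation of this script, assuming it lands at `benchmarks/benchmark_no_recompute.py` (the path the configuration guide below references); all three flags are optional and default to the values shown:

```bash
uv run benchmarks/benchmark_no_recompute.py --num-docs 5000 --repeats 3 --complexity 32
```

Indexes are written under `./.leann/indexes/bench_n<num-docs>/`.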

View File

@@ -10,6 +10,7 @@ This benchmark compares search performance between DiskANN and HNSW backends:
"""

import gc
import multiprocessing as mp
import tempfile
import time
from pathlib import Path
@@ -17,6 +18,12 @@ from typing import Any
import numpy as np

# Prefer 'fork' start method to avoid POSIX semaphore leaks on macOS
try:
    mp.set_start_method("fork", force=True)
except Exception:
    pass


def create_test_texts(n_docs: int) -> list[str]:
    """Create synthetic test documents for benchmarking."""
@@ -113,10 +120,10 @@ def benchmark_backend(
    ]
    score_validity_rate = len(valid_scores) / len(all_scores) if all_scores else 0

-    # Clean up
    # Clean up (ensure embedding server shutdown and object GC)
    try:
-        if hasattr(searcher, "__del__"):
-            searcher.__del__()
        if hasattr(searcher, "cleanup"):
            searcher.cleanup()
        del searcher
        del builder
        gc.collect()
@@ -259,10 +266,21 @@ if __name__ == "__main__":
        print(f"\n❌ Benchmark failed: {e}")
        sys.exit(1)
    finally:
-        # Ensure clean exit
        # Ensure clean exit (forceful to prevent rare hangs from atexit/threads)
        try:
            gc.collect()
            print("\n🧹 Cleanup completed")
            # Flush stdio to ensure the message is visible before hard-exit
            try:
                import sys as _sys

                _sys.stdout.flush()
                _sys.stderr.flush()
            except Exception:
                pass
        except Exception:
            pass
-        sys.exit(0)
        # Use os._exit to bypass atexit handlers that may hang in rare cases
        import os as _os

        _os._exit(0)

View File

@@ -97,29 +97,23 @@ ollama pull nomic-embed-text
```

### DiskANN

-**Best for**: Performance-critical applications and large datasets - **Production-ready with automatic graph partitioning**
**Best for**: Large datasets, especially when you want `recompute=True`.

-**How it works:**
-- **Product Quantization (PQ) + Real-time Reranking**: Uses compressed PQ codes for fast graph traversal, then recomputes exact embeddings for final candidates
-- **Automatic Graph Partitioning**: When `is_recompute=True`, automatically partitions large indices and safely removes redundant files to save storage
-- **Superior Speed-Accuracy Trade-off**: Faster search than HNSW while maintaining high accuracy
**Key advantages:**
- **Faster search** on large datasets (3x+ speedup vs HNSW in many cases)
- **Smart storage**: `recompute=True` enables automatic graph partitioning for smaller indexes
- **Better scaling**: Designed for 100k+ documents

-**Trade-offs compared to HNSW:**
-- **Faster search latency** (typically 2-8x speedup)
-- **Better scaling** for large datasets
-- ✅ **Smart storage management** with automatic partitioning
-- ✅ **Better graph locality** with `--ldg-times` parameter for SSD optimization
-- ⚠️ **Slightly larger index size** due to PQ tables and graph metadata
**Recompute behavior:**
- `recompute=True` (recommended): pure PQ traversal + final reranking - faster, and enables partitioning
- `recompute=False`: PQ + partial real distances during traversal - slower but higher accuracy

```bash
# Recommended for most use cases
--backend-name diskann --graph-degree 32 --build-complexity 64

-# For large-scale deployments
---backend-name diskann --graph-degree 64 --build-complexity 128
```

-**Performance Benchmark**: Run `python benchmarks/diskann_vs_hnsw_speed_comparison.py` to compare DiskANN and HNSW on your system.
**Performance Benchmark**: Run `uv run benchmarks/diskann_vs_hnsw_speed_comparison.py` to compare DiskANN and HNSW on your system.

## LLM Selection: Engine and Model Comparison
@@ -273,24 +267,114 @@ Every configuration choice involves trade-offs:
The key is finding the right balance for your specific use case. Start small and simple, measure performance, then scale up only where needed.

-## Deep Dive: Critical Configuration Decisions
## Low-resource setups

-### When to Disable Recomputation
If you don't have a local GPU, or builds and searches are too slow, use one or more of the options below.

-LEANN's recomputation feature provides exact distance calculations but can be disabled for extreme QPS requirements:
### 1) Use OpenAI embeddings (no local compute)

Fastest path with zero local GPU requirements. Set your API key and use OpenAI embeddings during build and search:

```bash
---no-recompute  # Disable selective recomputation
export OPENAI_API_KEY=sk-...

# Build with OpenAI embeddings
leann build my-index \
  --embedding-mode openai \
  --embedding-model text-embedding-3-small

# Search with OpenAI embeddings (recompute at query time)
leann search my-index "your query" \
  --recompute
```
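The same option is available from the Python API. A minimal sketch, mirroring the builder parameters used elsewhere in this changeset (the text and index path are placeholders):

```python
import os

from leann import LeannBuilder

# OpenAI embeddings require the API key, as in the CLI example above
assert os.getenv("OPENAI_API_KEY"), "export OPENAI_API_KEY first"

builder = LeannBuilder(
    backend_name="hnsw",
    embedding_mode="openai",
    embedding_model="text-embedding-3-small",
)
builder.add_text("Example passage to index.")  # placeholder content
builder.build_index("./.leann/indexes/my-index.leann")  # placeholder path
```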
-**Trade-offs**:
-- **With recomputation** (default): Exact distances, best quality, higher latency, minimal storage (only stores metadata, recomputes embeddings on-demand)
-- **Without recomputation**: Must store full embeddings, significantly higher memory and storage usage (10-100x more), but faster search
### 2) Run remote builds with SkyPilot (cloud GPU)

Offload embedding generation and index building to a GPU VM using [SkyPilot](https://skypilot.readthedocs.io/en/latest/). A template is provided at `sky/leann-build.yaml`.
```bash
# One-time: install and configure SkyPilot
pip install skypilot
# Launch with defaults (L4:1) and mount ./data to ~/leann-data; the build runs automatically
sky launch -c leann-gpu sky/leann-build.yaml
# Override parameters via -e key=value (optional)
sky launch -c leann-gpu sky/leann-build.yaml \
-e index_name=my-index \
-e backend=hnsw \
-e embedding_mode=sentence-transformers \
-e embedding_model=Qwen/Qwen3-Embedding-0.6B
# Copy the built index back to your local .leann (use rsync)
rsync -Pavz leann-gpu:~/.leann/indexes/my-index ./.leann/indexes/
```
### 3) Disable recomputation to trade storage for speed
If you need lower latency and have more storage/memory, disable recomputation. This stores full embeddings and avoids recomputing at search time.
```bash
# Build without recomputation (HNSW requires non-compact in this mode)
leann build my-index --no-recompute --no-compact
# Search without recomputation
leann search my-index "your query" --no-recompute
```
When to use:
- Extreme low latency requirements (high QPS, interactive assistants)
- Read-heavy workloads where storage is cheaper than latency
- No always-available GPU
Constraints:
- HNSW: when `--no-recompute` is set, LEANN automatically disables compact mode during build
- DiskANN: supported; `--no-recompute` skips selective recompute during search
Storage impact:
- Storing N embeddings of dimension D with float32 requires approximately N × D × 4 bytes
- Example: 1,000,000 chunks × 768 dims × 4 bytes ≈ 2.86 GB (plus graph/metadata)
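As a quick sanity check of that example (the 2.86 figure is binary gigabytes, i.e. 1024³ bytes):

```python
n_chunks, dims, bytes_per_float32 = 1_000_000, 768, 4
total = n_chunks * dims * bytes_per_float32  # 3,072,000,000 bytes
print(f"{total / 1024**3:.2f} GiB")  # 2.86
```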
Converting an existing index (rebuild required):
```bash
# Rebuild in-place (ensure you still have original docs or can regenerate chunks)
leann build my-index --force --no-recompute --no-compact
```
Python API usage:
```python
from leann import LeannSearcher
searcher = LeannSearcher("/path/to/my-index.leann")
results = searcher.search("your query", top_k=10, recompute_embeddings=False)
```
Trade-offs:
- Lower latency and fewer network hops at query time
- Significantly higher storage (10-100× vs selective recomputation)
- Slightly larger memory footprint during build and search
Quick benchmark results (`benchmarks/benchmark_no_recompute.py` with 5k texts, complexity=32):
- HNSW
```text
recompute=True: search_time=0.818s, size=1.1MB
recompute=False: search_time=0.012s, size=16.6MB
```
- DiskANN
```text
recompute=True: search_time=0.041s, size=5.9MB
recompute=False: search_time=0.013s, size=24.6MB
```
Conclusion:
- **HNSW**: `no-recompute` is significantly faster (no embedding recomputation) but requires much more storage (stores all embeddings)
- **DiskANN**: `no-recompute` uses PQ + partial real distances during traversal (slower but higher accuracy), while `recompute=True` uses pure PQ traversal + final reranking (faster traversal, enables build-time partitioning for smaller storage)
-**Disable when**:
-- You have abundant storage and memory
-- Need extremely low latency (< 100ms)
-- Running a read-heavy workload where storage cost is acceptable

## Further Reading

View File

@@ -441,9 +441,14 @@ class DiskannSearcher(BaseSearcher):
        else:  # "global"
            use_global_pruning = True

-        # Perform search with suppressed C++ output based on log level
-        use_deferred_fetch = kwargs.get("USE_DEFERRED_FETCH", True)
        # Strategy:
        # - Traversal always uses PQ distances
        # - If recompute_embeddings=True, do a single final rerank via deferred fetch
        #   (fetch embeddings for the final candidate set only)
        # - Do not recompute neighbor distances along the path
        use_deferred_fetch = True if recompute_embeddings else False
        recompute_neighors = False  # Intentional misspelling, kept for backward compatibility

        with suppress_cpp_output_if_needed():
            labels, distances = self._index.batch_search(
                query,

View File

@@ -54,12 +54,13 @@ class HNSWBuilder(LeannBackendBuilderInterface):
        self.efConstruction = self.build_params.setdefault("efConstruction", 200)
        self.distance_metric = self.build_params.setdefault("distance_metric", "mips")
        self.dimensions = self.build_params.get("dimensions")

-        if not self.is_recompute:
-            if self.is_compact:
-                # TODO: support this case @andy
-                raise ValueError(
-                    "is_recompute is False, but is_compact is True. This is not compatible now. change is compact to False and you can use the original HNSW index."
-                )
        if not self.is_recompute and self.is_compact:
            # Auto-correct: non-recompute requires non-compact storage for HNSW
            logger.warning(
                "is_recompute=False requires non-compact HNSW. Forcing is_compact=False."
            )
            self.is_compact = False
            self.build_params["is_compact"] = False

    def build(self, data: np.ndarray, ids: list[str], index_path: str, **kwargs):
        from . import faiss  # type: ignore
@@ -184,9 +185,11 @@ class HNSWSearcher(BaseSearcher):
        """
        from . import faiss  # type: ignore

-        if not recompute_embeddings:
-            if self.is_pruned:
-                raise RuntimeError("Recompute is required for pruned index.")
        if not recompute_embeddings and self.is_pruned:
            raise RuntimeError(
                "Recompute is required for pruned/compact HNSW index. "
                "Re-run search with --recompute, or rebuild with --no-recompute and --no-compact."
            )

        if recompute_embeddings:
            if zmq_port is None:
                raise ValueError("zmq_port must be provided if recompute_embeddings is True")

View File

@@ -204,6 +204,18 @@ class LeannBuilder:
        **backend_kwargs,
    ):
        self.backend_name = backend_name

        # Normalize incompatible combinations early (for consistent metadata)
        if backend_name == "hnsw":
            is_recompute = backend_kwargs.get("is_recompute", True)
            is_compact = backend_kwargs.get("is_compact", True)
            if is_recompute is False and is_compact is True:
                warnings.warn(
                    "HNSW with is_recompute=False requires non-compact storage. Forcing is_compact=False.",
                    UserWarning,
                    stacklevel=2,
                )
                backend_kwargs["is_compact"] = False

        backend_factory: Optional[LeannBackendFactoryInterface] = BACKEND_REGISTRY.get(backend_name)
        if backend_factory is None:
            raise ValueError(f"Backend '{backend_name}' not found or not registered.")
@@ -294,6 +306,23 @@ class LeannBuilder:
    def build_index(self, index_path: str):
        if not self.chunks:
            raise ValueError("No chunks added.")

        # Filter out invalid/empty text chunks early to keep passage and embedding counts aligned
        valid_chunks: list[dict[str, Any]] = []
        skipped = 0
        for chunk in self.chunks:
            text = chunk.get("text", "")
            if isinstance(text, str) and text.strip():
                valid_chunks.append(chunk)
            else:
                skipped += 1
        if skipped > 0:
            print(
                f"Warning: Skipping {skipped} empty/invalid text chunk(s). Processing {len(valid_chunks)} valid chunks"
            )
        self.chunks = valid_chunks
        if not self.chunks:
            raise ValueError("All provided chunks are empty or invalid. Nothing to index.")

        if self.dimensions is None:
            self.dimensions = len(
                compute_embeddings(
@@ -523,6 +552,7 @@ class LeannSearcher:
        self.embedding_model = self.meta_data["embedding_model"]
        # Support both old and new format
        self.embedding_mode = self.meta_data.get("embedding_mode", "sentence-transformers")
        # Delegate portability handling to PassageManager
        self.passage_manager = PassageManager(
            self.meta_data.get("passage_sources", []), metadata_file_path=self.meta_path_str
        )
@@ -652,6 +682,23 @@ class LeannSearcher:
        if hasattr(self.backend_impl, "embedding_server_manager"):
            self.backend_impl.embedding_server_manager.stop_server()

    # Enable automatic cleanup patterns
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        try:
            self.cleanup()
        except Exception:
            pass

    def __del__(self):
        try:
            self.cleanup()
        except Exception:
            # Avoid noisy errors during interpreter shutdown
            pass


class LeannChat:
    def __init__(
@@ -730,3 +777,19 @@ class LeannChat:
        """
        if hasattr(self.searcher, "cleanup"):
            self.searcher.cleanup()

    # Enable automatic cleanup patterns
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        try:
            self.cleanup()
        except Exception:
            pass

    def __del__(self):
        try:
            self.cleanup()
        except Exception:
            pass
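With the new `__enter__`/`__exit__` hooks, both classes support the standard with-statement pattern; a minimal usage sketch (the index path is a placeholder):

```python
from leann import LeannSearcher

# cleanup() now runs even if the search raises, shutting down the
# embedding server instead of leaving it orphaned.
with LeannSearcher("./.leann/indexes/my-index.leann") as searcher:
    results = searcher.search("your query", top_k=10)
```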

View File

@@ -422,7 +422,6 @@ class LLMInterface(ABC):
            top_k=10,
            complexity=64,
            beam_width=8,
-            USE_DEFERRED_FETCH=True,
            skip_search_reorder=True,
            recompute_beighbor_embeddings=True,
            dedup_node_dis=True,
@@ -434,7 +433,6 @@ class LLMInterface(ABC):
        Supported kwargs:
        - complexity (int): Search complexity parameter (default: 32)
        - beam_width (int): Beam width for search (default: 4)
-        - USE_DEFERRED_FETCH (bool): Enable deferred fetch mode (default: False)
        - skip_search_reorder (bool): Skip search reorder step (default: False)
        - recompute_beighbor_embeddings (bool): Enable ZMQ embedding server for neighbor recomputation (default: False)
        - dedup_node_dis (bool): Deduplicate nodes by distance (default: False)

View File

@@ -72,7 +72,7 @@ class LeannCLI:
    def create_parser(self) -> argparse.ArgumentParser:
        parser = argparse.ArgumentParser(
            prog="leann",
-            description="LEANN - Local Enhanced AI Navigation",
            description="The smallest vector index in the world. RAG Everything with LEANN!",
            formatter_class=argparse.RawDescriptionHelpFormatter,
            epilog="""
Examples:
@@ -102,9 +102,18 @@ Examples:
            help="Documents directories and/or files (default: current directory)",
        )
        build_parser.add_argument(
-            "--backend", type=str, default="hnsw", choices=["hnsw", "diskann"]
            "--backend",
            type=str,
            default="hnsw",
            choices=["hnsw", "diskann"],
            help="Backend to use (default: hnsw)",
        )
-        build_parser.add_argument("--embedding-model", type=str, default="facebook/contriever")
        build_parser.add_argument(
            "--embedding-model",
            type=str,
            default="facebook/contriever",
            help="Embedding model (default: facebook/contriever)",
        )
        build_parser.add_argument(
            "--embedding-mode",
            type=str,
@@ -112,36 +121,82 @@ Examples:
            choices=["sentence-transformers", "openai", "mlx", "ollama"],
            help="Embedding backend mode (default: sentence-transformers)",
        )
-        build_parser.add_argument("--force", "-f", action="store_true", help="Force rebuild")
-        build_parser.add_argument("--graph-degree", type=int, default=32)
-        build_parser.add_argument("--complexity", type=int, default=64)
        build_parser.add_argument(
            "--force", "-f", action="store_true", help="Force rebuild existing index"
        )
        build_parser.add_argument(
            "--graph-degree", type=int, default=32, help="Graph degree (default: 32)"
        )
        build_parser.add_argument(
            "--complexity", type=int, default=64, help="Build complexity (default: 64)"
        )
        build_parser.add_argument("--num-threads", type=int, default=1)
-        build_parser.add_argument("--compact", action="store_true", default=True)
-        build_parser.add_argument("--recompute", action="store_true", default=True)
        build_parser.add_argument(
            "--compact",
            action=argparse.BooleanOptionalAction,
            default=True,
            help="Use compact storage (default: true). Use --no-compact for a --no-recompute build.",
        )
        build_parser.add_argument(
            "--recompute",
            action=argparse.BooleanOptionalAction,
            default=True,
            help="Enable recomputation (default: true)",
        )
        build_parser.add_argument(
            "--file-types",
            type=str,
            help="Comma-separated list of file extensions to include (e.g., '.txt,.pdf,.pptx'). If not specified, uses default supported types.",
        )
        build_parser.add_argument(
            "--doc-chunk-size",
            type=int,
            default=256,
            help="Document chunk size in tokens/characters (default: 256)",
        )
        build_parser.add_argument(
            "--doc-chunk-overlap",
            type=int,
            default=128,
            help="Document chunk overlap (default: 128)",
        )
        build_parser.add_argument(
            "--code-chunk-size",
            type=int,
            default=512,
            help="Code chunk size in tokens/lines (default: 512)",
        )
        build_parser.add_argument(
            "--code-chunk-overlap",
            type=int,
            default=50,
            help="Code chunk overlap (default: 50)",
        )
        # Search command
        search_parser = subparsers.add_parser("search", help="Search documents")
        search_parser.add_argument("index_name", help="Index name")
        search_parser.add_argument("query", help="Search query")
-        search_parser.add_argument("--top-k", type=int, default=5)
-        search_parser.add_argument("--complexity", type=int, default=64)
        search_parser.add_argument(
            "--top-k", type=int, default=5, help="Number of results (default: 5)"
        )
        search_parser.add_argument(
            "--complexity", type=int, default=64, help="Search complexity (default: 64)"
        )
        search_parser.add_argument("--beam-width", type=int, default=1)
        search_parser.add_argument("--prune-ratio", type=float, default=0.0)
        search_parser.add_argument(
-            "--recompute-embeddings",
-            action="store_true",
            "--recompute",
            dest="recompute_embeddings",
            action=argparse.BooleanOptionalAction,
            default=True,
-            help="Recompute embeddings (default: True)",
            help="Enable/disable embedding recomputation (default: enabled). Avoid a --no-recompute search on an index built with recomputation enabled.",
        )
        search_parser.add_argument(
            "--pruning-strategy",
            choices=["global", "local", "proportional"],
            default="global",
            help="Pruning strategy (default: global)",
        )
        # Ask command
@@ -152,19 +207,27 @@ Examples:
            type=str,
            default="ollama",
            choices=["simulated", "ollama", "hf", "openai"],
            help="LLM provider (default: ollama)",
        )
-        ask_parser.add_argument("--model", type=str, default="qwen3:8b")
        ask_parser.add_argument(
            "--model", type=str, default="qwen3:8b", help="Model name (default: qwen3:8b)"
        )
        ask_parser.add_argument("--host", type=str, default="http://localhost:11434")
-        ask_parser.add_argument("--interactive", "-i", action="store_true")
        ask_parser.add_argument(
            "--interactive", "-i", action="store_true", help="Interactive chat mode"
        )
-        ask_parser.add_argument("--top-k", type=int, default=20)
        ask_parser.add_argument(
            "--top-k", type=int, default=20, help="Retrieval count (default: 20)"
        )
        ask_parser.add_argument("--complexity", type=int, default=32)
        ask_parser.add_argument("--beam-width", type=int, default=1)
        ask_parser.add_argument("--prune-ratio", type=float, default=0.0)
        ask_parser.add_argument(
-            "--recompute-embeddings",
-            action="store_true",
            "--recompute",
            dest="recompute_embeddings",
            action=argparse.BooleanOptionalAction,
            default=True,
-            help="Recompute embeddings (default: True)",
            help="Enable/disable embedding recomputation during ask (default: enabled)",
        )
        ask_parser.add_argument(
            "--pruning-strategy",
@@ -687,6 +750,37 @@ Examples:
            print(f"Index '{index_name}' already exists. Use --force to rebuild.")
            return
        # Configure chunking based on CLI args before loading documents
        # Guard against invalid configurations
        doc_chunk_size = max(1, int(args.doc_chunk_size))
        doc_chunk_overlap = max(0, int(args.doc_chunk_overlap))
        if doc_chunk_overlap >= doc_chunk_size:
            print(
                f"⚠️ Adjusting doc chunk overlap from {doc_chunk_overlap} to {doc_chunk_size - 1} (must be < chunk size)"
            )
            doc_chunk_overlap = doc_chunk_size - 1

        code_chunk_size = max(1, int(args.code_chunk_size))
        code_chunk_overlap = max(0, int(args.code_chunk_overlap))
        if code_chunk_overlap >= code_chunk_size:
            print(
                f"⚠️ Adjusting code chunk overlap from {code_chunk_overlap} to {code_chunk_size - 1} (must be < chunk size)"
            )
            code_chunk_overlap = code_chunk_size - 1

        self.node_parser = SentenceSplitter(
            chunk_size=doc_chunk_size,
            chunk_overlap=doc_chunk_overlap,
            separator=" ",
            paragraph_separator="\n\n",
        )
        self.code_parser = SentenceSplitter(
            chunk_size=code_chunk_size,
            chunk_overlap=code_chunk_overlap,
            separator="\n",
            paragraph_separator="\n\n",
        )
        all_texts = self.load_documents(docs_paths, args.file_types)
        if not all_texts:
            print("No documents found")

View File

@@ -244,6 +244,16 @@ def compute_embeddings_openai(texts: list[str], model_name: str) -> np.ndarray:
    except ImportError as e:
        raise ImportError(f"OpenAI package not installed: {e}")

    # Validate input list
    if not texts:
        raise ValueError("Cannot compute embeddings for empty text list")

    # Extra validation: abort early if any item is empty/whitespace
    invalid_count = sum(1 for t in texts if not isinstance(t, str) or not t.strip())
    if invalid_count > 0:
        raise ValueError(
            f"Found {invalid_count} empty/invalid text(s) in input. Upstream should filter before calling OpenAI."
        )

    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY environment variable not set")
@@ -263,8 +273,16 @@ def compute_embeddings_openai(texts: list[str], model_name: str) -> np.ndarray:
    print(f"len of texts: {len(texts)}")

    # OpenAI has limits on batch size and input length
-    max_batch_size = 1000  # Conservative batch size
    max_batch_size = 800  # Conservative batch size because the per-request token limit is 300K
    all_embeddings = []

    # Shrink the batch further when texts are long on average, to stay under the token limit
    avg_len = sum(len(text) for text in texts) / len(texts)
    print(f"avg len of texts: {avg_len}")
    if avg_len > 300:
        max_batch_size = 500

    try:
        from tqdm import tqdm
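To see what the heuristic is guarding, a rough budget calculation; the ~4 characters per token figure is an assumption for English text, not something measured here:

```python
# With a 300K-token-per-request cap, 800 short texts stay far below the
# limit; the smaller batch of 500 keeps headroom for longer texts.
chars_per_token = 4  # assumed average for English text
for avg_len, batch in ((300, 800), (1500, 500)):
    tokens = batch * avg_len / chars_per_token
    print(f"avg_len={avg_len}: ~{tokens:,.0f} tokens per batch")
```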

View File

@@ -268,8 +268,12 @@ class EmbeddingServerManager:
                f"Terminating server process (PID: {self.server_process.pid}) for backend {self.backend_module_name}..."
            )

-            # Use simple termination - our improved server shutdown should handle this properly
-            self.server_process.terminate()
            # Use simple termination first; if the server installed signal handlers,
            # it will exit cleanly. Otherwise escalate to kill after a short wait.
            try:
                self.server_process.terminate()
            except Exception:
                pass

            try:
                self.server_process.wait(timeout=5)  # Give more time for graceful shutdown
@@ -278,7 +282,10 @@ class EmbeddingServerManager:
                logger.warning(
                    f"Server process {self.server_process.pid} did not terminate within 5 seconds, force killing..."
                )
-                self.server_process.kill()
                try:
                    self.server_process.kill()
                except Exception:
                    pass
                try:
                    self.server_process.wait(timeout=2)
                    logger.info(f"Server process {self.server_process.pid} killed successfully.")

View File

@@ -64,19 +64,6 @@ def handle_request(request):
                    "required": ["index_name", "query"],
                },
            },
-            {
-                "name": "leann_status",
-                "description": "📊 Check the health and stats of your code indexes - like a medical checkup for your codebase knowledge!",
-                "inputSchema": {
-                    "type": "object",
-                    "properties": {
-                        "index_name": {
-                            "type": "string",
-                            "description": "Optional: Name of specific index to check. If not provided, shows status of all indexes.",
-                        }
-                    },
-                },
-            },
            {
                "name": "leann_list",
                "description": "📋 Show all your indexed codebases - your personal code library! Use this to see what's available for search.",
@@ -118,15 +105,6 @@ def handle_request(request):
            ]
            result = subprocess.run(cmd, capture_output=True, text=True)

-        elif tool_name == "leann_status":
-            if args.get("index_name"):
-                # Check specific index status - for now, we'll use leann list and filter
-                result = subprocess.run(["leann", "list"], capture_output=True, text=True)
-                # We could enhance this to show more detailed status per index
-            else:
-                # Show all indexes status
-                result = subprocess.run(["leann", "list"], capture_output=True, text=True)
        elif tool_name == "leann_list":
            result = subprocess.run(["leann", "list"], capture_output=True, text=True)

View File

@@ -13,10 +13,20 @@ This installs the `leann` CLI into an isolated tool environment and includes both
## 🚀 Quick Setup

-Add the LEANN MCP server to Claude Code:
Add the LEANN MCP server to Claude Code. Choose the scope based on how widely you want it available. Below is the command to install it globally; if you prefer a local install, skip this step:

```bash
-claude mcp add leann-server -- leann_mcp
# Global (recommended): available in all projects for your user
claude mcp add --scope user leann-server -- leann_mcp
```

- `leann-server`: the display name of the MCP server in Claude Code (you can change it).
- `leann_mcp`: the Python entry point installed with LEANN that starts the MCP server.

Verify it is registered globally:

```bash
claude mcp list | cat
```

## 🛠️ Available Tools
@@ -25,27 +35,36 @@ Once connected, you'll have access to these powerful semantic search tools in Claude Code:
- **`leann_list`** - List all available indexes across your projects
- **`leann_search`** - Perform semantic searches across code and documents
- **`leann_ask`** - Ask natural language questions and get AI-powered answers from your codebase

## 🎯 Quick Start Example

```bash
# Add locally if you did not add it globally (current folder only; default if --scope is omitted)
claude mcp add leann-server -- leann_mcp

-# Build an index for your project (change to your actual path)
-leann build my-project --docs ./
# See the advanced examples below for more ways to configure indexing
# Set the index name (replace 'my-project' with your own)
leann build my-project --docs $(git ls-files)

# Start Claude Code
claude
```
-## 🚀 Advanced Usage Examples
## 🚀 Advanced Usage Examples: Building the Index

### Index Entire Git Repository

```bash
-# Index all tracked files in your git repository, note right now we will skip submodules, but we can add it back easily if you want
# Index all tracked files in your Git repository.
# Note: submodules are currently skipped; we can add them back if needed.
leann build my-repo --docs $(git ls-files) --embedding-mode sentence-transformers --embedding-model all-MiniLM-L6-v2 --backend hnsw

-# Index only specific file types from git
# Index only tracked Python files from Git.
leann build my-python-code --docs $(git ls-files "*.py") --embedding-mode sentence-transformers --embedding-model all-MiniLM-L6-v2 --backend hnsw

# If you encounter empty requests caused by empty files (e.g., __init__.py), exclude zero-byte files.
# Thanks @ww2283 for pointing this out in https://github.com/yichuan-w/LEANN/issues/48
leann build leann-prospec-lig --docs $(find ./src -name "*.py" -not -empty) --embedding-mode openai --embedding-model text-embedding-3-small
```

### Multiple Directories and Files
@@ -73,7 +92,7 @@ leann build docs-and-configs --docs $(git ls-files "*.md" "*.yml" "*.yaml" "*.json")
```

-**Try this in Claude Code:**
## Try this in Claude Code

```
Help me understand this codebase. List available indexes and search for authentication patterns.
```
@@ -82,6 +101,7 @@ Help me understand this codebase. List available indexes and search for authentication patterns.
<img src="../../assets/claude_code_leann.png" alt="LEANN in Claude Code" width="80%">
</p>

If you see a prompt asking whether to proceed with LEANN, you can now use it in your chat!
@@ -117,3 +137,11 @@ To remove LEANN
```
uv pip uninstall leann leann-backend-hnsw leann-core
```

To globally remove LEANN (e.g., before a version update):
```
uv tool list | cat
uv tool uninstall leann-core
command -v leann || echo "leann gone"
command -v leann_mcp || echo "leann_mcp gone"
```

View File

@@ -0,0 +1 @@
__all__ = []

View File

@@ -136,5 +136,9 @@ def export_sqlite(
    connection.commit()

-if __name__ == "__main__":
def main():
    app()

if __name__ == "__main__":
    main()

View File

@@ -10,6 +10,7 @@ requires-python = ">=3.9"
dependencies = [
    "leann-core",
    "leann-backend-hnsw",
    "typer>=0.12.3",
    "numpy>=1.26.0",
    "torch",
    "tqdm",
@@ -84,6 +85,11 @@ documents = [
[tool.setuptools]
py-modules = []
packages = ["wechat_exporter"]
package-dir = { "wechat_exporter" = "packages/wechat-exporter" }

[project.scripts]
wechat-exporter = "wechat_exporter.main:main"

[tool.uv.sources]

sky/leann-build.yaml Normal file
View File

@@ -0,0 +1,76 @@
name: leann-build

resources:
  # Choose a GPU for fast embeddings (examples: L4, A10G, A100). CPU also works but is slower.
  accelerators: L4:1
  # Optionally pin a cloud, otherwise SkyPilot will auto-select
  # cloud: aws
  disk_size: 100

envs:
  # Build parameters (override with: sky launch -c leann-gpu sky/leann-build.yaml -e key=value)
  index_name: my-index
  docs: ./data
  backend: hnsw # hnsw | diskann
  complexity: 64
  graph_degree: 32
  num_threads: 8
  # Embedding selection
  embedding_mode: sentence-transformers # sentence-transformers | openai | mlx | ollama
  embedding_model: facebook/contriever
  # Storage/latency knobs
  recompute: true # true => selective recomputation (recommended)
  compact: true # for HNSW only
  # Optional pass-through
  extra_args: ""
  # Rebuild control
  force: true

# Sync local paths to the remote VM. Adjust as needed.
file_mounts:
  # Example: mount your local data directory used for building
  ~/leann-data: ${docs}

setup: |
  set -e
  # Install uv (package manager)
  curl -LsSf https://astral.sh/uv/install.sh | sh
  export PATH="$HOME/.local/bin:$PATH"
  # Ensure modern libstdc++ for FAISS (GLIBCXX >= 3.4.30)
  sudo apt-get update -y
  sudo apt-get install -y libstdc++6 libgomp1
  # Also upgrade conda's libstdc++ in the base env (SkyPilot images include conda)
  if command -v conda >/dev/null 2>&1; then
    conda install -y -n base -c conda-forge libstdcxx-ng
  fi
  # Install LEANN CLI and backends into the user environment
  uv pip install --upgrade pip
  uv pip install leann-core leann-backend-hnsw leann-backend-diskann

run: |
  export PATH="$HOME/.local/bin:$PATH"
  # Derive flags from env
  recompute_flag=""
  if [ "${recompute}" = "false" ] || [ "${recompute}" = "0" ]; then
    recompute_flag="--no-recompute"
  fi
  force_flag=""
  if [ "${force}" = "true" ] || [ "${force}" = "1" ]; then
    force_flag="--force"
  fi
  # Build command
  python -m leann.cli build ${index_name} \
    --docs ~/leann-data \
    --backend ${backend} \
    --complexity ${complexity} \
    --graph-degree ${graph_degree} \
    --num-threads ${num_threads} \
    --embedding-mode ${embedding_mode} \
    --embedding-model ${embedding_model} \
    ${recompute_flag} ${force_flag} ${extra_args}
  # Print where the index is stored for downstream rsync
  echo "INDEX_OUT_DIR=~/.leann/indexes/${index_name}"

uv.lock generated
View File

@@ -2223,7 +2223,7 @@ wheels = [
[[package]]
name = "leann-backend-diskann"
-version = "0.2.8"
version = "0.2.9"
source = { editable = "packages/leann-backend-diskann" }
dependencies = [
    { name = "leann-core" },
@@ -2235,14 +2235,14 @@ dependencies = [
[package.metadata]
requires-dist = [
-    { name = "leann-core", specifier = "==0.2.8" },
    { name = "leann-core", specifier = "==0.2.9" },
    { name = "numpy" },
    { name = "protobuf", specifier = ">=3.19.0" },
]

[[package]]
name = "leann-backend-hnsw"
-version = "0.2.8"
version = "0.2.9"
source = { editable = "packages/leann-backend-hnsw" }
dependencies = [
    { name = "leann-core" },
@@ -2255,7 +2255,7 @@ dependencies = [
[package.metadata]
requires-dist = [
-    { name = "leann-core", specifier = "==0.2.8" },
    { name = "leann-core", specifier = "==0.2.9" },
    { name = "msgpack", specifier = ">=1.0.0" },
    { name = "numpy" },
    { name = "pyzmq", specifier = ">=23.0.0" },
@@ -2263,7 +2263,7 @@ requires-dist = [
[[package]]
name = "leann-core"
-version = "0.2.8"
version = "0.2.9"
source = { editable = "packages/leann-core" }
dependencies = [
    { name = "accelerate" },