Compare commits

..

13 Commits

Author SHA1 Message Date
Andy Lee
8eee90bf80 docs: add a link 2025-08-04 20:10:14 -07:00
Andy Lee
649d4ad03e docs: Address all configuration guide feedback
- Fix grammar: 'If time is not a constraint' instead of 'time expense is not large'
- Highlight Qwen3-Embedding-0.6B performance (nearly OpenAI API level)
- Add OpenAI quick start section with configuration example
- Fold Cloud vs Local trade-offs into collapsible section
- Update HNSW as 'default and recommended for extreme low storage'
- Add DiskANN beta warning and explain PQ+rerank architecture
- Expand Ollama models: add qwen3:0.6b, 4b, 7b variants
- Note OpenAI as current default but recommend Ollama switch
- Add 'need to install extra software' warning for Ollama
- Remove incorrect latency numbers from search-complexity recommendations
2025-08-04 20:01:23 -07:00
Andy Lee
d9b6f195c5 docs: Improve configuration guide based on feedback
- List specific files in default data/ directory (2 AI papers, literature, tech report)
- Update examples to use English and better RAG-suitable queries
- Change full dataset reference to use --max-items -1
- Adjust small model guidance about upgrading to larger models when time allows
- Update top-k defaults to reflect actual default of 20
- Ensure consistent use of full model name Qwen/Qwen3-Embedding-0.6B
- Reorder optimization steps, move MLX to third position
- Remove incorrect chunk size tuning guidance
- Change README from 'Having trouble' to 'Need best practices'
2025-08-04 19:29:17 -07:00
Andy Lee
00f506c0bd docs: Adjust DiskANN positioning in features and roadmap
- features.md: Put HNSW/FAISS first as default, DiskANN as optional
- roadmap.md: Reorder to show HNSW integration before DiskANN
- Consistent with positioning DiskANN as advanced option for large-scale use
2025-08-04 17:53:27 -07:00
Andy Lee
e872dd1d23 docs: Weaken DiskANN emphasis in README
- Change backend description to emphasize HNSW as default
- DiskANN positioned as optional for billion-scale datasets
- Simplify evaluation commands to be more generic
2025-08-04 17:51:21 -07:00
Andy Lee
063c687ff7 chore: move evaluation data .gitattributes to correct location 2025-08-04 17:46:17 -07:00
Andy Lee
bb8ecd54d7 feat: add comprehensive configuration guide and update README
- Create docs/configuration-guide.md with detailed guidance on:
  - Embedding model selection (small/medium/large)
  - Index selection (HNSW vs DiskANN)
  - LLM engine and model comparison
  - Parameter tuning (build/search complexity, top-k)
  - Performance optimization tips
  - Deep dive into LEANN's recomputation feature
- Update README.md to link to the configuration guide
- Include latest 2025 model recommendations (Qwen3, DeepSeek-R1, O3-mini)
2025-08-04 17:41:27 -07:00
Andy Lee
716217ae24 docs: config guidance 2025-08-04 16:21:13 -07:00
Andy Lee
dd71ac8d71 feat: implement smart memory configuration for DiskANN (#16)
- Add intelligent memory calculation based on data size and system specs
- search_memory_maximum: 1/10 of embedding size (controls PQ compression)
- build_memory_maximum: 50% of available RAM (controls sharding)
- Provides optimal balance between performance and memory usage
- Automatic fallback to default values if parameters are explicitly provided
2025-08-04 14:36:29 -07:00
GitHub Actions
8bee1d4100 chore: release v0.2.0 2025-08-04 21:34:31 +00:00
yichuan520030910320
33521d6d00 add logs 2025-08-04 14:15:52 -07:00
Andy Lee
8899734952 refactor: Unify examples interface with BaseRAGExample (#12)
* refactor: Unify examples interface with BaseRAGExample

- Create BaseRAGExample base class for all RAG examples
- Refactor 4 examples to use unified interface:
  - document_rag.py (replaces main_cli_example.py)
  - email_rag.py (replaces mail_reader_leann.py)
  - browser_rag.py (replaces google_history_reader_leann.py)
  - wechat_rag.py (replaces wechat_history_reader_leann.py)
- Maintain 100% parameter compatibility with original files
- Add interactive mode support for all examples
- Unify parameter names (--max-items replaces --max-emails/--max-entries)
- Update README.md with new examples usage
- Add PARAMETER_CONSISTENCY.md documenting all parameter mappings
- Keep main_cli_example.py for backward compatibility with migration notice

All default values, LeannBuilder parameters, and chunking settings
remain identical to ensure full compatibility with existing indexes.

* fix: Update CI tests for new unified examples interface

- Rename test_main_cli.py to test_document_rag.py
- Update all references from main_cli_example.py to document_rag.py
- Update tests/README.md documentation

The tests now properly test the new unified interface while maintaining
the same test coverage and functionality.

* fix: Fix pre-commit issues and update tests

- Fix import sorting and unused imports
- Update type annotations to use built-in types (list, dict) instead of typing.List/Dict
- Fix trailing whitespace and end-of-file issues
- Fix Chinese fullwidth comma to regular comma
- Update test_main_cli.py to test_document_rag.py
- Add backward compatibility test for main_cli_example.py
- Pass all pre-commit hooks (ruff, ruff-format, etc.)

* refactor: Remove old example scripts and migration references

- Delete old example scripts (mail_reader_leann.py, google_history_reader_leann.py, etc.)
- Remove migration hints and backward compatibility
- Update tests to use new unified examples directly
- Clean up all references to old script names
- Users now only see the new unified interface

* fix: Restore embedding-mode parameter to all examples

- All examples now have --embedding-mode parameter (unified interface benefit)
- Default is 'sentence-transformers' (consistent with original behavior)
- Users can now use OpenAI or MLX embeddings with any data source
- Maintains functional equivalence with original scripts

* docs: Improve parameter categorization in README

- Clearly separate core (shared) vs specific parameters
- Move LLM and embedding examples to 'Example Commands' section
- Add descriptive comments for all specific parameters
- Keep only truly data-source-specific parameters in specific sections

* docs: Make example commands more representative

- Add default values to parameter descriptions
- Replace generic examples with real-world use cases
- Focus on data-source-specific features in examples
- Remove redundant demonstrations of common parameters

* docs: Reorganize parameter documentation structure

- Move common parameters to a dedicated section before all examples
- Rename sections to 'X-Specific Arguments' for clarity
- Remove duplicate common parameters from individual examples
- Better information architecture for users

* docs: polish applications

* docs: Add CLI installation instructions

- Add two installation options: venv and global uv tool
- Clearly explain when to use each option
- Make CLI more accessible for daily use

* docs: Clarify CLI global installation process

- Explain the transition from venv to global installation
- Add upgrade command for global installation
- Make it clear that global install allows usage without venv activation

* docs: Add collapsible section for CLI installation

- Wrap CLI installation instructions in details/summary tags
- Keep consistent with other collapsible sections in README
- Improve document readability and navigation

* style: format

* docs: Fix collapsible sections

- Make Common Parameters collapsible (as it's lengthy reference material)
- Keep CLI Installation visible (important for users to see immediately)
- Better information hierarchy

* docs: Add introduction for Common Parameters section

- Add 'Flexible Configuration' heading with descriptive sentence
- Create parallel structure with 'Generation Model Setup' section
- Improve document flow and readability

* docs: nit

* fix: Fix issues in unified examples

- Add smart path detection for data directory
- Fix add_texts -> add_text method call
- Handle both running from project root and examples directory

* fix: Fix async/await and add_text issues in unified examples

- Remove incorrect await from chat.ask() calls (not async)
- Fix add_texts -> add_text method calls
- Verify search-complexity correctly maps to efSearch parameter
- All examples now run successfully

* feat: Address review comments

- Add complexity parameter to LeannChat initialization (default: search_complexity)
- Fix chunk-size default in README documentation (256, not 2048)
- Add more index building parameters as CLI arguments:
  - --backend-name (hnsw/diskann)
  - --graph-degree (default: 32)
  - --build-complexity (default: 64)
  - --no-compact (disable compact storage)
  - --no-recompute (disable embedding recomputation)
- Update README to document all new parameters

* feat: Add chunk-size parameters and improve file type filtering

- Add --chunk-size and --chunk-overlap parameters to all RAG examples
- Preserve original default values for each data source:
  - Document: 256/128 (optimized for general documents)
  - Email: 256/25 (smaller overlap for email threads)
  - Browser: 256/128 (standard for web content)
  - WeChat: 192/64 (smaller chunks for chat messages)
- Make --file-types optional filter instead of restriction in document_rag
- Update README to clarify interactive mode and parameter usage
- Fix LLM default model documentation (gpt-4o, not gpt-4o-mini)

* feat: Update documentation based on review feedback

- Add MLX embedding example to README
- Clarify examples/data content description (two papers, Pride and Prejudice, Chinese README)
- Move chunk parameters to common parameters section
- Remove duplicate chunk parameters from document-specific section

* docs: Emphasize diverse data sources in examples/data description

* fix: update default embedding models for better performance

- Change WeChat, Browser, and Email RAG examples to use all-MiniLM-L6-v2
- Previous Qwen/Qwen3-Embedding-0.6B was too slow for these use cases
- all-MiniLM-L6-v2 is a fast 384-dim model, ideal for large-scale personal data

* add response highlight

* change rebuild logic

* fix some example

* feat: check if k is larger than #docs

* fix: WeChat history reader bugs and refactor wechat_rag to use unified architecture

* fix email wrong -1 to process all file

* refactor: reorgnize all examples/ and test/

* refactor: reorganize examples and add link checker

* fix: add init.py

* fix: handle certificate errors in link checker

* fix wechat

* merge

* docs: update README to use proper module imports for apps

- Change from 'python apps/xxx.py' to 'python -m apps.xxx'
- More professional and pythonic module calling
- Ensures proper module resolution and imports
- Better separation between apps/ (production tools) and examples/ (demos)

---------

Co-authored-by: yichuan520030910320 <yichuan_wang@berkeley.edu>
2025-08-03 23:06:24 -07:00
Andy Lee
54df6310c5 fix: diskann build and prevent termination from hanging
- Fix OpenMP library linking in DiskANN CMake configuration
- Add timeout protection for HuggingFace model loading to prevent hangs
- Improve embedding server process termination with better timeouts
- Make DiskANN backend default enabled alongside HNSW
- Update documentation to reflect both backends included by default
2025-08-03 21:16:52 -07:00
17 changed files with 367 additions and 42 deletions

View File

@@ -170,6 +170,8 @@ ollama pull llama3.2:1b
LEANN provides flexible parameters for embedding models, search strategies, and data processing to fit your specific needs.
📚 **Need configuration best practices?** Check our [Configuration Guide](docs/configuration-guide.md) for detailed optimization tips, model selection advice, and solutions to common issues like slow embeddings or poor search quality.
<details>
<summary><strong>📋 Click to expand: Common Parameters (Available in All Examples)</strong></summary>
@@ -514,7 +516,7 @@ Options:
- **Dynamic batching:** Efficiently batch embedding computations for GPU utilization
- **Two-level search:** Smart graph traversal that prioritizes promising nodes
**Backends:** DiskANN or HNSW - pick what works for your data size.
**Backends:** HNSW (default) for most use cases, with optional DiskANN support for billion-scale datasets.
## Benchmarks
@@ -534,8 +536,7 @@ Options:
```bash
uv pip install -e ".[dev]" # Install dev dependencies
python benchmarks/run_evaluation.py data/indices/dpr/dpr_diskann # DPR dataset
python benchmarks/run_evaluation.py data/indices/rpj_wiki/rpj_wiki.index # Wikipedia
python benchmarks/run_evaluation.py # Will auto-download evaluation data and run benchmarks
```
The evaluation script downloads data automatically on first run. The last three results were tested with partial personal data, and you can reproduce them with your own data!

View File

@@ -99,7 +99,9 @@ if __name__ == "__main__":
print("- 'What are the main techniques LEANN uses?'")
print("- 'What is the technique DLPM?'")
print("- 'Who does Elizabeth Bennet marry?'")
print("- 'What is the problem of developing pan gu model? (盘古大模型开发中遇到什么问题?)'")
print(
"- 'What is the problem of developing pan gu model Huawei meets? (盘古大模型开发中遇到什么问题?)'"
)
print("\nOr run without --query for interactive mode\n")
rag = DocumentRAG()

View File

236
docs/configuration-guide.md Normal file
View File

@@ -0,0 +1,236 @@
# LEANN Configuration Guide
This guide helps you optimize LEANN for different use cases and understand the trade-offs between various configuration options.
## Getting Started: Simple is Better
When first trying LEANN, start with a small dataset to quickly validate your approach:
**For document RAG**: The default `data/` directory works perfectly - includes 2 AI research papers, Pride and Prejudice literature, and a technical report
```bash
python -m apps.document_rag --query "What techniques does LEANN use?"
```
**For other data sources**: Limit the dataset size for quick testing
```bash
# WeChat: Test with recent messages only
python -m apps.wechat_rag --max-items 100 --query "What did we discuss about the project timeline?"
# Browser history: Last few days
python -m apps.browser_rag --max-items 500 --query "Find documentation about vector databases"
# Email: Recent inbox
python -m apps.email_rag --max-items 200 --query "Who sent updates about the deployment status?"
```
Once validated, scale up gradually:
- 100 documents → 1,000 → 10,000 → full dataset (`--max-items -1`)
- This helps identify issues early before committing to long processing times
## Embedding Model Selection: Understanding the Trade-offs
Based on our experience developing LEANN, embedding models fall into three categories:
### Small Models (< 100M parameters)
**Example**: `sentence-transformers/all-MiniLM-L6-v2` (22M params)
- **Pros**: Lightweight, fast for both indexing and inference
- **Cons**: Lower semantic understanding, may miss nuanced relationships
- **Use when**: Speed is critical, handling simple queries, interactive mode, or just experimenting with LEANN. If time is not a constraint, consider using a larger/better embedding model
### Medium Models (100M-500M parameters)
**Example**: `facebook/contriever` (110M params), `BAAI/bge-base-en-v1.5` (110M params)
- **Pros**: Balanced performance, good multilingual support, reasonable speed
- **Cons**: Requires more compute than small models
- **Use when**: Need quality results without extreme compute requirements, general-purpose RAG applications
### Large Models (500M+ parameters)
**Example**: `Qwen/Qwen3-Embedding-0.6B` (600M params), `intfloat/multilingual-e5-large` (560M params)
- **Pros**: Best semantic understanding, captures complex relationships, excellent multilingual support. **Qwen3-Embedding-0.6B achieves nearly OpenAI API performance!**
- **Cons**: Slower inference, longer index build times
- **Use when**: Quality is paramount and you have sufficient compute resources. **Highly recommended** for production use
### Quick Start: OpenAI Embeddings (Fastest Setup)
For immediate testing without local model downloads:
```bash
# Set OpenAI embeddings (requires OPENAI_API_KEY)
--embedding-mode openai --embedding-model text-embedding-3-small
```
<details>
<summary><strong>Cloud vs Local Trade-offs</strong></summary>
**OpenAI Embeddings** (`text-embedding-3-small/large`)
- **Pros**: No local compute needed, consistently fast, high quality
- **Cons**: Requires API key, costs money, data leaves your system, [known limitations with certain languages](https://yichuan-w.github.io/blog/lessons_learned_in_dev_leann/)
- **When to use**: Prototyping, non-sensitive data, need immediate results
**Local Embeddings**
- **Pros**: Complete privacy, no ongoing costs, full control, can sometimes outperform OpenAI embeddings
- **Cons**: Slower than cloud APIs, requires local compute resources
- **When to use**: Production systems, sensitive data, cost-sensitive applications
</details>
## Index Selection: Matching Your Scale
### HNSW (Hierarchical Navigable Small World)
**Best for**: Small to medium datasets (< 10M vectors) - **Default and recommended for extreme low storage**
- Full recomputation required
- High memory usage during build phase
- Excellent recall (95%+)
```bash
# Optimal for most use cases
--backend-name hnsw --graph-degree 32 --build-complexity 64
```
### DiskANN
**Best for**: Large datasets (> 10M vectors, 10GB+ index size) - **⚠️ Beta version, still in active development**
- Uses Product Quantization (PQ) for coarse filtering during graph traversal
- Novel approach: stores only PQ codes, performs rerank with exact computation in final step
- Implements a corner case of double-queue: prunes all neighbors and recomputes at the end
```bash
# For billion-scale deployments
--backend-name diskann --graph-degree 64 --build-complexity 128
```
## LLM Selection: Engine and Model Comparison
### LLM Engines
**OpenAI** (`--llm openai`)
- **Pros**: Best quality, consistent performance, no local resources needed
- **Cons**: Costs money ($0.15-2.5 per million tokens), requires internet, data privacy concerns
- **Models**: `gpt-4o-mini` (fast, cheap), `gpt-4o` (best quality), `o3-mini` (reasoning, not so expensive)
- **Note**: Our current default, but we recommend switching to Ollama for most use cases
**Ollama** (`--llm ollama`)
- **Pros**: Fully local, free, privacy-preserving, good model variety
- **Cons**: Requires local GPU/CPU resources, slower than cloud APIs, need to install extra [ollama app](https://github.com/ollama/ollama?tab=readme-ov-file#ollama) and pre-download models by `ollama pull`
- **Models**: `qwen3:0.6b` (ultra-fast), `qwen3:1.7b` (balanced), `qwen3:4b` (good quality), `qwen3:7b` (high quality), `deepseek-r1:1.5b` (reasoning)
**HuggingFace** (`--llm hf`)
- **Pros**: Free tier available, huge model selection, direct model loading (vs Ollama's server-based approach)
- **Cons**: More complex initial setup
- **Models**: `Qwen/Qwen3-1.7B-FP8`
## Parameter Tuning Guide
### Search Complexity Parameters
**`--build-complexity`** (index building)
- Controls thoroughness during index construction
- Higher = better recall but slower build
- Recommendations:
- 32: Quick prototyping
- 64: Balanced (default)
- 128: Production systems
- 256: Maximum quality
**`--search-complexity`** (query time)
- Controls search thoroughness
- Higher = better results but slower
- Recommendations:
- 16: Fast/Interactive search
- 32: High quality with diversity
- 64+: Maximum accuracy
### Top-K Selection
**`--top-k`** (number of retrieved chunks)
- More chunks = better context but slower LLM processing
- Should be always smaller than `--search-complexity`
- Guidelines:
- 10-20: General questions (default: 20)
- 30+: Complex multi-hop reasoning requiring comprehensive context
**Trade-off formula**:
- Retrieval time ∝ log(n) × search_complexity
- LLM processing time ∝ top_k × chunk_size
- Total context = top_k × chunk_size tokens
### Graph Degree (HNSW/DiskANN)
**`--graph-degree`**
- Number of connections per node in the graph
- Higher = better recall but more memory
- HNSW: 16-32 (default: 32)
- DiskANN: 32-128 (default: 64)
## Performance Optimization Checklist
### If Embedding is Too Slow
1. **Switch to smaller model**:
```bash
# From large model
--embedding-model Qwen/Qwen3-Embedding-0.6B
# To small model
--embedding-model sentence-transformers/all-MiniLM-L6-v2
```
2. **Limit dataset size for testing**:
```bash
--max-items 1000 # Process first 1k items only
```
3. **Use MLX on Apple Silicon** (optional optimization):
```bash
--embedding-mode mlx --embedding-model mlx-community/multilingual-e5-base-mlx
```
### If Search Quality is Poor
1. **Increase retrieval count**:
```bash
--top-k 30 # Retrieve more candidates
```
2. **Upgrade embedding model**:
```bash
# For English
--embedding-model BAAI/bge-base-en-v1.5
# For multilingual
--embedding-model intfloat/multilingual-e5-large
```
## Understanding the Trade-offs
Every configuration choice involves trade-offs:
| Factor | Small/Fast | Large/Quality |
|--------|------------|---------------|
| Embedding Model | `all-MiniLM-L6-v2` | `Qwen/Qwen3-Embedding-0.6B` |
| Chunk Size | 512 tokens | 128 tokens |
| Index Type | HNSW | DiskANN |
| LLM | `qwen3:1.7b` | `gpt-4o` |
The key is finding the right balance for your specific use case. Start small and simple, measure performance, then scale up only where needed.
## Deep Dive: Critical Configuration Decisions
### When to Disable Recomputation
LEANN's recomputation feature provides exact distance calculations but can be disabled for extreme QPS requirements:
```bash
--no-recompute # Disable selective recomputation
```
**Trade-offs**:
- **With recomputation** (default): Exact distances, best quality, higher latency, minimal storage (only stores metadata, recomputes embeddings on-demand)
- **Without recomputation**: Must store full embeddings, significantly higher memory and storage usage (10-100x more), but faster search
**Disable when**:
- You have abundant storage and memory
- Need extremely low latency (< 100ms)
- Running a read-heavy workload where storage cost is acceptable
## Further Reading
- [Lessons Learned Developing LEANN](https://yichuan-w.github.io/blog/lessons_learned_in_dev_leann/)
- [LEANN Technical Paper](https://arxiv.org/abs/2506.08276)
- [DiskANN Original Paper](https://papers.nips.cc/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf)

View File

@@ -5,7 +5,7 @@
- **🔄 Real-time Embeddings** - Eliminate heavy embedding storage with dynamic computation using optimized ZMQ servers and highly optimized search paradigm (overlapping and batching) with highly optimized embedding engine
- **📈 Scalable Architecture** - Handles millions of documents on consumer hardware; the larger your dataset, the more LEANN can save
- **🎯 Graph Pruning** - Advanced techniques to minimize the storage overhead of vector search to a limited footprint
- **🏗️ Pluggable Backends** - DiskANN, HNSW/FAISS with unified API
- **🏗️ Pluggable Backends** - HNSW/FAISS (default), with optional DiskANN for large-scale deployments
## 🛠️ Technical Highlights
- **🔄 Recompute Mode** - Highest accuracy scenarios while eliminating vector storage overhead

View File

@@ -2,8 +2,8 @@
## 🎯 Q2 2025
- [X] DiskANN backend with MIPS/L2/Cosine support
- [X] HNSW backend integration
- [X] DiskANN backend with MIPS/L2/Cosine support
- [X] Real-time embedding pipeline
- [X] Memory-efficient graph pruning

View File

@@ -7,6 +7,7 @@ from pathlib import Path
from typing import Any, Literal
import numpy as np
import psutil
from leann.interface import (
LeannBackendBuilderInterface,
LeannBackendFactoryInterface,
@@ -84,6 +85,43 @@ def _write_vectors_to_bin(data: np.ndarray, file_path: Path):
f.write(data.tobytes())
def _calculate_smart_memory_config(data: np.ndarray) -> tuple[float, float]:
"""
Calculate smart memory configuration for DiskANN based on data size and system specs.
Args:
data: The embedding data array
Returns:
tuple: (search_memory_maximum, build_memory_maximum) in GB
"""
num_vectors, dim = data.shape
# Calculate embedding storage size
embedding_size_bytes = num_vectors * dim * 4 # float32 = 4 bytes
embedding_size_gb = embedding_size_bytes / (1024**3)
# search_memory_maximum: 1/10 of embedding size for optimal PQ compression
# This controls Product Quantization size - smaller means more compression
search_memory_gb = max(0.1, embedding_size_gb / 10) # At least 100MB
# build_memory_maximum: Based on available system RAM for sharding control
# This controls how much memory DiskANN uses during index construction
available_memory_gb = psutil.virtual_memory().available / (1024**3)
total_memory_gb = psutil.virtual_memory().total / (1024**3)
# Use 50% of available memory, but at least 2GB and at most 75% of total
build_memory_gb = max(2.0, min(available_memory_gb * 0.5, total_memory_gb * 0.75))
logger.info(
f"Smart memory config - Data: {embedding_size_gb:.2f}GB, "
f"Search mem: {search_memory_gb:.2f}GB (PQ control), "
f"Build mem: {build_memory_gb:.2f}GB (sharding control)"
)
return search_memory_gb, build_memory_gb
@register_backend("diskann")
class DiskannBackend(LeannBackendFactoryInterface):
@staticmethod
@@ -121,6 +159,16 @@ class DiskannBuilder(LeannBackendBuilderInterface):
f"Unsupported distance_metric '{build_kwargs.get('distance_metric', 'unknown')}'."
)
# Calculate smart memory configuration if not explicitly provided
if (
"search_memory_maximum" not in build_kwargs
or "build_memory_maximum" not in build_kwargs
):
smart_search_mem, smart_build_mem = _calculate_smart_memory_config(data)
else:
smart_search_mem = build_kwargs.get("search_memory_maximum", 4.0)
smart_build_mem = build_kwargs.get("build_memory_maximum", 8.0)
try:
from . import _diskannpy as diskannpy # type: ignore
@@ -131,8 +179,8 @@ class DiskannBuilder(LeannBackendBuilderInterface):
index_prefix,
build_kwargs.get("complexity", 64),
build_kwargs.get("graph_degree", 32),
build_kwargs.get("search_memory_maximum", 4.0),
build_kwargs.get("build_memory_maximum", 8.0),
build_kwargs.get("search_memory_maximum", smart_search_mem),
build_kwargs.get("build_memory_maximum", smart_build_mem),
build_kwargs.get("num_threads", 8),
build_kwargs.get("pq_disk_bytes", 0),
"",

View File

@@ -4,8 +4,8 @@ build-backend = "scikit_build_core.build"
[project]
name = "leann-backend-diskann"
version = "0.1.16"
dependencies = ["leann-core==0.1.16", "numpy", "protobuf>=3.19.0"]
version = "0.2.0"
dependencies = ["leann-core==0.2.0", "numpy", "protobuf>=3.19.0"]
[tool.scikit-build]
# Key: simplified CMake path

View File

@@ -6,10 +6,10 @@ build-backend = "scikit_build_core.build"
[project]
name = "leann-backend-hnsw"
version = "0.1.16"
version = "0.2.0"
description = "Custom-built HNSW (Faiss) backend for the Leann toolkit."
dependencies = [
"leann-core==0.1.16",
"leann-core==0.2.0",
"numpy",
"pyzmq>=23.0.0",
"msgpack>=1.0.0",

View File

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "leann-core"
version = "0.1.16"
version = "0.2.0"
description = "Core API and plugin system for LEANN"
readme = "README.md"
requires-python = ">=3.9"

View File

@@ -636,7 +636,10 @@ class LeannChat:
"Please provide the best answer you can based on this context and your knowledge."
)
ask_time = time.time()
ans = self.llm.ask(prompt, **llm_kwargs)
ask_time = time.time() - ask_time
logger.info(f" Ask time: {ask_time} seconds")
return ans
def start_interactive(self):

View File

@@ -358,7 +358,11 @@ def validate_model_and_suggest(model_name: str, llm_type: str) -> str | None:
error_msg += f"\n\nModel '{model_name}' was not found in Ollama's library."
if suggestions:
error_msg += "\n\nDid you mean one of these installed models?\n"
error_msg += (
"\n\nDid you mean one of these installed models?\n"
+ "\nTry to use ollama pull to install the model you need\n"
)
for i, suggestion in enumerate(suggestions, 1):
error_msg += f" {i}. {suggestion}\n"
else:
@@ -542,14 +546,41 @@ class HFChat(LLMInterface):
self.device = "cpu"
logger.info("No GPU detected. Using CPU.")
# Load tokenizer and model
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
self.model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype=torch.float16 if self.device != "cpu" else torch.float32,
device_map="auto" if self.device != "cpu" else None,
trust_remote_code=True,
)
# Load tokenizer and model with timeout protection
try:
import signal
def timeout_handler(signum, frame):
raise TimeoutError("Model download/loading timed out")
# Set timeout for model loading (60 seconds)
old_handler = signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(60)
try:
logger.info(f"Loading tokenizer for {model_name}...")
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
logger.info(f"Loading model {model_name}...")
self.model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype=torch.float16 if self.device != "cpu" else torch.float32,
device_map="auto" if self.device != "cpu" else None,
trust_remote_code=True,
)
logger.info(f"Successfully loaded {model_name}")
finally:
signal.alarm(0) # Cancel the alarm
signal.signal(signal.SIGALRM, old_handler) # Restore old handler
except TimeoutError:
logger.error(f"Model loading timed out for {model_name}")
raise RuntimeError(
f"Model loading timed out for {model_name}. Please check your internet connection or try a smaller model."
)
except Exception as e:
logger.error(f"Failed to load model {model_name}: {e}")
raise
# Move model to device if not using device_map
if self.device != "cpu" and "device_map" not in str(self.model):

View File

@@ -354,13 +354,21 @@ class EmbeddingServerManager:
self.server_process.terminate()
try:
self.server_process.wait(timeout=5)
self.server_process.wait(timeout=3)
logger.info(f"Server process {self.server_process.pid} terminated.")
except subprocess.TimeoutExpired:
logger.warning(
f"Server process {self.server_process.pid} did not terminate gracefully, killing it."
f"Server process {self.server_process.pid} did not terminate gracefully within 3 seconds, killing it."
)
self.server_process.kill()
try:
self.server_process.wait(timeout=2)
logger.info(f"Server process {self.server_process.pid} killed successfully.")
except subprocess.TimeoutExpired:
logger.error(
f"Failed to kill server process {self.server_process.pid} - it may be hung"
)
# Don't hang indefinitely
# Clean up process resources to prevent resource tracker warnings
try:

View File

@@ -5,11 +5,8 @@ LEANN is a revolutionary vector database that democratizes personal AI. Transfor
## Installation
```bash
# Default installation (HNSW backend, recommended)
# Default installation (includes both HNSW and DiskANN backends)
uv pip install leann
# With DiskANN backend (for large-scale deployments)
uv pip install leann[diskann]
```
## Quick Start
@@ -19,8 +16,8 @@ from leann import LeannBuilder, LeannSearcher, LeannChat
from pathlib import Path
INDEX_PATH = str(Path("./").resolve() / "demo.leann")
# Build an index
builder = LeannBuilder(backend_name="hnsw")
# Build an index (choose backend: "hnsw" or "diskann")
builder = LeannBuilder(backend_name="hnsw") # or "diskann" for large-scale deployments
builder.add_text("LEANN saves 97% storage compared to traditional vector databases.")
builder.add_text("Tung Tung Tung Sahur called—they need their bananacrocodile hybrid back")
builder.build_index(INDEX_PATH)

View File

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "leann"
version = "0.1.16"
version = "0.2.0"
description = "LEANN - The smallest vector index in the world. RAG Everything with LEANN!"
readme = "README.md"
requires-python = ">=3.9"
@@ -24,16 +24,15 @@ classifiers = [
"Programming Language :: Python :: 3.12",
]
# Default installation: core + hnsw
# Default installation: core + hnsw + diskann
dependencies = [
"leann-core>=0.1.0",
"leann-backend-hnsw>=0.1.0",
"leann-backend-diskann>=0.1.0",
]
[project.optional-dependencies]
diskann = [
"leann-backend-diskann>=0.1.0",
]
# All backends now included by default
[project.urls]
Repository = "https://github.com/yichuan-w/LEANN"

12
uv.lock generated
View File

@@ -1650,7 +1650,7 @@ name = "importlib-metadata"
version = "8.7.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "zipp" },
{ name = "zipp", marker = "python_full_version < '3.10'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/76/66/650a33bd90f786193e4de4b3ad86ea60b53c89b669a5c7be931fac31cdb0/importlib_metadata-8.7.0.tar.gz", hash = "sha256:d13b81ad223b890aa16c5471f2ac3056cf76c5f10f82d6f9292f0b415f389000", size = 56641 }
wheels = [
@@ -2155,7 +2155,7 @@ wheels = [
[[package]]
name = "leann-backend-diskann"
version = "0.1.15"
version = "0.2.0"
source = { editable = "packages/leann-backend-diskann" }
dependencies = [
{ name = "leann-core" },
@@ -2167,14 +2167,14 @@ dependencies = [
[package.metadata]
requires-dist = [
{ name = "leann-core", specifier = "==0.1.15" },
{ name = "leann-core", specifier = "==0.2.0" },
{ name = "numpy" },
{ name = "protobuf", specifier = ">=3.19.0" },
]
[[package]]
name = "leann-backend-hnsw"
version = "0.1.15"
version = "0.2.0"
source = { editable = "packages/leann-backend-hnsw" }
dependencies = [
{ name = "leann-core" },
@@ -2187,7 +2187,7 @@ dependencies = [
[package.metadata]
requires-dist = [
{ name = "leann-core", specifier = "==0.1.15" },
{ name = "leann-core", specifier = "==0.2.0" },
{ name = "msgpack", specifier = ">=1.0.0" },
{ name = "numpy" },
{ name = "pyzmq", specifier = ">=23.0.0" },
@@ -2195,7 +2195,7 @@ requires-dist = [
[[package]]
name = "leann-core"
version = "0.1.15"
version = "0.2.0"
source = { editable = "packages/leann-core" }
dependencies = [
{ name = "accelerate" },