Compare commits

4 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Andy Lee | dec55dc5d1 | docs: Specify core contributors and welcome new contributors | 2025-08-05 17:05:13 -07:00 |
| Andy Lee | 5cc724e661 | docs: Change to Contributors and use personal homepage | 2025-08-05 17:03:39 -07:00 |
| Andy Lee | 57a2ddec3e | docs: Use GitHub links for co-contributors and improve order | 2025-08-05 14:16:41 -07:00 |
| Andy Lee | c80720c6f2 | docs: Update co-contributors with GitHub usernames | 2025-08-05 14:14:44 -07:00 |
19 changed files with 101 additions and 1239 deletions

View File

@@ -6,7 +6,6 @@
<img src="https://img.shields.io/badge/Python-3.9%2B-blue.svg" alt="Python 3.9+">
<img src="https://img.shields.io/badge/License-MIT-green.svg" alt="MIT License">
<img src="https://img.shields.io/badge/Platform-Linux%20%7C%20macOS-lightgrey" alt="Platform">
<img src="https://img.shields.io/badge/MCP-Native%20Integration-blue?style=flat-square" alt="MCP Integration">
</p>
<h2 align="center" tabindex="-1" class="heading-element" dir="auto">
@@ -17,10 +16,7 @@ LEANN is an innovative vector database that democratizes personal AI. Transform
LEANN achieves this through *graph-based selective recomputation* with *high-degree preserving pruning*, computing embeddings on-demand instead of storing them all. [Illustration Fig →](#-architecture--how-it-works) | [Paper →](https://arxiv.org/abs/2506.08276)
**Ready to RAG Everything?** Transform your laptop into a personal AI assistant that can semantic search your **[file system](#-personal-data-manager-process-any-documents-pdf-txt-md)**, **[emails](#-your-personal-email-secretary-rag-on-apple-mail)**, **[browser history](#-time-machine-for-the-web-rag-your-entire-browser-history)**, **[chat history](#-wechat-detective-unlock-your-golden-memories)**, **[codebase](#-claude-code-integration-transform-your-development-workflow)**\* , or external knowledge bases (i.e., 60M documents) - all on your laptop, with zero cloud costs and complete privacy.
\* Claude Code only supports basic `grep`-style keyword search. **LEANN** is a drop-in **semantic search MCP service fully compatible with Claude Code**, unlocking intelligent retrieval without changing your workflow. 🔥 Check out [the easy setup →](packages/leann-mcp/README.md)
**Ready to RAG Everything?** Transform your laptop into a personal AI assistant that can search your **[file system](#-personal-data-manager-process-any-documents-pdf-txt-md)**, **[emails](#-your-personal-email-secretary-rag-on-apple-mail)**, **[browser history](#-time-machine-for-the-web-rag-your-entire-browser-history)**, **[chat history](#-wechat-detective-unlock-your-golden-memories)**, or external knowledge bases (i.e., 60M documents) - all on your laptop, with zero cloud costs and complete privacy.
@@ -30,7 +26,7 @@ LEANN achieves this through *graph-based selective recomputation* with *high-deg
<img src="assets/effects.png" alt="LEANN vs Traditional Vector DB Storage Comparison" width="70%">
</p>
> **The numbers speak for themselves:** Index 60 million text chunks in just 6GB instead of 201GB. From emails to browser history, everything fits on your laptop. [See detailed benchmarks for different applications below ↓](#storage-comparison)
> **The numbers speak for themselves:** Index 60 million Wikipedia chunks in just 6GB instead of 201GB. From emails to browser history, everything fits on your laptop. [See detailed benchmarks for different applications below ↓](#storage-comparison)
🔒 **Privacy:** Your data never leaves your laptop. No OpenAI, no cloud, no "terms of service".
@@ -170,7 +166,7 @@ ollama pull llama3.2:1b
</details>
### Flexible Configuration
### Flexible Configuration
LEANN provides flexible parameters for embedding models, search strategies, and data processing to fit your specific needs.
@@ -189,13 +185,12 @@ All RAG examples share these common parameters. **Interactive mode** is availabl
--force-rebuild # Force rebuild index even if it exists
# Embedding Parameters
--embedding-model MODEL # e.g., facebook/contriever, text-embedding-3-small, nomic-embed-text, or mlx-community/multilingual-e5-base-mlx
--embedding-mode MODE # sentence-transformers, openai, mlx, or ollama
--embedding-model MODEL # e.g., facebook/contriever, text-embedding-3-small or mlx-community/multilingual-e5-base-mlx
--embedding-mode MODE # sentence-transformers, openai, or mlx
# LLM Parameters (Text generation models)
--llm TYPE # LLM backend: openai, ollama, or hf (default: openai)
--llm-model MODEL # Model name (default: gpt-4o) e.g., gpt-4o-mini, llama3.2:1b, Qwen/Qwen2.5-1.5B-Instruct
--thinking-budget LEVEL # Thinking budget for reasoning models: low/medium/high (supported by o3, o3-mini, GPT-Oss:20b, and other reasoning models)
# Search Parameters
--top-k N # Number of results to retrieve (default: 20)
@@ -223,7 +218,7 @@ Ask questions directly about your personal PDFs, documents, and any directory co
<img src="videos/paper_clear.gif" alt="LEANN Document Search Demo" width="600">
</p>
The example below asks a question about summarizing our paper (uses default data in `data/`, which is a directory with diverse data sources: two papers, Pride and Prejudice, and a Technical report about LLM in Huawei in Chinese), and this is the **easiest example** to run here:
The example below asks a question about summarizing our paper (uses default data in `data/`, which is a directory with diverse data sources: two papers, Pride and Prejudice, and a README in Chinese) and this is the **easiest example** to run here:
```bash
source .venv/bin/activate # Don't forget to activate the virtual environment
@@ -418,26 +413,7 @@ Once the index is built, you can ask questions like:
</details>
### 🚀 Claude Code Integration: Transform Your Development Workflow!
**The future of code assistance is here.** Transform your development workflow with LEANN's native MCP integration for Claude Code. Index your entire codebase and get intelligent code assistance directly in your IDE.
**Key features:**
- 🔍 **Semantic code search** across your entire project
- 📚 **Context-aware assistance** for debugging and development
- 🚀 **Zero-config setup** with automatic language detection
```bash
# Install LEANN globally for MCP integration
uv tool install leann-core
# Setup is automatic - just start using Claude Code!
```
Try our fully agentic pipeline with auto query rewriting, semantic search planning, and more:
![LEANN MCP Integration](assets/mcp_leann.png)
**Ready to supercharge your coding?** [Complete Setup Guide →](packages/leann-mcp/README.md)
## 🖥️ Command Line Interface
@@ -451,7 +427,7 @@ source .venv/bin/activate
leann --help
```
**To make it globally available:**
**To make it globally available (recommended for daily use):**
```bash
# Install the LEANN CLI globally using uv tool
uv tool install leann
@@ -460,15 +436,13 @@ uv tool install leann
leann --help
```
> **Note**: Global installation is required for Claude Code integration. The `leann_mcp` server depends on the globally available `leann` command.
### Usage Examples
```bash
# build from a specific directory, and my_docs is the index name
leann build my-docs --docs ./your_documents
# Build an index from documents
leann build my-docs --docs ./documents
# Search your documents
leann search my-docs "machine learning concepts"

View File

@@ -75,7 +75,7 @@ class BaseRAGExample(ABC):
"--embedding-mode",
type=str,
default="sentence-transformers",
choices=["sentence-transformers", "openai", "mlx", "ollama"],
choices=["sentence-transformers", "openai", "mlx"],
help="Embedding backend mode (default: sentence-transformers)",
)
@@ -85,7 +85,7 @@ class BaseRAGExample(ABC):
"--llm",
type=str,
default="openai",
choices=["openai", "ollama", "hf", "simulated"],
choices=["openai", "ollama", "hf"],
help="LLM backend to use (default: openai)",
)
llm_group.add_argument(
@@ -100,13 +100,6 @@ class BaseRAGExample(ABC):
default="http://localhost:11434",
help="Host for Ollama API (default: http://localhost:11434)",
)
llm_group.add_argument(
"--thinking-budget",
type=str,
choices=["low", "medium", "high"],
default=None,
help="Thinking budget for reasoning models (low/medium/high). Supported by GPT-Oss:20b and other reasoning models.",
)
# Search parameters
search_group = parser.add_argument_group("Search Parameters")
@@ -235,17 +228,7 @@ class BaseRAGExample(ABC):
if not query:
continue
# Prepare LLM kwargs with thinking budget if specified
llm_kwargs = {}
if hasattr(args, "thinking_budget") and args.thinking_budget:
llm_kwargs["thinking_budget"] = args.thinking_budget
response = chat.ask(
query,
top_k=args.top_k,
complexity=args.search_complexity,
llm_kwargs=llm_kwargs,
)
response = chat.ask(query, top_k=args.top_k, complexity=args.search_complexity)
print(f"\nAssistant: {response}\n")
except KeyboardInterrupt:
@@ -264,15 +247,7 @@ class BaseRAGExample(ABC):
)
print(f"\n[Query]: \033[36m{query}\033[0m")
# Prepare LLM kwargs with thinking budget if specified
llm_kwargs = {}
if hasattr(args, "thinking_budget") and args.thinking_budget:
llm_kwargs["thinking_budget"] = args.thinking_budget
response = chat.ask(
query, top_k=args.top_k, complexity=args.search_complexity, llm_kwargs=llm_kwargs
)
response = chat.ask(query, top_k=args.top_k, complexity=args.search_complexity)
print(f"\n[Response]: \033[36m{response}\033[0m")
async def run(self):
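
The thinking-budget plumbing that differs between the two sides of this file can be summarized in a small standalone sketch. `build_llm_kwargs` is a hypothetical helper name used only for illustration (the file inlines this logic); it mirrors the check that forwards `thinking_budget` only when the argument exists and is set.

```python
import argparse


def build_llm_kwargs(args: argparse.Namespace) -> dict:
    """Collect optional LLM keyword arguments from parsed CLI arguments.

    Hypothetical helper: mirrors the inline hasattr/truthiness check,
    so a missing or unset budget yields an empty dict.
    """
    llm_kwargs = {}
    if getattr(args, "thinking_budget", None):
        llm_kwargs["thinking_budget"] = args.thinking_budget
    return llm_kwargs


parser = argparse.ArgumentParser()
parser.add_argument(
    "--thinking-budget",
    type=str,
    choices=["low", "medium", "high"],
    default=None,
)

with_budget = build_llm_kwargs(parser.parse_args(["--thinking-budget", "high"]))
without_budget = build_llm_kwargs(parser.parse_args([]))
print(with_budget)     # {'thinking_budget': 'high'}
print(without_budget)  # {}
```

On the side of the diff without the flag, the call reduces to `chat.ask(query, top_k=..., complexity=...)` with no extra keyword arguments.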

View File

Binary file not shown (before: 73 KiB).

View File

Binary file not shown (before: 224 KiB).

View File

@@ -1,123 +0,0 @@
# Thinking Budget Feature Implementation
## Overview
This document describes the implementation of the **thinking budget** feature for LEANN, which allows users to control the computational effort for reasoning models like GPT-Oss:20b.
## Feature Description
The thinking budget feature provides three levels of computational effort for reasoning models:
- **`low`**: Fast responses, basic reasoning (default for simple queries)
- **`medium`**: Balanced speed and reasoning depth
- **`high`**: Maximum reasoning effort, best for complex analytical questions
## Implementation Details
### 1. Command Line Interface
Added `--thinking-budget` parameter to both CLI and RAG examples:
```bash
# LEANN CLI
leann ask my-index --llm ollama --model gpt-oss:20b --thinking-budget high
# RAG Examples
python apps/email_rag.py --llm ollama --llm-model gpt-oss:20b --thinking-budget high
python apps/document_rag.py --llm openai --llm-model o3 --thinking-budget medium
```
### 2. LLM Backend Support
#### Ollama Backend (`packages/leann-core/src/leann/chat.py`)
```python
def ask(self, prompt: str, **kwargs) -> str:
# Handle thinking budget for reasoning models
options = kwargs.copy()
thinking_budget = kwargs.get("thinking_budget")
if thinking_budget:
options.pop("thinking_budget", None)
if thinking_budget in ["low", "medium", "high"]:
options["reasoning"] = {"effort": thinking_budget, "exclude": False}
```
**API Format**: Uses Ollama's `reasoning` parameter with `effort` and `exclude` fields.
#### OpenAI Backend (`packages/leann-core/src/leann/chat.py`)
```python
def ask(self, prompt: str, **kwargs) -> str:
# Handle thinking budget for reasoning models
thinking_budget = kwargs.get("thinking_budget")
if thinking_budget and thinking_budget in ["low", "medium", "high"]:
# Check if this is an o-series model
o_series_models = ["o3", "o3-mini", "o4-mini", "o1", "o3-pro", "o3-deep-research"]
if any(model in self.model for model in o_series_models):
params["reasoning_effort"] = thinking_budget
```
**API Format**: Uses OpenAI's `reasoning_effort` parameter for o-series models.
### 3. Parameter Propagation
The thinking budget parameter is properly propagated through the LEANN architecture:
1. **CLI** (`packages/leann-core/src/leann/cli.py`): Captures `--thinking-budget` argument
2. **Base RAG** (`apps/base_rag_example.py`): Adds parameter to argument parser
3. **LeannChat** (`packages/leann-core/src/leann/api.py`): Passes `llm_kwargs` to LLM
4. **LLM Interface**: Handles the parameter in backend-specific implementations
## Files Modified
### Core Implementation
- `packages/leann-core/src/leann/chat.py`: Added thinking budget support to OllamaChat and OpenAIChat
- `packages/leann-core/src/leann/cli.py`: Added `--thinking-budget` argument
- `apps/base_rag_example.py`: Added thinking budget parameter to RAG examples
### Documentation
- `README.md`: Added thinking budget parameter to usage examples
- `docs/configuration-guide.md`: Added detailed documentation and usage guidelines
### Examples
- `examples/thinking_budget_demo.py`: Comprehensive demo script with usage examples
## Usage Examples
### Basic Usage
```bash
# High reasoning effort for complex questions
leann ask my-index --llm ollama --model gpt-oss:20b --thinking-budget high
# Medium reasoning for balanced performance
leann ask my-index --llm openai --model gpt-4o --thinking-budget medium
# Low reasoning for fast responses
leann ask my-index --llm ollama --model gpt-oss:20b --thinking-budget low
```
### RAG Examples
```bash
# Email RAG with high reasoning
python apps/email_rag.py --llm ollama --llm-model gpt-oss:20b --thinking-budget high
# Document RAG with medium reasoning
python apps/document_rag.py --llm openai --llm-model gpt-4o --thinking-budget medium
```
## Supported Models
### Ollama Models
- **GPT-Oss:20b**: Primary target model with reasoning capabilities
- **Other reasoning models**: Any Ollama model that supports the `reasoning` parameter
### OpenAI Models
- **o3, o3-mini, o4-mini, o1**: o-series reasoning models with `reasoning_effort` parameter
- **GPT-OSS models**: Models that support reasoning capabilities
## Testing
The implementation includes comprehensive testing:
- Parameter handling verification
- Backend-specific API format validation
- CLI argument parsing tests
- Integration with existing LEANN architecture
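
As a rough illustration of the backend-specific formats this document describes, the sketch below maps one budget value to the parameters each backend would receive. `reasoning_params` and the marker tuple are hypothetical names; the parameter shapes (`reasoning` with `effort`/`exclude` for Ollama, `reasoning_effort` for OpenAI o-series) are copied from the snippets above and are not independently verified against either API.

```python
from typing import Optional

VALID_BUDGETS = {"low", "medium", "high"}
# Substrings drawn from the o-series list quoted in this document.
O_SERIES_MARKERS = ("o1", "o3", "o4-mini")


def reasoning_params(backend: str, model: str, thinking_budget: Optional[str]) -> dict:
    """Return the extra request parameters for a given thinking budget.

    Hypothetical helper: returns an empty dict when the budget is unset,
    invalid, or the model is not recognized as a reasoning model.
    """
    if thinking_budget not in VALID_BUDGETS:
        return {}
    if backend == "ollama":
        return {"reasoning": {"effort": thinking_budget, "exclude": False}}
    if backend == "openai" and any(marker in model for marker in O_SERIES_MARKERS):
        return {"reasoning_effort": thinking_budget}
    return {}


print(reasoning_params("ollama", "gpt-oss:20b", "high"))
# {'reasoning': {'effort': 'high', 'exclude': False}}
print(reasoning_params("openai", "o3-mini", "medium"))
# {'reasoning_effort': 'medium'}
print(reasoning_params("openai", "gpt-4o", "medium"))
# {}
```

Under this substring check, `gpt-4o` is not treated as an o-series model, so a budget passed with it would be dropped.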

View File

@@ -49,25 +49,14 @@ Based on our experience developing LEANN, embedding models fall into three categ
- **Cons**: Slower inference, longer index build times
- **Use when**: Quality is paramount and you have sufficient compute resources. **Highly recommended** for production use
### Quick Start: Cloud and Local Embedding Options
### Quick Start: OpenAI Embeddings (Fastest Setup)
**OpenAI Embeddings (Fastest Setup)**
For immediate testing without local model downloads:
```bash
# Set OpenAI embeddings (requires OPENAI_API_KEY)
--embedding-mode openai --embedding-model text-embedding-3-small
```
**Ollama Embeddings (Privacy-Focused)**
For local embeddings with complete privacy:
```bash
# First, pull an embedding model
ollama pull nomic-embed-text
# Use Ollama embeddings
--embedding-mode ollama --embedding-model nomic-embed-text
```
<details>
<summary><strong>Cloud vs Local Trade-offs</strong></summary>
@@ -114,15 +103,13 @@ ollama pull nomic-embed-text
**OpenAI** (`--llm openai`)
- **Pros**: Best quality, consistent performance, no local resources needed
- **Cons**: Costs money ($0.15-2.5 per million tokens), requires internet, data privacy concerns
- **Models**: `gpt-4o-mini` (fast, cheap), `gpt-4o` (best quality), `o3` (reasoning), `o3-mini` (reasoning, cheaper)
- **Thinking Budget**: Use `--thinking-budget low/medium/high` for o-series reasoning models (o3, o3-mini, o4-mini)
- **Models**: `gpt-4o-mini` (fast, cheap), `gpt-4o` (best quality), `o3-mini` (reasoning, not so expensive)
- **Note**: Our current default, but we recommend switching to Ollama for most use cases
**Ollama** (`--llm ollama`)
- **Pros**: Fully local, free, privacy-preserving, good model variety
- **Cons**: Requires local GPU/CPU resources, slower than cloud APIs, need to install extra [ollama app](https://github.com/ollama/ollama?tab=readme-ov-file#ollama) and pre-download models by `ollama pull`
- **Models**: `qwen3:0.6b` (ultra-fast), `qwen3:1.7b` (balanced), `qwen3:4b` (good quality), `qwen3:7b` (high quality), `deepseek-r1:1.5b` (reasoning)
- **Thinking Budget**: Use `--thinking-budget low/medium/high` for reasoning models like GPT-Oss:20b
**HuggingFace** (`--llm hf`)
- **Pros**: Free tier available, huge model selection, direct model loading (vs Ollama's server-based approach)
@@ -164,36 +151,6 @@ ollama pull nomic-embed-text
- LLM processing time ∝ top_k × chunk_size
- Total context = top_k × chunk_size tokens
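
A quick worked example of the context formula, assuming the default `--top-k` of 20 from the README and the `chunk_size=256` used by the CLI's `SentenceSplitter` elsewhere in this diff. It is an upper-bound estimate that ignores prompt and instruction overhead.

```python
def total_context_tokens(top_k: int, chunk_size: int) -> int:
    """Approximate retrieved context size: top_k chunks of chunk_size tokens."""
    return top_k * chunk_size


default_context = total_context_tokens(top_k=20, chunk_size=256)
small_context = total_context_tokens(top_k=5, chunk_size=256)
print(default_context)  # 5120
print(small_context)    # 1280
```

Since LLM processing time scales with the same product, cutting `top_k` from 20 to 5 shrinks the retrieved context by a factor of four.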
### Thinking Budget for Reasoning Models
**`--thinking-budget`** (reasoning effort level)
- Controls the computational effort for reasoning models
- Options: `low`, `medium`, `high`
- Guidelines:
- `low`: Fast responses, basic reasoning (default for simple queries)
- `medium`: Balanced speed and reasoning depth
- `high`: Maximum reasoning effort, best for complex analytical questions
- **Supported Models**:
- **Ollama**: `gpt-oss:20b`, `gpt-oss:120b`
- **OpenAI**: `o3`, `o3-mini`, `o4-mini`, `o1` (o-series reasoning models)
- **Note**: Models without reasoning support will show a warning and proceed without reasoning parameters
- **Example**: `--thinking-budget high` for complex analytical questions
**📖 For detailed usage examples and implementation details, check out [Thinking Budget Documentation](THINKING_BUDGET_FEATURE.md)**
**💡 Quick Examples:**
```bash
# OpenAI o-series reasoning model
python apps/document_rag.py --query "What are the main techniques LEANN explores?" \
--index-dir hnswbuild --backend hnsw \
--llm openai --llm-model o3 --thinking-budget medium
# Ollama reasoning model
python apps/document_rag.py --query "What are the main techniques LEANN explores?" \
--index-dir hnswbuild --backend hnsw \
--llm ollama --llm-model gpt-oss:20b --thinking-budget high
```
### Graph Degree (HNSW/DiskANN)
**`--graph-degree`**

View File

@@ -261,7 +261,7 @@ if __name__ == "__main__":
"--embedding-mode",
type=str,
default="sentence-transformers",
choices=["sentence-transformers", "openai", "mlx", "ollama"],
choices=["sentence-transformers", "openai", "mlx"],
help="Embedding backend mode",
)
parser.add_argument(

View File

@@ -4,8 +4,8 @@ build-backend = "scikit_build_core.build"
[project]
name = "leann-backend-diskann"
version = "0.2.6"
dependencies = ["leann-core==0.2.6", "numpy", "protobuf>=3.19.0"]
version = "0.2.1"
dependencies = ["leann-core==0.2.1", "numpy", "protobuf>=3.19.0"]
[tool.scikit-build]
# Key: simplified CMake path

View File

@@ -295,7 +295,7 @@ if __name__ == "__main__":
"--embedding-mode",
type=str,
default="sentence-transformers",
choices=["sentence-transformers", "openai", "mlx", "ollama"],
choices=["sentence-transformers", "openai", "mlx"],
help="Embedding backend mode",
)

View File

@@ -6,10 +6,10 @@ build-backend = "scikit_build_core.build"
[project]
name = "leann-backend-hnsw"
version = "0.2.6"
version = "0.2.1"
description = "Custom-built HNSW (Faiss) backend for the Leann toolkit."
dependencies = [
"leann-core==0.2.6",
"leann-core==0.2.1",
"numpy",
"pyzmq>=23.0.0",
"msgpack>=1.0.0",

View File

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "leann-core"
version = "0.2.6"
version = "0.2.1"
description = "Core API and plugin system for LEANN"
readme = "README.md"
requires-python = ">=3.9"
@@ -44,7 +44,6 @@ colab = [
[project.scripts]
leann = "leann.cli:main"
leann_mcp = "leann.mcp:main"
[tool.setuptools.packages.find]
where = ["src"]

View File

@@ -17,12 +17,12 @@ logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
def check_ollama_models(host: str) -> list[str]:
def check_ollama_models() -> list[str]:
"""Check available Ollama models and return a list"""
try:
import requests
response = requests.get(f"{host}/api/tags", timeout=5)
response = requests.get("http://localhost:11434/api/tags", timeout=5)
if response.status_code == 200:
data = response.json()
return [model["name"] for model in data.get("models", [])]
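
The two signatures in this hunk differ only in where the Ollama address comes from. A minimal sketch of the host-parameterized variant, split into pure functions so it runs without a server: `tags_url` and `model_names` are hypothetical names, the trailing-slash handling is an addition for illustration, and the sample payload is made up to match the `models`/`name` shape the code reads.

```python
def tags_url(host: str) -> str:
    """Build the model-listing URL for a given Ollama host."""
    # Stripping a trailing slash is an illustrative extra, not in the diff.
    return f"{host.rstrip('/')}/api/tags"


def model_names(payload: dict) -> list[str]:
    """Extract model names from a decoded /api/tags response body."""
    return [model["name"] for model in payload.get("models", [])]


# Made-up payload in the shape the parsing code expects.
sample = {"models": [{"name": "llama3.2:1b"}, {"name": "qwen3:4b"}]}

print(tags_url("http://localhost:11434"))  # http://localhost:11434/api/tags
print(tags_url("http://gpu-box:11434/"))   # http://gpu-box:11434/api/tags
print(model_names(sample))                 # ['llama3.2:1b', 'qwen3:4b']
print(model_names({}))                     # []
```

With the hard-coded URL, a non-default `--host` would still be validated against whatever Ollama instance runs on `localhost:11434`.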
@@ -309,12 +309,10 @@ def search_hf_models(query: str, limit: int = 10) -> list[str]:
return search_hf_models_fuzzy(query, limit)
def validate_model_and_suggest(
model_name: str, llm_type: str, host: str = "http://localhost:11434"
) -> str | None:
def validate_model_and_suggest(model_name: str, llm_type: str) -> str | None:
"""Validate model name and provide suggestions if invalid"""
if llm_type == "ollama":
available_models = check_ollama_models(host)
available_models = check_ollama_models()
if available_models and model_name not in available_models:
error_msg = f"Model '{model_name}' not found in your local Ollama installation."
@@ -471,7 +469,7 @@ class OllamaChat(LLMInterface):
requests.get(host)
# Pre-check model availability with helpful suggestions
model_error = validate_model_and_suggest(model, "ollama", host)
model_error = validate_model_and_suggest(model, "ollama")
if model_error:
raise ValueError(model_error)
@@ -491,35 +489,11 @@ class OllamaChat(LLMInterface):
import requests
full_url = f"{self.host}/api/generate"
# Handle thinking budget for reasoning models
options = kwargs.copy()
thinking_budget = kwargs.get("thinking_budget")
if thinking_budget:
# Remove thinking_budget from options as it's not a standard Ollama option
options.pop("thinking_budget", None)
# Only apply reasoning parameters to models that support it
reasoning_supported_models = [
"gpt-oss:20b",
"gpt-oss:120b",
"deepseek-r1",
"deepseek-coder",
]
if thinking_budget in ["low", "medium", "high"]:
if any(model in self.model.lower() for model in reasoning_supported_models):
options["reasoning"] = {"effort": thinking_budget, "exclude": False}
logger.info(f"Applied reasoning effort={thinking_budget} to model {self.model}")
else:
logger.warning(
f"Thinking budget '{thinking_budget}' requested but model '{self.model}' may not support reasoning parameters. Proceeding without reasoning."
)
payload = {
"model": self.model,
"prompt": prompt,
"stream": False, # Keep it simple for now
"options": options,
"options": kwargs,
}
logger.debug(f"Sending request to Ollama: {payload}")
try:
@@ -710,38 +684,11 @@ class OpenAIChat(LLMInterface):
params = {
"model": self.model,
"messages": [{"role": "user", "content": prompt}],
"max_tokens": kwargs.get("max_tokens", 1000),
"temperature": kwargs.get("temperature", 0.7),
**{k: v for k, v in kwargs.items() if k not in ["max_tokens", "temperature"]},
}
# Handle max_tokens vs max_completion_tokens based on model
max_tokens = kwargs.get("max_tokens", 1000)
if "o3" in self.model or "o4" in self.model or "o1" in self.model:
# o-series models use max_completion_tokens
params["max_completion_tokens"] = max_tokens
params["temperature"] = 1.0
else:
# Other models use max_tokens
params["max_tokens"] = max_tokens
# Handle thinking budget for reasoning models
thinking_budget = kwargs.get("thinking_budget")
if thinking_budget and thinking_budget in ["low", "medium", "high"]:
# Check if this is an o-series model (partial match for model names)
o_series_models = ["o3", "o3-mini", "o4-mini", "o1", "o3-pro", "o3-deep-research"]
if any(model in self.model for model in o_series_models):
# Use the correct OpenAI reasoning parameter format
params["reasoning_effort"] = thinking_budget
logger.info(f"Applied reasoning_effort={thinking_budget} to model {self.model}")
else:
logger.warning(
f"Thinking budget '{thinking_budget}' requested but model '{self.model}' may not support reasoning parameters. Proceeding without reasoning."
)
# Add other kwargs (excluding thinking_budget as it's handled above)
for k, v in kwargs.items():
if k not in ["max_tokens", "temperature", "thinking_budget"]:
params[k] = v
logger.info(f"Sending request to OpenAI with model {self.model}")
try:

View File

@@ -41,23 +41,13 @@ def extract_pdf_text_with_pdfplumber(file_path: str) -> str:
class LeannCLI:
def __init__(self):
# Always use project-local .leann directory (like .git)
self.indexes_dir = Path.cwd() / ".leann" / "indexes"
self.indexes_dir = Path.home() / ".leann" / "indexes"
self.indexes_dir.mkdir(parents=True, exist_ok=True)
# Default parser for documents
self.node_parser = SentenceSplitter(
chunk_size=256, chunk_overlap=128, separator=" ", paragraph_separator="\n\n"
)
# Code-optimized parser
self.code_parser = SentenceSplitter(
chunk_size=512, # Larger chunks for code context
chunk_overlap=50, # Less overlap to preserve function boundaries
separator="\n", # Split by lines for code
paragraph_separator="\n\n", # Preserve logical code blocks
)
def get_index_path(self, index_name: str) -> str:
index_dir = self.indexes_dir / index_name
return str(index_dir / "documents.leann")
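
To make the effect of `Path.cwd()` versus `Path.home()` concrete, here is a small sketch of the path layout. `index_path` is a hypothetical free function standing in for `get_index_path`, and the directories are invented; `PurePosixPath` is used only so the printed output is the same on every platform.

```python
from pathlib import PurePosixPath


def index_path(base: PurePosixPath, index_name: str) -> str:
    """Resolve the index file under <base>/.leann/indexes/<index_name>/."""
    return str(base / ".leann" / "indexes" / index_name / "documents.leann")


# Invented example directories.
home_based = index_path(PurePosixPath("/home/alice"), "my-docs")
project_based = index_path(PurePosixPath("/home/alice/projects/app"), "my-docs")
print(home_based)     # /home/alice/.leann/indexes/my-docs/documents.leann
print(project_based)  # /home/alice/projects/app/.leann/indexes/my-docs/documents.leann
```

A home-based layout gives one shared index namespace per user, while a working-directory layout scopes indexes to each project, much like `.git`.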
@@ -74,11 +64,10 @@ class LeannCLI:
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
leann build my-docs --docs ./documents # Build index named my-docs
leann build my-ppts --docs ./ --file-types .pptx,.pdf # Index only PowerPoint and PDF files
leann search my-docs "query" # Search in my-docs index
leann ask my-docs "question" # Ask my-docs index
leann list # List all stored indexes
leann build my-docs --docs ./documents # Build index named my-docs
leann search my-docs "query" # Search in my-docs index
leann ask my-docs "question" # Ask my-docs index
leann list # List all stored indexes
""",
)
@@ -86,34 +75,18 @@ Examples:
# Build command
build_parser = subparsers.add_parser("build", help="Build document index")
build_parser.add_argument(
"index_name", nargs="?", help="Index name (default: current directory name)"
)
build_parser.add_argument(
"--docs", type=str, default=".", help="Documents directory (default: current directory)"
)
build_parser.add_argument("index_name", help="Index name")
build_parser.add_argument("--docs", type=str, required=True, help="Documents directory")
build_parser.add_argument(
"--backend", type=str, default="hnsw", choices=["hnsw", "diskann"]
)
build_parser.add_argument("--embedding-model", type=str, default="facebook/contriever")
build_parser.add_argument(
"--embedding-mode",
type=str,
default="sentence-transformers",
choices=["sentence-transformers", "openai", "mlx", "ollama"],
help="Embedding backend mode (default: sentence-transformers)",
)
build_parser.add_argument("--force", "-f", action="store_true", help="Force rebuild")
build_parser.add_argument("--graph-degree", type=int, default=32)
build_parser.add_argument("--complexity", type=int, default=64)
build_parser.add_argument("--num-threads", type=int, default=1)
build_parser.add_argument("--compact", action="store_true", default=True)
build_parser.add_argument("--recompute", action="store_true", default=True)
build_parser.add_argument(
"--file-types",
type=str,
help="Comma-separated list of file extensions to include (e.g., '.txt,.pdf,.pptx'). If not specified, uses default supported types.",
)
# Search command
search_parser = subparsers.add_parser("search", help="Search documents")
@@ -123,12 +96,7 @@ Examples:
search_parser.add_argument("--complexity", type=int, default=64)
search_parser.add_argument("--beam-width", type=int, default=1)
search_parser.add_argument("--prune-ratio", type=float, default=0.0)
search_parser.add_argument(
"--recompute-embeddings",
action="store_true",
default=True,
help="Recompute embeddings (default: True)",
)
search_parser.add_argument("--recompute-embeddings", action="store_true")
search_parser.add_argument(
"--pruning-strategy",
choices=["global", "local", "proportional"],
@@ -151,376 +119,94 @@ Examples:
ask_parser.add_argument("--complexity", type=int, default=32)
ask_parser.add_argument("--beam-width", type=int, default=1)
ask_parser.add_argument("--prune-ratio", type=float, default=0.0)
ask_parser.add_argument(
"--recompute-embeddings",
action="store_true",
default=True,
help="Recompute embeddings (default: True)",
)
ask_parser.add_argument("--recompute-embeddings", action="store_true")
ask_parser.add_argument(
"--pruning-strategy",
choices=["global", "local", "proportional"],
default="global",
)
ask_parser.add_argument(
"--thinking-budget",
type=str,
choices=["low", "medium", "high"],
default=None,
help="Thinking budget for reasoning models (low/medium/high). Supported by GPT-Oss:20b and other reasoning models.",
)
# List command
subparsers.add_parser("list", help="List all indexes")
return parser
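
The two `--recompute-embeddings` definitions in this hunk behave differently in a way that is easy to miss: with `action="store_true"` and `default=True` the value is always `True`, whereas plain `store_true` defaults to `False`. The sketch below demonstrates this with standard `argparse`; the `parse` helper is a hypothetical wrapper for illustration.

```python
import argparse


def parse(default_on: bool, argv: list[str]) -> bool:
    """Parse argv with one of the two flag definitions and return its value."""
    parser = argparse.ArgumentParser()
    extra = {"default": True} if default_on else {}
    parser.add_argument("--recompute-embeddings", action="store_true", **extra)
    return parser.parse_args(argv).recompute_embeddings


print(parse(True, []))                          # True
print(parse(True, ["--recompute-embeddings"]))  # True
print(parse(False, []))                         # False
print(parse(False, ["--recompute-embeddings"])) # True
```

If an on-by-default flag that users can switch off is wanted, `argparse.BooleanOptionalAction` (Python 3.9+) is one option, since it also generates a `--no-recompute-embeddings` form.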
def register_project_dir(self):
"""Register current project directory in global registry"""
global_registry = Path.home() / ".leann" / "projects.json"
global_registry.parent.mkdir(exist_ok=True)
current_dir = str(Path.cwd())
# Load existing registry
projects = []
if global_registry.exists():
try:
import json
with open(global_registry) as f:
projects = json.load(f)
except Exception:
projects = []
# Add current directory if not already present
if current_dir not in projects:
projects.append(current_dir)
# Save registry
import json
with open(global_registry, "w") as f:
json.dump(projects, f, indent=2)
def _read_gitignore_patterns(self, docs_dir: str) -> list[str]:
"""Read .gitignore file and return patterns for exclusion."""
gitignore_path = Path(docs_dir) / ".gitignore"
patterns = []
# Add some essential patterns that should always be excluded
essential_patterns = [
".git",
".DS_Store",
]
patterns.extend(essential_patterns)
if gitignore_path.exists():
try:
with open(gitignore_path, encoding="utf-8") as f:
for line in f:
line = line.strip()
# Skip empty lines and comments
if line and not line.startswith("#"):
# Remove leading slash if present (make it relative)
if line.startswith("/"):
line = line[1:]
patterns.append(line)
print(
f"📋 Loaded {len(patterns) - len(essential_patterns)} patterns from .gitignore"
)
except Exception as e:
print(f"Warning: Could not read .gitignore: {e}")
else:
print("📋 No .gitignore found, using minimal exclusion patterns")
return patterns
def _should_exclude_file(self, relative_path: Path, exclude_patterns: list[str]) -> bool:
"""Check if a file should be excluded based on gitignore-style patterns."""
path_str = str(relative_path)
for pattern in exclude_patterns:
# Simple pattern matching (could be enhanced with full gitignore syntax)
if pattern.endswith("*"):
# Wildcard pattern
prefix = pattern[:-1]
if path_str.startswith(prefix):
return True
elif "*" in pattern:
# Contains wildcard - simple glob-like matching
import fnmatch
if fnmatch.fnmatch(path_str, pattern):
return True
else:
# Exact match or directory match
if path_str == pattern or path_str.startswith(pattern + "/"):
return True
return False
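
The matching rules in `_should_exclude_file` can be exercised in isolation. The sketch below restates the same three branches as a free function (`should_exclude` is a hypothetical name) and runs it on made-up paths. Note that a pattern with a trailing slash such as `dist/` never matches under these rules, which fits the "could be enhanced" comment in the method.

```python
import fnmatch
from pathlib import PurePosixPath


def should_exclude(relative_path: PurePosixPath, patterns: list[str]) -> bool:
    """Apply the simplified gitignore-style rules to one relative path."""
    path_str = str(relative_path)
    for pattern in patterns:
        if pattern.endswith("*"):
            # Trailing wildcard: plain prefix match.
            if path_str.startswith(pattern[:-1]):
                return True
        elif "*" in pattern:
            # Wildcard elsewhere: glob-style match on the whole path.
            if fnmatch.fnmatch(path_str, pattern):
                return True
        elif path_str == pattern or path_str.startswith(pattern + "/"):
            # Exact file match or directory prefix match.
            return True
    return False


patterns = [".git", "build*", "*.pyc", "dist/"]
print(should_exclude(PurePosixPath(".git/config"), patterns))      # True
print(should_exclude(PurePosixPath("src/cache.pyc"), patterns))    # True
print(should_exclude(PurePosixPath("builder/main.py"), patterns))  # True
print(should_exclude(PurePosixPath("dist/app.js"), patterns))      # False
print(should_exclude(PurePosixPath("src/main.py"), patterns))      # False
```

The `builder/main.py` result shows the prefix rule is broader than real gitignore semantics, where `build*` would match a path component rather than any path starting with `build`.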
def list_indexes(self):
print("Stored LEANN indexes:")
# Get all project directories with .leann
global_registry = Path.home() / ".leann" / "projects.json"
all_projects = []
if global_registry.exists():
try:
import json
with open(global_registry) as f:
all_projects = json.load(f)
except Exception:
pass
# Filter to only existing directories with .leann
valid_projects = []
for project_dir in all_projects:
project_path = Path(project_dir)
if project_path.exists() and (project_path / ".leann" / "indexes").exists():
valid_projects.append(project_path)
# Add current project if it has .leann but not in registry
current_path = Path.cwd()
if (current_path / ".leann" / "indexes").exists() and current_path not in valid_projects:
valid_projects.append(current_path)
if not valid_projects:
if not self.indexes_dir.exists():
print("No indexes found. Use 'leann build <name> --docs <dir>' to create one.")
return
total_indexes = 0
current_dir = Path.cwd()
index_dirs = [d for d in self.indexes_dir.iterdir() if d.is_dir()]
for project_path in valid_projects:
indexes_dir = project_path / ".leann" / "indexes"
if not indexes_dir.exists():
continue
if not index_dirs:
print("No indexes found. Use 'leann build <name> --docs <dir>' to create one.")
return
index_dirs = [d for d in indexes_dir.iterdir() if d.is_dir()]
if not index_dirs:
continue
print(f"Found {len(index_dirs)} indexes:")
for i, index_dir in enumerate(index_dirs, 1):
index_name = index_dir.name
status = "" if self.index_exists(index_name) else ""
# Show project header
if project_path == current_dir:
print(f"\n📁 Current project ({project_path}):")
else:
print(f"\n📂 {project_path}:")
print(f" {i}. {index_name} [{status}]")
if self.index_exists(index_name):
index_dir / "documents.leann.meta.json"
size_mb = sum(f.stat().st_size for f in index_dir.iterdir() if f.is_file()) / (
1024 * 1024
)
print(f" Size: {size_mb:.1f} MB")
for index_dir in index_dirs:
total_indexes += 1
index_name = index_dir.name
meta_file = index_dir / "documents.leann.meta.json"
status = "" if meta_file.exists() else ""
if index_dirs:
example_name = index_dirs[0].name
print("\nUsage:")
print(f' leann search {example_name} "your query"')
print(f" leann ask {example_name} --interactive")
print(f" {total_indexes}. {index_name} [{status}]")
if status == "":
size_mb = sum(f.stat().st_size for f in index_dir.iterdir() if f.is_file()) / (
1024 * 1024
)
print(f" Size: {size_mb:.1f} MB")
if total_indexes > 0:
print(f"\nTotal: {total_indexes} indexes across {len(valid_projects)} projects")
print("\nUsage (current project only):")
# Show example from current project
current_indexes_dir = current_dir / ".leann" / "indexes"
if current_indexes_dir.exists():
current_index_dirs = [d for d in current_indexes_dir.iterdir() if d.is_dir()]
if current_index_dirs:
example_name = current_index_dirs[0].name
print(f' leann search {example_name} "your query"')
print(f" leann ask {example_name} --interactive")
def load_documents(self, docs_dir: str, custom_file_types: str | None = None):
def load_documents(self, docs_dir: str):
print(f"Loading documents from {docs_dir}...")
if custom_file_types:
print(f"Using custom file types: {custom_file_types}")
# Read .gitignore patterns first
exclude_patterns = self._read_gitignore_patterns(docs_dir)
# Try to use better PDF parsers first, but only if PDFs are requested
# Try to use better PDF parsers first
documents = []
docs_path = Path(docs_dir)
# Check if we should process PDFs
should_process_pdfs = custom_file_types is None or ".pdf" in custom_file_types
for file_path in docs_path.rglob("*.pdf"):
print(f"Processing PDF: {file_path}")
if should_process_pdfs:
for file_path in docs_path.rglob("*.pdf"):
# Check if file matches any exclude pattern
relative_path = file_path.relative_to(docs_path)
if self._should_exclude_file(relative_path, exclude_patterns):
continue
# Try PyMuPDF first (best quality)
text = extract_pdf_text_with_pymupdf(str(file_path))
if text is None:
# Try pdfplumber
text = extract_pdf_text_with_pdfplumber(str(file_path))
print(f"Processing PDF: {file_path}")
if text:
# Create a simple document structure
from llama_index.core import Document
# Try PyMuPDF first (best quality)
text = extract_pdf_text_with_pymupdf(str(file_path))
if text is None:
# Try pdfplumber
text = extract_pdf_text_with_pdfplumber(str(file_path))
if text:
# Create a simple document structure
from llama_index.core import Document
doc = Document(text=text, metadata={"source": str(file_path)})
documents.append(doc)
else:
# Fallback to default reader
print(f"Using default reader for {file_path}")
try:
default_docs = SimpleDirectoryReader(
str(file_path.parent),
filename_as_id=True,
required_exts=[file_path.suffix],
).load_data()
documents.extend(default_docs)
except Exception as e:
print(f"Warning: Could not process {file_path}: {e}")
doc = Document(text=text, metadata={"source": str(file_path)})
documents.append(doc)
else:
# Fallback to default reader
print(f"Using default reader for {file_path}")
default_docs = SimpleDirectoryReader(
str(file_path.parent),
filename_as_id=True,
required_exts=[file_path.suffix],
).load_data()
documents.extend(default_docs)
# Load other file types with default reader
if custom_file_types:
# Parse custom file types from comma-separated string
code_extensions = [ext.strip() for ext in custom_file_types.split(",") if ext.strip()]
# Ensure extensions start with a dot
code_extensions = [ext if ext.startswith(".") else f".{ext}" for ext in code_extensions]
else:
# Use default supported file types
code_extensions = [
# Original document types
".txt",
".md",
".docx",
".pptx",
# Code files for Claude Code integration
".py",
".js",
".ts",
".jsx",
".tsx",
".java",
".cpp",
".c",
".h",
".hpp",
".cs",
".go",
".rs",
".rb",
".php",
".swift",
".kt",
".scala",
".r",
".sql",
".sh",
".bash",
".zsh",
".fish",
".ps1",
".bat",
# Config and markup files
".json",
".yaml",
".yml",
".xml",
".toml",
".ini",
".cfg",
".conf",
".html",
".css",
".scss",
".less",
".vue",
".svelte",
# Data science
".ipynb",
".R",
".py",
".jl",
]
# Try to load other file types, but don't fail if none are found
try:
other_docs = SimpleDirectoryReader(
docs_dir,
recursive=True,
encoding="utf-8",
required_exts=code_extensions,
exclude=exclude_patterns,
).load_data(show_progress=True)
documents.extend(other_docs)
except ValueError as e:
if "No files found" in str(e):
print("No additional files found for other supported types.")
else:
raise e
other_docs = SimpleDirectoryReader(
docs_dir,
recursive=True,
encoding="utf-8",
required_exts=[".txt", ".md", ".docx"],
).load_data(show_progress=True)
documents.extend(other_docs)
all_texts = []
# Define code file extensions for intelligent chunking
code_file_exts = {
".py",
".js",
".ts",
".jsx",
".tsx",
".java",
".cpp",
".c",
".h",
".hpp",
".cs",
".go",
".rs",
".rb",
".php",
".swift",
".kt",
".scala",
".r",
".sql",
".sh",
".bash",
".zsh",
".fish",
".ps1",
".bat",
".json",
".yaml",
".yml",
".xml",
".toml",
".ini",
".cfg",
".conf",
".html",
".css",
".scss",
".less",
".vue",
".svelte",
".ipynb",
".R",
".jl",
}
for doc in documents:
# Check if this is a code file based on source path
source_path = doc.metadata.get("source", "")
is_code_file = any(source_path.endswith(ext) for ext in code_file_exts)
# Use appropriate parser based on file type
parser = self.code_parser if is_code_file else self.node_parser
nodes = parser.get_nodes_from_documents([doc])
nodes = self.node_parser.get_nodes_from_documents([doc])
for node in nodes:
all_texts.append(node.get_content())
@@ -529,23 +215,15 @@ Examples:
async def build_index(self, args):
docs_dir = args.docs
# Use current directory name if index_name not provided
if args.index_name:
index_name = args.index_name
else:
index_name = Path.cwd().name
print(f"Using current directory name as index: '{index_name}'")
index_name = args.index_name
index_dir = self.indexes_dir / index_name
index_path = self.get_index_path(index_name)
print(f"📂 Indexing: {Path(docs_dir).resolve()}")
if index_dir.exists() and not args.force:
print(f"Index '{index_name}' already exists. Use --force to rebuild.")
return
all_texts = self.load_documents(docs_dir, args.file_types)
all_texts = self.load_documents(docs_dir)
if not all_texts:
print("No documents found")
return
@@ -557,7 +235,6 @@ Examples:
builder = LeannBuilder(
backend_name=args.backend,
embedding_model=args.embedding_model,
embedding_mode=args.embedding_mode,
graph_degree=args.graph_degree,
complexity=args.complexity,
is_compact=args.compact,
@@ -571,9 +248,6 @@ Examples:
builder.build_index(index_path)
print(f"Index built at {index_path}")
# Register this project directory in global registry
self.register_project_dir()
async def search_documents(self, args):
index_name = args.index_name
query = args.query
@@ -634,11 +308,6 @@ Examples:
if not user_input:
continue
# Prepare LLM kwargs with thinking budget if specified
llm_kwargs = {}
if args.thinking_budget:
llm_kwargs["thinking_budget"] = args.thinking_budget
response = chat.ask(
user_input,
top_k=args.top_k,
@@ -647,17 +316,11 @@ Examples:
prune_ratio=args.prune_ratio,
recompute_embeddings=args.recompute_embeddings,
pruning_strategy=args.pruning_strategy,
llm_kwargs=llm_kwargs,
)
print(f"LEANN: {response}")
else:
query = input("Enter your question: ").strip()
if query:
# Prepare LLM kwargs with thinking budget if specified
llm_kwargs = {}
if args.thinking_budget:
llm_kwargs["thinking_budget"] = args.thinking_budget
response = chat.ask(
query,
top_k=args.top_k,
@@ -666,7 +329,6 @@ Examples:
prune_ratio=args.prune_ratio,
recompute_embeddings=args.recompute_embeddings,
pruning_strategy=args.pruning_strategy,
llm_kwargs=llm_kwargs,
)
print(f"LEANN: {response}")
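
For readers following the `custom_file_types` handling in `load_documents`, here is a minimal standalone sketch of the normalization step: split the comma-separated string, drop empty entries, and make sure every extension starts with a dot. The function name `normalize_extensions` is hypothetical and does not appear in the diff.

```python
def normalize_extensions(file_types):
    """Turn a comma-separated string such as "py, .md" into [".py", ".md"]."""
    if not file_types:
        return []
    # Drop empty entries produced by stray commas or surrounding whitespace.
    parts = [ext.strip() for ext in file_types.split(",") if ext.strip()]
    # Prefix a dot where the caller left it out.
    return [ext if ext.startswith(".") else f".{ext}" for ext in parts]


result = normalize_extensions("py, .md,,  ts ")
print(result)
```

The resulting list has the same shape as the one passed as `required_exts` in the diff, so a caller can write `py,md` or `.py,.md` interchangeably.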


@@ -6,7 +6,6 @@ Preserves all optimization parameters to ensure performance
import logging
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any
import numpy as np
@@ -36,7 +35,7 @@ def compute_embeddings(
Args:
texts: List of texts to compute embeddings for
model_name: Model name
mode: Computation mode ('sentence-transformers', 'openai', 'mlx', 'ollama')
mode: Computation mode ('sentence-transformers', 'openai', 'mlx')
is_build: Whether this is a build operation (shows progress bar)
batch_size: Batch size for processing
adaptive_optimization: Whether to use adaptive optimization based on batch size
@@ -56,8 +55,6 @@ def compute_embeddings(
return compute_embeddings_openai(texts, model_name)
elif mode == "mlx":
return compute_embeddings_mlx(texts, model_name)
elif mode == "ollama":
return compute_embeddings_ollama(texts, model_name, is_build=is_build)
else:
raise ValueError(f"Unsupported embedding mode: {mode}")
@@ -368,262 +365,3 @@ def compute_embeddings_mlx(chunks: list[str], model_name: str, batch_size: int =
# Stack numpy arrays
return np.stack(all_embeddings)
def compute_embeddings_ollama(
texts: list[str], model_name: str, is_build: bool = False, host: str = "http://localhost:11434"
) -> np.ndarray:
"""
Compute embeddings using Ollama API.
Args:
texts: List of texts to compute embeddings for
model_name: Ollama model name (e.g., "nomic-embed-text", "mxbai-embed-large")
is_build: Whether this is a build operation (shows progress bar)
host: Ollama host URL (default: http://localhost:11434)
Returns:
Normalized embeddings array, shape: (len(texts), embedding_dim)
"""
try:
import requests
except ImportError:
raise ImportError(
"The 'requests' library is required for Ollama embeddings. Install with: uv pip install requests"
)
if not texts:
raise ValueError("Cannot compute embeddings for empty text list")
logger.info(
f"Computing embeddings for {len(texts)} texts using Ollama API, model: '{model_name}'"
)
# Check if Ollama is running
try:
response = requests.get(f"{host}/api/version", timeout=5)
response.raise_for_status()
except requests.exceptions.ConnectionError:
error_msg = (
f"❌ Could not connect to Ollama at {host}.\n\n"
"Please ensure Ollama is running:\n"
" • macOS/Linux: ollama serve\n"
" • Windows: Make sure Ollama is running in the system tray\n\n"
"Installation: https://ollama.com/download"
)
raise RuntimeError(error_msg)
except Exception as e:
raise RuntimeError(f"Unexpected error connecting to Ollama: {e}")
# Check if model exists and provide helpful suggestions
try:
response = requests.get(f"{host}/api/tags", timeout=5)
response.raise_for_status()
models = response.json()
model_names = [model["name"] for model in models.get("models", [])]
# Filter for embedding models (models that support embeddings)
embedding_models = []
suggested_embedding_models = [
"nomic-embed-text",
"mxbai-embed-large",
"bge-m3",
"all-minilm",
"snowflake-arctic-embed",
]
for model in model_names:
# Check if it's an embedding model (by name patterns or known models)
base_name = model.split(":")[0]
if any(emb in base_name for emb in ["embed", "bge", "minilm", "e5"]):
embedding_models.append(model)
# Check if model exists (handle versioned names)
model_found = any(
model_name == name.split(":")[0] or model_name == name for name in model_names
)
if not model_found:
error_msg = f"❌ Model '{model_name}' not found in local Ollama.\n\n"
# Suggest pulling the model
error_msg += "📦 To install this embedding model:\n"
error_msg += f" ollama pull {model_name}\n\n"
# Show available embedding models
if embedding_models:
error_msg += "✅ Available embedding models:\n"
for model in embedding_models[:5]:
error_msg += f"{model}\n"
if len(embedding_models) > 5:
error_msg += f" ... and {len(embedding_models) - 5} more\n"
else:
error_msg += "💡 Popular embedding models to install:\n"
for model in suggested_embedding_models[:3]:
error_msg += f" • ollama pull {model}\n"
error_msg += "\n📚 Browse more: https://ollama.com/library"
raise ValueError(error_msg)
# Verify the model supports embeddings by testing it
try:
test_response = requests.post(
f"{host}/api/embeddings", json={"model": model_name, "prompt": "test"}, timeout=10
)
if test_response.status_code != 200:
error_msg = (
f"⚠️ Model '{model_name}' exists but may not support embeddings.\n\n"
f"Please use an embedding model like:\n"
)
for model in suggested_embedding_models[:3]:
error_msg += f"{model}\n"
raise ValueError(error_msg)
except requests.exceptions.RequestException:
# If test fails, continue anyway - model might still work
pass
except requests.exceptions.RequestException as e:
logger.warning(f"Could not verify model existence: {e}")
# Process embeddings with optimized concurrent processing
import requests
def get_single_embedding(text_idx_tuple):
"""Helper function to get embedding for a single text."""
text, idx = text_idx_tuple
max_retries = 3
retry_count = 0
# Truncate very long texts to avoid API issues
truncated_text = text[:8000] if len(text) > 8000 else text
while retry_count < max_retries:
try:
response = requests.post(
f"{host}/api/embeddings",
json={"model": model_name, "prompt": truncated_text},
timeout=30,
)
response.raise_for_status()
result = response.json()
embedding = result.get("embedding")
if embedding is None:
raise ValueError(f"No embedding returned for text {idx}")
return idx, embedding
except requests.exceptions.Timeout:
retry_count += 1
if retry_count >= max_retries:
logger.warning(f"Timeout for text {idx} after {max_retries} retries")
return idx, None
except Exception as e:
if retry_count >= max_retries - 1:
logger.error(f"Failed to get embedding for text {idx}: {e}")
return idx, None
retry_count += 1
return idx, None
# Determine if we should use concurrent processing
use_concurrent = (
len(texts) > 5 and not is_build
) # Don't use concurrent in build mode to avoid overwhelming
max_workers = min(4, len(texts)) # Limit concurrent requests to avoid overwhelming Ollama
all_embeddings = [None] * len(texts) # Pre-allocate list to maintain order
failed_indices = []
if use_concurrent:
logger.info(
f"Using concurrent processing with {max_workers} workers for {len(texts)} texts"
)
with ThreadPoolExecutor(max_workers=max_workers) as executor:
# Submit all tasks
future_to_idx = {
executor.submit(get_single_embedding, (text, idx)): idx
for idx, text in enumerate(texts)
}
# Add progress bar for concurrent processing
try:
if is_build or len(texts) > 10:
from tqdm import tqdm
futures_iterator = tqdm(
as_completed(future_to_idx),
total=len(texts),
desc="Computing Ollama embeddings",
)
else:
futures_iterator = as_completed(future_to_idx)
except ImportError:
futures_iterator = as_completed(future_to_idx)
# Collect results as they complete
for future in futures_iterator:
try:
idx, embedding = future.result()
if embedding is not None:
all_embeddings[idx] = embedding
else:
failed_indices.append(idx)
except Exception as e:
idx = future_to_idx[future]
logger.error(f"Exception for text {idx}: {e}")
failed_indices.append(idx)
else:
# Sequential processing with progress bar
show_progress = is_build or len(texts) > 10
try:
if show_progress:
from tqdm import tqdm
iterator = tqdm(
enumerate(texts), total=len(texts), desc="Computing Ollama embeddings"
)
else:
iterator = enumerate(texts)
except ImportError:
iterator = enumerate(texts)
for idx, text in iterator:
result_idx, embedding = get_single_embedding((text, idx))
if embedding is not None:
all_embeddings[idx] = embedding
else:
failed_indices.append(idx)
# Handle failed embeddings
if failed_indices:
if len(failed_indices) == len(texts):
raise RuntimeError("Failed to compute any embeddings")
logger.warning(f"Failed to compute embeddings for {len(failed_indices)}/{len(texts)} texts")
# Use zero embeddings as fallback for failed ones
valid_embedding = next((e for e in all_embeddings if e is not None), None)
if valid_embedding:
embedding_dim = len(valid_embedding)
for idx in failed_indices:
all_embeddings[idx] = [0.0] * embedding_dim
# Remove None values and convert to numpy array
all_embeddings = [e for e in all_embeddings if e is not None]
# Convert to numpy array and normalize
embeddings = np.array(all_embeddings, dtype=np.float32)
# Normalize embeddings (L2 normalization)
norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
embeddings = embeddings / (norms + 1e-8) # Add small epsilon to avoid division by zero
logger.info(f"Generated {len(embeddings)} embeddings, dimension: {embeddings.shape[1]}")
return embeddings
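
The tail of `compute_embeddings_ollama` fills failed slots with zero vectors and then L2-normalizes every row. The sketch below reproduces that logic with the standard library only, so it runs without NumPy (the diff itself uses `np.linalg.norm`). The helper name `fill_and_normalize` is an assumption made for illustration.

```python
import math


def fill_and_normalize(embeddings, eps=1e-8):
    """Replace failed (None) rows with zero vectors, then L2-normalize each row."""
    valid = next((e for e in embeddings if e is not None), None)
    if valid is None:
        raise RuntimeError("Failed to compute any embeddings")
    dim = len(valid)
    # Keep positions stable so row i still corresponds to text i.
    filled = [e if e is not None else [0.0] * dim for e in embeddings]
    normalized = []
    for row in filled:
        norm = math.sqrt(sum(x * x for x in row))
        # The epsilon keeps the all-zero fallback rows from dividing by zero.
        normalized.append([x / (norm + eps) for x in row])
    return normalized


rows = fill_and_normalize([[3.0, 4.0], None])
print(rows)
```

A zero row stays a zero row after normalization, so a failed text has a zero dot product with every query vector.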


@@ -1,176 +0,0 @@
#!/usr/bin/env python3
import json
import subprocess
import sys
def handle_request(request):
if request.get("method") == "initialize":
return {
"jsonrpc": "2.0",
"id": request.get("id"),
"result": {
"capabilities": {"tools": {}},
"protocolVersion": "2024-11-05",
"serverInfo": {"name": "leann-mcp", "version": "1.0.0"},
},
}
elif request.get("method") == "tools/list":
return {
"jsonrpc": "2.0",
"id": request.get("id"),
"result": {
"tools": [
{
"name": "leann_search",
"description": """🔍 Search code using natural language - like having a coding assistant who knows your entire codebase!
🎯 **Perfect for**:
- "How does authentication work?" → finds auth-related code
- "Error handling patterns" → locates try-catch blocks and error logic
- "Database connection setup" → finds DB initialization code
- "API endpoint definitions" → locates route handlers
- "Configuration management" → finds config files and usage
💡 **Pro tip**: Use this before making any changes to understand existing patterns and conventions.""",
"inputSchema": {
"type": "object",
"properties": {
"index_name": {
"type": "string",
"description": "Name of the LEANN index to search. Use 'leann_list' first to see available indexes.",
},
"query": {
"type": "string",
"description": "Search query - can be natural language (e.g., 'how to handle errors') or technical terms (e.g., 'async function definition')",
},
"top_k": {
"type": "integer",
"default": 5,
"minimum": 1,
"maximum": 20,
"description": "Number of search results to return. Use 5-10 for focused results, 15-20 for comprehensive exploration.",
},
"complexity": {
"type": "integer",
"default": 32,
"minimum": 16,
"maximum": 128,
"description": "Search complexity level. Use 16-32 for fast searches (recommended), 64+ for higher precision when needed.",
},
},
"required": ["index_name", "query"],
},
},
{
"name": "leann_status",
"description": "📊 Check the health and stats of your code indexes - like a medical checkup for your codebase knowledge!",
"inputSchema": {
"type": "object",
"properties": {
"index_name": {
"type": "string",
"description": "Optional: Name of specific index to check. If not provided, shows status of all indexes.",
}
},
},
},
{
"name": "leann_list",
"description": "📋 Show all your indexed codebases - your personal code library! Use this to see what's available for search.",
"inputSchema": {"type": "object", "properties": {}},
},
]
},
}
elif request.get("method") == "tools/call":
tool_name = request["params"]["name"]
args = request["params"].get("arguments", {})
try:
if tool_name == "leann_search":
# Validate required parameters
if not args.get("index_name") or not args.get("query"):
return {
"jsonrpc": "2.0",
"id": request.get("id"),
"result": {
"content": [
{
"type": "text",
"text": "Error: Both index_name and query are required",
}
]
},
}
# Build simplified command
cmd = [
"leann",
"search",
args["index_name"],
args["query"],
f"--top-k={args.get('top_k', 5)}",
f"--complexity={args.get('complexity', 32)}",
]
result = subprocess.run(cmd, capture_output=True, text=True)
elif tool_name == "leann_status":
if args.get("index_name"):
# Check specific index status - for now, we'll use leann list and filter
result = subprocess.run(["leann", "list"], capture_output=True, text=True)
# We could enhance this to show more detailed status per index
else:
# Show all indexes status
result = subprocess.run(["leann", "list"], capture_output=True, text=True)
elif tool_name == "leann_list":
result = subprocess.run(["leann", "list"], capture_output=True, text=True)
return {
"jsonrpc": "2.0",
"id": request.get("id"),
"result": {
"content": [
{
"type": "text",
"text": result.stdout
if result.returncode == 0
else f"Error: {result.stderr}",
}
]
},
}
except Exception as e:
return {
"jsonrpc": "2.0",
"id": request.get("id"),
"error": {"code": -1, "message": str(e)},
}
def main():
for line in sys.stdin:
try:
request = json.loads(line.strip())
response = handle_request(request)
if response:
print(json.dumps(response))
sys.stdout.flush()
except Exception as e:
error_response = {
"jsonrpc": "2.0",
"id": None,
"error": {"code": -1, "message": str(e)},
}
print(json.dumps(error_response))
sys.stdout.flush()
if __name__ == "__main__":
main()
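
The server above speaks line-delimited JSON-RPC: one request per line on stdin, one response per line on stdout. The following sketch mimics that loop against in-memory streams so it can be run without Claude Code. The simplified `handle_request` and the `serve` helper are hypothetical stand-ins, not the code from the diff.

```python
import io
import json


def handle_request(request):
    # Hypothetical stand-in for the real handler: it only answers "initialize".
    if request.get("method") == "initialize":
        return {
            "jsonrpc": "2.0",
            "id": request.get("id"),
            "result": {"serverInfo": {"name": "demo", "version": "0.0.0"}},
        }
    return None


def serve(stream_in, stream_out):
    """Read one JSON request per line and write one JSON response per line."""
    for line in stream_in:
        try:
            response = handle_request(json.loads(line.strip()))
        except Exception as e:
            # Malformed input still produces a well-formed JSON-RPC error object.
            response = {"jsonrpc": "2.0", "id": None, "error": {"code": -1, "message": str(e)}}
        if response:
            stream_out.write(json.dumps(response) + "\n")


out = io.StringIO()
serve(io.StringIO('{"jsonrpc": "2.0", "id": 1, "method": "initialize"}\nnot json\n'), out)
replies = [json.loads(reply) for reply in out.getvalue().splitlines()]
print(replies)
```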


@@ -1,91 +0,0 @@
# 🔥 LEANN Claude Code Integration
Transform your development workflow with intelligent code assistance using LEANN's semantic search directly in Claude Code.
## Prerequisites
**Step 1:** Complete the basic LEANN installation by following the [📦 Installation guide](../../README.md#installation) in the root README:
```bash
uv venv
source .venv/bin/activate
uv pip install leann
```
**Step 2:** Install LEANN globally for MCP integration:
```bash
uv tool install leann-core
```
This makes the `leann` command available system-wide, which `leann_mcp` requires.
## 🚀 Quick Setup
Add the LEANN MCP server to Claude Code:
```bash
claude mcp add leann-server -- leann_mcp
```
## 🛠️ Available Tools
Once connected, you'll have access to these powerful semantic search tools in Claude Code:
- **`leann_list`** - List all available indexes across your projects
- **`leann_search`** - Perform semantic searches across code and documents
- **`leann_ask`** - Ask natural language questions and get AI-powered answers from your codebase
## 🎯 Quick Start Example
```bash
# Build an index for your project (change to your actual path)
leann build my-project --docs ./
# Start Claude Code
claude
```
**Try this in Claude Code:**
```
Help me understand this codebase. List available indexes and search for authentication patterns.
```
<p align="center">
<img src="../../assets/claude_code_leann.png" alt="LEANN in Claude Code" width="80%">
</p>
## 🧠 How It Works
The integration consists of three key components working seamlessly together:
- **`leann`** - Core CLI tool for indexing and searching (installed globally via `uv tool install`)
- **`leann_mcp`** - MCP server that wraps `leann` commands for Claude Code integration
- **Claude Code** - Calls `leann_mcp`, which executes `leann` commands and returns intelligent results
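
As a rough illustration of the wrapping step, the sketch below shows how MCP tool arguments could be turned into an argument list for `leann search`. The `--top-k` and `--complexity` flags and their defaults (5 and 32) are taken from the server code. The function name `build_search_command` is hypothetical.

```python
def build_search_command(arguments):
    """Translate MCP tool arguments into an argv list for `leann search`."""
    if not arguments.get("index_name") or not arguments.get("query"):
        raise ValueError("Both index_name and query are required")
    return [
        "leann",
        "search",
        arguments["index_name"],
        arguments["query"],
        f"--top-k={arguments.get('top_k', 5)}",
        f"--complexity={arguments.get('complexity', 32)}",
    ]


cmd = build_search_command({"index_name": "my-project", "query": "error handling"})
print(cmd)
```

Passing the command as a list (for example to `subprocess.run(cmd, capture_output=True, text=True)`) avoids shell quoting problems when the query contains spaces or quotes.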
## 📁 File Support
LEANN understands **30+ file types** including:
- **Programming**: Python, JavaScript, TypeScript, Java, Go, Rust, C++, C#
- **Data**: SQL, YAML, JSON, CSV, XML
- **Documentation**: Markdown, TXT, PDF
- **And many more!**
## 💾 Storage & Organization
- **Project indexes**: Stored in `.leann/` directory (just like `.git`)
- **Global registry**: Project tracking at `~/.leann/projects.json`
- **Multi-project support**: Switch between different codebases seamlessly
- **Portable**: Transfer indexes between machines with minimal overhead
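
The sketch below shows one way the global registry could be read. It assumes `projects.json` holds a JSON list of project directory paths, which matches how the CLI iterates over it. Only entries that still contain `.leann/indexes` are kept. The helper name `load_valid_projects` is hypothetical, and the demo runs against a temporary directory rather than the real `~/.leann/projects.json`.

```python
import json
import tempfile
from pathlib import Path


def load_valid_projects(registry_path):
    """Return registered project directories that still contain .leann/indexes."""
    registry_path = Path(registry_path)
    if not registry_path.exists():
        return []
    try:
        with open(registry_path) as f:
            projects = json.load(f)
    except (OSError, json.JSONDecodeError):
        # An unreadable or corrupt registry is treated as empty.
        return []
    return [Path(p) for p in projects if (Path(p) / ".leann" / "indexes").exists()]


# Demo against a throwaway directory instead of the real registry.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "demo-project"
    (project / ".leann" / "indexes").mkdir(parents=True)
    registry = Path(tmp) / "projects.json"
    registry.write_text(json.dumps([str(project), str(Path(tmp) / "missing")]))
    found_names = [p.name for p in load_valid_projects(registry)]

print(found_names)
```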
## 🗑️ Uninstalling
To remove the LEANN MCP server from Claude Code:
```bash
claude mcp remove leann-server
```
To remove LEANN:
```bash
uv pip uninstall leann leann-backend-hnsw leann-core
```


@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "leann"
version = "0.2.6"
version = "0.2.1"
description = "LEANN - The smallest vector index in the world. RAG Everything with LEANN!"
readme = "README.md"
requires-python = ">=3.9"

10
uv.lock generated

@@ -2155,7 +2155,7 @@ wheels = [
[[package]]
name = "leann-backend-diskann"
version = "0.2.1"
version = "0.2.0"
source = { editable = "packages/leann-backend-diskann" }
dependencies = [
{ name = "leann-core" },
@@ -2167,14 +2167,14 @@ dependencies = [
[package.metadata]
requires-dist = [
{ name = "leann-core", specifier = "==0.2.1" },
{ name = "leann-core", specifier = "==0.2.0" },
{ name = "numpy" },
{ name = "protobuf", specifier = ">=3.19.0" },
]
[[package]]
name = "leann-backend-hnsw"
version = "0.2.1"
version = "0.2.0"
source = { editable = "packages/leann-backend-hnsw" }
dependencies = [
{ name = "leann-core" },
@@ -2187,7 +2187,7 @@ dependencies = [
[package.metadata]
requires-dist = [
{ name = "leann-core", specifier = "==0.2.1" },
{ name = "leann-core", specifier = "==0.2.0" },
{ name = "msgpack", specifier = ">=1.0.0" },
{ name = "numpy" },
{ name = "pyzmq", specifier = ">=23.0.0" },
@@ -2195,7 +2195,7 @@ requires-dist = [
[[package]]
name = "leann-core"
version = "0.2.1"
version = "0.2.0"
source = { editable = "packages/leann-core" }
dependencies = [
{ name = "accelerate" },