diff --git a/README.md b/README.md
index 13bae5d..c2af034 100755
--- a/README.md
+++ b/README.md
@@ -6,6 +6,7 @@
+
@@ -16,9 +17,10 @@ LEANN is an innovative vector database that democratizes personal AI. Transform
LEANN achieves this through *graph-based selective recomputation* with *high-degree preserving pruning*, computing embeddings on-demand instead of storing them all. [Illustration Fig →](#️-architecture--how-it-works) | [Paper →](https://arxiv.org/abs/2506.08276)
-**Ready to RAG Everything?** Transform your laptop into a personal AI assistant that can search your **[file system](#-personal-data-manager-process-any-documents-pdf-txt-md)**, **[emails](#-your-personal-email-secretary-rag-on-apple-mail)**, **[browser history](#-time-machine-for-the-web-rag-your-entire-browser-history)**, **[chat history](#-wechat-detective-unlock-your-golden-memories)**, or external knowledge bases (i.e., 60M documents) - all on your laptop, with zero cloud costs and complete privacy.
+**Ready to RAG Everything?** Transform your laptop into a personal AI assistant that can semantically search your **[file system](#-personal-data-manager-process-any-documents-pdf-txt-md)**, **[emails](#-your-personal-email-secretary-rag-on-apple-mail)**, **[browser history](#-time-machine-for-the-web-rag-your-entire-browser-history)**, **[chat history](#-wechat-detective-unlock-your-golden-memories)**, **[codebase](#-claude-code-integration-transform-your-development-workflow)**\*, or external knowledge bases (e.g., 60M documents) - all on your laptop, with zero cloud costs and complete privacy.
-> **🚀 Claude Code Integration!** LEANN now provides native MCP integration for Claude Code users. Index your codebase and get intelligent code assistance directly in Claude Code. [Setup Guide →](packages/leann-mcp/README.md)
+
+\* Claude Code's built-in search only supports basic `grep`-style keyword matching. **LEANN** is a drop-in **semantic search MCP service fully compatible with Claude Code**, unlocking intelligent retrieval without changing your workflow. 🔥 Check out [the easy setup →](packages/leann-mcp/README.md)
@@ -28,7 +30,7 @@ LEANN achieves this through *graph-based selective recomputation* with *high-deg
-> **The numbers speak for themselves:** Index 60 million Wikipedia chunks in just 6GB instead of 201GB. From emails to browser history, everything fits on your laptop. [See detailed benchmarks for different applications below ↓](#storage-comparison)
+> **The numbers speak for themselves:** Index 60 million text chunks in just 6GB instead of 201GB. From emails to browser history, everything fits on your laptop. [See detailed benchmarks for different applications below ↓](#storage-comparison)
🔒 **Privacy:** Your data never leaves your laptop. No OpenAI, no cloud, no "terms of service".
@@ -95,7 +97,6 @@ uv sync
-
## Quick Start
Our declarative API makes RAG as easy as writing a config file.
@@ -187,8 +188,8 @@ All RAG examples share these common parameters. **Interactive mode** is availabl
--force-rebuild # Force rebuild index even if it exists
# Embedding Parameters
---embedding-model MODEL # e.g., facebook/contriever, text-embedding-3-small or mlx-community/multilingual-e5-base-mlx
---embedding-mode MODE # sentence-transformers, openai, or mlx
+--embedding-model MODEL # e.g., facebook/contriever, text-embedding-3-small, nomic-embed-text, or mlx-community/multilingual-e5-base-mlx
+--embedding-mode MODE # sentence-transformers, openai, mlx, or ollama
# LLM Parameters (Text generation models)
--llm TYPE # LLM backend: openai, ollama, or hf (default: openai)
@@ -221,7 +222,7 @@ Ask questions directly about your personal PDFs, documents, and any directory co
-The example below asks a question about summarizing our paper (uses default data in `data/`, which is a directory with diverse data sources: two papers, Pride and Prejudice, and a README in Chinese) and this is the **easiest example** to run here:
+The example below asks a question that summarizes our paper (it uses the default data in `data/`, a directory with diverse sources: two papers, Pride and Prejudice, and a Chinese-language technical report about LLMs at Huawei), and it is the **easiest example** to run here:
```bash
source .venv/bin/activate # Don't forget to activate the virtual environment
@@ -416,7 +417,26 @@ Once the index is built, you can ask questions like:
+### 🚀 Claude Code Integration: Transform Your Development Workflow!
+**The future of code assistance is here.** Transform your development workflow with LEANN's native MCP integration for Claude Code. Index your entire codebase and get intelligent code assistance directly inside Claude Code.
+
+**Key features:**
+- 🔍 **Semantic code search** across your entire project
+- 📚 **Context-aware assistance** for debugging and development
+- 🚀 **Zero-config setup** with automatic language detection
+
+```bash
+# Install LEANN globally for MCP integration
+uv tool install leann-core
+
+# Then register the MCP server with Claude Code: claude mcp add leann-server -- leann_mcp
+```
+Try our fully agentic pipeline with automatic query rewriting, semantic search planning, and more:
+
+![LEANN MCP integration in Claude Code](assets/mcp_leann.png)
+
+**Ready to supercharge your coding?** [Complete Setup Guide →](packages/leann-mcp/README.md)
## 🖥️ Command Line Interface
@@ -446,11 +466,8 @@ leann --help
### Usage Examples
```bash
-# Build an index from current directory (default)
-leann build my-docs
-
-# Or from specific directory
-leann build my-docs --docs ./documents
+# Build an index from a specific directory ("my-docs" is the index name)
+leann build my-docs --docs ./your_documents
# Search your documents
leann search my-docs "machine learning concepts"
diff --git a/apps/base_rag_example.py b/apps/base_rag_example.py
index f5a481c..4bd62b9 100644
--- a/apps/base_rag_example.py
+++ b/apps/base_rag_example.py
@@ -75,7 +75,7 @@ class BaseRAGExample(ABC):
"--embedding-mode",
type=str,
default="sentence-transformers",
- choices=["sentence-transformers", "openai", "mlx"],
+ choices=["sentence-transformers", "openai", "mlx", "ollama"],
help="Embedding backend mode (default: sentence-transformers)",
)
@@ -85,7 +85,7 @@ class BaseRAGExample(ABC):
"--llm",
type=str,
default="openai",
- choices=["openai", "ollama", "hf"],
+ choices=["openai", "ollama", "hf", "simulated"],
help="LLM backend to use (default: openai)",
)
llm_group.add_argument(
diff --git a/assets/mcp_leann.png b/assets/mcp_leann.png
new file mode 100644
index 0000000..de5ed04
Binary files /dev/null and b/assets/mcp_leann.png differ
diff --git a/docs/claude-code-integration.md b/docs/claude-code-integration.md
deleted file mode 100644
index e19adfb..0000000
--- a/docs/claude-code-integration.md
+++ /dev/null
@@ -1,150 +0,0 @@
-# Claude Code x LEANN Integration Guide
-
-## ✅ Current Status: It Already Works!
-
-Good news: the LEANN CLI already works in Claude Code with no modifications required!
-
-## 🚀 Get Started Now
-
-### 1. Activate the environment
-```bash
-# From the LEANN project directory
-source .venv/bin/activate.fish # fish shell
-# or
-source .venv/bin/activate # bash shell
-```
-
-### 2. Basic commands
-
-#### List existing indexes
-```bash
-leann list
-```
-
-#### Search documents
-```bash
-leann search my-docs "machine learning" --recompute-embeddings
-```
-
-#### Question answering
-```bash
-echo "What is machine learning?" | leann ask my-docs --llm ollama --model qwen3:8b --recompute-embeddings
-```
-
-#### Build a new index
-```bash
-leann build project-docs --docs ./src --recompute-embeddings
-```
-
-## 💡 Claude Code Usage Tips
-
-### Use directly inside Claude Code
-
-1. **Activate the environment**:
-   ```bash
-   cd /Users/andyl/Projects/LEANN-RAG
-   source .venv/bin/activate.fish
-   ```
-
-2. **Search the codebase**:
-   ```bash
-   leann search my-docs "authentication patterns" --recompute-embeddings --top-k 10
-   ```
-
-3. **Intelligent Q&A**:
-   ```bash
-   echo "How does the authentication system work?" | leann ask my-docs --llm ollama --model qwen3:8b --recompute-embeddings
-   ```
-
-### Batch operation examples
-
-```bash
-# Build an index of the project docs
-leann build project-docs --docs ./docs --force
-
-# Search several keywords
-leann search project-docs "API authentication" --recompute-embeddings
-leann search project-docs "database schema" --recompute-embeddings
-leann search project-docs "deployment guide" --recompute-embeddings
-
-# Q&A mode
-echo "What are the API endpoints?" | leann ask project-docs --recompute-embeddings
-```
-
-## 🎯 Workflows Claude Can Run Right Away
-
-### Code analysis workflow
-```bash
-# 1. Build an index of the codebase
-leann build codebase --docs ./src --backend hnsw --recompute-embeddings
-
-# 2. Analyze the architecture
-echo "What is the overall architecture?" | leann ask codebase --recompute-embeddings
-
-# 3. Find specific functionality
-leann search codebase "user authentication" --recompute-embeddings --top-k 5
-
-# 4. Understand implementation details
-echo "How is user authentication implemented?" | leann ask codebase --recompute-embeddings
-```
-
-### Documentation workflow
-```bash
-# 1. Index the project documentation
-leann build docs --docs ./docs --recompute-embeddings
-
-# 2. Find information quickly
-leann search docs "installation requirements" --recompute-embeddings
-
-# 3. Get detailed explanations
-echo "What are the system requirements?" | leann ask docs --recompute-embeddings
-```
-
-## ⚠️ Important Notes
-
-1. **Always pass `--recompute-embeddings`** - this is the key flag; omitting it causes errors
-2. **Activate the virtual environment first** - make sure LEANN's Python environment is available
-3. **Ollama must be installed beforehand** - the ask feature needs a local LLM
-
-## 🔥 Ready-to-Use Claude Prompt
-
-```
-Help me analyze this codebase using LEANN:
-
-1. First, activate the environment:
- cd /Users/andyl/Projects/LEANN-RAG && source .venv/bin/activate.fish
-
-2. Build an index of the source code:
- leann build codebase --docs ./src --recompute-embeddings
-
-3. Search for authentication patterns:
- leann search codebase "authentication middleware" --recompute-embeddings --top-k 10
-
-4. Ask about the authentication system:
- echo "How does user authentication work in this codebase?" | leann ask codebase --recompute-embeddings
-
-Please execute these commands and help me understand the code structure.
-```
-
-## 📈 Next Steps
-
-It is already usable, but a few things could still be improved:
-
-1. **Simpler commands** - enable recompute-embeddings by default
-2. **Config file** - avoid retyping parameters
-3. **State management** - auto-detect the environment and indexes
-4. **Output format** - output that is easier for Claude to parse
-
-These are nice-to-haves; it works right now!
-
-## 🎉 Summary
-
-**LEANN already works perfectly in Claude Code!**
-
-- ✅ Search works
-- ✅ RAG question answering works
-- ✅ Index building works
-- ✅ Multiple data sources supported
-- ✅ Local LLMs supported
-
-Just remember to add the `--recompute-embeddings` flag!
diff --git a/docs/configuration-guide.md b/docs/configuration-guide.md
index 7c6d663..28aa202 100644
--- a/docs/configuration-guide.md
+++ b/docs/configuration-guide.md
@@ -49,14 +49,25 @@ Based on our experience developing LEANN, embedding models fall into three categ
- **Cons**: Slower inference, longer index build times
- **Use when**: Quality is paramount and you have sufficient compute resources. **Highly recommended** for production use
-### Quick Start: OpenAI Embeddings (Fastest Setup)
+### Quick Start: Cloud and Local Embedding Options
+**OpenAI Embeddings (Fastest Setup)**
For immediate testing without local model downloads:
```bash
# Set OpenAI embeddings (requires OPENAI_API_KEY)
--embedding-mode openai --embedding-model text-embedding-3-small
```
+**Ollama Embeddings (Privacy-Focused)**
+For local embeddings with complete privacy:
+```bash
+# First, pull an embedding model
+ollama pull nomic-embed-text
+
+# Use Ollama embeddings
+--embedding-mode ollama --embedding-model nomic-embed-text
+```
+
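+Under the hood, LEANN's Ollama mode POSTs each chunk to Ollama's local embeddings endpoint and L2-normalizes the results. The snippet below is a minimal sketch of the equivalent raw call (illustrative only; the `ollama_embed` helper is not part of LEANN, and it assumes Ollama is running at the default `http://localhost:11434` with `nomic-embed-text` already pulled):
+
+```python
+import numpy as np
+import requests
+
+def ollama_embed(texts, model="nomic-embed-text", host="http://localhost:11434"):
+    """Minimal sketch of what --embedding-mode ollama does for each chunk."""
+    vectors = []
+    for text in texts:
+        resp = requests.post(
+            f"{host}/api/embeddings",
+            json={"model": model, "prompt": text},
+            timeout=30,
+        )
+        resp.raise_for_status()
+        vectors.append(resp.json()["embedding"])
+    emb = np.array(vectors, dtype=np.float32)
+    # L2-normalize, matching LEANN's Ollama backend
+    return emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-8)
+
+print(ollama_embed(["hello world"]).shape)  # (1, embedding_dim)
+```
+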
Cloud vs Local Trade-offs
diff --git a/packages/leann-backend-diskann/leann_backend_diskann/diskann_embedding_server.py b/packages/leann-backend-diskann/leann_backend_diskann/diskann_embedding_server.py
index b566ae6..456689d 100644
--- a/packages/leann-backend-diskann/leann_backend_diskann/diskann_embedding_server.py
+++ b/packages/leann-backend-diskann/leann_backend_diskann/diskann_embedding_server.py
@@ -263,7 +263,7 @@ if __name__ == "__main__":
"--embedding-mode",
type=str,
default="sentence-transformers",
- choices=["sentence-transformers", "openai", "mlx"],
+ choices=["sentence-transformers", "openai", "mlx", "ollama"],
help="Embedding backend mode",
)
parser.add_argument(
diff --git a/packages/leann-backend-diskann/pyproject.toml b/packages/leann-backend-diskann/pyproject.toml
index 48b2134..5519ac2 100644
--- a/packages/leann-backend-diskann/pyproject.toml
+++ b/packages/leann-backend-diskann/pyproject.toml
@@ -4,8 +4,8 @@ build-backend = "scikit_build_core.build"
[project]
name = "leann-backend-diskann"
-version = "0.2.1"
-dependencies = ["leann-core==0.2.1", "numpy", "protobuf>=3.19.0"]
+version = "0.2.5"
+dependencies = ["leann-core==0.2.5", "numpy", "protobuf>=3.19.0"]
[tool.scikit-build]
# Key: simplified CMake path
diff --git a/packages/leann-backend-hnsw/leann_backend_hnsw/hnsw_embedding_server.py b/packages/leann-backend-hnsw/leann_backend_hnsw/hnsw_embedding_server.py
index bf36883..f26a050 100644
--- a/packages/leann-backend-hnsw/leann_backend_hnsw/hnsw_embedding_server.py
+++ b/packages/leann-backend-hnsw/leann_backend_hnsw/hnsw_embedding_server.py
@@ -285,7 +285,7 @@ if __name__ == "__main__":
"--embedding-mode",
type=str,
default="sentence-transformers",
- choices=["sentence-transformers", "openai", "mlx"],
+ choices=["sentence-transformers", "openai", "mlx", "ollama"],
help="Embedding backend mode",
)
diff --git a/packages/leann-backend-hnsw/pyproject.toml b/packages/leann-backend-hnsw/pyproject.toml
index f2b4b5c..89e63eb 100644
--- a/packages/leann-backend-hnsw/pyproject.toml
+++ b/packages/leann-backend-hnsw/pyproject.toml
@@ -6,10 +6,10 @@ build-backend = "scikit_build_core.build"
[project]
name = "leann-backend-hnsw"
-version = "0.2.1"
+version = "0.2.5"
description = "Custom-built HNSW (Faiss) backend for the Leann toolkit."
dependencies = [
- "leann-core==0.2.1",
+ "leann-core==0.2.5",
"numpy",
"pyzmq>=23.0.0",
"msgpack>=1.0.0",
diff --git a/packages/leann-core/pyproject.toml b/packages/leann-core/pyproject.toml
index e7d178d..7e564f4 100644
--- a/packages/leann-core/pyproject.toml
+++ b/packages/leann-core/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "leann-core"
-version = "0.2.1"
+version = "0.2.5"
description = "Core API and plugin system for LEANN"
readme = "README.md"
requires-python = ">=3.9"
diff --git a/packages/leann-core/src/leann/chat.py b/packages/leann-core/src/leann/chat.py
index 541da07..4200e8e 100644
--- a/packages/leann-core/src/leann/chat.py
+++ b/packages/leann-core/src/leann/chat.py
@@ -17,12 +17,12 @@ logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
-def check_ollama_models() -> list[str]:
+def check_ollama_models(host: str) -> list[str]:
"""Check available Ollama models and return a list"""
try:
import requests
- response = requests.get("http://localhost:11434/api/tags", timeout=5)
+ response = requests.get(f"{host}/api/tags", timeout=5)
if response.status_code == 200:
data = response.json()
return [model["name"] for model in data.get("models", [])]
@@ -309,10 +309,12 @@ def search_hf_models(query: str, limit: int = 10) -> list[str]:
return search_hf_models_fuzzy(query, limit)
-def validate_model_and_suggest(model_name: str, llm_type: str) -> Optional[str]:
+def validate_model_and_suggest(
+ model_name: str, llm_type: str, host: str = "http://localhost:11434"
+) -> str | None:
"""Validate model name and provide suggestions if invalid"""
if llm_type == "ollama":
- available_models = check_ollama_models()
+ available_models = check_ollama_models(host)
if available_models and model_name not in available_models:
error_msg = f"Model '{model_name}' not found in your local Ollama installation."
@@ -469,7 +471,7 @@ class OllamaChat(LLMInterface):
requests.get(host)
# Pre-check model availability with helpful suggestions
- model_error = validate_model_and_suggest(model, "ollama")
+ model_error = validate_model_and_suggest(model, "ollama", host)
if model_error:
raise ValueError(model_error)
diff --git a/packages/leann-core/src/leann/cli.py b/packages/leann-core/src/leann/cli.py
index 489c5d1..f307204 100644
--- a/packages/leann-core/src/leann/cli.py
+++ b/packages/leann-core/src/leann/cli.py
@@ -74,10 +74,11 @@ class LeannCLI:
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
- leann build my-docs --docs ./documents # Build index named my-docs
- leann search my-docs "query" # Search in my-docs index
- leann ask my-docs "question" # Ask my-docs index
- leann list # List all stored indexes
+ leann build my-docs --docs ./documents # Build index named my-docs
+ leann build my-ppts --docs ./ --file-types .pptx,.pdf # Index only PowerPoint and PDF files
+ leann search my-docs "query" # Search in my-docs index
+ leann ask my-docs "question" # Ask my-docs index
+ leann list # List all stored indexes
""",
)
@@ -93,12 +94,24 @@ Examples:
"--backend", type=str, default="hnsw", choices=["hnsw", "diskann"]
)
build_parser.add_argument("--embedding-model", type=str, default="facebook/contriever")
+ build_parser.add_argument(
+ "--embedding-mode",
+ type=str,
+ default="sentence-transformers",
+ choices=["sentence-transformers", "openai", "mlx", "ollama"],
+ help="Embedding backend mode (default: sentence-transformers)",
+ )
build_parser.add_argument("--force", "-f", action="store_true", help="Force rebuild")
build_parser.add_argument("--graph-degree", type=int, default=32)
build_parser.add_argument("--complexity", type=int, default=64)
build_parser.add_argument("--num-threads", type=int, default=1)
build_parser.add_argument("--compact", action="store_true", default=True)
build_parser.add_argument("--recompute", action="store_true", default=True)
+ build_parser.add_argument(
+ "--file-types",
+ type=str,
+ help="Comma-separated list of file extensions to include (e.g., '.txt,.pdf,.pptx'). If not specified, uses default supported types.",
+ )
# Search command
search_parser = subparsers.add_parser("search", help="Search documents")
@@ -108,7 +121,12 @@ Examples:
search_parser.add_argument("--complexity", type=int, default=64)
search_parser.add_argument("--beam-width", type=int, default=1)
search_parser.add_argument("--prune-ratio", type=float, default=0.0)
- search_parser.add_argument("--recompute-embeddings", action="store_true")
+ search_parser.add_argument(
+ "--recompute-embeddings",
+ action="store_true",
+ default=True,
+ help="Recompute embeddings (default: True)",
+ )
search_parser.add_argument(
"--pruning-strategy",
choices=["global", "local", "proportional"],
@@ -131,7 +149,12 @@ Examples:
ask_parser.add_argument("--complexity", type=int, default=32)
ask_parser.add_argument("--beam-width", type=int, default=1)
ask_parser.add_argument("--prune-ratio", type=float, default=0.0)
- ask_parser.add_argument("--recompute-embeddings", action="store_true")
+ ask_parser.add_argument(
+ "--recompute-embeddings",
+ action="store_true",
+ default=True,
+ help="Recompute embeddings (default: True)",
+ )
ask_parser.add_argument(
"--pruning-strategy",
choices=["global", "local", "proportional"],
@@ -254,8 +277,10 @@ Examples:
print(f' leann search {example_name} "your query"')
print(f" leann ask {example_name} --interactive")
- def load_documents(self, docs_dir: str):
+ def load_documents(self, docs_dir: str, custom_file_types: str | None = None):
print(f"Loading documents from {docs_dir}...")
+ if custom_file_types:
+ print(f"Using custom file types: {custom_file_types}")
# Try to use better PDF parsers first
documents = []
@@ -287,66 +312,81 @@ Examples:
documents.extend(default_docs)
# Load other file types with default reader
- code_extensions = [
- # Original document types
- ".txt",
- ".md",
- ".docx",
- # Code files for Claude Code integration
- ".py",
- ".js",
- ".ts",
- ".jsx",
- ".tsx",
- ".java",
- ".cpp",
- ".c",
- ".h",
- ".hpp",
- ".cs",
- ".go",
- ".rs",
- ".rb",
- ".php",
- ".swift",
- ".kt",
- ".scala",
- ".r",
- ".sql",
- ".sh",
- ".bash",
- ".zsh",
- ".fish",
- ".ps1",
- ".bat",
- # Config and markup files
- ".json",
- ".yaml",
- ".yml",
- ".xml",
- ".toml",
- ".ini",
- ".cfg",
- ".conf",
- ".html",
- ".css",
- ".scss",
- ".less",
- ".vue",
- ".svelte",
- # Data science
- ".ipynb",
- ".R",
- ".py",
- ".jl",
- ]
- other_docs = SimpleDirectoryReader(
- docs_dir,
- recursive=True,
- encoding="utf-8",
- required_exts=code_extensions,
- ).load_data(show_progress=True)
- documents.extend(other_docs)
+ if custom_file_types:
+ # Parse custom file types from comma-separated string
+ code_extensions = [ext.strip() for ext in custom_file_types.split(",") if ext.strip()]
+ # Ensure extensions start with a dot
+ code_extensions = [ext if ext.startswith(".") else f".{ext}" for ext in code_extensions]
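+            # e.g., --file-types "txt, .pdf,pptx" is normalized to [".txt", ".pdf", ".pptx"]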
+ else:
+ # Use default supported file types
+ code_extensions = [
+ # Original document types
+ ".txt",
+ ".md",
+ ".docx",
+ ".pptx",
+ # Code files for Claude Code integration
+ ".py",
+ ".js",
+ ".ts",
+ ".jsx",
+ ".tsx",
+ ".java",
+ ".cpp",
+ ".c",
+ ".h",
+ ".hpp",
+ ".cs",
+ ".go",
+ ".rs",
+ ".rb",
+ ".php",
+ ".swift",
+ ".kt",
+ ".scala",
+ ".r",
+ ".sql",
+ ".sh",
+ ".bash",
+ ".zsh",
+ ".fish",
+ ".ps1",
+ ".bat",
+ # Config and markup files
+ ".json",
+ ".yaml",
+ ".yml",
+ ".xml",
+ ".toml",
+ ".ini",
+ ".cfg",
+ ".conf",
+ ".html",
+ ".css",
+ ".scss",
+ ".less",
+ ".vue",
+ ".svelte",
+ # Data science
+ ".ipynb",
+ ".R",
+ ".py",
+ ".jl",
+ ]
+ # Try to load other file types, but don't fail if none are found
+ try:
+ other_docs = SimpleDirectoryReader(
+ docs_dir,
+ recursive=True,
+ encoding="utf-8",
+ required_exts=code_extensions,
+ ).load_data(show_progress=True)
+ documents.extend(other_docs)
+ except ValueError as e:
+ if "No files found" in str(e):
+ print("No additional files found for other supported types.")
+ else:
+ raise e
all_texts = []
@@ -424,7 +464,7 @@ Examples:
print(f"Index '{index_name}' already exists. Use --force to rebuild.")
return
- all_texts = self.load_documents(docs_dir)
+ all_texts = self.load_documents(docs_dir, args.file_types)
if not all_texts:
print("No documents found")
return
@@ -436,6 +476,7 @@ Examples:
builder = LeannBuilder(
backend_name=args.backend,
embedding_model=args.embedding_model,
+ embedding_mode=args.embedding_mode,
graph_degree=args.graph_degree,
complexity=args.complexity,
is_compact=args.compact,
diff --git a/packages/leann-core/src/leann/embedding_compute.py b/packages/leann-core/src/leann/embedding_compute.py
index 95fa9e4..67f33d1 100644
--- a/packages/leann-core/src/leann/embedding_compute.py
+++ b/packages/leann-core/src/leann/embedding_compute.py
@@ -6,6 +6,7 @@ Preserves all optimization parameters to ensure performance
import logging
import os
+from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any
import numpy as np
@@ -35,7 +36,7 @@ def compute_embeddings(
Args:
texts: List of texts to compute embeddings for
model_name: Model name
- mode: Computation mode ('sentence-transformers', 'openai', 'mlx')
+ mode: Computation mode ('sentence-transformers', 'openai', 'mlx', 'ollama')
is_build: Whether this is a build operation (shows progress bar)
batch_size: Batch size for processing
adaptive_optimization: Whether to use adaptive optimization based on batch size
@@ -55,6 +56,8 @@ def compute_embeddings(
return compute_embeddings_openai(texts, model_name)
elif mode == "mlx":
return compute_embeddings_mlx(texts, model_name)
+ elif mode == "ollama":
+ return compute_embeddings_ollama(texts, model_name, is_build=is_build)
else:
raise ValueError(f"Unsupported embedding mode: {mode}")
@@ -365,3 +368,262 @@ def compute_embeddings_mlx(chunks: list[str], model_name: str, batch_size: int =
# Stack numpy arrays
return np.stack(all_embeddings)
+
+
+def compute_embeddings_ollama(
+ texts: list[str], model_name: str, is_build: bool = False, host: str = "http://localhost:11434"
+) -> np.ndarray:
+ """
+ Compute embeddings using Ollama API.
+
+ Args:
+ texts: List of texts to compute embeddings for
+ model_name: Ollama model name (e.g., "nomic-embed-text", "mxbai-embed-large")
+ is_build: Whether this is a build operation (shows progress bar)
+ host: Ollama host URL (default: http://localhost:11434)
+
+ Returns:
+ Normalized embeddings array, shape: (len(texts), embedding_dim)
+ """
+ try:
+ import requests
+ except ImportError:
+ raise ImportError(
+ "The 'requests' library is required for Ollama embeddings. Install with: uv pip install requests"
+ )
+
+ if not texts:
+ raise ValueError("Cannot compute embeddings for empty text list")
+
+ logger.info(
+ f"Computing embeddings for {len(texts)} texts using Ollama API, model: '{model_name}'"
+ )
+
+ # Check if Ollama is running
+ try:
+ response = requests.get(f"{host}/api/version", timeout=5)
+ response.raise_for_status()
+ except requests.exceptions.ConnectionError:
+ error_msg = (
+ f"❌ Could not connect to Ollama at {host}.\n\n"
+ "Please ensure Ollama is running:\n"
+ " • macOS/Linux: ollama serve\n"
+ " • Windows: Make sure Ollama is running in the system tray\n\n"
+ "Installation: https://ollama.com/download"
+ )
+ raise RuntimeError(error_msg)
+ except Exception as e:
+ raise RuntimeError(f"Unexpected error connecting to Ollama: {e}")
+
+ # Check if model exists and provide helpful suggestions
+ try:
+ response = requests.get(f"{host}/api/tags", timeout=5)
+ response.raise_for_status()
+ models = response.json()
+ model_names = [model["name"] for model in models.get("models", [])]
+
+ # Filter for embedding models (models that support embeddings)
+ embedding_models = []
+ suggested_embedding_models = [
+ "nomic-embed-text",
+ "mxbai-embed-large",
+ "bge-m3",
+ "all-minilm",
+ "snowflake-arctic-embed",
+ ]
+
+ for model in model_names:
+ # Check if it's an embedding model (by name patterns or known models)
+ base_name = model.split(":")[0]
+ if any(emb in base_name for emb in ["embed", "bge", "minilm", "e5"]):
+ embedding_models.append(model)
+
+ # Check if model exists (handle versioned names)
+ model_found = any(
+ model_name == name.split(":")[0] or model_name == name for name in model_names
+ )
+
+ if not model_found:
+ error_msg = f"❌ Model '{model_name}' not found in local Ollama.\n\n"
+
+ # Suggest pulling the model
+ error_msg += "📦 To install this embedding model:\n"
+ error_msg += f" ollama pull {model_name}\n\n"
+
+ # Show available embedding models
+ if embedding_models:
+ error_msg += "✅ Available embedding models:\n"
+ for model in embedding_models[:5]:
+ error_msg += f" • {model}\n"
+ if len(embedding_models) > 5:
+ error_msg += f" ... and {len(embedding_models) - 5} more\n"
+ else:
+ error_msg += "💡 Popular embedding models to install:\n"
+ for model in suggested_embedding_models[:3]:
+ error_msg += f" • ollama pull {model}\n"
+
+ error_msg += "\n📚 Browse more: https://ollama.com/library"
+ raise ValueError(error_msg)
+
+ # Verify the model supports embeddings by testing it
+ try:
+ test_response = requests.post(
+ f"{host}/api/embeddings", json={"model": model_name, "prompt": "test"}, timeout=10
+ )
+ if test_response.status_code != 200:
+ error_msg = (
+ f"⚠️ Model '{model_name}' exists but may not support embeddings.\n\n"
+ f"Please use an embedding model like:\n"
+ )
+ for model in suggested_embedding_models[:3]:
+ error_msg += f" • {model}\n"
+ raise ValueError(error_msg)
+ except requests.exceptions.RequestException:
+ # If test fails, continue anyway - model might still work
+ pass
+
+ except requests.exceptions.RequestException as e:
+ logger.warning(f"Could not verify model existence: {e}")
+
+    # Process embeddings, using concurrent requests when appropriate
+
+ def get_single_embedding(text_idx_tuple):
+ """Helper function to get embedding for a single text."""
+ text, idx = text_idx_tuple
+ max_retries = 3
+ retry_count = 0
+
+ # Truncate very long texts to avoid API issues
+ truncated_text = text[:8000] if len(text) > 8000 else text
+
+ while retry_count < max_retries:
+ try:
+ response = requests.post(
+ f"{host}/api/embeddings",
+ json={"model": model_name, "prompt": truncated_text},
+ timeout=30,
+ )
+ response.raise_for_status()
+
+ result = response.json()
+ embedding = result.get("embedding")
+
+ if embedding is None:
+ raise ValueError(f"No embedding returned for text {idx}")
+
+ return idx, embedding
+
+ except requests.exceptions.Timeout:
+ retry_count += 1
+ if retry_count >= max_retries:
+ logger.warning(f"Timeout for text {idx} after {max_retries} retries")
+ return idx, None
+
+ except Exception as e:
+ if retry_count >= max_retries - 1:
+ logger.error(f"Failed to get embedding for text {idx}: {e}")
+ return idx, None
+ retry_count += 1
+
+ return idx, None
+
+ # Determine if we should use concurrent processing
+ use_concurrent = (
+ len(texts) > 5 and not is_build
+    )  # Skip concurrency during index builds to avoid overwhelming the local Ollama server
+ max_workers = min(4, len(texts)) # Limit concurrent requests to avoid overwhelming Ollama
+
+ all_embeddings = [None] * len(texts) # Pre-allocate list to maintain order
+ failed_indices = []
+
+ if use_concurrent:
+ logger.info(
+ f"Using concurrent processing with {max_workers} workers for {len(texts)} texts"
+ )
+
+ with ThreadPoolExecutor(max_workers=max_workers) as executor:
+ # Submit all tasks
+ future_to_idx = {
+ executor.submit(get_single_embedding, (text, idx)): idx
+ for idx, text in enumerate(texts)
+ }
+
+ # Add progress bar for concurrent processing
+ try:
+ if is_build or len(texts) > 10:
+ from tqdm import tqdm
+
+ futures_iterator = tqdm(
+ as_completed(future_to_idx),
+ total=len(texts),
+ desc="Computing Ollama embeddings",
+ )
+ else:
+ futures_iterator = as_completed(future_to_idx)
+ except ImportError:
+ futures_iterator = as_completed(future_to_idx)
+
+ # Collect results as they complete
+ for future in futures_iterator:
+ try:
+ idx, embedding = future.result()
+ if embedding is not None:
+ all_embeddings[idx] = embedding
+ else:
+ failed_indices.append(idx)
+ except Exception as e:
+ idx = future_to_idx[future]
+ logger.error(f"Exception for text {idx}: {e}")
+ failed_indices.append(idx)
+
+ else:
+ # Sequential processing with progress bar
+ show_progress = is_build or len(texts) > 10
+
+ try:
+ if show_progress:
+ from tqdm import tqdm
+
+ iterator = tqdm(
+ enumerate(texts), total=len(texts), desc="Computing Ollama embeddings"
+ )
+ else:
+ iterator = enumerate(texts)
+ except ImportError:
+ iterator = enumerate(texts)
+
+ for idx, text in iterator:
+ result_idx, embedding = get_single_embedding((text, idx))
+ if embedding is not None:
+ all_embeddings[idx] = embedding
+ else:
+ failed_indices.append(idx)
+
+ # Handle failed embeddings
+ if failed_indices:
+ if len(failed_indices) == len(texts):
+ raise RuntimeError("Failed to compute any embeddings")
+
+ logger.warning(f"Failed to compute embeddings for {len(failed_indices)}/{len(texts)} texts")
+
+ # Use zero embeddings as fallback for failed ones
+ valid_embedding = next((e for e in all_embeddings if e is not None), None)
+ if valid_embedding:
+ embedding_dim = len(valid_embedding)
+ for idx in failed_indices:
+ all_embeddings[idx] = [0.0] * embedding_dim
+
+ # Remove None values and convert to numpy array
+ all_embeddings = [e for e in all_embeddings if e is not None]
+
+ # Convert to numpy array and normalize
+ embeddings = np.array(all_embeddings, dtype=np.float32)
+
+ # Normalize embeddings (L2 normalization)
+ norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
+ embeddings = embeddings / (norms + 1e-8) # Add small epsilon to avoid division by zero
+
+ logger.info(f"Generated {len(embeddings)} embeddings, dimension: {embeddings.shape[1]}")
+
+ return embeddings
diff --git a/packages/leann-core/src/leann/mcp.py b/packages/leann-core/src/leann/mcp.py
index 6de6750..f5a2cae 100755
--- a/packages/leann-core/src/leann/mcp.py
+++ b/packages/leann-core/src/leann/mcp.py
@@ -1,7 +1,6 @@
#!/usr/bin/env python3
import json
-import os
import subprocess
import sys
@@ -62,10 +61,6 @@ def handle_request(request):
tool_name = request["params"]["name"]
args = request["params"].get("arguments", {})
- # Set working directory and environment
- env = os.environ.copy()
- cwd = "/Users/andyl/Projects/LEANN-RAG"
-
try:
if tool_name == "leann_search":
cmd = [
@@ -76,18 +71,14 @@ def handle_request(request):
"--recompute-embeddings",
f"--top-k={args.get('top_k', 5)}",
]
- result = subprocess.run(cmd, capture_output=True, text=True, cwd=cwd, env=env)
+ result = subprocess.run(cmd, capture_output=True, text=True)
elif tool_name == "leann_ask":
cmd = f'echo "{args["question"]}" | leann ask {args["index_name"]} --recompute-embeddings --llm ollama --model qwen3:8b'
- result = subprocess.run(
- cmd, shell=True, capture_output=True, text=True, cwd=cwd, env=env
- )
+ result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
elif tool_name == "leann_list":
- result = subprocess.run(
- ["leann", "list"], capture_output=True, text=True, cwd=cwd, env=env
- )
+ result = subprocess.run(["leann", "list"], capture_output=True, text=True)
return {
"jsonrpc": "2.0",
diff --git a/packages/leann-mcp/README.md b/packages/leann-mcp/README.md
index 0893ad3..b762ae9 100644
--- a/packages/leann-mcp/README.md
+++ b/packages/leann-mcp/README.md
@@ -1,18 +1,25 @@
-# LEANN Claude Code Integration
+# 🔥 LEANN Claude Code Integration
-Intelligent code assistance using LEANN's vector search directly in Claude Code.
+Transform your development workflow with intelligent code assistance using LEANN's semantic search directly in Claude Code.
## Prerequisites
-First, install LEANN CLI globally:
+**Step 1:** First, complete the basic LEANN installation following the [📦 Installation guide](../../README.md#installation) in the root README:
+```bash
+uv venv
+source .venv/bin/activate
+uv pip install leann
+```
+
+**Step 2:** Install LEANN globally for MCP integration:
```bash
uv tool install leann-core
```
This makes the `leann` command available system-wide, which `leann_mcp` requires.
-## Quick Setup
+## 🚀 Quick Setup
Add the LEANN MCP server to Claude Code:
@@ -20,23 +27,25 @@ Add the LEANN MCP server to Claude Code:
claude mcp add leann-server -- leann_mcp
```
-## Available Tools
+## 🛠️ Available Tools
-- **`leann_list`** - List available indexes across all projects
-- **`leann_search`** - Search code and documents with semantic queries
-- **`leann_ask`** - Ask questions and get AI-powered answers from your codebase
+Once connected, you'll have access to these powerful semantic search tools in Claude Code:
-## Quick Start
+- **`leann_list`** - List all available indexes across your projects
+- **`leann_search`** - Perform semantic searches across code and documents
+- **`leann_ask`** - Ask natural language questions and get AI-powered answers from your codebase
+
+## 🎯 Quick Start Example
```bash
-# Build an index for your project
-leann build my-project
+# Build an index for your project (change to your actual path)
+leann build my-project --docs ./
# Start Claude Code
claude
```
-Then in Claude Code:
+**Try this in Claude Code:**
```
Help me understand this codebase. List available indexes and search for authentication patterns.
```
@@ -46,24 +55,37 @@ Help me understand this codebase. List available indexes and search for authenti
-## How It Works
+## 🧠 How It Works
-- **`leann`** - Core CLI tool for indexing and searching (installed globally)
+The integration consists of three key components working seamlessly together:
+
+- **`leann`** - Core CLI tool for indexing and searching (installed globally via `uv tool install`)
- **`leann_mcp`** - MCP server that wraps `leann` commands for Claude Code integration
-- Claude Code calls `leann_mcp`, which executes `leann` commands and returns results
+- **Claude Code** - Calls `leann_mcp`, which executes `leann` commands and returns intelligent results
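+
+Under the hood the wrapper is deliberately thin. Here is a rough sketch of how `leann_mcp` handles a `leann_search` tool call (simplified from LEANN's `mcp.py`; the helper name is illustrative, and the real server also speaks JSON-RPC and handles `leann_ask` and `leann_list`):
+
+```python
+import subprocess
+
+def run_leann_search(index_name: str, query: str, top_k: int = 5) -> str:
+    """Sketch: the MCP server shells out to the globally installed `leann` CLI."""
+    cmd = [
+        "leann", "search", index_name, query,
+        "--recompute-embeddings",  # required: LEANN recomputes embeddings on demand
+        f"--top-k={top_k}",
+    ]
+    result = subprocess.run(cmd, capture_output=True, text=True)
+    return result.stdout
+```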
-## File Support
+## 📁 File Support
-Python, JavaScript, TypeScript, Java, Go, Rust, SQL, YAML, JSON, and 30+ more file types.
+LEANN understands **30+ file types** including:
+- **Programming**: Python, JavaScript, TypeScript, Java, Go, Rust, C++, C#
+- **Data**: SQL, YAML, JSON, CSV, XML
+- **Documentation**: Markdown, TXT, PDF
+- **And many more!**
-## Storage
+## 💾 Storage & Organization
-- Project indexes in `.leann/` directory (like `.git`)
-- Global project registry at `~/.leann/projects.json`
-- Multi-project support built-in
+- **Project indexes**: Stored in `.leann/` directory (just like `.git`)
+- **Global registry**: Project tracking at `~/.leann/projects.json`
+- **Multi-project support**: Switch between different codebases seamlessly
+- **Portable**: Transfer indexes between machines with minimal overhead
-## Removing
+## 🗑️ Uninstalling
+
+To remove the LEANN MCP server from Claude Code:
```bash
claude mcp remove leann-server
```
+To uninstall the LEANN packages:
+```bash
+uv pip uninstall leann leann-backend-hnsw leann-core
+```
diff --git a/packages/leann/pyproject.toml b/packages/leann/pyproject.toml
index 1f3cb50..17b50d8 100644
--- a/packages/leann/pyproject.toml
+++ b/packages/leann/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "leann"
-version = "0.2.1"
+version = "0.2.5"
description = "LEANN - The smallest vector index in the world. RAG Everything with LEANN!"
readme = "README.md"
requires-python = ">=3.9"