diff --git a/README.md b/README.md
index 868c994..1f8c46c 100755
--- a/README.md
+++ b/README.md
@@ -18,6 +18,8 @@ LEANN achieves this through *graph-based selective recomputation* with *high-deg
 
 **Ready to RAG Everything?** Transform your laptop into a personal AI assistant that can search your **[file system](#-personal-data-manager-process-any-documents-pdf-txt-md)**, **[emails](#-your-personal-email-secretary-rag-on-apple-mail)**, **[browser history](#-time-machine-for-the-web-rag-your-entire-browser-history)**, **[chat history](#-wechat-detective-unlock-your-golden-memories)**, or external knowledge bases (i.e., 60M documents) - all on your laptop, with zero cloud costs and complete privacy.
 
+> **🚀 NEW: Claude Code Integration!** LEANN now provides native MCP integration for Claude Code users. Index your codebase and get intelligent code assistance directly in Claude Code. [Setup Guide →](packages/leann-mcp/README.md)
+
 ## Why LEANN?
@@ -428,7 +430,7 @@ source .venv/bin/activate
 leann --help
 ```
 
-**To make it globally available (recommended for daily use):**
+**To make it globally available:**
 ```bash
 # Install the LEANN CLI globally using uv tool
 uv tool install leann
@@ -437,12 +439,17 @@ uv tool install leann
 leann --help
 ```
 
+> **Note**: Global installation is required for Claude Code integration. The `leann_mcp` server depends on the globally available `leann` command.
+
 ### Usage Examples
 
 ```bash
-# Build an index from documents
+# Build an index from current directory (default)
+leann build my-docs
+
+# Or from specific directory
 leann build my-docs --docs ./documents
 
 # Search your documents
diff --git a/assets/claude_code_leann.png b/assets/claude_code_leann.png
new file mode 100644
index 0000000..12894ef
Binary files /dev/null and b/assets/claude_code_leann.png differ
diff --git a/docs/claude-code-integration.md b/docs/claude-code-integration.md
new file mode 100644
index 0000000..e19adfb
--- /dev/null
+++ b/docs/claude-code-integration.md
@@ -0,0 +1,150 @@
+# Claude Code x LEANN Integration Guide
+
+## ✅ Status: It Already Works!
+
+Good news: the LEANN CLI already works in Claude Code as-is, with no modifications needed!
+
+## 🚀 Getting Started
+
+### 1. Activate the Environment
+```bash
+# From the LEANN project directory
+source .venv/bin/activate.fish  # fish shell
+# or
+source .venv/bin/activate       # bash shell
+```
+
+### 2. Basic Commands
+
+#### List Existing Indexes
+```bash
+leann list
+```
+
+#### Search Documents
+```bash
+leann search my-docs "machine learning" --recompute-embeddings
+```
+
+#### Ask Questions
+```bash
+echo "What is machine learning?" | leann ask my-docs --llm ollama --model qwen3:8b --recompute-embeddings
+```
+
+#### Build a New Index
+```bash
+leann build project-docs --docs ./src --recompute-embeddings
+```
+
+## 💡 Tips for Using Claude Code
+
+### Running Directly in Claude Code
+
+1. **Activate the environment**:
+   ```bash
+   cd /Users/andyl/Projects/LEANN-RAG
+   source .venv/bin/activate.fish
+   ```
+
+2. **Search the codebase**:
+   ```bash
+   leann search my-docs "authentication patterns" --recompute-embeddings --top-k 10
+   ```
+
+3. **Ask questions**:
+   ```bash
+   echo "How does the authentication system work?" | leann ask my-docs --llm ollama --model qwen3:8b --recompute-embeddings
+   ```
+
+### Batch Operation Examples
+
+```bash
+# Build an index of the project docs
+leann build project-docs --docs ./docs --force
+
+# Search several keywords
+leann search project-docs "API authentication" --recompute-embeddings
+leann search project-docs "database schema" --recompute-embeddings
+leann search project-docs "deployment guide" --recompute-embeddings
+
+# Q&A mode
+echo "What are the API endpoints?" | leann ask project-docs --recompute-embeddings
+```
+
+## 🎯 Workflows Claude Can Run Right Away
+
+### Code Analysis Workflow
+```bash
+# 1. Build an index of the codebase
+leann build codebase --docs ./src --backend hnsw --recompute-embeddings
+
+# 2. Analyze the architecture
+echo "What is the overall architecture?" | leann ask codebase --recompute-embeddings
+
+# 3. Find a specific feature
+leann search codebase "user authentication" --recompute-embeddings --top-k 5
+
+# 4. Understand implementation details
+echo "How is user authentication implemented?" | leann ask codebase --recompute-embeddings
+```
+
+### Documentation Workflow
+```bash
+# 1. Index the project docs
+leann build docs --docs ./docs --recompute-embeddings
+
+# 2. Look up information quickly
+leann search docs "installation requirements" --recompute-embeddings
+
+# 3. Get a detailed explanation
+echo "What are the system requirements?" | leann ask docs --recompute-embeddings
+```
+
+## ⚠️ Important Notes
+
+1. **Always pass `--recompute-embeddings`** - this flag is required; commands fail without it
+2. **Activate the virtual environment first** - make sure the LEANN Python environment is available
+3. **Ollama must be installed beforehand** - the `ask` command needs a local LLM
+
+## 🔥 A Ready-to-Use Claude Prompt
+
+```
+Help me analyze this codebase using LEANN:
+
+1. First, activate the environment:
+   cd /Users/andyl/Projects/LEANN-RAG && source .venv/bin/activate.fish
+
+2. Build an index of the source code:
+   leann build codebase --docs ./src --recompute-embeddings
+
+3. Search for authentication patterns:
+   leann search codebase "authentication middleware" --recompute-embeddings --top-k 10
+
+4. Ask about the authentication system:
+   echo "How does user authentication work in this codebase?" | leann ask codebase --recompute-embeddings
+
+Please execute these commands and help me understand the code structure.
+```
+
+## 📈 Planned Improvements
+
+It already works today, but there is room for further polish:
+
+1. **Simpler commands** - enable recompute-embeddings by default
+2. **Config file** - avoid retyping the same arguments
+3. **State management** - detect the environment and indexes automatically
+4. **Output format** - a format that is easier for Claude to parse
+
+These are all nice-to-haves, though; you can start using it right now!
+
+## 🎉 Summary
+
+**LEANN already works perfectly in Claude Code!**
+
+- ✅ Search works
+- ✅ RAG question answering works
+- ✅ Index building works
+- ✅ Multiple data sources are supported
+- ✅ Local LLMs are supported
+
+Just remember to add the `--recompute-embeddings` flag!
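Reviewer sketch (not part of the commit): the four-step code-analysis workflow in `docs/claude-code-integration.md` could also be scripted rather than typed by hand. The Python below is a minimal illustration only. It assumes `leann` is on `PATH` and accepts exactly the flags shown in that guide; the helper names `code_analysis_steps` and `run_steps` are hypothetical, and it defaults to a dry run that prints the commands without executing them.

```python
import shlex
import subprocess
from typing import List, Optional, Tuple

# Flag that the guide attaches to every build/search/ask call.
RECOMPUTE = "--recompute-embeddings"

# One workflow step: (argv, text to pipe on stdin or None).
Step = Tuple[List[str], Optional[str]]


def code_analysis_steps(index: str, src_dir: str) -> List[Step]:
    """Return the guide's four code-analysis steps as (argv, stdin) pairs."""
    return [
        (["leann", "build", index, "--docs", src_dir, "--backend", "hnsw", RECOMPUTE], None),
        (["leann", "ask", index, RECOMPUTE], "What is the overall architecture?\n"),
        (["leann", "search", index, "user authentication", RECOMPUTE, "--top-k", "5"], None),
        (["leann", "ask", index, RECOMPUTE], "How is user authentication implemented?\n"),
    ]


def run_steps(steps: List[Step], dry_run: bool = True) -> List[str]:
    """Print each step as a shell-style line; execute it only if dry_run is False."""
    shown = []
    for argv, stdin_text in steps:
        line = shlex.join(argv)
        if stdin_text is not None:
            # Display form only: mirrors the `echo ... | leann ask` style of the guide.
            line = f"echo {shlex.quote(stdin_text.strip())} | {line}"
        shown.append(line)
        print(line)
        if not dry_run:
            # No shell involved: the question is passed on stdin.
            subprocess.run(argv, input=stdin_text, text=True, check=True)
    return shown


if __name__ == "__main__":
    run_steps(code_analysis_steps("codebase", "./src"))
```

Keeping each step as an argument list avoids shell quoting issues when a query contains spaces or quotes; whether a dry-run default is appropriate depends on how the script would be used.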
diff --git a/packages/leann-backend-diskann/third_party/DiskANN b/packages/leann-backend-diskann/third_party/DiskANN
index af2a264..67a2611 160000
--- a/packages/leann-backend-diskann/third_party/DiskANN
+++ b/packages/leann-backend-diskann/third_party/DiskANN
@@ -1 +1 @@
-Subproject commit af2a26481e65232b57b82d96e68833cdee9f7635
+Subproject commit 67a2611ad14bc11d84dfdb554c5567cfb78a2656
diff --git a/packages/leann-core/pyproject.toml b/packages/leann-core/pyproject.toml
index 7078457..e7d178d 100644
--- a/packages/leann-core/pyproject.toml
+++ b/packages/leann-core/pyproject.toml
@@ -44,6 +44,7 @@ colab = [
 
 [project.scripts]
 leann = "leann.cli:main"
+leann_mcp = "leann.mcp:main"
 
 [tool.setuptools.packages.find]
 where = ["src"]
diff --git a/packages/leann-core/src/leann/cli.py b/packages/leann-core/src/leann/cli.py
index b239b2a..489c5d1 100644
--- a/packages/leann-core/src/leann/cli.py
+++ b/packages/leann-core/src/leann/cli.py
@@ -41,13 +41,23 @@ def extract_pdf_text_with_pdfplumber(file_path: str) -> str:
 
 class LeannCLI:
     def __init__(self):
-        self.indexes_dir = Path.home() / ".leann" / "indexes"
+        # Always use project-local .leann directory (like .git)
+        self.indexes_dir = Path.cwd() / ".leann" / "indexes"
         self.indexes_dir.mkdir(parents=True, exist_ok=True)
 
+        # Default parser for documents
         self.node_parser = SentenceSplitter(
             chunk_size=256, chunk_overlap=128, separator=" ", paragraph_separator="\n\n"
         )
 
+        # Code-optimized parser
+        self.code_parser = SentenceSplitter(
+            chunk_size=512,  # Larger chunks for code context
+            chunk_overlap=50,  # Less overlap to preserve function boundaries
+            separator="\n",  # Split by lines for code
+            paragraph_separator="\n\n",  # Preserve logical code blocks
+        )
+
     def get_index_path(self, index_name: str) -> str:
         index_dir = self.indexes_dir / index_name
         return str(index_dir / "documents.leann")
@@ -76,7 +86,9 @@ Examples:
         # Build command
         build_parser = subparsers.add_parser("build", help="Build document index")
         build_parser.add_argument("index_name", help="Index name")
-        build_parser.add_argument("--docs", type=str, required=True, help="Documents directory")
+        build_parser.add_argument(
+            "--docs", type=str, default=".", help="Documents directory (default: current directory)"
+        )
         build_parser.add_argument(
             "--backend", type=str, default="hnsw", choices=["hnsw", "diskann"]
         )
@@ -138,37 +150,109 @@ Examples:
 
         return parser
 
+    def register_project_dir(self):
+        """Register current project directory in global registry"""
+        global_registry = Path.home() / ".leann" / "projects.json"
+        global_registry.parent.mkdir(exist_ok=True)
+
+        current_dir = str(Path.cwd())
+
+        # Load existing registry
+        projects = []
+        if global_registry.exists():
+            try:
+                import json
+
+                with open(global_registry) as f:
+                    projects = json.load(f)
+            except Exception:
+                projects = []
+
+        # Add current directory if not already present
+        if current_dir not in projects:
+            projects.append(current_dir)
+
+        # Save registry
+        import json
+
+        with open(global_registry, "w") as f:
+            json.dump(projects, f, indent=2)
+
     def list_indexes(self):
         print("Stored LEANN indexes:")
 
-        if not self.indexes_dir.exists():
+        # Get all project directories with .leann
+        global_registry = Path.home() / ".leann" / "projects.json"
+        all_projects = []
+
+        if global_registry.exists():
+            try:
+                import json
+
+                with open(global_registry) as f:
+                    all_projects = json.load(f)
+            except Exception:
+                pass
+
+        # Filter to only existing directories with .leann
+        valid_projects = []
+        for project_dir in all_projects:
+            project_path = Path(project_dir)
+            if project_path.exists() and (project_path / ".leann" / "indexes").exists():
+                valid_projects.append(project_path)
+
+        # Add current project if it has .leann but not in registry
+        current_path = Path.cwd()
+        if (current_path / ".leann" / "indexes").exists() and current_path not in valid_projects:
+            valid_projects.append(current_path)
+
+        if not valid_projects:
             print("No indexes found. Use 'leann build <name> --docs <dir>' to create one.")
             return
 
-        index_dirs = [d for d in self.indexes_dir.iterdir() if d.is_dir()]
+        total_indexes = 0
+        current_dir = Path.cwd()
 
-        if not index_dirs:
-            print("No indexes found. Use 'leann build <name> --docs <dir>' to create one.")
-            return
+        for project_path in valid_projects:
+            indexes_dir = project_path / ".leann" / "indexes"
+            if not indexes_dir.exists():
+                continue
 
-        print(f"Found {len(index_dirs)} indexes:")
-        for i, index_dir in enumerate(index_dirs, 1):
-            index_name = index_dir.name
-            status = "✓" if self.index_exists(index_name) else "✗"
+            index_dirs = [d for d in indexes_dir.iterdir() if d.is_dir()]
+            if not index_dirs:
+                continue
 
-            print(f" {i}. {index_name} [{status}]")
-            if self.index_exists(index_name):
-                index_dir / "documents.leann.meta.json"
-                size_mb = sum(f.stat().st_size for f in index_dir.iterdir() if f.is_file()) / (
-                    1024 * 1024
-                )
-                print(f" Size: {size_mb:.1f} MB")
+            # Show project header
+            if project_path == current_dir:
+                print(f"\n📁 Current project ({project_path}):")
+            else:
+                print(f"\n📂 {project_path}:")
 
-        if index_dirs:
-            example_name = index_dirs[0].name
-            print("\nUsage:")
-            print(f' leann search {example_name} "your query"')
-            print(f" leann ask {example_name} --interactive")
+            for index_dir in index_dirs:
+                total_indexes += 1
+                index_name = index_dir.name
+                meta_file = index_dir / "documents.leann.meta.json"
+                status = "✓" if meta_file.exists() else "✗"
+
+                print(f" {total_indexes}. {index_name} [{status}]")
+                if status == "✓":
+                    size_mb = sum(f.stat().st_size for f in index_dir.iterdir() if f.is_file()) / (
+                        1024 * 1024
+                    )
+                    print(f" Size: {size_mb:.1f} MB")
+
+        if total_indexes > 0:
+            print(f"\nTotal: {total_indexes} indexes across {len(valid_projects)} projects")
+            print("\nUsage (current project only):")
+
+            # Show example from current project
+            current_indexes_dir = current_dir / ".leann" / "indexes"
+            if current_indexes_dir.exists():
+                current_index_dirs = [d for d in current_indexes_dir.iterdir() if d.is_dir()]
+                if current_index_dirs:
+                    example_name = current_index_dirs[0].name
+                    print(f' leann search {example_name} "your query"')
+                    print(f" leann ask {example_name} --interactive")
 
     def load_documents(self, docs_dir: str):
         print(f"Loading documents from {docs_dir}...")
@@ -203,17 +287,125 @@ Examples:
                 documents.extend(default_docs)
 
         # Load other file types with default reader
+        code_extensions = [
+            # Original document types
+            ".txt",
+            ".md",
+            ".docx",
+            # Code files for Claude Code integration
+            ".py",
+            ".js",
+            ".ts",
+            ".jsx",
+            ".tsx",
+            ".java",
+            ".cpp",
+            ".c",
+            ".h",
+            ".hpp",
+            ".cs",
+            ".go",
+            ".rs",
+            ".rb",
+            ".php",
+            ".swift",
+            ".kt",
+            ".scala",
+            ".r",
+            ".sql",
+            ".sh",
+            ".bash",
+            ".zsh",
+            ".fish",
+            ".ps1",
+            ".bat",
+            # Config and markup files
+            ".json",
+            ".yaml",
+            ".yml",
+            ".xml",
+            ".toml",
+            ".ini",
+            ".cfg",
+            ".conf",
+            ".html",
+            ".css",
+            ".scss",
+            ".less",
+            ".vue",
+            ".svelte",
+            # Data science
+            ".ipynb",
+            ".R",
+            ".py",
+            ".jl",
+        ]
         other_docs = SimpleDirectoryReader(
             docs_dir,
             recursive=True,
             encoding="utf-8",
-            required_exts=[".txt", ".md", ".docx"],
+            required_exts=code_extensions,
         ).load_data(show_progress=True)
         documents.extend(other_docs)
 
         all_texts = []
+
+        # Define code file extensions for intelligent chunking
+        code_file_exts = {
+            ".py",
+            ".js",
+            ".ts",
+            ".jsx",
+            ".tsx",
+            ".java",
+            ".cpp",
+            ".c",
+            ".h",
+            ".hpp",
+            ".cs",
+            ".go",
+            ".rs",
+            ".rb",
+            ".php",
+            ".swift",
+            ".kt",
+            ".scala",
+            ".r",
+            ".sql",
+            ".sh",
+            ".bash",
+            ".zsh",
+            ".fish",
+            ".ps1",
+            ".bat",
+            ".json",
+            ".yaml",
+            ".yml",
+            ".xml",
+            ".toml",
+            ".ini",
+            ".cfg",
+            ".conf",
+            ".html",
+            ".css",
+            ".scss",
+            ".less",
+            ".vue",
+            ".svelte",
+            ".ipynb",
+            ".R",
+            ".jl",
+        }
+
         for doc in documents:
-            nodes = self.node_parser.get_nodes_from_documents([doc])
+            # Check if this is a code file based on source path
+            source_path = doc.metadata.get("source", "")
+            is_code_file = any(source_path.endswith(ext) for ext in code_file_exts)
+
+            # Use appropriate parser based on file type
+            parser = self.code_parser if is_code_file else self.node_parser
+            nodes = parser.get_nodes_from_documents([doc])
+
             for node in nodes:
                 all_texts.append(node.get_content())
 
@@ -226,6 +418,8 @@ Examples:
         index_dir = self.indexes_dir / index_name
         index_path = self.get_index_path(index_name)
 
+        print(f"📂 Indexing: {Path(docs_dir).resolve()}")
+
         if index_dir.exists() and not args.force:
             print(f"Index '{index_name}' already exists. Use --force to rebuild.")
             return
@@ -255,6 +449,9 @@ Examples:
         builder.build_index(index_path)
         print(f"Index built at {index_path}")
 
+        # Register this project directory in global registry
+        self.register_project_dir()
+
     async def search_documents(self, args):
         index_name = args.index_name
         query = args.query
diff --git a/packages/leann-core/src/leann/mcp.py b/packages/leann-core/src/leann/mcp.py
new file mode 100755
index 0000000..6de6750
--- /dev/null
+++ b/packages/leann-core/src/leann/mcp.py
@@ -0,0 +1,134 @@
+#!/usr/bin/env python3
+
+import json
+import os
+import subprocess
+import sys
+
+
+def handle_request(request):
+    if request.get("method") == "initialize":
+        return {
+            "jsonrpc": "2.0",
+            "id": request.get("id"),
+            "result": {
+                "capabilities": {"tools": {}},
+                "protocolVersion": "2024-11-05",
+                "serverInfo": {"name": "leann-mcp", "version": "1.0.0"},
+            },
+        }
+
+    elif request.get("method") == "tools/list":
+        return {
+            "jsonrpc": "2.0",
+            "id": request.get("id"),
+            "result": {
+                "tools": [
+                    {
+                        "name": "leann_search",
+                        "description": "Search LEANN index",
+                        "inputSchema": {
+                            "type": "object",
+                            "properties": {
+                                "index_name": {"type": "string"},
+                                "query": {"type": "string"},
+                                "top_k": {"type": "integer", "default": 5},
+                            },
+                            "required": ["index_name", "query"],
+                        },
+                    },
+                    {
+                        "name": "leann_ask",
+                        "description": "Ask question using LEANN RAG",
+                        "inputSchema": {
+                            "type": "object",
+                            "properties": {
+                                "index_name": {"type": "string"},
+                                "question": {"type": "string"},
+                            },
+                            "required": ["index_name", "question"],
+                        },
+                    },
+                    {
+                        "name": "leann_list",
+                        "description": "List all LEANN indexes",
+                        "inputSchema": {"type": "object", "properties": {}},
+                    },
+                ]
+            },
+        }
+
+    elif request.get("method") == "tools/call":
+        tool_name = request["params"]["name"]
+        args = request["params"].get("arguments", {})
+
+        # Set working directory and environment
+        env = os.environ.copy()
+        cwd = "/Users/andyl/Projects/LEANN-RAG"
+
+        try:
+            if tool_name == "leann_search":
+                cmd = [
+                    "leann",
+                    "search",
+                    args["index_name"],
+                    args["query"],
+                    "--recompute-embeddings",
+                    f"--top-k={args.get('top_k', 5)}",
+                ]
+                result = subprocess.run(cmd, capture_output=True, text=True, cwd=cwd, env=env)
+
+            elif tool_name == "leann_ask":
+                cmd = f'echo "{args["question"]}" | leann ask {args["index_name"]} --recompute-embeddings --llm ollama --model qwen3:8b'
+                result = subprocess.run(
+                    cmd, shell=True, capture_output=True, text=True, cwd=cwd, env=env
+                )
+
+            elif tool_name == "leann_list":
+                result = subprocess.run(
+                    ["leann", "list"], capture_output=True, text=True, cwd=cwd, env=env
+                )
+
+            return {
+                "jsonrpc": "2.0",
+                "id": request.get("id"),
+                "result": {
+                    "content": [
+                        {
+                            "type": "text",
+                            "text": result.stdout
+                            if result.returncode == 0
+                            else f"Error: {result.stderr}",
+                        }
+                    ]
+                },
+            }
+
+        except Exception as e:
+            return {
+                "jsonrpc": "2.0",
+                "id": request.get("id"),
+                "error": {"code": -1, "message": str(e)},
+            }
+
+
+def main():
+    for line in sys.stdin:
+        try:
+            request = json.loads(line.strip())
+            response = handle_request(request)
+            if response:
+                print(json.dumps(response))
+                sys.stdout.flush()
+        except Exception as e:
+            error_response = {
+                "jsonrpc": "2.0",
+                "id": None,
+                "error": {"code": -1, "message": str(e)},
+            }
+            print(json.dumps(error_response))
+            sys.stdout.flush()
+
+
+if __name__ == "__main__":
+    main()
diff --git a/packages/leann-mcp/README.md b/packages/leann-mcp/README.md
new file mode 100644
index 0000000..bcda6a0
--- /dev/null
+++ b/packages/leann-mcp/README.md
@@ -0,0 +1,69 @@
+# LEANN Claude Code Integration
+
+Intelligent code assistance using LEANN's vector search directly in Claude Code.
+
+## Prerequisites
+
+First, install LEANN CLI globally:
+
+```bash
+uv tool install leann
+```
+
+This makes the `leann` command available system-wide, which `leann_mcp` requires.
+
+## Quick Setup
+
+Add the LEANN MCP server to Claude Code:
+
+```bash
+claude mcp add leann-server -- leann_mcp
+```
+
+## Available Tools
+
+- **`leann_list`** - List available indexes across all projects
+- **`leann_search`** - Search code and documents with semantic queries
+- **`leann_ask`** - Ask questions and get AI-powered answers from your codebase
+
+## Quick Start
+
+```bash
+# Build an index for your project
+leann build my-project
+
+# Start Claude Code
+claude
+```
+
+Then in Claude Code:
+```
+Help me understand this codebase. List available indexes and search for authentication patterns.
+```
+
+[Figure: LEANN in Claude Code]
+
+## How It Works
+
+- **`leann`** - Core CLI tool for indexing and searching (installed globally)
+- **`leann_mcp`** - MCP server that wraps `leann` commands for Claude Code integration
+- Claude Code calls `leann_mcp`, which executes `leann` commands and returns results
+
+## File Support
+
+Python, JavaScript, TypeScript, Java, Go, Rust, SQL, YAML, JSON, and 30+ more file types.
+
+## Storage
+
+- Project indexes in `.leann/` directory (like `.git`)
+- Global project registry at `~/.leann/projects.json`
+- Multi-project support built-in
+
+## Removing
+
+```bash
+claude mcp remove leann-server
+```
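
Reviewer sketch (not part of the commit): in `mcp.py`, the `leann_ask` branch of `handle_request` interpolates the question into a shell string and runs it with `shell=True`, so a question containing quotes or `;` can change the command that executes. One possible alternative is to pass an argument list and feed the question on stdin. The piped `echo ... | leann ask` examples in this diff suggest `leann ask` reads the question from stdin, but that has not been verified against the CLI here. The names `build_ask_argv` and `run_ask`, and the injectable `runner` parameter, are hypothetical.

```python
import subprocess
from typing import List, Optional


def build_ask_argv(index_name: str, llm: str = "ollama", model: str = "qwen3:8b") -> List[str]:
    """Argument list for `leann ask`, mirroring the flags used in mcp.py."""
    return [
        "leann",
        "ask",
        index_name,
        "--recompute-embeddings",
        "--llm",
        llm,
        "--model",
        model,
    ]


def run_ask(question: str, index_name: str, cwd: Optional[str] = None, runner=subprocess.run):
    """Send the question on stdin instead of building an `echo ... |` shell string."""
    return runner(
        build_ask_argv(index_name),
        input=question + "\n",  # echo also appends a trailing newline
        capture_output=True,
        text=True,
        cwd=cwd,  # None inherits the caller's working directory
    )
```

Leaving `cwd` as `None` would also let the server run from whichever project Claude Code was started in, rather than from one fixed path; that is a design suggestion, not something the commit does.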