Merge remote-tracking branch 'origin/main' into feature/claude-code-research

feat: Add Claude Code integration with MCP server
feat: Claude Code integration ready - LEANN CLI works out of the box
2025-08-05 23:02:00 -07:00 · 2025-08-05 14:03:36 -07:00 · 2025-08-05 12:27:58 -07:00 · 2025-08-04 20:10:14 -07:00 · 2025-08-04 20:01:23 -07:00 · 2025-08-04 19:29:17 -07:00
8 changed files with 586 additions and 28 deletions
--- a/README.md
+++ b/README.md
@@ -18,6 +18,8 @@ LEANN achieves this through *graph-based selective recomputation* with *high-deg
 **Ready to RAG Everything?** Transform your laptop into a personal AI assistant that can search your **[file system](#-personal-data-manager-process-any-documents-pdf-txt-md)**, **[emails](#-your-personal-email-secretary-rag-on-apple-mail)**, **[browser history](#-time-machine-for-the-web-rag-your-entire-browser-history)**, **[chat history](#-wechat-detective-unlock-your-golden-memories)**, or external knowledge bases (i.e., 60M documents) - all on your laptop, with zero cloud costs and complete privacy.
 > **🚀 NEW: Claude Code Integration!** LEANN now provides native MCP integration for Claude Code users. Index your codebase and get intelligent code assistance directly in Claude Code. [Setup Guide →](packages/leann-mcp/README.md)
 ## Why LEANN?
@@ -428,7 +430,7 @@ source .venv/bin/activate
 leann --help
 ```
-**To make it globally available (recommended for daily use):**
+**To make it globally available:**
 ```bash
 # Install the LEANN CLI globally using uv tool
 uv tool install leann
@@ -437,12 +439,17 @@ uv tool install leann
 leann --help
 ```
 > **Note**: Global installation is required for Claude Code integration. The `leann_mcp` server depends on the globally available `leann` command.
 ### Usage Examples
 ```bash
-# Build an index from documents
+# Build an index from current directory (default)
 leann build my-docs
 # Or from specific directory
 leann build my-docs --docs ./documents
 # Search your documents
--- a/assets/claude_code_leann.png
+++ b/assets/claude_code_leann.png
--- a/docs/claude-code-integration.md
+++ b/docs/claude-code-integration.md
@@ -0,0 +1,150 @@
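 # Claude Code x LEANN Integration Guide
 ## ✅ Status: Already Working!
 Good news: the LEANN CLI already works in Claude Code as-is, with no modifications required!
 ## 🚀 Get Started Now
 ### 1. Activate the environment
 ```bash
 # From the LEANN project directory
 source .venv/bin/activate.fish  # fish shell
 # or
 source .venv/bin/activate       # bash shell
 ```
 ### 2. Basic commands
 #### List existing indexes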
 ```bash
 leann list
 ```
 #### Search documents
 ```bash
 leann search my-docs "machine learning" --recompute-embeddings
 ```
 #### Ask questions (Q&A)
 ```bash
 echo "What is machine learning?" | leann ask my-docs --llm ollama --model qwen3:8b --recompute-embeddings
 ```
 #### Build a new index
 ```bash
 leann build project-docs --docs ./src --recompute-embeddings
 ```
 ## 💡 Tips for Using Claude Code
 ### Use directly in Claude Code
 1. **Activate the environment**:
   ```bash
   cd /Users/andyl/Projects/LEANN-RAG
   source .venv/bin/activate.fish
   ```
 2. **Search the codebase**:
   ```bash
   leann search my-docs "authentication patterns" --recompute-embeddings --top-k 10
   ```
 3. **Ask questions about the code**:
   ```bash
   echo "How does the authentication system work?" | leann ask my-docs --llm ollama --model qwen3:8b --recompute-embeddings
   ```
 ### Batch operation examples
 ```bash
 # Build an index of the project docs
 leann build project-docs --docs ./docs --force
 # Search for several keywords
 leann search project-docs "API authentication" --recompute-embeddings
 leann search project-docs "database schema" --recompute-embeddings
 leann search project-docs "deployment guide" --recompute-embeddings
 # Q&A mode
 echo "What are the API endpoints?" | leann ask project-docs --recompute-embeddings
 ```
 ## 🎯 Workflows Claude Can Run Right Away
 ### Code analysis workflow
 ```bash
 # 1. Build a codebase index
 leann build codebase --docs ./src --backend hnsw --recompute-embeddings
 # 2. Analyze the architecture
 echo "What is the overall architecture?" | leann ask codebase --recompute-embeddings
 # 3. Find a specific feature
 leann search codebase "user authentication" --recompute-embeddings --top-k 5
 # 4. Understand implementation details
 echo "How is user authentication implemented?" | leann ask codebase --recompute-embeddings
 ```
 ### Documentation comprehension workflow
 ```bash
 # 1. Index the project docs
 leann build docs --docs ./docs --recompute-embeddings
 # 2. Look up information quickly
 leann search docs "installation requirements" --recompute-embeddings
 # 3. Get a detailed explanation
 echo "What are the system requirements?" | leann ask docs --recompute-embeddings
 ```
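 ## ⚠️ Important Notes
 1. **You must pass `--recompute-embeddings`** - this is the key flag; omitting it causes an error
 2. **Activate the virtual environment first** - make sure the LEANN Python environment is available
 3. **Ollama must be installed beforehand** - the ask feature needs a local LLM
 ## 🔥 Ready-to-Use Claude Prompt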
 ```
 Help me analyze this codebase using LEANN:
 1. First, activate the environment:
   cd /Users/andyl/Projects/LEANN-RAG && source .venv/bin/activate.fish
 2. Build an index of the source code:
   leann build codebase --docs ./src --recompute-embeddings
 3. Search for authentication patterns:
   leann search codebase "authentication middleware" --recompute-embeddings --top-k 10
 4. Ask about the authentication system:
   echo "How does user authentication work in this codebase?" | leann ask codebase --recompute-embeddings
 Please execute these commands and help me understand the code structure.
 ```
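 ## 📈 Planned Improvements
 Although it already works, there is room for further optimization:
 1. **Simpler commands** - enable recompute-embeddings by default
 2. **Config file** - avoid retyping the same parameters
 3. **State management** - auto-detect the environment and indexes
 4. **Output format** - a format that is easier for Claude to parse
 These are all nice-to-haves, though; it is usable today!
 ## 🎉 Summary
 **LEANN works in Claude Code today!**
 - ✅ Search works
 - ✅ RAG question answering works
 - ✅ Index building works
 - ✅ Multiple data sources are supported
 - ✅ Local LLMs are supported
 Just remember to add the `--recompute-embeddings` flag!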
--- a/packages/leann-backend-diskann/third_party/DiskANN
+++ b/packages/leann-backend-diskann/third_party/DiskANN
--- a/packages/leann-core/pyproject.toml
+++ b/packages/leann-core/pyproject.toml
@@ -44,6 +44,7 @@ colab = [
 [project.scripts]
 leann = "leann.cli:main"
 leann_mcp = "leann.mcp:main"
 [tool.setuptools.packages.find]
 where = ["src"]
--- a/packages/leann-core/src/leann/cli.py
+++ b/packages/leann-core/src/leann/cli.py
@@ -41,13 +41,23 @@ def extract_pdf_text_with_pdfplumber(file_path: str) -> str:
 class LeannCLI:
    def __init__(self):
-        self.indexes_dir = Path.home() / ".leann" / "indexes"
+        # Always use project-local .leann directory (like .git)
        self.indexes_dir = Path.cwd() / ".leann" / "indexes"
        self.indexes_dir.mkdir(parents=True, exist_ok=True)
        # Default parser for documents
        self.node_parser = SentenceSplitter(
            chunk_size=256, chunk_overlap=128, separator=" ", paragraph_separator="\n\n"
        )
        # Code-optimized parser
        self.code_parser = SentenceSplitter(
            chunk_size=512,  # Larger chunks for code context
            chunk_overlap=50,  # Less overlap to preserve function boundaries
            separator="\n",  # Split by lines for code
            paragraph_separator="\n\n",  # Preserve logical code blocks
        )
    def get_index_path(self, index_name: str) -> str:
        index_dir = self.indexes_dir / index_name
        return str(index_dir / "documents.leann")
@@ -76,7 +86,9 @@ Examples:
        # Build command
        build_parser = subparsers.add_parser("build", help="Build document index")
        build_parser.add_argument("index_name", help="Index name")
-        build_parser.add_argument("--docs", type=str, required=True, help="Documents directory")
+        build_parser.add_argument(
            "--docs", type=str, default=".", help="Documents directory (default: current directory)"
        )
        build_parser.add_argument(
            "--backend", type=str, default="hnsw", choices=["hnsw", "diskann"]
        )
@@ -138,35 +150,107 @@ Examples:
        return parser
    def register_project_dir(self):
        """Register current project directory in global registry"""
        global_registry = Path.home() / ".leann" / "projects.json"
        global_registry.parent.mkdir(exist_ok=True)
        current_dir = str(Path.cwd())
        # Load existing registry
        projects = []
        if global_registry.exists():
            try:
                import json
                with open(global_registry) as f:
                    projects = json.load(f)
            except Exception:
                projects = []
        # Add current directory if not already present
        if current_dir not in projects:
            projects.append(current_dir)
        # Save registry
        import json
        with open(global_registry, "w") as f:
            json.dump(projects, f, indent=2)
    def list_indexes(self):
        print("Stored LEANN indexes:")
-        if not self.indexes_dir.exists():
+        # Get all project directories with .leann
        global_registry = Path.home() / ".leann" / "projects.json"
        all_projects = []
        if global_registry.exists():
            try:
                import json
                with open(global_registry) as f:
                    all_projects = json.load(f)
            except Exception:
                pass
        # Filter to only existing directories with .leann
        valid_projects = []
        for project_dir in all_projects:
            project_path = Path(project_dir)
            if project_path.exists() and (project_path / ".leann" / "indexes").exists():
                valid_projects.append(project_path)
        # Add current project if it has .leann but not in registry
        current_path = Path.cwd()
        if (current_path / ".leann" / "indexes").exists() and current_path not in valid_projects:
            valid_projects.append(current_path)
        if not valid_projects:
            print("No indexes found. Use 'leann build <name> --docs <dir>' to create one.")
            return
-        index_dirs = [d for d in self.indexes_dir.iterdir() if d.is_dir()]
+        total_indexes = 0
        current_dir = Path.cwd()
        for project_path in valid_projects:
            indexes_dir = project_path / ".leann" / "indexes"
            if not indexes_dir.exists():
                continue
            index_dirs = [d for d in indexes_dir.iterdir() if d.is_dir()]
            if not index_dirs:
-            print("No indexes found. Use 'leann build <name> --docs <dir>' to create one.")
+                continue
-            return
-        print(f"Found {len(index_dirs)} indexes:")
+            # Show project header
-        for i, index_dir in enumerate(index_dirs, 1):
+            if project_path == current_dir:
                print(f"\n📁 Current project ({project_path}):")
            else:
                print(f"\n📂 {project_path}:")
            for index_dir in index_dirs:
                total_indexes += 1
                index_name = index_dir.name
-            status = "✓" if self.index_exists(index_name) else "✗"
+                meta_file = index_dir / "documents.leann.meta.json"
                status = "✓" if meta_file.exists() else "✗"
-            print(f"  {i}. {index_name} [{status}]")
+                print(f"  {total_indexes}. {index_name} [{status}]")
-            if self.index_exists(index_name):
+                if status == "✓":
-                index_dir / "documents.leann.meta.json"
                    size_mb = sum(f.stat().st_size for f in index_dir.iterdir() if f.is_file()) / (
                        1024 * 1024
                    )
                    print(f"     Size: {size_mb:.1f} MB")
-        if index_dirs:
+        if total_indexes > 0:
-            example_name = index_dirs[0].name
+            print(f"\nTotal: {total_indexes} indexes across {len(valid_projects)} projects")
-            print("\nUsage:")
+            print("\nUsage (current project only):")
            # Show example from current project
            current_indexes_dir = current_dir / ".leann" / "indexes"
            if current_indexes_dir.exists():
                current_index_dirs = [d for d in current_indexes_dir.iterdir() if d.is_dir()]
                if current_index_dirs:
                    example_name = current_index_dirs[0].name
                    print(f'  leann search {example_name} "your query"')
                    print(f"  leann ask {example_name} --interactive")
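The registry logic in this hunk (record each project directory once, then list only the projects that still have a `.leann/indexes` directory) can be tried in isolation. The following is a minimal sketch under stated assumptions, not the code from `cli.py`: the helper names `register_project` and `projects_with_indexes` are hypothetical, and the registry lives in a temporary directory rather than `~/.leann/projects.json`.

```python
import json
import tempfile
from pathlib import Path


def register_project(registry: Path, project_dir: Path) -> list[str]:
    """Append project_dir to the JSON registry unless it is already listed."""
    registry.parent.mkdir(parents=True, exist_ok=True)
    try:
        projects = json.loads(registry.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        projects = []  # missing or corrupt registry: start from an empty list
    entry = str(project_dir)
    if entry not in projects:
        projects.append(entry)
    registry.write_text(json.dumps(projects, indent=2))
    return projects


def projects_with_indexes(registry: Path) -> list[Path]:
    """Return registered projects that still contain a .leann/indexes directory."""
    try:
        projects = json.loads(registry.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return []
    return [Path(p) for p in projects if (Path(p) / ".leann" / "indexes").is_dir()]


# Demo in a throwaway directory so nothing under the real home directory is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    registry = root / "home" / ".leann" / "projects.json"
    indexed = root / "proj_a"
    (indexed / ".leann" / "indexes").mkdir(parents=True)
    stale = root / "proj_b"  # registered, but has no .leann/indexes
    stale.mkdir()

    register_project(registry, indexed)
    register_project(registry, indexed)  # duplicate registration is ignored
    registered = register_project(registry, stale)
    found_names = [p.name for p in projects_with_indexes(registry)]
    print(len(registered), found_names)
```

Filtering at read time, as sketched here, means a deleted or moved project simply drops out of the listing without anyone having to clean the registry.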
@@ -203,17 +287,125 @@ Examples:
                documents.extend(default_docs)
        # Load other file types with default reader
        code_extensions = [
            # Original document types
            ".txt",
            ".md",
            ".docx",
            # Code files for Claude Code integration
            ".py",
            ".js",
            ".ts",
            ".jsx",
            ".tsx",
            ".java",
            ".cpp",
            ".c",
            ".h",
            ".hpp",
            ".cs",
            ".go",
            ".rs",
            ".rb",
            ".php",
            ".swift",
            ".kt",
            ".scala",
            ".r",
            ".sql",
            ".sh",
            ".bash",
            ".zsh",
            ".fish",
            ".ps1",
            ".bat",
            # Config and markup files
            ".json",
            ".yaml",
            ".yml",
            ".xml",
            ".toml",
            ".ini",
            ".cfg",
            ".conf",
            ".html",
            ".css",
            ".scss",
            ".less",
            ".vue",
            ".svelte",
            # Data science
            ".ipynb",
            ".R",
            ".jl",
        ]
        other_docs = SimpleDirectoryReader(
            docs_dir,
            recursive=True,
            encoding="utf-8",
-            required_exts=[".txt", ".md", ".docx"],
+            required_exts=code_extensions,
        ).load_data(show_progress=True)
        documents.extend(other_docs)
        all_texts = []
        # Define code file extensions for intelligent chunking
        code_file_exts = {
            ".py",
            ".js",
            ".ts",
            ".jsx",
            ".tsx",
            ".java",
            ".cpp",
            ".c",
            ".h",
            ".hpp",
            ".cs",
            ".go",
            ".rs",
            ".rb",
            ".php",
            ".swift",
            ".kt",
            ".scala",
            ".r",
            ".sql",
            ".sh",
            ".bash",
            ".zsh",
            ".fish",
            ".ps1",
            ".bat",
            ".json",
            ".yaml",
            ".yml",
            ".xml",
            ".toml",
            ".ini",
            ".cfg",
            ".conf",
            ".html",
            ".css",
            ".scss",
            ".less",
            ".vue",
            ".svelte",
            ".ipynb",
            ".R",
            ".jl",
        }
        for doc in documents:
-            nodes = self.node_parser.get_nodes_from_documents([doc])
+            # Check if this is a code file based on source path
            source_path = doc.metadata.get("source", "")
            is_code_file = any(source_path.endswith(ext) for ext in code_file_exts)
            # Use appropriate parser based on file type
            parser = self.code_parser if is_code_file else self.node_parser
            nodes = parser.get_nodes_from_documents([doc])
            for node in nodes:
                all_texts.append(node.get_content())
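The per-file parser choice in this hunk comes down to mapping a file extension to one of two chunking configurations. Here is a small, dependency-free sketch of that decision. The function name `pick_chunking` is hypothetical, the extension set is an abbreviated subset of the list in the diff, and it uses a set lookup on `Path.suffix` instead of the `endswith` scan shown above. It takes the path as a plain string because the metadata key that carries the path may differ between readers (for example `file_path` rather than `source`), which is worth checking against the reader in use.

```python
from pathlib import Path

# Abbreviated, assumed subset of the extension list from the diff; extend as needed.
CODE_EXTS = {".py", ".js", ".ts", ".go", ".rs", ".sql", ".sh", ".json", ".yaml", ".r", ".R"}

# (chunk_size, chunk_overlap) pairs mirroring the two SentenceSplitter configurations.
CODE_CHUNKING = (512, 50)
PROSE_CHUNKING = (256, 128)


def pick_chunking(source_path: str) -> tuple[int, int]:
    """Choose chunking parameters from the file extension (case-sensitive match)."""
    suffix = Path(source_path).suffix
    return CODE_CHUNKING if suffix in CODE_EXTS else PROSE_CHUNKING


for name in ["src/app.py", "docs/guide.md", "analysis.R", ""]:
    print(repr(name), pick_chunking(name))
```

An empty path falls through to the prose settings, so a document whose metadata lacks a path is silently chunked as prose.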
@@ -226,6 +418,8 @@ Examples:
        index_dir = self.indexes_dir / index_name
        index_path = self.get_index_path(index_name)
        print(f"📂 Indexing: {Path(docs_dir).resolve()}")
        if index_dir.exists() and not args.force:
            print(f"Index '{index_name}' already exists. Use --force to rebuild.")
            return
@@ -255,6 +449,9 @@ Examples:
        builder.build_index(index_path)
        print(f"Index built at {index_path}")
        # Register this project directory in global registry
        self.register_project_dir()
    async def search_documents(self, args):
        index_name = args.index_name
        query = args.query
--- a/packages/leann-core/src/leann/mcp.py
+++ b/packages/leann-core/src/leann/mcp.py
@@ -0,0 +1,134 @@
 #!/usr/bin/env python3
 import json
 import os
 import subprocess
 import sys
 def handle_request(request):
    if request.get("method") == "initialize":
        return {
            "jsonrpc": "2.0",
            "id": request.get("id"),
            "result": {
                "capabilities": {"tools": {}},
                "protocolVersion": "2024-11-05",
                "serverInfo": {"name": "leann-mcp", "version": "1.0.0"},
            },
        }
    elif request.get("method") == "tools/list":
        return {
            "jsonrpc": "2.0",
            "id": request.get("id"),
            "result": {
                "tools": [
                    {
                        "name": "leann_search",
                        "description": "Search LEANN index",
                        "inputSchema": {
                            "type": "object",
                            "properties": {
                                "index_name": {"type": "string"},
                                "query": {"type": "string"},
                                "top_k": {"type": "integer", "default": 5},
                            },
                            "required": ["index_name", "query"],
                        },
                    },
                    {
                        "name": "leann_ask",
                        "description": "Ask question using LEANN RAG",
                        "inputSchema": {
                            "type": "object",
                            "properties": {
                                "index_name": {"type": "string"},
                                "question": {"type": "string"},
                            },
                            "required": ["index_name", "question"],
                        },
                    },
                    {
                        "name": "leann_list",
                        "description": "List all LEANN indexes",
                        "inputSchema": {"type": "object", "properties": {}},
                    },
                ]
            },
        }
    elif request.get("method") == "tools/call":
        tool_name = request["params"]["name"]
        args = request["params"].get("arguments", {})
        # Set working directory and environment
        env = os.environ.copy()
         cwd = os.getcwd()  # use the caller's project directory instead of a machine-specific path
        try:
            if tool_name == "leann_search":
                cmd = [
                    "leann",
                    "search",
                    args["index_name"],
                    args["query"],
                    "--recompute-embeddings",
                    f"--top-k={args.get('top_k', 5)}",
                ]
                result = subprocess.run(cmd, capture_output=True, text=True, cwd=cwd, env=env)
            elif tool_name == "leann_ask":
                 # Pass the question on stdin instead of interpolating it into a shell string
                 cmd = ["leann", "ask", args["index_name"], "--recompute-embeddings"]
                 cmd += ["--llm", "ollama", "--model", "qwen3:8b"]
                 result = subprocess.run(
                     cmd, input=args["question"] + "\n", capture_output=True, text=True, cwd=cwd, env=env
                 )
            elif tool_name == "leann_list":
                result = subprocess.run(
                    ["leann", "list"], capture_output=True, text=True, cwd=cwd, env=env
                )
            return {
                "jsonrpc": "2.0",
                "id": request.get("id"),
                "result": {
                    "content": [
                        {
                            "type": "text",
                            "text": result.stdout
                            if result.returncode == 0
                            else f"Error: {result.stderr}",
                        }
                    ]
                },
            }
        except Exception as e:
            return {
                "jsonrpc": "2.0",
                "id": request.get("id"),
                "error": {"code": -1, "message": str(e)},
            }
 def main():
    for line in sys.stdin:
        try:
            request = json.loads(line.strip())
            response = handle_request(request)
            if response:
                print(json.dumps(response))
                sys.stdout.flush()
        except Exception as e:
            error_response = {
                "jsonrpc": "2.0",
                "id": None,
                "error": {"code": -1, "message": str(e)},
            }
            print(json.dumps(error_response))
            sys.stdout.flush()
 if __name__ == "__main__":
    main()
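When a tool argument has to reach a child process on standard input, one option is to skip the shell and hand the text to `subprocess.run` through its `input` parameter, so that quotes, backticks, and `$(...)` in the argument are treated as plain data. The sketch below illustrates this with a stand-in child process (a Python one-liner that echoes stdin). The helper name `run_with_stdin` is hypothetical, and the real command would be a `leann ask ...` argument list.

```python
import subprocess
import sys


def run_with_stdin(cmd: list[str], text: str) -> subprocess.CompletedProcess:
    """Run cmd without a shell and feed text to its standard input."""
    return subprocess.run(cmd, input=text, capture_output=True, text=True)


# Stand-in child that copies stdin to stdout; a real caller would pass
# an argument list such as ["leann", "ask", index_name, ...] instead.
echo_child = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]

# Characters that a shell would interpret are passed through untouched.
question = 'What does "auth" do? $(touch /tmp/should-not-exist) `id`'
result = run_with_stdin(echo_child, question)
print(result.stdout == question)
```

Because no shell is involved, nothing in the question is expanded or executed; the child simply receives the same bytes the caller supplied.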
--- a/packages/leann-mcp/README.md
+++ b/packages/leann-mcp/README.md
@@ -0,0 +1,69 @@
 # LEANN Claude Code Integration
 Intelligent code assistance using LEANN's vector search directly in Claude Code.
 ## Prerequisites
 First, install LEANN CLI globally:
 ```bash
 uv tool install leann
 ```
 This makes the `leann` command available system-wide, which `leann_mcp` requires.
 ## Quick Setup
 Add the LEANN MCP server to Claude Code:
 ```bash
 claude mcp add leann-server -- leann_mcp
 ```
 ## Available Tools
 - **`leann_list`** - List available indexes across all projects
 - **`leann_search`** - Search code and documents with semantic queries
 - **`leann_ask`** - Ask questions and get AI-powered answers from your codebase
 ## Quick Start
 ```bash
 # Build an index for your project
 leann build my-project
 # Start Claude Code
 claude
 ```
 Then in Claude Code:
 ```
 Help me understand this codebase. List available indexes and search for authentication patterns.
 ```
 <p align="center">
  <img src="../../assets/claude_code_leann.png" alt="LEANN in Claude Code" width="80%">
 </p>
 ## How It Works
 - **`leann`** - Core CLI tool for indexing and searching (installed globally)
 - **`leann_mcp`** - MCP server that wraps `leann` commands for Claude Code integration
 - Claude Code calls `leann_mcp`, which executes `leann` commands and returns results
 ## File Support
 Python, JavaScript, TypeScript, Java, Go, Rust, SQL, YAML, JSON, and 30+ more file types.
 ## Storage
 - Project indexes in `.leann/` directory (like `.git`)
 - Global project registry at `~/.leann/projects.json`
 - Multi-project support built-in
 ## Removing
 ```bash
 claude mcp remove leann-server
 ```
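To make the "How It Works" flow concrete, here is a rough sketch of one request and response pair in the line-delimited JSON-RPC style used by `mcp.py` in this change. It is an illustration under assumptions, not the server itself: `make_tool_call`, `handle_line`, and `fake_run_tool` are hypothetical names, and the stub returns a string where the real server would run the `leann` CLI.

```python
import json


def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize a tools/call request as a single line of JSON."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(request)


def handle_line(line: str, run_tool) -> str:
    """Parse one request line, run the tool, and wrap its text in a response line."""
    request = json.loads(line)
    params = request["params"]
    text = run_tool(params["name"], params.get("arguments", {}))
    response = {
        "jsonrpc": "2.0",
        "id": request.get("id"),
        "result": {"content": [{"type": "text", "text": text}]},
    }
    return json.dumps(response)


def fake_run_tool(name: str, arguments: dict) -> str:
    # Hypothetical stand-in for shelling out to the `leann` CLI.
    return f"{name} called with {sorted(arguments)}"


line = make_tool_call(1, "leann_search", {"index_name": "my-project", "query": "auth"})
reply = json.loads(handle_line(line, fake_run_tool))
print(reply["result"]["content"][0]["text"])
```

The response reuses the request `id`, which is how the client matches replies to calls when several are in flight.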
Author	SHA1	Message	Date
Andy Lee	b55eeeae5f	Merge remote-tracking branch 'origin/main' into feature/claude-code-research	2025-08-05 23:02:00 -07:00
Andy Lee	e890b2311f	feat: Add Claude Code integration with MCP server	2025-08-05 14:03:36 -07:00
Andy Lee	f3d99fd118	feat: Claude Code integration ready - LEANN CLI works out of the box ✅ Verified LEANN CLI works perfectly with Claude Code ✅ Added integration guide with working examples ✅ Documented simple workflow for immediate use Key findings: - No code changes needed - Just need --recompute-embeddings flag - Search, ask, and build all work - Ready for Claude Code agents and workflows	2025-08-05 12:27:58 -07:00
Andy Lee	8eee90bf80	docs: add a link	2025-08-04 20:10:14 -07:00
Andy Lee	649d4ad03e	docs: Address all configuration guide feedback - Fix grammar: 'If time is not a constraint' instead of 'time expense is not large' - Highlight Qwen3-Embedding-0.6B performance (nearly OpenAI API level) - Add OpenAI quick start section with configuration example - Fold Cloud vs Local trade-offs into collapsible section - Update HNSW as 'default and recommended for extreme low storage' - Add DiskANN beta warning and explain PQ+rerank architecture - Expand Ollama models: add qwen3:0.6b, 4b, 7b variants - Note OpenAI as current default but recommend Ollama switch - Add 'need to install extra software' warning for Ollama - Remove incorrect latency numbers from search-complexity recommendations	2025-08-04 20:01:23 -07:00
Andy Lee	d9b6f195c5	docs: Improve configuration guide based on feedback - List specific files in default data/ directory (2 AI papers, literature, tech report) - Update examples to use English and better RAG-suitable queries - Change full dataset reference to use --max-items -1 - Adjust small model guidance about upgrading to larger models when time allows - Update top-k defaults to reflect actual default of 20 - Ensure consistent use of full model name Qwen/Qwen3-Embedding-0.6B - Reorder optimization steps, move MLX to third position - Remove incorrect chunk size tuning guidance - Change README from 'Having trouble' to 'Need best practices'	2025-08-04 19:29:17 -07:00
Andy Lee	00f506c0bd	docs: Adjust DiskANN positioning in features and roadmap - features.md: Put HNSW/FAISS first as default, DiskANN as optional - roadmap.md: Reorder to show HNSW integration before DiskANN - Consistent with positioning DiskANN as advanced option for large-scale use	2025-08-04 17:53:27 -07:00
Andy Lee	e872dd1d23	docs: Weaken DiskANN emphasis in README - Change backend description to emphasize HNSW as default - DiskANN positioned as optional for billion-scale datasets - Simplify evaluation commands to be more generic	2025-08-04 17:51:21 -07:00
Andy Lee	063c687ff7	chore: move evaluation data .gitattributes to correct location	2025-08-04 17:46:17 -07:00
Andy Lee	bb8ecd54d7	feat: add comprehensive configuration guide and update README - Create docs/configuration-guide.md with detailed guidance on: - Embedding model selection (small/medium/large) - Index selection (HNSW vs DiskANN) - LLM engine and model comparison - Parameter tuning (build/search complexity, top-k) - Performance optimization tips - Deep dive into LEANN's recomputation feature - Update README.md to link to the configuration guide - Include latest 2025 model recommendations (Qwen3, DeepSeek-R1, O3-mini)	2025-08-04 17:41:27 -07:00
Andy Lee	716217ae24	docs: config guidance	2025-08-04 16:21:13 -07:00