Compare commits


12 Commits

Author SHA1 Message Date
GitHub Actions
b6ab6f1993 chore: release v0.2.5 2025-08-08 22:32:27 +00:00
joshuashaffer
9f2e82a838 Propagate hosts argument for ollama through chat.py (#21)
* Propagate hosts argument for ollama through chat.py

* Apply suggestions from code review

Good AI slop suggestions.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-08-08 15:31:15 -07:00
yichuan520030910320
0b2b799d5a [README]fix instructions in cli 2025-08-08 01:04:13 -07:00
yichuan520030910320
0f790fbbd9 docs: polish README and add optimized MCP integration image
- Improve grammar and sentence structure in MCP section
- Add proper markdown image formatting with relative paths
- Optimize mcp_leann.png size (1.3MB -> 224KB)
- Update data description to be more specific about Chinese content
2025-08-08 00:58:36 -07:00
GitHub Actions
387ae21eba chore: release v0.2.4 2025-08-08 07:14:51 +00:00
Andy Lee
3cc329c3e7 fix: remove hardcoded paths from MCP server and documentation 2025-08-08 00:08:56 -07:00
Andy Lee
5567302316 feat: promote Claude Code integration as primary RAG feature 2025-08-07 23:19:19 -07:00
GitHub Actions
075d4bd167 chore: release v0.2.2 2025-08-08 01:58:40 +00:00
yichuan520030910320
e4bcc76f88 fix cli & make recompute default true 2025-08-07 18:58:04 -07:00
yichuan520030910320
710e83b1fd fix cli if there is no other type of doc to make it robust 2025-08-07 18:46:05 -07:00
yichuan520030910320
c96d653072 more support for type of docs in cli 2025-08-07 18:14:03 -07:00
Andy Lee
8b22d2b5d3 Merge pull request #19 from yichuan-w/feature/claude-code-research
Feature/claude code research
2025-08-05 23:02:34 -07:00
12 changed files with 147 additions and 253 deletions

View File

@@ -6,6 +6,7 @@
<img src="https://img.shields.io/badge/Python-3.9%2B-blue.svg" alt="Python 3.9+">
<img src="https://img.shields.io/badge/License-MIT-green.svg" alt="MIT License">
<img src="https://img.shields.io/badge/Platform-Linux%20%7C%20macOS-lightgrey" alt="Platform">
<img src="https://img.shields.io/badge/MCP-Native%20Integration-blue?style=flat-square" alt="MCP Integration">
</p>
<h2 align="center" tabindex="-1" class="heading-element" dir="auto">
@@ -16,9 +17,10 @@ LEANN is an innovative vector database that democratizes personal AI. Transform
LEANN achieves this through *graph-based selective recomputation* with *high-degree preserving pruning*, computing embeddings on-demand instead of storing them all. [Illustration Fig →](#-architecture--how-it-works) | [Paper →](https://arxiv.org/abs/2506.08276)
**Ready to RAG Everything?** Transform your laptop into a personal AI assistant that can search your **[file system](#-personal-data-manager-process-any-documents-pdf-txt-md)**, **[emails](#-your-personal-email-secretary-rag-on-apple-mail)**, **[browser history](#-time-machine-for-the-web-rag-your-entire-browser-history)**, **[chat history](#-wechat-detective-unlock-your-golden-memories)**, or external knowledge bases (i.e., 60M documents) - all on your laptop, with zero cloud costs and complete privacy.
**Ready to RAG Everything?** Transform your laptop into a personal AI assistant that can semantically search your **[file system](#-personal-data-manager-process-any-documents-pdf-txt-md)**, **[emails](#-your-personal-email-secretary-rag-on-apple-mail)**, **[browser history](#-time-machine-for-the-web-rag-your-entire-browser-history)**, **[chat history](#-wechat-detective-unlock-your-golden-memories)**, **[codebase](#-claude-code-integration-transform-your-development-workflow)**\*, or external knowledge bases (e.g., 60M documents) - all on your laptop, with zero cloud costs and complete privacy.
> **🚀 NEW: Claude Code Integration!** LEANN now provides native MCP integration for Claude Code users. Index your codebase and get intelligent code assistance directly in Claude Code. [Setup Guide →](packages/leann-mcp/README.md)
\* Claude Code only supports basic `grep`-style keyword search. **LEANN** is a drop-in **semantic search MCP service fully compatible with Claude Code**, unlocking intelligent retrieval without changing your workflow.
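The *graph-based selective recomputation* mentioned above can be pictured, very loosely, with the toy sketch below: walk a proximity graph toward the query and embed only the nodes the walk visits, instead of keeping every stored vector. This is an illustrative approximation only, not LEANN's actual pruning or recomputation code; the `embed` callable and the adjacency-list graph are placeholder assumptions.

```python
import numpy as np

def toy_selective_search(query_vec, graph, texts, embed, entry=0, max_hops=10):
    """Greedy walk over a proximity graph, re-embedding only visited nodes.

    graph: dict[int, list[int]] adjacency list, texts: list[str],
    embed: callable mapping a text chunk to a numpy vector (placeholder).
    Embeddings are computed on demand, so nothing is stored up front.
    """
    current, visited, cache = entry, {entry}, {}
    for _ in range(max_hops):
        neighbors = [n for n in graph.get(current, []) if n not in visited]
        if not neighbors:
            break
        for n in neighbors:
            cache.setdefault(n, embed(texts[n]))  # selective recomputation
        nearest = min(neighbors, key=lambda n: np.linalg.norm(cache[n] - query_vec))
        visited.add(nearest)
        current = nearest
    return current
```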
@@ -28,7 +30,7 @@ LEANN achieves this through *graph-based selective recomputation* with *high-deg
<img src="assets/effects.png" alt="LEANN vs Traditional Vector DB Storage Comparison" width="70%">
</p>
> **The numbers speak for themselves:** Index 60 million Wikipedia chunks in just 6GB instead of 201GB. From emails to browser history, everything fits on your laptop. [See detailed benchmarks for different applications below ↓](#storage-comparison)
> **The numbers speak for themselves:** Index 60 million text chunks in just 6GB instead of 201GB. From emails to browser history, everything fits on your laptop. [See detailed benchmarks for different applications below ↓](#storage-comparison)
🔒 **Privacy:** Your data never leaves your laptop. No OpenAI, no cloud, no "terms of service".
@@ -221,7 +223,7 @@ Ask questions directly about your personal PDFs, documents, and any directory co
<img src="videos/paper_clear.gif" alt="LEANN Document Search Demo" width="600">
</p>
The example below asks a question about summarizing our paper (uses default data in `data/`, which is a directory with diverse data sources: two papers, Pride and Prejudice, and a README in Chinese) and this is the **easiest example** to run here:
The example below asks a question about summarizing our paper (uses default data in `data/`, which is a directory with diverse data sources: two papers, Pride and Prejudice, and a Chinese-language technical report about Huawei's LLM), and this is the **easiest example** to run here:
```bash
source .venv/bin/activate # Don't forget to activate the virtual environment
@@ -416,7 +418,26 @@ Once the index is built, you can ask questions like:
</details>
### 🚀 Claude Code Integration: Transform Your Development Workflow!
**The future of code assistance is here.** Transform your development workflow with LEANN's native MCP integration for Claude Code. Index your entire codebase and get intelligent code assistance directly in your IDE.
**Key features:**
- 🔍 **Semantic code search** across your entire project
- 📚 **Context-aware assistance** for debugging and development
- 🚀 **Zero-config setup** with automatic language detection
```bash
# Install LEANN globally for MCP integration
uv tool install leann-core
# Setup is automatic - just start using Claude Code!
```
Try our fully agentic pipeline with auto query rewriting, semantic search planning, and more:
![LEANN MCP Integration](assets/mcp_leann.png)
**Ready to supercharge your coding?** [Complete Setup Guide →](packages/leann-mcp/README.md)
## 🖥️ Command Line Interface
@@ -446,11 +467,8 @@ leann --help
### Usage Examples
```bash
# Build an index from current directory (default)
leann build my-docs
# Or from specific directory
leann build my-docs --docs ./documents
# Build from a specific directory; my-docs is the index name
leann build my-docs --docs ./your_documents
# Search your documents
leann search my-docs "machine learning concepts"

BIN
assets/mcp_leann.png Normal file
View File

Binary file not shown.


View File

@@ -1,150 +0,0 @@
# Claude Code x LEANN Integration Guide
## ✅ Current Status: It Already Works!
Good news: the LEANN CLI can already be used in Claude Code as-is, with no modifications needed!
## 🚀 Get Started Now
### 1. Activate the Environment
```bash
# From the LEANN project directory
source .venv/bin/activate.fish # fish shell
# or
source .venv/bin/activate # bash shell
```
### 2. Basic Commands
#### List existing indexes
```bash
leann list
```
#### Search documents
```bash
leann search my-docs "machine learning" --recompute-embeddings
```
#### Ask questions
```bash
echo "What is machine learning?" | leann ask my-docs --llm ollama --model qwen3:8b --recompute-embeddings
```
#### Build a new index
```bash
leann build project-docs --docs ./src --recompute-embeddings
```
## 💡 Claude Code Usage Tips
### Using It Directly in Claude Code
1. **Activate the environment**
```bash
cd /Users/andyl/Projects/LEANN-RAG
source .venv/bin/activate.fish
```
2. **Search the codebase**
```bash
leann search my-docs "authentication patterns" --recompute-embeddings --top-k 10
```
3. **Ask questions**
```bash
echo "How does the authentication system work?" | leann ask my-docs --llm ollama --model qwen3:8b --recompute-embeddings
```
### Batch Operation Examples
```bash
# Build an index of the project docs
leann build project-docs --docs ./docs --force
# Search several keywords
leann search project-docs "API authentication" --recompute-embeddings
leann search project-docs "database schema" --recompute-embeddings
leann search project-docs "deployment guide" --recompute-embeddings
# Q&A mode
echo "What are the API endpoints?" | leann ask project-docs --recompute-embeddings
```
## 🎯 Workflows Claude Can Run Right Away
### Code Analysis Workflow
```bash
# 1. Build a codebase index
leann build codebase --docs ./src --backend hnsw --recompute-embeddings
# 2. Analyze the architecture
echo "What is the overall architecture?" | leann ask codebase --recompute-embeddings
# 3. Find specific functionality
leann search codebase "user authentication" --recompute-embeddings --top-k 5
# 4. Understand implementation details
echo "How is user authentication implemented?" | leann ask codebase --recompute-embeddings
```
### Documentation Workflow
```bash
# 1. Index the project documentation
leann build docs --docs ./docs --recompute-embeddings
# 2. Quickly look up information
leann search docs "installation requirements" --recompute-embeddings
# 3. Get detailed explanations
echo "What are the system requirements?" | leann ask docs --recompute-embeddings
```
## ⚠️ Important Notes
1. **You must use `--recompute-embeddings`** - this is the key flag; omitting it causes an error
2. **Activate the virtual environment first** - make sure LEANN's Python environment is available
3. **Ollama must be installed in advance** - the ask feature needs a local LLM
## 🔥 Ready-to-Use Claude Prompt
```
Help me analyze this codebase using LEANN:
1. First, activate the environment:
cd /Users/andyl/Projects/LEANN-RAG && source .venv/bin/activate.fish
2. Build an index of the source code:
leann build codebase --docs ./src --recompute-embeddings
3. Search for authentication patterns:
leann search codebase "authentication middleware" --recompute-embeddings --top-k 10
4. Ask about the authentication system:
echo "How does user authentication work in this codebase?" | leann ask codebase --recompute-embeddings
Please execute these commands and help me understand the code structure.
```
## 📈 Planned Improvements
It is already usable, but a few things could still be optimized:
1. **Simpler commands** - enable recompute-embeddings by default
2. **Configuration file** - avoid retyping the same parameters
3. **State management** - auto-detect the environment and indexes
4. **Output format** - a format that is easier for Claude to parse
These are icing on the cake, though; it works right now!
## 🎉 Summary
**LEANN already works perfectly in Claude Code!**
- ✅ Search works
- ✅ RAG question answering works
- ✅ Index building works
- ✅ Multiple data sources supported
- ✅ Local LLMs supported
Just remember to add the `--recompute-embeddings` flag!

View File

@@ -4,8 +4,8 @@ build-backend = "scikit_build_core.build"
[project]
name = "leann-backend-diskann"
version = "0.2.1"
dependencies = ["leann-core==0.2.1", "numpy", "protobuf>=3.19.0"]
version = "0.2.5"
dependencies = ["leann-core==0.2.5", "numpy", "protobuf>=3.19.0"]
[tool.scikit-build]
# Key: simplified CMake path

View File

@@ -6,10 +6,10 @@ build-backend = "scikit_build_core.build"
[project]
name = "leann-backend-hnsw"
version = "0.2.1"
version = "0.2.5"
description = "Custom-built HNSW (Faiss) backend for the Leann toolkit."
dependencies = [
"leann-core==0.2.1",
"leann-core==0.2.5",
"numpy",
"pyzmq>=23.0.0",
"msgpack>=1.0.0",

View File

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "leann-core"
version = "0.2.1"
version = "0.2.5"
description = "Core API and plugin system for LEANN"
readme = "README.md"
requires-python = ">=3.9"

View File

@@ -17,12 +17,12 @@ logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
def check_ollama_models() -> list[str]:
def check_ollama_models(host: str) -> list[str]:
"""Check available Ollama models and return a list"""
try:
import requests
response = requests.get("http://localhost:11434/api/tags", timeout=5)
response = requests.get(f"{host}/api/tags", timeout=5)
if response.status_code == 200:
data = response.json()
return [model["name"] for model in data.get("models", [])]
@@ -309,10 +309,12 @@ def search_hf_models(query: str, limit: int = 10) -> list[str]:
return search_hf_models_fuzzy(query, limit)
def validate_model_and_suggest(model_name: str, llm_type: str) -> str | None:
def validate_model_and_suggest(
model_name: str, llm_type: str, host: str = "http://localhost:11434"
) -> str | None:
"""Validate model name and provide suggestions if invalid"""
if llm_type == "ollama":
available_models = check_ollama_models()
available_models = check_ollama_models(host)
if available_models and model_name not in available_models:
error_msg = f"Model '{model_name}' not found in your local Ollama installation."
@@ -469,7 +471,7 @@ class OllamaChat(LLMInterface):
requests.get(host)
# Pre-check model availability with helpful suggestions
model_error = validate_model_and_suggest(model, "ollama")
model_error = validate_model_and_suggest(model, "ollama", host)
if model_error:
raise ValueError(model_error)
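Pulled out of the diff above, the host-propagation pattern from PR #21 boils down to passing the Ollama base URL in from the caller instead of hardcoding `localhost`. A standalone sketch of that pattern, assuming only the `requests` library and Ollama's standard `/api/tags` endpoint:

```python
import requests

def check_ollama_models(host: str) -> list[str]:
    """Return the model names served by the Ollama instance at `host`."""
    try:
        response = requests.get(f"{host}/api/tags", timeout=5)
        if response.status_code == 200:
            return [m["name"] for m in response.json().get("models", [])]
    except requests.RequestException:
        pass
    return []

# The host now flows in from the chat entry point rather than being fixed:
print(check_ollama_models("http://localhost:11434"))
```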

View File

@@ -74,10 +74,11 @@ class LeannCLI:
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
leann build my-docs --docs ./documents # Build index named my-docs
leann search my-docs "query" # Search in my-docs index
leann ask my-docs "question" # Ask my-docs index
leann list # List all stored indexes
leann build my-docs --docs ./documents # Build index named my-docs
leann build my-ppts --docs ./ --file-types .pptx,.pdf # Index only PowerPoint and PDF files
leann search my-docs "query" # Search in my-docs index
leann ask my-docs "question" # Ask my-docs index
leann list # List all stored indexes
""",
)
@@ -99,6 +100,11 @@ Examples:
build_parser.add_argument("--num-threads", type=int, default=1)
build_parser.add_argument("--compact", action="store_true", default=True)
build_parser.add_argument("--recompute", action="store_true", default=True)
build_parser.add_argument(
"--file-types",
type=str,
help="Comma-separated list of file extensions to include (e.g., '.txt,.pdf,.pptx'). If not specified, uses default supported types.",
)
# Search command
search_parser = subparsers.add_parser("search", help="Search documents")
@@ -108,7 +114,12 @@ Examples:
search_parser.add_argument("--complexity", type=int, default=64)
search_parser.add_argument("--beam-width", type=int, default=1)
search_parser.add_argument("--prune-ratio", type=float, default=0.0)
search_parser.add_argument("--recompute-embeddings", action="store_true")
search_parser.add_argument(
"--recompute-embeddings",
action="store_true",
default=True,
help="Recompute embeddings (default: True)",
)
search_parser.add_argument(
"--pruning-strategy",
choices=["global", "local", "proportional"],
@@ -131,7 +142,12 @@ Examples:
ask_parser.add_argument("--complexity", type=int, default=32)
ask_parser.add_argument("--beam-width", type=int, default=1)
ask_parser.add_argument("--prune-ratio", type=float, default=0.0)
ask_parser.add_argument("--recompute-embeddings", action="store_true")
ask_parser.add_argument(
"--recompute-embeddings",
action="store_true",
default=True,
help="Recompute embeddings (default: True)",
)
ask_parser.add_argument(
"--pruning-strategy",
choices=["global", "local", "proportional"],
@@ -254,8 +270,10 @@ Examples:
print(f' leann search {example_name} "your query"')
print(f" leann ask {example_name} --interactive")
def load_documents(self, docs_dir: str):
def load_documents(self, docs_dir: str, custom_file_types: str | None = None):
print(f"Loading documents from {docs_dir}...")
if custom_file_types:
print(f"Using custom file types: {custom_file_types}")
# Try to use better PDF parsers first
documents = []
@@ -287,66 +305,81 @@ Examples:
documents.extend(default_docs)
# Load other file types with default reader
code_extensions = [
# Original document types
".txt",
".md",
".docx",
# Code files for Claude Code integration
".py",
".js",
".ts",
".jsx",
".tsx",
".java",
".cpp",
".c",
".h",
".hpp",
".cs",
".go",
".rs",
".rb",
".php",
".swift",
".kt",
".scala",
".r",
".sql",
".sh",
".bash",
".zsh",
".fish",
".ps1",
".bat",
# Config and markup files
".json",
".yaml",
".yml",
".xml",
".toml",
".ini",
".cfg",
".conf",
".html",
".css",
".scss",
".less",
".vue",
".svelte",
# Data science
".ipynb",
".R",
".py",
".jl",
]
other_docs = SimpleDirectoryReader(
docs_dir,
recursive=True,
encoding="utf-8",
required_exts=code_extensions,
).load_data(show_progress=True)
documents.extend(other_docs)
if custom_file_types:
# Parse custom file types from comma-separated string
code_extensions = [ext.strip() for ext in custom_file_types.split(",") if ext.strip()]
# Ensure extensions start with a dot
code_extensions = [ext if ext.startswith(".") else f".{ext}" for ext in code_extensions]
else:
# Use default supported file types
code_extensions = [
# Original document types
".txt",
".md",
".docx",
".pptx",
# Code files for Claude Code integration
".py",
".js",
".ts",
".jsx",
".tsx",
".java",
".cpp",
".c",
".h",
".hpp",
".cs",
".go",
".rs",
".rb",
".php",
".swift",
".kt",
".scala",
".r",
".sql",
".sh",
".bash",
".zsh",
".fish",
".ps1",
".bat",
# Config and markup files
".json",
".yaml",
".yml",
".xml",
".toml",
".ini",
".cfg",
".conf",
".html",
".css",
".scss",
".less",
".vue",
".svelte",
# Data science
".ipynb",
".R",
".py",
".jl",
]
# Try to load other file types, but don't fail if none are found
try:
other_docs = SimpleDirectoryReader(
docs_dir,
recursive=True,
encoding="utf-8",
required_exts=code_extensions,
).load_data(show_progress=True)
documents.extend(other_docs)
except ValueError as e:
if "No files found" in str(e):
print("No additional files found for other supported types.")
else:
raise e
all_texts = []
@@ -424,7 +457,7 @@ Examples:
print(f"Index '{index_name}' already exists. Use --force to rebuild.")
return
all_texts = self.load_documents(docs_dir)
all_texts = self.load_documents(docs_dir, args.file_types)
if not all_texts:
print("No documents found")
return
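For reference, the extension normalization that the new `--file-types` flag applies (as shown in the hunk above) can be exercised on its own; the helper name below is only for illustration:

```python
def normalize_file_types(spec: str) -> list[str]:
    """Turn a comma-separated spec like 'pptx, .pdf' into ['.pptx', '.pdf']."""
    exts = [ext.strip() for ext in spec.split(",") if ext.strip()]
    return [ext if ext.startswith(".") else f".{ext}" for ext in exts]

print(normalize_file_types("pptx, .pdf,md"))  # ['.pptx', '.pdf', '.md']
```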

View File

@@ -1,7 +1,6 @@
#!/usr/bin/env python3
import json
import os
import subprocess
import sys
@@ -62,10 +61,6 @@ def handle_request(request):
tool_name = request["params"]["name"]
args = request["params"].get("arguments", {})
# Set working directory and environment
env = os.environ.copy()
cwd = "/Users/andyl/Projects/LEANN-RAG"
try:
if tool_name == "leann_search":
cmd = [
@@ -76,18 +71,14 @@ def handle_request(request):
"--recompute-embeddings",
f"--top-k={args.get('top_k', 5)}",
]
result = subprocess.run(cmd, capture_output=True, text=True, cwd=cwd, env=env)
result = subprocess.run(cmd, capture_output=True, text=True)
elif tool_name == "leann_ask":
cmd = f'echo "{args["question"]}" | leann ask {args["index_name"]} --recompute-embeddings --llm ollama --model qwen3:8b'
result = subprocess.run(
cmd, shell=True, capture_output=True, text=True, cwd=cwd, env=env
)
result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
elif tool_name == "leann_list":
result = subprocess.run(
["leann", "list"], capture_output=True, text=True, cwd=cwd, env=env
)
result = subprocess.run(["leann", "list"], capture_output=True, text=True)
return {
"jsonrpc": "2.0",

View File

@@ -7,7 +7,7 @@ Intelligent code assistance using LEANN's vector search directly in Claude Code.
First, install LEANN CLI globally:
```bash
uv tool install leann
uv tool install leann-core
```
This makes the `leann` command available system-wide, which `leann_mcp` requires.
@@ -30,7 +30,7 @@ claude mcp add leann-server -- leann_mcp
```bash
# Build an index for your project
leann build my-project
leann build my-project --docs ./  # change to your docs path
# Start Claude Code
claude

View File

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "leann"
version = "0.2.1"
version = "0.2.5"
description = "LEANN - The smallest vector index in the world. RAG Everything with LEANN!"
readme = "README.md"
requires-python = ">=3.9"