fix: improve gitignore and Jupyter notebook support

- Add nbconvert dependency for .ipynb file support
- Replace manual gitignore parsing with gitignore-parser library
- Proper recursive .gitignore handling (all subdirectories)
- Fix compliance with Git gitignore behavior
- Simplify code and improve reliability

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
[Readme]update embedding model config according to reddit feedback
2025-08-10 18:52:55 -07:00 · 2025-08-09 21:33:33 -07:00 · 2025-08-10 03:39:45 +00:00 · 2025-08-09 20:37:17 -07:00 · 2025-08-08 18:44:07 -07:00 · 2025-08-08 16:05:35 -07:00
19 changed files with 4818 additions and 3574 deletions
@@ -6,6 +6,7 @@
  <img src="https://img.shields.io/badge/Python-3.9%2B-blue.svg" alt="Python 3.9+">
  <img src="https://img.shields.io/badge/License-MIT-green.svg" alt="MIT License">
  <img src="https://img.shields.io/badge/Platform-Linux%20%7C%20macOS-lightgrey" alt="Platform">
  <img src="https://img.shields.io/badge/MCP-Native%20Integration-blue?style=flat-square" alt="MCP Integration">
 </p>
 <h2 align="center" tabindex="-1" class="heading-element" dir="auto">
@@ -16,7 +17,10 @@ LEANN is an innovative vector database that democratizes personal AI. Transform
 LEANN achieves this through *graph-based selective recomputation* with *high-degree preserving pruning*, computing embeddings on-demand instead of storing them all. [Illustration Fig →](#️-architecture--how-it-works) | [Paper →](https://arxiv.org/abs/2506.08276)
-**Ready to RAG Everything?** Transform your laptop into a personal AI assistant that can search your **[file system](#-personal-data-manager-process-any-documents-pdf-txt-md)**, **[emails](#-your-personal-email-secretary-rag-on-apple-mail)**, **[browser history](#-time-machine-for-the-web-rag-your-entire-browser-history)**, **[chat history](#-wechat-detective-unlock-your-golden-memories)**, or external knowledge bases (i.e., 60M documents) - all on your laptop, with zero cloud costs and complete privacy.
+**Ready to RAG Everything?** Transform your laptop into a personal AI assistant that can semantically search your **[file system](#-personal-data-manager-process-any-documents-pdf-txt-md)**, **[emails](#-your-personal-email-secretary-rag-on-apple-mail)**, **[browser history](#-time-machine-for-the-web-rag-your-entire-browser-history)**, **[chat history](#-wechat-detective-unlock-your-golden-memories)**, **[codebase](#-claude-code-integration-transform-your-development-workflow)**\*, or external knowledge bases (e.g., 60M documents) - all on your laptop, with zero cloud costs and complete privacy.
 \* Claude Code only supports basic `grep`-style keyword search. **LEANN** is a drop-in **semantic search MCP service fully compatible with Claude Code**, unlocking intelligent retrieval without changing your workflow. 🔥 Check out [the easy setup →](packages/leann-mcp/README.md)
@@ -26,7 +30,7 @@ LEANN achieves this through *graph-based selective recomputation* with *high-deg
  <img src="assets/effects.png" alt="LEANN vs Traditional Vector DB Storage Comparison" width="70%">
 </p>
-> **The numbers speak for themselves:** Index 60 million Wikipedia chunks in just 6GB instead of 201GB. From emails to browser history, everything fits on your laptop. [See detailed benchmarks for different applications below ↓](#storage-comparison)
+> **The numbers speak for themselves:** Index 60 million text chunks in just 6GB instead of 201GB. From emails to browser history, everything fits on your laptop. [See detailed benchmarks for different applications below ↓](#storage-comparison)
 🔒 **Privacy:** Your data never leaves your laptop. No OpenAI, no cloud, no "terms of service".
@@ -185,8 +189,8 @@ All RAG examples share these common parameters. **Interactive mode** is availabl
 --force-rebuild         # Force rebuild index even if it exists
 # Embedding Parameters
--embedding-model MODEL  # e.g., facebook/contriever, text-embedding-3-small or mlx-community/multilingual-e5-base-mlx
+--embedding-model MODEL  # e.g., facebook/contriever, text-embedding-3-small, nomic-embed-text, or mlx-community/Qwen3-Embedding-0.6B-8bit
--embedding-mode MODE    # sentence-transformers, openai, or mlx
+--embedding-mode MODE    # sentence-transformers, openai, mlx, or ollama
 # LLM Parameters (Text generation models)
 --llm TYPE              # LLM backend: openai, ollama, or hf (default: openai)
@@ -219,7 +223,7 @@ Ask questions directly about your personal PDFs, documents, and any directory co
  <img src="videos/paper_clear.gif" alt="LEANN Document Search Demo" width="600">
 </p>
-The example below asks a question about summarizing our paper (uses default data in `data/`, which is a directory with diverse data sources: two papers, Pride and Prejudice, and a README in Chinese) and this is the **easiest example** to run here:
+The example below asks for a summary of our paper (it uses the default data in `data/`, a directory with diverse data sources: two papers, Pride and Prejudice, and a technical report in Chinese about LLMs at Huawei). It is the **easiest example** to run here:
 ```bash
 source .venv/bin/activate # Don't forget to activate the virtual environment
@@ -414,7 +418,26 @@ Once the index is built, you can ask questions like:
 </details>
 ### 🚀 Claude Code Integration: Transform Your Development Workflow!
 **The future of code assistance is here.** Transform your development workflow with LEANN's native MCP integration for Claude Code. Index your entire codebase and get intelligent code assistance directly in your IDE.
 **Key features:**
 - 🔍 **Semantic code search** across your entire project
 - 📚 **Context-aware assistance** for debugging and development
 - 🚀 **Zero-config setup** with automatic language detection
 ```bash
 # Install LEANN globally for MCP integration
 uv tool install leann-core
 # Setup is automatic - just start using Claude Code!
 ```
 Try our fully agentic pipeline with auto query rewriting, semantic search planning, and more:
 ![LEANN MCP Integration](assets/mcp_leann.png)
 **Ready to supercharge your coding?** [Complete Setup Guide →](packages/leann-mcp/README.md)
 ## 🖥️ Command Line Interface
@@ -428,7 +451,7 @@ source .venv/bin/activate
 leann --help
 ```
-**To make it globally available (recommended for daily use):**
+**To make it globally available:**
 ```bash
 # Install the LEANN CLI globally using uv tool
 uv tool install leann
@@ -437,13 +460,15 @@ uv tool install leann
 leann --help
 ```
 > **Note**: Global installation is required for Claude Code integration. The `leann_mcp` server depends on the globally available `leann` command.
 ### Usage Examples
 ```bash
-# Build an index from documents
+# Build from a specific directory; my-docs is the index name
-leann build my-docs --docs ./documents
+leann build my-docs --docs ./your_documents
 # Search your documents
 leann search my-docs "machine learning concepts"
@@ -75,7 +75,7 @@ class BaseRAGExample(ABC):
            "--embedding-mode",
            type=str,
            default="sentence-transformers",
-            choices=["sentence-transformers", "openai", "mlx"],
+            choices=["sentence-transformers", "openai", "mlx", "ollama"],
            help="Embedding backend mode (default: sentence-transformers)",
        )
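
The effect of extending `choices` can be checked in isolation. The sketch below rebuilds only this one flag with the standard `argparse` module; the `build_parser` helper and the `rag-example` program name are hypothetical and not part of the PR.

```python
import argparse

# Same values as the updated `choices` list in the hunk above.
EMBEDDING_MODES = ["sentence-transformers", "openai", "mlx", "ollama"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: a parser holding only the --embedding-mode flag."""
    parser = argparse.ArgumentParser(prog="rag-example")
    parser.add_argument(
        "--embedding-mode",
        type=str,
        default="sentence-transformers",
        choices=EMBEDDING_MODES,
        help="Embedding backend mode (default: sentence-transformers)",
    )
    return parser


if __name__ == "__main__":
    # "ollama" is accepted now; an unknown value makes argparse exit with an error.
    args = build_parser().parse_args(["--embedding-mode", "ollama"])
    print(args.embedding_mode)
```

Because `choices` is enforced by `argparse` itself, an unsupported mode is rejected before any embedding code runs.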
@@ -85,7 +85,7 @@ class BaseRAGExample(ABC):
            "--llm",
            type=str,
            default="openai",
-            choices=["openai", "ollama", "hf"],
+            choices=["openai", "ollama", "hf", "simulated"],
            help="LLM backend to use (default: openai)",
        )
        llm_group.add_argument(
@@ -49,14 +49,25 @@ Based on our experience developing LEANN, embedding models fall into three categ
 - **Cons**: Slower inference, longer index build times
 - **Use when**: Quality is paramount and you have sufficient compute resources. **Highly recommended** for production use
-### Quick Start: OpenAI Embeddings (Fastest Setup)
+### Quick Start: Cloud and Local Embedding Options
 **OpenAI Embeddings (Fastest Setup)**
 For immediate testing without local model downloads:
 ```bash
 # Set OpenAI embeddings (requires OPENAI_API_KEY)
 --embedding-mode openai --embedding-model text-embedding-3-small
 ```
 **Ollama Embeddings (Privacy-Focused)**
 For local embeddings with complete privacy:
 ```bash
 # First, pull an embedding model
 ollama pull nomic-embed-text
 # Use Ollama embeddings
 --embedding-mode ollama --embedding-model nomic-embed-text
 ```
 <details>
 <summary><strong>Cloud vs Local Trade-offs</strong></summary>
@@ -211,9 +222,15 @@ python apps/document_rag.py --query "What are the main techniques LEANN explores
 3. **Use MLX on Apple Silicon** (optional optimization):
   ```bash
-   --embedding-mode mlx --embedding-model mlx-community/multilingual-e5-base-mlx
+   --embedding-mode mlx --embedding-model mlx-community/Qwen3-Embedding-0.6B-8bit
   ```
    MLX may not be the best choice: in our tests it was only about 1.3x faster than HF, so Ollama may be a better option for embedding generation.
 4. **Use Ollama**
   ```bash
   --embedding-mode ollama --embedding-model nomic-embed-text
   ```
   To discover additional embedding models in Ollama, see https://ollama.com/search?c=embedding, or read more about embedding models at https://ollama.com/blog/embedding-models. Check which model size works best for you.
 ### If Search Quality is Poor
 1. **Increase retrieval count**:
@@ -261,7 +261,7 @@ if __name__ == "__main__":
        "--embedding-mode",
        type=str,
        default="sentence-transformers",
-        choices=["sentence-transformers", "openai", "mlx"],
+        choices=["sentence-transformers", "openai", "mlx", "ollama"],
        help="Embedding backend mode",
    )
    parser.add_argument(
@@ -4,8 +4,8 @@ build-backend = "scikit_build_core.build"
 [project]
 name = "leann-backend-diskann"
-version = "0.2.1"
+version = "0.2.6"
-dependencies = ["leann-core==0.2.1", "numpy", "protobuf>=3.19.0"]
+dependencies = ["leann-core==0.2.6", "numpy", "protobuf>=3.19.0"]
 [tool.scikit-build]
 # Key: simplified CMake path
@@ -295,7 +295,7 @@ if __name__ == "__main__":
        "--embedding-mode",
        type=str,
        default="sentence-transformers",
-        choices=["sentence-transformers", "openai", "mlx"],
+        choices=["sentence-transformers", "openai", "mlx", "ollama"],
        help="Embedding backend mode",
    )
@@ -6,10 +6,10 @@ build-backend = "scikit_build_core.build"
 [project]
 name = "leann-backend-hnsw"
-version = "0.2.1"
+version = "0.2.6"
 description = "Custom-built HNSW (Faiss) backend for the Leann toolkit."
 dependencies = [
-    "leann-core==0.2.1",
+    "leann-core==0.2.6",
    "numpy",
    "pyzmq>=23.0.0",
    "msgpack>=1.0.0",
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "leann-core"
-version = "0.2.1"
+version = "0.2.6"
 description = "Core API and plugin system for LEANN"
 readme = "README.md"
 requires-python = ">=3.9"
@@ -31,6 +31,8 @@ dependencies = [
    "PyPDF2>=3.0.0",
    "pymupdf>=1.23.0",
    "pdfplumber>=0.10.0",
    "nbconvert>=7.0.0",  # For .ipynb file support
    "gitignore-parser>=0.1.12",  # For proper .gitignore handling
    "mlx>=0.26.3; sys_platform == 'darwin'",
    "mlx-lm>=0.26.0; sys_platform == 'darwin'",
 ]
@@ -44,6 +46,7 @@ colab = [
 [project.scripts]
 leann = "leann.cli:main"
 leann_mcp = "leann.mcp:main"
 [tool.setuptools.packages.find]
 where = ["src"]
@@ -17,12 +17,12 @@ logging.basicConfig(level=logging.INFO)
 logger = logging.getLogger(__name__)
-def check_ollama_models() -> list[str]:
+def check_ollama_models(host: str) -> list[str]:
    """Check available Ollama models and return a list"""
    try:
        import requests
-        response = requests.get("http://localhost:11434/api/tags", timeout=5)
+        response = requests.get(f"{host}/api/tags", timeout=5)
        if response.status_code == 200:
            data = response.json()
            return [model["name"] for model in data.get("models", [])]
@@ -309,10 +309,12 @@ def search_hf_models(query: str, limit: int = 10) -> list[str]:
    return search_hf_models_fuzzy(query, limit)
-def validate_model_and_suggest(model_name: str, llm_type: str) -> str | None:
+def validate_model_and_suggest(
    model_name: str, llm_type: str, host: str = "http://localhost:11434"
 ) -> str | None:
    """Validate model name and provide suggestions if invalid"""
    if llm_type == "ollama":
-        available_models = check_ollama_models()
+        available_models = check_ollama_models(host)
        if available_models and model_name not in available_models:
            error_msg = f"Model '{model_name}' not found in your local Ollama installation."
@@ -469,7 +471,7 @@ class OllamaChat(LLMInterface):
                requests.get(host)
            # Pre-check model availability with helpful suggestions
-            model_error = validate_model_and_suggest(model, "ollama")
+            model_error = validate_model_and_suggest(model, "ollama", host)
            if model_error:
                raise ValueError(model_error)
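
The `host` threading above can be exercised without a running server if payload parsing is kept separate from the HTTP call. The following is a minimal sketch under that assumption: it uses `urllib` from the standard library instead of `requests`, and the function names are hypothetical. The `:latest` fallback in `is_available` is also an assumption; the PR itself tests exact membership.

```python
import json
from urllib.request import urlopen


def parse_model_names(payload: dict) -> list[str]:
    """Extract model names from an /api/tags-style payload."""
    return [model["name"] for model in payload.get("models", [])]


def fetch_model_names(host: str = "http://localhost:11434", timeout: float = 5.0) -> list[str]:
    """Query {host}/api/tags; return an empty list if the server is unreachable."""
    try:
        with urlopen(f"{host}/api/tags", timeout=timeout) as response:
            return parse_model_names(json.load(response))
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; bad JSON or a bad URL is ValueError.
        return []


def is_available(model_name: str, available: list[str]) -> bool:
    """Exact match, or a match against the ':latest' tag (assumed default tag)."""
    return model_name in available or f"{model_name}:latest" in available
```

With this split, a remote Ollama instance only changes the `host` argument, and the name-matching logic can be tested against a literal dictionary.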
@@ -41,13 +41,23 @@ def extract_pdf_text_with_pdfplumber(file_path: str) -> str:
 class LeannCLI:
    def __init__(self):
-        self.indexes_dir = Path.home() / ".leann" / "indexes"
+        # Always use project-local .leann directory (like .git)
        self.indexes_dir = Path.cwd() / ".leann" / "indexes"
        self.indexes_dir.mkdir(parents=True, exist_ok=True)
        # Default parser for documents
        self.node_parser = SentenceSplitter(
            chunk_size=256, chunk_overlap=128, separator=" ", paragraph_separator="\n\n"
        )
        # Code-optimized parser
        self.code_parser = SentenceSplitter(
            chunk_size=512,  # Larger chunks for code context
            chunk_overlap=50,  # Less overlap to preserve function boundaries
            separator="\n",  # Split by lines for code
            paragraph_separator="\n\n",  # Preserve logical code blocks
        )
    def get_index_path(self, index_name: str) -> str:
        index_dir = self.indexes_dir / index_name
        return str(index_dir / "documents.leann")
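
Since indexes now live under the working directory rather than the home directory, the resolved path depends on where `leann` is invoked. A small sketch of that layout, with hypothetical helper names and the project root passed in explicitly instead of read from `Path.cwd()`:

```python
from pathlib import Path


def index_path(project_root: Path, index_name: str) -> Path:
    """Project-local layout: <root>/.leann/indexes/<name>/documents.leann."""
    return project_root / ".leann" / "indexes" / index_name / "documents.leann"


def default_index_name(project_root: Path) -> str:
    """Fallback index name: the directory's own name."""
    return project_root.name


if __name__ == "__main__":
    root = Path.cwd()
    print(index_path(root, default_index_name(root)))
```

Taking the root as a parameter keeps the helpers easy to test; the CLI in the diff always uses the current working directory.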
@@ -65,6 +75,7 @@ class LeannCLI:
            epilog="""
 Examples:
  leann build my-docs --docs ./documents                    # Build index named my-docs
  leann build my-ppts --docs ./ --file-types .pptx,.pdf    # Index only PowerPoint and PDF files
  leann search my-docs "query"                             # Search in my-docs index
  leann ask my-docs "question"                             # Ask my-docs index
  leann list                                              # List all stored indexes
@@ -75,18 +86,34 @@ Examples:
        # Build command
        build_parser = subparsers.add_parser("build", help="Build document index")
-        build_parser.add_argument("index_name", help="Index name")
+        build_parser.add_argument(
-        build_parser.add_argument("--docs", type=str, required=True, help="Documents directory")
+            "index_name", nargs="?", help="Index name (default: current directory name)"
        )
        build_parser.add_argument(
            "--docs", type=str, default=".", help="Documents directory (default: current directory)"
        )
        build_parser.add_argument(
            "--backend", type=str, default="hnsw", choices=["hnsw", "diskann"]
        )
        build_parser.add_argument("--embedding-model", type=str, default="facebook/contriever")
        build_parser.add_argument(
            "--embedding-mode",
            type=str,
            default="sentence-transformers",
            choices=["sentence-transformers", "openai", "mlx", "ollama"],
            help="Embedding backend mode (default: sentence-transformers)",
        )
        build_parser.add_argument("--force", "-f", action="store_true", help="Force rebuild")
        build_parser.add_argument("--graph-degree", type=int, default=32)
        build_parser.add_argument("--complexity", type=int, default=64)
        build_parser.add_argument("--num-threads", type=int, default=1)
        build_parser.add_argument("--compact", action="store_true", default=True)
        build_parser.add_argument("--recompute", action="store_true", default=True)
        build_parser.add_argument(
            "--file-types",
            type=str,
            help="Comma-separated list of file extensions to include (e.g., '.txt,.pdf,.pptx'). If not specified, uses default supported types.",
        )
        # Search command
        search_parser = subparsers.add_parser("search", help="Search documents")
@@ -96,7 +123,12 @@ Examples:
        search_parser.add_argument("--complexity", type=int, default=64)
        search_parser.add_argument("--beam-width", type=int, default=1)
        search_parser.add_argument("--prune-ratio", type=float, default=0.0)
-        search_parser.add_argument("--recompute-embeddings", action="store_true")
+        search_parser.add_argument(
            "--recompute-embeddings",
            action="store_true",
            default=True,
            help="Recompute embeddings (default: True)",
        )
        search_parser.add_argument(
            "--pruning-strategy",
            choices=["global", "local", "proportional"],
@@ -119,7 +151,12 @@ Examples:
        ask_parser.add_argument("--complexity", type=int, default=32)
        ask_parser.add_argument("--beam-width", type=int, default=1)
        ask_parser.add_argument("--prune-ratio", type=float, default=0.0)
-        ask_parser.add_argument("--recompute-embeddings", action="store_true")
+        ask_parser.add_argument(
            "--recompute-embeddings",
            action="store_true",
            default=True,
            help="Recompute embeddings (default: True)",
        )
        ask_parser.add_argument(
            "--pruning-strategy",
            choices=["global", "local", "proportional"],
@@ -138,46 +175,163 @@ Examples:
        return parser
    def register_project_dir(self):
        """Register current project directory in global registry"""
        global_registry = Path.home() / ".leann" / "projects.json"
        global_registry.parent.mkdir(exist_ok=True)
        current_dir = str(Path.cwd())
        # Load existing registry
        projects = []
        if global_registry.exists():
            try:
                import json
                with open(global_registry) as f:
                    projects = json.load(f)
            except Exception:
                projects = []
        # Add current directory if not already present
        if current_dir not in projects:
            projects.append(current_dir)
        # Save registry
        import json
        with open(global_registry, "w") as f:
            json.dump(projects, f, indent=2)
    def _build_gitignore_parser(self, docs_dir: str):
        """Build gitignore parser using gitignore-parser library."""
        from gitignore_parser import parse_gitignore
        # Try to parse the root .gitignore
        gitignore_path = Path(docs_dir) / ".gitignore"
        if gitignore_path.exists():
            try:
                # gitignore-parser automatically handles all subdirectory .gitignore files!
                matches = parse_gitignore(str(gitignore_path))
                print(f"📋 Loaded .gitignore from {docs_dir} (includes all subdirectories)")
                return matches
            except Exception as e:
                print(f"Warning: Could not parse .gitignore: {e}")
        else:
            print("📋 No .gitignore found")
        # Fallback: basic pattern matching for essential files
        essential_patterns = {".git", ".DS_Store", "__pycache__", "node_modules", ".venv", "venv"}
        def basic_matches(file_path):
            path_parts = Path(file_path).parts
            return any(part in essential_patterns for part in path_parts)
        return basic_matches
    def _should_exclude_file(self, relative_path: Path, gitignore_matches) -> bool:
        """Check if a file should be excluded using gitignore parser."""
        return gitignore_matches(str(relative_path))
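
The fallback matcher above compares whole path components against a fixed set of names; it does no glob matching. A standalone sketch of that behaviour, lifted out of the class for illustration (the module-level constant name is hypothetical):

```python
from pathlib import Path

# Same names as the fallback set in _build_gitignore_parser.
ESSENTIAL_PATTERNS = {".git", ".DS_Store", "__pycache__", "node_modules", ".venv", "venv"}


def basic_matches(file_path: str) -> bool:
    """Return True when any path component equals one of the essential names."""
    return any(part in ESSENTIAL_PATTERNS for part in Path(file_path).parts)


if __name__ == "__main__":
    for candidate in ["src/node_modules/pkg/index.js", "src/app/main.py", "docs/venv_notes.md"]:
        print(candidate, basic_matches(candidate))
```

A file such as `docs/venv_notes.md` is kept, because `venv_notes.md` is not equal to `venv`; only exact component matches are excluded.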
    def list_indexes(self):
        print("Stored LEANN indexes:")
-        if not self.indexes_dir.exists():
+        # Get all project directories with .leann
        global_registry = Path.home() / ".leann" / "projects.json"
        all_projects = []
        if global_registry.exists():
            try:
                import json
                with open(global_registry) as f:
                    all_projects = json.load(f)
            except Exception:
                pass
        # Filter to only existing directories with .leann
        valid_projects = []
        for project_dir in all_projects:
            project_path = Path(project_dir)
            if project_path.exists() and (project_path / ".leann" / "indexes").exists():
                valid_projects.append(project_path)
        # Add current project if it has .leann but not in registry
        current_path = Path.cwd()
        if (current_path / ".leann" / "indexes").exists() and current_path not in valid_projects:
            valid_projects.append(current_path)
        if not valid_projects:
            print("No indexes found. Use 'leann build <name> --docs <dir>' to create one.")
            return
-        index_dirs = [d for d in self.indexes_dir.iterdir() if d.is_dir()]
+        total_indexes = 0
        current_dir = Path.cwd()
        for project_path in valid_projects:
            indexes_dir = project_path / ".leann" / "indexes"
            if not indexes_dir.exists():
                continue
            index_dirs = [d for d in indexes_dir.iterdir() if d.is_dir()]
            if not index_dirs:
-            print("No indexes found. Use 'leann build <name> --docs <dir>' to create one.")
+                continue
            return
-        print(f"Found {len(index_dirs)} indexes:")
+            # Show project header
-        for i, index_dir in enumerate(index_dirs, 1):
+            if project_path == current_dir:
                print(f"\n📁 Current project ({project_path}):")
            else:
                print(f"\n📂 {project_path}:")
            for index_dir in index_dirs:
                total_indexes += 1
                index_name = index_dir.name
-            status = "✓" if self.index_exists(index_name) else "✗"
+                meta_file = index_dir / "documents.leann.meta.json"
                status = "✓" if meta_file.exists() else "✗"
-            print(f"  {i}. {index_name} [{status}]")
+                print(f"  {total_indexes}. {index_name} [{status}]")
-            if self.index_exists(index_name):
+                if status == "✓":
                index_dir / "documents.leann.meta.json"
                    size_mb = sum(f.stat().st_size for f in index_dir.iterdir() if f.is_file()) / (
                        1024 * 1024
                    )
                    print(f"     Size: {size_mb:.1f} MB")
-        if index_dirs:
+        if total_indexes > 0:
-            example_name = index_dirs[0].name
+            print(f"\nTotal: {total_indexes} indexes across {len(valid_projects)} projects")
-            print("\nUsage:")
+            print("\nUsage (current project only):")
            # Show example from current project
            current_indexes_dir = current_dir / ".leann" / "indexes"
            if current_indexes_dir.exists():
                current_index_dirs = [d for d in current_indexes_dir.iterdir() if d.is_dir()]
                if current_index_dirs:
                    example_name = current_index_dirs[0].name
                    print(f'  leann search {example_name} "your query"')
                    print(f"  leann ask {example_name} --interactive")
-    def load_documents(self, docs_dir: str):
+    def load_documents(self, docs_dir: str, custom_file_types: str | None = None):
        print(f"Loading documents from {docs_dir}...")
        if custom_file_types:
            print(f"Using custom file types: {custom_file_types}")
-        # Try to use better PDF parsers first
+        # Build gitignore parser
        gitignore_matches = self._build_gitignore_parser(docs_dir)
        # Try to use better PDF parsers first, but only if PDFs are requested
        documents = []
        docs_path = Path(docs_dir)
        # Check if we should process PDFs
        should_process_pdfs = custom_file_types is None or ".pdf" in custom_file_types
        if should_process_pdfs:
            for file_path in docs_path.rglob("*.pdf"):
                # Check if file matches any exclude pattern
                relative_path = file_path.relative_to(docs_path)
                if self._should_exclude_file(relative_path, gitignore_matches):
                    continue
                print(f"Processing PDF: {file_path}")
                # Try PyMuPDF first (best quality)
@@ -195,25 +349,172 @@ Examples:
                else:
                    # Fallback to default reader
                    print(f"Using default reader for {file_path}")
                    try:
                        default_docs = SimpleDirectoryReader(
                            str(file_path.parent),
                            filename_as_id=True,
                            required_exts=[file_path.suffix],
                        ).load_data()
                        documents.extend(default_docs)
                    except Exception as e:
                        print(f"Warning: Could not process {file_path}: {e}")
        # Load other file types with default reader
        if custom_file_types:
            # Parse custom file types from comma-separated string
            code_extensions = [ext.strip() for ext in custom_file_types.split(",") if ext.strip()]
            # Ensure extensions start with a dot
            code_extensions = [ext if ext.startswith(".") else f".{ext}" for ext in code_extensions]
        else:
            # Use default supported file types
            code_extensions = [
                # Original document types
                ".txt",
                ".md",
                ".docx",
                ".pptx",
                # Code files for Claude Code integration
                ".py",
                ".js",
                ".ts",
                ".jsx",
                ".tsx",
                ".java",
                ".cpp",
                ".c",
                ".h",
                ".hpp",
                ".cs",
                ".go",
                ".rs",
                ".rb",
                ".php",
                ".swift",
                ".kt",
                ".scala",
                ".r",
                ".sql",
                ".sh",
                ".bash",
                ".zsh",
                ".fish",
                ".ps1",
                ".bat",
                # Config and markup files
                ".json",
                ".yaml",
                ".yml",
                ".xml",
                ".toml",
                ".ini",
                ".cfg",
                ".conf",
                ".html",
                ".css",
                ".scss",
                ".less",
                ".vue",
                ".svelte",
                # Data science
                ".ipynb",
                ".R",
                ".py",
                ".jl",
            ]
        # Try to load other file types, but don't fail if none are found
        try:
            # Create a custom file filter function using the gitignore matcher
            def file_filter(file_path: str) -> bool:
                """Return True if file should be included (not excluded)"""
                try:
                    docs_path_obj = Path(docs_dir)
                    file_path_obj = Path(file_path)
                    relative_path = file_path_obj.relative_to(docs_path_obj)
                    return not self._should_exclude_file(relative_path, gitignore_matches)
                except (ValueError, OSError):
                    return True  # Include files that can't be processed
            other_docs = SimpleDirectoryReader(
                docs_dir,
                recursive=True,
                encoding="utf-8",
-            required_exts=[".txt", ".md", ".docx"],
+                required_exts=code_extensions,
                file_extractor={},  # Use default extractors
                filename_as_id=True,
            ).load_data(show_progress=True)
-        documents.extend(other_docs)
+
            # Filter documents after loading based on gitignore rules
            filtered_docs = []
            for doc in other_docs:
                file_path = doc.metadata.get("file_path", "")
                if file_filter(file_path):
                    filtered_docs.append(doc)
            documents.extend(filtered_docs)
        except ValueError as e:
            if "No files found" in str(e):
                print("No additional files found for other supported types.")
            else:
                raise e
        all_texts = []
        # Define code file extensions for intelligent chunking
        code_file_exts = {
            ".py",
            ".js",
            ".ts",
            ".jsx",
            ".tsx",
            ".java",
            ".cpp",
            ".c",
            ".h",
            ".hpp",
            ".cs",
            ".go",
            ".rs",
            ".rb",
            ".php",
            ".swift",
            ".kt",
            ".scala",
            ".r",
            ".sql",
            ".sh",
            ".bash",
            ".zsh",
            ".fish",
            ".ps1",
            ".bat",
            ".json",
            ".yaml",
            ".yml",
            ".xml",
            ".toml",
            ".ini",
            ".cfg",
            ".conf",
            ".html",
            ".css",
            ".scss",
            ".less",
            ".vue",
            ".svelte",
            ".ipynb",
            ".R",
            ".jl",
        }
        for doc in documents:
-            nodes = self.node_parser.get_nodes_from_documents([doc])
+            # Check if this is a code file based on source path
            source_path = doc.metadata.get("source", "")
            is_code_file = any(source_path.endswith(ext) for ext in code_file_exts)
            # Use appropriate parser based on file type
            parser = self.code_parser if is_code_file else self.node_parser
            nodes = parser.get_nodes_from_documents([doc])
            for node in nodes:
                all_texts.append(node.get_content())
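
Two small pieces of this hunk are easy to check in isolation: normalising the `--file-types` string and choosing a parser by file extension. The sketch below mirrors both with hypothetical function names, returns a label instead of a `SentenceSplitter` so it runs without llama-index, and uses only an illustrative subset of the extension set.

```python
def normalize_extensions(file_types: str) -> list[str]:
    """Turn a comma-separated string such as 'pdf, .pptx' into ['.pdf', '.pptx']."""
    extensions = [ext.strip() for ext in file_types.split(",") if ext.strip()]
    return [ext if ext.startswith(".") else f".{ext}" for ext in extensions]


# Illustrative subset of code_file_exts from the diff.
CODE_EXTS = {".py", ".js", ".ts", ".ipynb"}


def pick_parser(source_path: str) -> str:
    """Return 'code' for code-like files and 'document' otherwise."""
    is_code_file = any(source_path.endswith(ext) for ext in CODE_EXTS)
    return "code" if is_code_file else "document"


if __name__ == "__main__":
    print(normalize_extensions("pdf, .pptx,,txt "))
    print(pick_parser("src/main.py"), pick_parser("notes/readme.md"))
```

Note that an empty source path falls through to the document parser, so the selection depends on the metadata key actually being populated for each loaded document.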
@@ -222,15 +523,23 @@ Examples:
    async def build_index(self, args):
        docs_dir = args.docs
        # Use current directory name if index_name not provided
        if args.index_name:
            index_name = args.index_name
        else:
            index_name = Path.cwd().name
            print(f"Using current directory name as index: '{index_name}'")
        index_dir = self.indexes_dir / index_name
        index_path = self.get_index_path(index_name)
        print(f"📂 Indexing: {Path(docs_dir).resolve()}")
        if index_dir.exists() and not args.force:
            print(f"Index '{index_name}' already exists. Use --force to rebuild.")
            return
-        all_texts = self.load_documents(docs_dir)
+        all_texts = self.load_documents(docs_dir, args.file_types)
        if not all_texts:
            print("No documents found")
            return
@@ -242,6 +551,7 @@ Examples:
        builder = LeannBuilder(
            backend_name=args.backend,
            embedding_model=args.embedding_model,
            embedding_mode=args.embedding_mode,
            graph_degree=args.graph_degree,
            complexity=args.complexity,
            is_compact=args.compact,
@@ -255,6 +565,9 @@ Examples:
        builder.build_index(index_path)
        print(f"Index built at {index_path}")
        # Register this project directory in global registry
        self.register_project_dir()
    async def search_documents(self, args):
        index_name = args.index_name
        query = args.query
@@ -6,6 +6,7 @@ Preserves all optimization parameters to ensure performance
 import logging
 import os
 from concurrent.futures import ThreadPoolExecutor, as_completed
 from typing import Any
 import numpy as np
@@ -35,7 +36,7 @@ def compute_embeddings(
    Args:
        texts: List of texts to compute embeddings for
        model_name: Model name
-        mode: Computation mode ('sentence-transformers', 'openai', 'mlx')
+        mode: Computation mode ('sentence-transformers', 'openai', 'mlx', 'ollama')
        is_build: Whether this is a build operation (shows progress bar)
        batch_size: Batch size for processing
        adaptive_optimization: Whether to use adaptive optimization based on batch size
@@ -55,6 +56,8 @@ def compute_embeddings(
        return compute_embeddings_openai(texts, model_name)
    elif mode == "mlx":
        return compute_embeddings_mlx(texts, model_name)
    elif mode == "ollama":
        return compute_embeddings_ollama(texts, model_name, is_build=is_build)
    else:
        raise ValueError(f"Unsupported embedding mode: {mode}")
@@ -365,3 +368,262 @@ def compute_embeddings_mlx(chunks: list[str], model_name: str, batch_size: int =
    # Stack numpy arrays
    return np.stack(all_embeddings)
 def compute_embeddings_ollama(
    texts: list[str], model_name: str, is_build: bool = False, host: str = "http://localhost:11434"
 ) -> np.ndarray:
    """
    Compute embeddings using Ollama API.
    Args:
        texts: List of texts to compute embeddings for
        model_name: Ollama model name (e.g., "nomic-embed-text", "mxbai-embed-large")
        is_build: Whether this is a build operation (shows progress bar)
        host: Ollama host URL (default: http://localhost:11434)
    Returns:
        Normalized embeddings array, shape: (len(texts), embedding_dim)
    """
    try:
        import requests
    except ImportError:
        raise ImportError(
            "The 'requests' library is required for Ollama embeddings. Install with: uv pip install requests"
        )
    if not texts:
        raise ValueError("Cannot compute embeddings for empty text list")
    logger.info(
        f"Computing embeddings for {len(texts)} texts using Ollama API, model: '{model_name}'"
    )
    # Check if Ollama is running
    try:
        response = requests.get(f"{host}/api/version", timeout=5)
        response.raise_for_status()
    except requests.exceptions.ConnectionError:
        error_msg = (
            f"❌ Could not connect to Ollama at {host}.\n\n"
            "Please ensure Ollama is running:\n"
            "  • macOS/Linux: ollama serve\n"
            "  • Windows: Make sure Ollama is running in the system tray\n\n"
            "Installation: https://ollama.com/download"
        )
        raise RuntimeError(error_msg)
    except Exception as e:
        raise RuntimeError(f"Unexpected error connecting to Ollama: {e}")
    # Check if model exists and provide helpful suggestions
    try:
        response = requests.get(f"{host}/api/tags", timeout=5)
        response.raise_for_status()
        models = response.json()
        model_names = [model["name"] for model in models.get("models", [])]
        # Filter for embedding models (models that support embeddings)
        embedding_models = []
        suggested_embedding_models = [
            "nomic-embed-text",
            "mxbai-embed-large",
            "bge-m3",
            "all-minilm",
            "snowflake-arctic-embed",
        ]
        for model in model_names:
            # Check if it's an embedding model (by name patterns or known models)
            base_name = model.split(":")[0]
            if any(emb in base_name for emb in ["embed", "bge", "minilm", "e5"]):
                embedding_models.append(model)
        # Check if model exists (handle versioned names)
        model_found = any(
            model_name == name.split(":")[0] or model_name == name for name in model_names
        )
        if not model_found:
            error_msg = f"❌ Model '{model_name}' not found in local Ollama.\n\n"
            # Suggest pulling the model
            error_msg += "📦 To install this embedding model:\n"
            error_msg += f"   ollama pull {model_name}\n\n"
            # Show available embedding models
            if embedding_models:
                error_msg += "✅ Available embedding models:\n"
                for model in embedding_models[:5]:
                    error_msg += f"   • {model}\n"
                if len(embedding_models) > 5:
                    error_msg += f"   ... and {len(embedding_models) - 5} more\n"
            else:
                error_msg += "💡 Popular embedding models to install:\n"
                for model in suggested_embedding_models[:3]:
                    error_msg += f"   • ollama pull {model}\n"
            error_msg += "\n📚 Browse more: https://ollama.com/library"
            raise ValueError(error_msg)
        # Verify the model supports embeddings by testing it
        try:
            test_response = requests.post(
                f"{host}/api/embeddings", json={"model": model_name, "prompt": "test"}, timeout=10
            )
            if test_response.status_code != 200:
                error_msg = (
                    f"⚠️ Model '{model_name}' exists but may not support embeddings.\n\n"
                    f"Please use an embedding model like:\n"
                )
                for model in suggested_embedding_models[:3]:
                    error_msg += f"   • {model}\n"
                raise ValueError(error_msg)
        except requests.exceptions.RequestException:
            # If test fails, continue anyway - model might still work
            pass
    except requests.exceptions.RequestException as e:
        logger.warning(f"Could not verify model existence: {e}")
    # Process embeddings with optimized concurrent processing
    # (`requests` is already imported at the top of this function)
    def get_single_embedding(text_idx_tuple):
        """Helper function to get embedding for a single text."""
        text, idx = text_idx_tuple
        max_retries = 3
        retry_count = 0
        # Truncate very long texts to avoid API issues
        truncated_text = text[:8000] if len(text) > 8000 else text
        while retry_count < max_retries:
            try:
                response = requests.post(
                    f"{host}/api/embeddings",
                    json={"model": model_name, "prompt": truncated_text},
                    timeout=30,
                )
                response.raise_for_status()
                result = response.json()
                embedding = result.get("embedding")
                if embedding is None:
                    raise ValueError(f"No embedding returned for text {idx}")
                return idx, embedding
            except requests.exceptions.Timeout:
                retry_count += 1
                if retry_count >= max_retries:
                    logger.warning(f"Timeout for text {idx} after {max_retries} retries")
                    return idx, None
            except Exception as e:
                if retry_count >= max_retries - 1:
                    logger.error(f"Failed to get embedding for text {idx}: {e}")
                    return idx, None
                retry_count += 1
        return idx, None
    # Determine if we should use concurrent processing
    use_concurrent = (
        len(texts) > 5 and not is_build
    )  # Don't use concurrent in build mode to avoid overwhelming
    max_workers = min(4, len(texts))  # Limit concurrent requests to avoid overwhelming Ollama
    all_embeddings = [None] * len(texts)  # Pre-allocate list to maintain order
    failed_indices = []
    if use_concurrent:
        logger.info(
            f"Using concurrent processing with {max_workers} workers for {len(texts)} texts"
        )
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            # Submit all tasks
            future_to_idx = {
                executor.submit(get_single_embedding, (text, idx)): idx
                for idx, text in enumerate(texts)
            }
            # Add progress bar for concurrent processing
            try:
                if is_build or len(texts) > 10:
                    from tqdm import tqdm
                    futures_iterator = tqdm(
                        as_completed(future_to_idx),
                        total=len(texts),
                        desc="Computing Ollama embeddings",
                    )
                else:
                    futures_iterator = as_completed(future_to_idx)
            except ImportError:
                futures_iterator = as_completed(future_to_idx)
            # Collect results as they complete
            for future in futures_iterator:
                try:
                    idx, embedding = future.result()
                    if embedding is not None:
                        all_embeddings[idx] = embedding
                    else:
                        failed_indices.append(idx)
                except Exception as e:
                    idx = future_to_idx[future]
                    logger.error(f"Exception for text {idx}: {e}")
                    failed_indices.append(idx)
    else:
        # Sequential processing with progress bar
        show_progress = is_build or len(texts) > 10
        try:
            if show_progress:
                from tqdm import tqdm
                iterator = tqdm(
                    enumerate(texts), total=len(texts), desc="Computing Ollama embeddings"
                )
            else:
                iterator = enumerate(texts)
        except ImportError:
            iterator = enumerate(texts)
        for idx, text in iterator:
            result_idx, embedding = get_single_embedding((text, idx))
            if embedding is not None:
                all_embeddings[idx] = embedding
            else:
                failed_indices.append(idx)
    # Handle failed embeddings
    if failed_indices:
        if len(failed_indices) == len(texts):
            raise RuntimeError("Failed to compute any embeddings")
        logger.warning(f"Failed to compute embeddings for {len(failed_indices)}/{len(texts)} texts")
        # Use zero embeddings as fallback for failed ones
        valid_embedding = next((e for e in all_embeddings if e is not None), None)
        if valid_embedding:
            embedding_dim = len(valid_embedding)
            for idx in failed_indices:
                all_embeddings[idx] = [0.0] * embedding_dim
    # Remove None values and convert to numpy array
    all_embeddings = [e for e in all_embeddings if e is not None]
    # Convert to numpy array and normalize
    embeddings = np.array(all_embeddings, dtype=np.float32)
    # Normalize embeddings (L2 normalization)
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    embeddings = embeddings / (norms + 1e-8)  # Add small epsilon to avoid division by zero
    logger.info(f"Generated {len(embeddings)} embeddings, dimension: {embeddings.shape[1]}")
    return embeddings
@@ -0,0 +1,176 @@
 #!/usr/bin/env python3
 import json
 import subprocess
 import sys
 def handle_request(request):
    if request.get("method") == "initialize":
        return {
            "jsonrpc": "2.0",
            "id": request.get("id"),
            "result": {
                "capabilities": {"tools": {}},
                "protocolVersion": "2024-11-05",
                "serverInfo": {"name": "leann-mcp", "version": "1.0.0"},
            },
        }
    elif request.get("method") == "tools/list":
        return {
            "jsonrpc": "2.0",
            "id": request.get("id"),
            "result": {
                "tools": [
                    {
                        "name": "leann_search",
                        "description": """🔍 Search code using natural language - like having a coding assistant who knows your entire codebase!
 🎯 **Perfect for**:
 - "How does authentication work?" → finds auth-related code
 - "Error handling patterns" → locates try-catch blocks and error logic
 - "Database connection setup" → finds DB initialization code
 - "API endpoint definitions" → locates route handlers
 - "Configuration management" → finds config files and usage
 💡 **Pro tip**: Use this before making any changes to understand existing patterns and conventions.""",
                        "inputSchema": {
                            "type": "object",
                            "properties": {
                                "index_name": {
                                    "type": "string",
                                    "description": "Name of the LEANN index to search. Use 'leann_list' first to see available indexes.",
                                },
                                "query": {
                                    "type": "string",
                                    "description": "Search query - can be natural language (e.g., 'how to handle errors') or technical terms (e.g., 'async function definition')",
                                },
                                "top_k": {
                                    "type": "integer",
                                    "default": 5,
                                    "minimum": 1,
                                    "maximum": 20,
                                    "description": "Number of search results to return. Use 5-10 for focused results, 15-20 for comprehensive exploration.",
                                },
                                "complexity": {
                                    "type": "integer",
                                    "default": 32,
                                    "minimum": 16,
                                    "maximum": 128,
                                    "description": "Search complexity level. Use 16-32 for fast searches (recommended), 64+ for higher precision when needed.",
                                },
                            },
                            "required": ["index_name", "query"],
                        },
                    },
                    {
                        "name": "leann_status",
                        "description": "📊 Check the health and stats of your code indexes - like a medical checkup for your codebase knowledge!",
                        "inputSchema": {
                            "type": "object",
                            "properties": {
                                "index_name": {
                                    "type": "string",
                                    "description": "Optional: Name of specific index to check. If not provided, shows status of all indexes.",
                                }
                            },
                        },
                    },
                    {
                        "name": "leann_list",
                        "description": "📋 Show all your indexed codebases - your personal code library! Use this to see what's available for search.",
                        "inputSchema": {"type": "object", "properties": {}},
                    },
                ]
            },
        }
    elif request.get("method") == "tools/call":
        tool_name = request["params"]["name"]
        args = request["params"].get("arguments", {})
        try:
            if tool_name == "leann_search":
                # Validate required parameters
                if not args.get("index_name") or not args.get("query"):
                    return {
                        "jsonrpc": "2.0",
                        "id": request.get("id"),
                        "result": {
                            "content": [
                                {
                                    "type": "text",
                                    "text": "Error: Both index_name and query are required",
                                }
                            ]
                        },
                    }
                # Build simplified command
                cmd = [
                    "leann",
                    "search",
                    args["index_name"],
                    args["query"],
                    f"--top-k={args.get('top_k', 5)}",
                    f"--complexity={args.get('complexity', 32)}",
                ]
                result = subprocess.run(cmd, capture_output=True, text=True)
            elif tool_name == "leann_status":
                if args.get("index_name"):
                    # Check specific index status - for now, we'll use leann list and filter
                    result = subprocess.run(["leann", "list"], capture_output=True, text=True)
                    # We could enhance this to show more detailed status per index
                else:
                    # Show all indexes status
                    result = subprocess.run(["leann", "list"], capture_output=True, text=True)
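            elif tool_name == "leann_list":
                result = subprocess.run(["leann", "list"], capture_output=True, text=True)
            else:
                # Unknown tool: fail explicitly instead of reading an unbound `result` below
                raise ValueError(f"Unknown tool: {tool_name}")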
            return {
                "jsonrpc": "2.0",
                "id": request.get("id"),
                "result": {
                    "content": [
                        {
                            "type": "text",
                            "text": result.stdout
                            if result.returncode == 0
                            else f"Error: {result.stderr}",
                        }
                    ]
                },
            }
        except Exception as e:
            return {
                "jsonrpc": "2.0",
                "id": request.get("id"),
                "error": {"code": -1, "message": str(e)},
            }
 def main():
    for line in sys.stdin:
        try:
            request = json.loads(line.strip())
            response = handle_request(request)
            if response:
                print(json.dumps(response))
                sys.stdout.flush()
        except Exception as e:
            error_response = {
                "jsonrpc": "2.0",
                "id": None,
                "error": {"code": -1, "message": str(e)},
            }
            print(json.dumps(error_response))
            sys.stdout.flush()
 if __name__ == "__main__":
    main()
@@ -0,0 +1,91 @@
 # 🔥 LEANN Claude Code Integration
 Transform your development workflow with intelligent code assistance using LEANN's semantic search directly in Claude Code.
 ## Prerequisites
 **Step 1:** First, complete the basic LEANN installation following the [📦 Installation guide](../../README.md#installation) in the root README:
 ```bash
 uv venv
 source .venv/bin/activate
 uv pip install leann
 ```
 **Step 2:** Install LEANN globally for MCP integration:
 ```bash
 uv tool install leann-core
 ```
 This makes the `leann` command available system-wide, which `leann_mcp` requires.
 ## 🚀 Quick Setup
 Add the LEANN MCP server to Claude Code:
 ```bash
 claude mcp add leann-server -- leann_mcp
 ```
 ## 🛠️ Available Tools
 Once connected, you'll have access to these powerful semantic search tools in Claude Code:
 - **`leann_list`** - List all available indexes across your projects
 - **`leann_search`** - Perform semantic searches across code and documents
 - **`leann_status`** - Check the health and stats of one index, or of all indexes if no name is given
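
Under the hood, a tool invocation reaches the server as one JSON-RPC message per line on stdin. As a rough sketch, the snippet below builds a `leann_search` call using the argument names and ranges declared in the server's `inputSchema` (`index_name`, `query`, `top_k` from 1 to 20, `complexity` from 16 to 128). The helper name `make_search_request` is hypothetical and not part of LEANN; Claude Code constructs these messages for you.

```python
import json


def make_search_request(index_name, query, top_k=5, complexity=32, request_id=1):
    """Build one JSON-RPC line for the `leann_search` tool (hypothetical helper)."""
    if not index_name or not query:
        raise ValueError("index_name and query are required")
    # Clamp to the ranges declared in the tool's inputSchema.
    top_k = max(1, min(20, int(top_k)))
    complexity = max(16, min(128, int(complexity)))
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "leann_search",
            "arguments": {
                "index_name": index_name,
                "query": query,
                "top_k": top_k,
                "complexity": complexity,
            },
        },
    }
    # The server reads one JSON document per line, so keep it on a single line.
    return json.dumps(request)
```

Out-of-range values are clamped here rather than rejected; that is a choice made for this sketch, not documented server behavior.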
 ## 🎯 Quick Start Example
 ```bash
 # Build an index for your project (change to your actual path)
 leann build my-project --docs ./
 # Start Claude Code
 claude
 ```
 **Try this in Claude Code:**
 ```
 Help me understand this codebase. List available indexes and search for authentication patterns.
 ```
 <p align="center">
  <img src="../../assets/claude_code_leann.png" alt="LEANN in Claude Code" width="80%">
 </p>
 ## 🧠 How It Works
 The integration consists of three key components working seamlessly together:
 - **`leann`** - Core CLI tool for indexing and searching (installed globally via `uv tool install`)
 - **`leann_mcp`** - MCP server that wraps `leann` commands for Claude Code integration
 - **Claude Code** - Calls `leann_mcp`, which executes `leann` commands and returns their output as tool results
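
The wrapping step can be pictured as a plain mapping from tool arguments to a `leann search` command line, plus a small envelope around the command's output. The sketch below is modeled on the MCP server code but is not the server itself: `build_search_command` and `wrap_result` are illustrative names, while the `--top-k` and `--complexity` flags and their defaults (5 and 32) are the ones the server passes.

```python
def build_search_command(args):
    """Translate `leann_search` tool arguments into CLI argv (illustrative)."""
    return [
        "leann",
        "search",
        args["index_name"],
        args["query"],
        f"--top-k={args.get('top_k', 5)}",
        f"--complexity={args.get('complexity', 32)}",
    ]


def wrap_result(request_id, returncode, stdout, stderr):
    """Wrap CLI output in the JSON-RPC text-content envelope sent back to the client."""
    # On success the tool result is the command's stdout; otherwise its stderr.
    text = stdout if returncode == 0 else f"Error: {stderr}"
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {"content": [{"type": "text", "text": text}]},
    }
```

In practice the argv list would be handed to `subprocess.run(cmd, capture_output=True, text=True)`, which needs the `leann` CLI on `PATH`. Passing a list rather than a shell string keeps the query from being interpreted by a shell.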
 ## 📁 File Support
 LEANN understands **30+ file types** including:
 - **Programming**: Python, JavaScript, TypeScript, Java, Go, Rust, C++, C#
 - **Data**: SQL, YAML, JSON, CSV, XML
 - **Documentation**: Markdown, TXT, PDF
 - **And many more!**
 ## 💾 Storage & Organization
 - **Project indexes**: Stored in `.leann/` directory (just like `.git`)
 - **Global registry**: Project tracking at `~/.leann/projects.json`
 - **Multi-project support**: Switch between different codebases seamlessly
 - **Portable**: Transfer indexes between machines with minimal overhead
 ## 🗑️ Uninstalling
 To remove the LEANN MCP server from Claude Code:
 ```bash
 claude mcp remove leann-server
 ```
 To remove LEANN:
 ```bash
 uv pip uninstall leann leann-backend-hnsw leann-core
 ```
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "leann"
-version = "0.2.1"
+version = "0.2.6"
 description = "LEANN - The smallest vector index in the world. RAG Everything with LEANN!"
 readme = "README.md"
 requires-python = ">=3.9"
@@ -43,6 +43,9 @@ dependencies = [
    "mlx>=0.26.3; sys_platform == 'darwin'",
    "mlx-lm>=0.26.0; sys_platform == 'darwin'",
    "psutil>=5.8.0",
    "pathspec>=0.12.1",
    "nbconvert>=7.16.6",
    "gitignore-parser>=0.1.12",
 ]
 [project.optional-dependencies]
Author	SHA1	Message	Date
Andy Lee	fe942329d6	fix: improve gitignore and Jupyter notebook support - Add nbconvert dependency for .ipynb file support - Replace manual gitignore parsing with gitignore-parser library - Proper recursive .gitignore handling (all subdirectories) - Fix compliance with Git gitignore behavior - Simplify code and improve reliability 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-08-10 18:52:55 -07:00
yichuan520030910320	9801aa581b	[Readme]update embedding model config according to reddit feedback	2025-08-09 21:33:33 -07:00
GitHub Actions	5e97916608	chore: release v0.2.6	2025-08-10 03:39:45 +00:00
Andy Lee	8b9c2be8c9	Feat/claude code refine (#24 ) * feat: Add Ollama embedding support for local embedding models * docs: Add clear documentation for Ollama embedding usage * fix: remove leann_ask * docs: remove ollama embedding extra instructions * simplify MCP interface for Claude Code - Remove unnecessary search parameters: search_mode, recompute_embeddings, file_types, min_score - Remove leann_clear tool (not needed for Claude Code workflow) - Streamline search to only use: query, index_name, top_k, complexity - Keep core tools: leann_index, leann_search, leann_status, leann_list 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * remove leann_index from MCP interface Users should use CLI command 'leann build' to create indexes first. MCP now only provides search functionality: - leann_search: search existing indexes - leann_status: check index health - leann_list: list available indexes This separates index creation (CLI) from search (Claude Code). 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * improve CLI with auto project name and .gitignore support - Make index_name optional, auto-use current directory name - Read .gitignore patterns and respect them during indexing - Add _read_gitignore_patterns() to parse .gitignore files - Add _should_exclude_file() for pattern matching - Apply exclusion patterns to both PDF and general file processing - Show helpful messages about gitignore usage Now users can simply run: leann build And it will use project name + respect .gitignore patterns. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-08-09 20:37:17 -07:00
Andy Lee	3ff5aac8e0	Add Ollama embedding support to enable local embedding models (#22 ) * feat: Add Ollama embedding support for local embedding models * docs: Add clear documentation for Ollama embedding usage * feat: Enhance Ollama embedding with better error handling and concurrent processing - Add intelligent model validation and suggestions (inspired by OllamaChat) - Implement concurrent processing for better performance - Add retry mechanism with timeout handling - Provide user-friendly error messages with emojis - Auto-detect and recommend embedding models - Add text truncation for long texts - Improve progress bar display logic * docs: don't mention it in README	2025-08-08 18:44:07 -07:00
yichuan520030910320	67fef60466	[Readme]More about claude code	2025-08-08 16:05:35 -07:00
GitHub Actions	b6ab6f1993	chore: release v0.2.5	2025-08-08 22:32:27 +00:00
joshuashaffer	9f2e82a838	Propagate hosts argument for ollama through chat.py (#21 ) * Propigate hosts argument for ollama through chat.py * Apply suggestions from code review Good AI slop suggestions. Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-08-08 15:31:15 -07:00
yichuan520030910320	0b2b799d5a	[README]fix instructions in cli	2025-08-08 01:04:13 -07:00
yichuan520030910320	0f790fbbd9	docs: polish README and add optimized MCP integration image - Improve grammar and sentence structure in MCP section - Add proper markdown image formatting with relative paths - Optimize mcp_leann.png size (1.3MB -> 224KB) - Update data description to be more specific about Chinese content	2025-08-08 00:58:36 -07:00
GitHub Actions	387ae21eba	chore: release v0.2.4	2025-08-08 07:14:51 +00:00
Andy Lee	3cc329c3e7	fix: remove hardcoded paths from MCP server and documentation	2025-08-08 00:08:56 -07:00
Andy Lee	5567302316	feat: promote Claude Code integration as primary RAG feature	2025-08-07 23:19:19 -07:00
GitHub Actions	075d4bd167	chore: release v0.2.2	2025-08-08 01:58:40 +00:00
yichuan520030910320	e4bcc76f88	fix cli & make recompute default true	2025-08-07 18:58:04 -07:00
yichuan520030910320	710e83b1fd	fix cli if there is no other type of doc to make it robust	2025-08-07 18:46:05 -07:00
yichuan520030910320	c96d653072	more support for type of docs in cli	2025-08-07 18:14:03 -07:00
Andy Lee	8b22d2b5d3	Merge pull request #19 from yichuan-w/feature/claude-code-research Feature/claude code research	2025-08-05 23:02:34 -07:00