Compare commits


12 Commits

Author SHA1 Message Date
GitHub Actions
b6ab6f1993 chore: release v0.2.5 2025-08-08 22:32:27 +00:00
joshuashaffer
9f2e82a838 Propagate hosts argument for ollama through chat.py (#21)
* Propagate hosts argument for ollama through chat.py

* Apply suggestions from code review

Good AI slop suggestions.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-08-08 15:31:15 -07:00
yichuan520030910320
0b2b799d5a [README]fix instructions in cli 2025-08-08 01:04:13 -07:00
yichuan520030910320
0f790fbbd9 docs: polish README and add optimized MCP integration image
- Improve grammar and sentence structure in MCP section
- Add proper markdown image formatting with relative paths
- Optimize mcp_leann.png size (1.3MB -> 224KB)
- Update data description to be more specific about Chinese content
2025-08-08 00:58:36 -07:00
GitHub Actions
387ae21eba chore: release v0.2.4 2025-08-08 07:14:51 +00:00
Andy Lee
3cc329c3e7 fix: remove hardcoded paths from MCP server and documentation 2025-08-08 00:08:56 -07:00
Andy Lee
5567302316 feat: promote Claude Code integration as primary RAG feature 2025-08-07 23:19:19 -07:00
GitHub Actions
075d4bd167 chore: release v0.2.2 2025-08-08 01:58:40 +00:00
yichuan520030910320
e4bcc76f88 fix cli & make recompute default true 2025-08-07 18:58:04 -07:00
yichuan520030910320
710e83b1fd fix cli if there is no other type of doc to make it robust 2025-08-07 18:46:05 -07:00
yichuan520030910320
c96d653072 more support for type of docs in cli 2025-08-07 18:14:03 -07:00
Andy Lee
8b22d2b5d3 Merge pull request #19 from yichuan-w/feature/claude-code-research
Feature/claude code research
2025-08-05 23:02:34 -07:00
12 changed files with 147 additions and 253 deletions

View File

@@ -6,6 +6,7 @@
<img src="https://img.shields.io/badge/Python-3.9%2B-blue.svg" alt="Python 3.9+">
<img src="https://img.shields.io/badge/License-MIT-green.svg" alt="MIT License">
<img src="https://img.shields.io/badge/Platform-Linux%20%7C%20macOS-lightgrey" alt="Platform">
<img src="https://img.shields.io/badge/MCP-Native%20Integration-blue?style=flat-square" alt="MCP Integration">
</p>
<h2 align="center" tabindex="-1" class="heading-element" dir="auto">
@@ -16,9 +17,10 @@ LEANN is an innovative vector database that democratizes personal AI. Transform
LEANN achieves this through *graph-based selective recomputation* with *high-degree preserving pruning*, computing embeddings on-demand instead of storing them all. [Illustration Fig →](#-architecture--how-it-works) | [Paper →](https://arxiv.org/abs/2506.08276)
**Ready to RAG Everything?** Transform your laptop into a personal AI assistant that can search your **[file system](#-personal-data-manager-process-any-documents-pdf-txt-md)**, **[emails](#-your-personal-email-secretary-rag-on-apple-mail)**, **[browser history](#-time-machine-for-the-web-rag-your-entire-browser-history)**, **[chat history](#-wechat-detective-unlock-your-golden-memories)**, or external knowledge bases (i.e., 60M documents) - all on your laptop, with zero cloud costs and complete privacy.
**Ready to RAG Everything?** Transform your laptop into a personal AI assistant that can semantically search your **[file system](#-personal-data-manager-process-any-documents-pdf-txt-md)**, **[emails](#-your-personal-email-secretary-rag-on-apple-mail)**, **[browser history](#-time-machine-for-the-web-rag-your-entire-browser-history)**, **[chat history](#-wechat-detective-unlock-your-golden-memories)**, **[codebase](#-claude-code-integration-transform-your-development-workflow)**\*, or external knowledge bases (e.g., 60M documents) - all on your laptop, with zero cloud costs and complete privacy.
> **🚀 NEW: Claude Code Integration!** LEANN now provides native MCP integration for Claude Code users. Index your codebase and get intelligent code assistance directly in Claude Code. [Setup Guide →](packages/leann-mcp/README.md)
\* Claude Code only supports basic `grep`-style keyword search. **LEANN** is a drop-in **semantic search MCP service fully compatible with Claude Code**, unlocking intelligent retrieval without changing your workflow.
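The *graph-based selective recomputation* mentioned above can be pictured, very loosely, with the toy sketch below: walk a proximity graph toward the query and embed only the nodes the walk visits, instead of keeping every stored vector. This is an illustrative approximation only, not LEANN's actual pruning or recomputation code; the `embed` callable and the adjacency-list graph are placeholder assumptions.

```python
import numpy as np

def toy_selective_search(query_vec, graph, texts, embed, entry=0, max_hops=10):
    """Greedy walk over a proximity graph, re-embedding only visited nodes.

    graph: dict[int, list[int]] adjacency list, texts: list[str],
    embed: callable mapping a text chunk to a numpy vector (placeholder).
    Embeddings are computed on demand, so nothing is stored up front.
    """
    current, visited, cache = entry, {entry}, {}
    for _ in range(max_hops):
        neighbors = [n for n in graph.get(current, []) if n not in visited]
        if not neighbors:
            break
        for n in neighbors:
            cache.setdefault(n, embed(texts[n]))  # selective recomputation
        nearest = min(neighbors, key=lambda n: np.linalg.norm(cache[n] - query_vec))
        visited.add(nearest)
        current = nearest
    return current
```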
@@ -28,7 +30,7 @@ LEANN achieves this through *graph-based selective recomputation* with *high-deg
<img src="assets/effects.png" alt="LEANN vs Traditional Vector DB Storage Comparison" width="70%">
</p>
> **The numbers speak for themselves:** Index 60 million Wikipedia chunks in just 6GB instead of 201GB. From emails to browser history, everything fits on your laptop. [See detailed benchmarks for different applications below ↓](#storage-comparison)
> **The numbers speak for themselves:** Index 60 million text chunks in just 6GB instead of 201GB. From emails to browser history, everything fits on your laptop. [See detailed benchmarks for different applications below ↓](#storage-comparison)
🔒 **Privacy:** Your data never leaves your laptop. No OpenAI, no cloud, no "terms of service".
@@ -221,7 +223,7 @@ Ask questions directly about your personal PDFs, documents, and any directory co
<img src="videos/paper_clear.gif" alt="LEANN Document Search Demo" width="600">
</p>
The example below asks a question about summarizing our paper (uses default data in `data/`, which is a directory with diverse data sources: two papers, Pride and Prejudice, and a README in Chinese) and this is the **easiest example** to run here:
The example below asks a question about summarizing our paper (uses default data in `data/`, which is a directory with diverse data sources: two papers, Pride and Prejudice, and a Chinese-language technical report about Huawei's LLM), and this is the **easiest example** to run here:
```bash
source .venv/bin/activate # Don't forget to activate the virtual environment
@@ -416,7 +418,26 @@ Once the index is built, you can ask questions like:
</details>
### 🚀 Claude Code Integration: Transform Your Development Workflow!
**The future of code assistance is here.** Transform your development workflow with LEANN's native MCP integration for Claude Code. Index your entire codebase and get intelligent code assistance directly in your IDE.
**Key features:**
- 🔍 **Semantic code search** across your entire project
- 📚 **Context-aware assistance** for debugging and development
- 🚀 **Zero-config setup** with automatic language detection
```bash
# Install LEANN globally for MCP integration
uv tool install leann-core
# Setup is automatic - just start using Claude Code!
```
Try our fully agentic pipeline with auto query rewriting, semantic search planning, and more:
![LEANN MCP Integration](assets/mcp_leann.png)
**Ready to supercharge your coding?** [Complete Setup Guide →](packages/leann-mcp/README.md)
## 🖥️ Command Line Interface
@@ -446,11 +467,8 @@ leann --help
### Usage Examples
```bash
# Build an index from current directory (default)
leann build my-docs
# Or from specific directory
leann build my-docs --docs ./documents
# Build from a specific directory; my-docs is the index name
leann build my-docs --docs ./your_documents
# Search your documents
leann search my-docs "machine learning concepts"

BIN
assets/mcp_leann.png Normal file
View File

Binary file not shown.


View File

@@ -1,150 +0,0 @@
# Claude Code x LEANN Integration Guide
## ✅ Current Status: It Already Works!
Good news: the LEANN CLI can already be used in Claude Code as-is, with no modifications needed!
## 🚀 Get Started Now
### 1. Activate the Environment
```bash
# From the LEANN project directory
source .venv/bin/activate.fish # fish shell
# or
source .venv/bin/activate # bash shell
```
### 2. Basic Commands
#### List existing indexes
```bash
leann list
```
#### Search documents
```bash
leann search my-docs "machine learning" --recompute-embeddings
```
#### Ask questions
```bash
echo "What is machine learning?" | leann ask my-docs --llm ollama --model qwen3:8b --recompute-embeddings
```
#### Build a new index
```bash
leann build project-docs --docs ./src --recompute-embeddings
```
## 💡 Claude Code Usage Tips
### Using It Directly in Claude Code
1. **Activate the environment**
```bash
cd /Users/andyl/Projects/LEANN-RAG
source .venv/bin/activate.fish
```
2. **Search the codebase**
```bash
leann search my-docs "authentication patterns" --recompute-embeddings --top-k 10
```
3. **Ask questions**
```bash
echo "How does the authentication system work?" | leann ask my-docs --llm ollama --model qwen3:8b --recompute-embeddings
```
### Batch Operation Examples
```bash
# Build an index of the project docs
leann build project-docs --docs ./docs --force
# Search several keywords
leann search project-docs "API authentication" --recompute-embeddings
leann search project-docs "database schema" --recompute-embeddings
leann search project-docs "deployment guide" --recompute-embeddings
# Q&A mode
echo "What are the API endpoints?" | leann ask project-docs --recompute-embeddings
```
## 🎯 Workflows Claude Can Run Right Away
### Code Analysis Workflow
```bash
# 1. Build a codebase index
leann build codebase --docs ./src --backend hnsw --recompute-embeddings
# 2. Analyze the architecture
echo "What is the overall architecture?" | leann ask codebase --recompute-embeddings
# 3. Find specific functionality
leann search codebase "user authentication" --recompute-embeddings --top-k 5
# 4. Understand implementation details
echo "How is user authentication implemented?" | leann ask codebase --recompute-embeddings
```
### Documentation Workflow
```bash
# 1. Index the project documentation
leann build docs --docs ./docs --recompute-embeddings
# 2. Quickly look up information
leann search docs "installation requirements" --recompute-embeddings
# 3. Get detailed explanations
echo "What are the system requirements?" | leann ask docs --recompute-embeddings
```
## ⚠️ Important Notes
1. **You must use `--recompute-embeddings`** - this is the key flag; omitting it causes an error
2. **Activate the virtual environment first** - make sure LEANN's Python environment is available
3. **Ollama must be installed in advance** - the ask feature needs a local LLM
## 🔥 Ready-to-Use Claude Prompt
```
Help me analyze this codebase using LEANN:
1. First, activate the environment:
cd /Users/andyl/Projects/LEANN-RAG && source .venv/bin/activate.fish
2. Build an index of the source code:
leann build codebase --docs ./src --recompute-embeddings
3. Search for authentication patterns:
leann search codebase "authentication middleware" --recompute-embeddings --top-k 10
4. Ask about the authentication system:
echo "How does user authentication work in this codebase?" | leann ask codebase --recompute-embeddings
Please execute these commands and help me understand the code structure.
```
## 📈 Planned Improvements
It is already usable, but a few things could still be optimized:
1. **Simpler commands** - enable recompute-embeddings by default
2. **Configuration file** - avoid retyping the same parameters
3. **State management** - auto-detect the environment and indexes
4. **Output format** - a format that is easier for Claude to parse
These are icing on the cake, though; it works right now!
## 🎉 Summary
**LEANN already works perfectly in Claude Code!**
- ✅ Search works
- ✅ RAG question answering works
- ✅ Index building works
- ✅ Multiple data sources supported
- ✅ Local LLMs supported
Just remember to add the `--recompute-embeddings` flag!

View File

@@ -4,8 +4,8 @@ build-backend = "scikit_build_core.build"
[project]
name = "leann-backend-diskann"
version = "0.2.1"
dependencies = ["leann-core==0.2.1", "numpy", "protobuf>=3.19.0"]
version = "0.2.5"
dependencies = ["leann-core==0.2.5", "numpy", "protobuf>=3.19.0"]
[tool.scikit-build]
# Key: simplified CMake path

View File

@@ -6,10 +6,10 @@ build-backend = "scikit_build_core.build"
[project]
name = "leann-backend-hnsw"
version = "0.2.1"
version = "0.2.5"
description = "Custom-built HNSW (Faiss) backend for the Leann toolkit."
dependencies = [
"leann-core==0.2.1",
"leann-core==0.2.5",
"numpy",
"pyzmq>=23.0.0",
"msgpack>=1.0.0",

View File

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "leann-core"
version = "0.2.1"
version = "0.2.5"
description = "Core API and plugin system for LEANN"
readme = "README.md"
requires-python = ">=3.9"

View File

@@ -17,12 +17,12 @@ logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
def check_ollama_models() -> list[str]:
def check_ollama_models(host: str) -> list[str]:
"""Check available Ollama models and return a list"""
try:
import requests
response = requests.get("http://localhost:11434/api/tags", timeout=5)
response = requests.get(f"{host}/api/tags", timeout=5)
if response.status_code == 200:
data = response.json()
return [model["name"] for model in data.get("models", [])]
@@ -309,10 +309,12 @@ def search_hf_models(query: str, limit: int = 10) -> list[str]:
return search_hf_models_fuzzy(query, limit)
def validate_model_and_suggest(model_name: str, llm_type: str) -> str | None:
def validate_model_and_suggest(
model_name: str, llm_type: str, host: str = "http://localhost:11434"
) -> str | None:
"""Validate model name and provide suggestions if invalid"""
if llm_type == "ollama":
available_models = check_ollama_models()
available_models = check_ollama_models(host)
if available_models and model_name not in available_models:
error_msg = f"Model '{model_name}' not found in your local Ollama installation."
@@ -469,7 +471,7 @@ class OllamaChat(LLMInterface):
requests.get(host)
# Pre-check model availability with helpful suggestions
model_error = validate_model_and_suggest(model, "ollama")
model_error = validate_model_and_suggest(model, "ollama", host)
if model_error:
raise ValueError(model_error)
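Pulled out of the diff above, the host-propagation pattern from PR #21 boils down to passing the Ollama base URL in from the caller instead of hardcoding `localhost`. A standalone sketch of that pattern, assuming only the `requests` library and Ollama's standard `/api/tags` endpoint:

```python
import requests

def check_ollama_models(host: str) -> list[str]:
    """Return the model names served by the Ollama instance at `host`."""
    try:
        response = requests.get(f"{host}/api/tags", timeout=5)
        if response.status_code == 200:
            return [m["name"] for m in response.json().get("models", [])]
    except requests.RequestException:
        pass
    return []

# The host now flows in from the chat entry point rather than being fixed:
print(check_ollama_models("http://localhost:11434"))
```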

View File

@@ -74,10 +74,11 @@ class LeannCLI:
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
leann build my-docs --docs ./documents # Build index named my-docs
leann search my-docs "query" # Search in my-docs index
leann ask my-docs "question" # Ask my-docs index
leann list # List all stored indexes
leann build my-docs --docs ./documents # Build index named my-docs
leann build my-ppts --docs ./ --file-types .pptx,.pdf # Index only PowerPoint and PDF files
leann search my-docs "query" # Search in my-docs index
leann ask my-docs "question" # Ask my-docs index
leann list # List all stored indexes
""",
)
@@ -99,6 +100,11 @@ Examples:
build_parser.add_argument("--num-threads", type=int, default=1)
build_parser.add_argument("--compact", action="store_true", default=True)
build_parser.add_argument("--recompute", action="store_true", default=True)
build_parser.add_argument(
"--file-types",
type=str,
help="Comma-separated list of file extensions to include (e.g., '.txt,.pdf,.pptx'). If not specified, uses default supported types.",
)
# Search command
search_parser = subparsers.add_parser("search", help="Search documents")
@@ -108,7 +114,12 @@ Examples:
search_parser.add_argument("--complexity", type=int, default=64)
search_parser.add_argument("--beam-width", type=int, default=1)
search_parser.add_argument("--prune-ratio", type=float, default=0.0)
search_parser.add_argument("--recompute-embeddings", action="store_true")
search_parser.add_argument(
"--recompute-embeddings",
action="store_true",
default=True,
help="Recompute embeddings (default: True)",
)
search_parser.add_argument(
"--pruning-strategy",
choices=["global", "local", "proportional"],
@@ -131,7 +142,12 @@ Examples:
ask_parser.add_argument("--complexity", type=int, default=32)
ask_parser.add_argument("--beam-width", type=int, default=1)
ask_parser.add_argument("--prune-ratio", type=float, default=0.0)
ask_parser.add_argument("--recompute-embeddings", action="store_true")
ask_parser.add_argument(
"--recompute-embeddings",
action="store_true",
default=True,
help="Recompute embeddings (default: True)",
)
ask_parser.add_argument(
"--pruning-strategy",
choices=["global", "local", "proportional"],
@@ -254,8 +270,10 @@ Examples:
print(f' leann search {example_name} "your query"')
print(f" leann ask {example_name} --interactive")
def load_documents(self, docs_dir: str):
def load_documents(self, docs_dir: str, custom_file_types: str | None = None):
print(f"Loading documents from {docs_dir}...")
if custom_file_types:
print(f"Using custom file types: {custom_file_types}")
# Try to use better PDF parsers first
documents = []
@@ -287,66 +305,81 @@ Examples:
documents.extend(default_docs)
# Load other file types with default reader
code_extensions = [
# Original document types
".txt",
".md",
".docx",
# Code files for Claude Code integration
".py",
".js",
".ts",
".jsx",
".tsx",
".java",
".cpp",
".c",
".h",
".hpp",
".cs",
".go",
".rs",
".rb",
".php",
".swift",
".kt",
".scala",
".r",
".sql",
".sh",
".bash",
".zsh",
".fish",
".ps1",
".bat",
# Config and markup files
".json",
".yaml",
".yml",
".xml",
".toml",
".ini",
".cfg",
".conf",
".html",
".css",
".scss",
".less",
".vue",
".svelte",
# Data science
".ipynb",
".R",
".py",
".jl",
]
other_docs = SimpleDirectoryReader(
docs_dir,
recursive=True,
encoding="utf-8",
required_exts=code_extensions,
).load_data(show_progress=True)
documents.extend(other_docs)
if custom_file_types:
# Parse custom file types from comma-separated string
code_extensions = [ext.strip() for ext in custom_file_types.split(",") if ext.strip()]
# Ensure extensions start with a dot
code_extensions = [ext if ext.startswith(".") else f".{ext}" for ext in code_extensions]
else:
# Use default supported file types
code_extensions = [
# Original document types
".txt",
".md",
".docx",
".pptx",
# Code files for Claude Code integration
".py",
".js",
".ts",
".jsx",
".tsx",
".java",
".cpp",
".c",
".h",
".hpp",
".cs",
".go",
".rs",
".rb",
".php",
".swift",
".kt",
".scala",
".r",
".sql",
".sh",
".bash",
".zsh",
".fish",
".ps1",
".bat",
# Config and markup files
".json",
".yaml",
".yml",
".xml",
".toml",
".ini",
".cfg",
".conf",
".html",
".css",
".scss",
".less",
".vue",
".svelte",
# Data science
".ipynb",
".R",
".py",
".jl",
]
# Try to load other file types, but don't fail if none are found
try:
other_docs = SimpleDirectoryReader(
docs_dir,
recursive=True,
encoding="utf-8",
required_exts=code_extensions,
).load_data(show_progress=True)
documents.extend(other_docs)
except ValueError as e:
if "No files found" in str(e):
print("No additional files found for other supported types.")
else:
raise e
all_texts = []
@@ -424,7 +457,7 @@ Examples:
print(f"Index '{index_name}' already exists. Use --force to rebuild.")
return
all_texts = self.load_documents(docs_dir)
all_texts = self.load_documents(docs_dir, args.file_types)
if not all_texts:
print("No documents found")
return
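For reference, the extension normalization that the new `--file-types` flag applies (as shown in the hunk above) can be exercised on its own; the helper name below is only for illustration:

```python
def normalize_file_types(spec: str) -> list[str]:
    """Turn a comma-separated spec like 'pptx, .pdf' into ['.pptx', '.pdf']."""
    exts = [ext.strip() for ext in spec.split(",") if ext.strip()]
    return [ext if ext.startswith(".") else f".{ext}" for ext in exts]

print(normalize_file_types("pptx, .pdf,md"))  # ['.pptx', '.pdf', '.md']
```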

View File

@@ -1,7 +1,6 @@
#!/usr/bin/env python3
import json
import os
import subprocess
import sys
@@ -62,10 +61,6 @@ def handle_request(request):
tool_name = request["params"]["name"]
args = request["params"].get("arguments", {})
# Set working directory and environment
env = os.environ.copy()
cwd = "/Users/andyl/Projects/LEANN-RAG"
try:
if tool_name == "leann_search":
cmd = [
@@ -76,18 +71,14 @@ def handle_request(request):
"--recompute-embeddings",
f"--top-k={args.get('top_k', 5)}",
]
result = subprocess.run(cmd, capture_output=True, text=True, cwd=cwd, env=env)
result = subprocess.run(cmd, capture_output=True, text=True)
elif tool_name == "leann_ask":
cmd = f'echo "{args["question"]}" | leann ask {args["index_name"]} --recompute-embeddings --llm ollama --model qwen3:8b'
result = subprocess.run(
cmd, shell=True, capture_output=True, text=True, cwd=cwd, env=env
)
result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
elif tool_name == "leann_list":
result = subprocess.run(
["leann", "list"], capture_output=True, text=True, cwd=cwd, env=env
)
result = subprocess.run(["leann", "list"], capture_output=True, text=True)
return {
"jsonrpc": "2.0",

View File

@@ -7,7 +7,7 @@ Intelligent code assistance using LEANN's vector search directly in Claude Code.
First, install LEANN CLI globally:
```bash
uv tool install leann
uv tool install leann-core
```
This makes the `leann` command available system-wide, which `leann_mcp` requires.
@@ -30,7 +30,7 @@ claude mcp add leann-server -- leann_mcp
```bash
# Build an index for your project
leann build my-project
leann build my-project --docs ./  # change to your docs path
# Start Claude Code
claude

View File

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "leann"
version = "0.2.1"
version = "0.2.5"
description = "LEANN - The smallest vector index in the world. RAG Everything with LEANN!"
readme = "README.md"
requires-python = ">=3.9"