Compare commits
8 Commits
feature/cl
...
v0.2.4
| Author | SHA1 | Date | |
|---|---|---|---|
| | 387ae21eba | | |
| | 3cc329c3e7 | | |
| | 5567302316 | | |
| | 075d4bd167 | | |
| | e4bcc76f88 | | |
| | 710e83b1fd | | |
| | c96d653072 | | |
| | 8b22d2b5d3 | | |
29
README.md
@@ -16,9 +16,7 @@ LEANN is an innovative vector database that democratizes personal AI. Transform
 
 LEANN achieves this through *graph-based selective recomputation* with *high-degree preserving pruning*, computing embeddings on-demand instead of storing them all. [Illustration Fig →](#️-architecture--how-it-works) | [Paper →](https://arxiv.org/abs/2506.08276)
 
-**Ready to RAG Everything?** Transform your laptop into a personal AI assistant that can search your **[file system](#-personal-data-manager-process-any-documents-pdf-txt-md)**, **[emails](#-your-personal-email-secretary-rag-on-apple-mail)**, **[browser history](#-time-machine-for-the-web-rag-your-entire-browser-history)**, **[chat history](#-wechat-detective-unlock-your-golden-memories)**, or external knowledge bases (i.e., 60M documents) - all on your laptop, with zero cloud costs and complete privacy.
-
-> **🚀 NEW: Claude Code Integration!** LEANN now provides native MCP integration for Claude Code users. Index your codebase and get intelligent code assistance directly in Claude Code. [Setup Guide →](packages/leann-mcp/README.md)
+**Ready to RAG Everything?** Transform your laptop into a personal AI assistant that can search your **[codebase](#-claude-code-integration-transform-your-development-workflow)**, **[file system](#-personal-data-manager-process-any-documents-pdf-txt-md)**, **[emails](#-your-personal-email-secretary-rag-on-apple-mail)**, **[browser history](#-time-machine-for-the-web-rag-your-entire-browser-history)**, **[chat history](#-wechat-detective-unlock-your-golden-memories)**, or external knowledge bases (i.e., 60M documents) - all on your laptop, with zero cloud costs and complete privacy.
 
 
 
@@ -213,6 +211,30 @@ All RAG examples share these common parameters. **Interactive mode** is availabl
 
 </details>
 
+### 🚀 Claude Code Integration: Transform Your Development Workflow!
+
+**The future of code assistance is here.** Transform your development workflow with LEANN's native MCP integration for Claude Code. Index your entire codebase and get intelligent code assistance directly in your IDE.
+
+<p align="center">
+<img src="https://img.shields.io/badge/MCP-Native%20Integration-blue?style=flat-square" alt="MCP Integration">
+<a href="https://github.com/yichuan-w/LEANN/tree/feature/graph-partition-support?tab=readme-ov-file#rag-on-everything"><img src="https://img.shields.io/twitter/url?url=https%3A%2F%2Fgithub.com%2Fyichuan-w%2FLEANN&style=social" alt="Twitter"></a>
+</p>
+
+**Key features:**
+- 🔍 **Semantic code search** across your entire project
+- 📚 **Context-aware assistance** for debugging and development
+- 🚀 **Zero-config setup** with automatic language detection
+- 🔒 **Complete privacy** - your code never leaves your machine
+
+```bash
+# Install LEANN globally for MCP integration
+uv tool install leann-core
+
+# Setup is automatic - just start using Claude Code!
+```
+
+**Ready to supercharge your coding?** [Complete Setup Guide →](packages/leann-mcp/README.md)
+
 ### 📄 Personal Data Manager: Process Any Documents (`.pdf`, `.txt`, `.md`)!
 
 Ask questions directly about your personal PDFs, documents, and any directory containing your files!
@@ -417,7 +439,6 @@ Once the index is built, you can ask questions like:
 </details>
 
 
-
 ## 🖥️ Command Line Interface
 
 LEANN includes a powerful CLI for document processing and search. Perfect for quick document indexing and interactive chat.
@@ -1,150 +0,0 @@
-# Claude Code x LEANN Integration Guide
-
-## ✅ Status: Already Working!
-
-Good news: the LEANN CLI already works fully inside Claude Code, with no modifications required!
-
-## 🚀 Get Started Now
-
-### 1. Activate the Environment
-```bash
-# From the LEANN project directory
-source .venv/bin/activate.fish # fish shell
-# or
-source .venv/bin/activate # bash shell
-```
-
-### 2. Basic Commands
-
-#### List Existing Indexes
-```bash
-leann list
-```
-
-#### Search Documents
-```bash
-leann search my-docs "machine learning" --recompute-embeddings
-```
-
-#### Ask Questions
-```bash
-echo "What is machine learning?" | leann ask my-docs --llm ollama --model qwen3:8b --recompute-embeddings
-```
-
-#### Build a New Index
-```bash
-leann build project-docs --docs ./src --recompute-embeddings
-```
-
-## 💡 Claude Code Usage Tips
-
-### Using LEANN Directly in Claude Code
-
-1. **Activate the environment**:
-```bash
-cd /Users/andyl/Projects/LEANN-RAG
-source .venv/bin/activate.fish
-```
-
-2. **Search the codebase**:
-```bash
-leann search my-docs "authentication patterns" --recompute-embeddings --top-k 10
-```
-
-3. **Intelligent Q&A**:
-```bash
-echo "How does the authentication system work?" | leann ask my-docs --llm ollama --model qwen3:8b --recompute-embeddings
-```
-
-### Batch Operation Examples
-
-```bash
-# Build an index of the project docs
-leann build project-docs --docs ./docs --force
-
-# Search for several keywords
-leann search project-docs "API authentication" --recompute-embeddings
-leann search project-docs "database schema" --recompute-embeddings
-leann search project-docs "deployment guide" --recompute-embeddings
-
-# Q&A mode
-echo "What are the API endpoints?" | leann ask project-docs --recompute-embeddings
-```
-
-## 🎯 Workflows Claude Can Run Right Away
-
-### Code Analysis Workflow
-```bash
-# 1. Build the codebase index
-leann build codebase --docs ./src --backend hnsw --recompute-embeddings
-
-# 2. Analyze the architecture
-echo "What is the overall architecture?" | leann ask codebase --recompute-embeddings
-
-# 3. Find a specific feature
-leann search codebase "user authentication" --recompute-embeddings --top-k 5
-
-# 4. Understand implementation details
-echo "How is user authentication implemented?" | leann ask codebase --recompute-embeddings
-```
-
-### Documentation Comprehension Workflow
-```bash
-# 1. Index the project docs
-leann build docs --docs ./docs --recompute-embeddings
-
-# 2. Look up information quickly
-leann search docs "installation requirements" --recompute-embeddings
-
-# 3. Get a detailed explanation
-echo "What are the system requirements?" | leann ask docs --recompute-embeddings
-```
-
-## ⚠️ Important Notes
-
-1. **You must pass `--recompute-embeddings`** - this flag is required; commands fail without it
-2. **Activate the virtual environment first** - make sure LEANN's Python environment is available
-3. **Install Ollama beforehand** - the `ask` command needs a local LLM
-
-## 🔥 Ready-to-Use Claude Prompt
-
-```
-Help me analyze this codebase using LEANN:
-
-1. First, activate the environment:
-cd /Users/andyl/Projects/LEANN-RAG && source .venv/bin/activate.fish
-
-2. Build an index of the source code:
-leann build codebase --docs ./src --recompute-embeddings
-
-3. Search for authentication patterns:
-leann search codebase "authentication middleware" --recompute-embeddings --top-k 10
-
-4. Ask about the authentication system:
-echo "How does user authentication work in this codebase?" | leann ask codebase --recompute-embeddings
-
-Please execute these commands and help me understand the code structure.
-```
-
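-## 📈 Planned Improvements
-
-It already works today, but there is room for further optimization:
-
-1. **Simpler commands** - enable recompute-embeddings by default
-2. **Configuration file** - avoid retyping the same arguments
-3. **State management** - detect the environment and indexes automatically
-4. **Output format** - a format that is easier for Claude to parse
-
-These are all nice-to-haves, though; you can start using it right now!
-
-## 🎉 Summary
-
-**LEANN already works perfectly in Claude Code!**
-
-- ✅ Search works
-- ✅ RAG Q&A works
-- ✅ Index building works
-- ✅ Multiple data sources are supported
-- ✅ Local LLMs are supported
-
-Just remember to add the `--recompute-embeddings` flag!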
@@ -4,8 +4,8 @@ build-backend = "scikit_build_core.build"
 
 [project]
 name = "leann-backend-diskann"
-version = "0.2.1"
-dependencies = ["leann-core==0.2.1", "numpy", "protobuf>=3.19.0"]
+version = "0.2.4"
+dependencies = ["leann-core==0.2.4", "numpy", "protobuf>=3.19.0"]
 
 [tool.scikit-build]
 # Key: simplified CMake path
Submodule packages/leann-backend-diskann/third_party/DiskANN updated: 67a2611ad1...b2dc4ea2c7
@@ -6,10 +6,10 @@ build-backend = "scikit_build_core.build"
 
 [project]
 name = "leann-backend-hnsw"
-version = "0.2.1"
+version = "0.2.4"
 description = "Custom-built HNSW (Faiss) backend for the Leann toolkit."
 dependencies = [
-    "leann-core==0.2.1",
+    "leann-core==0.2.4",
     "numpy",
     "pyzmq>=23.0.0",
     "msgpack>=1.0.0",
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "leann-core"
-version = "0.2.1"
+version = "0.2.4"
 description = "Core API and plugin system for LEANN"
 readme = "README.md"
 requires-python = ">=3.9"
@@ -74,10 +74,11 @@ class LeannCLI:
             formatter_class=argparse.RawDescriptionHelpFormatter,
             epilog="""
 Examples:
-  leann build my-docs --docs ./documents # Build index named my-docs
-  leann search my-docs "query" # Search in my-docs index
-  leann ask my-docs "question" # Ask my-docs index
-  leann list # List all stored indexes
+  leann build my-docs --docs ./documents # Build index named my-docs
+  leann build my-ppts --docs ./ --file-types .pptx,.pdf # Index only PowerPoint and PDF files
+  leann search my-docs "query" # Search in my-docs index
+  leann ask my-docs "question" # Ask my-docs index
+  leann list # List all stored indexes
             """,
         )
 
@@ -99,6 +100,11 @@ Examples:
         build_parser.add_argument("--num-threads", type=int, default=1)
         build_parser.add_argument("--compact", action="store_true", default=True)
         build_parser.add_argument("--recompute", action="store_true", default=True)
+        build_parser.add_argument(
+            "--file-types",
+            type=str,
+            help="Comma-separated list of file extensions to include (e.g., '.txt,.pdf,.pptx'). If not specified, uses default supported types.",
+        )
 
         # Search command
         search_parser = subparsers.add_parser("search", help="Search documents")
@@ -108,7 +114,12 @@ Examples:
         search_parser.add_argument("--complexity", type=int, default=64)
         search_parser.add_argument("--beam-width", type=int, default=1)
         search_parser.add_argument("--prune-ratio", type=float, default=0.0)
-        search_parser.add_argument("--recompute-embeddings", action="store_true")
+        search_parser.add_argument(
+            "--recompute-embeddings",
+            action="store_true",
+            default=True,
+            help="Recompute embeddings (default: True)",
+        )
         search_parser.add_argument(
             "--pruning-strategy",
             choices=["global", "local", "proportional"],
@@ -131,7 +142,12 @@ Examples:
         ask_parser.add_argument("--complexity", type=int, default=32)
         ask_parser.add_argument("--beam-width", type=int, default=1)
         ask_parser.add_argument("--prune-ratio", type=float, default=0.0)
-        ask_parser.add_argument("--recompute-embeddings", action="store_true")
+        ask_parser.add_argument(
+            "--recompute-embeddings",
+            action="store_true",
+            default=True,
+            help="Recompute embeddings (default: True)",
+        )
         ask_parser.add_argument(
             "--pruning-strategy",
             choices=["global", "local", "proportional"],
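
One consequence of the two `--recompute-embeddings` changes (in the `search` and `ask` parsers) is worth spelling out: with `action="store_true"` and `default=True`, the parsed value is `True` whether or not the flag is passed, so the command line offers no way to turn recomputation off. The sketch below reproduces this with a throwaway parser and shows one possible alternative, `argparse.BooleanOptionalAction` (standard library, Python 3.9+), which also generates a `--no-recompute-embeddings` switch. The parser objects and variable names are illustrative and not part of LEANN.

```python
import argparse

# Same pattern as in the diff: store_true combined with default=True.
as_committed = argparse.ArgumentParser(prog="demo")
as_committed.add_argument("--recompute-embeddings", action="store_true", default=True)

# Hypothetical alternative: BooleanOptionalAction also adds a --no-... form.
with_negation = argparse.ArgumentParser(prog="demo")
with_negation.add_argument(
    "--recompute-embeddings",
    action=argparse.BooleanOptionalAction,
    default=True,
)

flag_absent = as_committed.parse_args([]).recompute_embeddings
flag_present = as_committed.parse_args(["--recompute-embeddings"]).recompute_embeddings
negated = with_negation.parse_args(["--no-recompute-embeddings"]).recompute_embeddings

# The first two values are identical; only the alternative can yield False.
print(flag_absent, flag_present, negated)
```

Whether a negative switch is wanted is a design decision for the project; the sketch only shows that the committed form cannot express it.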
@@ -254,8 +270,10 @@ Examples:
         print(f'  leann search {example_name} "your query"')
         print(f"  leann ask {example_name} --interactive")
 
-    def load_documents(self, docs_dir: str):
+    def load_documents(self, docs_dir: str, custom_file_types: str | None = None):
         print(f"Loading documents from {docs_dir}...")
+        if custom_file_types:
+            print(f"Using custom file types: {custom_file_types}")
 
         # Try to use better PDF parsers first
         documents = []
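
The new `load_documents` signature uses the `str | None` union syntax, while the package metadata elsewhere in this diff declares `requires-python = ">=3.9"`. On Python 3.9 that annotation is evaluated when the function is defined and raises `TypeError`, unless the module begins with `from __future__ import annotations` (this diff does not show whether it does). A minimal sketch of a spelling that works on 3.9 either way, using a stand-in function rather than the real method:

```python
from typing import Optional


def load_documents(docs_dir: str, custom_file_types: Optional[str] = None) -> str:
    """Stand-in with the same parameters; it only reports what it received."""
    if custom_file_types:
        return f"{docs_dir}: {custom_file_types}"
    return f"{docs_dir}: default file types"


print(load_documents("./docs"))
print(load_documents("./docs", ".pptx,.pdf"))
```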
@@ -287,66 +305,81 @@ Examples:
                 documents.extend(default_docs)
 
         # Load other file types with default reader
-        code_extensions = [
-            # Original document types
-            ".txt",
-            ".md",
-            ".docx",
-            # Code files for Claude Code integration
-            ".py",
-            ".js",
-            ".ts",
-            ".jsx",
-            ".tsx",
-            ".java",
-            ".cpp",
-            ".c",
-            ".h",
-            ".hpp",
-            ".cs",
-            ".go",
-            ".rs",
-            ".rb",
-            ".php",
-            ".swift",
-            ".kt",
-            ".scala",
-            ".r",
-            ".sql",
-            ".sh",
-            ".bash",
-            ".zsh",
-            ".fish",
-            ".ps1",
-            ".bat",
-            # Config and markup files
-            ".json",
-            ".yaml",
-            ".yml",
-            ".xml",
-            ".toml",
-            ".ini",
-            ".cfg",
-            ".conf",
-            ".html",
-            ".css",
-            ".scss",
-            ".less",
-            ".vue",
-            ".svelte",
-            # Data science
-            ".ipynb",
-            ".R",
-            ".py",
-            ".jl",
-        ]
-        other_docs = SimpleDirectoryReader(
-            docs_dir,
-            recursive=True,
-            encoding="utf-8",
-            required_exts=code_extensions,
-        ).load_data(show_progress=True)
-        documents.extend(other_docs)
+        if custom_file_types:
+            # Parse custom file types from comma-separated string
+            code_extensions = [ext.strip() for ext in custom_file_types.split(",") if ext.strip()]
+            # Ensure extensions start with a dot
+            code_extensions = [ext if ext.startswith(".") else f".{ext}" for ext in code_extensions]
+        else:
+            # Use default supported file types
+            code_extensions = [
+                # Original document types
+                ".txt",
+                ".md",
+                ".docx",
+                ".pptx",
+                # Code files for Claude Code integration
+                ".py",
+                ".js",
+                ".ts",
+                ".jsx",
+                ".tsx",
+                ".java",
+                ".cpp",
+                ".c",
+                ".h",
+                ".hpp",
+                ".cs",
+                ".go",
+                ".rs",
+                ".rb",
+                ".php",
+                ".swift",
+                ".kt",
+                ".scala",
+                ".r",
+                ".sql",
+                ".sh",
+                ".bash",
+                ".zsh",
+                ".fish",
+                ".ps1",
+                ".bat",
+                # Config and markup files
+                ".json",
+                ".yaml",
+                ".yml",
+                ".xml",
+                ".toml",
+                ".ini",
+                ".cfg",
+                ".conf",
+                ".html",
+                ".css",
+                ".scss",
+                ".less",
+                ".vue",
+                ".svelte",
+                # Data science
+                ".ipynb",
+                ".R",
+                ".py",
+                ".jl",
+            ]
+        # Try to load other file types, but don't fail if none are found
+        try:
+            other_docs = SimpleDirectoryReader(
+                docs_dir,
+                recursive=True,
+                encoding="utf-8",
+                required_exts=code_extensions,
+            ).load_data(show_progress=True)
+            documents.extend(other_docs)
+        except ValueError as e:
+            if "No files found" in str(e):
+                print("No additional files found for other supported types.")
+            else:
+                raise e
 
         all_texts = []
 
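
The `--file-types` handling in this hunk (split on commas, strip whitespace, prepend a missing dot) can be pulled into a small pure function, which makes it easy to check in isolation. The sketch below mirrors the two list comprehensions from the diff and, as an extra step that is not in the diff, drops repeated entries while preserving order (the default extension list contains `".py"` twice). `parse_file_types` is a hypothetical helper name, not a LEANN function.

```python
from typing import List


def parse_file_types(custom_file_types: str) -> List[str]:
    """Normalize a comma-separated extension string into a list of dotted extensions."""
    # Split on commas and discard empty pieces, as in the diff.
    extensions = [ext.strip() for ext in custom_file_types.split(",") if ext.strip()]
    # Ensure every extension starts with a dot, as in the diff.
    extensions = [ext if ext.startswith(".") else f".{ext}" for ext in extensions]
    # Extra step (not in the diff): remove duplicates, keeping first-seen order.
    return list(dict.fromkeys(extensions))


print(parse_file_types(".pptx, pdf,,.pdf"))
```

Extension matching stays case-sensitive here, so `.R` and `.r` remain distinct entries, as they are in the default list.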
@@ -424,7 +457,7 @@ Examples:
             print(f"Index '{index_name}' already exists. Use --force to rebuild.")
             return
 
-        all_texts = self.load_documents(docs_dir)
+        all_texts = self.load_documents(docs_dir, args.file_types)
         if not all_texts:
             print("No documents found")
             return
@@ -1,7 +1,6 @@
 #!/usr/bin/env python3
 
 import json
-import os
 import subprocess
 import sys
 
@@ -62,10 +61,6 @@ def handle_request(request):
         tool_name = request["params"]["name"]
         args = request["params"].get("arguments", {})
 
-        # Set working directory and environment
-        env = os.environ.copy()
-        cwd = "/Users/andyl/Projects/LEANN-RAG"
-
         try:
             if tool_name == "leann_search":
                 cmd = [
@@ -76,18 +71,14 @@ def handle_request(request):
                     "--recompute-embeddings",
                     f"--top-k={args.get('top_k', 5)}",
                 ]
-                result = subprocess.run(cmd, capture_output=True, text=True, cwd=cwd, env=env)
+                result = subprocess.run(cmd, capture_output=True, text=True)
 
             elif tool_name == "leann_ask":
                 cmd = f'echo "{args["question"]}" | leann ask {args["index_name"]} --recompute-embeddings --llm ollama --model qwen3:8b'
-                result = subprocess.run(
-                    cmd, shell=True, capture_output=True, text=True, cwd=cwd, env=env
-                )
+                result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
 
             elif tool_name == "leann_list":
-                result = subprocess.run(
-                    ["leann", "list"], capture_output=True, text=True, cwd=cwd, env=env
-                )
+                result = subprocess.run(["leann", "list"], capture_output=True, text=True)
 
             return {
                 "jsonrpc": "2.0",
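
The `leann_ask` branch interpolates the user's question into a shell string and runs it with `shell=True`, so a question containing quotes, `$(...)`, or backticks is interpreted by the shell. A possible alternative, sketched below, passes the arguments as a list and feeds the question on standard input, which is what the `echo ... |` pipeline does anyway. It assumes `leann ask` reads the question from stdin as that pipeline implies; `build_ask_command` and `run_ask` are hypothetical names, and the flags are the ones already used in the `leann_ask` branch.

```python
import subprocess
from typing import List


def build_ask_command(index_name: str) -> List[str]:
    """Argument list for `leann ask`; no shell parsing is involved."""
    return [
        "leann", "ask", index_name,
        "--recompute-embeddings",
        "--llm", "ollama",
        "--model", "qwen3:8b",
    ]


def run_ask(index_name: str, question: str) -> subprocess.CompletedProcess:
    """Run the command, sending the question on stdin instead of through echo."""
    return subprocess.run(
        build_ask_command(index_name),
        input=question + "\n",
        capture_output=True,
        text=True,
    )


# Only the argument list is built here; run_ask needs the leann CLI installed.
print(build_ask_command("my-docs"))
```

Because the question never appears on the command line, its content cannot change which program or arguments are executed.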
@@ -7,7 +7,7 @@ Intelligent code assistance using LEANN's vector search directly in Claude Code.
 First, install LEANN CLI globally:
 
 ```bash
-uv tool install leann
+uv tool install leann-core
 ```
 
 This makes the `leann` command available system-wide, which `leann_mcp` requires.
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "leann"
-version = "0.2.1"
+version = "0.2.4"
 description = "LEANN - The smallest vector index in the world. RAG Everything with LEANN!"
 readme = "README.md"
 requires-python = ">=3.9"