docs: Simplify README by removing excessive documentation
- Remove overly complex CLI reference and getting started sections (lines 61-334)
- Remove emojis from section headers for cleaner appearance
- Keep README simple and focused as requested
- Maintain essential MCP integration documentation

This addresses feedback to keep documentation minimal and avoid auto-generated content.
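The header cleanup described in this message could be reproduced with a small script. The following is a minimal sketch, not the tool used for this commit: the function name `strip_heading_emojis` is hypothetical, and the emoji ranges are an assumption chosen to cover the symbols that appear in this README's headings.

```python
import re

# Hypothetical helper, not part of LEANN. Matches a Markdown ATX heading
# marker followed by one leading emoji (optionally with a variation
# selector) and any spaces or tabs after it.
_HEADING_EMOJI = re.compile(
    r"^(#{1,6} )"                                   # heading marker, e.g. "### "
    r"[\U0001F300-\U0001FAFF\u2600-\u27BF]\uFE0F?"  # one emoji, optional U+FE0F
    r"[ \t]*",                                      # spacing after the emoji
    flags=re.MULTILINE,
)


def strip_heading_emojis(markdown: str) -> str:
    """Return `markdown` with a leading emoji removed from each heading line."""
    return _HEADING_EMOJI.sub(r"\1", markdown)
```

Only heading lines are touched; emojis in list items or body text, which this commit edited separately, are left unchanged by this sketch.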
293
README.md
@@ -48,7 +48,7 @@ LEANN achieves this through *graph-based selective recomputation* with *high-deg
 
 ## Installation
 
-### 📦 Prerequisites: Install uv
+### Prerequisites: Install uv
 
 [Install uv](https://docs.astral.sh/uv/getting-started/installation/#installation-methods) first if you don't have it. Typically, you can install it with:
 
@@ -56,280 +56,23 @@ LEANN achieves this through *graph-based selective recomputation* with *high-deg
 curl -LsSf https://astral.sh/uv/install.sh | sh
 ```
 
-### 🚀 Quick Install (Recommended for Most Users)
+### Quick Install
 
-**Step 1: Clone and Setup**
+Clone the repository to access all examples and try amazing applications,
+
 ```bash
 git clone https://github.com/yichuan-w/LEANN.git leann
 cd leann
+```
+
+and install LEANN from [PyPI](https://pypi.org/project/leann/) to run them immediately:
+
+```bash
 uv venv
-source .venv/bin/activate # On Windows: .venv\Scripts\activate
-```
-
-**Step 2: Install LEANN**
-```bash
+source .venv/bin/activate
 uv pip install leann
 ```
 
-**Step 3: Verify Installation**
-```bash
-leann --help
-```
-
-You should see the LEANN CLI help message. If you get an error, see [Troubleshooting](#-troubleshooting) below.
-
-### 🌐 Global Installation (For MCP Integration)
-
-To use LEANN with MCP servers (like Claude Code integration), install globally:
-
-```bash
-# Install globally using uv tool
-uv tool install leann-core --with leann
-
-# Verify global installation
-leann --help
-```
-
-> **When to use global installation:** Required for MCP integration, Claude Code, and when you want to use `leann` commands from any directory.
-## 📖 CLI Reference
-
-LEANN provides a simple but powerful command-line interface. Here are the essential commands:
-
-### 🔨 Building Indexes
-
-**Basic Usage:**
-```bash
-leann build <index-name> --docs <files-or-directories>
-```
-
-**Examples:**
-```bash
-# Index a single directory
-leann build my-docs --docs ./documents
-
-# Index multiple directories
-leann build my-project --docs ./src ./tests ./docs
-
-# Index specific files and directories
-leann build my-files --docs ./README.md ./src/ ./config.json
-
-# Index only specific file types
-leann build my-pdfs --docs ./documents --file-types .pdf,.docx
-
-# Use different embedding models
-leann build my-docs --docs ./documents --embedding-model sentence-transformers/all-MiniLM-L6-v2
-```
-
-### 🔍 Searching and Querying
-
-**Search (returns ranked results):**
-```bash
-leann search <index-name> "your search query"
-```
-
-**Ask (conversational Q&A):**
-```bash
-leann ask <index-name> "your question"
-```
-
-**Examples:**
-```bash
-# Search for documents
-leann search my-docs "machine learning algorithms"
-
-# Ask questions about your data
-leann ask my-code "How does the authentication system work?"
-
-# Interactive mode (keeps asking questions)
-leann ask my-docs --interactive
-```
-
-### 📋 Index Management
-
-```bash
-# List all indexes
-leann list
-
-# Remove an index
-leann remove my-docs
-
-# Get index information
-leann info my-docs
-```
-
-### ⚙️ Configuration Options
-
-**Embedding Models:**
-```bash
-# Use different embedding backends
---embedding-mode sentence-transformers # Default, runs locally
---embedding-mode openai # Requires OPENAI_API_KEY
---embedding-mode ollama # Requires Ollama server
---embedding-mode mlx # Apple Silicon only
-
-# Specify embedding model
---embedding-model sentence-transformers/all-MiniLM-L6-v2 # Fast, 384-dim
---embedding-model sentence-transformers/all-mpnet-base-v2 # Better quality, 768-dim
---embedding-model text-embedding-ada-002 # OpenAI (requires API key)
-```
-
-**Vector Database Backends:**
-```bash
---backend hnsw # Default, good for most use cases
---backend diskann # Better for large datasets (>1M documents)
-```
-
-**File Processing:**
-```bash
---file-types .pdf,.docx,.txt # Only process specific file types
---chunk-size 512 # Adjust text chunk size (default: 256)
---chunk-overlap 128 # Adjust chunk overlap (default: 128)
-```
-
-### 🌐 Environment Variables
-
-Configure LEANN behavior with environment variables:
-
-```bash
-# OpenAI Configuration
-export OPENAI_API_KEY="your-api-key"
-export OPENAI_BASE_URL="https://api.openai.com/v1" # Custom endpoint
-
-# Ollama Configuration
-export OLLAMA_HOST="http://localhost:11434" # Default Ollama URL
-export OLLAMA_HOST="http://your-server:11434" # Custom Ollama server
-
-# LEANN Configuration
-export LEANN_LOG_LEVEL="INFO" # DEBUG, INFO, WARNING, ERROR
-```
-
-### 🔧 Troubleshooting
-
-**Common Issues:**
-
-1. **"leann: command not found"**
-```bash
-# Make sure you're in the right environment
-source .venv/bin/activate
-
-# Or install globally
-uv tool install leann-core --with leann
-```
-
-2. **Ollama connection issues**
-```bash
-# Check if Ollama is running
-curl http://localhost:11434/api/tags
-
-# Set custom Ollama URL
-export OLLAMA_HOST="http://your-ollama-server:11434"
-leann build my-docs --docs ./documents --embedding-mode ollama
-```
-
-3. **OpenAI API errors**
-```bash
-# Set your API key
-export OPENAI_API_KEY="your-api-key"
-
-# Use custom endpoint (e.g., Azure OpenAI)
-export OPENAI_BASE_URL="https://your-endpoint.openai.azure.com/v1"
-```
-
-4. **Memory issues with large datasets**
-```bash
-# Use smaller batch sizes
-leann build my-docs --docs ./documents --batch-size 16
-
-# Use DiskANN for large datasets
-leann build my-docs --docs ./documents --backend diskann
-```
-
-## 🚀 Getting Started Guide
-
-**New to LEANN?** Follow this step-by-step guide to get up and running quickly.
-
-### Step 1: Choose Your Installation Method
-
-**For most users (recommended):**
-```bash
-# Quick setup - works for 90% of use cases
-git clone https://github.com/yichuan-w/LEANN.git leann
-cd leann
-uv venv && source .venv/bin/activate
-uv pip install leann
-```
-
-**For MCP integration (Claude Code, live data sources):**
-```bash
-# Global installation required for MCP servers
-uv tool install leann-core --with leann
-```
-
-### Step 2: Verify Installation
-
-```bash
-leann --help
-```
-
-If you see the help message, you're ready to go! If not, see [Troubleshooting](#-troubleshooting) above.
-
-### Step 3: Create Your First Index
-
-**Simple example:**
-```bash
-# Create a test directory with some documents
-mkdir test-docs
-echo "LEANN is a vector database for personal AI" > test-docs/about.txt
-echo "It uses 97% less storage than traditional solutions" > test-docs/features.txt
-
-# Build your first index
-leann build my-first-index --docs test-docs
-
-# Search it
-leann search my-first-index "vector database"
-```
-
-### Step 4: Try Real Data
-
-**Index your documents:**
-```bash
-leann build my-docs --docs ~/Documents
-leann search my-docs "your search query"
-```
-
-**Index your code:**
-```bash
-leann build my-code --docs ./src ./tests
-leann ask my-code "How does authentication work?"
-```
-
-### Step 5: Explore Advanced Features
-
-Once you're comfortable with the basics:
-
-- **Try different embedding models**: Add `--embedding-model sentence-transformers/all-MiniLM-L6-v2`
-- **Use Ollama for local LLMs**: Set up Ollama and use `--embedding-mode ollama`
-- **Connect live data**: Try MCP integration for Slack, Twitter, etc.
-- **Explore specialized apps**: Use `python -m apps.email_rag`, `python -m apps.browser_rag`, etc.
-
-### Understanding LEANN vs Apps
-
-**LEANN has two interfaces:**
-
-1. **CLI Commands** (`leann build`, `leann search`, `leann ask`)
-- General-purpose document indexing and search
-- Works with any files and directories
-- Best for: Personal documents, code, general use
-
-2. **Specialized Apps** (`python -m apps.email_rag`, `python -m apps.chatgpt_rag`, etc.)
-- Pre-built applications for specific data sources
-- Handle data extraction and formatting automatically
-- Best for: Email, browser history, chat exports, live data
-
-**When to use which:**
-- Use **CLI** for general documents and code
-- Use **Apps** for specialized data sources (email, chats, etc.)
-
 <!--
 > Low-resource? See "Low-resource setups" in the [Configuration Guide](docs/configuration-guide.md#low-resource-setups). -->
 
@@ -1032,18 +775,18 @@ Once your iMessage conversations are indexed, you can search with queries like:
 
 </details>
 
-### 🔌 MCP Integration: RAG on Live Data from Any Platform!
+### MCP Integration: RAG on Live Data from Any Platform
 
 **NEW!** Connect to live data sources through the Model Context Protocol (MCP). LEANN now supports real-time RAG on platforms like Slack, Twitter, and more through standardized MCP servers.
 
 **Key Benefits:**
-- 🔄 **Live Data Access**: Fetch real-time data without manual exports
-- 🔌 **Standardized Protocol**: Use any MCP-compatible server
-- 🚀 **Easy Extension**: Add new platforms with minimal code
-- 🔒 **Secure Access**: MCP servers handle authentication
+- **Live Data Access**: Fetch real-time data without manual exports
+- **Standardized Protocol**: Use any MCP-compatible server
+- **Easy Extension**: Add new platforms with minimal code
+- **Secure Access**: MCP servers handle authentication
 
 <details>
-<summary><strong>💬 Slack Messages: Search Your Team Conversations</strong></summary>
+<summary><strong>Slack Messages: Search Your Team Conversations</strong></summary>
 
 Transform your Slack workspace into a searchable knowledge base! Find discussions, decisions, and shared knowledge across all your channels.
 
@@ -1078,7 +821,7 @@ python -m apps.slack_rag \
 </details>
 
 <details>
-<summary><strong>🐦 Twitter Bookmarks: Your Personal Tweet Library</strong></summary>
+<summary><strong>Twitter Bookmarks: Your Personal Tweet Library</strong></summary>
 
 Search through your Twitter bookmarks! Find that perfect article, thread, or insight you saved for later.
 
@@ -1204,7 +947,7 @@ Try our fully agentic pipeline with auto query rewriting, semantic search planni
 
 **🔥 Ready to supercharge your coding?** [Complete Setup Guide →](packages/leann-mcp/README.md)
 
-## 🖥️ Command Line Interface
+## Command Line Interface
 
 LEANN includes a powerful CLI for document processing and search. Perfect for quick document indexing and interactive chat.
 