docs: Simplify README by removing excessive documentation

- Remove overly complex CLI reference and getting started sections (lines 61-334)
- Remove emojis from section headers for cleaner appearance
- Keep README simple and focused as requested
- Maintain essential MCP integration documentation

This addresses feedback to keep documentation minimal and avoid auto-generated content.
Author: aakash
Date: 2025-10-06 16:00:22 -07:00
parent a43fafe44e
commit dfae37d0ee

README.md (293 lines changed)

@@ -48,7 +48,7 @@ LEANN achieves this through *graph-based selective recomputation* with *high-deg
## Installation
-### 📦 Prerequisites: Install uv
+### Prerequisites: Install uv
[Install uv](https://docs.astral.sh/uv/getting-started/installation/#installation-methods) first if you don't have it. Typically, you can install it with:
@@ -56,280 +56,23 @@ LEANN achieves this through *graph-based selective recomputation* with *high-deg
curl -LsSf https://astral.sh/uv/install.sh | sh
```
-### 🚀 Quick Install (Recommended for Most Users)
+### Quick Install
Clone the repository to access all examples and try amazing applications,
**Step 1: Clone and Setup**
```bash
git clone https://github.com/yichuan-w/LEANN.git leann
cd leann
```
and install LEANN from [PyPI](https://pypi.org/project/leann/) to run them immediately:
```bash
uv venv
-source .venv/bin/activate  # On Windows: .venv\Scripts\activate
+source .venv/bin/activate
```
**Step 2: Install LEANN**
```bash
uv pip install leann
```
**Step 3: Verify Installation**
```bash
leann --help
```
You should see the LEANN CLI help message. If you get an error, see [Troubleshooting](#-troubleshooting) below.
### 🌐 Global Installation (For MCP Integration)
To use LEANN with MCP servers (like Claude Code integration), install globally:
```bash
# Install globally using uv tool
uv tool install leann-core --with leann
# Verify global installation
leann --help
```
> **When to use global installation:** Required for MCP integration and Claude Code, and recommended whenever you want to run `leann` commands from any directory.
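To see the difference in practice, here is a quick, illustrative check you can run after the global install; the directory is just a placeholder and the exact output will vary:
```bash
# With a global `uv tool` install, `leann` is on PATH outside any virtualenv
cd /tmp          # any directory outside the cloned repo (placeholder)
leann --help     # should print the same CLI help as inside the project
```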
## 📖 CLI Reference
LEANN provides a simple but powerful command-line interface. Here are the essential commands:
### 🔨 Building Indexes
**Basic Usage:**
```bash
leann build <index-name> --docs <files-or-directories>
```
**Examples:**
```bash
# Index a single directory
leann build my-docs --docs ./documents
# Index multiple directories
leann build my-project --docs ./src ./tests ./docs
# Index specific files and directories
leann build my-files --docs ./README.md ./src/ ./config.json
# Index only specific file types
leann build my-pdfs --docs ./documents --file-types .pdf,.docx
# Use different embedding models
leann build my-docs --docs ./documents --embedding-model sentence-transformers/all-MiniLM-L6-v2
```
### 🔍 Searching and Querying
**Search (returns ranked results):**
```bash
leann search <index-name> "your search query"
```
**Ask (conversational Q&A):**
```bash
leann ask <index-name> "your question"
```
**Examples:**
```bash
# Search for documents
leann search my-docs "machine learning algorithms"
# Ask questions about your data
leann ask my-code "How does the authentication system work?"
# Interactive mode (keeps asking questions)
leann ask my-docs --interactive
```
### 📋 Index Management
```bash
# List all indexes
leann list
# Remove an index
leann remove my-docs
# Get index information
leann info my-docs
```
### ⚙️ Configuration Options
**Embedding Models:**
```bash
# Use different embedding backends
--embedding-mode sentence-transformers # Default, runs locally
--embedding-mode openai # Requires OPENAI_API_KEY
--embedding-mode ollama # Requires Ollama server
--embedding-mode mlx # Apple Silicon only
# Specify embedding model
--embedding-model sentence-transformers/all-MiniLM-L6-v2 # Fast, 384-dim
--embedding-model sentence-transformers/all-mpnet-base-v2 # Better quality, 768-dim
--embedding-model text-embedding-ada-002 # OpenAI (requires API key)
```
**Vector Database Backends:**
```bash
--backend hnsw # Default, good for most use cases
--backend diskann # Better for large datasets (>1M documents)
```
**File Processing:**
```bash
--file-types .pdf,.docx,.txt # Only process specific file types
--chunk-size 512 # Adjust text chunk size (default: 256)
--chunk-overlap 128 # Adjust chunk overlap (default: 128)
```
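Putting a few of these options together, a build command might look like the sketch below; the index name, paths, and values are illustrative only, not recommendations:
```bash
# Illustrative: restrict to PDFs and use larger chunks with a smaller overlap
leann build my-papers --docs ./papers \
  --file-types .pdf \
  --chunk-size 512 \
  --chunk-overlap 64
```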
### 🌐 Environment Variables
Configure LEANN behavior with environment variables:
```bash
# OpenAI Configuration
export OPENAI_API_KEY="your-api-key"
export OPENAI_BASE_URL="https://api.openai.com/v1" # Custom endpoint
# Ollama Configuration
export OLLAMA_HOST="http://localhost:11434" # Default Ollama URL
export OLLAMA_HOST="http://your-server:11434" # Custom Ollama server
# LEANN Configuration
export LEANN_LOG_LEVEL="INFO" # DEBUG, INFO, WARNING, ERROR
```
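For a one-off command you can also pass these inline instead of exporting them (standard shell behavior; the key value is a placeholder):
```bash
# Inline variables apply only to this single invocation
OPENAI_API_KEY="your-api-key" LEANN_LOG_LEVEL="DEBUG" \
  leann build my-docs --docs ./documents --embedding-mode openai
```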
### 🔧 Troubleshooting
**Common Issues:**
1. **"leann: command not found"**
```bash
# Make sure you're in the right environment
source .venv/bin/activate
# Or install globally
uv tool install leann-core --with leann
```
2. **Ollama connection issues**
```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags
# Set custom Ollama URL
export OLLAMA_HOST="http://your-ollama-server:11434"
leann build my-docs --docs ./documents --embedding-mode ollama
```
3. **OpenAI API errors**
```bash
# Set your API key
export OPENAI_API_KEY="your-api-key"
# Use custom endpoint (e.g., Azure OpenAI)
export OPENAI_BASE_URL="https://your-endpoint.openai.azure.com/v1"
```
4. **Memory issues with large datasets**
```bash
# Use smaller batch sizes
leann build my-docs --docs ./documents --batch-size 16
# Use DiskANN for large datasets
leann build my-docs --docs ./documents --backend diskann
```
## 🚀 Getting Started Guide
**New to LEANN?** Follow this step-by-step guide to get up and running quickly.
### Step 1: Choose Your Installation Method
**For most users (recommended):**
```bash
# Quick setup - works for 90% of use cases
git clone https://github.com/yichuan-w/LEANN.git leann
cd leann
uv venv && source .venv/bin/activate
uv pip install leann
```
**For MCP integration (Claude Code, live data sources):**
```bash
# Global installation required for MCP servers
uv tool install leann-core --with leann
```
### Step 2: Verify Installation
```bash
leann --help
```
If you see the help message, you're ready to go! If not, see [Troubleshooting](#-troubleshooting) above.
### Step 3: Create Your First Index
**Simple example:**
```bash
# Create a test directory with some documents
mkdir test-docs
echo "LEANN is a vector database for personal AI" > test-docs/about.txt
echo "It uses 97% less storage than traditional solutions" > test-docs/features.txt
# Build your first index
leann build my-first-index --docs test-docs
# Search it
leann search my-first-index "vector database"
```
### Step 4: Try Real Data
**Index your documents:**
```bash
leann build my-docs --docs ~/Documents
leann search my-docs "your search query"
```
**Index your code:**
```bash
leann build my-code --docs ./src ./tests
leann ask my-code "How does authentication work?"
```
### Step 5: Explore Advanced Features
Once you're comfortable with the basics:
- **Try different embedding models**: Add `--embedding-model sentence-transformers/all-MiniLM-L6-v2`
- **Use Ollama for local LLMs**: Set up Ollama and use `--embedding-mode ollama`
- **Connect live data**: Try MCP integration for Slack, Twitter, etc.
- **Explore specialized apps**: Use `python -m apps.email_rag`, `python -m apps.browser_rag`, etc.
### Understanding LEANN vs Apps
**LEANN has two interfaces:**
1. **CLI Commands** (`leann build`, `leann search`, `leann ask`)
- General-purpose document indexing and search
- Works with any files and directories
- Best for: Personal documents, code, general use
2. **Specialized Apps** (`python -m apps.email_rag`, `python -m apps.chatgpt_rag`, etc.)
- Pre-built applications for specific data sources
- Handle data extraction and formatting automatically
- Best for: Email, browser history, chat exports, live data
**When to use which:**
- Use **CLI** for general documents and code
- Use **Apps** for specialized data sources (email, chats, etc.), as in the sketch below
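A rough sketch of how the two interfaces divide the work (the index name, paths, and question are placeholders):
```bash
# CLI: general-purpose indexing and Q&A over arbitrary files
leann build my-notes --docs ~/Documents/notes
leann ask my-notes "How is the project structured?"

# Apps: pre-built pipelines for specific data sources (see each app's docs for options)
python -m apps.email_rag
```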
<!--
> Low-resource? See "Low-resource setups" in the [Configuration Guide](docs/configuration-guide.md#low-resource-setups). -->
@@ -1032,18 +775,18 @@ Once your iMessage conversations are indexed, you can search with queries like:
</details>
-### 🔌 MCP Integration: RAG on Live Data from Any Platform!
+### MCP Integration: RAG on Live Data from Any Platform
**NEW!** Connect to live data sources through the Model Context Protocol (MCP). LEANN now supports real-time RAG on platforms like Slack, Twitter, and more through standardized MCP servers.
**Key Benefits:**
-- 🔄 **Live Data Access**: Fetch real-time data without manual exports
-- 🔌 **Standardized Protocol**: Use any MCP-compatible server
-- 🚀 **Easy Extension**: Add new platforms with minimal code
-- 🔒 **Secure Access**: MCP servers handle authentication
+- **Live Data Access**: Fetch real-time data without manual exports
+- **Standardized Protocol**: Use any MCP-compatible server
+- **Easy Extension**: Add new platforms with minimal code
+- **Secure Access**: MCP servers handle authentication
<details>
-<summary><strong>💬 Slack Messages: Search Your Team Conversations</strong></summary>
+<summary><strong>Slack Messages: Search Your Team Conversations</strong></summary>
Transform your Slack workspace into a searchable knowledge base! Find discussions, decisions, and shared knowledge across all your channels.
@@ -1078,7 +821,7 @@ python -m apps.slack_rag \
</details>
<details>
-<summary><strong>🐦 Twitter Bookmarks: Your Personal Tweet Library</strong></summary>
+<summary><strong>Twitter Bookmarks: Your Personal Tweet Library</strong></summary>
Search through your Twitter bookmarks! Find that perfect article, thread, or insight you saved for later.
@@ -1204,7 +947,7 @@ Try our fully agentic pipeline with auto query rewriting, semantic search planni
**🔥 Ready to supercharge your coding?** [Complete Setup Guide →](packages/leann-mcp/README.md)
-## 🖥️ Command Line Interface
+## Command Line Interface
LEANN includes a powerful CLI for document processing and search. Perfect for quick document indexing and interactive chat.