docs: Improve parameter categorization in README
- Clearly separate core (shared) vs specific parameters
- Move LLM and embedding examples to 'Example Commands' section
- Add descriptive comments for all specific parameters
- Keep only truly data-source-specific parameters in specific sections
--- a/README.md
+++ b/README.md
@@ -41,8 +41,7 @@ LEANN achieves this through *graph-based selective recomputation* with *high-deg
 
 ## Installation
 
-<details>
-<summary><strong>📦 Prerequisites: Install uv (if you don't have it)</strong></summary>
+### 📦 Prerequisites: Install uv (if you don't have it)
 
 Install uv first if you don't have it:
 
@@ -52,29 +51,30 @@ curl -LsSf https://astral.sh/uv/install.sh | sh
 
 📖 [Detailed uv installation methods →](https://docs.astral.sh/uv/getting-started/installation/#installation-methods)
 
-</details>
-
-LEANN provides two installation methods: **pip install** (quick and easy) and **build from source** (recommended for development).
-
-
-### 🚀 Quick Install (Recommended for most users)
-
-Clone the repository to access all examples and install LEANN from [PyPI](https://pypi.org/project/leann/) to run them immediately:
+### 🚀 Quick Install
+
+Clone the repository to access all examples,
 
 ```bash
-git clone git@github.com:yichuan-w/LEANN.git leann
+git clone https://github.com/yichuan-w/LEANN.git leann
 cd leann
+```
+
+and install LEANN from [PyPI](https://pypi.org/project/leann/) to run them immediately:
+
+```bash
 uv venv
 source .venv/bin/activate
 uv pip install leann
 ```
 
-### 🔧 Build from Source (Recommended for development)
+<details>
+<summary>
+<h3>🔧 Build from Source (Recommended for development)</h3>
 
 ```bash
-git clone git@github.com:yichuan-w/LEANN.git leann
+git clone https://github.com/yichuan-w/LEANN.git leann
 cd leann
 git submodule update --init --recursive
 ```
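Either install path can be sanity-checked with a bare import once the virtual environment is active (a minimal check; it assumes only the `leann` package name used above):

```bash
python -c "import leann; print('leann imported OK')"
```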
@@ -91,14 +91,14 @@ sudo apt-get install libomp-dev libboost-all-dev protobuf-compiler libabsl-dev l
 uv sync
 ```
 
+</details>
 
 ## Quick Start
 
 Our declarative API makes RAG as easy as writing a config file.
 
-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yichuan-w/LEANN/blob/main/demo.ipynb) [Try in this ipynb file →](demo.ipynb)
+Check out [demo.ipynb](demo.ipynb) or [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yichuan-w/LEANN/blob/main/demo.ipynb)
 
 ```python
 from leann import LeannBuilder, LeannSearcher, LeannChat
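For orientation while reading the next hunk: the Quick Start block being edited follows a build, search, chat flow. Below is a minimal sketch assuming the `LeannBuilder`/`LeannSearcher`/`LeannChat` classes imported above and the `chat.ask(..., top_k=1)` call visible in the next hunk header; the other method names and the `"hnsw"` backend are illustrative assumptions, not confirmed by this diff.

```python
from leann import LeannBuilder, LeannSearcher, LeannChat

# Build a small index over one passage (builder API assumed for illustration)
builder = LeannBuilder(backend_name="hnsw")
builder.add_text("LEANN saves 97% of storage compared to a traditional vector database.")
builder.build_index("demo.leann")

# Retrieve the most relevant chunk (searcher API assumed for illustration)
searcher = LeannSearcher("demo.leann")
results = searcher.search("storage savings", top_k=1)

# Ask a question over the index; this call appears in the diff context below
chat = LeannChat("demo.leann")
response = chat.ask("How much storage does LEANN save?", top_k=1)
```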
@@ -122,11 +122,11 @@ response = chat.ask("How much storage does LEANN save?", top_k=1)
 
 ## RAG on Everything!
 
-LEANN supports RAG on various data sources including documents (.pdf, .txt, .md), Apple Mail, Google Search History, WeChat, and more.
+LEANN supports RAG on various data sources including documents (`.pdf`, `.txt`, `.md`), Apple Mail, Google Search History, WeChat, and more.
 
-> **Generation Model Setup**
-> LEANN supports multiple LLM providers for text generation (OpenAI API, HuggingFace, Ollama).
+### Generation Model Setup
+
+LEANN supports multiple LLM providers for text generation (OpenAI API, HuggingFace, Ollama).
 
 <details>
 <summary><strong>🔑 OpenAI API Setup (Default)</strong></summary>
@@ -166,7 +166,7 @@ ollama pull llama3.2:1b
 
 </details>
 
-### 📄 Personal Data Manager: Process Any Documents (.pdf, .txt, .md)!
+### 📄 Personal Data Manager: Process Any Documents (`.pdf`, `.txt`, `.md`)!
 
 Ask questions directly about your personal PDFs, documents, and any directory containing your files!
 
@@ -177,7 +177,7 @@ Ask questions directly about your personal PDFs, documents, and any directory co
 The example below asks a question about summarizing two papers (uses default data in `examples/data`) and this is the easiest example to run here:
 
 ```bash
-source .venv/bin/activate
+source .venv/bin/activate # Don't forget to activate the virtual environment
 python ./examples/document_rag.py --query "What are the main techniques LEANN explores?"
 ```
 
@@ -203,14 +203,25 @@ python ./examples/document_rag.py --query "What are the main techniques LEANN ex
 
 #### Document-Specific Parameters
 ```bash
+--data-dir DIR          # Directory containing documents to process
+--file-types .ext .ext  # File extensions to process (e.g., .pdf .txt .md)
+--chunk-size N          # Size of text chunks (default: 2048)
+--chunk-overlap N       # Overlap between chunks (default: 25)
+```
+
+#### Example Commands
+```bash
 # Process custom documents
 python examples/document_rag.py --data-dir "./my_documents" --file-types .pdf .txt .md
 
 # Process with custom chunking
 python examples/document_rag.py --chunk-size 512 --chunk-overlap 256
 
-# Use different LLM
+# Use local LLM for privacy
 python examples/document_rag.py --llm ollama --llm-model llama3.2:1b
+
+# Use OpenAI embeddings
+python examples/document_rag.py --embedding-model text-embedding-3-small --embedding-mode openai
 ```
 
 </details>
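Because the shared flags (`--llm`, `--llm-model`, `--embedding-model`, `--embedding-mode`) now live outside the document-specific list, they compose freely with it; a hypothetical combined run using only flags documented in this hunk (the values are illustrative):

```bash
python examples/document_rag.py \
  --data-dir "./my_documents" --file-types .pdf .md \
  --chunk-size 1024 --chunk-overlap 64 \
  --llm ollama --llm-model llama3.2:1b
```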
@@ -224,7 +235,8 @@ python examples/document_rag.py --llm ollama --llm-model llama3.2:1b
 <img src="videos/mail_clear.gif" alt="LEANN Email Search Demo" width="600">
 </p>
 
-**Note:** You need to grant full disk access to your terminal/VS Code in System Preferences → Privacy & Security → Full Disk Access.
+Before running the example below, you need to grant full disk access to your terminal/VS Code in System Preferences → Privacy & Security → Full Disk Access.
 
 ```bash
 python examples/email_rag.py --query "What's the food I ordered by DoorDash or Uber Eats mostly?"
 ```
@@ -235,6 +247,12 @@ python examples/email_rag.py --query "What's the food I ordered by DoorDash or U
 
 #### Email-Specific Parameters
 ```bash
+--mail-path PATH   # Path to specific mail directory (auto-detects if omitted)
+--include-html     # Include HTML content in processing
+```
+
+#### Example Commands
+```bash
 # Auto-detect and process all Apple Mail accounts
 python examples/email_rag.py
 
@@ -247,8 +265,11 @@ python examples/email_rag.py --max-items -1
 # Include HTML content
 python examples/email_rag.py --include-html
 
-# Use different embedding model
+# Use OpenAI embeddings for better results
 python examples/email_rag.py --embedding-model text-embedding-3-small --embedding-mode openai
+
+# Use local LLM for privacy
+python examples/email_rag.py --llm ollama --llm-model llama3.2:1b
 ```
 
 </details>
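The same composition applies to the email example; a hypothetical run pinning a specific mailbox with the `--mail-path` flag documented above (the path is illustrative, since the script auto-detects when it is omitted):

```bash
python examples/email_rag.py --mail-path ~/Library/Mail --include-html --llm ollama --llm-model llama3.2:1b
```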
@@ -278,6 +299,11 @@ python examples/browser_rag.py --query "Tell me my browser history about machine
 
 #### Browser-Specific Parameters
 ```bash
+--chrome-profile PATH  # Path to Chrome profile directory (auto-detects if omitted)
+```
+
+#### Example Commands
+```bash
 # Auto-detect and process all Chrome profiles
 python examples/browser_rag.py
 
@@ -292,6 +318,9 @@ python examples/browser_rag.py # Without --query for interactive mode
 
 # Use local LLM for privacy
 python examples/browser_rag.py --llm ollama --llm-model llama3.2:1b
 
+# Use better embeddings
+python examples/browser_rag.py --embedding-model text-embedding-3-small --embedding-mode openai
 ```
 
 </details>
@@ -359,6 +388,12 @@ Failed to find or export WeChat data. Exiting.
 
 #### WeChat-Specific Parameters
 ```bash
+--export-dir DIR   # Directory to store exported WeChat data
+--force-export     # Force re-export even if data exists
+```
+
+#### Example Commands
+```bash
 # Auto-export and index WeChat data
 python examples/wechat_rag.py
 
@@ -373,6 +408,9 @@ python examples/wechat_rag.py --max-items 1000
 
 # Use HuggingFace model for Chinese support
 python examples/wechat_rag.py --llm hf --llm-model Qwen/Qwen2.5-1.5B-Instruct
 
+# Use Qwen embedding model (better for Chinese)
+python examples/wechat_rag.py --embedding-model Qwen/Qwen3-Embedding-0.6B
 ```
 
 </details>
@@ -473,8 +511,8 @@ Options:
 ## Benchmarks
 
-📊 **[Simple Example: Compare LEANN vs FAISS →](examples/compare_faiss_vs_leann.py)**
+**[Simple Example: Compare LEANN vs FAISS →](examples/compare_faiss_vs_leann.py)**
 
-### Storage Comparison
+### 📊 Storage Comparison
 
 | System | DPR (2.1M) | Wiki (60M) | Chat (400K) | Email (780K) | Browser (38K) |
 |--------|-------------|------------|-------------|--------------|---------------|