refactor: Unify examples interface with BaseRAGExample

- Create BaseRAGExample base class for all RAG examples
- Refactor 4 examples to use unified interface:
  - document_rag.py (replaces main_cli_example.py)
  - email_rag.py (replaces mail_reader_leann.py)
  - browser_rag.py (replaces google_history_reader_leann.py)
  - wechat_rag.py (replaces wechat_history_reader_leann.py)
- Maintain 100% parameter compatibility with original files
- Add interactive mode support for all examples
- Unify parameter names (--max-items replaces --max-emails/--max-entries)
- Update README.md with new examples usage
- Add PARAMETER_CONSISTENCY.md documenting all parameter mappings
- Keep main_cli_example.py for backward compatibility with migration notice

All default values, LeannBuilder parameters, and chunking settings
remain identical to ensure full compatibility with existing indexes.
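In sketch form, a unified interface like the one this commit describes might look as follows. Class and method names here (`build_parser`, `add_specific_args`) are illustrative, not taken from the actual code; defaults follow the README where stated (`--max-items 1000`, `--top-k 20`), otherwise they are placeholders:

```python
import argparse


class BaseRAGExample:
    """Hypothetical sketch of a shared base class for RAG examples.

    Each concrete example (documents, email, browser, WeChat) subclasses
    this and only declares its source-specific flags.
    """

    name = "base"

    def build_parser(self) -> argparse.ArgumentParser:
        parser = argparse.ArgumentParser(description=f"{self.name} RAG example")
        # Core parameters shared by every example
        parser.add_argument("--index-dir", default=f"./{self.name}_index")
        parser.add_argument("--query", default=None,
                            help="Single query; interactive mode if omitted")
        parser.add_argument("--max-items", type=int, default=1000,
                            help="Max items to process (-1 for all)")
        parser.add_argument("--force-rebuild", action="store_true")
        # Embedding / LLM parameters (defaults are placeholders)
        parser.add_argument("--embedding-model", default="facebook/contriever")
        parser.add_argument("--embedding-mode", default="sentence-transformers")
        parser.add_argument("--llm", default="openai")
        parser.add_argument("--llm-model", default="gpt-4o")
        parser.add_argument("--top-k", type=int, default=20)
        self.add_specific_args(parser)
        return parser

    def add_specific_args(self, parser: argparse.ArgumentParser) -> None:
        """Hook for subclasses to register source-specific flags."""

    def parse_args(self, argv=None) -> argparse.Namespace:
        return self.build_parser().parse_args(argv)


class EmailRAG(BaseRAGExample):
    """What email_rag.py reduces to: only the email-specific flags."""

    name = "email"

    def add_specific_args(self, parser: argparse.ArgumentParser) -> None:
        parser.add_argument("--mail-path", default=None)
        parser.add_argument("--include-html", action="store_true")
```

This keeps the shared flags (and their defaults) in one place, which is what makes the `--max-items` rename and the 100% parameter compatibility above enforceable.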
Author: Andy Lee
Date: 2025-07-28 23:11:16 -07:00
Parent: 19bcc07814
Commit: 46f6f76fc3
8 changed files with 988 additions and 180 deletions

README.md

@@ -178,21 +178,39 @@ The example below asks a question about summarizing two papers (uses default dat
 ```bash
 source .venv/bin/activate
-python ./examples/main_cli_example.py
+python ./examples/document_rag.py --query "What are the main techniques LEANN explores?"
 ```
 <details>
 <summary><strong>📋 Click to expand: User Configurable Arguments</strong></summary>
+#### Core Parameters (All Examples Share These)
 ```bash
-# Use custom index directory
-python examples/main_cli_example.py --index-dir "./my_custom_index"
+--index-dir DIR # Directory to store the index
+--query "YOUR QUESTION" # Single query to run (interactive mode if omitted)
+--max-items N # Max items to process (default: 1000, -1 for all)
+--force-rebuild # Force rebuild index even if it exists
-# Use custom data directory
-python examples/main_cli_example.py --data-dir "./my_documents"
+# Embedding Parameters
+--embedding-model MODEL # e.g., facebook/contriever, text-embedding-3-small
+--embedding-mode MODE # sentence-transformers, openai, or mlx
-# Ask a specific question
-python examples/main_cli_example.py --query "What are the main findings in these papers?"
+# LLM Parameters
+--llm TYPE # openai, ollama, or hf
+--llm-model MODEL # e.g., gpt-4o, llama3.2:1b
+--top-k N # Number of results to retrieve (default: 20)
 ```
+#### Document-Specific Parameters
+```bash
+# Process custom documents
+python examples/document_rag.py --data-dir "./my_documents" --file-types .pdf .txt .md
+# Process with custom chunking
+python examples/document_rag.py --chunk-size 512 --chunk-overlap 256
+# Use different LLM
+python examples/document_rag.py --llm ollama --llm-model llama3.2:1b
+```
 </details>
@@ -208,28 +226,29 @@ python examples/main_cli_example.py --query "What are the main findings in these
 **Note:** You need to grant full disk access to your terminal/VS Code in System Preferences → Privacy & Security → Full Disk Access.
 ```bash
-python examples/mail_reader_leann.py --query "What's the food I ordered by DoorDash or Uber Eats mostly?"
+python examples/email_rag.py --query "What's the food I ordered by DoorDash or Uber Eats mostly?"
 ```
 **780K email chunks → 78MB storage.** Finally, search your email like you search Google.
 <details>
 <summary><strong>📋 Click to expand: User Configurable Arguments</strong></summary>
+#### Email-Specific Parameters
 ```bash
-# Use default mail path (works for most macOS setups)
-python examples/mail_reader_leann.py
+# Auto-detect and process all Apple Mail accounts
+python examples/email_rag.py
-# Run with custom index directory
-python examples/mail_reader_leann.py --index-dir "./my_mail_index"
+# Process specific mail directory
+python examples/email_rag.py --mail-path "~/Library/Mail/V10/..."
-# Process all emails (may take time but indexes everything)
-python examples/mail_reader_leann.py --max-emails -1
+# Process all emails (may take time)
+python examples/email_rag.py --max-items -1
-# Limit number of emails processed (useful for testing)
-python examples/mail_reader_leann.py --max-emails 1000
+# Include HTML content
+python examples/email_rag.py --include-html
-# Run a single query
-python examples/mail_reader_leann.py --query "What did my boss say about deadlines?"
+# Use different embedding model
+python examples/email_rag.py --embedding-model text-embedding-3-small --embedding-mode openai
 ```
 </details>
@@ -250,25 +269,29 @@ Once the index is built, you can ask questions like:
 </p>
 ```bash
-python examples/google_history_reader_leann.py --query "Tell me my browser history about machine learning?"
+python examples/browser_rag.py --query "Tell me my browser history about machine learning?"
 ```
 **38K browser entries → 6MB storage.** Your browser history becomes your personal search engine.
 <details>
 <summary><strong>📋 Click to expand: User Configurable Arguments</strong></summary>
+#### Browser-Specific Parameters
 ```bash
-# Use default Chrome profile (auto-finds all profiles)
-python examples/google_history_reader_leann.py
+# Auto-detect and process all Chrome profiles
+python examples/browser_rag.py
-# Run with custom index directory
-python examples/google_history_reader_leann.py --index-dir "./my_chrome_index"
+# Process specific Chrome profile
+python examples/browser_rag.py --chrome-profile "~/Library/Application Support/Google/Chrome/Default"
-# Limit number of history entries processed (useful for testing)
-python examples/google_history_reader_leann.py --max-entries 500
+# Limit history entries for testing
+python examples/browser_rag.py --max-items 500
-# Run a single query
-python examples/google_history_reader_leann.py --query "What websites did I visit about machine learning?"
+# Interactive search mode
+python examples/browser_rag.py # Without --query for interactive mode
+# Use local LLM for privacy
+python examples/browser_rag.py --llm ollama --llm-model llama3.2:1b
 ```
 </details>
@@ -308,7 +331,7 @@ Once the index is built, you can ask questions like:
 </p>
 ```bash
-python examples/wechat_history_reader_leann.py --query "Show me all group chats about weekend plans"
+python examples/wechat_rag.py --query "Show me all group chats about weekend plans"
 ```
 **400K messages → 64MB storage.** Search years of chat history in any language.
@@ -334,21 +357,22 @@ Failed to find or export WeChat data. Exiting.
 <details>
 <summary><strong>📋 Click to expand: User Configurable Arguments</strong></summary>
+#### WeChat-Specific Parameters
 ```bash
-# Use default settings (recommended for first run)
-python examples/wechat_history_reader_leann.py
+# Auto-export and index WeChat data
+python examples/wechat_rag.py
-# Run with custom export directory and wehn we run the first time, LEANN will export all chat history automatically for you
-python examples/wechat_history_reader_leann.py --export-dir "./my_wechat_exports"
+# Use custom export directory
+python examples/wechat_rag.py --export-dir "./my_wechat_exports"
-# Run with custom index directory
-python examples/wechat_history_reader_leann.py --index-dir "./my_wechat_index"
+# Force re-export even if data exists
+python examples/wechat_rag.py --force-export
-# Limit number of chat entries processed (useful for testing)
-python examples/wechat_history_reader_leann.py --max-entries 1000
+# Limit chat entries for testing
+python examples/wechat_rag.py --max-items 1000
-# Run a single query
-python examples/wechat_history_reader_leann.py --query "Show me conversations about travel plans"
+# Use HuggingFace model for Chinese support
+python examples/wechat_rag.py --llm hf --llm-model Qwen/Qwen2.5-1.5B-Instruct
 ```
 </details>