refactor: Unify examples interface with BaseRAGExample
- Create BaseRAGExample base class for all RAG examples - Refactor 4 examples to use unified interface: - document_rag.py (replaces main_cli_example.py) - email_rag.py (replaces mail_reader_leann.py) - browser_rag.py (replaces google_history_reader_leann.py) - wechat_rag.py (replaces wechat_history_reader_leann.py) - Maintain 100% parameter compatibility with original files - Add interactive mode support for all examples - Unify parameter names (--max-items replaces --max-emails/--max-entries) - Update README.md with new examples usage - Add PARAMETER_CONSISTENCY.md documenting all parameter mappings - Keep main_cli_example.py for backward compatibility with migration notice All default values, LeannBuilder parameters, and chunking settings remain identical to ensure full compatibility with existing indexes.
This commit is contained in:
64
examples/PARAMETER_CONSISTENCY.md
Normal file
64
examples/PARAMETER_CONSISTENCY.md
Normal file
@@ -0,0 +1,64 @@
|
||||
# Parameter Consistency Guide
|
||||
|
||||
This document ensures that the new unified interface maintains exact parameter compatibility with the original examples.
|
||||
|
||||
## Parameter Mapping
|
||||
|
||||
### Common Parameters (All Examples)
|
||||
|
||||
| Parameter | Default Value | Notes |
|
||||
|-----------|--------------|-------|
|
||||
| `backend_name` | `"hnsw"` | All examples use HNSW backend |
|
||||
| `graph_degree` | `32` | Consistent across all |
|
||||
| `complexity` | `64` | Consistent across all |
|
||||
| `is_compact` | `True` | NOT `compact_index` |
|
||||
| `is_recompute` | `True` | NOT `use_recomputed_embeddings` |
|
||||
| `num_threads` | `1` | Force single-threaded mode |
|
||||
| `chunk_size` | `256` | Consistent across all |
|
||||
|
||||
### Example-Specific Defaults
|
||||
|
||||
#### document_rag.py (replaces main_cli_example.py)
|
||||
- `index_dir`: `"./test_doc_files"` (matches original)
|
||||
- `chunk_overlap`: `128` (matches original)
|
||||
- `embedding_model`: `"facebook/contriever"`
|
||||
- `embedding_mode`: `"sentence-transformers"`
|
||||
- No max limit by default
|
||||
|
||||
#### email_rag.py (replaces mail_reader_leann.py)
|
||||
- `index_dir`: `"./mail_index"` (matches original)
|
||||
- `max_items`: `1000` (was `max_emails`)
|
||||
- `chunk_overlap`: `25` (matches original)
|
||||
- `embedding_model`: `"facebook/contriever"`
|
||||
- NO `embedding_mode` parameter in LeannBuilder (original doesn't have it)
|
||||
|
||||
#### browser_rag.py (replaces google_history_reader_leann.py)
|
||||
- `index_dir`: `"./google_history_index"` (matches original)
|
||||
- `max_items`: `1000` (was `max_entries`)
|
||||
- `chunk_overlap`: `25` (primary value in original)
|
||||
- `embedding_model`: `"facebook/contriever"`
|
||||
- `embedding_mode`: `"sentence-transformers"`
|
||||
|
||||
#### wechat_rag.py (replaces wechat_history_reader_leann.py)
|
||||
- `index_dir`: `"./wechat_history_magic_test_11Debug_new"` (matches original)
|
||||
- `max_items`: `50` (was `max_entries`, much lower default)
|
||||
- `chunk_overlap`: `25` (matches original)
|
||||
- `embedding_model`: `"Qwen/Qwen3-Embedding-0.6B"` (special model for Chinese)
|
||||
- NO `embedding_mode` parameter in LeannBuilder (original doesn't have it)
|
||||
|
||||
## Implementation Notes
|
||||
|
||||
1. **Parameter Names**: The original files use `is_compact` and `is_recompute`, not the newer names.
|
||||
|
||||
2. **Chunk Overlap**: Most examples use `25` except for documents which uses `128`.
|
||||
|
||||
3. **Embedding Mode**: Only `google_history_reader_leann.py` and `main_cli_example.py` have this parameter.
|
||||
|
||||
4. **Max Items**: Each example has different defaults:
|
||||
- Email/Browser: 1000
|
||||
- WeChat: 50
|
||||
- Documents: unlimited
|
||||
|
||||
5. **Special Cases**:
|
||||
- WeChat uses a specific Chinese embedding model
|
||||
- Email reader includes HTML processing option
|
||||
Reference in New Issue
Block a user