LEANN/examples/PARAMETER_CONSISTENCY.md
Andy Lee 46f6f76fc3 refactor: Unify examples interface with BaseRAGExample
- Create BaseRAGExample base class for all RAG examples
- Refactor 4 examples to use unified interface:
  - document_rag.py (replaces main_cli_example.py)
  - email_rag.py (replaces mail_reader_leann.py)
  - browser_rag.py (replaces google_history_reader_leann.py)
  - wechat_rag.py (replaces wechat_history_reader_leann.py)
- Maintain 100% parameter compatibility with original files
- Add interactive mode support for all examples
- Unify parameter names (--max-items replaces --max-emails/--max-entries)
- Update README.md with new examples usage
- Add PARAMETER_CONSISTENCY.md documenting all parameter mappings
- Keep main_cli_example.py for backward compatibility with migration notice

All default values, LeannBuilder parameters, and chunking settings
remain identical to ensure full compatibility with existing indexes.
2025-07-28 23:11:16 -07:00


# Parameter Consistency Guide
This document records the parameter names and default values that the new unified interface must keep so that it stays exactly compatible with the original examples.
## Parameter Mapping
### Common Parameters (All Examples)
| Parameter | Default Value | Notes |
|-----------|--------------|-------|
| `backend_name` | `"hnsw"` | All examples use HNSW backend |
| `graph_degree` | `32` | Consistent across all |
| `complexity` | `64` | Consistent across all |
| `is_compact` | `True` | NOT `compact_index` |
| `is_recompute` | `True` | NOT `use_recomputed_embeddings` |
| `num_threads` | `1` | Force single-threaded mode |
| `chunk_size` | `256` | Consistent across all |
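
The table above can be captured as data so that a refactored example is checked against it mechanically. The following is a minimal sketch, not part of the repository: `COMMON_DEFAULTS`, `REJECTED_NAMES`, and `check_common_params` are hypothetical names introduced for illustration, and only the parameter names and values come from the table.

```python
# Common defaults copied from the table above.
COMMON_DEFAULTS = {
    "backend_name": "hnsw",
    "graph_degree": 32,
    "complexity": 64,
    "is_compact": True,
    "is_recompute": True,
    "num_threads": 1,
    "chunk_size": 256,
}

# Spellings the table marks as NOT to be used, mapped to the expected name.
# (Hypothetical helper data; the mapping itself is taken from the Notes column.)
REJECTED_NAMES = {
    "compact_index": "is_compact",
    "use_recomputed_embeddings": "is_recompute",
}


def check_common_params(params: dict) -> list[str]:
    """Return a list of readable problems; an empty list means consistent."""
    problems = []
    # Flag any rejected spelling, pointing at the name the originals use.
    for wrong, right in REJECTED_NAMES.items():
        if wrong in params:
            problems.append(f"use '{right}' instead of '{wrong}'")
    # Flag any common parameter whose value drifted from the documented default.
    for name, expected in COMMON_DEFAULTS.items():
        if name in params and params[name] != expected:
            problems.append(
                f"{name}={params[name]!r} differs from default {expected!r}"
            )
    return problems
```

Calling `check_common_params({"compact_index": True})` reports the wrong spelling, while passing a copy of `COMMON_DEFAULTS` returns an empty list.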
### Example-Specific Defaults
#### document_rag.py (replaces main_cli_example.py)
- `index_dir`: `"./test_doc_files"` (matches original)
- `chunk_overlap`: `128` (matches original)
- `embedding_model`: `"facebook/contriever"`
- `embedding_mode`: `"sentence-transformers"`
- No max limit by default
#### email_rag.py (replaces mail_reader_leann.py)
- `index_dir`: `"./mail_index"` (matches original)
- `max_items`: `1000` (was `max_emails`)
- `chunk_overlap`: `25` (matches original)
- `embedding_model`: `"facebook/contriever"`
- NO `embedding_mode` parameter in LeannBuilder (original doesn't have it)
#### browser_rag.py (replaces google_history_reader_leann.py)
- `index_dir`: `"./google_history_index"` (matches original)
- `max_items`: `1000` (was `max_entries`)
- `chunk_overlap`: `25` (primary value in original)
- `embedding_model`: `"facebook/contriever"`
- `embedding_mode`: `"sentence-transformers"`
#### wechat_rag.py (replaces wechat_history_reader_leann.py)
- `index_dir`: `"./wechat_history_magic_test_11Debug_new"` (matches original)
- `max_items`: `50` (was `max_entries`, much lower default)
- `chunk_overlap`: `25` (matches original)
- `embedding_model`: `"Qwen/Qwen3-Embedding-0.6B"` (special model for Chinese)
- NO `embedding_mode` parameter in LeannBuilder (original doesn't have it)
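
The per-example defaults listed above can be sketched as a small lookup table. This is an illustration under stated assumptions rather than the repository's code: `ExampleDefaults`, `EXAMPLE_DEFAULTS`, and `embedding_kwargs` are hypothetical names, and using `None` to mean "parameter not passed" or "no limit" is an assumed representation. Only the values come from this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExampleDefaults:
    index_dir: str
    chunk_overlap: int
    embedding_model: str
    # None means the original script did not pass embedding_mode at all.
    embedding_mode: Optional[str] = None
    # None means no limit (assumed representation of "no max limit").
    max_items: Optional[int] = None


EXAMPLE_DEFAULTS = {
    "document_rag.py": ExampleDefaults(
        "./test_doc_files", 128, "facebook/contriever",
        embedding_mode="sentence-transformers",
    ),
    "email_rag.py": ExampleDefaults(
        "./mail_index", 25, "facebook/contriever", max_items=1000,
    ),
    "browser_rag.py": ExampleDefaults(
        "./google_history_index", 25, "facebook/contriever",
        embedding_mode="sentence-transformers", max_items=1000,
    ),
    "wechat_rag.py": ExampleDefaults(
        "./wechat_history_magic_test_11Debug_new", 25,
        "Qwen/Qwen3-Embedding-0.6B", max_items=50,
    ),
}


def embedding_kwargs(example: str) -> dict:
    """Build the embedding-related keyword arguments for one example.

    embedding_mode is omitted entirely when the original did not pass it,
    so the builder's own default applies, exactly as before the refactor.
    """
    defaults = EXAMPLE_DEFAULTS[example]
    kwargs = {"embedding_model": defaults.embedding_model}
    if defaults.embedding_mode is not None:
        kwargs["embedding_mode"] = defaults.embedding_mode
    return kwargs
```

Omitting the key, rather than passing an explicit value, is the design choice that matters here: it keeps the email and WeChat examples on whatever default the builder applies, which is what the originals relied on.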
## Implementation Notes
1. **Parameter Names**: The original files use `is_compact` and `is_recompute`, not `compact_index` or `use_recomputed_embeddings`.
2. **Chunk Overlap**: Every example uses `25` except `document_rag.py`, which uses `128`.
3. **Embedding Mode**: Only `google_history_reader_leann.py` and `main_cli_example.py` pass `embedding_mode`; the email and WeChat originals do not.
4. **Max Items**: Each example has different defaults:
   - Email/Browser: 1000
   - WeChat: 50
   - Documents: unlimited
5. **Special Cases**:
   - WeChat uses a specific Chinese embedding model
   - Email reader includes HTML processing option
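
As a rough sketch of how the unified `--max-items` flag could carry the different per-example defaults, the snippet below uses the standard `argparse` module. The helper names `make_parser` and `apply_limit`, the short example keys, and the use of `None` for "unlimited" are assumptions made for illustration; the actual `BaseRAGExample` implementation may differ.

```python
import argparse
from typing import Optional

# Defaults from the notes above; None stands for "unlimited" (assumed encoding).
MAX_ITEMS_DEFAULTS = {
    "email": 1000,
    "browser": 1000,
    "wechat": 50,
    "document": None,
}


def make_parser(example: str) -> argparse.ArgumentParser:
    """Create a parser whose --max-items default depends on the example."""
    parser = argparse.ArgumentParser(prog=f"{example}_rag.py")
    parser.add_argument(
        "--max-items",
        type=int,
        default=MAX_ITEMS_DEFAULTS[example],
        help="maximum number of items to process (no limit when omitted "
             "for examples without a default)",
    )
    return parser


def apply_limit(items: list, max_items: Optional[int]) -> list:
    """Truncate items to max_items, or return them all when there is no limit."""
    return items if max_items is None else items[:max_items]


if __name__ == "__main__":
    args = make_parser("wechat").parse_args([])
    print(args.max_items)  # the WeChat default from the table
```

An explicit value on the command line, such as `--max-items 5`, overrides the per-example default in every case.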