- Create BaseRAGExample base class for all RAG examples
- Refactor 4 examples to use a unified interface:
  - document_rag.py (replaces main_cli_example.py)
  - email_rag.py (replaces mail_reader_leann.py)
  - browser_rag.py (replaces google_history_reader_leann.py)
  - wechat_rag.py (replaces wechat_history_reader_leann.py)
- Maintain 100% parameter compatibility with original files
- Add interactive mode support for all examples
- Unify parameter names (--max-items replaces --max-emails/--max-entries)
- Update README.md with usage for the new examples
- Add PARAMETER_CONSISTENCY.md documenting all parameter mappings
- Keep main_cli_example.py for backward compatibility with a migration notice

All default values, LeannBuilder parameters, and chunking settings remain identical to ensure full compatibility with existing indexes.
# Parameter Consistency Guide
This document records the parameters that keep the new unified interface exactly compatible with the original examples.
## Parameter Mapping

### Common Parameters (All Examples)
| Parameter | Default Value | Notes |
|---|---|---|
| `backend_name` | `"hnsw"` | All examples use the HNSW backend |
| `graph_degree` | `32` | Consistent across all examples |
| `complexity` | `64` | Consistent across all examples |
| `is_compact` | `True` | NOT `compact_index` |
| `is_recompute` | `True` | NOT `use_recomputed_embeddings` |
| `num_threads` | `1` | Forces single-threaded mode |
| `chunk_size` | `256` | Consistent across all examples |
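Collecting the shared values in one immutable object is one way to keep them identical across all four examples. This is a minimal sketch, assuming the first six parameters are passed to `LeannBuilder` as keyword arguments and `chunk_size` is applied separately by the chunking step; `CommonDefaults` and `builder_kwargs()` are hypothetical names, not part of the repository.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CommonDefaults:
    """Shared defaults from the table above (hypothetical container)."""

    backend_name: str = "hnsw"
    graph_degree: int = 32
    complexity: int = 64
    is_compact: bool = True      # NOT compact_index
    is_recompute: bool = True    # NOT use_recomputed_embeddings
    num_threads: int = 1         # force single-threaded mode
    chunk_size: int = 256        # kept out of the builder call in this sketch

    def builder_kwargs(self) -> dict:
        """Return only the builder keyword arguments (chunk_size excluded)."""
        kwargs = asdict(self)
        kwargs.pop("chunk_size")
        return kwargs


print(CommonDefaults().builder_kwargs())
```

Freezing the dataclass means a script cannot silently mutate the shared values; an example that genuinely needs a different value has to say so explicitly, for instance with `dataclasses.replace`.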
### Example-Specific Defaults
#### document_rag.py (replaces main_cli_example.py)

- `index_dir`: `"./test_doc_files"` (matches original)
- `chunk_overlap`: `128` (matches original)
- `embedding_model`: `"facebook/contriever"`
- `embedding_mode`: `"sentence-transformers"`
- No `max_items` limit by default
#### email_rag.py (replaces mail_reader_leann.py)

- `index_dir`: `"./mail_index"` (matches original)
- `max_items`: `1000` (was `max_emails`)
- `chunk_overlap`: `25` (matches original)
- `embedding_model`: `"facebook/contriever"`
- NO `embedding_mode` parameter in LeannBuilder (the original doesn't have it)
#### browser_rag.py (replaces google_history_reader_leann.py)

- `index_dir`: `"./google_history_index"` (matches original)
- `max_items`: `1000` (was `max_entries`)
- `chunk_overlap`: `25` (primary value in original)
- `embedding_model`: `"facebook/contriever"`
- `embedding_mode`: `"sentence-transformers"`
#### wechat_rag.py (replaces wechat_history_reader_leann.py)

- `index_dir`: `"./wechat_history_magic_test_11Debug_new"` (matches original)
- `max_items`: `50` (was `max_entries`; a much lower default than the other examples)
- `chunk_overlap`: `25` (matches original)
- `embedding_model`: `"Qwen/Qwen3-Embedding-0.6B"` (special model for Chinese)
- NO `embedding_mode` parameter in LeannBuilder (the original doesn't have it)
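The per-example values above can be kept as data layered over the shared defaults, so each example states only what differs. This is a sketch under assumptions: `EXAMPLE_DEFAULTS` and `builder_kwargs_for` are hypothetical names rather than the repository's `BaseRAGExample` API, and it assumes `embedding_model` and `embedding_mode` are passed to the builder as keyword arguments. The detail it models is that `embedding_mode` is left out entirely for the email and WeChat examples instead of being passed as `None`.

```python
# Shared builder defaults from the common-parameters table.
COMMON_BUILDER_KWARGS = {
    "backend_name": "hnsw",
    "graph_degree": 32,
    "complexity": 64,
    "is_compact": True,
    "is_recompute": True,
    "num_threads": 1,
}

# Per-example defaults from this guide (hypothetical structure).
# max_items=None means "no limit"; embedding_mode=None means the original
# example never passed that argument to the builder.
EXAMPLE_DEFAULTS = {
    "document_rag.py": {
        "index_dir": "./test_doc_files",
        "max_items": None,
        "chunk_overlap": 128,
        "embedding_model": "facebook/contriever",
        "embedding_mode": "sentence-transformers",
    },
    "email_rag.py": {
        "index_dir": "./mail_index",
        "max_items": 1000,
        "chunk_overlap": 25,
        "embedding_model": "facebook/contriever",
        "embedding_mode": None,
    },
    "browser_rag.py": {
        "index_dir": "./google_history_index",
        "max_items": 1000,
        "chunk_overlap": 25,
        "embedding_model": "facebook/contriever",
        "embedding_mode": "sentence-transformers",
    },
    "wechat_rag.py": {
        "index_dir": "./wechat_history_magic_test_11Debug_new",
        "max_items": 50,
        "chunk_overlap": 25,
        "embedding_model": "Qwen/Qwen3-Embedding-0.6B",
        "embedding_mode": None,
    },
}


def builder_kwargs_for(example: str) -> dict:
    """Merge the shared defaults with one example's embedding settings."""
    spec = EXAMPLE_DEFAULTS[example]
    kwargs = dict(COMMON_BUILDER_KWARGS, embedding_model=spec["embedding_model"])
    # Omit embedding_mode entirely when the original example did not pass it.
    if spec["embedding_mode"] is not None:
        kwargs["embedding_mode"] = spec["embedding_mode"]
    return kwargs


print("embedding_mode" in builder_kwargs_for("email_rag.py"))  # False
print(builder_kwargs_for("browser_rag.py")["embedding_mode"])  # sentence-transformers
```

Omitting the key, rather than passing `None`, leaves the builder to apply its own default, which is what the original email and WeChat scripts did by not passing the argument at all.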
## Implementation Notes
- **Parameter Names**: The original files use `is_compact` and `is_recompute`, not the newer names `compact_index` and `use_recomputed_embeddings`.
- **Chunk Overlap**: Most examples use `25`; the document example uses `128`.
- **Embedding Mode**: Only `google_history_reader_leann.py` and `main_cli_example.py` have this parameter, so only `browser_rag.py` and `document_rag.py` set it.
- **Max Items**: Each example has a different default:
  - Email/Browser: 1000
  - WeChat: 50
  - Documents: unlimited
- **Special Cases**:
  - WeChat uses a dedicated embedding model for Chinese text (`Qwen/Qwen3-Embedding-0.6B`)
  - The email reader includes an HTML processing option
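Each of these notes describes a mistake that would silently change an index, so a small check can be run against an example's arguments before building. This is a hedged sketch: `FORBIDDEN_NAMES`, `EXPECTED`, and `check_builder_kwargs` are hypothetical, and the expected values are copied from this guide rather than read from the original scripts.

```python
# Newer argument names that the original examples do NOT use.
FORBIDDEN_NAMES = {
    "compact_index": "is_compact",
    "use_recomputed_embeddings": "is_recompute",
}

# Expected chunk overlap per example, and whether the original passed
# embedding_mode to the builder (values taken from this guide).
EXPECTED = {
    "document_rag.py": {"chunk_overlap": 128, "has_embedding_mode": True},
    "email_rag.py": {"chunk_overlap": 25, "has_embedding_mode": False},
    "browser_rag.py": {"chunk_overlap": 25, "has_embedding_mode": True},
    "wechat_rag.py": {"chunk_overlap": 25, "has_embedding_mode": False},
}


def check_builder_kwargs(example: str, builder_kwargs: dict, chunk_overlap: int) -> list:
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    for new_name, original_name in FORBIDDEN_NAMES.items():
        if new_name in builder_kwargs:
            problems.append(f"use {original_name!r} instead of {new_name!r}")
    expected = EXPECTED[example]
    if chunk_overlap != expected["chunk_overlap"]:
        problems.append(
            f"chunk_overlap is {chunk_overlap}, expected {expected['chunk_overlap']}"
        )
    if ("embedding_mode" in builder_kwargs) != expected["has_embedding_mode"]:
        problems.append("embedding_mode presence differs from the original example")
    return problems


print(check_builder_kwargs("email_rag.py", {"is_compact": True, "compact_index": True}, 25))
```

Returning a list of problems instead of raising on the first one reports every inconsistency in a single run, which suits a one-off migration check.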