LEANN/examples/PARAMETER_CONSISTENCY.md
Andy Lee 46f6f76fc3 refactor: Unify examples interface with BaseRAGExample
- Create BaseRAGExample base class for all RAG examples
- Refactor 4 examples to use unified interface:
  - document_rag.py (replaces main_cli_example.py)
  - email_rag.py (replaces mail_reader_leann.py)
  - browser_rag.py (replaces google_history_reader_leann.py)
  - wechat_rag.py (replaces wechat_history_reader_leann.py)
- Maintain 100% parameter compatibility with original files
- Add interactive mode support for all examples
- Unify parameter names (--max-items replaces --max-emails/--max-entries)
- Update README.md with new examples usage
- Add PARAMETER_CONSISTENCY.md documenting all parameter mappings
- Keep main_cli_example.py for backward compatibility with migration notice

All default values, LeannBuilder parameters, and chunking settings
remain identical to ensure full compatibility with existing indexes.
2025-07-28 23:11:16 -07:00


Parameter Consistency Guide

This document records how the parameters of the new unified interface (BaseRAGExample) map to the original example scripts, so that each new example keeps exactly the same parameter values as the script it replaces.

Parameter Mapping

Common Parameters (All Examples)

| Parameter | Default value | Notes |
|---|---|---|
| backend_name | "hnsw" | All examples use the HNSW backend |
| graph_degree | 32 | Same in all examples |
| complexity | 64 | Same in all examples |
| is_compact | True | Named is_compact, not compact_index |
| is_recompute | True | Named is_recompute, not use_recomputed_embeddings |
| num_threads | 1 | Forces single-threaded mode |
| chunk_size | 256 | Same in all examples |
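Keeping the shared defaults in one place makes it harder for a single example to drift from the others. The following is a minimal sketch that assumes a hypothetical CommonDefaults dataclass; the field names and values come from the table, but how BaseRAGExample actually stores them is not shown in this document.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CommonDefaults:
    """Defaults shared by every example, transcribed from the table.

    Hypothetical container: the real BaseRAGExample may hold these differently.
    """

    backend_name: str = "hnsw"
    graph_degree: int = 32
    complexity: int = 64
    is_compact: bool = True    # original name; not compact_index
    is_recompute: bool = True  # original name; not use_recomputed_embeddings
    num_threads: int = 1       # single-threaded mode
    chunk_size: int = 256


COMMON = CommonDefaults()

if __name__ == "__main__":
    params = asdict(COMMON)
    # Guard against the alternative keyword names slipping back in.
    assert "compact_index" not in params
    assert "use_recomputed_embeddings" not in params
    print(params)
```

Because the dataclass is frozen, an example cannot overwrite a shared default in place. It has to construct its own instance explicitly, which makes any deviation visible in review.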

Example-Specific Defaults

document_rag.py (replaces main_cli_example.py)

  • index_dir: "./test_doc_files" (matches original)
  • chunk_overlap: 128 (matches original)
  • embedding_model: "facebook/contriever"
  • embedding_mode: "sentence-transformers"
  • max_items: no limit by default

email_rag.py (replaces mail_reader_leann.py)

  • index_dir: "./mail_index" (matches original)
  • max_items: 1000 (was max_emails)
  • chunk_overlap: 25 (matches original)
  • embedding_model: "facebook/contriever"
  • embedding_mode: not passed to LeannBuilder (the original does not pass it)

browser_rag.py (replaces google_history_reader_leann.py)

  • index_dir: "./google_history_index" (matches original)
  • max_items: 1000 (was max_entries)
  • chunk_overlap: 25 (primary value in original)
  • embedding_model: "facebook/contriever"
  • embedding_mode: "sentence-transformers"

wechat_rag.py (replaces wechat_history_reader_leann.py)

  • index_dir: "./wechat_history_magic_test_11Debug_new" (matches original)
  • max_items: 50 (was max_entries; much lower than the 1000 used by the email and browser examples)
  • chunk_overlap: 25 (matches original)
  • embedding_model: "Qwen/Qwen3-Embedding-0.6B" (special model for Chinese)
  • embedding_mode: not passed to LeannBuilder (the original does not pass it)
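The per-example defaults can also be expressed as data, with a single function that assembles the builder arguments. The sketch below is written under several assumptions. EXAMPLE_DEFAULTS and builder_kwargs are hypothetical names. The common parameters other than chunk_size are assumed to be LeannBuilder keyword arguments. embedding_model is assumed to be passed to LeannBuilder in the same way as embedding_mode. A value of None means "no limit" for max_items and "do not pass" for embedding_mode.

```python
from typing import Any, Dict

# Per-example defaults transcribed from the lists above (hypothetical layout).
EXAMPLE_DEFAULTS: Dict[str, Dict[str, Any]] = {
    "document_rag": {
        "index_dir": "./test_doc_files",
        "max_items": None,  # no limit
        "chunk_overlap": 128,
        "embedding_model": "facebook/contriever",
        "embedding_mode": "sentence-transformers",
    },
    "email_rag": {
        "index_dir": "./mail_index",
        "max_items": 1000,
        "chunk_overlap": 25,
        "embedding_model": "facebook/contriever",
        "embedding_mode": None,  # original never passes it
    },
    "browser_rag": {
        "index_dir": "./google_history_index",
        "max_items": 1000,
        "chunk_overlap": 25,
        "embedding_model": "facebook/contriever",
        "embedding_mode": "sentence-transformers",
    },
    "wechat_rag": {
        "index_dir": "./wechat_history_magic_test_11Debug_new",
        "max_items": 50,
        "chunk_overlap": 25,
        "embedding_model": "Qwen/Qwen3-Embedding-0.6B",
        "embedding_mode": None,  # original never passes it
    },
}

# Shared builder arguments from the common-parameters table.
COMMON_BUILDER_KWARGS: Dict[str, Any] = {
    "backend_name": "hnsw",
    "graph_degree": 32,
    "complexity": 64,
    "is_compact": True,
    "is_recompute": True,
    "num_threads": 1,
}


def builder_kwargs(example: str) -> Dict[str, Any]:
    """Return the keyword arguments an example would pass to LeannBuilder (sketch)."""
    defaults = EXAMPLE_DEFAULTS[example]
    kwargs = dict(COMMON_BUILDER_KWARGS)
    kwargs["embedding_model"] = defaults["embedding_model"]
    # Only set embedding_mode where the original script set it.
    if defaults["embedding_mode"] is not None:
        kwargs["embedding_mode"] = defaults["embedding_mode"]
    return kwargs


if __name__ == "__main__":
    for name in EXAMPLE_DEFAULTS:
        print(name, builder_kwargs(name))
```

Omitting the key entirely, rather than passing embedding_mode=None, mirrors the originals. Omitting it lets LeannBuilder apply whatever default it has, whereas an explicit None might not be equivalent.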

Implementation Notes

  1. Parameter Names: The original files use is_compact and is_recompute, not the alternative names compact_index and use_recomputed_embeddings.

  2. Chunk Overlap: All examples use 25 except document_rag.py, which uses 128.

  3. Embedding Mode: Only google_history_reader_leann.py and main_cli_example.py pass embedding_mode, so only their replacements (browser_rag.py and document_rag.py) set it.

  4. Max Items: The default limit depends on the example:

    • Email/Browser: 1000
    • WeChat: 50
    • Documents: unlimited
  5. Special Cases:

    • WeChat uses Qwen/Qwen3-Embedding-0.6B, a model chosen for Chinese text, instead of facebook/contriever
    • The email reader includes an option for processing HTML content
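The unified --max-items flag combined with per-example defaults can be sketched with argparse. The sketch assumes hypothetical names (MAX_ITEMS_DEFAULTS, parse_max_items, apply_limit). It represents "unlimited" as None; the real scripts may use a different sentinel, and their actual argument parsing is not shown in this document.

```python
import argparse
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")

# Default max_items per example, from note 4; None means unlimited.
MAX_ITEMS_DEFAULTS = {
    "document_rag": None,
    "email_rag": 1000,
    "browser_rag": 1000,
    "wechat_rag": 50,
}


def parse_max_items(example: str, argv: Sequence[str]) -> Optional[int]:
    """Parse the unified --max-items flag using the example's own default."""
    parser = argparse.ArgumentParser(prog=example)
    parser.add_argument(
        "--max-items",
        type=int,
        default=MAX_ITEMS_DEFAULTS[example],
        help="maximum number of items to index (default depends on the example)",
    )
    return parser.parse_args(list(argv)).max_items


def apply_limit(items: Sequence[T], max_items: Optional[int]) -> List[T]:
    """Keep at most max_items items; None keeps everything."""
    if max_items is None:
        return list(items)
    return list(items[:max_items])


if __name__ == "__main__":
    limit = parse_max_items("wechat_rag", [])
    print(limit, apply_limit(list(range(100)), limit)[:5])
```

Because a single flag name carries a different default for each example, users get one consistent CLI while each example keeps the limit its original script used.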