LEANN/examples/PARAMETER_CONSISTENCY.md
Andy Lee 3cde4fc7b3 fix: Fix pre-commit issues and update tests
- Fix import sorting and unused imports
- Update type annotations to use built-in types (list, dict) instead of typing.List/Dict
- Fix trailing whitespace and end-of-file issues
- Fix Chinese fullwidth comma to regular comma
- Update test_main_cli.py to test_document_rag.py
- Add backward compatibility test for main_cli_example.py
- Pass all pre-commit hooks (ruff, ruff-format, etc.)
2025-07-29 10:19:05 -07:00

Parameter Consistency Guide

This document records the parameter names and default values that the new unified interface must keep in order to stay exactly compatible with the original examples.

Parameter Mapping

Common Parameters (All Examples)

| Parameter | Default Value | Notes |
|---|---|---|
| backend_name | "hnsw" | All examples use HNSW backend |
| graph_degree | 32 | Consistent across all |
| complexity | 64 | Consistent across all |
| is_compact | True | NOT compact_index |
| is_recompute | True | NOT use_recomputed_embeddings |
| num_threads | 1 | Force single-threaded mode |
| chunk_size | 256 | Consistent across all |
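
As a rough illustration, the common defaults above can be kept in one place and guarded against the newer parameter names. This is only a sketch: `COMMON_DEFAULTS`, `RENAMED`, and `with_common_defaults` are hypothetical names that do not come from the LEANN code base, and how the values are eventually passed to the builder or chunker is not shown.

```python
# Common defaults from the table above, keyed by the original parameter names.
COMMON_DEFAULTS = {
    "backend_name": "hnsw",
    "graph_degree": 32,
    "complexity": 64,
    "is_compact": True,
    "is_recompute": True,
    "num_threads": 1,
    "chunk_size": 256,
}

# Newer names that the table says must NOT be used, mapped to the expected name.
RENAMED = {
    "compact_index": "is_compact",
    "use_recomputed_embeddings": "is_recompute",
}


def with_common_defaults(**overrides):
    """Hypothetical helper: merge caller overrides into the common defaults.

    Raises TypeError if a caller uses one of the newer parameter names.
    """
    for name in overrides:
        if name in RENAMED:
            raise TypeError(f"use {RENAMED[name]!r} instead of {name!r}")
    return {**COMMON_DEFAULTS, **overrides}
```

Calling `with_common_defaults(complexity=128)` would return the table's defaults with only `complexity` changed, while `with_common_defaults(compact_index=True)` would fail early instead of silently adding an unexpected key.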

Example-Specific Defaults

document_rag.py (replaces main_cli_example.py)

  • index_dir: "./test_doc_files" (matches original)
  • chunk_overlap: 128 (matches original)
  • embedding_model: "facebook/contriever"
  • embedding_mode: "sentence-transformers"
  • No maximum item limit by default

email_rag.py (replaces mail_reader_leann.py)

  • index_dir: "./mail_index" (matches original)
  • max_items: 1000 (was max_emails)
  • chunk_overlap: 25 (matches original)
  • embedding_model: "facebook/contriever"
  • NO embedding_mode parameter in LeannBuilder (original doesn't have it)

browser_rag.py (replaces google_history_reader_leann.py)

  • index_dir: "./google_history_index" (matches original)
  • max_items: 1000 (was max_entries)
  • chunk_overlap: 25 (primary value in original)
  • embedding_model: "facebook/contriever"
  • embedding_mode: "sentence-transformers"

wechat_rag.py (replaces wechat_history_reader_leann.py)

  • index_dir: "./wechat_history_magic_test_11Debug_new" (matches original)
  • max_items: 50 (was max_entries, much lower default)
  • chunk_overlap: 25 (matches original)
  • embedding_model: "Qwen/Qwen3-Embedding-0.6B" (special model for Chinese)
  • NO embedding_mode parameter in LeannBuilder (original doesn't have it)
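
The per-example defaults listed above could be captured in a single lookup table, with `embedding_mode` simply absent for the examples whose originals never set it. The sketch below is illustrative only: `EXAMPLE_DEFAULTS` and `embedding_kwargs` are hypothetical names, and using `None` to mean "no limit" is an assumption, not something the original scripts are known to do.

```python
# Per-example defaults copied from the lists above.
EXAMPLE_DEFAULTS = {
    "document_rag.py": {
        "index_dir": "./test_doc_files",
        "chunk_overlap": 128,
        "embedding_model": "facebook/contriever",
        "embedding_mode": "sentence-transformers",
        "max_items": None,  # assumption: None stands for "no limit"
    },
    "email_rag.py": {
        "index_dir": "./mail_index",
        "chunk_overlap": 25,
        "embedding_model": "facebook/contriever",
        "max_items": 1000,
    },
    "browser_rag.py": {
        "index_dir": "./google_history_index",
        "chunk_overlap": 25,
        "embedding_model": "facebook/contriever",
        "embedding_mode": "sentence-transformers",
        "max_items": 1000,
    },
    "wechat_rag.py": {
        "index_dir": "./wechat_history_magic_test_11Debug_new",
        "chunk_overlap": 25,
        "embedding_model": "Qwen/Qwen3-Embedding-0.6B",
        "max_items": 50,
    },
}


def embedding_kwargs(example):
    """Hypothetical helper: return only the embedding arguments for an example.

    embedding_mode is included only where the original example set it.
    """
    defaults = EXAMPLE_DEFAULTS[example]
    kwargs = {"embedding_model": defaults["embedding_model"]}
    if "embedding_mode" in defaults:
        kwargs["embedding_mode"] = defaults["embedding_mode"]
    return kwargs
```

Leaving the key out, rather than storing a placeholder value, keeps the email and WeChat examples from passing an argument their originals never supplied.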

Implementation Notes

  1. Parameter Names: The original files use is_compact and is_recompute, not the newer names compact_index and use_recomputed_embeddings.

  2. Chunk Overlap: Every example uses 25 except document_rag.py, which uses 128.

  3. Embedding Mode: Only google_history_reader_leann.py and main_cli_example.py have this parameter, so only browser_rag.py and document_rag.py set it.

  4. Max Items: Each example has different defaults:

    • Email/Browser: 1000
    • WeChat: 50
    • Documents: unlimited
  5. Special Cases:

    • WeChat uses an embedding model chosen for Chinese text (Qwen/Qwen3-Embedding-0.6B)
    • The email reader includes an HTML processing option
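
The renames noted for each example (max_emails and max_entries becoming max_items) could be handled by a small translation step when old call sites are migrated. This is a minimal sketch under stated assumptions: `LEGACY_MAX_NAMES` and `to_unified` are hypothetical names and are not part of the original scripts or of LEANN.

```python
# Legacy name of the item-limit parameter in each original script.
# document_rag.py is absent because its original had no limit parameter.
LEGACY_MAX_NAMES = {
    "email_rag.py": "max_emails",
    "browser_rag.py": "max_entries",
    "wechat_rag.py": "max_entries",
}


def to_unified(example, legacy_kwargs):
    """Hypothetical helper: rename a legacy limit argument to max_items.

    Returns a new dict; the input mapping is left unchanged.
    """
    unified = dict(legacy_kwargs)
    legacy_name = LEGACY_MAX_NAMES.get(example)
    if legacy_name is not None and legacy_name in unified:
        unified["max_items"] = unified.pop(legacy_name)
    return unified
```

For instance, `to_unified("email_rag.py", {"max_emails": 1000})` would yield a dict containing only `max_items`, while arguments for `document_rag.py` would pass through untouched.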