LEANN/examples/PARAMETER_CONSISTENCY.md
Andy Lee 3cde4fc7b3 fix: Fix pre-commit issues and update tests
2025-07-29 10:19:05 -07:00

# Parameter Consistency Guide
This document lists the parameters that the new unified interface must keep identical to the original examples, so that behavior does not change after the migration.
## Parameter Mapping
### Common Parameters (All Examples)
| Parameter | Default Value | Notes |
|-----------|--------------|-------|
| `backend_name` | `"hnsw"` | All examples use HNSW backend |
| `graph_degree` | `32` | Consistent across all |
| `complexity` | `64` | Consistent across all |
| `is_compact` | `True` | NOT `compact_index` |
| `is_recompute` | `True` | NOT `use_recomputed_embeddings` |
| `num_threads` | `1` | Force single-threaded mode |
| `chunk_size` | `256` | Consistent across all |
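
The table above can be captured as a single dictionary so that every example reads the same values. The following is a minimal sketch, not the project's actual code: `COMMON_DEFAULTS`, `DISALLOWED_ALIASES`, and `check_parameter_names` are hypothetical names introduced for illustration, and only the keys and values come from the table.

```python
# Hypothetical constant; keys and values are copied from the table above.
COMMON_DEFAULTS = {
    "backend_name": "hnsw",
    "graph_degree": 32,
    "complexity": 64,
    "is_compact": True,
    "is_recompute": True,
    "num_threads": 1,
    "chunk_size": 256,
}

# Newer names that the original examples do NOT use, mapped to the expected name.
DISALLOWED_ALIASES = {
    "compact_index": "is_compact",
    "use_recomputed_embeddings": "is_recompute",
}


def check_parameter_names(kwargs: dict) -> list[str]:
    """Return one message for each disallowed alias found in kwargs."""
    return [
        f"use {expected!r} instead of {alias!r}"
        for alias, expected in DISALLOWED_ALIASES.items()
        if alias in kwargs
    ]
```

Calling `check_parameter_names` on a set of keyword arguments before building an index would flag a newer name that slipped in, while `COMMON_DEFAULTS` itself passes the check.
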
### Example-Specific Defaults
#### document_rag.py (replaces main_cli_example.py)
- `index_dir`: `"./test_doc_files"` (matches original)
- `chunk_overlap`: `128` (matches original)
- `embedding_model`: `"facebook/contriever"`
- `embedding_mode`: `"sentence-transformers"`
- `max_items`: no limit by default
#### email_rag.py (replaces mail_reader_leann.py)
- `index_dir`: `"./mail_index"` (matches original)
- `max_items`: `1000` (was `max_emails`)
- `chunk_overlap`: `25` (matches original)
- `embedding_model`: `"facebook/contriever"`
- NO `embedding_mode` parameter in LeannBuilder (original doesn't have it)
#### browser_rag.py (replaces google_history_reader_leann.py)
- `index_dir`: `"./google_history_index"` (matches original)
- `max_items`: `1000` (was `max_entries`)
- `chunk_overlap`: `25` (primary value in original)
- `embedding_model`: `"facebook/contriever"`
- `embedding_mode`: `"sentence-transformers"`
#### wechat_rag.py (replaces wechat_history_reader_leann.py)
- `index_dir`: `"./wechat_history_magic_test_11Debug_new"` (matches original)
- `max_items`: `50` (was `max_entries`, much lower default)
- `chunk_overlap`: `25` (matches original)
- `embedding_model`: `"Qwen/Qwen3-Embedding-0.6B"` (special model for Chinese)
- NO `embedding_mode` parameter in LeannBuilder (original doesn't have it)
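
The per-example defaults listed above can be written as plain data, which makes differences such as the missing `embedding_mode` easy to query. This is only a sketch under assumptions: `EXAMPLE_DEFAULTS` and `examples_without_embedding_mode` are hypothetical names, and the values are copied from the lists above rather than read from the scripts themselves.

```python
# Hypothetical table of per-example defaults; values come from the lists above.
EXAMPLE_DEFAULTS = {
    "document_rag.py": {
        "index_dir": "./test_doc_files",
        "chunk_overlap": 128,
        "embedding_model": "facebook/contriever",
        "embedding_mode": "sentence-transformers",
    },
    "email_rag.py": {
        "index_dir": "./mail_index",
        "max_items": 1000,
        "chunk_overlap": 25,
        "embedding_model": "facebook/contriever",
    },
    "browser_rag.py": {
        "index_dir": "./google_history_index",
        "max_items": 1000,
        "chunk_overlap": 25,
        "embedding_model": "facebook/contriever",
        "embedding_mode": "sentence-transformers",
    },
    "wechat_rag.py": {
        "index_dir": "./wechat_history_magic_test_11Debug_new",
        "max_items": 50,
        "chunk_overlap": 25,
        "embedding_model": "Qwen/Qwen3-Embedding-0.6B",
    },
}


def examples_without_embedding_mode() -> list[str]:
    """Names of examples that must not pass `embedding_mode`."""
    return sorted(
        name
        for name, defaults in EXAMPLE_DEFAULTS.items()
        if "embedding_mode" not in defaults
    )
```

With this layout, `examples_without_embedding_mode()` yields `email_rag.py` and `wechat_rag.py`, matching the two entries marked "NO `embedding_mode`" above.
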
## Implementation Notes
1. **Parameter Names**: The original files use `is_compact` and `is_recompute`, not the newer names `compact_index` and `use_recomputed_embeddings`.
2. **Chunk Overlap**: All examples use `25` except `document_rag.py`, which uses `128`.
3. **Embedding Mode**: Only `google_history_reader_leann.py` and `main_cli_example.py` have this parameter.
4. **Max Items**: Each example has different defaults:
   - Email/Browser: 1000
   - WeChat: 50
   - Documents: unlimited
5. **Special Cases**:
   - WeChat uses a specific Chinese embedding model
   - Email reader includes HTML processing option
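
The item-limit rename (`max_emails` and `max_entries` became `max_items`) and its differing defaults can be resolved in one place. The sketch below is illustrative only: `resolve_max_items`, the short example keys, and the choice to keep accepting the legacy names are assumptions, not the project's actual behavior. `None` stands for "unlimited".

```python
from __future__ import annotations

# Legacy argument names from the original scripts, per the notes above.
LEGACY_LIMIT_NAMES = ("max_emails", "max_entries")

# Default item limits per example; None means no limit.
MAX_ITEMS_DEFAULTS = {
    "email": 1000,
    "browser": 1000,
    "wechat": 50,
    "documents": None,
}


def resolve_max_items(example: str, **kwargs) -> int | None:
    """Return the item limit for an example, accepting legacy names too."""
    # Prefer the unified name, then fall back to the legacy spellings.
    for name in ("max_items", *LEGACY_LIMIT_NAMES):
        if name in kwargs:
            return kwargs[name]
    return MAX_ITEMS_DEFAULTS[example]
```

For instance, `resolve_max_items("email", max_emails=10)` honors the legacy argument, while `resolve_max_items("documents")` returns `None` to signal that no limit applies.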