* docs: config guidance

* feat: add comprehensive configuration guide and update README

  - Create docs/configuration-guide.md with detailed guidance on:
    - Embedding model selection (small/medium/large)
    - Index selection (HNSW vs DiskANN)
    - LLM engine and model comparison
    - Parameter tuning (build/search complexity, top-k)
    - Performance optimization tips
    - Deep dive into LEANN's recomputation feature
  - Update README.md to link to the configuration guide
  - Include latest 2025 model recommendations (Qwen3, DeepSeek-R1, O3-mini)

* chore: move evaluation data .gitattributes to correct location

* docs: Weaken DiskANN emphasis in README

  - Change backend description to emphasize HNSW as default
  - DiskANN positioned as optional for billion-scale datasets
  - Simplify evaluation commands to be more generic

* docs: Adjust DiskANN positioning in features and roadmap

  - features.md: Put HNSW/FAISS first as default, DiskANN as optional
  - roadmap.md: Reorder to show HNSW integration before DiskANN
  - Consistent with positioning DiskANN as advanced option for large-scale use

* docs: Improve configuration guide based on feedback

  - List specific files in default data/ directory (2 AI papers, literature, tech report)
  - Update examples to use English and better RAG-suitable queries
  - Change full dataset reference to use --max-items -1
  - Adjust small model guidance about upgrading to larger models when time allows
  - Update top-k defaults to reflect actual default of 20
  - Ensure consistent use of full model name Qwen/Qwen3-Embedding-0.6B
  - Reorder optimization steps, move MLX to third position
  - Remove incorrect chunk size tuning guidance
  - Change README from 'Having trouble' to 'Need best practices'

* docs: Address all configuration guide feedback

  - Fix grammar: 'If time is not a constraint' instead of 'time expense is not large'
  - Highlight Qwen3-Embedding-0.6B performance (nearly OpenAI API level)
  - Add OpenAI quick start section with configuration example
  - Fold Cloud vs Local trade-offs into collapsible section
  - Update HNSW as 'default and recommended for extreme low storage'
  - Add DiskANN beta warning and explain PQ+rerank architecture
  - Expand Ollama models: add qwen3:0.6b, 4b, 7b variants
  - Note OpenAI as current default but recommend Ollama switch
  - Add 'need to install extra software' warning for Ollama
  - Remove incorrect latency numbers from search-complexity recommendations

* docs: add a link
83 lines
4.1 KiB
Plaintext
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.lz4 filter=lfs diff=lfs merge=lfs -text
*.mds filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
# Audio files - uncompressed
*.pcm filter=lfs diff=lfs merge=lfs -text
*.sam filter=lfs diff=lfs merge=lfs -text
*.raw filter=lfs diff=lfs merge=lfs -text
# Audio files - compressed
*.aac filter=lfs diff=lfs merge=lfs -text
*.flac filter=lfs diff=lfs merge=lfs -text
*.mp3 filter=lfs diff=lfs merge=lfs -text
*.ogg filter=lfs diff=lfs merge=lfs -text
*.wav filter=lfs diff=lfs merge=lfs -text
# Image files - uncompressed
*.bmp filter=lfs diff=lfs merge=lfs -text
*.gif filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text
*.tiff filter=lfs diff=lfs merge=lfs -text
# Image files - compressed
*.jpg filter=lfs diff=lfs merge=lfs -text
*.jpeg filter=lfs diff=lfs merge=lfs -text
*.webp filter=lfs diff=lfs merge=lfs -text
# Video files - compressed
*.mp4 filter=lfs diff=lfs merge=lfs -text
*.webm filter=lfs diff=lfs merge=lfs -text
ground_truth/dpr/id_map.json filter=lfs diff=lfs merge=lfs -text
indices/dpr/dpr_diskann.passages.idx filter=lfs diff=lfs merge=lfs -text
indices/dpr/dpr_diskann.passages.jsonl filter=lfs diff=lfs merge=lfs -text
indices/dpr/dpr_diskann_disk.index filter=lfs diff=lfs merge=lfs -text
indices/dpr/leann.labels.map filter=lfs diff=lfs merge=lfs -text
indices/rpj_wiki/leann.labels.map filter=lfs diff=lfs merge=lfs -text
indices/rpj_wiki/rpj_wiki.index filter=lfs diff=lfs merge=lfs -text
indices/rpj_wiki/rpj_wiki.passages.0.idx filter=lfs diff=lfs merge=lfs -text
indices/rpj_wiki/rpj_wiki.passages.0.jsonl filter=lfs diff=lfs merge=lfs -text
indices/rpj_wiki/rpj_wiki.passages.1.idx filter=lfs diff=lfs merge=lfs -text
indices/rpj_wiki/rpj_wiki.passages.1.jsonl filter=lfs diff=lfs merge=lfs -text
indices/rpj_wiki/rpj_wiki.passages.2.idx filter=lfs diff=lfs merge=lfs -text
indices/rpj_wiki/rpj_wiki.passages.2.jsonl filter=lfs diff=lfs merge=lfs -text
indices/rpj_wiki/rpj_wiki.passages.3.idx filter=lfs diff=lfs merge=lfs -text
indices/rpj_wiki/rpj_wiki.passages.3.jsonl filter=lfs diff=lfs merge=lfs -text
indices/rpj_wiki/rpj_wiki.passages.4.idx filter=lfs diff=lfs merge=lfs -text
indices/rpj_wiki/rpj_wiki.passages.4.jsonl filter=lfs diff=lfs merge=lfs -text
indices/rpj_wiki/rpj_wiki.passages.5.idx filter=lfs diff=lfs merge=lfs -text
indices/rpj_wiki/rpj_wiki.passages.5.jsonl filter=lfs diff=lfs merge=lfs -text
indices/rpj_wiki/rpj_wiki.passages.6.idx filter=lfs diff=lfs merge=lfs -text
indices/rpj_wiki/rpj_wiki.passages.6.jsonl filter=lfs diff=lfs merge=lfs -text
indices/rpj_wiki/rpj_wiki.passages.7.idx filter=lfs diff=lfs merge=lfs -text
indices/rpj_wiki/rpj_wiki.passages.7.jsonl filter=lfs diff=lfs merge=lfs -text