LEANN

Author	SHA1	Message	Date
Andy Lee	45bdad4fa7	debug: add detailed logging for CI path resolution debugging - Add logging in DiskANN embedding server to show metadata_file_path - Add debug logging in PassageManager to trace path resolution - This will help identify why CI fails to find passage files	2025-08-07 00:00:12 -07:00
Andy Lee	d217adbe40	fix: diskann building and partitioning	2025-08-06 21:32:03 -07:00
yichuan520030910320	669e622430	chore: Update DiskANN submodule to latest with graph partition tools - Update DiskANN submodule to commit b2dc4ea - Includes graph partition tools and CMake integration - Enables graph partitioning functionality in DiskANN backend	2025-08-05 23:14:19 -07:00
yichuan520030910320	f94ce63d51	add gpt oss! serve your RAG using ollama	2025-08-05 16:49:52 -07:00
Andy Lee	0d448c4a41	docs: config guidance (#17) * docs: config guidance * feat: add comprehensive configuration guide and update README - Create docs/configuration-guide.md with detailed guidance on: - Embedding model selection (small/medium/large) - Index selection (HNSW vs DiskANN) - LLM engine and model comparison - Parameter tuning (build/search complexity, top-k) - Performance optimization tips - Deep dive into LEANN's recomputation feature - Update README.md to link to the configuration guide - Include latest 2025 model recommendations (Qwen3, DeepSeek-R1, O3-mini) * chore: move evaluation data .gitattributes to correct location * docs: Weaken DiskANN emphasis in README - Change backend description to emphasize HNSW as default - DiskANN positioned as optional for billion-scale datasets - Simplify evaluation commands to be more generic * docs: Adjust DiskANN positioning in features and roadmap - features.md: Put HNSW/FAISS first as default, DiskANN as optional - roadmap.md: Reorder to show HNSW integration before DiskANN - Consistent with positioning DiskANN as advanced option for large-scale use * docs: Improve configuration guide based on feedback - List specific files in default data/ directory (2 AI papers, literature, tech report) - Update examples to use English and better RAG-suitable queries - Change full dataset reference to use --max-items -1 - Adjust small model guidance about upgrading to larger models when time allows - Update top-k defaults to reflect actual default of 20 - Ensure consistent use of full model name Qwen/Qwen3-Embedding-0.6B - Reorder optimization steps, move MLX to third position - Remove incorrect chunk size tuning guidance - Change README from 'Having trouble' to 'Need best practices' * docs: Address all configuration guide feedback - Fix grammar: 'If time is not a constraint' instead of 'time expense is not large' - Highlight Qwen3-Embedding-0.6B performance (nearly OpenAI API level) - Add OpenAI quick start section with configuration example - Fold Cloud vs Local trade-offs into collapsible section - Update HNSW as 'default and recommended for extreme low storage' - Add DiskANN beta warning and explain PQ+rerank architecture - Expand Ollama models: add qwen3:0.6b, 4b, 7b variants - Note OpenAI as current default but recommend Ollama switch - Add 'need to install extra software' warning for Ollama - Remove incorrect latency numbers from search-complexity recommendations * docs: add a link	2025-08-04 22:50:32 -07:00
yichuan520030910320	33521d6d00	add logs	2025-08-04 14:15:52 -07:00
Andy Lee	54df6310c5	fix: diskann build and prevent termination from hanging - Fix OpenMP library linking in DiskANN CMake configuration - Add timeout protection for HuggingFace model loading to prevent hangs - Improve embedding server process termination with better timeouts - Make DiskANN backend default enabled alongside HNSW - Update documentation to reflect both backends included by default	2025-08-03 21:16:52 -07:00
yichuan520030910320	af1790395a	fix ruff errors and formatting	2025-07-27 02:22:54 -07:00
yichuan520030910320	52153bbb69	update faiss compare	2025-07-25 01:45:50 -07:00
Andy Lee	673fd9b7cd	fix: upgrade to actions v4 and handle manylinux2014 compatibility - Upgrade all GitHub Actions to v4 (v3 is deprecated) - Use manual git checkout in manylinux2014 containers to avoid Node.js issues - Update artifact naming to ensure uniqueness (required by v4) - Add fail-fast: false to build strategies - This maintains manylinux2014 compatibility while using latest actions	2025-07-25 00:20:21 -07:00
yichuan520030910320	b6d43f5fd9	add gif	2025-07-25 00:12:35 -07:00
Andy Lee	a44dccecac	fix: make TestPyPI upload optional and non-blocking - Add continue-on-error to TestPyPI step - Check if TEST_PYPI_API_TOKEN exists before attempting upload - Add graceful failure handling with clear messages - Update docs to explain TestPyPI token configuration - Clarify that TestPyPI testing is optional Now the release won't fail if TestPyPI is not configured or upload fails	2025-07-24 16:02:07 -07:00
yichuan520030910320	de252fef31	[chat] update 30s example	2025-07-24 14:40:33 -07:00
Andy Lee	7add391b2c	chore: build and package	2025-07-24 00:47:46 -07:00
yichuan520030910320	88eca75917	fix readme	2025-07-23 18:22:10 -07:00
Andy Lee	e86da38726	fix: ollama hint for similar models	2025-07-23 15:45:10 -07:00
yichuan520030910320	5dd74982ba	fix readme	2025-07-22 23:14:31 -07:00
Andy Lee	30e5f12616	docs: quick start	2025-07-22 22:33:04 -07:00
yichuan520030910320	aa9a14a917	make the email wonderful format	2025-07-22 21:41:58 -07:00
Andy Lee	d3f85678ec	perf: much faster loading and embedding serving	2025-07-22 19:38:22 -07:00
yichuan520030910320	2a96d05b21	upd readme	2025-07-22 17:06:33 -07:00
Andy Lee	8513471573	feat: make diskann runnable	2025-07-22 14:26:03 -07:00
Andy Lee	71ef4b7d4c	fix: reproducible dpr on mac	2025-07-12 18:13:22 -07:00
Andy Lee	8e0ab4a28d	chore: update deps	2025-07-12 22:48:13 +00:00
Andy Lee	eb6f504789	Datastore reproduce (#3) * fix: diskann zmq port and passages * feat: auto discovery of packages and fix passage gen for diskann * docs: embedding pruning * refactor: passage structure * feat: reproducible research datas, rpj_wiki & dpr * refactor: chat and base searcher * feat: chat on mps	2025-07-11 23:37:23 -07:00
Andy Lee	27b3a26e75	fix(deps): Update DiskANN with cleaned up CMake configuration	2025-07-08 23:27:05 +00:00
Andy Lee	41d872504e	feat(deps): Update DiskANN to use system-installed Boost and Protobuf	2025-07-08 23:13:36 +00:00
Andy Lee	963cd05273	chore: diskann modules	2025-07-08 21:57:38 +00:00
Andy Lee	09b6e67baf	chore: diskann upg boost	2025-07-08 21:44:44 +00:00
Andy Lee	a6c400cd4f	chore: linux boost and protobuf	2025-07-08 21:25:43 +00:00
Andy Lee	c013e5ccce	chore: linux deps	2025-07-08 13:55:39 -07:00
Andy Lee	f25a1a3840	chore: macos compatible	2025-07-08 13:32:00 -07:00
yichuan520030910320	44369a8138	update diskann module	2025-07-07 18:27:07 -07:00
Andy Lee	cf17c85607	Make DiskANN and HNSW work on main example (#2) * fix: diskann zmq port and passages * feat: auto discovery of packages and fix passage gen for diskann	2025-07-05 22:18:12 -07:00
yichuan520030910320	a075fd6f47	Add DiskANN and faiss as submodules	2025-06-30 10:11:39 +00:00
yichuan520030910320	46f6cc100b	Initial commit	2025-06-30 09:05:05 +00:00

36 Commits