LEANN

Author	SHA1	Message	Date
Andy Lee	6bde28584b	feat: Add Google Gemini API support for chat and embeddings (#57) - Add GeminiChat class with gemini-2.5-flash model support - Add compute_embeddings_gemini function with text-embedding-004 model - Update get_llm factory to support "gemini" type - Update API documentation to include gemini embedding mode - Support temperature, max_tokens, top_p parameters for Gemini chat - Support batch embedding processing with progress bars - Add proper error handling and API key validation	2025-08-15 21:54:11 -07:00
Andy Lee	14e84d9e2d	fix(core): skip empty/invalid chunks before embedding; guard OpenAI embeddings (#55) Avoid 400 errors from OpenAI when chunker yields empty strings by filtering invalid texts in LeannBuilder.build_index. Add validation fail-fast in OpenAI embedding path to surface upstream issues earlier. Keeps passages and embeddings aligned during build. Refs #54	2025-08-15 17:53:53 -07:00
yichuan520030910320	42c8370709	add chunk size in leann build & fix batch size in oai & docs	2025-08-14 13:14:14 -07:00
yichuan520030910320	b2390ccc14	[Ollama] fix ollama recompute	2025-08-12 00:24:20 -07:00
Andy Lee	e8fca2c84a	fix: detect and report Ollama embedding dimension inconsistency (#37) - Add validation for embedding dimension consistency in Ollama mode - Provide clear error message with troubleshooting steps when dimensions mismatch - Fail fast instead of silent fallback to prevent data corruption Fixes #31	2025-08-11 17:41:52 -07:00
Andy Lee	3ff5aac8e0	Add Ollama embedding support to enable local embedding models (#22) * feat: Add Ollama embedding support for local embedding models * docs: Add clear documentation for Ollama embedding usage * feat: Enhance Ollama embedding with better error handling and concurrent processing - Add intelligent model validation and suggestions (inspired by OllamaChat) - Implement concurrent processing for better performance - Add retry mechanism with timeout handling - Provide user-friendly error messages with emojis - Auto-detect and recommend embedding models - Add text truncation for long texts - Improve progress bar display logic * docs: don't mention it in README	2025-08-08 18:44:07 -07:00
Andy Lee	b3e9ee96fa	fix: resolve all ruff linting errors and add lint CI check - Fix ambiguous fullwidth characters (commas, parentheses) in strings and comments - Replace Chinese comments with English equivalents - Fix unused imports with proper noqa annotations for intentional imports - Fix bare except clauses with specific exception types - Fix redefined variables and undefined names - Add ruff noqa annotations for generated protobuf files - Add lint and format check to GitHub Actions CI pipeline	2025-07-26 22:38:13 -07:00
yichuan520030910320	cdb92f7cf4	update pytoml version && fix colab env && fix pdf extract in pip	2025-07-26 16:33:13 -07:00
yichuan520030910320	851f0f04c3	fix some para	2025-07-23 01:46:34 -07:00
Andy Lee	d3f85678ec	perf: much faster loading and embedding serving	2025-07-22 19:38:22 -07:00
Andy Lee	8513471573	feat: make diskann runnable	2025-07-22 14:26:03 -07:00
Andy Lee	ab72a2ab9d	fix: more logs	2025-07-21 23:08:53 -07:00
Andy Lee	c2f35c8e73	fix: logs	2025-07-21 23:02:13 -07:00
Andy Lee	573313f0b6	refactor: logs	2025-07-21 22:45:24 -07:00
Andy Lee	b3970793cf	fix: cache the loaded model	2025-07-21 21:20:53 -07:00
yichuan520030910320	727724990e	add todo	2025-07-21 20:59:09 -07:00
yichuan520030910320	530f6e4af5	add progress bar in build	2025-07-21 20:55:18 -07:00
Andy Lee	1b6272ce0e	Building, CLI tool & Embedding Server Fixed (#5) * chore: shorter build time * chore: update faiss * fix: no longer do embedding server reuse * fix: do not reuse emb_server and close it properly * feat: cli tool * feat: cli more args * fix: same embedding logic	2025-07-21 20:17:25 -07:00

18 Commits