Commit Graph

15 Commits

Author SHA1 Message Date
Andy Lee
9a5c197acd fix: auto-detect normalized embeddings and use cosine distance
- Add automatic detection for normalized embedding models (OpenAI, Voyage AI, Cohere)
- Automatically set distance_metric='cosine' for normalized embeddings
- Add warnings when using non-optimal distance metrics
- Implement manual L2 normalization in HNSW backend (custom Faiss build lacks normalize_L2)
- Fix DiskANN zmq_port compatibility with lazy loading strategy
- Add documentation for normalized embeddings feature

This fixes the low accuracy issue when using OpenAI text-embedding-3-small model with default MIPS metric.
2025-07-27 20:21:05 -07:00
Andy Lee
a7cba078dd chore: consolidate essential fixes and add pre-commit hooks
- Add pre-commit configuration with ruff and black
- Fix lint CI job to use uv tool install instead of sync
- Add essential LlamaIndex dependencies to leann-core

Co-Authored-By: Yichuan Wang <73766326+yichuan-w@users.noreply.github.com>
2025-07-27 01:24:24 -07:00
Andy Lee
b3e9ee96fa fix: resolve all ruff linting errors and add lint CI check
- Fix ambiguous fullwidth characters (commas, parentheses) in strings and comments
- Replace Chinese comments with English equivalents
- Fix unused imports with proper noqa annotations for intentional imports
- Fix bare except clauses with specific exception types
- Fix redefined variables and undefined names
- Add ruff noqa annotations for generated protobuf files
- Add lint and format check to GitHub Actions CI pipeline
2025-07-26 22:38:13 -07:00
yichuan520030910320
c87c0fe662 update colab install & fix colab path 2025-07-26 18:07:31 -07:00
yichuan520030910320
cdb92f7cf4 update pytoml version && fix colab env && fix pdf extract in pip 2025-07-26 16:33:13 -07:00
yichuan520030910320
d502fa24b0 [installation] update install for linux 2025-07-24 02:17:17 +00:00
Andy Lee
2420c5fd35 chore: update sentence-transformer to prevent MixIn not found error 2025-07-22 23:27:25 -07:00
Andy Lee
1b6272ce0e Building, CLI tool & Embedding Server Fixed (#5)
* chore: shorter build time

* chore: update faiss

* fix: no longger do embedding server reuse

* fix: do not reuse emb_server and close it properly

* feat: cli tool

* feat: cli more args

* fix: same embedding logic
2025-07-21 20:17:25 -07:00
Andy Lee
3da5b44d7f fix: mlx when searching, added to embedding_server 2025-07-14 01:11:21 -07:00
Andy Lee
eb6f504789 Datastore reproduce (#3)
* fix: diskann zmq port and passages

* feat: auto discovery of packages and fix passage gen for diskann

* docs: embedding pruning

* refactor: passage structure

* feat: reproducible research datas, rpj_wiki & dpr

* refactor: chat and base searcher

* feat: chat on mps
2025-07-11 23:37:23 -07:00
yichuan520030910320
dfca00c21b add mac support in this repo 2025-07-07 18:22:24 -07:00
Andy Lee
0aa84e147b feat: hnsw embedding server and csr format 2025-07-05 23:04:41 +00:00
yichuan520030910320
ee507bfe7a Initial commit 2025-06-30 11:01:12 +00:00
Andy Lee
30898814ae chore: docling deps 2025-06-30 10:52:10 +00:00
yichuan520030910320
46f6cc100b Initial commit 2025-06-30 09:05:05 +00:00