Commit Graph

15 Commits

Author SHA1 Message Date
Andy Lee
5c8921673a fix: auto-detect normalized embeddings and use cosine distance (#8)
* fix: auto-detect normalized embeddings and use cosine distance

- Add automatic detection for normalized embedding models (OpenAI, Voyage AI, Cohere)
- Automatically set distance_metric='cosine' for normalized embeddings
- Add warnings when using non-optimal distance metrics
- Implement manual L2 normalization in HNSW backend (custom Faiss build lacks normalize_L2)
- Fix DiskANN zmq_port compatibility with lazy loading strategy
- Add documentation for normalized embeddings feature

This fixes the low accuracy issue when using the OpenAI text-embedding-3-small model with the default MIPS metric.

* style: format
2025-07-27 21:19:29 -07:00
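The normalization and metric selection described in commit 5c8921673a can be illustrated with a minimal sketch. It is not the project's code. The commit detects normalized embeddings by provider (OpenAI, Voyage AI, Cohere); this sketch instead inspects vector norms, which is a swapped-in heuristic used only for illustration. The function names `l2_normalize`, `looks_normalized`, and `pick_distance_metric`, the tolerance, and the `"mips"` default string are all hypothetical.

```python
import numpy as np


def l2_normalize(vectors: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm using plain NumPy.

    Hypothetical stand-in for a manual normalization step when the
    Faiss build in use does not expose normalize_L2.
    """
    vectors = np.asarray(vectors, dtype=np.float32)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # Clamp tiny norms so zero vectors do not trigger a division by zero.
    return vectors / np.maximum(norms, eps)


def looks_normalized(vectors: np.ndarray, tol: float = 1e-3) -> bool:
    """Return True if every row already has (approximately) unit length."""
    norms = np.linalg.norm(np.asarray(vectors, dtype=np.float32), axis=1)
    return bool(np.all(np.abs(norms - 1.0) < tol))


def pick_distance_metric(vectors: np.ndarray, default: str = "mips") -> str:
    """Choose 'cosine' for unit-length embeddings, otherwise keep the default.

    Assumption: this norm check approximates the provider-based detection
    mentioned in the commit message.
    """
    return "cosine" if looks_normalized(vectors) else default


raw = np.array([[3.0, 4.0], [1.0, 0.0]], dtype=np.float32)
unit = l2_normalize(raw)
print(pick_distance_metric(raw), pick_distance_metric(unit))
```

For unit-length vectors the inner product equals the cosine similarity, so after normalization an inner-product index ranks results the same way a cosine index would.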
Andy Lee
b3e9ee96fa fix: resolve all ruff linting errors and add lint CI check
- Fix ambiguous fullwidth characters (commas, parentheses) in strings and comments
- Replace Chinese comments with English equivalents
- Fix unused imports with proper noqa annotations for intentional imports
- Fix bare except clauses with specific exception types
- Fix redefined variables and undefined names
- Add ruff noqa annotations for generated protobuf files
- Add lint and format check to GitHub Actions CI pipeline
2025-07-26 22:38:13 -07:00
Andy Lee
8513471573 feat: make diskann runnable 2025-07-22 14:26:03 -07:00
Andy Lee
b3970793cf fix: cache the loaded model 2025-07-21 21:20:53 -07:00
Andy Lee
1b6272ce0e Building, CLI tool & Embedding Server Fixed (#5)
* chore: shorter build time

* chore: update faiss

* fix: no longer do embedding server reuse

* fix: do not reuse emb_server and close it properly

* feat: cli tool

* feat: cli more args

* fix: same embedding logic
2025-07-21 20:17:25 -07:00
Andy Lee
2a1a152073 refactor: nits 2025-07-16 15:39:58 -07:00
Andy Lee
7b9406a3ea feat: different search_args and docstrings 2025-07-16 15:25:58 -07:00
Andy Lee
eb6f504789 Datastore reproduce (#3)
* fix: diskann zmq port and passages

* feat: auto discovery of packages and fix passage gen for diskann

* docs: embedding pruning

* refactor: passage structure

* feat: reproducible research data, rpj_wiki & dpr

* refactor: chat and base searcher

* feat: chat on mps
2025-07-11 23:37:23 -07:00
Andy Lee
cf17c85607 Make DiskANN and HNSW work on main example (#2)
* fix: diskann zmq port and passages

* feat: auto discovery of packages and fix passage gen for diskann
2025-07-05 22:18:12 -07:00
Andy Lee
a38bc0a3fc refactor: embedding server manager 2025-07-06 01:54:46 +00:00
yichuan520030910320
df63526503 merge main 2025-07-06 00:50:58 +00:00
yichuan520030910320
e92deee1e8 fix larger file read and add faq 2025-07-06 00:48:57 +00:00
Andy Lee
910927a405 feat: support more embedders 2025-07-06 00:35:07 +00:00
yichuan520030910320
371e3de04e add configurable funcname 2025-07-01 05:02:01 +00:00
yichuan520030910320
46f6cc100b Initial commit 2025-06-30 09:05:05 +00:00