Commit Graph

13 Commits

Author SHA1 Message Date
Andy Lee
d505dcc5e3 Fix/OpenAI embeddings cosine distance (#10)
* fix: auto-detect normalized embeddings and use cosine distance

- Add automatic detection for normalized embedding models (OpenAI, Voyage AI, Cohere)
- Automatically set distance_metric='cosine' for normalized embeddings
- Add warnings when using non-optimal distance metrics
- Implement manual L2 normalization in HNSW backend (custom Faiss build lacks normalize_L2)
- Fix DiskANN zmq_port compatibility with lazy loading strategy
- Add documentation for normalized embeddings feature

This fixes the low accuracy issue when using OpenAI text-embedding-3-small model with default MIPS metric.

* style: format

* feat: add OpenAI embeddings support to google_history_reader_leann.py

- Add --embedding-model and --embedding-mode arguments
- Support automatic detection of normalized embeddings
- Works correctly with cosine distance for OpenAI embeddings

* feat: add --use-existing-index option to google_history_reader_leann.py

- Allow using existing index without rebuilding
- Useful for testing pre-built indices

* fix: Improve OpenAI embeddings handling in HNSW backend
2025-07-28 14:35:49 -07:00
yichuan520030910320
af1790395a fix ruff errors and formatting 2025-07-27 02:22:54 -07:00
Andy Lee
b3e9ee96fa fix: resolve all ruff linting errors and add lint CI check
- Fix ambiguous fullwidth characters (commas, parentheses) in strings and comments
- Replace Chinese comments with English equivalents
- Fix unused imports with proper noqa annotations for intentional imports
- Fix bare except clauses with specific exception types
- Fix redefined variables and undefined names
- Add ruff noqa annotations for generated protobuf files
- Add lint and format check to GitHub Actions CI pipeline
2025-07-26 22:38:13 -07:00
yichuan520030910320
8537a6b17e default args change 2025-07-26 21:51:14 -07:00
yichuan520030910320
cdb92f7cf4 update pytoml version && fix colab env && fix pdf extract in pip 2025-07-26 16:33:13 -07:00
yichuan520030910320
b6d43f5fd9 add gif 2025-07-25 00:12:35 -07:00
yichuan520030910320
f3f5d91207 make the google history wonderful format 2025-07-22 20:43:56 -07:00
yichuan520030910320
0db81c16cd update readme and chrome example 2025-07-17 12:58:11 -07:00
yichuan520030910320
e5a9ca8787 fix mem compare 2025-07-14 22:55:10 -07:00
yichuan520030910320
c611d0f30f upd readme mail application 2025-07-13 21:48:57 -07:00
Fangzhou66
88ca09440d fix some hf problem 2025-07-12 16:13:15 -07:00
yichuan520030910320
74ffd7ec64 add email test code 2025-07-11 23:59:47 -07:00
yichuan520030910320
8239bbb48f add google hostory api 2025-07-11 21:21:36 -07:00