Commit Graph

322 Commits

Author SHA1 Message Date
yichuan520030910320
27d0d73f99 add some env in ci 2025-07-28 16:11:44 -07:00
Andy Lee
b124709bcd fix: use virtual environment in CI instead of system packages
- uv-managed Python environments don't allow --system installs
- Create and activate virtual environment before installing packages
- Update all CI steps to use the virtual environment
2025-07-28 16:04:49 -07:00
Andy Lee
78251a6d4c fix: remove Python 3.10+ dependencies for compatibility
- Comment out llama-index-readers-docling and llama-index-node-parser-docling
- These packages require Python >= 3.10 and were causing CI failures on Python 3.9
- Regenerate uv.lock file to resolve dependency conflicts
2025-07-28 15:50:05 -07:00
Andy Lee
16c833da86 fix: handle MPS memory issues in CI tests
- Use smaller MiniLM-L6-v2 model (384 dimensions) for README tests in CI
- Skip other memory-intensive tests in CI environment
- Add minimal CI tests that don't require model loading
- Set CI environment variable and disable MPS fallback
- Ensure README examples always run correctly in CI
2025-07-28 15:26:23 -07:00
Andy Lee
c246cb4a01 fix: align Python version requirements to 3.9
- Update root project to support Python 3.9, matching subpackages
- Restore macOS Python 3.9 support in CI
- This fixes the CI failure for Python 3.9 environments
2025-07-28 15:09:59 -07:00
Andy Lee
0f34aee5db fix: update macOS deployment target for DiskANN to 13.3
- DiskANN uses sgesdd_ LAPACK function which is only available on macOS 13.3+
- Update MACOSX_DEPLOYMENT_TARGET from 11.0 to 13.3 for DiskANN builds
- This fixes the compilation error on GitHub Actions macOS runners
2025-07-28 15:00:50 -07:00
Andy Lee
3e53d3d264 docs: remove obsolete C++ ABI compatibility warnings
- Remove outdated macOS C++ compatibility warnings from README
- Simplify CI workflow by removing macOS-specific failure handling
- All tests now pass consistently on macOS after ABI fixes
2025-07-28 14:54:47 -07:00
Andy Lee
22c8f861bc Merge branch 'main' into fix-macos-abi 2025-07-28 14:52:15 -07:00
Andy Lee
a52e3c583a chore: update lock file with test dependencies 2025-07-28 14:50:21 -07:00
Andy Lee
ab339886dd fix: add --distance-metric support to DiskANN embedding server and remove obsolete macOS ABI test markers
- Add --distance-metric parameter to diskann_embedding_server.py for consistency with other backends
- Remove pytest.skip and pytest.xfail markers for macOS C++ ABI issues as they have been fixed
- Fix test assertions to handle SearchResult objects correctly
- All tests now pass on macOS with the C++ ABI compatibility fixes
2025-07-28 14:49:51 -07:00
yichuan520030910320
055c086398 add ablation of embedding model compare 2025-07-28 14:43:42 -07:00
Andy Lee
d505dcc5e3 Fix/OpenAI embeddings cosine distance (#10)
* fix: auto-detect normalized embeddings and use cosine distance

- Add automatic detection for normalized embedding models (OpenAI, Voyage AI, Cohere)
- Automatically set distance_metric='cosine' for normalized embeddings
- Add warnings when using non-optimal distance metrics
- Implement manual L2 normalization in HNSW backend (custom Faiss build lacks normalize_L2)
- Fix DiskANN zmq_port compatibility with lazy loading strategy
- Add documentation for normalized embeddings feature

This fixes the low accuracy issue when using OpenAI text-embedding-3-small model with default MIPS metric.

* style: format

* feat: add OpenAI embeddings support to google_history_reader_leann.py

- Add --embedding-model and --embedding-mode arguments
- Support automatic detection of normalized embeddings
- Works correctly with cosine distance for OpenAI embeddings

* feat: add --use-existing-index option to google_history_reader_leann.py

- Allow using existing index without rebuilding
- Useful for testing pre-built indices

* fix: Improve OpenAI embeddings handling in HNSW backend
2025-07-28 14:35:49 -07:00
Andy Lee
8c988cf98b refactor: improve test structure and fix main_cli example
- Move pytest configuration from pytest.ini to pyproject.toml
- Remove unnecessary run_tests.py script (use test extras instead)
- Fix main_cli_example.py to properly use command line arguments for LLM config
- Add test_readme_examples.py to test code examples from README
- Refactor tests to use pytest fixtures and parametrization
- Update test documentation to reflect new structure
- Set proper environment variables in CI for test execution
2025-07-28 14:25:48 -07:00
Andy Lee
ac5fd844a5 fix: improve macOS C++ compatibility and add CI tests 2025-07-28 14:01:52 -07:00
Andy Lee
4b4b825fec Merge remote-tracking branch 'origin/main' into fix/openai-embeddings-cosine-distance 2025-07-28 10:17:55 -07:00
Andy Lee
34ef0db42f fix: Improve OpenAI embeddings handling in HNSW backend 2025-07-28 10:15:56 -07:00
Andy Lee
41812c7d22 feat: add --use-existing-index option to google_history_reader_leann.py
- Allow using existing index without rebuilding
- Useful for testing pre-built indices
2025-07-28 00:36:57 -07:00
Andy Lee
2047a1a128 feat: add OpenAI embeddings support to google_history_reader_leann.py
- Add --embedding-model and --embedding-mode arguments
- Support automatic detection of normalized embeddings
- Works correctly with cosine distance for OpenAI embeddings
2025-07-27 23:10:20 -07:00
Andy Lee
261006c36a docs: revert 2025-07-27 22:07:36 -07:00
GitHub Actions
b2eba23e21 chore: release v0.1.15 v0.1.15 2025-07-28 05:05:30 +00:00
yichuan520030910320
e9ee687472 nit: fix readme 2025-07-27 21:56:05 -07:00
yichuan520030910320
6f5d5e4a77 fix some readme 2025-07-27 21:50:09 -07:00
Andy Lee
5c8921673a fix: auto-detect normalized embeddings and use cosine distance (#8)
* fix: auto-detect normalized embeddings and use cosine distance

- Add automatic detection for normalized embedding models (OpenAI, Voyage AI, Cohere)
- Automatically set distance_metric='cosine' for normalized embeddings
- Add warnings when using non-optimal distance metrics
- Implement manual L2 normalization in HNSW backend (custom Faiss build lacks normalize_L2)
- Fix DiskANN zmq_port compatibility with lazy loading strategy
- Add documentation for normalized embeddings feature

This fixes the low accuracy issue when using OpenAI text-embedding-3-small model with default MIPS metric.

* style: format
2025-07-27 21:19:29 -07:00
yichuan520030910320
e9d2d420bd fix some readme 2025-07-27 20:48:23 -07:00
yichuan520030910320
ebabfad066 Merge branch 'main' of https://github.com/yichuan-w/LEANN 2025-07-27 20:44:36 -07:00
yichuan520030910320
e6f612b5e8 fix install and readme 2025-07-27 20:44:28 -07:00
Andy Lee
51c41acd82 docs: add comprehensive CONTRIBUTING.md guide with pre-commit setup 2025-07-27 20:40:42 -07:00
Andy Lee
402e8f97ad style: format 2025-07-27 20:25:40 -07:00
Andy Lee
9a5c197acd fix: auto-detect normalized embeddings and use cosine distance
- Add automatic detection for normalized embedding models (OpenAI, Voyage AI, Cohere)
- Automatically set distance_metric='cosine' for normalized embeddings
- Add warnings when using non-optimal distance metrics
- Implement manual L2 normalization in HNSW backend (custom Faiss build lacks normalize_L2)
- Fix DiskANN zmq_port compatibility with lazy loading strategy
- Add documentation for normalized embeddings feature

This fixes the low accuracy issue when using OpenAI text-embedding-3-small model with default MIPS metric.
2025-07-27 20:21:05 -07:00
yichuan520030910320
455f93fb7c fix emaple and add pypi example 2025-07-27 18:20:13 -07:00
yichuan520030910320
48207c3b69 add pypi example 2025-07-27 17:08:49 -07:00
yichuan520030910320
4de1caa40f fix redame install method 2025-07-27 17:00:28 -07:00
yichuan520030910320
60eaa8165c fix precommit and fix redame install method 2025-07-27 16:36:30 -07:00
yichuan520030910320
c1a5d0c624 fix readme 2025-07-27 02:24:28 -07:00
yichuan520030910320
af1790395a fix ruff errors and formatting 2025-07-27 02:22:54 -07:00
yichuan520030910320
383c6d8d7e add clear instructions 2025-07-27 02:19:27 -07:00
yichuan520030910320
bc0d839693 Merge branch 'main' of https://github.com/yichuan-w/LEANN 2025-07-27 02:07:41 -07:00
yichuan520030910320
8596562de5 add pip install option to README 2025-07-27 02:06:40 -07:00
GitHub Actions
5d09586853 chore: release v0.1.14 v0.1.14 2025-07-27 08:50:56 +00:00
Andy Lee
a7cba078dd chore: consolidate essential fixes and add pre-commit hooks
- Add pre-commit configuration with ruff and black
- Fix lint CI job to use uv tool install instead of sync
- Add essential LlamaIndex dependencies to leann-core

Co-Authored-By: Yichuan Wang <73766326+yichuan-w@users.noreply.github.com>
2025-07-27 01:24:24 -07:00
Andy Lee
b3e9ee96fa fix: resolve all ruff linting errors and add lint CI check
- Fix ambiguous fullwidth characters (commas, parentheses) in strings and comments
- Replace Chinese comments with English equivalents
- Fix unused imports with proper noqa annotations for intentional imports
- Fix bare except clauses with specific exception types
- Fix redefined variables and undefined names
- Add ruff noqa annotations for generated protobuf files
- Add lint and format check to GitHub Actions CI pipeline
2025-07-26 22:38:13 -07:00
yichuan520030910320
8537a6b17e default args change 2025-07-26 21:51:14 -07:00
yichuan520030910320
7c8d7dc5c2 tones down 2025-07-26 21:47:55 -07:00
yichuan520030910320
8e23d663e6 Merge branch 'main' of https://github.com/yichuan-w/LEANN 2025-07-26 21:46:02 -07:00
yichuan520030910320
8a3994bf80 update colab now it works perfect 2025-07-26 21:45:56 -07:00
GitHub Actions
8375f601ba chore: release v0.1.13 v0.1.13 2025-07-27 01:08:17 +00:00
yichuan520030910320
c87c0fe662 update colab install & fix colab path 2025-07-26 18:07:31 -07:00
yichuan520030910320
73927b68ef Merge branch 'main' of https://github.com/yichuan-w/LEANN 2025-07-26 17:09:55 -07:00
yichuan520030910320
cc1a62e5aa update pytoml version again 2025-07-26 17:09:45 -07:00
GitHub Actions
802020cb41 chore: release v0.1.12 v0.1.12 2025-07-26 23:35:28 +00:00