Commit Graph

88 Commits

Author SHA1 Message Date
Andy Lee
b5c80edb03 Merge branch 'main' into fix-mac-intel-build 2025-08-11 01:54:29 -07:00
Andy Lee
3a1cb49e20 fix: complete Python 3.9 type annotation fixes in backend packages
Fix remaining Python 3.9 incompatible type annotations in backend packages
that were causing test failures. The union operator (|) syntax for type hints
was introduced in Python 3.10 and causes "TypeError: unsupported operand
type(s) for |" errors in Python 3.9.

Changes in leann-backend-diskann:
- Convert zmq_port: int | None to Optional[int] in diskann_backend.py
- Convert passages_file: str | None to Optional[str] in diskann_embedding_server.py
- Add Optional imports to both files

Changes in leann-backend-hnsw:
- Convert zmq_port: int | None to Optional[int] in hnsw_backend.py
- Add Optional import

This resolves the final test failures related to type annotation syntax and
ensures full Python 3.9 compatibility across all packages.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-10 23:58:31 -07:00
GitHub Actions
239e35e2e6 chore: release v0.2.7 2025-08-11 03:11:46 +00:00
Andy Lee
2be39db799 fix: update Faiss submodule with override keyword fix 2025-08-10 02:01:49 -07:00
Andy Lee
756864d058 fix: update Faiss submodule with override keyword fix
- Add missing override keyword to IDSelectorModulo::is_member function
- Fixes C++ compilation warning that was treated as error due to -Werror flag
- Resolves "warning: 'is_member' overrides a member function but is not marked 'override'"
- Improves code conformance to modern C++ best practices

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-10 01:59:10 -07:00
GitHub Actions
5e97916608 chore: release v0.2.6 2025-08-10 03:39:45 +00:00
Andy Lee
1dfc2f3737 fix: configure CMake paths in pyproject.toml for proper Homebrew detection
- Add CMAKE_PREFIX_PATH and OpenMP_ROOT environment variable mapping in both backends
- Remove CMAKE_ARGS from GitHub Actions workflow (cleaner separation)
- Ensure scikit-build-core correctly uses environment variables for CMake configuration
- This should fix the hardcoded /opt/homebrew paths on Intel Macs

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-09 18:14:48 -07:00
Andy Lee
c1832765cd Merge branch 'main' into fix-mac-intel-build 2025-08-09 17:35:10 -07:00
Andy Lee
4a5db385f0 fix: clean build system and Python 3.9 compatibility
Build system improvements:
- Simplify macOS environment detection using brew --prefix
- Remove complex hardcoded paths and CMAKE_ARGS
- Let CMake automatically find Homebrew packages via CMAKE_PREFIX_PATH
- Clean separation between Intel (/usr/local) and Apple Silicon (/opt/homebrew)

Python 3.9 compatibility:
- Set ruff target-version to py39 to match project requirements
- Replace str | None with Union[str, None] in type annotations
- Add Union imports where needed
- Fix core interface, CLI, chat, and embedding server files

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-09 17:27:00 -07:00
Andy Lee
e16c369bfb fix: auto-detect Homebrew path for Intel vs Apple Silicon Macs
This fixes the hardcoded /opt/homebrew path which only works on Apple
Silicon Macs. Intel Macs use /usr/local as the Homebrew prefix.
2025-08-09 23:16:41 +00:00
Andy Lee
3ff5aac8e0 Add Ollama embedding support to enable local embedding models (#22)
* feat: Add Ollama embedding support for local embedding models

* docs: Add clear documentation for Ollama embedding usage

* feat: Enhance Ollama embedding with better error handling and concurrent processing

- Add intelligent model validation and suggestions (inspired by OllamaChat)
- Implement concurrent processing for better performance
- Add retry mechanism with timeout handling
- Provide user-friendly error messages with emojis
- Auto-detect and recommend embedding models
- Add text truncation for long texts
- Improve progress bar display logic

* docs: don't mention it in README
2025-08-08 18:44:07 -07:00
GitHub Actions
b6ab6f1993 chore: release v0.2.5 2025-08-08 22:32:27 +00:00
GitHub Actions
387ae21eba chore: release v0.2.4 2025-08-08 07:14:51 +00:00
GitHub Actions
075d4bd167 chore: release v0.2.2 2025-08-08 01:58:40 +00:00
GitHub Actions
4271ff9d84 chore: release v0.2.1 2025-08-05 05:50:56 +00:00
GitHub Actions
8bee1d4100 chore: release v0.2.0 2025-08-04 21:34:31 +00:00
GitHub Actions
08eac5c821 chore: release v0.1.16 2025-07-29 00:15:18 +00:00
Andy Lee
4671ed9b36 Fix macos ABI by using system default clang (#11)
* fix: auto-detect normalized embeddings and use cosine distance

- Add automatic detection for normalized embedding models (OpenAI, Voyage AI, Cohere)
- Automatically set distance_metric='cosine' for normalized embeddings
- Add warnings when using non-optimal distance metrics
- Implement manual L2 normalization in HNSW backend (custom Faiss build lacks normalize_L2)
- Fix DiskANN zmq_port compatibility with lazy loading strategy
- Add documentation for normalized embeddings feature

This fixes the low accuracy issue when using OpenAI text-embedding-3-small model with default MIPS metric.

* style: format

* feat: add OpenAI embeddings support to google_history_reader_leann.py

- Add --embedding-model and --embedding-mode arguments
- Support automatic detection of normalized embeddings
- Works correctly with cosine distance for OpenAI embeddings

* feat: add --use-existing-index option to google_history_reader_leann.py

- Allow using existing index without rebuilding
- Useful for testing pre-built indices

* fix: Improve OpenAI embeddings handling in HNSW backend

* fix: improve macOS C++ compatibility and add CI tests

* refactor: improve test structure and fix main_cli example

- Move pytest configuration from pytest.ini to pyproject.toml
- Remove unnecessary run_tests.py script (use test extras instead)
- Fix main_cli_example.py to properly use command line arguments for LLM config
- Add test_readme_examples.py to test code examples from README
- Refactor tests to use pytest fixtures and parametrization
- Update test documentation to reflect new structure
- Set proper environment variables in CI for test execution

* fix: add --distance-metric support to DiskANN embedding server and remove obsolete macOS ABI test markers

- Add --distance-metric parameter to diskann_embedding_server.py for consistency with other backends
- Remove pytest.skip and pytest.xfail markers for macOS C++ ABI issues as they have been fixed
- Fix test assertions to handle SearchResult objects correctly
- All tests now pass on macOS with the C++ ABI compatibility fixes

* chore: update lock file with test dependencies

* docs: remove obsolete C++ ABI compatibility warnings

- Remove outdated macOS C++ compatibility warnings from README
- Simplify CI workflow by removing macOS-specific failure handling
- All tests now pass consistently on macOS after ABI fixes

* fix: update macOS deployment target for DiskANN to 13.3

- DiskANN uses sgesdd_ LAPACK function which is only available on macOS 13.3+
- Update MACOSX_DEPLOYMENT_TARGET from 11.0 to 13.3 for DiskANN builds
- This fixes the compilation error on GitHub Actions macOS runners

* fix: align Python version requirements to 3.9

- Update root project to support Python 3.9, matching subpackages
- Restore macOS Python 3.9 support in CI
- This fixes the CI failure for Python 3.9 environments

* fix: handle MPS memory issues in CI tests

- Use smaller MiniLM-L6-v2 model (384 dimensions) for README tests in CI
- Skip other memory-intensive tests in CI environment
- Add minimal CI tests that don't require model loading
- Set CI environment variable and disable MPS fallback
- Ensure README examples always run correctly in CI

* fix: remove Python 3.10+ dependencies for compatibility

- Comment out llama-index-readers-docling and llama-index-node-parser-docling
- These packages require Python >= 3.10 and were causing CI failures on Python 3.9
- Regenerate uv.lock file to resolve dependency conflicts

* fix: use virtual environment in CI instead of system packages

- uv-managed Python environments don't allow --system installs
- Create and activate virtual environment before installing packages
- Update all CI steps to use the virtual environment

* add some env in ci

* fix: use --find-links to install platform-specific wheels

- Let uv automatically select the correct wheel for the current platform
- Fixes error when trying to install macOS wheels on Linux
- Simplifies the installation logic

* fix: disable OpenMP parallelism in CI to avoid libomp crashes

- Set OMP_NUM_THREADS=1 to avoid OpenMP thread synchronization issues
- Set MKL_NUM_THREADS=1 for single-threaded MKL operations
- This prevents segfaults in LayerNorm on macOS CI runners
- Addresses the libomp compatibility issues with PyTorch on Apple Silicon

* skip several macos test because strange issue on ci

---------

Co-authored-by: yichuan520030910320 <yichuan_wang@berkeley.edu>
2025-07-28 17:14:42 -07:00
Andy Lee
d505dcc5e3 Fix/OpenAI embeddings cosine distance (#10)
* fix: auto-detect normalized embeddings and use cosine distance

- Add automatic detection for normalized embedding models (OpenAI, Voyage AI, Cohere)
- Automatically set distance_metric='cosine' for normalized embeddings
- Add warnings when using non-optimal distance metrics
- Implement manual L2 normalization in HNSW backend (custom Faiss build lacks normalize_L2)
- Fix DiskANN zmq_port compatibility with lazy loading strategy
- Add documentation for normalized embeddings feature

This fixes the low accuracy issue when using OpenAI text-embedding-3-small model with default MIPS metric.

* style: format

* feat: add OpenAI embeddings support to google_history_reader_leann.py

- Add --embedding-model and --embedding-mode arguments
- Support automatic detection of normalized embeddings
- Works correctly with cosine distance for OpenAI embeddings

* feat: add --use-existing-index option to google_history_reader_leann.py

- Allow using existing index without rebuilding
- Useful for testing pre-built indices

* fix: Improve OpenAI embeddings handling in HNSW backend
2025-07-28 14:35:49 -07:00
GitHub Actions
b2eba23e21 chore: release v0.1.15 2025-07-28 05:05:30 +00:00
Andy Lee
5c8921673a fix: auto-detect normalized embeddings and use cosine distance (#8)
* fix: auto-detect normalized embeddings and use cosine distance

- Add automatic detection for normalized embedding models (OpenAI, Voyage AI, Cohere)
- Automatically set distance_metric='cosine' for normalized embeddings
- Add warnings when using non-optimal distance metrics
- Implement manual L2 normalization in HNSW backend (custom Faiss build lacks normalize_L2)
- Fix DiskANN zmq_port compatibility with lazy loading strategy
- Add documentation for normalized embeddings feature

This fixes the low accuracy issue when using OpenAI text-embedding-3-small model with default MIPS metric.

* style: format
2025-07-27 21:19:29 -07:00
yichuan520030910320
af1790395a fix ruff errors and formatting 2025-07-27 02:22:54 -07:00
GitHub Actions
5d09586853 chore: release v0.1.14 2025-07-27 08:50:56 +00:00
Andy Lee
b3e9ee96fa fix: resolve all ruff linting errors and add lint CI check
- Fix ambiguous fullwidth characters (commas, parentheses) in strings and comments
- Replace Chinese comments with English equivalents
- Fix unused imports with proper noqa annotations for intentional imports
- Fix bare except clauses with specific exception types
- Fix redefined variables and undefined names
- Add ruff noqa annotations for generated protobuf files
- Add lint and format check to GitHub Actions CI pipeline
2025-07-26 22:38:13 -07:00
GitHub Actions
8375f601ba chore: release v0.1.13 2025-07-27 01:08:17 +00:00
GitHub Actions
802020cb41 chore: release v0.1.12 2025-07-26 23:35:28 +00:00
GitHub Actions
cf2ef48967 chore: release v0.1.11 2025-07-26 00:12:37 +00:00
yichuan520030910320
0692bbf7a2 change workflow 2025-07-25 17:11:56 -07:00
GitHub Actions
52584a171f chore: release v0.1.10 2025-07-25 23:12:16 +00:00
Andy Lee
2baaa4549b fix: handle relative paths in HNSW embedding server metadata
- Convert relative paths to absolute paths based on metadata file location
- Fixes FileNotFoundError when starting embedding server
- Resolves issue with passages file not found in different working directories
2025-07-25 16:09:53 -07:00
GitHub Actions
75ddcd6158 chore: release v0.1.9 2025-07-25 20:04:42 +00:00
yichuan520030910320
cd8b970eff Merge branch 'main' of https://github.com/yichuan-w/LEANN 2025-07-25 01:45:57 -07:00
yichuan520030910320
52153bbb69 update faiss compare 2025-07-25 01:45:50 -07:00
GitHub Actions
e1ae087207 chore: release v0.1.8 2025-07-25 08:24:40 +00:00
GitHub Actions
e64b599276 chore: release v0.1.7 2025-07-25 04:47:57 +00:00
GitHub Actions
166986d5e6 chore: release v0.1.6 2025-07-25 04:30:07 +00:00
GitHub Actions
ed27a127d5 chore: release v0.1.5 2025-07-25 04:00:54 +00:00
GitHub Actions
9000a7083d chore: release v0.1.4 2025-07-25 02:23:36 +00:00
GitHub Actions
20f2aece08 chore: release v0.1.3 2025-07-25 02:05:11 +00:00
GitHub Actions
cea1f6f87c chore: release v0.1.2 2025-07-25 01:53:29 +00:00
Andy Lee
133e715832 fix: resolve CI issues and consolidate workflows
- Fix version dependencies: update backend packages to depend on leann-core==0.1.1
- Remove duplicate ci.yml workflow (keeping build-and-publish.yml as main CI)
- Update release-manual.yml to reference correct CI workflow name

This fixes the dependency resolution error and eliminates duplicate builds.
2025-07-24 17:20:58 -07:00
GitHub Actions
faf5ae3533 chore: release v0.1.1 2025-07-24 23:36:23 +00:00
Andy Lee
7add391b2c chore: build and package 2025-07-24 00:47:46 -07:00
Andy Lee
43155d2811 fix: supress resources leak logs 2025-07-22 19:53:45 -07:00
Andy Lee
8513471573 feat: make diskann runnable 2025-07-22 14:26:03 -07:00
Andy Lee
c2f35c8e73 fix: logs 2025-07-21 23:02:13 -07:00
Andy Lee
573313f0b6 refactor: logs 2025-07-21 22:45:24 -07:00
yichuan520030910320
587ce65cf6 Merge branch 'main' of https://github.com/yichuan-w/LEANN 2025-07-21 21:54:27 -07:00
yichuan520030910320
ccf6c8bfd7 fix flush print 2025-07-21 21:54:20 -07:00
Andy Lee
c112956d2d fix: mlx 2025-07-21 21:29:15 -07:00