Compare commits

...

62 Commits

Author SHA1 Message Date
Andy Lee
79ec7d1aee fix: properly handle Python 3.13 support with PyTorch compatibility
- Support Python 3.13 on most platforms (Ubuntu, ARM64 Mac)
- Exclude Intel Mac + Python 3.13 combination due to PyTorch wheel availability
- PyTorch <2.5 supports Intel Mac but not Python 3.13
- PyTorch 2.5+ supports Python 3.13 but not Intel Mac x86_64
- Document limitation in CI configuration comments
- Update README badges with detailed Python version support and CI status

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-11 16:38:24 -07:00
Andy Lee
288d3c4e75 chore: cleanup unused files and fix GitHub Actions warnings
- Remove unused packages/leann-backend-diskann/CMakeLists.txt
  (DiskANN uses cmake.source-dir=third_party/DiskANN instead)
- Replace macos-latest with macos-14 to avoid migration warnings
  (macos-latest will migrate to macOS 15 on August 4, 2025)
- Keep packages/leann-backend-hnsw/CMakeLists.txt (needed for Faiss config)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-11 16:16:26 -07:00
Andy Lee
7e554b2ba2 fix: restrict MLX dependencies to ARM64 Macs in workspace pyproject.toml
- Root pyproject.toml also had MLX dependencies without platform_machine restriction
- This caused test dependency installation to fail on Intel Macs
- Now consistent with packages/leann-core/pyproject.toml platform restrictions

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-11 15:39:28 -07:00
Andy Lee
afd48d5901 fix: use --find-links during package installation to avoid PyPI MLX conflicts
- Backend wheels contain Requires-Dist: leann-core==0.2.7
- Without --find-links, uv resolves this from PyPI which has MLX for all Darwin
- With --find-links, uv uses local leann-core with proper platform restrictions
- Root cause: dependency resolution happens at install time, not just build time
- Local test confirms this fixes Intel Mac MLX dependency issues

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-11 15:24:52 -07:00
Andy Lee
36083dbf0f fix: revert all packages to consistent version 0.2.7
- This PR should not bump versions, only fix Intel Mac build
- Version bumps should be done in release_manual workflow
- All packages now use 0.2.7 consistently for --find-links to work

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-11 14:56:30 -07:00
Andy Lee
f819dacbb4 Merge commit '6762e5b' into fix-mac-intel-build 2025-08-11 14:39:18 -07:00
Andy Lee
6762e5b2c4 fix: correct version consistency for --find-links to work properly
- All packages now use version 0.2.7 consistently
- Backend packages can find exact leann-core==0.2.7 from local build
- This ensures --find-links works during CI builds instead of falling back to PyPI

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-11 14:38:58 -07:00
Andy Lee
0797008a3f fix: use absolute path for find-links and upgrade backend version
- Use GITHUB_WORKSPACE for absolute path to ensure find-links works
- Upgrade leann-backend-hnsw to 0.2.8 to match leann-core version
2025-08-11 14:24:55 -07:00
Andy Lee
b835eb821e fix: use absolute path for find-links and upgrade backend version
- Use GITHUB_WORKSPACE for absolute path to ensure find-links works
- Upgrade leann-backend-hnsw to 0.2.8 to match leann-core version
2025-08-11 14:24:12 -07:00
Andy Lee
a0c790f285 fix: use local leann-core when building backend packages
Add --find-links to backend builds to ensure they use the locally built
leann-core with fixed MLX dependencies instead of downloading from PyPI.

Also bump leann-core version to 0.2.8 to ensure clean dependency resolution.
2025-08-11 13:38:03 -07:00
Andy Lee
b7516608ab fix: install backend wheels before meta packages
Install backend wheels first to ensure they're available when core/meta
packages are installed, preventing uv from trying to resolve backend
dependencies from PyPI.
2025-08-11 10:03:27 -07:00
Andy Lee
1bd1238db6 cleanup: simplify CI configuration
- Remove debug step with non-existent 'uv pip debug' command
- Simplify wheel installation logic - let uv handle compatibility
- Use -e .[test] instead of manually listing all test dependencies
2025-08-11 01:54:54 -07:00
Andy Lee
b5c80edb03 Merge branch 'main' into fix-mac-intel-build 2025-08-11 01:54:29 -07:00
Andy Lee
430969565e fix: restrict MLX dependencies to Apple Silicon Macs only
MLX framework only supports Apple Silicon (ARM64) Macs, not Intel x86_64.
Add platform_machine == 'arm64' condition to prevent installation failures
on Intel Macs (macos-13).
2025-08-11 01:53:04 -07:00
Andy Lee
578a89d180 fix: build and install leann meta package on all platforms
The leann meta package is pure Python and platform-independent, so there's
no reason to restrict it to Ubuntu only. This ensures all platforms use
consistent local builds instead of falling back to PyPI versions.
2025-08-11 01:40:55 -07:00
Andy Lee
068fb38bae fix: ensure leann-core package is built on all platforms, not just Ubuntu
This fixes the issue where CI was installing leann-core from PyPI instead of
using locally built package with Python 3.9 compatibility fixes.
2025-08-11 01:28:12 -07:00
Andy Lee
6aa1a97a07 fix: remove Python 3.10+ zip strict parameter for Python 3.9 compatibility
Remove the strict=False parameter from zip() call in api.py as it was
introduced in Python 3.10 and causes "TypeError: zip() takes no keyword
arguments" in Python 3.9.

The strict parameter controls whether zip() raises an exception when the
iterables have different lengths. Since we're not relying on this behavior
and the code works correctly without it, removing it maintains the same
functionality while ensuring Python 3.9 compatibility.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-11 00:47:43 -07:00
Andy Lee
3a1cb49e20 fix: complete Python 3.9 type annotation fixes in backend packages
Fix remaining Python 3.9 incompatible type annotations in backend packages
that were causing test failures. The union operator (|) syntax for type hints
was introduced in Python 3.10 and causes "TypeError: unsupported operand
type(s) for |" errors in Python 3.9.

Changes in leann-backend-diskann:
- Convert zmq_port: int | None to Optional[int] in diskann_backend.py
- Convert passages_file: str | None to Optional[str] in diskann_embedding_server.py
- Add Optional imports to both files

Changes in leann-backend-hnsw:
- Convert zmq_port: int | None to Optional[int] in hnsw_backend.py
- Add Optional import

This resolves the final test failures related to type annotation syntax and
ensures full Python 3.9 compatibility across all packages.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-10 23:58:31 -07:00
GitHub Actions
239e35e2e6 chore: release v0.2.7 2025-08-11 03:11:46 +00:00
Andy Lee
2fac0c6fbf fix: improve gitignore and Jupyter notebook support (#28)
- Add nbconvert dependency for .ipynb file support
- Replace manual gitignore parsing with gitignore-parser library
- Proper recursive .gitignore handling (all subdirectories)
- Fix compliance with Git gitignore behavior
- Simplify code and improve reliability

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-08-10 20:02:46 -07:00
Andy Lee
fe9381fc8b fix: complete Python 3.9 type annotation compatibility fixes
Fix remaining Python 3.9 incompatible type annotations throughout the
leann-core package that were causing test failures in CI. The union operator
(|) syntax for type hints was introduced in Python 3.10 and causes
"TypeError: unsupported operand type(s) for |" errors in Python 3.9.

Changes:
- Convert dict[str, Any] | None to Optional[dict[str, Any]]
- Convert int | None to Optional[int]
- Convert subprocess.Popen | None to Optional[subprocess.Popen]
- Convert LeannBackendFactoryInterface | None to Optional[LeannBackendFactoryInterface]
- Add missing Optional imports to all affected files

This resolves all test failures related to type annotation syntax and ensures
compatibility with Python 3.9 as specified in pyproject.toml.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-10 18:48:10 -07:00
Andy Lee
037aad0870 fix: ensure virtual environment uses correct Python version in CI
Fix issue where uv venv was creating virtual environments with a different
Python version than specified in the matrix, causing wheel compatibility
errors. The problem occurred when the system had multiple Python versions
and uv venv defaulted to a different version than intended.

Changes:
- Add --python ${{ matrix.python }} flag to uv venv command
- Ensures virtual environment matches the matrix-specified Python version
- Fixes "The wheel is compatible with CPython 3.X but you're using CPython 3.Y" errors

This ensures wheel installation selects and installs the correctly built
wheels that match the runtime Python version.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-10 17:47:44 -07:00
Andy Lee
ded0decd17 fix: resolve wheel installation conflicts in CI matrix builds
Fix issue where multiple Python versions' wheels in the same dist directory
caused installation conflicts during CI testing. The problem occurred when
matrix builds for different Python versions accumulated wheels in shared
directories, and uv pip install would find incompatible wheels.

Changes:
- Add Python version detection using matrix.python variable
- Convert Python version to wheel tag format (e.g., 3.11 -> cp311)
- Use find with version-specific pattern matching to select correct wheels
- Add explicit error handling if no matching wheel is found

This ensures each CI job installs only wheels compatible with its specific
Python version, preventing "A path dependency is incompatible with the
current platform" errors.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-10 17:31:46 -07:00
Andy Lee
5094e6800a fix: use correct Python version for wheel builds
- Replace --python python with --python ${{ matrix.python }}
- This ensures wheels are built for the correct Python version in each matrix job
- Fixes Python version mismatch where cp39 wheels were used in cp311 environments

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-10 17:20:24 -07:00
Andy Lee
f08132c525 debug: simplify wheel compatibility checking
- Fix YAML syntax error in debug step
- Use simpler approach to show platform tags and wheel names
- This will help identify platform tag compatibility issues

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-10 17:07:18 -07:00
Andy Lee
6ae9e0f4f9 fix: update DiskANN submodule with additional type cast fix
- Add missing type cast in DistanceFastL2::norm function SSE2 version
- Fixes const float* = const signed char* compilation error
- Ensures consistent type casting across all SIMD code paths
- Resolves template instantiation error for DistanceFastL2<int8_t>

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-10 15:09:01 -07:00
Andy Lee
2be39db799 fix: update Faiss submodule with override keyword fix 2025-08-10 02:01:49 -07:00
Andy Lee
756864d058 fix: update Faiss submodule with override keyword fix
- Add missing override keyword to IDSelectorModulo::is_member function
- Fixes C++ compilation warning that was treated as error due to -Werror flag
- Resolves "warning: 'is_member' overrides a member function but is not marked 'override'"
- Improves code conformance to modern C++ best practices

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-10 01:59:10 -07:00
Andy Lee
4fc8943ca7 fix: update DiskANN submodule with type cast fix for signed char templates
- Add missing type casts (float*)a and (float*)b in SSE2 version
- This matches the existing type casts in the AVX version
- Fixes compilation error when instantiating DistanceInnerProduct<int8_t>
- Resolves "cannot initialize const float* with const signed char*" error

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-10 01:39:16 -07:00
Andy Lee
1bc4bf06f0 fix: update DiskANN submodule with SIMD function name corrections
- Fix _mm128_loadu_ps to _mm_loadu_ps (and similar functions)
- This is a known issue in upstream DiskANN code where incorrect function names were used
- Resolves compilation errors on macOS Intel builds

References: Known DiskANN issue with SIMD intrinsics naming

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-10 01:27:04 -07:00
yichuan520030910320
9801aa581b [Readme]update embedding model config according to reddit feedback 2025-08-09 21:33:33 -07:00
GitHub Actions
5e97916608 chore: release v0.2.6 2025-08-10 03:39:45 +00:00
Andy Lee
8b9c2be8c9 Feat/claude code refine (#24)
* feat: Add Ollama embedding support for local embedding models

* docs: Add clear documentation for Ollama embedding usage

* fix: remove leann_ask

* docs: remove ollama embedding extra instructions

* simplify MCP interface for Claude Code

- Remove unnecessary search parameters: search_mode, recompute_embeddings, file_types, min_score
- Remove leann_clear tool (not needed for Claude Code workflow)
- Streamline search to only use: query, index_name, top_k, complexity
- Keep core tools: leann_index, leann_search, leann_status, leann_list

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* remove leann_index from MCP interface

Users should use CLI command 'leann build' to create indexes first.
MCP now only provides search functionality:
- leann_search: search existing indexes
- leann_status: check index health
- leann_list: list available indexes

This separates index creation (CLI) from search (Claude Code).

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* improve CLI with auto project name and .gitignore support

- Make index_name optional, auto-use current directory name
- Read .gitignore patterns and respect them during indexing
- Add _read_gitignore_patterns() to parse .gitignore files
- Add _should_exclude_file() for pattern matching
- Apply exclusion patterns to both PDF and general file processing
- Show helpful messages about gitignore usage

Now users can simply run: leann build
And it will use project name + respect .gitignore patterns.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-08-09 20:37:17 -07:00
Andy Lee
8d1e04d7a1 fix: update DiskANN submodule with macOS Intel/Apple Silicon compatibility fixes
- Auto-detect Homebrew libomp path using OpenMP_ROOT environment variable
- Exclude mkl_set_num_threads on macOS (uses Accelerate framework instead of MKL)
- Fixes compilation on Intel Macs by using correct /usr/local paths

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-09 19:27:02 -07:00
Andy Lee
f009d2add3 fix: remove hardcoded /opt/homebrew paths from DiskANN CMake
- Auto-detect Homebrew libomp path using OpenMP_ROOT environment variable
- Fallback to CMAKE_PREFIX_PATH/opt/libomp if OpenMP_ROOT not set
- Final fallback to brew --prefix libomp for auto-detection
- Maintains backwards compatibility with old hardcoded path
- Fixes Intel Mac builds that were failing due to hardcoded Apple Silicon paths

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-09 18:30:42 -07:00
Andy Lee
1dfc2f3737 fix: configure CMake paths in pyproject.toml for proper Homebrew detection
- Add CMAKE_PREFIX_PATH and OpenMP_ROOT environment variable mapping in both backends
- Remove CMAKE_ARGS from GitHub Actions workflow (cleaner separation)
- Ensure scikit-build-core correctly uses environment variables for CMake configuration
- This should fix the hardcoded /opt/homebrew paths on Intel Macs

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-09 18:14:48 -07:00
Andy Lee
0543d61572 fix: ensure CMAKE_PREFIX_PATH is passed to backend builds
- Add CMAKE_ARGS with CMAKE_PREFIX_PATH and OpenMP_ROOT for both HNSW and DiskANN backends
- This ensures CMake can find Homebrew packages on both Intel (/usr/local) and Apple Silicon (/opt/homebrew)
- Fixes the issue where CMake was still looking for hardcoded paths instead of using detected ones

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-09 17:58:49 -07:00
Andy Lee
abcc1fed31 fix: type 2025-08-09 17:40:13 -07:00
Andy Lee
c1832765cd Merge branch 'main' into fix-mac-intel-build 2025-08-09 17:35:10 -07:00
Andy Lee
4a5db385f0 fix: clean build system and Python 3.9 compatibility
Build system improvements:
- Simplify macOS environment detection using brew --prefix
- Remove complex hardcoded paths and CMAKE_ARGS
- Let CMake automatically find Homebrew packages via CMAKE_PREFIX_PATH
- Clean separation between Intel (/usr/local) and Apple Silicon (/opt/homebrew)

Python 3.9 compatibility:
- Set ruff target-version to py39 to match project requirements
- Replace str | None with Union[str, None] in type annotations
- Add Union imports where needed
- Fix core interface, CLI, chat, and embedding server files

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-09 17:27:00 -07:00
Andy Lee
5f5b97fb54 fix: add abseil include path to CPPFLAGS for both Intel and Apple Silicon
- Add -I/opt/homebrew/opt/abseil/include to CPPFLAGS for Apple Silicon
- Add -I/usr/local/opt/abseil/include to CPPFLAGS for Intel
- Fixes 'absl/log/absl_log.h' file not found by ensuring abseil headers are in compiler include path

Root cause: CMAKE_PREFIX_PATH alone wasn't sufficient - compiler needs explicit -I flags

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-09 17:16:40 -07:00
Andy Lee
754c9aaedd fix: add abseil library path for protobuf compilation on macOS
- Include abseil in CMAKE_PREFIX_PATH for both Intel and Apple Silicon Macs
- Add explicit absl_DIR CMake variable to help find abseil for protobuf
- Fixes 'absl/log/absl_log.h' file not found error during compilation

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-09 16:55:47 -07:00
Andy Lee
1b01725dd1 fix: improve macOS build reliability with proper OpenMP path detection
- Add proper CMAKE_PREFIX_PATH and OpenMP_ROOT detection for both Intel and Apple Silicon Macs
- Set LDFLAGS and CPPFLAGS for all Homebrew packages to ensure CMake can find them
- Apply CMAKE_ARGS to both HNSW and DiskANN backends for consistent builds
- Fix hardcoded paths that caused build failures on Intel Macs (macos-13)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-09 16:48:26 -07:00
Andy Lee
a620c2077a fix: auto-detect Homebrew paths for both DiskANN and HNSW backends
- Fix DiskANN CMakeLists.txt path reference
- Add macOS environment variable detection for OpenMP_ROOT
- Support both Intel (/usr/local) and Apple Silicon (/opt/homebrew) paths
2025-08-09 23:25:00 +00:00
Andy Lee
e16c369bfb fix: auto-detect Homebrew path for Intel vs Apple Silicon Macs
This fixes the hardcoded /opt/homebrew path which only works on Apple
Silicon Macs. Intel Macs use /usr/local as the Homebrew prefix.
2025-08-09 23:16:41 +00:00
Andy Lee
368c587c4f ci: add Mac Intel (x86_64) build support 2025-08-09 22:37:50 +00:00
Andy Lee
3ff5aac8e0 Add Ollama embedding support to enable local embedding models (#22)
* feat: Add Ollama embedding support for local embedding models

* docs: Add clear documentation for Ollama embedding usage

* feat: Enhance Ollama embedding with better error handling and concurrent processing

- Add intelligent model validation and suggestions (inspired by OllamaChat)
- Implement concurrent processing for better performance
- Add retry mechanism with timeout handling
- Provide user-friendly error messages with emojis
- Auto-detect and recommend embedding models
- Add text truncation for long texts
- Improve progress bar display logic

* docs: don't mention it in README
2025-08-08 18:44:07 -07:00
yichuan520030910320
67fef60466 [Readme]More about claude code 2025-08-08 16:05:35 -07:00
GitHub Actions
b6ab6f1993 chore: release v0.2.5 2025-08-08 22:32:27 +00:00
joshuashaffer
9f2e82a838 Propagate hosts argument for ollama through chat.py (#21)
* Propigate hosts argument for ollama through chat.py

* Apply suggestions from code review

Good AI slop suggestions.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-08-08 15:31:15 -07:00
yichuan520030910320
0b2b799d5a [README]fix instructions in cli 2025-08-08 01:04:13 -07:00
yichuan520030910320
0f790fbbd9 docs: polish README and add optimized MCP integration image
- Improve grammar and sentence structure in MCP section
- Add proper markdown image formatting with relative paths
- Optimize mcp_leann.png size (1.3MB -> 224KB)
- Update data description to be more specific about Chinese content
2025-08-08 00:58:36 -07:00
GitHub Actions
387ae21eba chore: release v0.2.4 2025-08-08 07:14:51 +00:00
Andy Lee
3cc329c3e7 fix: remove hardcoded paths from MCP server and documentation 2025-08-08 00:08:56 -07:00
Andy Lee
5567302316 feat: promote Claude Code integration as primary RAG feature 2025-08-07 23:19:19 -07:00
GitHub Actions
075d4bd167 chore: release v0.2.2 2025-08-08 01:58:40 +00:00
yichuan520030910320
e4bcc76f88 fix cli & make recompute default true 2025-08-07 18:58:04 -07:00
yichuan520030910320
710e83b1fd fix cli if there is no other type of doc to make it robust 2025-08-07 18:46:05 -07:00
yichuan520030910320
c96d653072 more support for type of docs in cli 2025-08-07 18:14:03 -07:00
Andy Lee
8b22d2b5d3 Merge pull request #19 from yichuan-w/feature/claude-code-research
Feature/claude code research
2025-08-05 23:02:34 -07:00
Andy Lee
4cb544ee38 docs: Update co-contributors with GitHub usernames (#18)
* docs: Update co-contributors with GitHub usernames

* docs: Use GitHub links for co-contributors and improve order

* docs: Change to Contributors and use personal homepage

* docs: Specify core contributors and welcome new contributors
2025-08-05 17:43:59 -07:00
yichuan520030910320
f94ce63d51 add gpt oss! serve your RAG using ollama 2025-08-05 16:49:52 -07:00
30 changed files with 5189 additions and 3660 deletions

View File

@@ -54,16 +54,26 @@ jobs:
python: '3.12' python: '3.12'
- os: ubuntu-22.04 - os: ubuntu-22.04
python: '3.13' python: '3.13'
- os: macos-latest - os: macos-14
python: '3.9' python: '3.9'
- os: macos-latest - os: macos-14
python: '3.10' python: '3.10'
- os: macos-latest - os: macos-14
python: '3.11' python: '3.11'
- os: macos-latest - os: macos-14
python: '3.12' python: '3.12'
- os: macos-latest - os: macos-14
python: '3.13' python: '3.13'
- os: macos-13
python: '3.9'
- os: macos-13
python: '3.10'
- os: macos-13
python: '3.11'
- os: macos-13
python: '3.12'
# Note: macos-13 + Python 3.13 excluded due to PyTorch compatibility
# (PyTorch 2.5+ supports Python 3.13 but not Intel Mac x86_64)
runs-on: ${{ matrix.os }} runs-on: ${{ matrix.os }}
steps: steps:
@@ -109,48 +119,59 @@ jobs:
uv pip install --system delocate uv pip install --system delocate
fi fi
- name: Set macOS environment variables
if: runner.os == 'macOS'
run: |
# Use brew --prefix to automatically detect Homebrew installation path
HOMEBREW_PREFIX=$(brew --prefix)
echo "HOMEBREW_PREFIX=${HOMEBREW_PREFIX}" >> $GITHUB_ENV
echo "OpenMP_ROOT=${HOMEBREW_PREFIX}/opt/libomp" >> $GITHUB_ENV
# Set CMAKE_PREFIX_PATH to let CMake find all packages automatically
echo "CMAKE_PREFIX_PATH=${HOMEBREW_PREFIX}" >> $GITHUB_ENV
# Set compiler flags for OpenMP (required for both backends)
echo "LDFLAGS=-L${HOMEBREW_PREFIX}/opt/libomp/lib" >> $GITHUB_ENV
echo "CPPFLAGS=-I${HOMEBREW_PREFIX}/opt/libomp/include" >> $GITHUB_ENV
- name: Build packages - name: Build packages
run: | run: |
# Build core (platform independent) # Build core (platform independent)
if [[ "${{ matrix.os }}" == ubuntu-* ]]; then cd packages/leann-core
cd packages/leann-core uv build
uv build cd ../..
cd ../..
fi
# Build HNSW backend # Build HNSW backend
cd packages/leann-backend-hnsw cd packages/leann-backend-hnsw
if [ "${{ matrix.os }}" == "macos-latest" ]; then if [[ "${{ matrix.os }}" == macos-* ]]; then
# Use system clang instead of homebrew LLVM for better compatibility # Use system clang for better compatibility
export CC=clang export CC=clang
export CXX=clang++ export CXX=clang++
export MACOSX_DEPLOYMENT_TARGET=11.0 export MACOSX_DEPLOYMENT_TARGET=11.0
uv build --wheel --python python uv build --wheel --python ${{ matrix.python }} --find-links ${GITHUB_WORKSPACE}/packages/leann-core/dist
else else
uv build --wheel --python python uv build --wheel --python ${{ matrix.python }} --find-links ${GITHUB_WORKSPACE}/packages/leann-core/dist
fi fi
cd ../.. cd ../..
# Build DiskANN backend # Build DiskANN backend
cd packages/leann-backend-diskann cd packages/leann-backend-diskann
if [ "${{ matrix.os }}" == "macos-latest" ]; then if [[ "${{ matrix.os }}" == macos-* ]]; then
# Use system clang instead of homebrew LLVM for better compatibility # Use system clang for better compatibility
export CC=clang export CC=clang
export CXX=clang++ export CXX=clang++
# DiskANN requires macOS 13.3+ for sgesdd_ LAPACK function # DiskANN requires macOS 13.3+ for sgesdd_ LAPACK function
export MACOSX_DEPLOYMENT_TARGET=13.3 export MACOSX_DEPLOYMENT_TARGET=13.3
uv build --wheel --python python uv build --wheel --python ${{ matrix.python }} --find-links ${GITHUB_WORKSPACE}/packages/leann-core/dist
else else
uv build --wheel --python python uv build --wheel --python ${{ matrix.python }} --find-links ${GITHUB_WORKSPACE}/packages/leann-core/dist
fi fi
cd ../.. cd ../..
# Build meta package (platform independent) # Build meta package (platform independent)
if [[ "${{ matrix.os }}" == ubuntu-* ]]; then cd packages/leann
cd packages/leann uv build
uv build cd ../..
cd ../..
fi
- name: Repair wheels (Linux) - name: Repair wheels (Linux)
if: runner.os == 'Linux' if: runner.os == 'Linux'
@@ -199,20 +220,18 @@ jobs:
echo "📦 Built packages:" echo "📦 Built packages:"
find packages/*/dist -name "*.whl" -o -name "*.tar.gz" | sort find packages/*/dist -name "*.whl" -o -name "*.tar.gz" | sort
- name: Install built packages for testing - name: Install built packages for testing
run: | run: |
# Create a virtual environment # Create a virtual environment with the correct Python version
uv venv uv venv --python ${{ matrix.python }}
source .venv/bin/activate || source .venv/Scripts/activate source .venv/bin/activate || source .venv/Scripts/activate
# Install the built wheels # Install packages using --find-links to prioritize local builds
# Use --find-links to let uv choose the correct wheel for the platform uv pip install --find-links packages/leann-core/dist --find-links packages/leann-backend-hnsw/dist --find-links packages/leann-backend-diskann/dist packages/leann-core/dist/*.whl || uv pip install --find-links packages/leann-core/dist packages/leann-core/dist/*.tar.gz
if [[ "${{ matrix.os }}" == ubuntu-* ]]; then uv pip install --find-links packages/leann-core/dist packages/leann-backend-hnsw/dist/*.whl
uv pip install leann-core --find-links packages/leann-core/dist uv pip install --find-links packages/leann-core/dist packages/leann-backend-diskann/dist/*.whl
uv pip install leann --find-links packages/leann/dist uv pip install packages/leann/dist/*.whl || uv pip install packages/leann/dist/*.tar.gz
fi
uv pip install leann-backend-hnsw --find-links packages/leann-backend-hnsw/dist
uv pip install leann-backend-diskann --find-links packages/leann-backend-diskann/dist
# Install test dependencies using extras # Install test dependencies using extras
uv pip install -e ".[test]" uv pip install -e ".[test]"

View File

@@ -3,9 +3,11 @@
</p> </p>
<p align="center"> <p align="center">
<img src="https://img.shields.io/badge/Python-3.9%2B-blue.svg" alt="Python 3.9+"> <img src="https://img.shields.io/badge/Python-3.9%20%7C%203.10%20%7C%203.11%20%7C%203.12%20%7C%203.13-blue.svg" alt="Python Versions">
<img src="https://github.com/yichuan-w/LEANN/actions/workflows/build-and-publish.yml/badge.svg" alt="CI Status">
<img src="https://img.shields.io/badge/Platform-Ubuntu%20%7C%20macOS%20(ARM64%2FIntel)-lightgrey" alt="Platform">
<img src="https://img.shields.io/badge/License-MIT-green.svg" alt="MIT License"> <img src="https://img.shields.io/badge/License-MIT-green.svg" alt="MIT License">
<img src="https://img.shields.io/badge/Platform-Linux%20%7C%20macOS-lightgrey" alt="Platform"> <img src="https://img.shields.io/badge/MCP-Native%20Integration-blue" alt="MCP Integration">
</p> </p>
<h2 align="center" tabindex="-1" class="heading-element" dir="auto"> <h2 align="center" tabindex="-1" class="heading-element" dir="auto">
@@ -16,7 +18,10 @@ LEANN is an innovative vector database that democratizes personal AI. Transform
LEANN achieves this through *graph-based selective recomputation* with *high-degree preserving pruning*, computing embeddings on-demand instead of storing them all. [Illustration Fig →](#-architecture--how-it-works) | [Paper →](https://arxiv.org/abs/2506.08276) LEANN achieves this through *graph-based selective recomputation* with *high-degree preserving pruning*, computing embeddings on-demand instead of storing them all. [Illustration Fig →](#-architecture--how-it-works) | [Paper →](https://arxiv.org/abs/2506.08276)
**Ready to RAG Everything?** Transform your laptop into a personal AI assistant that can search your **[file system](#-personal-data-manager-process-any-documents-pdf-txt-md)**, **[emails](#-your-personal-email-secretary-rag-on-apple-mail)**, **[browser history](#-time-machine-for-the-web-rag-your-entire-browser-history)**, **[chat history](#-wechat-detective-unlock-your-golden-memories)**, or external knowledge bases (i.e., 60M documents) - all on your laptop, with zero cloud costs and complete privacy. **Ready to RAG Everything?** Transform your laptop into a personal AI assistant that can semantic search your **[file system](#-personal-data-manager-process-any-documents-pdf-txt-md)**, **[emails](#-your-personal-email-secretary-rag-on-apple-mail)**, **[browser history](#-time-machine-for-the-web-rag-your-entire-browser-history)**, **[chat history](#-wechat-detective-unlock-your-golden-memories)**, **[codebase](#-claude-code-integration-transform-your-development-workflow)**\* , or external knowledge bases (i.e., 60M documents) - all on your laptop, with zero cloud costs and complete privacy.
\* Claude Code only supports basic `grep`-style keyword search. **LEANN** is a drop-in **semantic search MCP service fully compatible with Claude Code**, unlocking intelligent retrieval without changing your workflow. 🔥 Check out [the easy setup →](packages/leann-mcp/README.md)
@@ -26,7 +31,7 @@ LEANN achieves this through *graph-based selective recomputation* with *high-deg
<img src="assets/effects.png" alt="LEANN vs Traditional Vector DB Storage Comparison" width="70%"> <img src="assets/effects.png" alt="LEANN vs Traditional Vector DB Storage Comparison" width="70%">
</p> </p>
> **The numbers speak for themselves:** Index 60 million Wikipedia chunks in just 6GB instead of 201GB. From emails to browser history, everything fits on your laptop. [See detailed benchmarks for different applications below ↓](#storage-comparison) > **The numbers speak for themselves:** Index 60 million text chunks in just 6GB instead of 201GB. From emails to browser history, everything fits on your laptop. [See detailed benchmarks for different applications below ↓](#storage-comparison)
🔒 **Privacy:** Your data never leaves your laptop. No OpenAI, no cloud, no "terms of service". 🔒 **Privacy:** Your data never leaves your laptop. No OpenAI, no cloud, no "terms of service".
@@ -166,7 +171,7 @@ ollama pull llama3.2:1b
</details> </details>
### Flexible Configuration ### Flexible Configuration
LEANN provides flexible parameters for embedding models, search strategies, and data processing to fit your specific needs. LEANN provides flexible parameters for embedding models, search strategies, and data processing to fit your specific needs.
@@ -185,12 +190,13 @@ All RAG examples share these common parameters. **Interactive mode** is availabl
--force-rebuild # Force rebuild index even if it exists --force-rebuild # Force rebuild index even if it exists
# Embedding Parameters # Embedding Parameters
--embedding-model MODEL # e.g., facebook/contriever, text-embedding-3-small or mlx-community/multilingual-e5-base-mlx --embedding-model MODEL # e.g., facebook/contriever, text-embedding-3-small, nomic-embed-text, mlx-community/Qwen3-Embedding-0.6B-8bit or nomic-embed-text
--embedding-mode MODE # sentence-transformers, openai, or mlx --embedding-mode MODE # sentence-transformers, openai, mlx, or ollama
# LLM Parameters (Text generation models) # LLM Parameters (Text generation models)
--llm TYPE # LLM backend: openai, ollama, or hf (default: openai) --llm TYPE # LLM backend: openai, ollama, or hf (default: openai)
--llm-model MODEL # Model name (default: gpt-4o) e.g., gpt-4o-mini, llama3.2:1b, Qwen/Qwen2.5-1.5B-Instruct --llm-model MODEL # Model name (default: gpt-4o) e.g., gpt-4o-mini, llama3.2:1b, Qwen/Qwen2.5-1.5B-Instruct
--thinking-budget LEVEL # Thinking budget for reasoning models: low/medium/high (supported by o3, o3-mini, GPT-Oss:20b, and other reasoning models)
# Search Parameters # Search Parameters
--top-k N # Number of results to retrieve (default: 20) --top-k N # Number of results to retrieve (default: 20)
@@ -218,7 +224,7 @@ Ask questions directly about your personal PDFs, documents, and any directory co
<img src="videos/paper_clear.gif" alt="LEANN Document Search Demo" width="600"> <img src="videos/paper_clear.gif" alt="LEANN Document Search Demo" width="600">
</p> </p>
The example below asks a question about summarizing our paper (uses default data in `data/`, which is a directory with diverse data sources: two papers, Pride and Prejudice, and a README in Chinese) and this is the **easiest example** to run here: The example below asks a question about summarizing our paper (uses default data in `data/`, which is a directory with diverse data sources: two papers, Pride and Prejudice, and a Technical report about LLM in Huawei in Chinese), and this is the **easiest example** to run here:
```bash ```bash
source .venv/bin/activate # Don't forget to activate the virtual environment source .venv/bin/activate # Don't forget to activate the virtual environment
@@ -413,7 +419,26 @@ Once the index is built, you can ask questions like:
</details> </details>
### 🚀 Claude Code Integration: Transform Your Development Workflow!
**The future of code assistance is here.** Transform your development workflow with LEANN's native MCP integration for Claude Code. Index your entire codebase and get intelligent code assistance directly in your IDE.
**Key features:**
- 🔍 **Semantic code search** across your entire project
- 📚 **Context-aware assistance** for debugging and development
- 🚀 **Zero-config setup** with automatic language detection
```bash
# Install LEANN globally for MCP integration
uv tool install leann-core
# Setup is automatic - just start using Claude Code!
```
Try our fully agentic pipeline with auto query rewriting, semantic search planning, and more:
![LEANN MCP Integration](assets/mcp_leann.png)
**Ready to supercharge your coding?** [Complete Setup Guide →](packages/leann-mcp/README.md)
## 🖥️ Command Line Interface ## 🖥️ Command Line Interface
@@ -427,7 +452,7 @@ source .venv/bin/activate
leann --help leann --help
``` ```
**To make it globally available (recommended for daily use):** **To make it globally available:**
```bash ```bash
# Install the LEANN CLI globally using uv tool # Install the LEANN CLI globally using uv tool
uv tool install leann uv tool install leann
@@ -436,13 +461,15 @@ uv tool install leann
leann --help leann --help
``` ```
> **Note**: Global installation is required for Claude Code integration. The `leann_mcp` server depends on the globally available `leann` command.
### Usage Examples ### Usage Examples
```bash ```bash
# Build an index from documents # build from a specific directory, and my_docs is the index name
leann build my-docs --docs ./documents leann build my-docs --docs ./your_documents
# Search your documents # Search your documents
leann search my-docs "machine learning concepts" leann search my-docs "machine learning concepts"
@@ -574,8 +601,11 @@ MIT License - see [LICENSE](LICENSE) for details.
## 🙏 Acknowledgments ## 🙏 Acknowledgments
This work is done at [**Berkeley Sky Computing Lab**](https://sky.cs.berkeley.edu/). Core Contributors: [Yichuan Wang](https://yichuan-w.github.io/) & [Zhifei Li](https://github.com/andylizf).
We welcome more contributors! Feel free to open issues or submit PRs.
This work is done at [**Berkeley Sky Computing Lab**](https://sky.cs.berkeley.edu/).
--- ---

View File

@@ -75,7 +75,7 @@ class BaseRAGExample(ABC):
"--embedding-mode", "--embedding-mode",
type=str, type=str,
default="sentence-transformers", default="sentence-transformers",
choices=["sentence-transformers", "openai", "mlx"], choices=["sentence-transformers", "openai", "mlx", "ollama"],
help="Embedding backend mode (default: sentence-transformers)", help="Embedding backend mode (default: sentence-transformers)",
) )
@@ -85,7 +85,7 @@ class BaseRAGExample(ABC):
"--llm", "--llm",
type=str, type=str,
default="openai", default="openai",
choices=["openai", "ollama", "hf"], choices=["openai", "ollama", "hf", "simulated"],
help="LLM backend to use (default: openai)", help="LLM backend to use (default: openai)",
) )
llm_group.add_argument( llm_group.add_argument(
@@ -100,6 +100,13 @@ class BaseRAGExample(ABC):
default="http://localhost:11434", default="http://localhost:11434",
help="Host for Ollama API (default: http://localhost:11434)", help="Host for Ollama API (default: http://localhost:11434)",
) )
llm_group.add_argument(
"--thinking-budget",
type=str,
choices=["low", "medium", "high"],
default=None,
help="Thinking budget for reasoning models (low/medium/high). Supported by GPT-Oss:20b and other reasoning models.",
)
# Search parameters # Search parameters
search_group = parser.add_argument_group("Search Parameters") search_group = parser.add_argument_group("Search Parameters")
@@ -228,7 +235,17 @@ class BaseRAGExample(ABC):
if not query: if not query:
continue continue
response = chat.ask(query, top_k=args.top_k, complexity=args.search_complexity) # Prepare LLM kwargs with thinking budget if specified
llm_kwargs = {}
if hasattr(args, "thinking_budget") and args.thinking_budget:
llm_kwargs["thinking_budget"] = args.thinking_budget
response = chat.ask(
query,
top_k=args.top_k,
complexity=args.search_complexity,
llm_kwargs=llm_kwargs,
)
print(f"\nAssistant: {response}\n") print(f"\nAssistant: {response}\n")
except KeyboardInterrupt: except KeyboardInterrupt:
@@ -247,7 +264,15 @@ class BaseRAGExample(ABC):
) )
print(f"\n[Query]: \033[36m{query}\033[0m") print(f"\n[Query]: \033[36m{query}\033[0m")
response = chat.ask(query, top_k=args.top_k, complexity=args.search_complexity)
# Prepare LLM kwargs with thinking budget if specified
llm_kwargs = {}
if hasattr(args, "thinking_budget") and args.thinking_budget:
llm_kwargs["thinking_budget"] = args.thinking_budget
response = chat.ask(
query, top_k=args.top_k, complexity=args.search_complexity, llm_kwargs=llm_kwargs
)
print(f"\n[Response]: \033[36m{response}\033[0m") print(f"\n[Response]: \033[36m{response}\033[0m")
async def run(self): async def run(self):

View File

Binary file not shown.

After

Width:  |  Height:  |  Size: 73 KiB

BIN
assets/mcp_leann.png Normal file
View File

Binary file not shown.

After

Width:  |  Height:  |  Size: 224 KiB

View File

@@ -0,0 +1,123 @@
# Thinking Budget Feature Implementation
## Overview
This document describes the implementation of the **thinking budget** feature for LEANN, which allows users to control the computational effort for reasoning models like GPT-Oss:20b.
## Feature Description
The thinking budget feature provides three levels of computational effort for reasoning models:
- **`low`**: Fast responses, basic reasoning (default for simple queries)
- **`medium`**: Balanced speed and reasoning depth
- **`high`**: Maximum reasoning effort, best for complex analytical questions
## Implementation Details
### 1. Command Line Interface
Added `--thinking-budget` parameter to both CLI and RAG examples:
```bash
# LEANN CLI
leann ask my-index --llm ollama --model gpt-oss:20b --thinking-budget high
# RAG Examples
python apps/email_rag.py --llm ollama --llm-model gpt-oss:20b --thinking-budget high
python apps/document_rag.py --llm openai --llm-model o3 --thinking-budget medium
```
### 2. LLM Backend Support
#### Ollama Backend (`packages/leann-core/src/leann/chat.py`)
```python
def ask(self, prompt: str, **kwargs) -> str:
# Handle thinking budget for reasoning models
options = kwargs.copy()
thinking_budget = kwargs.get("thinking_budget")
if thinking_budget:
options.pop("thinking_budget", None)
if thinking_budget in ["low", "medium", "high"]:
options["reasoning"] = {"effort": thinking_budget, "exclude": False}
```
**API Format**: Uses Ollama's `reasoning` parameter with `effort` and `exclude` fields.
#### OpenAI Backend (`packages/leann-core/src/leann/chat.py`)
```python
def ask(self, prompt: str, **kwargs) -> str:
# Handle thinking budget for reasoning models
thinking_budget = kwargs.get("thinking_budget")
if thinking_budget and thinking_budget in ["low", "medium", "high"]:
# Check if this is an o-series model
o_series_models = ["o3", "o3-mini", "o4-mini", "o1", "o3-pro", "o3-deep-research"]
if any(model in self.model for model in o_series_models):
params["reasoning_effort"] = thinking_budget
```
**API Format**: Uses OpenAI's `reasoning_effort` parameter for o-series models.
### 3. Parameter Propagation
The thinking budget parameter is properly propagated through the LEANN architecture:
1. **CLI** (`packages/leann-core/src/leann/cli.py`): Captures `--thinking-budget` argument
2. **Base RAG** (`apps/base_rag_example.py`): Adds parameter to argument parser
3. **LeannChat** (`packages/leann-core/src/leann/api.py`): Passes `llm_kwargs` to LLM
4. **LLM Interface**: Handles the parameter in backend-specific implementations
## Files Modified
### Core Implementation
- `packages/leann-core/src/leann/chat.py`: Added thinking budget support to OllamaChat and OpenAIChat
- `packages/leann-core/src/leann/cli.py`: Added `--thinking-budget` argument
- `apps/base_rag_example.py`: Added thinking budget parameter to RAG examples
### Documentation
- `README.md`: Added thinking budget parameter to usage examples
- `docs/configuration-guide.md`: Added detailed documentation and usage guidelines
### Examples
- `examples/thinking_budget_demo.py`: Comprehensive demo script with usage examples
## Usage Examples
### Basic Usage
```bash
# High reasoning effort for complex questions
leann ask my-index --llm ollama --model gpt-oss:20b --thinking-budget high
# Medium reasoning for balanced performance
leann ask my-index --llm openai --model gpt-4o --thinking-budget medium
# Low reasoning for fast responses
leann ask my-index --llm ollama --model gpt-oss:20b --thinking-budget low
```
### RAG Examples
```bash
# Email RAG with high reasoning
python apps/email_rag.py --llm ollama --llm-model gpt-oss:20b --thinking-budget high
# Document RAG with medium reasoning
python apps/document_rag.py --llm openai --llm-model gpt-4o --thinking-budget medium
```
## Supported Models
### Ollama Models
- **GPT-Oss:20b**: Primary target model with reasoning capabilities
- **Other reasoning models**: Any Ollama model that supports the `reasoning` parameter
### OpenAI Models
- **o3, o3-mini, o4-mini, o1**: o-series reasoning models with `reasoning_effort` parameter
- **GPT-OSS models**: Models that support reasoning capabilities
## Testing
The implementation includes comprehensive testing:
- Parameter handling verification
- Backend-specific API format validation
- CLI argument parsing tests
- Integration with existing LEANN architecture

View File

@@ -49,14 +49,25 @@ Based on our experience developing LEANN, embedding models fall into three categ
- **Cons**: Slower inference, longer index build times - **Cons**: Slower inference, longer index build times
- **Use when**: Quality is paramount and you have sufficient compute resources. **Highly recommended** for production use - **Use when**: Quality is paramount and you have sufficient compute resources. **Highly recommended** for production use
### Quick Start: OpenAI Embeddings (Fastest Setup) ### Quick Start: Cloud and Local Embedding Options
**OpenAI Embeddings (Fastest Setup)**
For immediate testing without local model downloads: For immediate testing without local model downloads:
```bash ```bash
# Set OpenAI embeddings (requires OPENAI_API_KEY) # Set OpenAI embeddings (requires OPENAI_API_KEY)
--embedding-mode openai --embedding-model text-embedding-3-small --embedding-mode openai --embedding-model text-embedding-3-small
``` ```
**Ollama Embeddings (Privacy-Focused)**
For local embeddings with complete privacy:
```bash
# First, pull an embedding model
ollama pull nomic-embed-text
# Use Ollama embeddings
--embedding-mode ollama --embedding-model nomic-embed-text
```
<details> <details>
<summary><strong>Cloud vs Local Trade-offs</strong></summary> <summary><strong>Cloud vs Local Trade-offs</strong></summary>
@@ -103,13 +114,15 @@ For immediate testing without local model downloads:
**OpenAI** (`--llm openai`) **OpenAI** (`--llm openai`)
- **Pros**: Best quality, consistent performance, no local resources needed - **Pros**: Best quality, consistent performance, no local resources needed
- **Cons**: Costs money ($0.15-2.5 per million tokens), requires internet, data privacy concerns - **Cons**: Costs money ($0.15-2.5 per million tokens), requires internet, data privacy concerns
- **Models**: `gpt-4o-mini` (fast, cheap), `gpt-4o` (best quality), `o3-mini` (reasoning, not so expensive) - **Models**: `gpt-4o-mini` (fast, cheap), `gpt-4o` (best quality), `o3` (reasoning), `o3-mini` (reasoning, cheaper)
- **Thinking Budget**: Use `--thinking-budget low/medium/high` for o-series reasoning models (o3, o3-mini, o4-mini)
- **Note**: Our current default, but we recommend switching to Ollama for most use cases - **Note**: Our current default, but we recommend switching to Ollama for most use cases
**Ollama** (`--llm ollama`) **Ollama** (`--llm ollama`)
- **Pros**: Fully local, free, privacy-preserving, good model variety - **Pros**: Fully local, free, privacy-preserving, good model variety
- **Cons**: Requires local GPU/CPU resources, slower than cloud APIs, need to install extra [ollama app](https://github.com/ollama/ollama?tab=readme-ov-file#ollama) and pre-download models by `ollama pull` - **Cons**: Requires local GPU/CPU resources, slower than cloud APIs, need to install extra [ollama app](https://github.com/ollama/ollama?tab=readme-ov-file#ollama) and pre-download models by `ollama pull`
- **Models**: `qwen3:0.6b` (ultra-fast), `qwen3:1.7b` (balanced), `qwen3:4b` (good quality), `qwen3:7b` (high quality), `deepseek-r1:1.5b` (reasoning) - **Models**: `qwen3:0.6b` (ultra-fast), `qwen3:1.7b` (balanced), `qwen3:4b` (good quality), `qwen3:7b` (high quality), `deepseek-r1:1.5b` (reasoning)
- **Thinking Budget**: Use `--thinking-budget low/medium/high` for reasoning models like GPT-Oss:20b
**HuggingFace** (`--llm hf`) **HuggingFace** (`--llm hf`)
- **Pros**: Free tier available, huge model selection, direct model loading (vs Ollama's server-based approach) - **Pros**: Free tier available, huge model selection, direct model loading (vs Ollama's server-based approach)
@@ -151,6 +164,36 @@ For immediate testing without local model downloads:
- LLM processing time ∝ top_k × chunk_size - LLM processing time ∝ top_k × chunk_size
- Total context = top_k × chunk_size tokens - Total context = top_k × chunk_size tokens
### Thinking Budget for Reasoning Models
**`--thinking-budget`** (reasoning effort level)
- Controls the computational effort for reasoning models
- Options: `low`, `medium`, `high`
- Guidelines:
- `low`: Fast responses, basic reasoning (default for simple queries)
- `medium`: Balanced speed and reasoning depth
- `high`: Maximum reasoning effort, best for complex analytical questions
- **Supported Models**:
- **Ollama**: `gpt-oss:20b`, `gpt-oss:120b`
- **OpenAI**: `o3`, `o3-mini`, `o4-mini`, `o1` (o-series reasoning models)
- **Note**: Models without reasoning support will show a warning and proceed without reasoning parameters
- **Example**: `--thinking-budget high` for complex analytical questions
**📖 For detailed usage examples and implementation details, check out [Thinking Budget Documentation](THINKING_BUDGET_FEATURE.md)**
**💡 Quick Examples:**
```bash
# OpenAI o-series reasoning model
python apps/document_rag.py --query "What are the main techniques LEANN explores?" \
--index-dir hnswbuild --backend hnsw \
--llm openai --llm-model o3 --thinking-budget medium
# Ollama reasoning model
python apps/document_rag.py --query "What are the main techniques LEANN explores?" \
--index-dir hnswbuild --backend hnsw \
--llm ollama --llm-model gpt-oss:20b --thinking-budget high
```
### Graph Degree (HNSW/DiskANN) ### Graph Degree (HNSW/DiskANN)
**`--graph-degree`** **`--graph-degree`**
@@ -179,9 +222,15 @@ For immediate testing without local model downloads:
3. **Use MLX on Apple Silicon** (optional optimization): 3. **Use MLX on Apple Silicon** (optional optimization):
```bash ```bash
--embedding-mode mlx --embedding-model mlx-community/multilingual-e5-base-mlx --embedding-mode mlx --embedding-model mlx-community/Qwen3-Embedding-0.6B-8bit
``` ```
MLX might not be the best choice, as we tested and found that it only offers 1.3x acceleration compared to HF, so maybe using ollama is a better choice for embedding generation
4. **Use Ollama**
```bash
--embedding-mode ollama --embedding-model nomic-embed-text
```
To discover additional embedding models in ollama, check out https://ollama.com/search?c=embedding or read more about embedding models at https://ollama.com/blog/embedding-models, please do check the model size that works best for you
### If Search Quality is Poor ### If Search Quality is Poor
1. **Increase retrieval count**: 1. **Increase retrieval count**:

View File

@@ -1,8 +0,0 @@
# packages/leann-backend-diskann/CMakeLists.txt (simplified version)
cmake_minimum_required(VERSION 3.20)
project(leann_backend_diskann_wrapper)
# Tell CMake to directly enter the DiskANN submodule and execute its own CMakeLists.txt
# DiskANN will handle everything itself, including compiling Python bindings
add_subdirectory(src/third_party/DiskANN)

View File

@@ -4,7 +4,7 @@ import os
import struct import struct
import sys import sys
from pathlib import Path from pathlib import Path
from typing import Any, Literal from typing import Any, Literal, Optional
import numpy as np import numpy as np
import psutil import psutil
@@ -259,7 +259,7 @@ class DiskannSearcher(BaseSearcher):
prune_ratio: float = 0.0, prune_ratio: float = 0.0,
recompute_embeddings: bool = False, recompute_embeddings: bool = False,
pruning_strategy: Literal["global", "local", "proportional"] = "global", pruning_strategy: Literal["global", "local", "proportional"] = "global",
zmq_port: int | None = None, zmq_port: Optional[int] = None,
batch_recompute: bool = False, batch_recompute: bool = False,
dedup_node_dis: bool = False, dedup_node_dis: bool = False,
**kwargs, **kwargs,

View File

@@ -10,6 +10,7 @@ import sys
import threading import threading
import time import time
from pathlib import Path from pathlib import Path
from typing import Optional
import numpy as np import numpy as np
import zmq import zmq
@@ -32,7 +33,7 @@ if not logger.handlers:
def create_diskann_embedding_server( def create_diskann_embedding_server(
passages_file: str | None = None, passages_file: Optional[str] = None,
zmq_port: int = 5555, zmq_port: int = 5555,
model_name: str = "sentence-transformers/all-mpnet-base-v2", model_name: str = "sentence-transformers/all-mpnet-base-v2",
embedding_mode: str = "sentence-transformers", embedding_mode: str = "sentence-transformers",
@@ -261,7 +262,7 @@ if __name__ == "__main__":
"--embedding-mode", "--embedding-mode",
type=str, type=str,
default="sentence-transformers", default="sentence-transformers",
choices=["sentence-transformers", "openai", "mlx"], choices=["sentence-transformers", "openai", "mlx", "ollama"],
help="Embedding backend mode", help="Embedding backend mode",
) )
parser.add_argument( parser.add_argument(

View File

@@ -4,8 +4,8 @@ build-backend = "scikit_build_core.build"
[project] [project]
name = "leann-backend-diskann" name = "leann-backend-diskann"
version = "0.2.1" version = "0.2.7"
dependencies = ["leann-core==0.2.1", "numpy", "protobuf>=3.19.0"] dependencies = ["leann-core==0.2.7", "numpy", "protobuf>=3.19.0"]
[tool.scikit-build] [tool.scikit-build]
# Key: simplified CMake path # Key: simplified CMake path
@@ -17,3 +17,5 @@ editable.mode = "redirect"
cmake.build-type = "Release" cmake.build-type = "Release"
build.verbose = true build.verbose = true
build.tool-args = ["-j8"] build.tool-args = ["-j8"]
# Let CMake find packages via Homebrew prefix
cmake.define = {CMAKE_PREFIX_PATH = {env = "CMAKE_PREFIX_PATH"}, OpenMP_ROOT = {env = "OpenMP_ROOT"}}

View File

@@ -5,11 +5,20 @@ set(CMAKE_CXX_COMPILER_WORKS 1)
# Set OpenMP path for macOS # Set OpenMP path for macOS
if(APPLE) if(APPLE)
set(OpenMP_C_FLAGS "-Xpreprocessor -fopenmp -I/opt/homebrew/opt/libomp/include") # Detect Homebrew installation path (Apple Silicon vs Intel)
set(OpenMP_CXX_FLAGS "-Xpreprocessor -fopenmp -I/opt/homebrew/opt/libomp/include") if(EXISTS "/opt/homebrew/opt/libomp")
set(HOMEBREW_PREFIX "/opt/homebrew")
elseif(EXISTS "/usr/local/opt/libomp")
set(HOMEBREW_PREFIX "/usr/local")
else()
message(FATAL_ERROR "Could not find libomp installation. Please install with: brew install libomp")
endif()
set(OpenMP_C_FLAGS "-Xpreprocessor -fopenmp -I${HOMEBREW_PREFIX}/opt/libomp/include")
set(OpenMP_CXX_FLAGS "-Xpreprocessor -fopenmp -I${HOMEBREW_PREFIX}/opt/libomp/include")
set(OpenMP_C_LIB_NAMES "omp") set(OpenMP_C_LIB_NAMES "omp")
set(OpenMP_CXX_LIB_NAMES "omp") set(OpenMP_CXX_LIB_NAMES "omp")
set(OpenMP_omp_LIBRARY "/opt/homebrew/opt/libomp/lib/libomp.dylib") set(OpenMP_omp_LIBRARY "${HOMEBREW_PREFIX}/opt/libomp/lib/libomp.dylib")
# Force use of system libc++ to avoid version mismatch # Force use of system libc++ to avoid version mismatch
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -stdlib=libc++") set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -stdlib=libc++")

View File

@@ -2,7 +2,7 @@ import logging
import os import os
import shutil import shutil
from pathlib import Path from pathlib import Path
from typing import Any, Literal from typing import Any, Literal, Optional
import numpy as np import numpy as np
from leann.interface import ( from leann.interface import (
@@ -152,7 +152,7 @@ class HNSWSearcher(BaseSearcher):
self, self,
query: np.ndarray, query: np.ndarray,
top_k: int, top_k: int,
zmq_port: int | None = None, zmq_port: Optional[int] = None,
complexity: int = 64, complexity: int = 64,
beam_width: int = 1, beam_width: int = 1,
prune_ratio: float = 0.0, prune_ratio: float = 0.0,

View File

@@ -10,6 +10,7 @@ import sys
import threading import threading
import time import time
from pathlib import Path from pathlib import Path
from typing import Union
import msgpack import msgpack
import numpy as np import numpy as np
@@ -33,7 +34,7 @@ if not logger.handlers:
def create_hnsw_embedding_server( def create_hnsw_embedding_server(
passages_file: str | None = None, passages_file: Union[str, None] = None,
zmq_port: int = 5555, zmq_port: int = 5555,
model_name: str = "sentence-transformers/all-mpnet-base-v2", model_name: str = "sentence-transformers/all-mpnet-base-v2",
distance_metric: str = "mips", distance_metric: str = "mips",
@@ -295,7 +296,7 @@ if __name__ == "__main__":
"--embedding-mode", "--embedding-mode",
type=str, type=str,
default="sentence-transformers", default="sentence-transformers",
choices=["sentence-transformers", "openai", "mlx"], choices=["sentence-transformers", "openai", "mlx", "ollama"],
help="Embedding backend mode", help="Embedding backend mode",
) )

View File

@@ -6,10 +6,10 @@ build-backend = "scikit_build_core.build"
[project] [project]
name = "leann-backend-hnsw" name = "leann-backend-hnsw"
version = "0.2.1" version = "0.2.7"
description = "Custom-built HNSW (Faiss) backend for the Leann toolkit." description = "Custom-built HNSW (Faiss) backend for the Leann toolkit."
dependencies = [ dependencies = [
"leann-core==0.2.1", "leann-core==0.2.7",
"numpy", "numpy",
"pyzmq>=23.0.0", "pyzmq>=23.0.0",
"msgpack>=1.0.0", "msgpack>=1.0.0",
@@ -22,6 +22,8 @@ cmake.build-type = "Release"
build.verbose = true build.verbose = true
build.tool-args = ["-j8"] build.tool-args = ["-j8"]
# CMake definitions to optimize compilation # CMake definitions to optimize compilation and find Homebrew packages
[tool.scikit-build.cmake.define] [tool.scikit-build.cmake.define]
CMAKE_BUILD_PARALLEL_LEVEL = "8" CMAKE_BUILD_PARALLEL_LEVEL = "8"
CMAKE_PREFIX_PATH = {env = "CMAKE_PREFIX_PATH"}
OpenMP_ROOT = {env = "OpenMP_ROOT"}

View File

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project] [project]
name = "leann-core" name = "leann-core"
version = "0.2.1" version = "0.2.7"
description = "Core API and plugin system for LEANN" description = "Core API and plugin system for LEANN"
readme = "README.md" readme = "README.md"
requires-python = ">=3.9" requires-python = ">=3.9"
@@ -31,8 +31,10 @@ dependencies = [
"PyPDF2>=3.0.0", "PyPDF2>=3.0.0",
"pymupdf>=1.23.0", "pymupdf>=1.23.0",
"pdfplumber>=0.10.0", "pdfplumber>=0.10.0",
"mlx>=0.26.3; sys_platform == 'darwin'", "nbconvert>=7.0.0", # For .ipynb file support
"mlx-lm>=0.26.0; sys_platform == 'darwin'", "gitignore-parser>=0.1.12", # For proper .gitignore handling
"mlx>=0.26.3; sys_platform == 'darwin' and platform_machine == 'arm64'",
"mlx-lm>=0.26.0; sys_platform == 'darwin' and platform_machine == 'arm64'",
] ]
[project.optional-dependencies] [project.optional-dependencies]
@@ -44,6 +46,7 @@ colab = [
[project.scripts] [project.scripts]
leann = "leann.cli:main" leann = "leann.cli:main"
leann_mcp = "leann.mcp:main"
[tool.setuptools.packages.find] [tool.setuptools.packages.find]
where = ["src"] where = ["src"]

View File

@@ -10,7 +10,7 @@ import time
import warnings import warnings
from dataclasses import dataclass, field from dataclasses import dataclass, field
from pathlib import Path from pathlib import Path
from typing import Any, Literal from typing import Any, Literal, Optional
import numpy as np import numpy as np
@@ -33,7 +33,7 @@ def compute_embeddings(
model_name: str, model_name: str,
mode: str = "sentence-transformers", mode: str = "sentence-transformers",
use_server: bool = True, use_server: bool = True,
port: int | None = None, port: Optional[int] = None,
is_build=False, is_build=False,
) -> np.ndarray: ) -> np.ndarray:
""" """
@@ -157,12 +157,12 @@ class LeannBuilder:
self, self,
backend_name: str, backend_name: str,
embedding_model: str = "facebook/contriever", embedding_model: str = "facebook/contriever",
dimensions: int | None = None, dimensions: Optional[int] = None,
embedding_mode: str = "sentence-transformers", embedding_mode: str = "sentence-transformers",
**backend_kwargs, **backend_kwargs,
): ):
self.backend_name = backend_name self.backend_name = backend_name
backend_factory: LeannBackendFactoryInterface | None = BACKEND_REGISTRY.get(backend_name) backend_factory: Optional[LeannBackendFactoryInterface] = BACKEND_REGISTRY.get(backend_name)
if backend_factory is None: if backend_factory is None:
raise ValueError(f"Backend '{backend_name}' not found or not registered.") raise ValueError(f"Backend '{backend_name}' not found or not registered.")
self.backend_factory = backend_factory self.backend_factory = backend_factory
@@ -242,7 +242,7 @@ class LeannBuilder:
self.backend_kwargs = backend_kwargs self.backend_kwargs = backend_kwargs
self.chunks: list[dict[str, Any]] = [] self.chunks: list[dict[str, Any]] = []
def add_text(self, text: str, metadata: dict[str, Any] | None = None): def add_text(self, text: str, metadata: Optional[dict[str, Any]] = None):
if metadata is None: if metadata is None:
metadata = {} metadata = {}
passage_id = metadata.get("id", str(len(self.chunks))) passage_id = metadata.get("id", str(len(self.chunks)))
@@ -554,7 +554,7 @@ class LeannSearcher:
if "labels" in results and "distances" in results: if "labels" in results and "distances" in results:
logger.info(f" Processing {len(results['labels'][0])} passage IDs:") logger.info(f" Processing {len(results['labels'][0])} passage IDs:")
for i, (string_id, dist) in enumerate( for i, (string_id, dist) in enumerate(
zip(results["labels"][0], results["distances"][0], strict=False) zip(results["labels"][0], results["distances"][0])
): ):
try: try:
passage_data = self.passage_manager.get_passage(string_id) passage_data = self.passage_manager.get_passage(string_id)
@@ -592,7 +592,7 @@ class LeannChat:
def __init__( def __init__(
self, self,
index_path: str, index_path: str,
llm_config: dict[str, Any] | None = None, llm_config: Optional[dict[str, Any]] = None,
enable_warmup: bool = False, enable_warmup: bool = False,
**kwargs, **kwargs,
): ):
@@ -608,7 +608,7 @@ class LeannChat:
prune_ratio: float = 0.0, prune_ratio: float = 0.0,
recompute_embeddings: bool = True, recompute_embeddings: bool = True,
pruning_strategy: Literal["global", "local", "proportional"] = "global", pruning_strategy: Literal["global", "local", "proportional"] = "global",
llm_kwargs: dict[str, Any] | None = None, llm_kwargs: Optional[dict[str, Any]] = None,
expected_zmq_port: int = 5557, expected_zmq_port: int = 5557,
**search_kwargs, **search_kwargs,
): ):

View File

@@ -8,7 +8,7 @@ import difflib
import logging import logging
import os import os
from abc import ABC, abstractmethod from abc import ABC, abstractmethod
from typing import Any from typing import Any, Optional
import torch import torch
@@ -17,12 +17,12 @@ logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
def check_ollama_models() -> list[str]: def check_ollama_models(host: str) -> list[str]:
"""Check available Ollama models and return a list""" """Check available Ollama models and return a list"""
try: try:
import requests import requests
response = requests.get("http://localhost:11434/api/tags", timeout=5) response = requests.get(f"{host}/api/tags", timeout=5)
if response.status_code == 200: if response.status_code == 200:
data = response.json() data = response.json()
return [model["name"] for model in data.get("models", [])] return [model["name"] for model in data.get("models", [])]
@@ -309,10 +309,12 @@ def search_hf_models(query: str, limit: int = 10) -> list[str]:
return search_hf_models_fuzzy(query, limit) return search_hf_models_fuzzy(query, limit)
def validate_model_and_suggest(model_name: str, llm_type: str) -> str | None: def validate_model_and_suggest(
model_name: str, llm_type: str, host: str = "http://localhost:11434"
) -> Optional[str]:
"""Validate model name and provide suggestions if invalid""" """Validate model name and provide suggestions if invalid"""
if llm_type == "ollama": if llm_type == "ollama":
available_models = check_ollama_models() available_models = check_ollama_models(host)
if available_models and model_name not in available_models: if available_models and model_name not in available_models:
error_msg = f"Model '{model_name}' not found in your local Ollama installation." error_msg = f"Model '{model_name}' not found in your local Ollama installation."
@@ -469,7 +471,7 @@ class OllamaChat(LLMInterface):
requests.get(host) requests.get(host)
# Pre-check model availability with helpful suggestions # Pre-check model availability with helpful suggestions
model_error = validate_model_and_suggest(model, "ollama") model_error = validate_model_and_suggest(model, "ollama", host)
if model_error: if model_error:
raise ValueError(model_error) raise ValueError(model_error)
@@ -489,11 +491,35 @@ class OllamaChat(LLMInterface):
import requests import requests
full_url = f"{self.host}/api/generate" full_url = f"{self.host}/api/generate"
# Handle thinking budget for reasoning models
options = kwargs.copy()
thinking_budget = kwargs.get("thinking_budget")
if thinking_budget:
# Remove thinking_budget from options as it's not a standard Ollama option
options.pop("thinking_budget", None)
# Only apply reasoning parameters to models that support it
reasoning_supported_models = [
"gpt-oss:20b",
"gpt-oss:120b",
"deepseek-r1",
"deepseek-coder",
]
if thinking_budget in ["low", "medium", "high"]:
if any(model in self.model.lower() for model in reasoning_supported_models):
options["reasoning"] = {"effort": thinking_budget, "exclude": False}
logger.info(f"Applied reasoning effort={thinking_budget} to model {self.model}")
else:
logger.warning(
f"Thinking budget '{thinking_budget}' requested but model '{self.model}' may not support reasoning parameters. Proceeding without reasoning."
)
payload = { payload = {
"model": self.model, "model": self.model,
"prompt": prompt, "prompt": prompt,
"stream": False, # Keep it simple for now "stream": False, # Keep it simple for now
"options": kwargs, "options": options,
} }
logger.debug(f"Sending request to Ollama: {payload}") logger.debug(f"Sending request to Ollama: {payload}")
try: try:
@@ -659,7 +685,7 @@ class HFChat(LLMInterface):
class OpenAIChat(LLMInterface): class OpenAIChat(LLMInterface):
"""LLM interface for OpenAI models.""" """LLM interface for OpenAI models."""
def __init__(self, model: str = "gpt-4o", api_key: str | None = None): def __init__(self, model: str = "gpt-4o", api_key: Optional[str] = None):
self.model = model self.model = model
self.api_key = api_key or os.getenv("OPENAI_API_KEY") self.api_key = api_key or os.getenv("OPENAI_API_KEY")
@@ -684,11 +710,38 @@ class OpenAIChat(LLMInterface):
params = { params = {
"model": self.model, "model": self.model,
"messages": [{"role": "user", "content": prompt}], "messages": [{"role": "user", "content": prompt}],
"max_tokens": kwargs.get("max_tokens", 1000),
"temperature": kwargs.get("temperature", 0.7), "temperature": kwargs.get("temperature", 0.7),
**{k: v for k, v in kwargs.items() if k not in ["max_tokens", "temperature"]},
} }
# Handle max_tokens vs max_completion_tokens based on model
max_tokens = kwargs.get("max_tokens", 1000)
if "o3" in self.model or "o4" in self.model or "o1" in self.model:
# o-series models use max_completion_tokens
params["max_completion_tokens"] = max_tokens
params["temperature"] = 1.0
else:
# Other models use max_tokens
params["max_tokens"] = max_tokens
# Handle thinking budget for reasoning models
thinking_budget = kwargs.get("thinking_budget")
if thinking_budget and thinking_budget in ["low", "medium", "high"]:
# Check if this is an o-series model (partial match for model names)
o_series_models = ["o3", "o3-mini", "o4-mini", "o1", "o3-pro", "o3-deep-research"]
if any(model in self.model for model in o_series_models):
# Use the correct OpenAI reasoning parameter format
params["reasoning_effort"] = thinking_budget
logger.info(f"Applied reasoning_effort={thinking_budget} to model {self.model}")
else:
logger.warning(
f"Thinking budget '{thinking_budget}' requested but model '{self.model}' may not support reasoning parameters. Proceeding without reasoning."
)
# Add other kwargs (excluding thinking_budget as it's handled above)
for k, v in kwargs.items():
if k not in ["max_tokens", "temperature", "thinking_budget"]:
params[k] = v
logger.info(f"Sending request to OpenAI with model {self.model}") logger.info(f"Sending request to OpenAI with model {self.model}")
try: try:
@@ -708,7 +761,7 @@ class SimulatedChat(LLMInterface):
return "This is a simulated answer from the LLM based on the retrieved context." return "This is a simulated answer from the LLM based on the retrieved context."
def get_llm(llm_config: dict[str, Any] | None = None) -> LLMInterface: def get_llm(llm_config: Optional[dict[str, Any]] = None) -> LLMInterface:
""" """
Factory function to get an LLM interface based on configuration. Factory function to get an LLM interface based on configuration.

View File

@@ -1,6 +1,7 @@
import argparse import argparse
import asyncio import asyncio
from pathlib import Path from pathlib import Path
from typing import Union
from llama_index.core import SimpleDirectoryReader from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter from llama_index.core.node_parser import SentenceSplitter
@@ -41,13 +42,23 @@ def extract_pdf_text_with_pdfplumber(file_path: str) -> str:
class LeannCLI: class LeannCLI:
def __init__(self): def __init__(self):
self.indexes_dir = Path.home() / ".leann" / "indexes" # Always use project-local .leann directory (like .git)
self.indexes_dir = Path.cwd() / ".leann" / "indexes"
self.indexes_dir.mkdir(parents=True, exist_ok=True) self.indexes_dir.mkdir(parents=True, exist_ok=True)
# Default parser for documents
self.node_parser = SentenceSplitter( self.node_parser = SentenceSplitter(
chunk_size=256, chunk_overlap=128, separator=" ", paragraph_separator="\n\n" chunk_size=256, chunk_overlap=128, separator=" ", paragraph_separator="\n\n"
) )
# Code-optimized parser
self.code_parser = SentenceSplitter(
chunk_size=512, # Larger chunks for code context
chunk_overlap=50, # Less overlap to preserve function boundaries
separator="\n", # Split by lines for code
paragraph_separator="\n\n", # Preserve logical code blocks
)
def get_index_path(self, index_name: str) -> str: def get_index_path(self, index_name: str) -> str:
index_dir = self.indexes_dir / index_name index_dir = self.indexes_dir / index_name
return str(index_dir / "documents.leann") return str(index_dir / "documents.leann")
@@ -64,10 +75,11 @@ class LeannCLI:
formatter_class=argparse.RawDescriptionHelpFormatter, formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=""" epilog="""
Examples: Examples:
leann build my-docs --docs ./documents # Build index named my-docs leann build my-docs --docs ./documents # Build index named my-docs
leann search my-docs "query" # Search in my-docs index leann build my-ppts --docs ./ --file-types .pptx,.pdf # Index only PowerPoint and PDF files
leann ask my-docs "question" # Ask my-docs index leann search my-docs "query" # Search in my-docs index
leann list # List all stored indexes leann ask my-docs "question" # Ask my-docs index
leann list # List all stored indexes
""", """,
) )
@@ -75,18 +87,34 @@ Examples:
# Build command # Build command
build_parser = subparsers.add_parser("build", help="Build document index") build_parser = subparsers.add_parser("build", help="Build document index")
build_parser.add_argument("index_name", help="Index name") build_parser.add_argument(
build_parser.add_argument("--docs", type=str, required=True, help="Documents directory") "index_name", nargs="?", help="Index name (default: current directory name)"
)
build_parser.add_argument(
"--docs", type=str, default=".", help="Documents directory (default: current directory)"
)
build_parser.add_argument( build_parser.add_argument(
"--backend", type=str, default="hnsw", choices=["hnsw", "diskann"] "--backend", type=str, default="hnsw", choices=["hnsw", "diskann"]
) )
build_parser.add_argument("--embedding-model", type=str, default="facebook/contriever") build_parser.add_argument("--embedding-model", type=str, default="facebook/contriever")
build_parser.add_argument(
"--embedding-mode",
type=str,
default="sentence-transformers",
choices=["sentence-transformers", "openai", "mlx", "ollama"],
help="Embedding backend mode (default: sentence-transformers)",
)
build_parser.add_argument("--force", "-f", action="store_true", help="Force rebuild") build_parser.add_argument("--force", "-f", action="store_true", help="Force rebuild")
build_parser.add_argument("--graph-degree", type=int, default=32) build_parser.add_argument("--graph-degree", type=int, default=32)
build_parser.add_argument("--complexity", type=int, default=64) build_parser.add_argument("--complexity", type=int, default=64)
build_parser.add_argument("--num-threads", type=int, default=1) build_parser.add_argument("--num-threads", type=int, default=1)
build_parser.add_argument("--compact", action="store_true", default=True) build_parser.add_argument("--compact", action="store_true", default=True)
build_parser.add_argument("--recompute", action="store_true", default=True) build_parser.add_argument("--recompute", action="store_true", default=True)
build_parser.add_argument(
"--file-types",
type=str,
help="Comma-separated list of file extensions to include (e.g., '.txt,.pdf,.pptx'). If not specified, uses default supported types.",
)
# Search command # Search command
search_parser = subparsers.add_parser("search", help="Search documents") search_parser = subparsers.add_parser("search", help="Search documents")
@@ -96,7 +124,12 @@ Examples:
search_parser.add_argument("--complexity", type=int, default=64) search_parser.add_argument("--complexity", type=int, default=64)
search_parser.add_argument("--beam-width", type=int, default=1) search_parser.add_argument("--beam-width", type=int, default=1)
search_parser.add_argument("--prune-ratio", type=float, default=0.0) search_parser.add_argument("--prune-ratio", type=float, default=0.0)
search_parser.add_argument("--recompute-embeddings", action="store_true") search_parser.add_argument(
"--recompute-embeddings",
action="store_true",
default=True,
help="Recompute embeddings (default: True)",
)
search_parser.add_argument( search_parser.add_argument(
"--pruning-strategy", "--pruning-strategy",
choices=["global", "local", "proportional"], choices=["global", "local", "proportional"],
@@ -119,94 +152,370 @@ Examples:
ask_parser.add_argument("--complexity", type=int, default=32) ask_parser.add_argument("--complexity", type=int, default=32)
ask_parser.add_argument("--beam-width", type=int, default=1) ask_parser.add_argument("--beam-width", type=int, default=1)
ask_parser.add_argument("--prune-ratio", type=float, default=0.0) ask_parser.add_argument("--prune-ratio", type=float, default=0.0)
ask_parser.add_argument("--recompute-embeddings", action="store_true") ask_parser.add_argument(
"--recompute-embeddings",
action="store_true",
default=True,
help="Recompute embeddings (default: True)",
)
ask_parser.add_argument( ask_parser.add_argument(
"--pruning-strategy", "--pruning-strategy",
choices=["global", "local", "proportional"], choices=["global", "local", "proportional"],
default="global", default="global",
) )
ask_parser.add_argument(
"--thinking-budget",
type=str,
choices=["low", "medium", "high"],
default=None,
help="Thinking budget for reasoning models (low/medium/high). Supported by GPT-Oss:20b and other reasoning models.",
)
# List command # List command
subparsers.add_parser("list", help="List all indexes") subparsers.add_parser("list", help="List all indexes")
return parser return parser
def register_project_dir(self):
"""Register current project directory in global registry"""
global_registry = Path.home() / ".leann" / "projects.json"
global_registry.parent.mkdir(exist_ok=True)
current_dir = str(Path.cwd())
# Load existing registry
projects = []
if global_registry.exists():
try:
import json
with open(global_registry) as f:
projects = json.load(f)
except Exception:
projects = []
# Add current directory if not already present
if current_dir not in projects:
projects.append(current_dir)
# Save registry
import json
with open(global_registry, "w") as f:
json.dump(projects, f, indent=2)
def _build_gitignore_parser(self, docs_dir: str):
"""Build gitignore parser using gitignore-parser library."""
from gitignore_parser import parse_gitignore
# Try to parse the root .gitignore
gitignore_path = Path(docs_dir) / ".gitignore"
if gitignore_path.exists():
try:
# gitignore-parser automatically handles all subdirectory .gitignore files!
matches = parse_gitignore(str(gitignore_path))
print(f"📋 Loaded .gitignore from {docs_dir} (includes all subdirectories)")
return matches
except Exception as e:
print(f"Warning: Could not parse .gitignore: {e}")
else:
print("📋 No .gitignore found")
# Fallback: basic pattern matching for essential files
essential_patterns = {".git", ".DS_Store", "__pycache__", "node_modules", ".venv", "venv"}
def basic_matches(file_path):
path_parts = Path(file_path).parts
return any(part in essential_patterns for part in path_parts)
return basic_matches
def _should_exclude_file(self, relative_path: Path, gitignore_matches) -> bool:
"""Check if a file should be excluded using gitignore parser."""
return gitignore_matches(str(relative_path))
def list_indexes(self): def list_indexes(self):
print("Stored LEANN indexes:") print("Stored LEANN indexes:")
if not self.indexes_dir.exists(): # Get all project directories with .leann
global_registry = Path.home() / ".leann" / "projects.json"
all_projects = []
if global_registry.exists():
try:
import json
with open(global_registry) as f:
all_projects = json.load(f)
except Exception:
pass
# Filter to only existing directories with .leann
valid_projects = []
for project_dir in all_projects:
project_path = Path(project_dir)
if project_path.exists() and (project_path / ".leann" / "indexes").exists():
valid_projects.append(project_path)
# Add current project if it has .leann but not in registry
current_path = Path.cwd()
if (current_path / ".leann" / "indexes").exists() and current_path not in valid_projects:
valid_projects.append(current_path)
if not valid_projects:
print("No indexes found. Use 'leann build <name> --docs <dir>' to create one.") print("No indexes found. Use 'leann build <name> --docs <dir>' to create one.")
return return
index_dirs = [d for d in self.indexes_dir.iterdir() if d.is_dir()] total_indexes = 0
current_dir = Path.cwd()
if not index_dirs: for project_path in valid_projects:
print("No indexes found. Use 'leann build <name> --docs <dir>' to create one.") indexes_dir = project_path / ".leann" / "indexes"
return if not indexes_dir.exists():
continue
print(f"Found {len(index_dirs)} indexes:") index_dirs = [d for d in indexes_dir.iterdir() if d.is_dir()]
for i, index_dir in enumerate(index_dirs, 1): if not index_dirs:
index_name = index_dir.name continue
status = "" if self.index_exists(index_name) else ""
print(f" {i}. {index_name} [{status}]") # Show project header
if self.index_exists(index_name): if project_path == current_dir:
index_dir / "documents.leann.meta.json" print(f"\n📁 Current project ({project_path}):")
size_mb = sum(f.stat().st_size for f in index_dir.iterdir() if f.is_file()) / ( else:
1024 * 1024 print(f"\n📂 {project_path}:")
)
print(f" Size: {size_mb:.1f} MB")
if index_dirs: for index_dir in index_dirs:
example_name = index_dirs[0].name total_indexes += 1
print("\nUsage:") index_name = index_dir.name
print(f' leann search {example_name} "your query"') meta_file = index_dir / "documents.leann.meta.json"
print(f" leann ask {example_name} --interactive") status = "" if meta_file.exists() else ""
def load_documents(self, docs_dir: str): print(f" {total_indexes}. {index_name} [{status}]")
if status == "":
size_mb = sum(f.stat().st_size for f in index_dir.iterdir() if f.is_file()) / (
1024 * 1024
)
print(f" Size: {size_mb:.1f} MB")
if total_indexes > 0:
print(f"\nTotal: {total_indexes} indexes across {len(valid_projects)} projects")
print("\nUsage (current project only):")
# Show example from current project
current_indexes_dir = current_dir / ".leann" / "indexes"
if current_indexes_dir.exists():
current_index_dirs = [d for d in current_indexes_dir.iterdir() if d.is_dir()]
if current_index_dirs:
example_name = current_index_dirs[0].name
print(f' leann search {example_name} "your query"')
print(f" leann ask {example_name} --interactive")
def load_documents(self, docs_dir: str, custom_file_types: Union[str, None] = None):
print(f"Loading documents from {docs_dir}...") print(f"Loading documents from {docs_dir}...")
if custom_file_types:
print(f"Using custom file types: {custom_file_types}")
# Try to use better PDF parsers first # Build gitignore parser
gitignore_matches = self._build_gitignore_parser(docs_dir)
# Try to use better PDF parsers first, but only if PDFs are requested
documents = [] documents = []
docs_path = Path(docs_dir) docs_path = Path(docs_dir)
for file_path in docs_path.rglob("*.pdf"): # Check if we should process PDFs
print(f"Processing PDF: {file_path}") should_process_pdfs = custom_file_types is None or ".pdf" in custom_file_types
# Try PyMuPDF first (best quality) if should_process_pdfs:
text = extract_pdf_text_with_pymupdf(str(file_path)) for file_path in docs_path.rglob("*.pdf"):
if text is None: # Check if file matches any exclude pattern
# Try pdfplumber relative_path = file_path.relative_to(docs_path)
text = extract_pdf_text_with_pdfplumber(str(file_path)) if self._should_exclude_file(relative_path, gitignore_matches):
continue
if text: print(f"Processing PDF: {file_path}")
# Create a simple document structure
from llama_index.core import Document
doc = Document(text=text, metadata={"source": str(file_path)}) # Try PyMuPDF first (best quality)
documents.append(doc) text = extract_pdf_text_with_pymupdf(str(file_path))
else: if text is None:
# Fallback to default reader # Try pdfplumber
print(f"Using default reader for {file_path}") text = extract_pdf_text_with_pdfplumber(str(file_path))
default_docs = SimpleDirectoryReader(
str(file_path.parent), if text:
filename_as_id=True, # Create a simple document structure
required_exts=[file_path.suffix], from llama_index.core import Document
).load_data()
documents.extend(default_docs) doc = Document(text=text, metadata={"source": str(file_path)})
documents.append(doc)
else:
# Fallback to default reader
print(f"Using default reader for {file_path}")
try:
default_docs = SimpleDirectoryReader(
str(file_path.parent),
filename_as_id=True,
required_exts=[file_path.suffix],
).load_data()
documents.extend(default_docs)
except Exception as e:
print(f"Warning: Could not process {file_path}: {e}")
# Load other file types with default reader # Load other file types with default reader
other_docs = SimpleDirectoryReader( if custom_file_types:
docs_dir, # Parse custom file types from comma-separated string
recursive=True, code_extensions = [ext.strip() for ext in custom_file_types.split(",") if ext.strip()]
encoding="utf-8", # Ensure extensions start with a dot
required_exts=[".txt", ".md", ".docx"], code_extensions = [ext if ext.startswith(".") else f".{ext}" for ext in code_extensions]
).load_data(show_progress=True) else:
documents.extend(other_docs) # Use default supported file types
code_extensions = [
# Original document types
".txt",
".md",
".docx",
".pptx",
# Code files for Claude Code integration
".py",
".js",
".ts",
".jsx",
".tsx",
".java",
".cpp",
".c",
".h",
".hpp",
".cs",
".go",
".rs",
".rb",
".php",
".swift",
".kt",
".scala",
".r",
".sql",
".sh",
".bash",
".zsh",
".fish",
".ps1",
".bat",
# Config and markup files
".json",
".yaml",
".yml",
".xml",
".toml",
".ini",
".cfg",
".conf",
".html",
".css",
".scss",
".less",
".vue",
".svelte",
# Data science
".ipynb",
".R",
".py",
".jl",
]
# Try to load other file types, but don't fail if none are found
try:
# Create a custom file filter function using our PathSpec
def file_filter(file_path: str) -> bool:
"""Return True if file should be included (not excluded)"""
try:
docs_path_obj = Path(docs_dir)
file_path_obj = Path(file_path)
relative_path = file_path_obj.relative_to(docs_path_obj)
return not self._should_exclude_file(relative_path, gitignore_matches)
except (ValueError, OSError):
return True # Include files that can't be processed
other_docs = SimpleDirectoryReader(
docs_dir,
recursive=True,
encoding="utf-8",
required_exts=code_extensions,
file_extractor={}, # Use default extractors
filename_as_id=True,
).load_data(show_progress=True)
# Filter documents after loading based on gitignore rules
filtered_docs = []
for doc in other_docs:
file_path = doc.metadata.get("file_path", "")
if file_filter(file_path):
filtered_docs.append(doc)
documents.extend(filtered_docs)
except ValueError as e:
if "No files found" in str(e):
print("No additional files found for other supported types.")
else:
raise e
all_texts = [] all_texts = []
# Define code file extensions for intelligent chunking
code_file_exts = {
".py",
".js",
".ts",
".jsx",
".tsx",
".java",
".cpp",
".c",
".h",
".hpp",
".cs",
".go",
".rs",
".rb",
".php",
".swift",
".kt",
".scala",
".r",
".sql",
".sh",
".bash",
".zsh",
".fish",
".ps1",
".bat",
".json",
".yaml",
".yml",
".xml",
".toml",
".ini",
".cfg",
".conf",
".html",
".css",
".scss",
".less",
".vue",
".svelte",
".ipynb",
".R",
".jl",
}
for doc in documents: for doc in documents:
nodes = self.node_parser.get_nodes_from_documents([doc]) # Check if this is a code file based on source path
source_path = doc.metadata.get("source", "")
is_code_file = any(source_path.endswith(ext) for ext in code_file_exts)
# Use appropriate parser based on file type
parser = self.code_parser if is_code_file else self.node_parser
nodes = parser.get_nodes_from_documents([doc])
for node in nodes: for node in nodes:
all_texts.append(node.get_content()) all_texts.append(node.get_content())
@@ -215,15 +524,23 @@ Examples:
async def build_index(self, args): async def build_index(self, args):
docs_dir = args.docs docs_dir = args.docs
index_name = args.index_name # Use current directory name if index_name not provided
if args.index_name:
index_name = args.index_name
else:
index_name = Path.cwd().name
print(f"Using current directory name as index: '{index_name}'")
index_dir = self.indexes_dir / index_name index_dir = self.indexes_dir / index_name
index_path = self.get_index_path(index_name) index_path = self.get_index_path(index_name)
print(f"📂 Indexing: {Path(docs_dir).resolve()}")
if index_dir.exists() and not args.force: if index_dir.exists() and not args.force:
print(f"Index '{index_name}' already exists. Use --force to rebuild.") print(f"Index '{index_name}' already exists. Use --force to rebuild.")
return return
all_texts = self.load_documents(docs_dir) all_texts = self.load_documents(docs_dir, args.file_types)
if not all_texts: if not all_texts:
print("No documents found") print("No documents found")
return return
@@ -235,6 +552,7 @@ Examples:
builder = LeannBuilder( builder = LeannBuilder(
backend_name=args.backend, backend_name=args.backend,
embedding_model=args.embedding_model, embedding_model=args.embedding_model,
embedding_mode=args.embedding_mode,
graph_degree=args.graph_degree, graph_degree=args.graph_degree,
complexity=args.complexity, complexity=args.complexity,
is_compact=args.compact, is_compact=args.compact,
@@ -248,6 +566,9 @@ Examples:
builder.build_index(index_path) builder.build_index(index_path)
print(f"Index built at {index_path}") print(f"Index built at {index_path}")
# Register this project directory in global registry
self.register_project_dir()
async def search_documents(self, args): async def search_documents(self, args):
index_name = args.index_name index_name = args.index_name
query = args.query query = args.query
@@ -308,6 +629,11 @@ Examples:
if not user_input: if not user_input:
continue continue
# Prepare LLM kwargs with thinking budget if specified
llm_kwargs = {}
if args.thinking_budget:
llm_kwargs["thinking_budget"] = args.thinking_budget
response = chat.ask( response = chat.ask(
user_input, user_input,
top_k=args.top_k, top_k=args.top_k,
@@ -316,11 +642,17 @@ Examples:
prune_ratio=args.prune_ratio, prune_ratio=args.prune_ratio,
recompute_embeddings=args.recompute_embeddings, recompute_embeddings=args.recompute_embeddings,
pruning_strategy=args.pruning_strategy, pruning_strategy=args.pruning_strategy,
llm_kwargs=llm_kwargs,
) )
print(f"LEANN: {response}") print(f"LEANN: {response}")
else: else:
query = input("Enter your question: ").strip() query = input("Enter your question: ").strip()
if query: if query:
# Prepare LLM kwargs with thinking budget if specified
llm_kwargs = {}
if args.thinking_budget:
llm_kwargs["thinking_budget"] = args.thinking_budget
response = chat.ask( response = chat.ask(
query, query,
top_k=args.top_k, top_k=args.top_k,
@@ -329,6 +661,7 @@ Examples:
prune_ratio=args.prune_ratio, prune_ratio=args.prune_ratio,
recompute_embeddings=args.recompute_embeddings, recompute_embeddings=args.recompute_embeddings,
pruning_strategy=args.pruning_strategy, pruning_strategy=args.pruning_strategy,
llm_kwargs=llm_kwargs,
) )
print(f"LEANN: {response}") print(f"LEANN: {response}")

View File

@@ -6,6 +6,7 @@ Preserves all optimization parameters to ensure performance
import logging import logging
import os import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any from typing import Any
import numpy as np import numpy as np
@@ -35,7 +36,7 @@ def compute_embeddings(
Args: Args:
texts: List of texts to compute embeddings for texts: List of texts to compute embeddings for
model_name: Model name model_name: Model name
mode: Computation mode ('sentence-transformers', 'openai', 'mlx') mode: Computation mode ('sentence-transformers', 'openai', 'mlx', 'ollama')
is_build: Whether this is a build operation (shows progress bar) is_build: Whether this is a build operation (shows progress bar)
batch_size: Batch size for processing batch_size: Batch size for processing
adaptive_optimization: Whether to use adaptive optimization based on batch size adaptive_optimization: Whether to use adaptive optimization based on batch size
@@ -55,6 +56,8 @@ def compute_embeddings(
return compute_embeddings_openai(texts, model_name) return compute_embeddings_openai(texts, model_name)
elif mode == "mlx": elif mode == "mlx":
return compute_embeddings_mlx(texts, model_name) return compute_embeddings_mlx(texts, model_name)
elif mode == "ollama":
return compute_embeddings_ollama(texts, model_name, is_build=is_build)
else: else:
raise ValueError(f"Unsupported embedding mode: {mode}") raise ValueError(f"Unsupported embedding mode: {mode}")
@@ -365,3 +368,262 @@ def compute_embeddings_mlx(chunks: list[str], model_name: str, batch_size: int =
# Stack numpy arrays # Stack numpy arrays
return np.stack(all_embeddings) return np.stack(all_embeddings)
def compute_embeddings_ollama(
texts: list[str], model_name: str, is_build: bool = False, host: str = "http://localhost:11434"
) -> np.ndarray:
"""
Compute embeddings using Ollama API.
Args:
texts: List of texts to compute embeddings for
model_name: Ollama model name (e.g., "nomic-embed-text", "mxbai-embed-large")
is_build: Whether this is a build operation (shows progress bar)
host: Ollama host URL (default: http://localhost:11434)
Returns:
Normalized embeddings array, shape: (len(texts), embedding_dim)
"""
try:
import requests
except ImportError:
raise ImportError(
"The 'requests' library is required for Ollama embeddings. Install with: uv pip install requests"
)
if not texts:
raise ValueError("Cannot compute embeddings for empty text list")
logger.info(
f"Computing embeddings for {len(texts)} texts using Ollama API, model: '{model_name}'"
)
# Check if Ollama is running
try:
response = requests.get(f"{host}/api/version", timeout=5)
response.raise_for_status()
except requests.exceptions.ConnectionError:
error_msg = (
f"❌ Could not connect to Ollama at {host}.\n\n"
"Please ensure Ollama is running:\n"
" • macOS/Linux: ollama serve\n"
" • Windows: Make sure Ollama is running in the system tray\n\n"
"Installation: https://ollama.com/download"
)
raise RuntimeError(error_msg)
except Exception as e:
raise RuntimeError(f"Unexpected error connecting to Ollama: {e}")
# Check if model exists and provide helpful suggestions
try:
response = requests.get(f"{host}/api/tags", timeout=5)
response.raise_for_status()
models = response.json()
model_names = [model["name"] for model in models.get("models", [])]
# Filter for embedding models (models that support embeddings)
embedding_models = []
suggested_embedding_models = [
"nomic-embed-text",
"mxbai-embed-large",
"bge-m3",
"all-minilm",
"snowflake-arctic-embed",
]
for model in model_names:
# Check if it's an embedding model (by name patterns or known models)
base_name = model.split(":")[0]
if any(emb in base_name for emb in ["embed", "bge", "minilm", "e5"]):
embedding_models.append(model)
# Check if model exists (handle versioned names)
model_found = any(
model_name == name.split(":")[0] or model_name == name for name in model_names
)
if not model_found:
error_msg = f"❌ Model '{model_name}' not found in local Ollama.\n\n"
# Suggest pulling the model
error_msg += "📦 To install this embedding model:\n"
error_msg += f" ollama pull {model_name}\n\n"
# Show available embedding models
if embedding_models:
error_msg += "✅ Available embedding models:\n"
for model in embedding_models[:5]:
error_msg += f"{model}\n"
if len(embedding_models) > 5:
error_msg += f" ... and {len(embedding_models) - 5} more\n"
else:
error_msg += "💡 Popular embedding models to install:\n"
for model in suggested_embedding_models[:3]:
error_msg += f" • ollama pull {model}\n"
error_msg += "\n📚 Browse more: https://ollama.com/library"
raise ValueError(error_msg)
# Verify the model supports embeddings by testing it
try:
test_response = requests.post(
f"{host}/api/embeddings", json={"model": model_name, "prompt": "test"}, timeout=10
)
if test_response.status_code != 200:
error_msg = (
f"⚠️ Model '{model_name}' exists but may not support embeddings.\n\n"
f"Please use an embedding model like:\n"
)
for model in suggested_embedding_models[:3]:
error_msg += f"{model}\n"
raise ValueError(error_msg)
except requests.exceptions.RequestException:
# If test fails, continue anyway - model might still work
pass
except requests.exceptions.RequestException as e:
logger.warning(f"Could not verify model existence: {e}")
# Process embeddings with optimized concurrent processing
import requests
def get_single_embedding(text_idx_tuple):
"""Helper function to get embedding for a single text."""
text, idx = text_idx_tuple
max_retries = 3
retry_count = 0
# Truncate very long texts to avoid API issues
truncated_text = text[:8000] if len(text) > 8000 else text
while retry_count < max_retries:
try:
response = requests.post(
f"{host}/api/embeddings",
json={"model": model_name, "prompt": truncated_text},
timeout=30,
)
response.raise_for_status()
result = response.json()
embedding = result.get("embedding")
if embedding is None:
raise ValueError(f"No embedding returned for text {idx}")
return idx, embedding
except requests.exceptions.Timeout:
retry_count += 1
if retry_count >= max_retries:
logger.warning(f"Timeout for text {idx} after {max_retries} retries")
return idx, None
except Exception as e:
if retry_count >= max_retries - 1:
logger.error(f"Failed to get embedding for text {idx}: {e}")
return idx, None
retry_count += 1
return idx, None
# Determine if we should use concurrent processing
use_concurrent = (
len(texts) > 5 and not is_build
) # Don't use concurrent in build mode to avoid overwhelming
max_workers = min(4, len(texts)) # Limit concurrent requests to avoid overwhelming Ollama
all_embeddings = [None] * len(texts) # Pre-allocate list to maintain order
failed_indices = []
if use_concurrent:
logger.info(
f"Using concurrent processing with {max_workers} workers for {len(texts)} texts"
)
with ThreadPoolExecutor(max_workers=max_workers) as executor:
# Submit all tasks
future_to_idx = {
executor.submit(get_single_embedding, (text, idx)): idx
for idx, text in enumerate(texts)
}
# Add progress bar for concurrent processing
try:
if is_build or len(texts) > 10:
from tqdm import tqdm
futures_iterator = tqdm(
as_completed(future_to_idx),
total=len(texts),
desc="Computing Ollama embeddings",
)
else:
futures_iterator = as_completed(future_to_idx)
except ImportError:
futures_iterator = as_completed(future_to_idx)
# Collect results as they complete
for future in futures_iterator:
try:
idx, embedding = future.result()
if embedding is not None:
all_embeddings[idx] = embedding
else:
failed_indices.append(idx)
except Exception as e:
idx = future_to_idx[future]
logger.error(f"Exception for text {idx}: {e}")
failed_indices.append(idx)
else:
# Sequential processing with progress bar
show_progress = is_build or len(texts) > 10
try:
if show_progress:
from tqdm import tqdm
iterator = tqdm(
enumerate(texts), total=len(texts), desc="Computing Ollama embeddings"
)
else:
iterator = enumerate(texts)
except ImportError:
iterator = enumerate(texts)
for idx, text in iterator:
result_idx, embedding = get_single_embedding((text, idx))
if embedding is not None:
all_embeddings[idx] = embedding
else:
failed_indices.append(idx)
# Handle failed embeddings
if failed_indices:
if len(failed_indices) == len(texts):
raise RuntimeError("Failed to compute any embeddings")
logger.warning(f"Failed to compute embeddings for {len(failed_indices)}/{len(texts)} texts")
# Use zero embeddings as fallback for failed ones
valid_embedding = next((e for e in all_embeddings if e is not None), None)
if valid_embedding:
embedding_dim = len(valid_embedding)
for idx in failed_indices:
all_embeddings[idx] = [0.0] * embedding_dim
# Remove None values and convert to numpy array
all_embeddings = [e for e in all_embeddings if e is not None]
# Convert to numpy array and normalize
embeddings = np.array(all_embeddings, dtype=np.float32)
# Normalize embeddings (L2 normalization)
norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
embeddings = embeddings / (norms + 1e-8) # Add small epsilon to avoid division by zero
logger.info(f"Generated {len(embeddings)} embeddings, dimension: {embeddings.shape[1]}")
return embeddings

View File

@@ -6,6 +6,7 @@ import subprocess
import sys import sys
import time import time
from pathlib import Path from pathlib import Path
from typing import Optional
import psutil import psutil
@@ -182,8 +183,8 @@ class EmbeddingServerManager:
e.g., "leann_backend_diskann.embedding_server" e.g., "leann_backend_diskann.embedding_server"
""" """
self.backend_module_name = backend_module_name self.backend_module_name = backend_module_name
self.server_process: subprocess.Popen | None = None self.server_process: Optional[subprocess.Popen] = None
self.server_port: int | None = None self.server_port: Optional[int] = None
self._atexit_registered = False self._atexit_registered = False
def start_server( def start_server(

View File

@@ -1,5 +1,5 @@
from abc import ABC, abstractmethod from abc import ABC, abstractmethod
from typing import Any, Literal from typing import Any, Literal, Union
import numpy as np import numpy as np
@@ -34,7 +34,9 @@ class LeannBackendSearcherInterface(ABC):
pass pass
@abstractmethod @abstractmethod
def _ensure_server_running(self, passages_source_file: str, port: int | None, **kwargs) -> int: def _ensure_server_running(
self, passages_source_file: str, port: Union[int, None], **kwargs
) -> int:
"""Ensure server is running""" """Ensure server is running"""
pass pass
@@ -48,7 +50,7 @@ class LeannBackendSearcherInterface(ABC):
prune_ratio: float = 0.0, prune_ratio: float = 0.0,
recompute_embeddings: bool = False, recompute_embeddings: bool = False,
pruning_strategy: Literal["global", "local", "proportional"] = "global", pruning_strategy: Literal["global", "local", "proportional"] = "global",
zmq_port: int | None = None, zmq_port: Union[int, None] = None,
**kwargs, **kwargs,
) -> dict[str, Any]: ) -> dict[str, Any]:
"""Search for nearest neighbors """Search for nearest neighbors
@@ -74,7 +76,7 @@ class LeannBackendSearcherInterface(ABC):
self, self,
query: str, query: str,
use_server_if_available: bool = True, use_server_if_available: bool = True,
zmq_port: int | None = None, zmq_port: Union[int, None] = None,
) -> np.ndarray: ) -> np.ndarray:
"""Compute embedding for a query string """Compute embedding for a query string

View File

@@ -0,0 +1,176 @@
#!/usr/bin/env python3
import json
import subprocess
import sys
def handle_request(request):
if request.get("method") == "initialize":
return {
"jsonrpc": "2.0",
"id": request.get("id"),
"result": {
"capabilities": {"tools": {}},
"protocolVersion": "2024-11-05",
"serverInfo": {"name": "leann-mcp", "version": "1.0.0"},
},
}
elif request.get("method") == "tools/list":
return {
"jsonrpc": "2.0",
"id": request.get("id"),
"result": {
"tools": [
{
"name": "leann_search",
"description": """🔍 Search code using natural language - like having a coding assistant who knows your entire codebase!
🎯 **Perfect for**:
- "How does authentication work?" → finds auth-related code
- "Error handling patterns" → locates try-catch blocks and error logic
- "Database connection setup" → finds DB initialization code
- "API endpoint definitions" → locates route handlers
- "Configuration management" → finds config files and usage
💡 **Pro tip**: Use this before making any changes to understand existing patterns and conventions.""",
"inputSchema": {
"type": "object",
"properties": {
"index_name": {
"type": "string",
"description": "Name of the LEANN index to search. Use 'leann_list' first to see available indexes.",
},
"query": {
"type": "string",
"description": "Search query - can be natural language (e.g., 'how to handle errors') or technical terms (e.g., 'async function definition')",
},
"top_k": {
"type": "integer",
"default": 5,
"minimum": 1,
"maximum": 20,
"description": "Number of search results to return. Use 5-10 for focused results, 15-20 for comprehensive exploration.",
},
"complexity": {
"type": "integer",
"default": 32,
"minimum": 16,
"maximum": 128,
"description": "Search complexity level. Use 16-32 for fast searches (recommended), 64+ for higher precision when needed.",
},
},
"required": ["index_name", "query"],
},
},
{
"name": "leann_status",
"description": "📊 Check the health and stats of your code indexes - like a medical checkup for your codebase knowledge!",
"inputSchema": {
"type": "object",
"properties": {
"index_name": {
"type": "string",
"description": "Optional: Name of specific index to check. If not provided, shows status of all indexes.",
}
},
},
},
{
"name": "leann_list",
"description": "📋 Show all your indexed codebases - your personal code library! Use this to see what's available for search.",
"inputSchema": {"type": "object", "properties": {}},
},
]
},
}
elif request.get("method") == "tools/call":
tool_name = request["params"]["name"]
args = request["params"].get("arguments", {})
try:
if tool_name == "leann_search":
# Validate required parameters
if not args.get("index_name") or not args.get("query"):
return {
"jsonrpc": "2.0",
"id": request.get("id"),
"result": {
"content": [
{
"type": "text",
"text": "Error: Both index_name and query are required",
}
]
},
}
# Build simplified command
cmd = [
"leann",
"search",
args["index_name"],
args["query"],
f"--top-k={args.get('top_k', 5)}",
f"--complexity={args.get('complexity', 32)}",
]
result = subprocess.run(cmd, capture_output=True, text=True)
elif tool_name == "leann_status":
if args.get("index_name"):
# Check specific index status - for now, we'll use leann list and filter
result = subprocess.run(["leann", "list"], capture_output=True, text=True)
# We could enhance this to show more detailed status per index
else:
# Show all indexes status
result = subprocess.run(["leann", "list"], capture_output=True, text=True)
elif tool_name == "leann_list":
result = subprocess.run(["leann", "list"], capture_output=True, text=True)
return {
"jsonrpc": "2.0",
"id": request.get("id"),
"result": {
"content": [
{
"type": "text",
"text": result.stdout
if result.returncode == 0
else f"Error: {result.stderr}",
}
]
},
}
except Exception as e:
return {
"jsonrpc": "2.0",
"id": request.get("id"),
"error": {"code": -1, "message": str(e)},
}
def main():
for line in sys.stdin:
try:
request = json.loads(line.strip())
response = handle_request(request)
if response:
print(json.dumps(response))
sys.stdout.flush()
except Exception as e:
error_response = {
"jsonrpc": "2.0",
"id": None,
"error": {"code": -1, "message": str(e)},
}
print(json.dumps(error_response))
sys.stdout.flush()
if __name__ == "__main__":
main()

View File

@@ -1,7 +1,7 @@
import json import json
from abc import ABC, abstractmethod from abc import ABC, abstractmethod
from pathlib import Path from pathlib import Path
from typing import Any, Literal from typing import Any, Literal, Optional
import numpy as np import numpy as np
@@ -169,7 +169,7 @@ class BaseSearcher(LeannBackendSearcherInterface, ABC):
prune_ratio: float = 0.0, prune_ratio: float = 0.0,
recompute_embeddings: bool = False, recompute_embeddings: bool = False,
pruning_strategy: Literal["global", "local", "proportional"] = "global", pruning_strategy: Literal["global", "local", "proportional"] = "global",
zmq_port: int | None = None, zmq_port: Optional[int] = None,
**kwargs, **kwargs,
) -> dict[str, Any]: ) -> dict[str, Any]:
""" """

View File

@@ -0,0 +1,91 @@
# 🔥 LEANN Claude Code Integration
Transform your development workflow with intelligent code assistance using LEANN's semantic search directly in Claude Code.
## Prerequisites
**Step 1:** First, complete the basic LEANN installation following the [📦 Installation guide](../../README.md#installation) in the root README:
```bash
uv venv
source .venv/bin/activate
uv pip install leann
```
**Step 2:** Install LEANN globally for MCP integration:
```bash
uv tool install leann-core
```
This makes the `leann` command available system-wide, which `leann_mcp` requires.
## 🚀 Quick Setup
Add the LEANN MCP server to Claude Code:
```bash
claude mcp add leann-server -- leann_mcp
```
## 🛠️ Available Tools
Once connected, you'll have access to these powerful semantic search tools in Claude Code:
- **`leann_list`** - List all available indexes across your projects
- **`leann_search`** - Perform semantic searches across code and documents
- **`leann_ask`** - Ask natural language questions and get AI-powered answers from your codebase
## 🎯 Quick Start Example
```bash
# Build an index for your project (change to your actual path)
leann build my-project --docs ./
# Start Claude Code
claude
```
**Try this in Claude Code:**
```
Help me understand this codebase. List available indexes and search for authentication patterns.
```
<p align="center">
<img src="../../assets/claude_code_leann.png" alt="LEANN in Claude Code" width="80%">
</p>
## 🧠 How It Works
The integration consists of three key components working seamlessly together:
- **`leann`** - Core CLI tool for indexing and searching (installed globally via `uv tool install`)
- **`leann_mcp`** - MCP server that wraps `leann` commands for Claude Code integration
- **Claude Code** - Calls `leann_mcp`, which executes `leann` commands and returns intelligent results
## 📁 File Support
LEANN understands **30+ file types** including:
- **Programming**: Python, JavaScript, TypeScript, Java, Go, Rust, C++, C#
- **Data**: SQL, YAML, JSON, CSV, XML
- **Documentation**: Markdown, TXT, PDF
- **And many more!**
## 💾 Storage & Organization
- **Project indexes**: Stored in `.leann/` directory (just like `.git`)
- **Global registry**: Project tracking at `~/.leann/projects.json`
- **Multi-project support**: Switch between different codebases seamlessly
- **Portable**: Transfer indexes between machines with minimal overhead
## 🗑️ Uninstalling
To remove the LEANN MCP server from Claude Code:
```bash
claude mcp remove leann-server
```
To remove LEANN
```
uv pip uninstall leann leann-backend-hnsw leann-core
```

View File

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project] [project]
name = "leann" name = "leann"
version = "0.2.1" version = "0.2.7"
description = "LEANN - The smallest vector index in the world. RAG Everything with LEANN!" description = "LEANN - The smallest vector index in the world. RAG Everything with LEANN!"
readme = "README.md" readme = "README.md"
requires-python = ">=3.9" requires-python = ">=3.9"

View File

@@ -32,7 +32,7 @@ dependencies = [
"pypdfium2>=4.30.0", "pypdfium2>=4.30.0",
# LlamaIndex core and readers - updated versions # LlamaIndex core and readers - updated versions
"llama-index>=0.12.44", "llama-index>=0.12.44",
"llama-index-readers-file>=0.4.0", # Essential for PDF parsing "llama-index-readers-file>=0.4.0", # Essential for PDF parsing
# "llama-index-readers-docling", # Requires Python >= 3.10 # "llama-index-readers-docling", # Requires Python >= 3.10
# "llama-index-node-parser-docling", # Requires Python >= 3.10 # "llama-index-node-parser-docling", # Requires Python >= 3.10
"llama-index-vector-stores-faiss>=0.4.0", "llama-index-vector-stores-faiss>=0.4.0",
@@ -40,9 +40,12 @@ dependencies = [
# Other dependencies # Other dependencies
"ipykernel==6.29.5", "ipykernel==6.29.5",
"msgpack>=1.1.1", "msgpack>=1.1.1",
"mlx>=0.26.3; sys_platform == 'darwin'", "mlx>=0.26.3; sys_platform == 'darwin' and platform_machine == 'arm64'",
"mlx-lm>=0.26.0; sys_platform == 'darwin'", "mlx-lm>=0.26.0; sys_platform == 'darwin' and platform_machine == 'arm64'",
"psutil>=5.8.0", "psutil>=5.8.0",
"pathspec>=0.12.1",
"nbconvert>=7.16.6",
"gitignore-parser>=0.1.12",
] ]
[project.optional-dependencies] [project.optional-dependencies]
@@ -88,7 +91,7 @@ leann-backend-diskann = { path = "packages/leann-backend-diskann", editable = tr
leann-backend-hnsw = { path = "packages/leann-backend-hnsw", editable = true } leann-backend-hnsw = { path = "packages/leann-backend-hnsw", editable = true }
[tool.ruff] [tool.ruff]
target-version = "py310" target-version = "py39"
line-length = 100 line-length = 100
extend-exclude = [ extend-exclude = [
"third_party", "third_party",

7318
uv.lock generated
View File

File diff suppressed because it is too large Load Diff