Merge branch 'main' into feature/graph-partition-support

fix: handle non-daemon threads blocking process exit
The root cause was pytest-timeout creating non-daemon threads that prevented the Python process from exiting, even after all tests completed. Fixes: 1. Configure pytest-timeout to use 'thread' method instead of default - Avoids creating problematic non-daemon threads 2. Add aggressive thread cleanup in conftest.py - Convert pytest-timeout threads to daemon threads - Force exit with os._exit(0) in CI if non-daemon threads remain 3. Enhanced cleanup in both global_test_cleanup and pytest_sessionfinish - Detect and handle stuck threads - Clear diagnostics about what's blocking exit The issue was that even though tests finished in 51 seconds, a non-daemon thread 'pytest_timeout tests/test_readme_examples.py::test_llm_config_hf' was preventing process exit, causing the 6-minute CI timeout. This should finally solve the hanging CI problem.
2025-08-11 01:46:31 -07:00 · 2025-08-08 23:20:52 -07:00 · 2025-08-08 22:48:40 -07:00 · 2025-08-08 21:27:04 -07:00 · 2025-08-08 21:25:58 -07:00 · 2025-08-08 18:55:50 -07:00
32 changed files with 1679 additions and 1241 deletions
--- a/.github/workflows/build-and-publish.yml
+++ b/.github/workflows/build-and-publish.yml
@@ -6,7 +6,15 @@ on:
  pull_request:
    branches: [ main ]
  workflow_dispatch:
+    inputs:
+      debug_enabled:
+        type: boolean
+        description: 'Run with tmate debugging enabled (SSH access to runner)'
+        required: false
+        default: false

 jobs:
  build:
    uses: ./.github/workflows/build-reusable.yml
+    with:
+      debug_enabled: ${{ github.event_name == 'workflow_dispatch' && inputs.debug_enabled || false }}
--- a/.github/workflows/build-reusable.yml
+++ b/.github/workflows/build-reusable.yml
@@ -8,6 +8,11 @@ on:
        required: false
        type: string
        default: ''
+      debug_enabled:
+        description: 'Enable tmate debugging session for troubleshooting'
+        required: false
+        type: boolean
+        default: false

 jobs:
  lint:
@@ -28,7 +33,7 @@ jobs:

      - name: Install ruff
        run: |
-          uv tool install ruff
+          uv tool install ruff==0.12.7

      - name: Run ruff check
        run: |
@@ -54,36 +59,16 @@ jobs:
            python: '3.12'
          - os: ubuntu-22.04
            python: '3.13'
-          - os: macos-14
+          - os: macos-latest
            python: '3.9'
-          - os: macos-14
+          - os: macos-latest
            python: '3.10'
-          - os: macos-14
+          - os: macos-latest
            python: '3.11'
-          - os: macos-14
+          - os: macos-latest
            python: '3.12'
-          - os: macos-14
+          - os: macos-latest
            python: '3.13'
-          - os: macos-15
-            python: '3.9'
-          - os: macos-15
-            python: '3.10'
-          - os: macos-15
-            python: '3.11'
-          - os: macos-15
-            python: '3.12'
-          - os: macos-15
-            python: '3.13'
-          - os: macos-13
-            python: '3.9'
-          - os: macos-13
-            python: '3.10'
-          - os: macos-13
-            python: '3.11'
-          - os: macos-13
-            python: '3.12'
-          # Note: macos-13 + Python 3.13 excluded due to PyTorch compatibility
-          # (PyTorch 2.5+ supports Python 3.13 but not Intel Mac x86_64)
    runs-on: ${{ matrix.os }}

    steps:
@@ -129,70 +114,41 @@ jobs:
            uv pip install --system delocate
          fi

-      - name: Set macOS environment variables
-        if: runner.os == 'macOS'
-        run: |
-          # Use brew --prefix to automatically detect Homebrew installation path
-          HOMEBREW_PREFIX=$(brew --prefix)
-          echo "HOMEBREW_PREFIX=${HOMEBREW_PREFIX}" >> $GITHUB_ENV
-          echo "OpenMP_ROOT=${HOMEBREW_PREFIX}/opt/libomp" >> $GITHUB_ENV
-
-          # Set CMAKE_PREFIX_PATH to let CMake find all packages automatically
-          echo "CMAKE_PREFIX_PATH=${HOMEBREW_PREFIX}" >> $GITHUB_ENV
-
-          # Set compiler flags for OpenMP (required for both backends)
-          echo "LDFLAGS=-L${HOMEBREW_PREFIX}/opt/libomp/lib" >> $GITHUB_ENV
-          echo "CPPFLAGS=-I${HOMEBREW_PREFIX}/opt/libomp/include" >> $GITHUB_ENV
-
      - name: Build packages
        run: |
-          # Build core (platform independent)
+          # Build core (platform independent) on all platforms for consistency
          cd packages/leann-core
          uv build
          cd ../..

          # Build HNSW backend
          cd packages/leann-backend-hnsw
-          if [[ "${{ matrix.os }}" == macos-* ]]; then
-            # Use system clang for better compatibility
+          if [ "${{ matrix.os }}" == "macos-latest" ]; then
+            # Use system clang instead of homebrew LLVM for better compatibility
            export CC=clang
            export CXX=clang++
-            # Homebrew libraries on each macOS version require matching minimum version
-            if [[ "${{ matrix.os }}" == "macos-13" ]]; then
-              export MACOSX_DEPLOYMENT_TARGET=13.0
-            elif [[ "${{ matrix.os }}" == "macos-14" ]]; then
-              export MACOSX_DEPLOYMENT_TARGET=14.0
-            elif [[ "${{ matrix.os }}" == "macos-15" ]]; then
-              export MACOSX_DEPLOYMENT_TARGET=15.0
-            fi
-            uv build --wheel --python ${{ matrix.python }} --find-links ${GITHUB_WORKSPACE}/packages/leann-core/dist
+            export MACOSX_DEPLOYMENT_TARGET=11.0
+            uv build --wheel --python python
          else
-            uv build --wheel --python ${{ matrix.python }} --find-links ${GITHUB_WORKSPACE}/packages/leann-core/dist
+            uv build --wheel --python python
          fi
          cd ../..

          # Build DiskANN backend
          cd packages/leann-backend-diskann
-          if [[ "${{ matrix.os }}" == macos-* ]]; then
-            # Use system clang for better compatibility
+          if [ "${{ matrix.os }}" == "macos-latest" ]; then
+            # Use system clang instead of homebrew LLVM for better compatibility
            export CC=clang
            export CXX=clang++
-            # DiskANN requires macOS 13.3+ for sgesdd_ LAPACK function
-            # But Homebrew libraries on each macOS version require matching minimum version
-            if [[ "${{ matrix.os }}" == "macos-13" ]]; then
-              export MACOSX_DEPLOYMENT_TARGET=13.3
-            elif [[ "${{ matrix.os }}" == "macos-14" ]]; then
-              export MACOSX_DEPLOYMENT_TARGET=14.0
-            elif [[ "${{ matrix.os }}" == "macos-15" ]]; then
-              export MACOSX_DEPLOYMENT_TARGET=15.0
-            fi
-            uv build --wheel --python ${{ matrix.python }} --find-links ${GITHUB_WORKSPACE}/packages/leann-core/dist
+            # sgesdd_ is only available on macOS 13.3+
+            export MACOSX_DEPLOYMENT_TARGET=13.3
+            uv build --wheel --python python
          else
-            uv build --wheel --python ${{ matrix.python }} --find-links ${GITHUB_WORKSPACE}/packages/leann-core/dist
+            uv build --wheel --python python
          fi
          cd ../..

-          # Build meta package (platform independent)
+          # Build meta package (platform independent) on all platforms
          cd packages/leann
          uv build
          cd ../..
@@ -209,10 +165,15 @@ jobs:
          fi
          cd ../..

-          # Repair DiskANN wheel
+          # Repair DiskANN wheel - use show first to debug
          cd packages/leann-backend-diskann
          if [ -d dist ]; then
+            echo "Checking DiskANN wheel contents before repair:"
+            unzip -l dist/*.whl | grep -E "\.so|\.pyd|_diskannpy" || echo "No .so files found"
+            auditwheel show dist/*.whl || echo "auditwheel show failed"
            auditwheel repair dist/*.whl -w dist_repaired
+            echo "Checking DiskANN wheel contents after repair:"
+            unzip -l dist_repaired/*.whl | grep -E "\.so|\.pyd|_diskannpy" || echo "No .so files found after repair"
            rm -rf dist
            mv dist_repaired dist
          fi
@@ -221,24 +182,10 @@ jobs:
      - name: Repair wheels (macOS)
        if: runner.os == 'macOS'
        run: |
-          # Determine deployment target based on runner OS
-          # Must match the Homebrew libraries for each macOS version
-          if [[ "${{ matrix.os }}" == "macos-13" ]]; then
-            HNSW_TARGET="13.0"
-            DISKANN_TARGET="13.3"
-          elif [[ "${{ matrix.os }}" == "macos-14" ]]; then
-            HNSW_TARGET="14.0"
-            DISKANN_TARGET="14.0"
-          elif [[ "${{ matrix.os }}" == "macos-15" ]]; then
-            HNSW_TARGET="15.0"
-            DISKANN_TARGET="15.0"
-          fi
-
          # Repair HNSW wheel
          cd packages/leann-backend-hnsw
          if [ -d dist ]; then
-            export MACOSX_DEPLOYMENT_TARGET=$HNSW_TARGET
-            delocate-wheel -w dist_repaired -v --require-target-macos-version $HNSW_TARGET dist/*.whl
+            delocate-wheel -w dist_repaired -v dist/*.whl
            rm -rf dist
            mv dist_repaired dist
          fi
@@ -247,8 +194,7 @@ jobs:
          # Repair DiskANN wheel
          cd packages/leann-backend-diskann
          if [ -d dist ]; then
-            export MACOSX_DEPLOYMENT_TARGET=$DISKANN_TARGET
-            delocate-wheel -w dist_repaired -v --require-target-macos-version $DISKANN_TARGET dist/*.whl
+            delocate-wheel -w dist_repaired -v dist/*.whl
            rm -rf dist
            mv dist_repaired dist
          fi
@@ -259,34 +205,242 @@ jobs:
          echo "📦 Built packages:"
          find packages/*/dist -name "*.whl" -o -name "*.tar.gz" | sort

-
      - name: Install built packages for testing
        run: |
          # Create a virtual environment with the correct Python version
-          uv venv --python ${{ matrix.python }}
+          uv venv --python python${{ matrix.python }}
          source .venv/bin/activate || source .venv/Scripts/activate

-          # Install packages using --find-links to prioritize local builds
-          uv pip install --find-links packages/leann-core/dist --find-links packages/leann-backend-hnsw/dist --find-links packages/leann-backend-diskann/dist packages/leann-core/dist/*.whl || uv pip install --find-links packages/leann-core/dist packages/leann-core/dist/*.tar.gz
-          uv pip install --find-links packages/leann-core/dist packages/leann-backend-hnsw/dist/*.whl
-          uv pip install --find-links packages/leann-core/dist packages/leann-backend-diskann/dist/*.whl
-          uv pip install packages/leann/dist/*.whl || uv pip install packages/leann/dist/*.tar.gz
+          # Install the built wheels directly to ensure we use locally built packages
+          # Use only locally built wheels on all platforms for full consistency
+          FIND_LINKS="--find-links packages/leann-core/dist --find-links packages/leann/dist"
+          FIND_LINKS="$FIND_LINKS --find-links packages/leann-backend-hnsw/dist --find-links packages/leann-backend-diskann/dist"
+
+          uv pip install leann-core leann leann-backend-hnsw leann-backend-diskann \
+            $FIND_LINKS --force-reinstall

          # Install test dependencies using extras
          uv pip install -e ".[test]"

+          # Debug: Check if _diskannpy module is installed correctly
+          echo "Checking installed DiskANN module structure:"
+          python -c "import leann_backend_diskann; print('leann_backend_diskann location:', leann_backend_diskann.__file__)" || echo "Failed to import leann_backend_diskann"
+          python -c "from leann_backend_diskann import _diskannpy; print('_diskannpy imported successfully')" || echo "Failed to import _diskannpy"
+          ls -la $(python -c "import leann_backend_diskann; import os; print(os.path.dirname(leann_backend_diskann.__file__))" 2>/dev/null) 2>/dev/null || echo "Failed to list module directory"
+
+          # Extra debugging for Python 3.13
+          if [[ "${{ matrix.python }}" == "3.13" ]]; then
+            echo "=== Python 3.13 Debug Info ==="
+            echo "Python version details:"
+            python --version
+            python -c "import sys; print(f'sys.version_info: {sys.version_info}')"
+
+            echo "Pytest version:"
+            python -m pytest --version
+
+            echo "Testing basic pytest collection:"
+            if [[ "$RUNNER_OS" == "Linux" ]]; then
+              timeout --signal=INT 10 python -m pytest --collect-only tests/test_ci_minimal.py -v || echo "Collection timed out or failed"
+            else
+              # No timeout on macOS/Windows
+              python -m pytest --collect-only tests/test_ci_minimal.py -v || echo "Collection failed"
+            fi
+
+            echo "Testing single simple test:"
+            if [[ "$RUNNER_OS" == "Linux" ]]; then
+              timeout --signal=INT 10 python -m pytest tests/test_ci_minimal.py::test_package_imports --full-trace -v || echo "Simple test timed out or failed"
+            else
+              # No timeout on macOS/Windows
+              python -m pytest tests/test_ci_minimal.py::test_package_imports --full-trace -v || echo "Simple test failed"
+            fi
+          fi
+
+      # Enable tmate debugging session if requested
+      - name: Setup tmate session for debugging
+        if: ${{ inputs.debug_enabled }}
+        uses: mxschmitt/action-tmate@v3
+        with:
+          detached: true
+          timeout-minutes: 30
+          limit-access-to-actor: true
+
      - name: Run tests with pytest
+        # Timeout hierarchy:
+        # 1. Individual test timeout: 20s (see pyproject.toml markers)
+        # 2. Pytest session timeout: 300s (see pyproject.toml [tool.pytest.ini_options])
+        # 3. Outer shell timeout: 360s (300s + 60s buffer for cleanup)
+        # 4. GitHub Actions job timeout: 6 hours (default)
        env:
-          CI: true
+          CI: true  # Mark as CI environment to skip memory-intensive tests
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          HF_HUB_DISABLE_SYMLINKS: 1
          TOKENIZERS_PARALLELISM: false
-          PYTORCH_ENABLE_MPS_FALLBACK: 0
-          OMP_NUM_THREADS: 1
-          MKL_NUM_THREADS: 1
+          PYTORCH_ENABLE_MPS_FALLBACK: 0  # Disable MPS on macOS CI to avoid memory issues
+          OMP_NUM_THREADS: 1  # Disable OpenMP parallelism to avoid libomp crashes
+          MKL_NUM_THREADS: 1  # Single thread for MKL operations
        run: |
+          # Activate virtual environment
          source .venv/bin/activate || source .venv/Scripts/activate
-          pytest tests/ -v --tb=short
+
+          # Define comprehensive diagnostic function
+          diag() {
+            echo "===== COMPREHENSIVE DIAGNOSTICS BEGIN ====="
+            date
+            echo ""
+            echo "### Current Shell Info ###"
+            echo "Shell PID: $$"
+            echo "Shell PPID: $PPID"
+            echo "Current directory: $(pwd)"
+            echo ""
+
+            echo "### Process Tree (full) ###"
+            pstree -ap 2>/dev/null || ps auxf || true
+            echo ""
+
+            echo "### All Python/Pytest Processes ###"
+            ps -ef | grep -E 'python|pytest' | grep -v grep || true
+            echo ""
+
+            echo "### Embedding Server Processes ###"
+            ps -ef | grep -E 'embedding|zmq|diskann' | grep -v grep || true
+            echo ""
+
+            echo "### Network Listeners ###"
+            ss -ltnp 2>/dev/null || netstat -ltn 2>/dev/null || true
+            echo ""
+
+            echo "### Open File Descriptors (lsof) ###"
+            lsof -p $$ 2>/dev/null | head -20 || true
+            echo ""
+
+            echo "### Zombie Processes ###"
+            ps aux | grep '<defunct>' || echo "No zombie processes"
+            echo ""
+
+            echo "### Current Jobs ###"
+            jobs -l || true
+            echo ""
+
+            echo "### /proc/PID/fd for current shell ###"
+            ls -la /proc/$$/fd 2>/dev/null || true
+            echo ""
+
+            echo "===== COMPREHENSIVE DIAGNOSTICS END ====="
+          }
+
+                    # Enable verbose logging for debugging
+          export PYTHONUNBUFFERED=1
+          export PYTEST_CURRENT_TEST=1
+
+          # Run all tests with extensive logging
+          if [[ "$RUNNER_OS" == "Linux" ]]; then
+            echo "🚀 Starting Linux test execution with timeout..."
+            echo "Current time: $(date)"
+            echo "Shell PID: $$"
+            echo "Python: $(python --version)"
+            echo "Pytest: $(pytest --version)"
+
+            # Show environment variables for debugging
+            echo "📦 Environment variables:"
+            env | grep -E "PYTHON|PYTEST|CI|RUNNER" | sort
+
+            # Set trap for diagnostics
+            trap diag INT TERM EXIT
+
+            echo "📋 Pre-test diagnostics:"
+            ps -ef | grep -E 'python|pytest' | grep -v grep || echo "No python/pytest processes before test"
+
+            # Check for any listening ports before test
+            echo "🔌 Pre-test network state:"
+            ss -ltn 2>/dev/null | grep -E "555[0-9]|556[0-9]" || echo "No embedding server ports open"
+
+            # Set timeouts - outer must be larger than pytest's internal timeout
+            # IMPORTANT: Keep PYTEST_TIMEOUT_SEC in sync with pyproject.toml [tool.pytest.ini_options] timeout
+            PYTEST_TIMEOUT_SEC=${PYTEST_TIMEOUT_SEC:-300}  # Default 300s, matches pyproject.toml
+            BUFFER_SEC=${TIMEOUT_BUFFER_SEC:-60}  # Buffer for cleanup after pytest timeout
+            OUTER_TIMEOUT_SEC=${OUTER_TIMEOUT_SEC:-$((PYTEST_TIMEOUT_SEC + BUFFER_SEC))}
+
+            echo "⏰ Timeout configuration:"
+            echo "   - Pytest internal timeout: ${PYTEST_TIMEOUT_SEC}s (from pyproject.toml)"
+            echo "   - Cleanup buffer: ${BUFFER_SEC}s"
+            echo "   - Outer shell timeout: ${OUTER_TIMEOUT_SEC}s (${PYTEST_TIMEOUT_SEC}s + ${BUFFER_SEC}s buffer)"
+            echo "   - This ensures pytest can complete its own timeout handling and cleanup"
+
+            echo "🏃 Running pytest with ${OUTER_TIMEOUT_SEC}s outer timeout..."
+
+            # Export for inner shell
+            export PYTEST_TIMEOUT_SEC OUTER_TIMEOUT_SEC BUFFER_SEC
+
+            timeout --preserve-status --signal=INT --kill-after=10 ${OUTER_TIMEOUT_SEC} bash -c '
+              echo "⏱️ Pytest starting at: $(date)"
+              echo "Running command: pytest tests/ -vv --maxfail=3 --tb=short --capture=no"
+
+              # Run pytest with maximum verbosity and no output capture
+              pytest tests/ -vv --maxfail=3 --tb=short --capture=no --log-cli-level=DEBUG 2>&1 | tee pytest.log
+              PYTEST_EXIT=${PIPESTATUS[0]}
+
+              echo "✅ Pytest finished at: $(date) with exit code: $PYTEST_EXIT"
+              echo "Last 20 lines of pytest output:"
+              tail -20 pytest.log || true
+
+              # Immediately check for leftover processes
+              echo "🔍 Post-pytest process check:"
+              ps -ef | grep -E "python|pytest|embedding" | grep -v grep || echo "No leftover processes"
+
+              # Clean up any children before exit
+              echo "🧹 Cleaning up child processes..."
+              pkill -TERM -P $$ 2>/dev/null || true
+              sleep 0.5
+              pkill -KILL -P $$ 2>/dev/null || true
+
+              echo "📊 Final check before exit:"
+              ps -ef | grep -E "python|pytest|embedding" | grep -v grep || echo "All clean"
+
+              exit $PYTEST_EXIT
+            '
+
+            EXIT_CODE=$?
+            echo "🔚 Timeout command exited with code: $EXIT_CODE"
+
+                        if [ $EXIT_CODE -eq 124 ]; then
+              echo "⚠️ TIMEOUT TRIGGERED - Tests took more than ${OUTER_TIMEOUT_SEC} seconds!"
+              echo "📸 Capturing full diagnostics..."
+              diag
+
+              # Run diagnostic script if available
+              if [ -f scripts/diagnose_hang.sh ]; then
+                echo "🔍 Running diagnostic script..."
+                bash scripts/diagnose_hang.sh || true
+              fi
+
+              # More aggressive cleanup
+              echo "💀 Killing all Python processes owned by runner..."
+              pkill -9 -u runner python || true
+              pkill -9 -u runner pytest || true
+            elif [ $EXIT_CODE -ne 0 ]; then
+              echo "❌ Tests failed with exit code: $EXIT_CODE"
+            else
+              echo "✅ All tests passed!"
+            fi
+
+                        # Always show final state
+            echo "📍 Final state check:"
+            ps -ef | grep -E 'python|pytest|embedding' | grep -v grep || echo "No Python processes remaining"
+
+            exit $EXIT_CODE
+          else
+            # For macOS/Windows, run without GNU timeout
+            echo "🚀 Running tests on $RUNNER_OS..."
+            pytest tests/ -vv --maxfail=3 --tb=short --capture=no --log-cli-level=INFO
+          fi
+
+      # Provide tmate session on test failure for debugging
+      - name: Setup tmate session on failure
+        if: ${{ failure() && (inputs.debug_enabled || contains(github.event.head_commit.message, '[debug]')) }}
+        uses: mxschmitt/action-tmate@v3
+        with:
+          timeout-minutes: 30
+          limit-access-to-actor: true

      - name: Run sanity checks (optional)
        run: |
--- a/README.md
+++ b/README.md
@@ -3,11 +3,10 @@
 </p>

 <p align="center">
-  <img src="https://img.shields.io/badge/Python-3.9%20%7C%203.10%20%7C%203.11%20%7C%203.12%20%7C%203.13-blue.svg" alt="Python Versions">
-  <img src="https://github.com/yichuan-w/LEANN/actions/workflows/build-and-publish.yml/badge.svg" alt="CI Status">
-  <img src="https://img.shields.io/badge/Platform-Ubuntu%20%7C%20macOS%20(ARM64%2FIntel)-lightgrey" alt="Platform">
+  <img src="https://img.shields.io/badge/Python-3.9%2B-blue.svg" alt="Python 3.9+">
  <img src="https://img.shields.io/badge/License-MIT-green.svg" alt="MIT License">
-  <img src="https://img.shields.io/badge/MCP-Native%20Integration-blue" alt="MCP Integration">
+  <img src="https://img.shields.io/badge/Platform-Linux%20%7C%20macOS-lightgrey" alt="Platform">
+  <img src="https://img.shields.io/badge/MCP-Native%20Integration-blue?style=flat-square" alt="MCP Integration">
 </p>

 <h2 align="center" tabindex="-1" class="heading-element" dir="auto">
@@ -190,7 +189,7 @@ All RAG examples share these common parameters. **Interactive mode** is availabl
 --force-rebuild         # Force rebuild index even if it exists

 # Embedding Parameters
--embedding-model MODEL  # e.g., facebook/contriever, text-embedding-3-small, nomic-embed-text,mlx-community/Qwen3-Embedding-0.6B-8bit or nomic-embed-text
+--embedding-model MODEL  # e.g., facebook/contriever, text-embedding-3-small, nomic-embed-text, mlx-community/Qwen3-Embedding-0.6B-8bit or nomic-embed-text
 --embedding-mode MODE    # sentence-transformers, openai, mlx, or ollama

 # LLM Parameters (Text generation models)
@@ -468,7 +467,7 @@ leann --help
 ### Usage Examples

 ```bash
-# build from a specific directory, and my_docs is the index name(Here you can also build from multiple dict or multiple files)
+# build from a specific directory, and my_docs is the index name
 leann build my-docs --docs ./your_documents

 # Search your documents
@@ -611,9 +610,8 @@ We welcome more contributors! Feel free to open issues or submit PRs.

 This work is done at [**Berkeley Sky Computing Lab**](https://sky.cs.berkeley.edu/).

-## Star History
+---

-[![Star History Chart](https://api.star-history.com/svg?repos=yichuan-w/LEANN&type=Date)](https://www.star-history.com/#yichuan-w/LEANN&Date)
 <p align="center">
  <strong>⭐ Star us on GitHub if Leann is useful for your research or applications!</strong>
 </p>
--- a/packages/leann-backend-diskann/CMakeLists.txt
+++ b/packages/leann-backend-diskann/CMakeLists.txt
@@ -0,0 +1,8 @@
+# packages/leann-backend-diskann/CMakeLists.txt (simplified version)
+
+cmake_minimum_required(VERSION 3.20)
+project(leann_backend_diskann_wrapper)
+
+# Tell CMake to directly enter the DiskANN submodule and execute its own CMakeLists.txt
+# DiskANN will handle everything itself, including compiling Python bindings
+add_subdirectory(src/third_party/DiskANN)
--- a/packages/leann-backend-diskann/leann_backend_diskann/diskann_backend.py
+++ b/packages/leann-backend-diskann/leann_backend_diskann/diskann_backend.py
@@ -22,11 +22,6 @@ logger = logging.getLogger(__name__)
@contextlib.contextmanager
 def suppress_cpp_output_if_needed():
    """Suppress C++ stdout/stderr based on LEANN_LOG_LEVEL"""
-    # In CI we avoid fiddling with low-level file descriptors to prevent aborts
-    if os.getenv("CI") == "true":
-        yield
-        return
-
    log_level = os.getenv("LEANN_LOG_LEVEL", "WARNING").upper()

    # Only suppress if log level is WARNING or higher (ERROR, CRITICAL)
@@ -464,3 +459,25 @@ class DiskannSearcher(BaseSearcher):
        string_labels = [[str(int_label) for int_label in batch_labels] for batch_labels in labels]

        return {"labels": string_labels, "distances": distances}
+
+    def cleanup(self):
+        """Cleanup DiskANN-specific resources including C++ index."""
+        # Call parent cleanup first
+        super().cleanup()
+
+        # Delete the C++ index to trigger destructors
+        try:
+            if hasattr(self, "_index") and self._index is not None:
+                del self._index
+                self._index = None
+                self._current_zmq_port = None
+        except Exception:
+            pass
+
+        # Force garbage collection to ensure C++ objects are destroyed
+        try:
+            import gc
+
+            gc.collect()
+        except Exception:
+            pass
--- a/packages/leann-backend-diskann/leann_backend_diskann/diskann_embedding_server.py
+++ b/packages/leann-backend-diskann/leann_backend_diskann/diskann_embedding_server.py
@@ -100,12 +100,12 @@ def create_diskann_embedding_server(
        socket = context.socket(
            zmq.REP
        )  # REP socket for both BaseSearcher and DiskANN C++ REQ clients
+        socket.setsockopt(zmq.LINGER, 0)  # Don't block on close
        socket.bind(f"tcp://*:{zmq_port}")
        logger.info(f"DiskANN ZMQ REP server listening on port {zmq_port}")

-        socket.setsockopt(zmq.RCVTIMEO, 1000)
-        socket.setsockopt(zmq.SNDTIMEO, 1000)
-        socket.setsockopt(zmq.LINGER, 0)
+        socket.setsockopt(zmq.RCVTIMEO, 300000)
+        socket.setsockopt(zmq.SNDTIMEO, 300000)

        while True:
            try:
@@ -222,217 +222,30 @@ def create_diskann_embedding_server(
                traceback.print_exc()
                raise

-    def zmq_server_thread_with_shutdown(shutdown_event):
-        """ZMQ server thread that respects shutdown signal.
-
-        This creates its own REP socket, binds to zmq_port, and periodically
-        checks shutdown_event using recv timeouts to exit cleanly.
-        """
-        logger.info("DiskANN ZMQ server thread started with shutdown support")
-
-        context = zmq.Context()
-        rep_socket = context.socket(zmq.REP)
-        rep_socket.bind(f"tcp://*:{zmq_port}")
-        logger.info(f"DiskANN ZMQ REP server listening on port {zmq_port}")
-
-        # Set receive timeout so we can check shutdown_event periodically
-        rep_socket.setsockopt(zmq.RCVTIMEO, 1000)  # 1 second timeout
-        rep_socket.setsockopt(zmq.SNDTIMEO, 1000)
-        rep_socket.setsockopt(zmq.LINGER, 0)
-
-        try:
-            while not shutdown_event.is_set():
-                try:
-                    e2e_start = time.time()
-                    # REP socket receives single-part messages
-                    message = rep_socket.recv()
-
-                    # Check for empty messages - REP socket requires response to every request
-                    if not message:
-                        logger.warning("Received empty message, sending empty response")
-                        rep_socket.send(b"")
-                        continue
-
-                    # Try protobuf first (same logic as original)
-                    texts = []
-                    is_text_request = False
-
-                    try:
-                        req_proto = embedding_pb2.NodeEmbeddingRequest()
-                        req_proto.ParseFromString(message)
-                        node_ids = list(req_proto.node_ids)
-
-                        # Look up texts by node IDs
-                        for nid in node_ids:
-                            try:
-                                passage_data = passages.get_passage(str(nid))
-                                txt = passage_data["text"]
-                                if not txt:
-                                    raise RuntimeError(f"FATAL: Empty text for passage ID {nid}")
-                                texts.append(txt)
-                            except KeyError:
-                                raise RuntimeError(f"FATAL: Passage with ID {nid} not found")
-
-                        logger.info(f"ZMQ received protobuf request for {len(node_ids)} node IDs")
-                    except Exception:
-                        # Fallback to msgpack for text requests
-                        try:
-                            import msgpack
-
-                            request = msgpack.unpackb(message)
-                            if isinstance(request, list) and all(
-                                isinstance(item, str) for item in request
-                            ):
-                                texts = request
-                                is_text_request = True
-                                logger.info(
-                                    f"ZMQ received msgpack text request for {len(texts)} texts"
-                                )
-                            else:
-                                raise ValueError("Not a valid msgpack text request")
-                        except Exception:
-                            logger.error("Both protobuf and msgpack parsing failed!")
-                            # Send error response
-                            resp_proto = embedding_pb2.NodeEmbeddingResponse()
-                            rep_socket.send(resp_proto.SerializeToString())
-                            continue
-
-                    # Process the request
-                    embeddings = compute_embeddings(texts, model_name, mode=embedding_mode)
-                    logger.info(f"Computed embeddings shape: {embeddings.shape}")
-
-                    # Validation
-                    if np.isnan(embeddings).any() or np.isinf(embeddings).any():
-                        logger.error("NaN or Inf detected in embeddings!")
-                        # Send error response
-                        if is_text_request:
-                            import msgpack
-
-                            response_data = msgpack.packb([])
-                        else:
-                            resp_proto = embedding_pb2.NodeEmbeddingResponse()
-                            response_data = resp_proto.SerializeToString()
-                        rep_socket.send(response_data)
-                        continue
-
-                    # Prepare response based on request type
-                    if is_text_request:
-                        # For direct text requests, return msgpack
-                        import msgpack
-
-                        response_data = msgpack.packb(embeddings.tolist())
-                    else:
-                        # For protobuf requests, return protobuf
-                        resp_proto = embedding_pb2.NodeEmbeddingResponse()
-                        hidden_contiguous = np.ascontiguousarray(embeddings, dtype=np.float32)
-
-                        resp_proto.embeddings_data = hidden_contiguous.tobytes()
-                        resp_proto.dimensions.append(hidden_contiguous.shape[0])
-                        resp_proto.dimensions.append(hidden_contiguous.shape[1])
-
-                        response_data = resp_proto.SerializeToString()
-
-                    # Send response back to the client
-                    rep_socket.send(response_data)
-
-                    e2e_end = time.time()
-                    logger.info(f"⏱️  ZMQ E2E time: {e2e_end - e2e_start:.6f}s")
-
-                except zmq.Again:
-                    # Timeout - check shutdown_event and continue
-                    continue
-                except Exception as e:
-                    if not shutdown_event.is_set():
-                        logger.error(f"Error in ZMQ server loop: {e}")
-                        try:
-                            # Send error response for REP socket
-                            resp_proto = embedding_pb2.NodeEmbeddingResponse()
-                            rep_socket.send(resp_proto.SerializeToString())
-                        except Exception:
-                            pass
-                    else:
-                        logger.info("Shutdown in progress, ignoring ZMQ error")
-                        break
-        finally:
-            try:
-                rep_socket.close(0)
-            except Exception:
-                pass
-            try:
-                context.term()
-            except Exception:
-                pass
-
-        logger.info("DiskANN ZMQ server thread exiting gracefully")
-
-    # Add shutdown coordination
-    shutdown_event = threading.Event()
-
-    def shutdown_zmq_server():
-        """Gracefully shutdown ZMQ server."""
-        logger.info("Initiating graceful shutdown...")
-        shutdown_event.set()
-
-        if zmq_thread.is_alive():
-            logger.info("Waiting for ZMQ thread to finish...")
-            zmq_thread.join(timeout=5)
-            if zmq_thread.is_alive():
-                logger.warning("ZMQ thread did not finish in time")
-
-        # Clean up ZMQ resources
-        try:
-            # Note: socket and context are cleaned up by thread exit
-            logger.info("ZMQ resources cleaned up")
-        except Exception as e:
-            logger.warning(f"Error cleaning ZMQ resources: {e}")
-
-        # Clean up other resources
-        try:
-            import gc
-
-            gc.collect()
-            logger.info("Additional resources cleaned up")
-        except Exception as e:
-            logger.warning(f"Error cleaning additional resources: {e}")
-
-        logger.info("Graceful shutdown completed")
-        sys.exit(0)
-
-    # Register signal handlers within this function scope
-    import signal
-
-    def signal_handler(sig, frame):
-        logger.info(f"Received signal {sig}, shutting down gracefully...")
-        shutdown_zmq_server()
-
-    signal.signal(signal.SIGTERM, signal_handler)
-    signal.signal(signal.SIGINT, signal_handler)
-
-    # Start ZMQ thread (NOT daemon!)
-    zmq_thread = threading.Thread(
-        target=lambda: zmq_server_thread_with_shutdown(shutdown_event),
-        daemon=False,  # Not daemon - we want to wait for it
-    )
+    zmq_thread = threading.Thread(target=zmq_server_thread, daemon=True)
    zmq_thread.start()
    logger.info(f"Started DiskANN ZMQ server thread on port {zmq_port}")

    # Keep the main thread alive
    try:
-        while not shutdown_event.is_set():
-            time.sleep(0.1)  # Check shutdown more frequently
+        while True:
+            time.sleep(1)
    except KeyboardInterrupt:
        logger.info("DiskANN Server shutting down...")
-        shutdown_zmq_server()
        return

-    # If we reach here, shutdown was triggered by signal
-    logger.info("Main loop exited, process should be shutting down")
-

 if __name__ == "__main__":
+    import signal
    import sys

-    # Signal handlers are now registered within create_diskann_embedding_server
+    def signal_handler(sig, frame):
+        logger.info(f"Received signal {sig}, shutting down gracefully...")
+        sys.exit(0)
+
+    # Register signal handlers for graceful shutdown
+    signal.signal(signal.SIGTERM, signal_handler)
+    signal.signal(signal.SIGINT, signal_handler)

    parser = argparse.ArgumentParser(description="DiskANN Embedding service")
    parser.add_argument("--zmq-port", type=int, default=5555, help="ZMQ port to run on")
--- a/packages/leann-backend-diskann/leann_backend_diskann/graph_partition_simple.py
+++ b/packages/leann-backend-diskann/leann_backend_diskann/graph_partition_simple.py
@@ -0,0 +1,137 @@
+#!/usr/bin/env python3
+"""
+Simplified Graph Partition Module for LEANN DiskANN Backend
+
+This module provides a simple Python interface for graph partitioning
+that directly calls the existing executables.
+"""
+
+import os
+import subprocess
+import tempfile
+from pathlib import Path
+from typing import Optional
+
+
+def partition_graph_simple(
+    index_prefix_path: str, output_dir: Optional[str] = None, **kwargs
+) -> tuple[str, str]:
+    """
+    Simple function to partition a graph index.
+
+    Args:
+        index_prefix_path: Path to the index prefix (e.g., "/path/to/index")
+        output_dir: Output directory (defaults to parent of index_prefix_path)
+        **kwargs: Additional parameters for graph partitioning
+
+    Returns:
+        Tuple of (disk_graph_index_path, partition_bin_path)
+    """
+    # Set default parameters
+    params = {
+        "gp_times": 10,
+        "lock_nums": 10,
+        "cut": 100,
+        "scale_factor": 1,
+        "data_type": "float",
+        "thread_nums": 10,
+        **kwargs,
+    }
+
+    # Determine output directory
+    if output_dir is None:
+        output_dir = str(Path(index_prefix_path).parent)
+
+    # Find the graph_partition directory
+    current_file = Path(__file__)
+    graph_partition_dir = current_file.parent.parent / "third_party" / "DiskANN" / "graph_partition"
+
+    if not graph_partition_dir.exists():
+        raise RuntimeError(f"Graph partition directory not found: {graph_partition_dir}")
+
+    # Find input index file
+    old_index_file = f"{index_prefix_path}_disk_beam_search.index"
+    if not os.path.exists(old_index_file):
+        old_index_file = f"{index_prefix_path}_disk.index"
+
+    if not os.path.exists(old_index_file):
+        raise RuntimeError(f"Index file not found: {old_index_file}")
+
+    # Create temporary directory for processing
+    with tempfile.TemporaryDirectory() as temp_dir:
+        temp_data_dir = Path(temp_dir) / "data"
+        temp_data_dir.mkdir(parents=True, exist_ok=True)
+
+        # Set up paths for temporary files
+        graph_path = temp_data_dir / "starling" / "_M_R_L_B" / "GRAPH"
+        graph_gp_path = (
+            graph_path
+            / f"GP_TIMES_{params['gp_times']}_LOCK_{params['lock_nums']}_GP_USE_FREQ0_CUT{params['cut']}_SCALE{params['scale_factor']}"
+        )
+        graph_gp_path.mkdir(parents=True, exist_ok=True)
+
+        # Run the build script with our parameters
+        cmd = [str(graph_partition_dir / "build.sh"), "release", "split_graph", index_prefix_path]
+
+        # Set environment variables for parameters
+        env = os.environ.copy()
+        env.update(
+            {
+                "GP_TIMES": str(params["gp_times"]),
+                "GP_LOCK_NUMS": str(params["lock_nums"]),
+                "GP_CUT": str(params["cut"]),
+                "GP_SCALE_F": str(params["scale_factor"]),
+                "DATA_TYPE": params["data_type"],
+                "GP_T": str(params["thread_nums"]),
+            }
+        )
+
+        print(f"Running graph partition with command: {' '.join(cmd)}")
+        print(f"Working directory: {graph_partition_dir}")
+
+        # Run the command
+        result = subprocess.run(
+            cmd, env=env, capture_output=True, text=True, cwd=graph_partition_dir
+        )
+
+        if result.returncode != 0:
+            print(f"Command failed with return code {result.returncode}")
+            print(f"stdout: {result.stdout}")
+            print(f"stderr: {result.stderr}")
+            raise RuntimeError(
+                f"Graph partitioning failed with return code {result.returncode}.\n"
+                f"stdout: {result.stdout}\n"
+                f"stderr: {result.stderr}"
+            )
+
+        # Check if output files were created
+        disk_graph_path = Path(output_dir) / "_disk_graph.index"
+        partition_bin_path = Path(output_dir) / "_partition.bin"
+
+        if not disk_graph_path.exists():
+            raise RuntimeError(f"Expected output file not found: {disk_graph_path}")
+
+        if not partition_bin_path.exists():
+            raise RuntimeError(f"Expected output file not found: {partition_bin_path}")
+
+        print("✅ Partitioning completed successfully!")
+        print(f"   Disk graph index: {disk_graph_path}")
+        print(f"   Partition binary: {partition_bin_path}")
+
+        return str(disk_graph_path), str(partition_bin_path)
+
+
+# Example usage
+if __name__ == "__main__":
+    try:
+        disk_graph_path, partition_bin_path = partition_graph_simple(
+            "/Users/yichuan/Desktop/release2/leann/diskannbuild/test_doc_files",
+            gp_times=5,
+            lock_nums=5,
+            cut=50,
+        )
+        print("Success! Output files:")
+        print(f"  - {disk_graph_path}")
+        print(f"  - {partition_bin_path}")
+    except Exception as e:
+        print(f"Error: {e}")
--- a/packages/leann-backend-diskann/pyproject.toml
+++ b/packages/leann-backend-diskann/pyproject.toml
@@ -4,8 +4,8 @@ build-backend = "scikit_build_core.build"

 [project]
 name = "leann-backend-diskann"
-version = "0.2.8"
-dependencies = ["leann-core==0.2.8", "numpy", "protobuf>=3.19.0"]
+version = "0.2.7"
+dependencies = ["leann-core==0.2.7", "numpy", "protobuf>=3.19.0"]

 [tool.scikit-build]
 # Key: simplified CMake path
@@ -17,5 +17,3 @@ editable.mode = "redirect"
 cmake.build-type = "Release"
 build.verbose = true
 build.tool-args = ["-j8"]
-# Let CMake find packages via Homebrew prefix
-cmake.define = {CMAKE_PREFIX_PATH = {env = "CMAKE_PREFIX_PATH"}, OpenMP_ROOT = {env = "OpenMP_ROOT"}}
--- a/packages/leann-backend-diskann/third_party/DiskANN
+++ b/packages/leann-backend-diskann/third_party/DiskANN
--- a/packages/leann-backend-hnsw/CMakeLists.txt
+++ b/packages/leann-backend-hnsw/CMakeLists.txt
@@ -5,20 +5,11 @@ set(CMAKE_CXX_COMPILER_WORKS 1)

 # Set OpenMP path for macOS
 if(APPLE)
-    # Detect Homebrew installation path (Apple Silicon vs Intel)
-    if(EXISTS "/opt/homebrew/opt/libomp")
-        set(HOMEBREW_PREFIX "/opt/homebrew")
-    elseif(EXISTS "/usr/local/opt/libomp")
-        set(HOMEBREW_PREFIX "/usr/local")
-    else()
-        message(FATAL_ERROR "Could not find libomp installation. Please install with: brew install libomp")
-    endif()
-
-    set(OpenMP_C_FLAGS "-Xpreprocessor -fopenmp -I${HOMEBREW_PREFIX}/opt/libomp/include")
-    set(OpenMP_CXX_FLAGS "-Xpreprocessor -fopenmp -I${HOMEBREW_PREFIX}/opt/libomp/include")
+    set(OpenMP_C_FLAGS "-Xpreprocessor -fopenmp -I/opt/homebrew/opt/libomp/include")
+    set(OpenMP_CXX_FLAGS "-Xpreprocessor -fopenmp -I/opt/homebrew/opt/libomp/include")
    set(OpenMP_C_LIB_NAMES "omp")
    set(OpenMP_CXX_LIB_NAMES "omp")
-    set(OpenMP_omp_LIBRARY "${HOMEBREW_PREFIX}/opt/libomp/lib/libomp.dylib")
+    set(OpenMP_omp_LIBRARY "/opt/homebrew/opt/libomp/lib/libomp.dylib")

    # Force use of system libc++ to avoid version mismatch
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -stdlib=libc++")
--- a/packages/leann-backend-hnsw/leann_backend_hnsw/convert_to_csr.py
+++ b/packages/leann-backend-hnsw/leann_backend_hnsw/convert_to_csr.py
@@ -250,7 +250,11 @@ def convert_hnsw_graph_to_csr(input_filename, output_filename, prune_embeddings=
        output_filename: Output CSR index file
        prune_embeddings: Whether to prune embedding storage (write NULL storage marker)
    """
-    # Keep prints simple; rely on CI runner to flush output as needed
+    # Disable buffering for print statements to avoid deadlock in CI/pytest
+    import functools
+
+    global print
+    print = functools.partial(print, flush=True)

    print(f"Starting conversion: {input_filename} -> {output_filename}")
    start_time = time.time()
--- a/packages/leann-backend-hnsw/leann_backend_hnsw/hnsw_backend.py
+++ b/packages/leann-backend-hnsw/leann_backend_hnsw/hnsw_backend.py
@@ -245,3 +245,25 @@ class HNSWSearcher(BaseSearcher):
        string_labels = [[str(int_label) for int_label in batch_labels] for batch_labels in labels]

        return {"labels": string_labels, "distances": distances}
+
+    def cleanup(self):
+        """Cleanup HNSW-specific resources including C++ ZMQ connections."""
+        # Call parent cleanup first
+        super().cleanup()
+
+        # Additional cleanup for C++ side ZMQ connections
+        # The ZmqDistanceComputer in C++ uses ZMQ connections that need cleanup
+        try:
+            # Delete the index to trigger C++ destructors
+            if hasattr(self, "index"):
+                del self.index
+        except Exception:
+            pass
+
+        # Force garbage collection to ensure C++ objects are destroyed
+        try:
+            import gc
+
+            gc.collect()
+        except Exception:
+            pass
--- a/packages/leann-backend-hnsw/leann_backend_hnsw/hnsw_embedding_server.py
+++ b/packages/leann-backend-hnsw/leann_backend_hnsw/hnsw_embedding_server.py
@@ -82,317 +82,189 @@ def create_hnsw_embedding_server(
    with open(passages_file) as f:
        meta = json.load(f)

-    # Let PassageManager handle path resolution uniformly. It supports fallback order:
-    # 1) path/index_path; 2) *_relative; 3) standard siblings next to meta
+    # Let PassageManager handle path resolution uniformly
    passages = PassageManager(meta["passage_sources"], metadata_file_path=passages_file)
-    # Dimension from metadata for shaping responses
-    try:
-        embedding_dim: int = int(meta.get("dimensions", 0))
-    except Exception:
-        embedding_dim = 0
    logger.info(
        f"Loaded PassageManager with {len(passages.global_offset_map)} passages from metadata"
    )

-    # (legacy ZMQ thread removed; using shutdown-capable server only)
-
-    def zmq_server_thread_with_shutdown(shutdown_event):
-        """ZMQ server thread that respects shutdown signal.
-
-        Creates its own REP socket bound to zmq_port and polls with timeouts
-        to allow graceful shutdown.
-        """
-        logger.info("ZMQ server thread started with shutdown support")
-
+    def zmq_server_thread():
+        """ZMQ server thread"""
        context = zmq.Context()
-        rep_socket = context.socket(zmq.REP)
-        rep_socket.bind(f"tcp://*:{zmq_port}")
-        logger.info(f"HNSW ZMQ REP server listening on port {zmq_port}")
-        rep_socket.setsockopt(zmq.RCVTIMEO, 1000)
-        # Keep sends from blocking during shutdown; fail fast and drop on close
-        rep_socket.setsockopt(zmq.SNDTIMEO, 1000)
-        rep_socket.setsockopt(zmq.LINGER, 0)
+        socket = context.socket(zmq.REP)
+        socket.setsockopt(zmq.LINGER, 0)  # Don't block on close
+        socket.bind(f"tcp://*:{zmq_port}")
+        logger.info(f"HNSW ZMQ server listening on port {zmq_port}")

-        # Track last request type/length for shape-correct fallbacks
-        last_request_type = "unknown"  # 'text' | 'distance' | 'embedding' | 'unknown'
-        last_request_length = 0
+        socket.setsockopt(zmq.RCVTIMEO, 300000)
+        socket.setsockopt(zmq.SNDTIMEO, 300000)

-        try:
-            while not shutdown_event.is_set():
-                try:
-                    e2e_start = time.time()
-                    logger.debug("🔍 Waiting for ZMQ message...")
-                    request_bytes = rep_socket.recv()
+        while True:
+            try:
+                message_bytes = socket.recv()
+                logger.debug(f"Received ZMQ request of size {len(message_bytes)} bytes")

-                    # Rest of the processing logic (same as original)
-                    request = msgpack.unpackb(request_bytes)
+                e2e_start = time.time()
+                request_payload = msgpack.unpackb(message_bytes)

-                    if len(request) == 1 and request[0] == "__QUERY_MODEL__":
-                        response_bytes = msgpack.packb([model_name])
-                        rep_socket.send(response_bytes)
-                        continue
+                # Handle direct text embedding request
+                if isinstance(request_payload, list) and len(request_payload) > 0:
+                    # Check if this is a direct text request (list of strings)
+                    if all(isinstance(item, str) for item in request_payload):
+                        logger.info(
+                            f"Processing direct text embedding request for {len(request_payload)} texts in {embedding_mode} mode"
+                        )

-                    # Handle direct text embedding request
-                    if (
-                        isinstance(request, list)
-                        and request
-                        and all(isinstance(item, str) for item in request)
-                    ):
-                        last_request_type = "text"
-                        last_request_length = len(request)
-                        embeddings = compute_embeddings(request, model_name, mode=embedding_mode)
-                        rep_socket.send(msgpack.packb(embeddings.tolist()))
+                        # Use unified embedding computation (now with model caching)
+                        embeddings = compute_embeddings(
+                            request_payload, model_name, mode=embedding_mode
+                        )
+
+                        response = embeddings.tolist()
+                        socket.send(msgpack.packb(response))
                        e2e_end = time.time()
                        logger.info(f"⏱️  Text embedding E2E time: {e2e_end - e2e_start:.6f}s")
                        continue

-                    # Handle distance calculation request: [[ids], [query_vector]]
-                    if (
-                        isinstance(request, list)
-                        and len(request) == 2
-                        and isinstance(request[0], list)
-                        and isinstance(request[1], list)
-                    ):
-                        node_ids = request[0]
-                        # Handle nested [[ids]] shape defensively
-                        if len(node_ids) == 1 and isinstance(node_ids[0], list):
-                            node_ids = node_ids[0]
-                        query_vector = np.array(request[1], dtype=np.float32)
-                        last_request_type = "distance"
-                        last_request_length = len(node_ids)
+                # Handle distance calculation requests
+                if (
+                    isinstance(request_payload, list)
+                    and len(request_payload) == 2
+                    and isinstance(request_payload[0], list)
+                    and isinstance(request_payload[1], list)
+                ):
+                    node_ids = request_payload[0]
+                    query_vector = np.array(request_payload[1], dtype=np.float32)

-                        logger.debug("Distance calculation request received")
-                        logger.debug(f"    Node IDs: {node_ids}")
-                        logger.debug(f"    Query vector dim: {len(query_vector)}")
+                    logger.debug("Distance calculation request received")
+                    logger.debug(f"    Node IDs: {node_ids}")
+                    logger.debug(f"    Query vector dim: {len(query_vector)}")

-                        # Gather texts for found ids
-                        texts: list[str] = []
-                        found_indices: list[int] = []
-                        for idx, nid in enumerate(node_ids):
-                            try:
-                                passage_data = passages.get_passage(str(nid))
-                                txt = passage_data.get("text", "")
-                                if isinstance(txt, str) and len(txt) > 0:
-                                    texts.append(txt)
-                                    found_indices.append(idx)
-                                else:
-                                    logger.error(f"Empty text for passage ID {nid}")
-                            except KeyError:
-                                logger.error(f"Passage ID {nid} not found")
-                            except Exception as e:
-                                logger.error(f"Exception looking up passage ID {nid}: {e}")
-
-                        # Prepare full-length response with large sentinel values
-                        large_distance = 1e9
-                        response_distances = [large_distance] * len(node_ids)
-
-                        if texts:
-                            try:
-                                embeddings = compute_embeddings(
-                                    texts, model_name, mode=embedding_mode
-                                )
-                                logger.info(
-                                    f"Computed embeddings for {len(texts)} texts, shape: {embeddings.shape}"
-                                )
-                                if distance_metric == "l2":
-                                    partial = np.sum(
-                                        np.square(embeddings - query_vector.reshape(1, -1)), axis=1
-                                    )
-                                else:  # mips or cosine
-                                    partial = -np.dot(embeddings, query_vector)
-
-                                for pos, dval in zip(found_indices, partial.flatten().tolist()):
-                                    response_distances[pos] = float(dval)
-                            except Exception as e:
-                                logger.error(f"Distance computation error, using sentinels: {e}")
-
-                        # Send response in expected shape [[distances]]
-                        rep_socket.send(msgpack.packb([response_distances], use_single_float=True))
-                        e2e_end = time.time()
-                        logger.info(f"⏱️  Distance calculation E2E time: {e2e_end - e2e_start:.6f}s")
-                        continue
-
-                    # Fallback: treat as embedding-by-id request
-                    if (
-                        isinstance(request, list)
-                        and len(request) == 1
-                        and isinstance(request[0], list)
-                    ):
-                        node_ids = request[0]
-                    elif isinstance(request, list):
-                        node_ids = request
-                    else:
-                        node_ids = []
-                    last_request_type = "embedding"
-                    last_request_length = len(node_ids)
-                    logger.info(f"ZMQ received {len(node_ids)} node IDs for embedding fetch")
-
-                    # Preallocate zero-filled flat data for robustness
-                    if embedding_dim <= 0:
-                        dims = [0, 0]
-                        flat_data: list[float] = []
-                    else:
-                        dims = [len(node_ids), embedding_dim]
-                        flat_data = [0.0] * (dims[0] * dims[1])
-
-                    # Collect texts for found ids
-                    texts: list[str] = []
-                    found_indices: list[int] = []
-                    for idx, nid in enumerate(node_ids):
+                    # Get embeddings for node IDs
+                    texts = []
+                    for nid in node_ids:
                        try:
                            passage_data = passages.get_passage(str(nid))
-                            txt = passage_data.get("text", "")
-                            if isinstance(txt, str) and len(txt) > 0:
-                                texts.append(txt)
-                                found_indices.append(idx)
-                            else:
-                                logger.error(f"Empty text for passage ID {nid}")
+                            txt = passage_data["text"]
+                            texts.append(txt)
                        except KeyError:
-                            logger.error(f"Passage with ID {nid} not found")
+                            logger.error(f"Passage ID {nid} not found")
+                            raise RuntimeError(f"FATAL: Passage with ID {nid} not found")
                        except Exception as e:
                            logger.error(f"Exception looking up passage ID {nid}: {e}")
+                            raise

-                    if texts:
-                        try:
-                            embeddings = compute_embeddings(texts, model_name, mode=embedding_mode)
-                            logger.info(
-                                f"Computed embeddings for {len(texts)} texts, shape: {embeddings.shape}"
-                            )
+                    # Process embeddings
+                    embeddings = compute_embeddings(texts, model_name, mode=embedding_mode)
+                    logger.info(
+                        f"Computed embeddings for {len(texts)} texts, shape: {embeddings.shape}"
+                    )

-                            if np.isnan(embeddings).any() or np.isinf(embeddings).any():
-                                logger.error(
-                                    f"NaN or Inf detected in embeddings! Requested IDs: {node_ids[:5]}..."
-                                )
-                                dims = [0, embedding_dim]
-                                flat_data = []
-                            else:
-                                emb_f32 = np.ascontiguousarray(embeddings, dtype=np.float32)
-                                flat = emb_f32.flatten().tolist()
-                                for j, pos in enumerate(found_indices):
-                                    start = pos * embedding_dim
-                                    end = start + embedding_dim
-                                    if end <= len(flat_data):
-                                        flat_data[start:end] = flat[
-                                            j * embedding_dim : (j + 1) * embedding_dim
-                                        ]
-                        except Exception as e:
-                            logger.error(f"Embedding computation error, returning zeros: {e}")
+                    # Calculate distances
+                    if distance_metric == "l2":
+                        distances = np.sum(
+                            np.square(embeddings - query_vector.reshape(1, -1)), axis=1
+                        )
+                    else:  # mips or cosine
+                        distances = -np.dot(embeddings, query_vector)

-                    response_payload = [dims, flat_data]
-                    response_bytes = msgpack.packb(response_payload, use_single_float=True)
+                    response_payload = distances.flatten().tolist()
+                    response_bytes = msgpack.packb([response_payload], use_single_float=True)
+                    logger.debug(f"Sending distance response with {len(distances)} distances")

-                    rep_socket.send(response_bytes)
+                    socket.send(response_bytes)
                    e2e_end = time.time()
-                    logger.info(f"⏱️  ZMQ E2E time: {e2e_end - e2e_start:.6f}s")
-
-                except zmq.Again:
-                    # Timeout - check shutdown_event and continue
+                    logger.info(f"⏱️  Distance calculation E2E time: {e2e_end - e2e_start:.6f}s")
                    continue
-                except Exception as e:
-                    if not shutdown_event.is_set():
-                        logger.error(f"Error in ZMQ server loop: {e}")
-                        # Shape-correct fallback
-                        try:
-                            if last_request_type == "distance":
-                                large_distance = 1e9
-                                fallback_len = max(0, int(last_request_length))
-                                safe = [[large_distance] * fallback_len]
-                            elif last_request_type == "embedding":
-                                bsz = max(0, int(last_request_length))
-                                dim = max(0, int(embedding_dim))
-                                safe = (
-                                    [[bsz, dim], [0.0] * (bsz * dim)] if dim > 0 else [[0, 0], []]
-                                )
-                            elif last_request_type == "text":
-                                safe = []  # direct text embeddings expectation is a flat list
-                            else:
-                                safe = [[0, int(embedding_dim) if embedding_dim > 0 else 0], []]
-                            rep_socket.send(msgpack.packb(safe, use_single_float=True))
-                        except Exception:
-                            pass
-                    else:
-                        logger.info("Shutdown in progress, ignoring ZMQ error")
-                        break
-        finally:
-            try:
-                rep_socket.close(0)
-            except Exception:
-                pass
-            try:
-                context.term()
-            except Exception:
-                pass

-        logger.info("ZMQ server thread exiting gracefully")
+                # Standard embedding request (passage ID lookup)
+                if (
+                    not isinstance(request_payload, list)
+                    or len(request_payload) != 1
+                    or not isinstance(request_payload[0], list)
+                ):
+                    logger.error(
+                        f"Invalid MessagePack request format. Expected [[ids...]] or [texts...], got: {type(request_payload)}"
+                    )
+                    socket.send(msgpack.packb([[], []]))
+                    continue

-    # Add shutdown coordination
-    shutdown_event = threading.Event()
+                node_ids = request_payload[0]
+                logger.debug(f"Request for {len(node_ids)} node embeddings")

-    def shutdown_zmq_server():
-        """Gracefully shutdown ZMQ server."""
-        logger.info("Initiating graceful shutdown...")
-        shutdown_event.set()
+                # Look up texts by node IDs
+                texts = []
+                for nid in node_ids:
+                    try:
+                        passage_data = passages.get_passage(str(nid))
+                        txt = passage_data["text"]
+                        if not txt:
+                            raise RuntimeError(f"FATAL: Empty text for passage ID {nid}")
+                        texts.append(txt)
+                    except KeyError:
+                        raise RuntimeError(f"FATAL: Passage with ID {nid} not found")
+                    except Exception as e:
+                        logger.error(f"Exception looking up passage ID {nid}: {e}")
+                        raise

-        if zmq_thread.is_alive():
-            logger.info("Waiting for ZMQ thread to finish...")
-            zmq_thread.join(timeout=5)
-            if zmq_thread.is_alive():
-                logger.warning("ZMQ thread did not finish in time")
+                # Process embeddings
+                embeddings = compute_embeddings(texts, model_name, mode=embedding_mode)
+                logger.info(
+                    f"Computed embeddings for {len(texts)} texts, shape: {embeddings.shape}"
+                )

-        # Clean up ZMQ resources
-        try:
-            # Note: socket and context are cleaned up by thread exit
-            logger.info("ZMQ resources cleaned up")
-        except Exception as e:
-            logger.warning(f"Error cleaning ZMQ resources: {e}")
+                # Serialization and response
+                if np.isnan(embeddings).any() or np.isinf(embeddings).any():
+                    logger.error(
+                        f"NaN or Inf detected in embeddings! Requested IDs: {node_ids[:5]}..."
+                    )
+                    raise AssertionError()

-        # Clean up other resources
-        try:
-            import gc
+                hidden_contiguous_f32 = np.ascontiguousarray(embeddings, dtype=np.float32)
+                response_payload = [
+                    list(hidden_contiguous_f32.shape),
+                    hidden_contiguous_f32.flatten().tolist(),
+                ]
+                response_bytes = msgpack.packb(response_payload, use_single_float=True)

-            gc.collect()
-            logger.info("Additional resources cleaned up")
-        except Exception as e:
-            logger.warning(f"Error cleaning additional resources: {e}")
+                socket.send(response_bytes)
+                e2e_end = time.time()
+                logger.info(f"⏱️  ZMQ E2E time: {e2e_end - e2e_start:.6f}s")

-        logger.info("Graceful shutdown completed")
-        sys.exit(0)
+            except zmq.Again:
+                logger.debug("ZMQ socket timeout, continuing to listen")
+                continue
+            except Exception as e:
+                logger.error(f"Error in ZMQ server loop: {e}")
+                import traceback

-    # Register signal handlers within this function scope
-    import signal
+                traceback.print_exc()
+                socket.send(msgpack.packb([[], []]))

-    def signal_handler(sig, frame):
-        logger.info(f"Received signal {sig}, shutting down gracefully...")
-        shutdown_zmq_server()
-
-    signal.signal(signal.SIGTERM, signal_handler)
-    signal.signal(signal.SIGINT, signal_handler)
-
-    # Pass shutdown_event to ZMQ thread
-    zmq_thread = threading.Thread(
-        target=lambda: zmq_server_thread_with_shutdown(shutdown_event),
-        daemon=False,  # Not daemon - we want to wait for it
-    )
+    zmq_thread = threading.Thread(target=zmq_server_thread, daemon=True)
    zmq_thread.start()
    logger.info(f"Started HNSW ZMQ server thread on port {zmq_port}")

    # Keep the main thread alive
    try:
-        while not shutdown_event.is_set():
-            time.sleep(0.1)  # Check shutdown more frequently
+        while True:
+            time.sleep(1)
    except KeyboardInterrupt:
        logger.info("HNSW Server shutting down...")
-        shutdown_zmq_server()
        return

-    # If we reach here, shutdown was triggered by signal
-    logger.info("Main loop exited, process should be shutting down")
-

 if __name__ == "__main__":
+    import signal
    import sys

-    # Signal handlers are now registered within create_hnsw_embedding_server
+    def signal_handler(sig, frame):
+        logger.info(f"Received signal {sig}, shutting down gracefully...")
+        sys.exit(0)
+
+    # Register signal handlers for graceful shutdown
+    signal.signal(signal.SIGTERM, signal_handler)
+    signal.signal(signal.SIGINT, signal_handler)

    parser = argparse.ArgumentParser(description="HNSW Embedding service")
    parser.add_argument("--zmq-port", type=int, default=5555, help="ZMQ port to run on")
--- a/packages/leann-backend-hnsw/pyproject.toml
+++ b/packages/leann-backend-hnsw/pyproject.toml
@@ -6,10 +6,10 @@ build-backend = "scikit_build_core.build"

 [project]
 name = "leann-backend-hnsw"
-version = "0.2.8"
+version = "0.2.7"
 description = "Custom-built HNSW (Faiss) backend for the Leann toolkit."
 dependencies = [
-    "leann-core==0.2.8",
+    "leann-core==0.2.7",
    "numpy",
    "pyzmq>=23.0.0",
    "msgpack>=1.0.0",
@@ -22,8 +22,6 @@ cmake.build-type = "Release"
 build.verbose = true
 build.tool-args = ["-j8"]

-# CMake definitions to optimize compilation and find Homebrew packages
+# CMake definitions to optimize compilation
 [tool.scikit-build.cmake.define]
 CMAKE_BUILD_PARALLEL_LEVEL = "8"
-CMAKE_PREFIX_PATH = {env = "CMAKE_PREFIX_PATH"}
-OpenMP_ROOT = {env = "OpenMP_ROOT"}
--- a/packages/leann-backend-hnsw/third_party/faiss
+++ b/packages/leann-backend-hnsw/third_party/faiss
--- a/packages/leann-core/pyproject.toml
+++ b/packages/leann-core/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "leann-core"
-version = "0.2.8"
+version = "0.2.7"
 description = "Core API and plugin system for LEANN"
 readme = "README.md"
 requires-python = ">=3.9"
@@ -33,8 +33,8 @@ dependencies = [
    "pdfplumber>=0.10.0",
    "nbconvert>=7.0.0",  # For .ipynb file support
    "gitignore-parser>=0.1.12",  # For proper .gitignore handling
-    "mlx>=0.26.3; sys_platform == 'darwin' and platform_machine == 'arm64'",
-    "mlx-lm>=0.26.0; sys_platform == 'darwin' and platform_machine == 'arm64'",
+    "mlx>=0.26.3; sys_platform == 'darwin'",
+    "mlx-lm>=0.26.0; sys_platform == 'darwin'",
 ]

 [project.optional-dependencies]
--- a/packages/leann-core/src/leann/api.py
+++ b/packages/leann-core/src/leann/api.py
@@ -87,21 +87,26 @@ def compute_embeddings_via_server(chunks: list[str], model_name: str, port: int)
    # Connect to embedding server
    context = zmq.Context()
    socket = context.socket(zmq.REQ)
+    socket.setsockopt(zmq.LINGER, 0)  # Don't block on close
+    socket.setsockopt(zmq.RCVTIMEO, 1000)  # 1s timeout on receive
+    socket.setsockopt(zmq.SNDTIMEO, 1000)  # 1s timeout on send
+    socket.setsockopt(zmq.IMMEDIATE, 1)  # Don't wait for connection
    socket.connect(f"tcp://localhost:{port}")

-    # Send chunks to server for embedding computation
-    request = chunks
-    socket.send(msgpack.packb(request))
+    try:
+        # Send chunks to server for embedding computation
+        request = chunks
+        socket.send(msgpack.packb(request))

-    # Receive embeddings from server
-    response = socket.recv()
-    embeddings_list = msgpack.unpackb(response)
+        # Receive embeddings from server
+        response = socket.recv()
+        embeddings_list = msgpack.unpackb(response)

-    # Convert back to numpy array
-    embeddings = np.array(embeddings_list, dtype=np.float32)
-
-    socket.close()
-    context.term()
+        # Convert back to numpy array
+        embeddings = np.array(embeddings_list, dtype=np.float32)
+    finally:
+        socket.close(linger=0)
+        context.term()

    return embeddings

@@ -122,55 +127,31 @@ class PassageManager:
        self.passage_files = {}
        self.global_offset_map = {}  # Combined map for fast lookup

-        # Derive index base name for standard sibling fallbacks, e.g., <index_name>.passages.*
-        index_name_base = None
-        if metadata_file_path:
-            meta_name = Path(metadata_file_path).name
-            if meta_name.endswith(".meta.json"):
-                index_name_base = meta_name[: -len(".meta.json")]
-
        for source in passage_sources:
            assert source["type"] == "jsonl", "only jsonl is supported"
-            passage_file = source.get("path", "")
-            index_file = source.get("index_path", "")  # .idx file
+            passage_file = source["path"]
+            index_file = source["index_path"]  # .idx file

            # Fix path resolution - relative paths should be relative to metadata file directory
-            def _resolve_candidates(
-                primary: str,
-                relative_key: str,
-                default_name: Optional[str],
-                source_dict: dict[str, Any],
-            ) -> list[Path]:
-                candidates: list[Path] = []
-                # 1) Primary as-is (absolute or relative)
-                if primary:
-                    p = Path(primary)
-                    candidates.append(p if p.is_absolute() else (Path.cwd() / p))
-                # 2) metadata-relative explicit relative key
-                if metadata_file_path and source_dict.get(relative_key):
-                    candidates.append(Path(metadata_file_path).parent / source_dict[relative_key])
-                # 3) metadata-relative standard sibling filename
-                if metadata_file_path and default_name:
-                    candidates.append(Path(metadata_file_path).parent / default_name)
-                return candidates
-
-            # Build candidate lists and pick first existing; otherwise keep last candidate for error message
-            idx_default = f"{index_name_base}.passages.idx" if index_name_base else None
-            idx_candidates = _resolve_candidates(
-                index_file, "index_path_relative", idx_default, source
-            )
-            pas_default = f"{index_name_base}.passages.jsonl" if index_name_base else None
-            pas_candidates = _resolve_candidates(passage_file, "path_relative", pas_default, source)
-
-            def _pick_existing(cands: list[Path]) -> str:
-                for c in cands:
-                    if c.exists():
-                        return str(c.resolve())
-                # Fallback to last candidate (best guess) even if not exists; will error below
-                return str(cands[-1].resolve()) if cands else ""
-
-            index_file = _pick_existing(idx_candidates)
-            passage_file = _pick_existing(pas_candidates)
+            if not Path(index_file).is_absolute():
+                if metadata_file_path:
+                    # Resolve relative to metadata file directory
+                    metadata_dir = Path(metadata_file_path).parent
+                    logger.debug(
+                        f"PassageManager: Resolving relative paths from metadata_dir: {metadata_dir}"
+                    )
+                    index_file = str((metadata_dir / index_file).resolve())
+                    passage_file = str((metadata_dir / passage_file).resolve())
+                    logger.debug(f"PassageManager: Resolved index_file: {index_file}")
+                else:
+                    # Fallback to current directory resolution (legacy behavior)
+                    logger.warning(
+                        "PassageManager: No metadata_file_path provided, using fallback resolution from cwd"
+                    )
+                    logger.debug(f"PassageManager: Current working directory: {Path.cwd()}")
+                    index_file = str(Path(index_file).resolve())
+                    passage_file = str(Path(passage_file).resolve())
+                    logger.debug(f"PassageManager: Fallback resolved index_file: {index_file}")

            if not Path(index_file).exists():
                raise FileNotFoundError(f"Passage index file not found: {index_file}")
@@ -356,12 +337,8 @@ class LeannBuilder:
            "passage_sources": [
                {
                    "type": "jsonl",
-                    # Preserve existing relative file names (backward-compatible)
-                    "path": passages_file.name,
-                    "index_path": offset_file.name,
-                    # Add optional redundant relative keys for remote build portability (non-breaking)
-                    "path_relative": passages_file.name,
-                    "index_path_relative": offset_file.name,
+                    "path": passages_file.name,  # Use relative path (just filename)
+                    "index_path": offset_file.name,  # Use relative path (just filename)
                }
            ],
        }
@@ -476,12 +453,8 @@ class LeannBuilder:
            "passage_sources": [
                {
                    "type": "jsonl",
-                    # Preserve existing relative file names (backward-compatible)
-                    "path": passages_file.name,
-                    "index_path": offset_file.name,
-                    # Add optional redundant relative keys for remote build portability (non-breaking)
-                    "path_relative": passages_file.name,
-                    "index_path_relative": offset_file.name,
+                    "path": passages_file.name,  # Use relative path (just filename)
+                    "index_path": offset_file.name,  # Use relative path (just filename)
                }
            ],
            "built_from_precomputed_embeddings": True,
@@ -644,14 +617,27 @@ class LeannSearcher:
        return enriched_results

    def cleanup(self):
-        """Explicitly cleanup embedding server resources.
+        """Explicitly cleanup embedding server and ZMQ resources.

        This method should be called after you're done using the searcher,
        especially in test environments or batch processing scenarios.
        """
+        # Stop embedding server
        if hasattr(self.backend_impl, "embedding_server_manager"):
            self.backend_impl.embedding_server_manager.stop_server()

+        # Set ZMQ linger but don't terminate global context
+        try:
+            import zmq
+
+            # Just set linger on the global instance
+            ctx = zmq.Context.instance()
+            ctx.linger = 0
+            # NEVER call ctx.term() or destroy() on the global instance
+            # That would block waiting for all sockets to close
+        except Exception:
+            pass
+

 class LeannChat:
    def __init__(
--- a/packages/leann-core/src/leann/cli.py
+++ b/packages/leann-core/src/leann/cli.py
@@ -1,11 +1,9 @@
 import argparse
 import asyncio
 from pathlib import Path
-from typing import Union

 from llama_index.core import SimpleDirectoryReader
 from llama_index.core.node_parser import SentenceSplitter
-from tqdm import tqdm

 from .api import LeannBuilder, LeannChat, LeannSearcher

@@ -76,14 +74,11 @@ class LeannCLI:
            formatter_class=argparse.RawDescriptionHelpFormatter,
            epilog="""
 Examples:
-  leann build my-docs --docs ./documents                                  # Build index from directory
-  leann build my-code --docs ./src ./tests ./config                      # Build index from multiple directories
-  leann build my-files --docs ./file1.py ./file2.txt ./docs/             # Build index from files and directories
-  leann build my-mixed --docs ./readme.md ./src/ ./config.json           # Build index from mixed files/dirs
-  leann build my-ppts --docs ./ --file-types .pptx,.pdf                  # Index only PowerPoint and PDF files
-  leann search my-docs "query"                                           # Search in my-docs index
-  leann ask my-docs "question"                                           # Ask my-docs index
-  leann list                                                             # List all stored indexes
+  leann build my-docs --docs ./documents                    # Build index named my-docs
+  leann build my-ppts --docs ./ --file-types .pptx,.pdf    # Index only PowerPoint and PDF files
+  leann search my-docs "query"                             # Search in my-docs index
+  leann ask my-docs "question"                             # Ask my-docs index
+  leann list                                              # List all stored indexes
            """,
        )

@@ -95,11 +90,7 @@ Examples:
            "index_name", nargs="?", help="Index name (default: current directory name)"
        )
        build_parser.add_argument(
-            "--docs",
-            type=str,
-            nargs="+",
-            default=["."],
-            help="Documents directories and/or files (default: current directory)",
+            "--docs", type=str, default=".", help="Documents directory (default: current directory)"
        )
        build_parser.add_argument(
            "--backend", type=str, default="hnsw", choices=["hnsw", "diskann"]
@@ -243,32 +234,6 @@ Examples:
        """Check if a file should be excluded using gitignore parser."""
        return gitignore_matches(str(relative_path))

-    def _is_git_submodule(self, path: Path) -> bool:
-        """Check if a path is a git submodule."""
-        try:
-            # Find the git repo root
-            current_dir = Path.cwd()
-            while current_dir != current_dir.parent:
-                if (current_dir / ".git").exists():
-                    gitmodules_path = current_dir / ".gitmodules"
-                    if gitmodules_path.exists():
-                        # Read .gitmodules to check if this path is a submodule
-                        gitmodules_content = gitmodules_path.read_text()
-                        # Convert path to relative to git root
-                        try:
-                            relative_path = path.resolve().relative_to(current_dir)
-                            # Check if this path appears in .gitmodules
-                            return f"path = {relative_path}" in gitmodules_content
-                        except ValueError:
-                            # Path is not under git root
-                            return False
-                    break
-                current_dir = current_dir.parent
-            return False
-        except Exception:
-            # If anything goes wrong, assume it's not a submodule
-            return False
-
    def list_indexes(self):
        print("Stored LEANN indexes:")

@@ -298,9 +263,7 @@ Examples:
            valid_projects.append(current_path)

        if not valid_projects:
-            print(
-                "No indexes found. Use 'leann build <name> --docs <dir> [<dir2> ...]' to create one."
-            )
+            print("No indexes found. Use 'leann build <name> --docs <dir>' to create one.")
            return

        total_indexes = 0
@@ -347,88 +310,56 @@ Examples:
                    print(f'  leann search {example_name} "your query"')
                    print(f"  leann ask {example_name} --interactive")

-    def load_documents(
-        self, docs_paths: Union[str, list], custom_file_types: Union[str, None] = None
-    ):
-        # Handle both single path (string) and multiple paths (list) for backward compatibility
-        if isinstance(docs_paths, str):
-            docs_paths = [docs_paths]
-
-        # Separate files and directories
-        files = []
-        directories = []
-        for path in docs_paths:
-            path_obj = Path(path)
-            if path_obj.is_file():
-                files.append(str(path_obj))
-            elif path_obj.is_dir():
-                # Check if this is a git submodule - if so, skip it
-                if self._is_git_submodule(path_obj):
-                    print(f"⚠️  Skipping git submodule: {path}")
-                    continue
-                directories.append(str(path_obj))
-            else:
-                print(f"⚠️  Warning: Path '{path}' does not exist, skipping...")
-                continue
-
-        # Print summary of what we're processing
-        total_items = len(files) + len(directories)
-        items_desc = []
-        if files:
-            items_desc.append(f"{len(files)} file{'s' if len(files) > 1 else ''}")
-        if directories:
-            items_desc.append(
-                f"{len(directories)} director{'ies' if len(directories) > 1 else 'y'}"
-            )
-
-        print(f"Loading documents from {' and '.join(items_desc)} ({total_items} total):")
-        if files:
-            print(f"  📄 Files: {', '.join([Path(f).name for f in files])}")
-        if directories:
-            print(f"  📁 Directories: {', '.join(directories)}")
-
+    def load_documents(self, docs_dir: str, custom_file_types: str | None = None):
+        print(f"Loading documents from {docs_dir}...")
        if custom_file_types:
            print(f"Using custom file types: {custom_file_types}")

-        all_documents = []
+        # Build gitignore parser
+        gitignore_matches = self._build_gitignore_parser(docs_dir)

-        # First, process individual files if any
-        if files:
-            print(f"\n🔄 Processing {len(files)} individual file{'s' if len(files) > 1 else ''}...")
+        # Try to use better PDF parsers first, but only if PDFs are requested
+        documents = []
+        docs_path = Path(docs_dir)

-            # Load individual files using SimpleDirectoryReader with input_files
-            # Note: We skip gitignore filtering for explicitly specified files
-            try:
-                # Group files by their parent directory for efficient loading
-                from collections import defaultdict
+        # Check if we should process PDFs
+        should_process_pdfs = custom_file_types is None or ".pdf" in custom_file_types

-                files_by_dir = defaultdict(list)
-                for file_path in files:
-                    parent_dir = str(Path(file_path).parent)
-                    files_by_dir[parent_dir].append(file_path)
+        if should_process_pdfs:
+            for file_path in docs_path.rglob("*.pdf"):
+                # Check if file matches any exclude pattern
+                relative_path = file_path.relative_to(docs_path)
+                if self._should_exclude_file(relative_path, gitignore_matches):
+                    continue

-                # Load files from each parent directory
-                for parent_dir, file_list in files_by_dir.items():
-                    print(
-                        f"  Loading {len(file_list)} file{'s' if len(file_list) > 1 else ''} from {parent_dir}"
-                    )
+                print(f"Processing PDF: {file_path}")
+
+                # Try PyMuPDF first (best quality)
+                text = extract_pdf_text_with_pymupdf(str(file_path))
+                if text is None:
+                    # Try pdfplumber
+                    text = extract_pdf_text_with_pdfplumber(str(file_path))
+
+                if text:
+                    # Create a simple document structure
+                    from llama_index.core import Document
+
+                    doc = Document(text=text, metadata={"source": str(file_path)})
+                    documents.append(doc)
+                else:
+                    # Fallback to default reader
+                    print(f"Using default reader for {file_path}")
                    try:
-                        file_docs = SimpleDirectoryReader(
-                            parent_dir,
-                            input_files=file_list,
+                        default_docs = SimpleDirectoryReader(
+                            str(file_path.parent),
                            filename_as_id=True,
+                            required_exts=[file_path.suffix],
                        ).load_data()
-                        all_documents.extend(file_docs)
-                        print(
-                            f"    ✅ Loaded {len(file_docs)} document{'s' if len(file_docs) > 1 else ''}"
-                        )
+                        documents.extend(default_docs)
                    except Exception as e:
-                        print(f"    ❌ Warning: Could not load files from {parent_dir}: {e}")
+                        print(f"Warning: Could not process {file_path}: {e}")

-            except Exception as e:
-                print(f"❌ Error processing individual files: {e}")
-
-        # Define file extensions to process
+        # Load other file types with default reader
        if custom_file_types:
            # Parse custom file types from comma-separated string
            code_extensions = [ext.strip() for ext in custom_file_types.split(",") if ext.strip()]
@@ -490,106 +421,41 @@ Examples:
                ".py",
                ".jl",
            ]
+        # Try to load other file types, but don't fail if none are found
+        try:
+            # Create a custom file filter function using our PathSpec
+            def file_filter(file_path: str) -> bool:
+                """Return True if file should be included (not excluded)"""
+                try:
+                    docs_path_obj = Path(docs_dir)
+                    file_path_obj = Path(file_path)
+                    relative_path = file_path_obj.relative_to(docs_path_obj)
+                    return not self._should_exclude_file(relative_path, gitignore_matches)
+                except (ValueError, OSError):
+                    return True  # Include files that can't be processed

-        # Process each directory
-        if directories:
-            print(
-                f"\n🔄 Processing {len(directories)} director{'ies' if len(directories) > 1 else 'y'}..."
-            )
+            other_docs = SimpleDirectoryReader(
+                docs_dir,
+                recursive=True,
+                encoding="utf-8",
+                required_exts=code_extensions,
+                file_extractor={},  # Use default extractors
+                filename_as_id=True,
+            ).load_data(show_progress=True)

-        for docs_dir in directories:
-            print(f"Processing directory: {docs_dir}")
-            # Build gitignore parser for each directory
-            gitignore_matches = self._build_gitignore_parser(docs_dir)
+            # Filter documents after loading based on gitignore rules
+            filtered_docs = []
+            for doc in other_docs:
+                file_path = doc.metadata.get("file_path", "")
+                if file_filter(file_path):
+                    filtered_docs.append(doc)

-            # Try to use better PDF parsers first, but only if PDFs are requested
-            documents = []
-            docs_path = Path(docs_dir)
-
-            # Check if we should process PDFs
-            should_process_pdfs = custom_file_types is None or ".pdf" in custom_file_types
-
-            if should_process_pdfs:
-                for file_path in docs_path.rglob("*.pdf"):
-                    # Check if file matches any exclude pattern
-                    try:
-                        relative_path = file_path.relative_to(docs_path)
-                        if self._should_exclude_file(relative_path, gitignore_matches):
-                            continue
-                    except ValueError:
-                        # Skip files that can't be made relative to docs_path
-                        print(f"⚠️  Skipping file outside directory scope: {file_path}")
-                        continue
-
-                    print(f"Processing PDF: {file_path}")
-
-                    # Try PyMuPDF first (best quality)
-                    text = extract_pdf_text_with_pymupdf(str(file_path))
-                    if text is None:
-                        # Try pdfplumber
-                        text = extract_pdf_text_with_pdfplumber(str(file_path))
-
-                    if text:
-                        # Create a simple document structure
-                        from llama_index.core import Document
-
-                        doc = Document(text=text, metadata={"source": str(file_path)})
-                        documents.append(doc)
-                    else:
-                        # Fallback to default reader
-                        print(f"Using default reader for {file_path}")
-                        try:
-                            default_docs = SimpleDirectoryReader(
-                                str(file_path.parent),
-                                filename_as_id=True,
-                                required_exts=[file_path.suffix],
-                            ).load_data()
-                            documents.extend(default_docs)
-                        except Exception as e:
-                            print(f"Warning: Could not process {file_path}: {e}")
-
-            # Load other file types with default reader
-            try:
-                # Create a custom file filter function using our PathSpec
-                def file_filter(
-                    file_path: str, docs_dir=docs_dir, gitignore_matches=gitignore_matches
-                ) -> bool:
-                    """Return True if file should be included (not excluded)"""
-                    try:
-                        docs_path_obj = Path(docs_dir)
-                        file_path_obj = Path(file_path)
-                        relative_path = file_path_obj.relative_to(docs_path_obj)
-                        return not self._should_exclude_file(relative_path, gitignore_matches)
-                    except (ValueError, OSError):
-                        return True  # Include files that can't be processed
-
-                other_docs = SimpleDirectoryReader(
-                    docs_dir,
-                    recursive=True,
-                    encoding="utf-8",
-                    required_exts=code_extensions,
-                    file_extractor={},  # Use default extractors
-                    filename_as_id=True,
-                ).load_data(show_progress=True)
-
-                # Filter documents after loading based on gitignore rules
-                filtered_docs = []
-                for doc in other_docs:
-                    file_path = doc.metadata.get("file_path", "")
-                    if file_filter(file_path):
-                        filtered_docs.append(doc)
-
-                documents.extend(filtered_docs)
-            except ValueError as e:
-                if "No files found" in str(e):
-                    print(f"No additional files found for other supported types in {docs_dir}.")
-                else:
-                    raise e
-
-            all_documents.extend(documents)
-            print(f"Loaded {len(documents)} documents from {docs_dir}")
-
-        documents = all_documents
+            documents.extend(filtered_docs)
+        except ValueError as e:
+            if "No files found" in str(e):
+                print("No additional files found for other supported types.")
+            else:
+                raise e

        all_texts = []

@@ -640,9 +506,7 @@ Examples:
            ".jl",
        }

-        print("start chunking documents")
-        # Add progress bar for document chunking
-        for doc in tqdm(documents, desc="Chunking documents", unit="doc"):
+        for doc in documents:
            # Check if this is a code file based on source path
            source_path = doc.metadata.get("source", "")
            is_code_file = any(source_path.endswith(ext) for ext in code_file_exts)
@@ -658,7 +522,7 @@ Examples:
        return all_texts

    async def build_index(self, args):
-        docs_paths = args.docs
+        docs_dir = args.docs
        # Use current directory name if index_name not provided
        if args.index_name:
            index_name = args.index_name
@@ -669,25 +533,13 @@ Examples:
        index_dir = self.indexes_dir / index_name
        index_path = self.get_index_path(index_name)

-        # Display all paths being indexed with file/directory distinction
-        files = [p for p in docs_paths if Path(p).is_file()]
-        directories = [p for p in docs_paths if Path(p).is_dir()]
-
-        print(f"📂 Indexing {len(docs_paths)} path{'s' if len(docs_paths) > 1 else ''}:")
-        if files:
-            print(f"  📄 Files ({len(files)}):")
-            for i, file_path in enumerate(files, 1):
-                print(f"    {i}. {Path(file_path).resolve()}")
-        if directories:
-            print(f"  📁 Directories ({len(directories)}):")
-            for i, dir_path in enumerate(directories, 1):
-                print(f"    {i}. {Path(dir_path).resolve()}")
+        print(f"📂 Indexing: {Path(docs_dir).resolve()}")

        if index_dir.exists() and not args.force:
            print(f"Index '{index_name}' already exists. Use --force to rebuild.")
            return

-        all_texts = self.load_documents(docs_paths, args.file_types)
+        all_texts = self.load_documents(docs_dir, args.file_types)
        if not all_texts:
            print("No documents found")
            return
@@ -723,7 +575,7 @@ Examples:

        if not self.index_exists(index_name):
            print(
-                f"Index '{index_name}' not found. Use 'leann build {index_name} --docs <dir> [<dir2> ...]' to create it."
+                f"Index '{index_name}' not found. Use 'leann build {index_name} --docs <dir>' to create it."
            )
            return

@@ -750,7 +602,7 @@ Examples:

        if not self.index_exists(index_name):
            print(
-                f"Index '{index_name}' not found. Use 'leann build {index_name} --docs <dir> [<dir2> ...]' to create it."
+                f"Index '{index_name}' not found. Use 'leann build {index_name} --docs <dir>' to create it."
            )
            return

--- a/packages/leann-core/src/leann/embedding_compute.py
+++ b/packages/leann-core/src/leann/embedding_compute.py
@@ -6,6 +6,7 @@ Preserves all optimization parameters to ensure performance

 import logging
 import os
+from concurrent.futures import ThreadPoolExecutor, as_completed
 from typing import Any

 import numpy as np
@@ -373,9 +374,7 @@ def compute_embeddings_ollama(
    texts: list[str], model_name: str, is_build: bool = False, host: str = "http://localhost:11434"
 ) -> np.ndarray:
    """
-    Compute embeddings using Ollama API with simplified batch processing.
-
-    Uses batch size of 32 for MPS/CPU and 128 for CUDA to optimize performance.
+    Compute embeddings using Ollama API.

    Args:
        texts: List of texts to compute embeddings for
@@ -439,19 +438,12 @@ def compute_embeddings_ollama(
            if any(emb in base_name for emb in ["embed", "bge", "minilm", "e5"]):
                embedding_models.append(model)

-        # Check if model exists (handle versioned names) and resolve to full name
-        resolved_model_name = None
-        for name in model_names:
-            # Exact match
-            if model_name == name:
-                resolved_model_name = name
-                break
-            # Match without version tag (use the versioned name)
-            elif model_name == name.split(":")[0]:
-                resolved_model_name = name
-                break
+        # Check if model exists (handle versioned names)
+        model_found = any(
+            model_name == name.split(":")[0] or model_name == name for name in model_names
+        )

-        if not resolved_model_name:
+        if not model_found:
            error_msg = f"❌ Model '{model_name}' not found in local Ollama.\n\n"

            # Suggest pulling the model
@@ -473,11 +465,6 @@ def compute_embeddings_ollama(
            error_msg += "\n📚 Browse more: https://ollama.com/library"
            raise ValueError(error_msg)

-        # Use the resolved model name for all subsequent operations
-        if resolved_model_name != model_name:
-            logger.info(f"Resolved model name '{model_name}' to '{resolved_model_name}'")
-        model_name = resolved_model_name
-
        # Verify the model supports embeddings by testing it
        try:
            test_response = requests.post(
@@ -498,148 +485,138 @@ def compute_embeddings_ollama(
    except requests.exceptions.RequestException as e:
        logger.warning(f"Could not verify model existence: {e}")

-    # Determine batch size based on device availability
-    # Check for CUDA/MPS availability using torch if available
-    batch_size = 32  # Default for MPS/CPU
-    try:
-        import torch
+    # Process embeddings with optimized concurrent processing
+    import requests

-        if torch.cuda.is_available():
-            batch_size = 128  # CUDA gets larger batch size
-        elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
-            batch_size = 32  # MPS gets smaller batch size
-    except ImportError:
-        # If torch is not available, use conservative batch size
-        batch_size = 32
+    def get_single_embedding(text_idx_tuple):
+        """Helper function to get embedding for a single text."""
+        text, idx = text_idx_tuple
+        max_retries = 3
+        retry_count = 0

-    logger.info(f"Using batch size: {batch_size}")
+        # Truncate very long texts to avoid API issues
+        truncated_text = text[:8000] if len(text) > 8000 else text

-    def get_batch_embeddings(batch_texts):
-        """Get embeddings for a batch of texts."""
-        all_embeddings = []
-        failed_indices = []
+        while retry_count < max_retries:
+            try:
+                response = requests.post(
+                    f"{host}/api/embeddings",
+                    json={"model": model_name, "prompt": truncated_text},
+                    timeout=30,
+                )
+                response.raise_for_status()

-        for i, text in enumerate(batch_texts):
-            max_retries = 3
-            retry_count = 0
+                result = response.json()
+                embedding = result.get("embedding")

-            # Truncate very long texts to avoid API issues
-            truncated_text = text[:8000] if len(text) > 8000 else text
-            while retry_count < max_retries:
-                try:
-                    response = requests.post(
-                        f"{host}/api/embeddings",
-                        json={"model": model_name, "prompt": truncated_text},
-                        timeout=30,
+                if embedding is None:
+                    raise ValueError(f"No embedding returned for text {idx}")
+
+                return idx, embedding
+
+            except requests.exceptions.Timeout:
+                retry_count += 1
+                if retry_count >= max_retries:
+                    logger.warning(f"Timeout for text {idx} after {max_retries} retries")
+                    return idx, None
+
+            except Exception as e:
+                if retry_count >= max_retries - 1:
+                    logger.error(f"Failed to get embedding for text {idx}: {e}")
+                    return idx, None
+                retry_count += 1
+
+        return idx, None
+
+    # Determine if we should use concurrent processing
+    use_concurrent = (
+        len(texts) > 5 and not is_build
+    )  # Don't use concurrent in build mode to avoid overwhelming
+    max_workers = min(4, len(texts))  # Limit concurrent requests to avoid overwhelming Ollama
+
+    all_embeddings = [None] * len(texts)  # Pre-allocate list to maintain order
+    failed_indices = []
+
+    if use_concurrent:
+        logger.info(
+            f"Using concurrent processing with {max_workers} workers for {len(texts)} texts"
+        )
+
+        with ThreadPoolExecutor(max_workers=max_workers) as executor:
+            # Submit all tasks
+            future_to_idx = {
+                executor.submit(get_single_embedding, (text, idx)): idx
+                for idx, text in enumerate(texts)
+            }
+
+            # Add progress bar for concurrent processing
+            try:
+                if is_build or len(texts) > 10:
+                    from tqdm import tqdm
+
+                    futures_iterator = tqdm(
+                        as_completed(future_to_idx),
+                        total=len(texts),
+                        desc="Computing Ollama embeddings",
                    )
-                    response.raise_for_status()
-
-                    result = response.json()
-                    embedding = result.get("embedding")
-
-                    if embedding is None:
-                        raise ValueError(f"No embedding returned for text {i}")
-
-                    if not isinstance(embedding, list) or len(embedding) == 0:
-                        raise ValueError(f"Invalid embedding format for text {i}")
-
-                    all_embeddings.append(embedding)
-                    break
-
-                except requests.exceptions.Timeout:
-                    retry_count += 1
-                    if retry_count >= max_retries:
-                        logger.warning(f"Timeout for text {i} after {max_retries} retries")
-                        failed_indices.append(i)
-                        all_embeddings.append(None)
-                        break
+                else:
+                    futures_iterator = as_completed(future_to_idx)
+            except ImportError:
+                futures_iterator = as_completed(future_to_idx)

+            # Collect results as they complete
+            for future in futures_iterator:
+                try:
+                    idx, embedding = future.result()
+                    if embedding is not None:
+                        all_embeddings[idx] = embedding
+                    else:
+                        failed_indices.append(idx)
                except Exception as e:
-                    retry_count += 1
-                    if retry_count >= max_retries:
-                        logger.error(f"Failed to get embedding for text {i}: {e}")
-                        failed_indices.append(i)
-                        all_embeddings.append(None)
-                        break
-        return all_embeddings, failed_indices
+                    idx = future_to_idx[future]
+                    logger.error(f"Exception for text {idx}: {e}")
+                    failed_indices.append(idx)

-    # Process texts in batches
-    all_embeddings = []
-    all_failed_indices = []
-
-    # Setup progress bar if needed
-    show_progress = is_build or len(texts) > 10
-    try:
-        if show_progress:
-            from tqdm import tqdm
-    except ImportError:
-        show_progress = False
-
-    # Process batches
-    num_batches = (len(texts) + batch_size - 1) // batch_size
-
-    if show_progress:
-        batch_iterator = tqdm(range(num_batches), desc="Computing Ollama embeddings")
    else:
-        batch_iterator = range(num_batches)
+        # Sequential processing with progress bar
+        show_progress = is_build or len(texts) > 10

-    for batch_idx in batch_iterator:
-        start_idx = batch_idx * batch_size
-        end_idx = min(start_idx + batch_size, len(texts))
-        batch_texts = texts[start_idx:end_idx]
+        try:
+            if show_progress:
+                from tqdm import tqdm

-        batch_embeddings, batch_failed = get_batch_embeddings(batch_texts)
+                iterator = tqdm(
+                    enumerate(texts), total=len(texts), desc="Computing Ollama embeddings"
+                )
+            else:
+                iterator = enumerate(texts)
+        except ImportError:
+            iterator = enumerate(texts)

-        # Adjust failed indices to global indices
-        global_failed = [start_idx + idx for idx in batch_failed]
-        all_failed_indices.extend(global_failed)
-        all_embeddings.extend(batch_embeddings)
+        for idx, text in iterator:
+            result_idx, embedding = get_single_embedding((text, idx))
+            if embedding is not None:
+                all_embeddings[idx] = embedding
+            else:
+                failed_indices.append(idx)

    # Handle failed embeddings
-    if all_failed_indices:
-        if len(all_failed_indices) == len(texts):
+    if failed_indices:
+        if len(failed_indices) == len(texts):
            raise RuntimeError("Failed to compute any embeddings")

-        logger.warning(
-            f"Failed to compute embeddings for {len(all_failed_indices)}/{len(texts)} texts"
-        )
+        logger.warning(f"Failed to compute embeddings for {len(failed_indices)}/{len(texts)} texts")

        # Use zero embeddings as fallback for failed ones
        valid_embedding = next((e for e in all_embeddings if e is not None), None)
        if valid_embedding:
            embedding_dim = len(valid_embedding)
-            for i, embedding in enumerate(all_embeddings):
-                if embedding is None:
-                    all_embeddings[i] = [0.0] * embedding_dim
+            for idx in failed_indices:
+                all_embeddings[idx] = [0.0] * embedding_dim

-    # Remove None values
+    # Remove None values and convert to numpy array
    all_embeddings = [e for e in all_embeddings if e is not None]

-    if not all_embeddings:
-        raise RuntimeError("No valid embeddings were computed")
-
-    # Validate embedding dimensions
-    expected_dim = len(all_embeddings[0])
-    inconsistent_dims = []
-    for i, embedding in enumerate(all_embeddings):
-        if len(embedding) != expected_dim:
-            inconsistent_dims.append((i, len(embedding)))
-
-    if inconsistent_dims:
-        error_msg = f"Ollama returned inconsistent embedding dimensions. Expected {expected_dim}, but got:\n"
-        for idx, dim in inconsistent_dims[:10]:  # Show first 10 inconsistent ones
-            error_msg += f"  - Text {idx}: {dim} dimensions\n"
-        if len(inconsistent_dims) > 10:
-            error_msg += f"  ... and {len(inconsistent_dims) - 10} more\n"
-        error_msg += f"\nThis is likely an Ollama API bug with model '{model_name}'. Please try:\n"
-        error_msg += "1. Restart Ollama service: 'ollama serve'\n"
-        error_msg += f"2. Re-pull the model: 'ollama pull {model_name}'\n"
-        error_msg += (
-            "3. Use sentence-transformers instead: --embedding-mode sentence-transformers\n"
-        )
-        error_msg += "4. Report this issue to Ollama: https://github.com/ollama/ollama/issues"
-        raise ValueError(error_msg)
-
    # Convert to numpy array and normalize
    embeddings = np.array(all_embeddings, dtype=np.float32)

--- a/packages/leann-core/src/leann/embedding_server_manager.py
+++ b/packages/leann-core/src/leann/embedding_server_manager.py
@@ -1,6 +1,7 @@
 import atexit
 import logging
 import os
+import signal
 import socket
 import subprocess
 import sys
@@ -8,7 +9,7 @@ import time
 from pathlib import Path
 from typing import Optional

-# Lightweight, self-contained server manager with no cross-process inspection
+import psutil

 # Set up logging based on environment variable
 LOG_LEVEL = os.getenv("LEANN_LOG_LEVEL", "WARNING").upper()
@@ -43,7 +44,130 @@ def _check_port(port: int) -> bool:
        return s.connect_ex(("localhost", port)) == 0


-# Note: All cross-process scanning helpers removed for simplicity
+def _check_process_matches_config(
+    port: int, expected_model: str, expected_passages_file: str
+) -> bool:
+    """
+    Check if the process using the port matches our expected model and passages file.
+    Returns True if matches, False otherwise.
+    """
+    try:
+        for proc in psutil.process_iter(["pid", "cmdline"]):
+            if not _is_process_listening_on_port(proc, port):
+                continue
+
+            cmdline = proc.info["cmdline"]
+            if not cmdline:
+                continue
+
+            return _check_cmdline_matches_config(
+                cmdline, port, expected_model, expected_passages_file
+            )
+
+        logger.debug(f"No process found listening on port {port}")
+        return False
+
+    except Exception as e:
+        logger.warning(f"Could not check process on port {port}: {e}")
+        return False
+
+
+def _is_process_listening_on_port(proc, port: int) -> bool:
+    """Check if a process is listening on the given port."""
+    try:
+        connections = proc.net_connections()
+        for conn in connections:
+            if conn.laddr.port == port and conn.status == psutil.CONN_LISTEN:
+                return True
+        return False
+    except (psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess):
+        return False
+
+
+def _check_cmdline_matches_config(
+    cmdline: list, port: int, expected_model: str, expected_passages_file: str
+) -> bool:
+    """Check if command line matches our expected configuration."""
+    cmdline_str = " ".join(cmdline)
+    logger.debug(f"Found process on port {port}: {cmdline_str}")
+
+    # Check if it's our embedding server
+    is_embedding_server = any(
+        server_type in cmdline_str
+        for server_type in [
+            "embedding_server",
+            "leann_backend_diskann.embedding_server",
+            "leann_backend_hnsw.hnsw_embedding_server",
+        ]
+    )
+
+    if not is_embedding_server:
+        logger.debug(f"Process on port {port} is not our embedding server")
+        return False
+
+    # Check model name
+    model_matches = _check_model_in_cmdline(cmdline, expected_model)
+
+    # Check passages file if provided
+    passages_matches = _check_passages_in_cmdline(cmdline, expected_passages_file)
+
+    result = model_matches and passages_matches
+    logger.debug(
+        f"model_matches: {model_matches}, passages_matches: {passages_matches}, overall: {result}"
+    )
+    return result
+
+
+def _check_model_in_cmdline(cmdline: list, expected_model: str) -> bool:
+    """Check if the command line contains the expected model."""
+    if "--model-name" not in cmdline:
+        return False
+
+    model_idx = cmdline.index("--model-name")
+    if model_idx + 1 >= len(cmdline):
+        return False
+
+    actual_model = cmdline[model_idx + 1]
+    return actual_model == expected_model
+
+
+def _check_passages_in_cmdline(cmdline: list, expected_passages_file: str) -> bool:
+    """Check if the command line contains the expected passages file."""
+    if "--passages-file" not in cmdline:
+        return False  # Expected but not found
+
+    passages_idx = cmdline.index("--passages-file")
+    if passages_idx + 1 >= len(cmdline):
+        return False
+
+    actual_passages = cmdline[passages_idx + 1]
+    expected_path = Path(expected_passages_file).resolve()
+    actual_path = Path(actual_passages).resolve()
+    return actual_path == expected_path
+
+
+def _find_compatible_port_or_next_available(
+    start_port: int, model_name: str, passages_file: str, max_attempts: int = 100
+) -> tuple[int, bool]:
+    """
+    Find a port that either has a compatible server or is available.
+    Returns (port, is_compatible) where is_compatible indicates if we found a matching server.
+    """
+    for port in range(start_port, start_port + max_attempts):
+        if not _check_port(port):
+            # Port is available
+            return port, False
+
+        # Port is in use, check if it's compatible
+        if _check_process_matches_config(port, model_name, passages_file):
+            logger.info(f"Found compatible server on port {port}")
+            return port, True
+        else:
+            logger.info(f"Port {port} has incompatible server, trying next port...")
+
+    raise RuntimeError(
+        f"Could not find compatible or available port in range {start_port}-{start_port + max_attempts}"
+    )


 class EmbeddingServerManager:
@@ -62,16 +186,7 @@ class EmbeddingServerManager:
        self.backend_module_name = backend_module_name
        self.server_process: Optional[subprocess.Popen] = None
        self.server_port: Optional[int] = None
-        # Track last-started config for in-process reuse only
-        self._server_config: Optional[dict] = None
        self._atexit_registered = False
-        # Also register a weakref finalizer to ensure cleanup when manager is GC'ed
-        try:
-            import weakref
-
-            self._finalizer = weakref.finalize(self, self._finalize_process)
-        except Exception:
-            self._finalizer = None

    def start_server(
        self,
@@ -81,24 +196,26 @@ class EmbeddingServerManager:
        **kwargs,
    ) -> tuple[bool, int]:
        """Start the embedding server."""
-        # passages_file may be present in kwargs for server CLI, but we don't need it here
+        passages_file = kwargs.get("passages_file")

-        # If this manager already has a live server, just reuse it
-        if self.server_process and self.server_process.poll() is None and self.server_port:
-            logger.info("Reusing in-process server")
-            return True, self.server_port
+        # Check if we have a compatible server already running
+        if self._has_compatible_running_server(model_name, passages_file):
+            logger.info("Found compatible running server!")
+            return True, port

        # For Colab environment, use a different strategy
        if _is_colab_environment():
            logger.info("Detected Colab environment, using alternative startup strategy")
            return self._start_server_colab(port, model_name, embedding_mode, **kwargs)

-        # Always pick a fresh available port
-        try:
-            actual_port = _get_available_port(port)
-        except RuntimeError:
-            logger.error("No available ports found")
-            return False, port
+        # Find a compatible port or next available
+        actual_port, is_compatible = _find_compatible_port_or_next_available(
+            port, model_name, passages_file
+        )
+
+        if is_compatible:
+            logger.info(f"Found compatible server on port {actual_port}")
+            return True, actual_port

        # Start a new server
        return self._start_new_server(actual_port, model_name, embedding_mode, **kwargs)
@@ -131,7 +248,17 @@ class EmbeddingServerManager:
            logger.error(f"Failed to start embedding server in Colab: {e}")
            return False, actual_port

-    # Note: No compatibility check needed; manager is per-searcher and configs are stable per instance
+    def _has_compatible_running_server(self, model_name: str, passages_file: str) -> bool:
+        """Check if we have a compatible running server."""
+        if not (self.server_process and self.server_process.poll() is None and self.server_port):
+            return False
+
+        if _check_process_matches_config(self.server_port, model_name, passages_file):
+            logger.info(f"Existing server process (PID {self.server_process.pid}) is compatible")
+            return True
+
+        logger.info("Existing server process is incompatible. Should start a new server.")
+        return False

    def _start_new_server(
        self, port: int, model_name: str, embedding_mode: str, **kwargs
@@ -178,61 +305,33 @@ class EmbeddingServerManager:
        project_root = Path(__file__).parent.parent.parent.parent.parent
        logger.info(f"Command: {' '.join(command)}")

-        # In CI environment, redirect stdout to avoid buffer deadlock but keep stderr for debugging
-        # Embedding servers use many print statements that can fill stdout buffers
+        # In CI environment, redirect output to avoid buffer deadlock
+        # Embedding servers use many print statements that can fill buffers
        is_ci = os.environ.get("CI") == "true"
        if is_ci:
            stdout_target = subprocess.DEVNULL
-            stderr_target = None  # Keep stderr for error debugging in CI
-            logger.info(
-                "CI environment detected, redirecting embedding server stdout to DEVNULL, keeping stderr"
-            )
+            stderr_target = subprocess.DEVNULL
+            logger.info("CI environment detected, redirecting embedding server output to DEVNULL")
        else:
            stdout_target = None  # Direct to console for visible logs
            stderr_target = None  # Direct to console for visible logs

-        # Start embedding server subprocess
+        # IMPORTANT: Use a new session so we can manage the whole process group reliably
        self.server_process = subprocess.Popen(
            command,
            cwd=project_root,
            stdout=stdout_target,
            stderr=stderr_target,
+            start_new_session=True,
        )
        self.server_port = port
-        # Record config for in-process reuse
-        try:
-            self._server_config = {
-                "model_name": command[command.index("--model-name") + 1]
-                if "--model-name" in command
-                else "",
-                "passages_file": command[command.index("--passages-file") + 1]
-                if "--passages-file" in command
-                else "",
-                "embedding_mode": command[command.index("--embedding-mode") + 1]
-                if "--embedding-mode" in command
-                else "sentence-transformers",
-            }
-        except Exception:
-            self._server_config = {
-                "model_name": "",
-                "passages_file": "",
-                "embedding_mode": "sentence-transformers",
-            }
        logger.info(f"Server process started with PID: {self.server_process.pid}")

        # Register atexit callback only when we actually start a process
        if not self._atexit_registered:
-            # Always attempt best-effort finalize at interpreter exit
-            atexit.register(self._finalize_process)
+            # Use a lambda to avoid issues with bound methods
+            atexit.register(lambda: self.stop_server() if self.server_process else None)
            self._atexit_registered = True
-        # Touch finalizer so it knows there is a live process
-        if getattr(self, "_finalizer", None) is not None and not self._finalizer.alive:
-            try:
-                import weakref
-
-                self._finalizer = weakref.finalize(self, self._finalize_process)
-            except Exception:
-                pass

    def _wait_for_server_ready(self, port: int) -> tuple[bool, int]:
        """Wait for the server to be ready."""
@@ -257,28 +356,34 @@ class EmbeddingServerManager:
        if not self.server_process:
            return

-        if self.server_process and self.server_process.poll() is not None:
+        if self.server_process.poll() is not None:
            # Process already terminated
            self.server_process = None
-            self.server_port = None
-            self._server_config = None
            return

        logger.info(
            f"Terminating server process (PID: {self.server_process.pid}) for backend {self.backend_module_name}..."
        )
-
-        # Use simple termination - our improved server shutdown should handle this properly
-        self.server_process.terminate()
+        # Try terminating the whole process group first (POSIX)
+        try:
+            pgid = os.getpgid(self.server_process.pid)
+            os.killpg(pgid, signal.SIGTERM)
+        except Exception:
+            # Fallback to terminating just the process
+            self.server_process.terminate()

        try:
-            self.server_process.wait(timeout=5)  # Give more time for graceful shutdown
-            logger.info(f"Server process {self.server_process.pid} terminated gracefully.")
+            self.server_process.wait(timeout=3)
+            logger.info(f"Server process {self.server_process.pid} terminated.")
        except subprocess.TimeoutExpired:
            logger.warning(
-                f"Server process {self.server_process.pid} did not terminate within 5 seconds, force killing..."
+                f"Server process {self.server_process.pid} did not terminate gracefully within 3 seconds, killing it."
            )
-            self.server_process.kill()
+            try:
+                pgid = os.getpgid(self.server_process.pid)
+                os.killpg(pgid, signal.SIGKILL)
+            except Exception:
+                self.server_process.kill()
            try:
                self.server_process.wait(timeout=2)
                logger.info(f"Server process {self.server_process.pid} killed successfully.")
@@ -286,58 +391,32 @@ class EmbeddingServerManager:
                logger.error(
                    f"Failed to kill server process {self.server_process.pid} - it may be hung"
                )
+                # Don't hang indefinitely

-        # Clean up process resources with timeout to avoid CI hang
-        try:
-            # Use shorter timeout in CI environments
-            is_ci = os.environ.get("CI") == "true"
-            timeout = 3 if is_ci else 10
-            self.server_process.wait(timeout=timeout)
-            logger.info(f"Server process {self.server_process.pid} cleanup completed")
-        except subprocess.TimeoutExpired:
-            logger.warning(f"Process cleanup timeout after {timeout}s, proceeding anyway")
-        except Exception as e:
-            logger.warning(f"Error during process cleanup: {e}")
-        finally:
-            self.server_process = None
-            self.server_port = None
-            self._server_config = None
-
-    def _finalize_process(self) -> None:
-        """Best-effort cleanup used by weakref.finalize/atexit."""
-        try:
-            self.stop_server()
-        except Exception:
-            pass
-
-    def _adopt_existing_server(self, *args, **kwargs) -> None:
-        # Removed: cross-process adoption no longer supported
-        return
+        # Clean up process resources without waiting
+        # The process should already be terminated/killed above
+        # Don't wait here as it can hang CI indefinitely
+        self.server_process = None

    def _launch_server_process_colab(self, command: list, port: int) -> None:
        """Launch the server process with Colab-specific settings."""
        logger.info(f"Colab Command: {' '.join(command)}")

-        # In Colab, we need to be more careful about process management
+        # In Colab, redirect to DEVNULL to avoid pipe blocking
+        # PIPE without reading can cause hangs
        self.server_process = subprocess.Popen(
            command,
-            stdout=subprocess.PIPE,
-            stderr=subprocess.PIPE,
+            stdout=subprocess.DEVNULL,
+            stderr=subprocess.DEVNULL,
            text=True,
        )
        self.server_port = port
        logger.info(f"Colab server process started with PID: {self.server_process.pid}")

-        # Register atexit callback (unified)
+        # Register atexit callback
        if not self._atexit_registered:
-            atexit.register(self._finalize_process)
+            atexit.register(lambda: self.stop_server() if self.server_process else None)
            self._atexit_registered = True
-        # Record config for in-process reuse is best-effort in Colab mode
-        self._server_config = {
-            "model_name": "",
-            "passages_file": "",
-            "embedding_mode": "sentence-transformers",
-        }

    def _wait_for_server_ready_colab(self, port: int) -> tuple[bool, int]:
        """Wait for the server to be ready with Colab-specific timeout."""
--- a/packages/leann-core/src/leann/mcp.py
+++ b/packages/leann-core/src/leann/mcp.py
@@ -116,6 +116,7 @@ def handle_request(request):
                    f"--top-k={args.get('top_k', 5)}",
                    f"--complexity={args.get('complexity', 32)}",
                ]
+
                result = subprocess.run(cmd, capture_output=True, text=True)

            elif tool_name == "leann_status":
--- a/packages/leann-core/src/leann/searcher_base.py
+++ b/packages/leann-core/src/leann/searcher_base.py
@@ -132,10 +132,15 @@ class BaseSearcher(LeannBackendSearcherInterface, ABC):
        import msgpack
        import zmq

+        context = None
+        socket = None
        try:
            context = zmq.Context()
            socket = context.socket(zmq.REQ)
-            socket.setsockopt(zmq.RCVTIMEO, 30000)  # 30 second timeout
+            socket.setsockopt(zmq.LINGER, 0)  # Don't block on close
+            socket.setsockopt(zmq.RCVTIMEO, 5000)  # 5 second timeout
+            socket.setsockopt(zmq.SNDTIMEO, 5000)  # 5 second timeout
+            socket.setsockopt(zmq.IMMEDIATE, 1)  # Don't wait for connection
            socket.connect(f"tcp://localhost:{zmq_port}")

            # Send embedding request
@@ -147,9 +152,6 @@ class BaseSearcher(LeannBackendSearcherInterface, ABC):
            response_bytes = socket.recv()
            response = msgpack.unpackb(response_bytes)

-            socket.close()
-            context.term()
-
            # Convert response to numpy array
            if isinstance(response, list) and len(response) > 0:
                return np.array(response, dtype=np.float32)
@@ -158,6 +160,11 @@ class BaseSearcher(LeannBackendSearcherInterface, ABC):

        except Exception as e:
            raise RuntimeError(f"Failed to compute embeddings via server: {e}")
+        finally:
+            if socket:
+                socket.close(linger=0)
+            if context:
+                context.term()

    @abstractmethod
    def search(
@@ -191,7 +198,27 @@ class BaseSearcher(LeannBackendSearcherInterface, ABC):
        """
        pass

-    def __del__(self):
-        """Ensures the embedding server is stopped when the searcher is destroyed."""
+    def cleanup(self):
+        """Cleanup resources including embedding server and ZMQ connections."""
+        # Stop embedding server
        if hasattr(self, "embedding_server_manager"):
            self.embedding_server_manager.stop_server()
+
+        # Set ZMQ linger but don't terminate global context
+        try:
+            import zmq
+
+            # Just set linger on the global instance
+            ctx = zmq.Context.instance()
+            ctx.linger = 0
+            # NEVER call ctx.term() on the global instance
+        except Exception:
+            pass
+
+    def __del__(self):
+        """Ensures resources are cleaned up when the searcher is destroyed."""
+        try:
+            self.cleanup()
+        except Exception:
+            # Ignore errors during destruction
+            pass
--- a/packages/leann-mcp/README.md
+++ b/packages/leann-mcp/README.md
@@ -45,42 +45,6 @@ leann build my-project --docs ./
 claude
 ```

-## 🚀 Advanced Usage Examples
-
-### Index Entire Git Repository
-```bash
-# Index all tracked files in your git repository, note right now we will skip submodules, but we can add it back easily if you want
-leann build my-repo --docs $(git ls-files) --embedding-mode sentence-transformers --embedding-model all-MiniLM-L6-v2 --backend hnsw
-
-# Index only specific file types from git
-leann build my-python-code --docs $(git ls-files "*.py") --embedding-mode sentence-transformers --embedding-model all-MiniLM-L6-v2 --backend hnsw
-```
-
-### Multiple Directories and Files
-```bash
-# Index multiple directories
-leann build my-codebase --docs ./src ./tests ./docs ./config --embedding-mode sentence-transformers --embedding-model all-MiniLM-L6-v2 --backend hnsw
-
-# Mix files and directories
-leann build my-project --docs ./README.md ./src/ ./package.json ./docs/ --embedding-mode sentence-transformers --embedding-model all-MiniLM-L6-v2 --backend hnsw
-
-# Specific files only
-leann build my-configs --docs ./tsconfig.json ./package.json ./webpack.config.js --embedding-mode sentence-transformers --embedding-model all-MiniLM-L6-v2 --backend hnsw
-```
-
-### Advanced Git Integration
-```bash
-# Index recently modified files
-leann build recent-changes --docs $(git diff --name-only HEAD~10..HEAD) --embedding-mode sentence-transformers --embedding-model all-MiniLM-L6-v2 --backend hnsw
-
-# Index files matching pattern
-leann build frontend --docs $(git ls-files "*.tsx" "*.ts" "*.jsx" "*.js") --embedding-mode sentence-transformers --embedding-model all-MiniLM-L6-v2 --backend hnsw
-
-# Index documentation and config files
-leann build docs-and-configs --docs $(git ls-files "*.md" "*.yml" "*.yaml" "*.json" "*.toml") --embedding-mode sentence-transformers --embedding-model all-MiniLM-L6-v2 --backend hnsw
-```
-
-
 **Try this in Claude Code:**
 ```
 Help me understand this codebase. List available indexes and search for authentication patterns.
--- a/packages/leann/pyproject.toml
+++ b/packages/leann/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "leann"
-version = "0.2.8"
+version = "0.2.7"
 description = "LEANN - The smallest vector index in the world. RAG Everything with LEANN!"
 readme = "README.md"
 requires-python = ">=3.9"
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -40,8 +40,8 @@ dependencies = [
    # Other dependencies
    "ipykernel==6.29.5",
    "msgpack>=1.1.1",
-    "mlx>=0.26.3; sys_platform == 'darwin' and platform_machine == 'arm64'",
-    "mlx-lm>=0.26.0; sys_platform == 'darwin' and platform_machine == 'arm64'",
+    "mlx>=0.26.3; sys_platform == 'darwin'",
+    "mlx-lm>=0.26.0; sys_platform == 'darwin'",
    "psutil>=5.8.0",
    "pybind11>=3.0.0",
    "pathspec>=0.12.1",
@@ -51,9 +51,9 @@ dependencies = [

 [project.optional-dependencies]
 dev = [
-    "pytest>=7.0",
-    "pytest-cov>=4.0",
-    "pytest-xdist>=3.0",  # For parallel test execution
+    "pytest>=8.3.0",  # Minimum version for Python 3.13 support
+"pytest-cov>=5.0",
+"pytest-xdist>=3.5",  # For parallel test execution
    "black>=23.0",
    "ruff==0.12.7",  # Fixed version to ensure consistent formatting across all environments
    "matplotlib",
@@ -62,8 +62,10 @@ dev = [
 ]

 test = [
-    "pytest>=7.0",
-    "pytest-timeout>=2.0",
+    "pytest>=8.3.0",  # Minimum version for Python 3.13 support
+    "pytest-timeout>=2.3",
+    "anyio>=4.0",  # For async test support (includes pytest plugin)
+    "psutil>=5.9.0",  # For process cleanup in tests
    "llama-index-core>=0.12.0",
    "llama-index-readers-file>=0.4.0",
    "python-dotenv>=1.0.0",
@@ -156,6 +158,7 @@ markers = [
    "openai: marks tests that require OpenAI API key",
 ]
 timeout = 300  # Reduced from 600s (10min) to 300s (5min) for CI safety
+timeout_method = "thread"  # Use thread method to avoid non-daemon thread issues
 addopts = [
    "-v",
    "--tb=short",
--- a/scripts/diagnose_hang.sh
+++ b/scripts/diagnose_hang.sh
@@ -0,0 +1,103 @@
+#!/bin/bash
+# Diagnostic script for debugging CI hangs
+
+echo "========================================="
+echo "      CI HANG DIAGNOSTIC SCRIPT"
+echo "========================================="
+echo ""
+
+echo "📅 Current time: $(date)"
+echo "🖥️  Hostname: $(hostname)"
+echo "👤 User: $(whoami)"
+echo "📂 Working directory: $(pwd)"
+echo ""
+
+echo "=== PYTHON ENVIRONMENT ==="
+python --version 2>&1 || echo "Python not found"
+pip list 2>&1 | head -20 || echo "pip not available"
+echo ""
+
+echo "=== PROCESS INFORMATION ==="
+echo "Current shell PID: $$"
+echo "Parent PID: $PPID"
+echo ""
+
+echo "All Python processes:"
+ps aux | grep -E "[p]ython" || echo "No Python processes"
+echo ""
+
+echo "All pytest processes:"
+ps aux | grep -E "[p]ytest" || echo "No pytest processes"
+echo ""
+
+echo "Embedding server processes:"
+ps aux | grep -E "[e]mbedding_server" || echo "No embedding server processes"
+echo ""
+
+echo "Zombie processes:"
+ps aux | grep "<defunct>" || echo "No zombie processes"
+echo ""
+
+echo "=== NETWORK INFORMATION ==="
+echo "Network listeners on typical embedding server ports:"
+ss -ltn 2>/dev/null | grep -E ":555[0-9]|:556[0-9]" || netstat -ltn 2>/dev/null | grep -E ":555[0-9]|:556[0-9]" || echo "No listeners on embedding ports"
+echo ""
+
+echo "All network listeners:"
+ss -ltn 2>/dev/null | head -20 || netstat -ltn 2>/dev/null | head -20 || echo "Cannot get network info"
+echo ""
+
+echo "=== FILE DESCRIPTORS ==="
+echo "Open files for current shell:"
+lsof -p $$ 2>/dev/null | head -20 || echo "lsof not available"
+echo ""
+
+if [ -d "/proc/$$" ]; then
+    echo "File descriptors for current shell (/proc/$$/fd):"
+    ls -la /proc/$$/fd 2>/dev/null | head -20 || echo "Cannot access /proc/$$/fd"
+    echo ""
+fi
+
+echo "=== SYSTEM RESOURCES ==="
+echo "Memory usage:"
+free -h 2>/dev/null || vm_stat 2>/dev/null || echo "Cannot get memory info"
+echo ""
+
+echo "Disk usage:"
+df -h . 2>/dev/null || echo "Cannot get disk info"
+echo ""
+
+echo "CPU info:"
+nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo "Cannot get CPU info"
+echo ""
+
+echo "=== PYTHON SPECIFIC CHECKS ==="
+python -c "
+import sys
+import os
+print(f'Python executable: {sys.executable}')
+print(f'Python path: {sys.path[:3]}...')
+print(f'Environment PYTHONPATH: {os.environ.get(\"PYTHONPATH\", \"Not set\")}')
+print(f'Site packages: {[p for p in sys.path if \"site-packages\" in p][:2]}')
+" 2>&1 || echo "Cannot run Python diagnostics"
+echo ""
+
+echo "=== ZMQ SPECIFIC CHECKS ==="
+python -c "
+try:
+    import zmq
+    print(f'ZMQ version: {zmq.zmq_version()}')
+    print(f'PyZMQ version: {zmq.pyzmq_version()}')
+    ctx = zmq.Context.instance()
+    print(f'ZMQ context instance: {ctx}')
+except Exception as e:
+    print(f'ZMQ check failed: {e}')
+" 2>&1 || echo "Cannot check ZMQ"
+echo ""
+
+echo "=== PYTEST CHECK ==="
+pytest --version 2>&1 || echo "pytest not found"
+echo ""
+
+echo "=== END OF DIAGNOSTICS ==="
+echo "Generated at: $(date)"
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -0,0 +1,301 @@
+"""Global test configuration and cleanup fixtures."""
+
+import faulthandler
+import os
+import signal
+import time
+from collections.abc import Generator
+
+import pytest
+
+# Enable faulthandler to dump stack traces
+faulthandler.enable()
+
+
+@pytest.fixture(scope="session", autouse=True)
+def _ci_backtraces():
+    """Dump stack traces before CI timeout to diagnose hanging."""
+    if os.getenv("CI") == "true":
+        # Dump stack traces 10s before the 180s timeout
+        faulthandler.dump_traceback_later(170, repeat=True)
+    yield
+    faulthandler.cancel_dump_traceback_later()
+
+
+@pytest.fixture(scope="session", autouse=True)
+def global_test_cleanup() -> Generator:
+    """Global cleanup fixture that runs after all tests.
+
+    This ensures all ZMQ connections and child processes are properly cleaned up,
+    preventing the test runner from hanging on exit.
+    """
+    yield
+
+    # Cleanup after all tests
+    print("\n🧹 Running global test cleanup...")
+
+    # 1. Force cleanup of any LeannSearcher instances
+    try:
+        import gc
+
+        # Force garbage collection to trigger __del__ methods
+        gc.collect()
+        time.sleep(0.2)
+    except Exception:
+        pass
+
+    # 2. Set ZMQ linger but DON'T term Context.instance()
+    # Terminating the global instance can block if other code still has sockets
+    try:
+        import zmq
+
+        # Just set linger on the global instance, don't terminate it
+        ctx = zmq.Context.instance()
+        ctx.linger = 0
+        # Do NOT call ctx.term() or ctx.destroy() on the global instance!
+        # That would block waiting for all sockets to close
+    except Exception:
+        pass
+
+    # Kill any leftover child processes (including grandchildren)
+    try:
+        import psutil
+
+        current_process = psutil.Process()
+        # Get ALL descendants recursively
+        children = current_process.children(recursive=True)
+
+        if children:
+            print(f"\n⚠️  Cleaning up {len(children)} leftover child processes...")
+
+            # First try to terminate gracefully
+            for child in children:
+                try:
+                    print(f"  Terminating {child.pid} ({child.name()})")
+                    child.terminate()
+                except (psutil.NoSuchProcess, psutil.AccessDenied):
+                    pass
+
+            # Wait a bit for processes to terminate
+            gone, alive = psutil.wait_procs(children, timeout=2)
+
+            # Force kill any remaining processes
+            for child in alive:
+                try:
+                    print(f"  Force killing process {child.pid} ({child.name()})")
+                    child.kill()
+                except (psutil.NoSuchProcess, psutil.AccessDenied):
+                    pass
+
+            # Final wait to ensure cleanup
+            psutil.wait_procs(alive, timeout=1)
+    except ImportError:
+        # psutil not installed, try basic process cleanup
+        try:
+            # Send SIGTERM to all child processes
+            os.killpg(os.getpgid(os.getpid()), signal.SIGTERM)
+        except Exception:
+            pass
+    except Exception as e:
+        print(f"Warning: Error during process cleanup: {e}")
+
+    # List and clean up remaining threads
+    try:
+        import threading
+
+        threads = [t for t in threading.enumerate() if t is not threading.main_thread()]
+        if threads:
+            print(f"\n⚠️  {len(threads)} non-main threads still running:")
+            for t in threads:
+                print(f"  - {t.name} (daemon={t.daemon})")
+
+                # Force cleanup of pytest-timeout threads that block exit
+                if "pytest_timeout" in t.name and not t.daemon:
+                    print(f"  🔧 Converting pytest-timeout thread to daemon: {t.name}")
+                    try:
+                        t.daemon = True
+                        print("     ✓ Converted to daemon thread")
+                    except Exception as e:
+                        print(f"     ✗ Failed: {e}")
+
+        # Check if only daemon threads remain
+        non_daemon = [
+            t for t in threading.enumerate() if t is not threading.main_thread() and not t.daemon
+        ]
+        if non_daemon:
+            print(f"\n⚠️  {len(non_daemon)} non-daemon threads still blocking exit")
+            # Force exit in CI to prevent hanging
+            if os.environ.get("CI") == "true":
+                print("🔨 Forcing exit in CI environment...")
+                os._exit(0)
+    except Exception as e:
+        print(f"Thread cleanup error: {e}")
+
+
+@pytest.fixture
+def auto_cleanup_searcher():
+    """Fixture that automatically cleans up LeannSearcher instances."""
+    searchers = []
+
+    def register(searcher):
+        """Register a searcher for cleanup."""
+        searchers.append(searcher)
+        return searcher
+
+    yield register
+
+    # Cleanup all registered searchers
+    for searcher in searchers:
+        try:
+            searcher.cleanup()
+        except Exception:
+            pass
+
+    # Force garbage collection
+    import gc
+
+    gc.collect()
+    time.sleep(0.1)
+
+
+@pytest.fixture(scope="session", autouse=True)
+def _reap_children():
+    """Reap all child processes at session end as a safety net."""
+    yield
+
+    # Final aggressive cleanup
+    try:
+        import psutil
+
+        me = psutil.Process()
+        kids = me.children(recursive=True)
+        for p in kids:
+            try:
+                p.terminate()
+            except Exception:
+                pass
+
+        _, alive = psutil.wait_procs(kids, timeout=2)
+        for p in alive:
+            try:
+                p.kill()
+            except Exception:
+                pass
+    except Exception:
+        pass
+
+
+@pytest.fixture(autouse=True)
+def cleanup_after_each_test():
+    """Cleanup after each test to prevent resource leaks."""
+    yield
+
+    # Force garbage collection to trigger any __del__ methods
+    import gc
+
+    gc.collect()
+
+    # Give a moment for async cleanup
+    time.sleep(0.1)
+
+
+def pytest_configure(config):
+    """Configure pytest with better timeout handling."""
+    # Set default timeout method to thread if not specified
+    if not config.getoption("--timeout-method", None):
+        config.option.timeout_method = "thread"
+
+    # Add more logging
+    print(f"🔧 Pytest configured at {time.strftime('%Y-%m-%d %H:%M:%S')}")
+    print(f"   Python version: {os.sys.version}")
+    print(f"   Platform: {os.sys.platform}")
+
+
+def pytest_sessionstart(session):
+    """Called after the Session object has been created."""
+    print(f"🏁 Pytest session starting at {time.strftime('%Y-%m-%d %H:%M:%S')}")
+    print(f"   Session ID: {id(session)}")
+
+    # Show initial process state
+    try:
+        import psutil
+
+        current = psutil.Process()
+        print(f"   Current PID: {current.pid}")
+        print(f"   Parent PID: {current.ppid()}")
+        children = current.children(recursive=True)
+        if children:
+            print(f"   ⚠️ Already have {len(children)} child processes at start!")
+    except Exception:
+        pass
+
+
+def pytest_sessionfinish(session, exitstatus):
+    """Called after whole test run finished."""
+    print(f"🏁 Pytest session finishing at {time.strftime('%Y-%m-%d %H:%M:%S')}")
+    print(f"   Exit status: {exitstatus}")
+
+    # Aggressive cleanup before pytest exits
+    print("🧹 Starting aggressive cleanup...")
+
+    # First, clean up child processes
+    try:
+        import psutil
+
+        current = psutil.Process()
+        children = current.children(recursive=True)
+
+        if children:
+            print(f"   Found {len(children)} child processes to clean up:")
+            for child in children:
+                try:
+                    print(f"     - PID {child.pid}: {child.name()} (status: {child.status()})")
+                    child.terminate()
+                except Exception as e:
+                    print(f"     - Failed to terminate {child.pid}: {e}")
+
+            # Wait briefly then kill
+            time.sleep(0.5)
+            _, alive = psutil.wait_procs(children, timeout=1)
+
+            for child in alive:
+                try:
+                    print(f"     - Force killing {child.pid}")
+                    child.kill()
+                except Exception:
+                    pass
+        else:
+            print("   No child processes found")
+
+    except Exception as e:
+        print(f"   Process cleanup error: {e}")
+
+    # Second, clean up problematic threads
+    try:
+        import threading
+
+        threads = [t for t in threading.enumerate() if t is not threading.main_thread()]
+        if threads:
+            print(f"   Found {len(threads)} non-main threads:")
+            for t in threads:
+                print(f"     - {t.name} (daemon={t.daemon})")
+                # Convert pytest-timeout threads to daemon so they don't block exit
+                if "pytest_timeout" in t.name and not t.daemon:
+                    try:
+                        t.daemon = True
+                        print("       ✓ Converted to daemon")
+                    except Exception:
+                        pass
+
+        # Force exit if non-daemon threads remain in CI
+        non_daemon = [
+            t for t in threading.enumerate() if t is not threading.main_thread() and not t.daemon
+        ]
+        if non_daemon and os.environ.get("CI") == "true":
+            print(f"   ⚠️ {len(non_daemon)} non-daemon threads remain, forcing exit...")
+            os._exit(exitstatus or 0)
+
+    except Exception as e:
+        print(f"   Thread cleanup error: {e}")
+
+    print(f"✅ Pytest exiting at {time.strftime('%Y-%m-%d %H:%M:%S')}")
--- a/tests/test_basic.py
+++ b/tests/test_basic.py
@@ -7,6 +7,7 @@ import tempfile
 from pathlib import Path

 import pytest
+from test_timeout import ci_timeout


 def test_imports():
@@ -19,6 +20,7 @@ def test_imports():
    os.environ.get("CI") == "true", reason="Skip model tests in CI to avoid MPS memory issues"
 )
@pytest.mark.parametrize("backend_name", ["hnsw", "diskann"])
+@ci_timeout(120)  # 2 minute timeout for backend tests
 def test_backend_basic(backend_name):
    """Test basic functionality for each backend."""
    from leann.api import LeannBuilder, LeannSearcher, SearchResult
@@ -64,13 +66,11 @@ def test_backend_basic(backend_name):
        assert isinstance(results[0], SearchResult)
        assert "topic 2" in results[0].text or "document" in results[0].text

-        # Ensure cleanup to avoid hanging background servers
-        searcher.cleanup()
-

@pytest.mark.skipif(
    os.environ.get("CI") == "true", reason="Skip model tests in CI to avoid MPS memory issues"
 )
+@ci_timeout(180)  # 3 minute timeout for large index test
 def test_large_index():
    """Test with larger dataset."""
    from leann.api import LeannBuilder, LeannSearcher
@@ -93,5 +93,3 @@ def test_large_index():
        searcher = LeannSearcher(index_path)
        results = searcher.search(["word10 word20"], top_k=10)
        assert len(results[0]) == 10
-        # Cleanup
-        searcher.cleanup()
--- a/tests/test_document_rag.py
+++ b/tests/test_document_rag.py
@@ -9,6 +9,7 @@ import tempfile
 from pathlib import Path

 import pytest
+from test_timeout import ci_timeout


@pytest.fixture
@@ -59,8 +60,9 @@ def test_document_rag_simulated(test_data_dir):

@pytest.mark.skipif(not os.environ.get("OPENAI_API_KEY"), reason="OpenAI API key not available")
@pytest.mark.skipif(
-    os.environ.get("CI") == "true", reason="Skip OpenAI tests in CI to avoid API costs"
+    os.environ.get("CI") == "true", reason="Skip OpenAI embedding tests in CI to avoid hanging"
 )
+@ci_timeout(60)  # 60 second timeout to avoid hanging on OpenAI API calls
 def test_document_rag_openai(test_data_dir):
    """Test document_rag with OpenAI embeddings."""
    with tempfile.TemporaryDirectory() as temp_dir:
--- a/tests/test_readme_examples.py
+++ b/tests/test_readme_examples.py
@@ -8,17 +8,16 @@ import tempfile
 from pathlib import Path

 import pytest
+from test_timeout import ci_timeout


@pytest.mark.parametrize("backend_name", ["hnsw", "diskann"])
+@ci_timeout(90)  # 90 second timeout for this comprehensive test
 def test_readme_basic_example(backend_name):
    """Test the basic example from README.md with both backends."""
    # Skip on macOS CI due to MPS environment issues with all-MiniLM-L6-v2
    if os.environ.get("CI") == "true" and platform.system() == "Darwin":
        pytest.skip("Skipping on macOS CI due to MPS environment issues with all-MiniLM-L6-v2")
-    # Skip DiskANN on CI (Linux runners) due to C++ extension memory/hardware constraints
-    if os.environ.get("CI") == "true" and backend_name == "diskann":
-        pytest.skip("Skip DiskANN tests in CI due to resource constraints and instability")

    # This is the exact code from README (with smaller model for CI)
    from leann import LeannBuilder, LeannChat, LeannSearcher
@@ -62,9 +61,6 @@ def test_readme_basic_example(backend_name):
        # The second text about banana-crocodile should be more relevant
        assert "banana" in results[0].text or "crocodile" in results[0].text

-        # Ensure we cleanup background embedding server
-        searcher.cleanup()
-
        # Chat with your data (using simulated LLM to avoid external dependencies)
        chat = LeannChat(INDEX_PATH, llm_config={"type": "simulated"})
        response = chat.ask("How much storage does LEANN save?", top_k=1)
@@ -72,8 +68,6 @@ def test_readme_basic_example(backend_name):
        # Verify chat works
        assert isinstance(response, str)
        assert len(response) > 0
-        # Cleanup chat resources
-        chat.cleanup()


 def test_readme_imports():
@@ -87,6 +81,7 @@ def test_readme_imports():
    assert callable(LeannChat)


+@ci_timeout(60)  # 60 second timeout
 def test_backend_options():
    """Test different backend options mentioned in documentation."""
    # Skip on macOS CI due to MPS environment issues with all-MiniLM-L6-v2
@@ -123,6 +118,7 @@ def test_backend_options():


@pytest.mark.parametrize("backend_name", ["hnsw", "diskann"])
+@ci_timeout(75)  # 75 second timeout for LLM tests
 def test_llm_config_simulated(backend_name):
    """Test simulated LLM configuration option with both backends."""
    # Skip on macOS CI due to MPS environment issues with all-MiniLM-L6-v2
--- a/tests/test_timeout.py
+++ b/tests/test_timeout.py
@@ -0,0 +1,129 @@
+"""
+Test timeout utilities for CI environments.
+"""
+
+import functools
+import os
+import signal
+import sys
+from typing import Any, Callable
+
+
+def timeout_test(seconds: int = 30):
+    """
+    Decorator to add timeout to test functions, especially useful in CI environments.
+
+    Args:
+        seconds: Timeout in seconds (default: 30)
+    """
+
+    def decorator(func: Callable) -> Callable:
+        @functools.wraps(func)
+        def wrapper(*args: Any, **kwargs: Any) -> Any:
+            # Only apply timeout in CI environment
+            if os.environ.get("CI") != "true":
+                return func(*args, **kwargs)
+
+            # Set up timeout handler
+            def timeout_handler(signum, frame):
+                print(f"\n❌ Test {func.__name__} timed out after {seconds} seconds in CI!")
+                print("This usually indicates a hanging process or infinite loop.")
+                # Try to cleanup any hanging processes
+                try:
+                    import subprocess
+
+                    subprocess.run(
+                        ["pkill", "-f", "embedding_server"], capture_output=True, timeout=2
+                    )
+                    subprocess.run(
+                        ["pkill", "-f", "hnsw_embedding"], capture_output=True, timeout=2
+                    )
+                except Exception:
+                    pass
+                # Exit with timeout code
+                sys.exit(124)  # Standard timeout exit code
+
+            # Set signal handler and alarm
+            old_handler = signal.signal(signal.SIGALRM, timeout_handler)
+            signal.alarm(seconds)
+
+            try:
+                result = func(*args, **kwargs)
+                signal.alarm(0)  # Cancel alarm
+                return result
+            except Exception:
+                signal.alarm(0)  # Cancel alarm on exception
+                raise
+            finally:
+                # Restore original handler
+                signal.signal(signal.SIGALRM, old_handler)
+
+        return wrapper
+
+    return decorator
+
+
+def ci_timeout(seconds: int = 60):
+    """
+    Timeout decorator specifically for CI environments.
+    Uses threading for more reliable timeout handling.
+
+    Args:
+        seconds: Timeout in seconds (default: 60)
+    """
+
+    def decorator(func: Callable) -> Callable:
+        @functools.wraps(func)
+        def wrapper(*args: Any, **kwargs: Any) -> Any:
+            # Only apply in CI
+            if os.environ.get("CI") != "true":
+                return func(*args, **kwargs)
+
+            import threading
+
+            result = [None]
+            exception = [None]
+            finished = threading.Event()
+
+            def target():
+                try:
+                    result[0] = func(*args, **kwargs)
+                except Exception as e:
+                    exception[0] = e
+                finally:
+                    finished.set()
+
+            # Start function in thread
+            thread = threading.Thread(target=target, daemon=True)
+            thread.start()
+
+            # Wait for completion or timeout
+            if not finished.wait(timeout=seconds):
+                print(f"\n💥 CI TIMEOUT: Test {func.__name__} exceeded {seconds}s limit!")
+                print("This usually indicates hanging embedding servers or infinite loops.")
+
+                # Try to cleanup embedding servers
+                try:
+                    import subprocess
+
+                    subprocess.run(
+                        ["pkill", "-9", "-f", "embedding_server"], capture_output=True, timeout=2
+                    )
+                    subprocess.run(
+                        ["pkill", "-9", "-f", "hnsw_embedding"], capture_output=True, timeout=2
+                    )
+                    print("Attempted to kill hanging embedding servers.")
+                except Exception as e:
+                    print(f"Cleanup failed: {e}")
+
+                # Raise TimeoutError instead of sys.exit for better pytest integration
+                raise TimeoutError(f"Test {func.__name__} timed out after {seconds} seconds")
+
+            if exception[0]:
+                raise exception[0]
+
+            return result[0]
+
+        return wrapper
+
+    return decorator
--- a/uv.lock
+++ b/uv.lock
@@ -2223,7 +2223,7 @@ wheels = [

 [[package]]
 name = "leann-backend-diskann"
-version = "0.2.8"
+version = "0.2.6"
 source = { editable = "packages/leann-backend-diskann" }
 dependencies = [
    { name = "leann-core" },
@@ -2235,14 +2235,14 @@ dependencies = [

 [package.metadata]
 requires-dist = [
-    { name = "leann-core", specifier = "==0.2.8" },
+    { name = "leann-core", specifier = "==0.2.6" },
    { name = "numpy" },
    { name = "protobuf", specifier = ">=3.19.0" },
 ]

 [[package]]
 name = "leann-backend-hnsw"
-version = "0.2.8"
+version = "0.2.6"
 source = { editable = "packages/leann-backend-hnsw" }
 dependencies = [
    { name = "leann-core" },
@@ -2255,7 +2255,7 @@ dependencies = [

 [package.metadata]
 requires-dist = [
-    { name = "leann-core", specifier = "==0.2.8" },
+    { name = "leann-core", specifier = "==0.2.6" },
    { name = "msgpack", specifier = ">=1.0.0" },
    { name = "numpy" },
    { name = "pyzmq", specifier = ">=23.0.0" },
@@ -2263,7 +2263,7 @@ requires-dist = [

 [[package]]
 name = "leann-core"
-version = "0.2.8"
+version = "0.2.6"
 source = { editable = "packages/leann-core" }
 dependencies = [
    { name = "accelerate" },
@@ -2272,8 +2272,8 @@ dependencies = [
    { name = "llama-index-core" },
    { name = "llama-index-embeddings-huggingface" },
    { name = "llama-index-readers-file" },
-    { name = "mlx", marker = "platform_machine == 'arm64' and sys_platform == 'darwin'" },
-    { name = "mlx-lm", marker = "platform_machine == 'arm64' and sys_platform == 'darwin'" },
+    { name = "mlx", marker = "sys_platform == 'darwin'" },
+    { name = "mlx-lm", marker = "sys_platform == 'darwin'" },
    { name = "msgpack" },
    { name = "nbconvert" },
    { name = "numpy", version = "2.0.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.10'" },
@@ -2302,8 +2302,8 @@ requires-dist = [
    { name = "llama-index-core", specifier = ">=0.12.0" },
    { name = "llama-index-embeddings-huggingface", specifier = ">=0.5.5" },
    { name = "llama-index-readers-file", specifier = ">=0.4.0" },
-    { name = "mlx", marker = "platform_machine == 'arm64' and sys_platform == 'darwin'", specifier = ">=0.26.3" },
-    { name = "mlx-lm", marker = "platform_machine == 'arm64' and sys_platform == 'darwin'", specifier = ">=0.26.0" },
+    { name = "mlx", marker = "sys_platform == 'darwin'", specifier = ">=0.26.3" },
+    { name = "mlx-lm", marker = "sys_platform == 'darwin'", specifier = ">=0.26.0" },
    { name = "msgpack", specifier = ">=1.0.0" },
    { name = "nbconvert", specifier = ">=7.0.0" },
    { name = "numpy", specifier = ">=1.20.0" },
@@ -2343,8 +2343,8 @@ dependencies = [
    { name = "llama-index-embeddings-huggingface" },
    { name = "llama-index-readers-file" },
    { name = "llama-index-vector-stores-faiss" },
-    { name = "mlx", marker = "platform_machine == 'arm64' and sys_platform == 'darwin'" },
-    { name = "mlx-lm", marker = "platform_machine == 'arm64' and sys_platform == 'darwin'" },
+    { name = "mlx", marker = "sys_platform == 'darwin'" },
+    { name = "mlx-lm", marker = "sys_platform == 'darwin'" },
    { name = "msgpack" },
    { name = "nbconvert" },
    { name = "numpy", version = "2.0.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.10'" },
@@ -2425,8 +2425,8 @@ requires-dist = [
    { name = "llama-index-readers-file", marker = "extra == 'test'", specifier = ">=0.4.0" },
    { name = "llama-index-vector-stores-faiss", specifier = ">=0.4.0" },
    { name = "matplotlib", marker = "extra == 'dev'" },
-    { name = "mlx", marker = "platform_machine == 'arm64' and sys_platform == 'darwin'", specifier = ">=0.26.3" },
-    { name = "mlx-lm", marker = "platform_machine == 'arm64' and sys_platform == 'darwin'", specifier = ">=0.26.0" },
+    { name = "mlx", marker = "sys_platform == 'darwin'", specifier = ">=0.26.3" },
+    { name = "mlx-lm", marker = "sys_platform == 'darwin'", specifier = ">=0.26.0" },
    { name = "msgpack", specifier = ">=1.1.1" },
    { name = "nbconvert", specifier = ">=7.16.6" },
    { name = "numpy", specifier = ">=1.26.0" },
@@ -2451,7 +2451,7 @@ requires-dist = [
    { name = "python-docx", marker = "extra == 'documents'", specifier = ">=0.8.11" },
    { name = "python-dotenv", marker = "extra == 'test'", specifier = ">=1.0.0" },
    { name = "requests", specifier = ">=2.25.0" },
-    { name = "ruff", marker = "extra == 'dev'", specifier = "==0.12.7" },
+    { name = "ruff", marker = "extra == 'dev'", specifier = ">=0.1.0" },
    { name = "sentence-transformers", specifier = ">=2.2.0" },
    { name = "sentence-transformers", marker = "extra == 'test'", specifier = ">=2.2.0" },
    { name = "sglang" },
@@ -4364,9 +4364,9 @@ wheels = [
 name = "pybind11"
 version = "3.0.0"
 source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/ef/83/698d120e257a116f2472c710932023ad779409adf2734d2e940f34eea2c5/pybind11-3.0.0.tar.gz", hash = "sha256:c3f07bce3ada51c3e4b76badfa85df11688d12c46111f9d242bc5c9415af7862", size = 544819 }
+sdist = { url = "https://files.pythonhosted.org/packages/ef/83/698d120e257a116f2472c710932023ad779409adf2734d2e940f34eea2c5/pybind11-3.0.0.tar.gz", hash = "sha256:c3f07bce3ada51c3e4b76badfa85df11688d12c46111f9d242bc5c9415af7862", size = 544819, upload-time = "2025-07-10T16:52:09.335Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/41/9c/85f50a5476832c3efc67b6d7997808388236ae4754bf53e1749b3bc27577/pybind11-3.0.0-py3-none-any.whl", hash = "sha256:7c5cac504da5a701b5163f0e6a7ba736c713a096a5378383c5b4b064b753f607", size = 292118 },
+    { url = "https://files.pythonhosted.org/packages/41/9c/85f50a5476832c3efc67b6d7997808388236ae4754bf53e1749b3bc27577/pybind11-3.0.0-py3-none-any.whl", hash = "sha256:7c5cac504da5a701b5163f0e6a7ba736c713a096a5378383c5b4b064b753f607", size = 292118, upload-time = "2025-07-10T16:52:07.828Z" },
 ]

 [[package]]
@@ -5215,27 +5215,27 @@ wheels = [

 [[package]]
 name = "ruff"
-version = "0.12.7"
+version = "0.12.5"
 source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/a1/81/0bd3594fa0f690466e41bd033bdcdf86cba8288345ac77ad4afbe5ec743a/ruff-0.12.7.tar.gz", hash = "sha256:1fc3193f238bc2d7968772c82831a4ff69252f673be371fb49663f0068b7ec71", size = 5197814 }
+sdist = { url = "https://files.pythonhosted.org/packages/30/cd/01015eb5034605fd98d829c5839ec2c6b4582b479707f7c1c2af861e8258/ruff-0.12.5.tar.gz", hash = "sha256:b209db6102b66f13625940b7f8c7d0f18e20039bb7f6101fbdac935c9612057e", size = 5170722 }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/e1/d2/6cb35e9c85e7a91e8d22ab32ae07ac39cc34a71f1009a6f9e4a2a019e602/ruff-0.12.7-py3-none-linux_armv6l.whl", hash = "sha256:76e4f31529899b8c434c3c1dede98c4483b89590e15fb49f2d46183801565303", size = 11852189 },
-    { url = "https://files.pythonhosted.org/packages/63/5b/a4136b9921aa84638f1a6be7fb086f8cad0fde538ba76bda3682f2599a2f/ruff-0.12.7-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:789b7a03e72507c54fb3ba6209e4bb36517b90f1a3569ea17084e3fd295500fb", size = 12519389 },
-    { url = "https://files.pythonhosted.org/packages/a8/c9/3e24a8472484269b6b1821794141f879c54645a111ded4b6f58f9ab0705f/ruff-0.12.7-py3-none-macosx_11_0_arm64.whl", hash = "sha256:2e1c2a3b8626339bb6369116e7030a4cf194ea48f49b64bb505732a7fce4f4e3", size = 11743384 },
-    { url = "https://files.pythonhosted.org/packages/26/7c/458dd25deeb3452c43eaee853c0b17a1e84169f8021a26d500ead77964fd/ruff-0.12.7-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:32dec41817623d388e645612ec70d5757a6d9c035f3744a52c7b195a57e03860", size = 11943759 },
-    { url = "https://files.pythonhosted.org/packages/7f/8b/658798472ef260ca050e400ab96ef7e85c366c39cf3dfbef4d0a46a528b6/ruff-0.12.7-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:47ef751f722053a5df5fa48d412dbb54d41ab9b17875c6840a58ec63ff0c247c", size = 11654028 },
-    { url = "https://files.pythonhosted.org/packages/a8/86/9c2336f13b2a3326d06d39178fd3448dcc7025f82514d1b15816fe42bfe8/ruff-0.12.7-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:a828a5fc25a3efd3e1ff7b241fd392686c9386f20e5ac90aa9234a5faa12c423", size = 13225209 },
-    { url = "https://files.pythonhosted.org/packages/76/69/df73f65f53d6c463b19b6b312fd2391dc36425d926ec237a7ed028a90fc1/ruff-0.12.7-py3-none-manylinux_2_17_ppc64.manylinux2014_ppc64.whl", hash = "sha256:5726f59b171111fa6a69d82aef48f00b56598b03a22f0f4170664ff4d8298efb", size = 14182353 },
-    { url = "https://files.pythonhosted.org/packages/58/1e/de6cda406d99fea84b66811c189b5ea139814b98125b052424b55d28a41c/ruff-0.12.7-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:74e6f5c04c4dd4aba223f4fe6e7104f79e0eebf7d307e4f9b18c18362124bccd", size = 13631555 },
-    { url = "https://files.pythonhosted.org/packages/6f/ae/625d46d5164a6cc9261945a5e89df24457dc8262539ace3ac36c40f0b51e/ruff-0.12.7-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:5d0bfe4e77fba61bf2ccadf8cf005d6133e3ce08793bbe870dd1c734f2699a3e", size = 12667556 },
-    { url = "https://files.pythonhosted.org/packages/55/bf/9cb1ea5e3066779e42ade8d0cd3d3b0582a5720a814ae1586f85014656b6/ruff-0.12.7-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:06bfb01e1623bf7f59ea749a841da56f8f653d641bfd046edee32ede7ff6c606", size = 12939784 },
-    { url = "https://files.pythonhosted.org/packages/55/7f/7ead2663be5627c04be83754c4f3096603bf5e99ed856c7cd29618c691bd/ruff-0.12.7-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:e41df94a957d50083fd09b916d6e89e497246698c3f3d5c681c8b3e7b9bb4ac8", size = 11771356 },
-    { url = "https://files.pythonhosted.org/packages/17/40/a95352ea16edf78cd3a938085dccc55df692a4d8ba1b3af7accbe2c806b0/ruff-0.12.7-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:4000623300563c709458d0ce170c3d0d788c23a058912f28bbadc6f905d67afa", size = 11612124 },
-    { url = "https://files.pythonhosted.org/packages/4d/74/633b04871c669e23b8917877e812376827c06df866e1677f15abfadc95cb/ruff-0.12.7-py3-none-musllinux_1_2_i686.whl", hash = "sha256:69ffe0e5f9b2cf2b8e289a3f8945b402a1b19eff24ec389f45f23c42a3dd6fb5", size = 12479945 },
-    { url = "https://files.pythonhosted.org/packages/be/34/c3ef2d7799c9778b835a76189c6f53c179d3bdebc8c65288c29032e03613/ruff-0.12.7-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:a07a5c8ffa2611a52732bdc67bf88e243abd84fe2d7f6daef3826b59abbfeda4", size = 12998677 },
-    { url = "https://files.pythonhosted.org/packages/77/ab/aca2e756ad7b09b3d662a41773f3edcbd262872a4fc81f920dc1ffa44541/ruff-0.12.7-py3-none-win32.whl", hash = "sha256:c928f1b2ec59fb77dfdf70e0419408898b63998789cc98197e15f560b9e77f77", size = 11756687 },
-    { url = "https://files.pythonhosted.org/packages/b4/71/26d45a5042bc71db22ddd8252ca9d01e9ca454f230e2996bb04f16d72799/ruff-0.12.7-py3-none-win_amd64.whl", hash = "sha256:9c18f3d707ee9edf89da76131956aba1270c6348bfee8f6c647de841eac7194f", size = 12912365 },
-    { url = "https://files.pythonhosted.org/packages/4c/9b/0b8aa09817b63e78d94b4977f18b1fcaead3165a5ee49251c5d5c245bb2d/ruff-0.12.7-py3-none-win_arm64.whl", hash = "sha256:dfce05101dbd11833a0776716d5d1578641b7fddb537fe7fa956ab85d1769b69", size = 11982083 },
+    { url = "https://files.pythonhosted.org/packages/d4/de/ad2f68f0798ff15dd8c0bcc2889558970d9a685b3249565a937cd820ad34/ruff-0.12.5-py3-none-linux_armv6l.whl", hash = "sha256:1de2c887e9dec6cb31fcb9948299de5b2db38144e66403b9660c9548a67abd92", size = 11819133 },
+    { url = "https://files.pythonhosted.org/packages/f8/fc/c6b65cd0e7fbe60f17e7ad619dca796aa49fbca34bb9bea5f8faf1ec2643/ruff-0.12.5-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:d1ab65e7d8152f519e7dea4de892317c9da7a108da1c56b6a3c1d5e7cf4c5e9a", size = 12501114 },
+    { url = "https://files.pythonhosted.org/packages/c5/de/c6bec1dce5ead9f9e6a946ea15e8d698c35f19edc508289d70a577921b30/ruff-0.12.5-py3-none-macosx_11_0_arm64.whl", hash = "sha256:962775ed5b27c7aa3fdc0d8f4d4433deae7659ef99ea20f783d666e77338b8cf", size = 11716873 },
+    { url = "https://files.pythonhosted.org/packages/a1/16/cf372d2ebe91e4eb5b82a2275c3acfa879e0566a7ac94d331ea37b765ac8/ruff-0.12.5-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:73b4cae449597e7195a49eb1cdca89fd9fbb16140c7579899e87f4c85bf82f73", size = 11958829 },
+    { url = "https://files.pythonhosted.org/packages/25/bf/cd07e8f6a3a6ec746c62556b4c4b79eeb9b0328b362bb8431b7b8afd3856/ruff-0.12.5-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:8b13489c3dc50de5e2d40110c0cce371e00186b880842e245186ca862bf9a1ac", size = 11626619 },
+    { url = "https://files.pythonhosted.org/packages/d8/c9/c2ccb3b8cbb5661ffda6925f81a13edbb786e623876141b04919d1128370/ruff-0.12.5-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:f1504fea81461cf4841778b3ef0a078757602a3b3ea4b008feb1308cb3f23e08", size = 13221894 },
+    { url = "https://files.pythonhosted.org/packages/6b/58/68a5be2c8e5590ecdad922b2bcd5583af19ba648f7648f95c51c3c1eca81/ruff-0.12.5-py3-none-manylinux_2_17_ppc64.manylinux2014_ppc64.whl", hash = "sha256:c7da4129016ae26c32dfcbd5b671fe652b5ab7fc40095d80dcff78175e7eddd4", size = 14163909 },
+    { url = "https://files.pythonhosted.org/packages/bd/d1/ef6b19622009ba8386fdb792c0743f709cf917b0b2f1400589cbe4739a33/ruff-0.12.5-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:ca972c80f7ebcfd8af75a0f18b17c42d9f1ef203d163669150453f50ca98ab7b", size = 13583652 },
+    { url = "https://files.pythonhosted.org/packages/62/e3/1c98c566fe6809a0c83751d825a03727f242cdbe0d142c9e292725585521/ruff-0.12.5-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:8dbbf9f25dfb501f4237ae7501d6364b76a01341c6f1b2cd6764fe449124bb2a", size = 12700451 },
+    { url = "https://files.pythonhosted.org/packages/24/ff/96058f6506aac0fbc0d0fc0d60b0d0bd746240a0594657a2d94ad28033ba/ruff-0.12.5-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:2c47dea6ae39421851685141ba9734767f960113d51e83fd7bb9958d5be8763a", size = 12937465 },
+    { url = "https://files.pythonhosted.org/packages/eb/d3/68bc5e7ab96c94b3589d1789f2dd6dd4b27b263310019529ac9be1e8f31b/ruff-0.12.5-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:c5076aa0e61e30f848846f0265c873c249d4b558105b221be1828f9f79903dc5", size = 11771136 },
+    { url = "https://files.pythonhosted.org/packages/52/75/7356af30a14584981cabfefcf6106dea98cec9a7af4acb5daaf4b114845f/ruff-0.12.5-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:a5a4c7830dadd3d8c39b1cc85386e2c1e62344f20766be6f173c22fb5f72f293", size = 11601644 },
+    { url = "https://files.pythonhosted.org/packages/c2/67/91c71d27205871737cae11025ee2b098f512104e26ffd8656fd93d0ada0a/ruff-0.12.5-py3-none-musllinux_1_2_i686.whl", hash = "sha256:46699f73c2b5b137b9dc0fc1a190b43e35b008b398c6066ea1350cce6326adcb", size = 12478068 },
+    { url = "https://files.pythonhosted.org/packages/34/04/b6b00383cf2f48e8e78e14eb258942fdf2a9bf0287fbf5cdd398b749193a/ruff-0.12.5-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:5a655a0a0d396f0f072faafc18ebd59adde8ca85fb848dc1b0d9f024b9c4d3bb", size = 12991537 },
+    { url = "https://files.pythonhosted.org/packages/3e/b9/053d6445dc7544fb6594785056d8ece61daae7214859ada4a152ad56b6e0/ruff-0.12.5-py3-none-win32.whl", hash = "sha256:dfeb2627c459b0b78ca2bbdc38dd11cc9a0a88bf91db982058b26ce41714ffa9", size = 11751575 },
+    { url = "https://files.pythonhosted.org/packages/bc/0f/ab16e8259493137598b9149734fec2e06fdeda9837e6f634f5c4e35916da/ruff-0.12.5-py3-none-win_amd64.whl", hash = "sha256:ae0d90cf5f49466c954991b9d8b953bd093c32c27608e409ae3564c63c5306a5", size = 12882273 },
+    { url = "https://files.pythonhosted.org/packages/00/db/c376b0661c24cf770cb8815268190668ec1330eba8374a126ceef8c72d55/ruff-0.12.5-py3-none-win_arm64.whl", hash = "sha256:48cdbfc633de2c5c37d9f090ba3b352d1576b0015bfc3bc98eaf230275b7e805", size = 11951564 },
 ]

 [[package]]
Author	SHA1	Message	Date
Andy Lee	d9e5d5d6aa	Merge branch 'main' into feature/graph-partition-support	2025-08-11 01:46:31 -07:00
Andy Lee	a437f558a3	fix: handle non-daemon threads blocking process exit The root cause was pytest-timeout creating non-daemon threads that prevented the Python process from exiting, even after all tests completed. Fixes: 1. Configure pytest-timeout to use 'thread' method instead of default - Avoids creating problematic non-daemon threads 2. Add aggressive thread cleanup in conftest.py - Convert pytest-timeout threads to daemon threads - Force exit with os._exit(0) in CI if non-daemon threads remain 3. Enhanced cleanup in both global_test_cleanup and pytest_sessionfinish - Detect and handle stuck threads - Clear diagnostics about what's blocking exit The issue was that even though tests finished in 51 seconds, a non-daemon thread 'pytest_timeout tests/test_readme_examples.py::test_llm_config_hf' was preventing process exit, causing the 6-minute CI timeout. This should finally solve the hanging CI problem.	2025-08-08 23:20:52 -07:00
Andy Lee	742c9baabc	fix: increase outer timeout to 360s to respect pytest's 300s timeout The outer shell timeout must be larger than pytest's internal timeout (300s) to allow pytest to handle its own timeout gracefully and perform cleanup. Changes: - Increased outer timeout from 180s to 360s (300s + 60s buffer) - Made timeouts configurable via environment variables - Added clear documentation about timeout hierarchy - Display timeout configuration at runtime Timeout hierarchy: 1. Individual test: 20s (markers) 2. Pytest session: 300s (pyproject.toml) 3. Outer shell: 360s (for cleanup) 4. GitHub Actions: 6 hours (default) This prevents the outer timeout from killing pytest before it can finish its own timeout handling, which was likely causing the hanging issues.	2025-08-08 22:48:40 -07:00
Andy Lee	60eef4b440	fix: add diagnostic script (force add to override .gitignore) The diagnose_hang.sh script needs to be in git for CI to use it. Using -f to override *.sh rule in .gitignore.	2025-08-08 21:27:04 -07:00
Andy Lee	f2c5355c73	feat: add comprehensive debugging capabilities with tmate integration 1. Tmate SSH Debugging: - Added manual workflow_dispatch trigger with debug_enabled option - Integrated mxschmitt/action-tmate@v3 for SSH access to CI runner - Can be triggered manually or by adding [debug] to commit message - Detached mode with 30min timeout, limited to actor only - Also triggers on test failure when debug is enabled 2. Enhanced Pytest Output: - Added --capture=no to see real-time output - Added --log-cli-level=DEBUG for maximum verbosity - Added --tb=short for cleaner tracebacks - Pipe output to tee for both display and logging - Show last 20 lines of output on completion 3. Environment Diagnostics: - Export PYTHONUNBUFFERED=1 for immediate output - Show Python/Pytest versions at start - Display relevant environment variables - Check network ports before/after tests 4. Diagnostic Script: - Created scripts/diagnose_hang.sh for comprehensive system checks - Shows processes, network, file descriptors, memory, ZMQ status - Automatically runs on timeout for detailed debugging info This allows debugging CI hangs via SSH when needed while providing extensive logging by default.	2025-08-08 21:25:58 -07:00
Andy Lee	439debbd3f	fix: add extensive logging and fix subprocess PIPE blocking 1. CI Logging Enhancements: - Added comprehensive diagnostics with process tree, network listeners, file descriptors - Added timestamps at every stage (before/during/after pytest) - Added trap EXIT to always show diagnostics - Added immediate process checks after pytest finishes - Added sub-shell execution with immediate cleanup 2. Fixed Subprocess PIPE Blocking: - Changed Colab mode from PIPE to DEVNULL to prevent blocking - PIPE without reading can cause parent process to wait indefinitely 3. Pytest Session Hooks: - Added pytest_sessionstart to log initial state - Added pytest_sessionfinish for aggressive cleanup before exit - Shows all child processes and their status This should reveal exactly where the hang is happening.	2025-08-08 18:55:50 -07:00
Andy Lee	a35bfb0354	fix: comprehensive ZMQ timeout and cleanup fixes based on detailed analysis Based on excellent diagnostic suggestions, implemented multiple fixes: 1. Diagnostics: - Added faulthandler to dump stack traces 10s before CI timeout - Enhanced CI script with trap handler to show processes/network on timeout - Added diag() function to capture pstree, processes, network listeners 2. ZMQ Socket Timeouts (critical fix): - Added RCVTIMEO=1000ms and SNDTIMEO=1000ms to all client sockets - Added IMMEDIATE=1 to avoid connection blocking - Reduced searcher timeout from 30s to 5s - This prevents infinite blocking on recv/send operations 3. Context.instance() Fix (major issue): - NEVER call term() or destroy() on Context.instance() - This was causing blocking as it waits for ALL sockets to close - Now only set linger=0 without terminating 4. Enhanced Process Cleanup: - Added _reap_children fixture for aggressive session-end cleanup - Better recursive child process termination - Added final wait to ensure cleanup completes The 180s timeout was happening because: - ZMQ recv() was blocking indefinitely without timeout - Context.instance().term() was waiting for all sockets - Child processes weren't being fully cleaned up These changes should prevent the hanging completely.	2025-08-08 18:29:09 -07:00
Andy Lee	a6dad47280	fix: address root cause of test hanging - improper ZMQ/C++ resource cleanup Fixed the actual root cause instead of just masking it in tests: 1. Root Problem: - C++ side's ZmqDistanceComputer creates ZMQ connections but doesn't clean them - Python 3.9/3.13 are more sensitive to cleanup timing during shutdown 2. Core Fixes in SearcherBase and LeannSearcher: - Added cleanup() method to BaseSearcher that cleans ZMQ and embedding server - LeannSearcher.cleanup() now also handles ZMQ context cleanup - Both HNSW and DiskANN searchers now properly delete C++ index objects 3. Backend-Specific Cleanup: - HNSWSearcher.cleanup(): Deletes self.index to trigger C++ destructors - DiskannSearcher.cleanup(): Deletes self._index and resets state - Both force garbage collection after deletion 4. Test Infrastructure: - Added auto_cleanup_searcher fixture for explicit resource management - Global cleanup now more aggressive with ZMQ context destruction This is the proper fix - cleaning up resources at the source, not just working around the issue in tests. The hanging was caused by C++ side ZMQ connections not being properly terminated when is_recompute=True.	2025-08-08 17:54:03 -07:00
Andy Lee	131f10b286	Merge branch 'main' into feature/graph-partition-support	2025-08-08 16:02:54 -07:00
Andy Lee	e3762458fc	fix: prevent test runner hanging on Python 3.9/3.13 due to ZMQ and process cleanup issues Based on excellent analysis from user, implemented comprehensive fixes: 1. ZMQ Socket Cleanup: - Set LINGER=0 on all ZMQ sockets (client and server) - Use try-finally blocks to ensure socket.close() and context.term() - Prevents blocking on exit when ZMQ contexts have pending operations 2. Global Test Cleanup: - Added tests/conftest.py with session-scoped cleanup fixture - Cleans up leftover ZMQ contexts and child processes after all tests - Lists remaining threads for debugging 3. CI Improvements: - Apply timeout to ALL Python versions on Linux (not just 3.13) - Increased timeout to 180s for better reliability - Added process cleanup (pkill) on timeout 4. Dependencies: - Added psutil>=5.9.0 to test dependencies for process management Root cause: Python 3.9/3.13 are more sensitive to cleanup timing during interpreter shutdown. ZMQ's default LINGER=-1 was blocking exit, and atexit handlers were unreliable for cleanup. This should resolve the 'all tests pass but CI hangs' issue.	2025-08-08 15:57:22 -07:00
Andy Lee	05e1efa00a	ci: use timeout command only on Linux for Python 3.13 debugging - Added OS check ( == Linux) before using timeout command - macOS doesn't have GNU timeout by default, so skip it there - Still run tests with verbose output on all platforms - This avoids 'timeout: command not found' error on macOS CI	2025-08-08 11:34:38 -07:00
Andy Lee	6363fc5f83	fix: correct pytest async plugin dependency - Changed pytest-anyio to anyio (the correct package name) - The anyio package includes built-in pytest plugin support - pytest-anyio==0.0.0 was causing dependency resolution failures - anyio>=4.0 provides the pytest plugin for async test support	2025-08-08 11:23:02 -07:00
Andy Lee	319dc34a24	ci: add timeout debugging for Python 3.13 pytest hanging issue - Added timeout --signal=INT to pytest runs on Python 3.13 - This will interrupt hanging tests and provide full traceback - Added extra debugging steps for Python 3.13 to isolate the issue: - Test collection only with timeout - Run single simple test with timeout - Reference: https://youtu.be/QRywzsBftfc (debugging hanging tests) - Will help identify if hanging occurs during collection or execution	2025-08-08 11:17:54 -07:00
Andy Lee	72a5993f02	fix: update pytest and dependencies for Python 3.13 compatibility - Updated pytest to >=8.3.0 (required for Python 3.13 support) - Updated pytest-cov to >=5.0 - Updated pytest-xdist to >=3.5 - Updated pytest-timeout to >=2.3 - Added pytest-anyio>=4.0 for async test support with Python 3.13 - These version requirements ensure compatibility with Python 3.13 - No need to disable Python 3.13 in CI matrix	2025-08-08 11:13:11 -07:00
Andy Lee	250272a3be	fix: prevent test_document_rag_openai from hanging - Skip the test in CI environment to avoid hanging on OpenAI API calls - Add 60-second timeout decorator for local runs - Import ci_timeout from test_timeout module - The test uses OpenAI embeddings which can hang due to network/API issues	2025-08-08 10:28:19 -07:00