Compare commits

...

8 Commits

6 changed files with 285 additions and 71 deletions

View File

@@ -71,6 +71,8 @@ source .venv/bin/activate
uv pip install leann
```
> Low-resource? See “Low-resource setups” in the [Configuration Guide](docs/configuration-guide.md#low-resource-setups).
<details>
<summary>
<strong>🔧 Build from Source (Recommended for development)</strong>

View File

@@ -259,24 +259,80 @@ Every configuration choice involves trade-offs:
The key is finding the right balance for your specific use case. Start small and simple, measure performance, then scale up only where needed.
## Low-resource setups
If you don't have a local GPU, or builds and searches are too slow, use one or more of the options below.
### 1) Use OpenAI embeddings (no local compute)
Fastest path with zero local GPU requirements. Set your API key and use OpenAI embeddings during build and search:
```bash
export OPENAI_API_KEY=sk-...
# Build with OpenAI embeddings
leann build my-index \
--embedding-mode openai \
--embedding-model text-embedding-3-small
# Search with OpenAI embeddings (recompute at query time)
leann search my-index "your query" \
--recompute-embeddings
```
### 2) Run remote builds with SkyPilot (cloud GPU)
Offload embedding generation and index building to a GPU VM using SkyPilot. A template is provided at `sky/leann-build.yaml`.
```bash
# One-time: install and configure SkyPilot
pip install skypilot
sky launch -c leann-gpu sky/leann-build.yaml
# Build remotely (template installs uv + leann CLI)
sky exec leann-gpu -- "leann build my-index --docs ~/leann-data --backend hnsw --complexity 64 --graph-degree 32"
```
Details: see “Running Builds on SkyPilot (Optional)” below.
### 3) Disable recomputation to trade storage for speed
By default, LEANN uses selective recomputation: it stores only metadata and recomputes embeddings on demand, which gives exact distances and minimal storage at the cost of query-time latency. If you need very low latency (e.g., < 100ms, or an extreme-QPS, read-heavy workload) and have storage and memory to spare, disable recomputation. The index then stores full embeddings and avoids recomputing at search time.
```bash
# Build without recomputation (HNSW requires non-compact in this mode)
leann build my-index --no-recompute --no-compact
# Search without recomputation
leann search my-index "your query" --no-recompute
```
Trade-offs: lower query-time latency, but significantly higher memory and storage usage (10-100x more, since full embeddings are stored instead of metadata only).
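To size this decision, a back-of-envelope estimate of the extra storage helps. The sketch below is illustrative only; the corpus size and embedding dimension are assumptions, not measurements:
```python
# Rough storage estimate for --no-recompute (illustrative assumptions).
num_passages = 1_000_000   # hypothetical: number of chunks in your index
dim = 768                  # hypothetical: a contriever-sized embedding dimension
bytes_per_float = 4        # float32

embedding_bytes = num_passages * dim * bytes_per_float
print(f"Full embeddings: ~{embedding_bytes / 1e9:.1f} GB")  # ~3.1 GB
```
With recomputation left on, that block shrinks to metadata only, which is where the 10-100x figure comes from.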
## Running Builds on SkyPilot (Optional)
You can offload embedding generation and index building to a cloud GPU VM using SkyPilot, without changing any LEANN code. This is useful when your local machine lacks a GPU or you want faster throughput.
### Quick Start
1) Install SkyPilot by following their docs (`pip install skypilot`), then configure cloud credentials.
2) Use the provided SkyPilot template:
```bash
sky launch -c leann-gpu sky/leann-build.yaml
```
3) On the remote, either put your data under the mounted path or adjust `file_mounts` in `sky/leann-build.yaml`. Then run the LEANN build:
```bash
sky exec leann-gpu -- "leann build my-index --docs ~/leann-data --backend hnsw --complexity 64 --graph-degree 32"
```
Notes:
- The template installs `uv` and the `leann` CLI globally on the remote instance.
- Change the `accelerators` and `cloud` settings in `sky/leann-build.yaml` to match your budget/availability (e.g., `A10G:1`, `A100:1`, or CPU-only if you prefer).
- You can also build with `diskann` by switching `--backend diskann`.
## Further Reading

View File

@@ -95,6 +95,8 @@ def create_hnsw_embedding_server(
passage_sources.append(source_copy)
passages = PassageManager(passage_sources)
# Use index dimensions from metadata for shaping fallback responses
embedding_dim: int = int(meta.get("dimensions", 0))
logger.info(
f"Loaded PassageManager with {len(passages.global_offset_map)} passages from metadata"
)
@@ -109,6 +111,9 @@ def create_hnsw_embedding_server(
socket.setsockopt(zmq.RCVTIMEO, 300000)
socket.setsockopt(zmq.SNDTIMEO, 300000)
# Track last request type for safe fallback responses on exceptions
last_request_type = "unknown" # one of: 'text', 'distance', 'embedding', 'unknown'
last_request_length = 0
while True:
try:
message_bytes = socket.recv()
@@ -121,6 +126,8 @@ def create_hnsw_embedding_server(
if isinstance(request_payload, list) and len(request_payload) > 0:
# Check if this is a direct text request (list of strings)
if all(isinstance(item, str) for item in request_payload):
last_request_type = "text"
last_request_length = len(request_payload)
logger.info(
f"Processing direct text embedding request for {len(request_payload)} texts in {embedding_mode} mode"
)
@@ -145,43 +152,66 @@ def create_hnsw_embedding_server(
):
node_ids = request_payload[0]
query_vector = np.array(request_payload[1], dtype=np.float32)
last_request_type = "distance"
last_request_length = len(node_ids)
logger.debug("Distance calculation request received")
logger.debug(f" Node IDs: {node_ids}")
logger.debug(f" Query vector dim: {len(query_vector)}")
# Get embeddings for node IDs, tolerate missing IDs
texts: list[str] = []
found_indices: list[int] = []
for idx, nid in enumerate(node_ids):
try:
passage_data = passages.get_passage(str(nid))
txt = passage_data.get("text", "")
if isinstance(txt, str) and len(txt) > 0:
texts.append(txt)
found_indices.append(idx)
else:
logger.error(f"Empty text for passage ID {nid}")
except KeyError:
logger.error(f"Passage ID {nid} not found")
raise RuntimeError(f"FATAL: Passage with ID {nid} not found")
except Exception as e:
logger.error(f"Exception looking up passage ID {nid}: {e}")
raise
# Prepare full-length response distances with safe fallbacks
large_distance = 1e9
response_distances = [large_distance] * len(node_ids)
if texts:
try:
# Process embeddings only for found indices
embeddings = compute_embeddings(texts, model_name, mode=embedding_mode)
logger.info(
f"Computed embeddings for {len(texts)} texts, shape: {embeddings.shape}"
)
# Calculate distances for found embeddings only
if distance_metric == "l2":
partial_distances = np.sum(
np.square(embeddings - query_vector.reshape(1, -1)), axis=1
)
else: # mips or cosine
partial_distances = -np.dot(embeddings, query_vector)
# Place computed distances back into the full response array
for pos, dval in zip(
found_indices, partial_distances.flatten().tolist()
):
response_distances[pos] = float(dval)
except Exception as e:
logger.error(
f"Distance computation error, falling back to large distances: {e}"
)
# Always reply with exactly len(node_ids) distances
response_bytes = msgpack.packb([response_distances], use_single_float=True)
logger.debug(
f"Sending distance response with {len(response_distances)} distances (found={len(found_indices)})"
)
socket.send(response_bytes)
e2e_end = time.time()
logger.info(f"⏱️ Distance calculation E2E time: {e2e_end - e2e_start:.6f}s")
@@ -201,40 +231,61 @@ def create_hnsw_embedding_server(
node_ids = request_payload[0]
logger.debug(f"Request for {len(node_ids)} node embeddings")
last_request_type = "embedding"
last_request_length = len(node_ids)
# Allocate output buffer (B, D) and fill with zeros for robustness
if embedding_dim <= 0:
logger.error("Embedding dimension unknown; cannot serve embedding request")
dims = [0, 0]
data = []
else:
dims = [len(node_ids), embedding_dim]
data = [0.0] * (dims[0] * dims[1])
# Look up texts by node IDs; compute embeddings where available
texts: list[str] = []
found_indices: list[int] = []
for idx, nid in enumerate(node_ids):
try:
passage_data = passages.get_passage(str(nid))
txt = passage_data.get("text", "")
if isinstance(txt, str) and len(txt) > 0:
texts.append(txt)
found_indices.append(idx)
else:
logger.error(f"Empty text for passage ID {nid}")
except KeyError:
raise RuntimeError(f"FATAL: Passage with ID {nid} not found")
logger.error(f"Passage with ID {nid} not found")
except Exception as e:
logger.error(f"Exception looking up passage ID {nid}: {e}")
raise
if texts:
try:
# Process embeddings for found texts only
embeddings = compute_embeddings(texts, model_name, mode=embedding_mode)
logger.info(
f"Computed embeddings for {len(texts)} texts, shape: {embeddings.shape}"
)
if np.isnan(embeddings).any() or np.isinf(embeddings).any():
logger.error(
f"NaN or Inf detected in embeddings! Requested IDs: {node_ids[:5]}..."
)
dims = [0, embedding_dim]
data = []
else:
# Copy computed embeddings into the correct positions
emb_f32 = np.ascontiguousarray(embeddings, dtype=np.float32)
flat = emb_f32.flatten().tolist()
for j, pos in enumerate(found_indices):
start = pos * embedding_dim
end = start + embedding_dim
data[start:end] = flat[j * embedding_dim : (j + 1) * embedding_dim]
except Exception as e:
logger.error(f"Embedding computation error, returning zeros: {e}")
response_payload = [dims, data]
response_bytes = msgpack.packb(response_payload, use_single_float=True)
socket.send(response_bytes)
@@ -249,7 +300,22 @@ def create_hnsw_embedding_server(
import traceback
traceback.print_exc()
# Fallback to a safe, minimal-structure response to avoid client crashes
if last_request_type == "distance":
# Return a vector of large distances with the expected length
fallback_len = max(0, int(last_request_length))
large_distance = 1e9
safe_response = [[large_distance] * fallback_len]
elif last_request_type == "embedding":
# Return an empty embedding block with known dimension if available
if embedding_dim > 0:
safe_response = [[0, embedding_dim], []]
else:
safe_response = [[0, 0], []]
else:
# Unknown request type: default to empty embedding structure
safe_response = [[0, int(embedding_dim) if embedding_dim > 0 else 0], []]
socket.send(msgpack.packb(safe_response, use_single_float=True))
zmq_thread = threading.Thread(target=zmq_server_thread, daemon=True)
zmq_thread.start()
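For context on the wire format these hunks preserve, here is a minimal client sketch. It exercises only the distance request shape visible above (`[node_ids, query_vector]` in, `[[d0, d1, ...]]` out); the endpoint address, port, and embedding dimension are assumptions for illustration, not values from this diff:
```python
import msgpack
import zmq

ENDPOINT = "tcp://127.0.0.1:5557"  # hypothetical; use your server's actual port
DIM = 768                          # hypothetical; must match the index dimension

ctx = zmq.Context()
sock = ctx.socket(zmq.REQ)
sock.connect(ENDPOINT)
sock.setsockopt(zmq.RCVTIMEO, 300_000)  # mirror the server's 300s receive timeout

# Distance request: [node_ids, query_vector]. With the changes above, the reply
# always contains exactly len(node_ids) distances; missing or empty passages are
# padded with a large distance (1e9) instead of crashing the server.
node_ids = [12, 34]
query_vector = [0.0] * DIM
sock.send(msgpack.packb([node_ids, query_vector], use_single_float=True))
(distances,) = msgpack.unpackb(sock.recv())
assert len(distances) == len(node_ids)
```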

View File

@@ -117,7 +117,19 @@ Examples:
build_parser.add_argument("--complexity", type=int, default=64)
build_parser.add_argument("--num-threads", type=int, default=1)
build_parser.add_argument("--compact", action="store_true", default=True)
build_parser.add_argument(
"--no-compact",
dest="compact",
action="store_false",
help="Disable compact index storage (store full embeddings; higher storage)",
)
build_parser.add_argument("--recompute", action="store_true", default=True)
build_parser.add_argument(
"--no-recompute",
dest="recompute",
action="store_false",
help="Disable embedding recomputation (store full embeddings; lower query latency)",
)
build_parser.add_argument(
"--file-types",
type=str,
@@ -138,6 +150,18 @@ Examples:
default=True,
help="Recompute embeddings (default: True)",
)
search_parser.add_argument(
"--no-recompute-embeddings",
dest="recompute_embeddings",
action="store_false",
help="Disable embedding recomputation during search",
)
search_parser.add_argument(
"--no-recompute",
dest="recompute_embeddings",
action="store_false",
help="Alias for --no-recompute-embeddings",
)
search_parser.add_argument(
"--pruning-strategy",
choices=["global", "local", "proportional"],
@@ -166,6 +190,18 @@ Examples:
default=True,
help="Recompute embeddings (default: True)",
)
ask_parser.add_argument(
"--no-recompute-embeddings",
dest="recompute_embeddings",
action="store_false",
help="Disable embedding recomputation during ask",
)
ask_parser.add_argument(
"--no-recompute",
dest="recompute_embeddings",
action="store_false",
help="Alias for --no-recompute-embeddings",
)
ask_parser.add_argument(
"--pruning-strategy",
choices=["global", "local", "proportional"],

View File

@@ -4,20 +4,12 @@ Transform your development workflow with intelligent code assistance using LEANN
## Prerequisites
Install LEANN globally for MCP integration (with default backend):
```bash
uv tool install leann-core --with leann
```
This installs the `leann` CLI into an isolated tool environment, makes the `leann` command available system-wide (which `leann_mcp` requires), and includes both backends so `leann build` works out of the box.
## 🚀 Quick Setup

sky/leann-build.yaml Normal file
View File

@@ -0,0 +1,62 @@
name: leann-build
resources:
# Choose a GPU for fast embeddings (examples: L4, A10G, A100). CPU also works but is slower.
accelerators: L4:1
# Optionally pin a cloud, otherwise SkyPilot will auto-select
# cloud: aws
disk_size: 100
env:
# Build parameters (override with: sky launch -c leann-gpu sky/leann-build.yaml -e key=value)
index_name: my-index
docs: ./data
backend: hnsw # hnsw | diskann
complexity: 64
graph_degree: 32
num_threads: 8
# Embedding selection
embedding_mode: sentence-transformers # sentence-transformers | openai | mlx | ollama
embedding_model: facebook/contriever
# Storage/latency knobs
recompute: true # true => selective recomputation; false => store full embeddings
compact: true # for HNSW only: false when recompute=false
# Optional pass-through
extra_args: ""
# Sync local paths to the remote VM. Adjust as needed.
file_mounts:
# Example: mount your local data directory used for building
~/leann-data: ${docs}
setup: |
set -e
# Install uv (package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"
# Install the LEANN CLI globally on the remote machine
uv tool install leann
run: |
export PATH="$HOME/.local/bin:$PATH"
# Derive flags from env
recompute_flag=""
if [ "${recompute}" = "false" ] || [ "${recompute}" = "0" ]; then
recompute_flag="--no-recompute"
fi
compact_flag=""
if [ "${compact}" = "false" ] || [ "${compact}" = "0" ]; then
compact_flag="--no-compact"
fi
# Build command
leann build ${index_name} \
--docs ~/leann-data \
--backend ${backend} \
--complexity ${complexity} \
--graph-degree ${graph_degree} \
--num-threads ${num_threads} \
--embedding-mode ${embedding_mode} \
--embedding-model ${embedding_model} \
${recompute_flag} ${compact_flag} ${extra_args}