From a69464eb16f087d48245312cf6853d80f3c8abd9 Mon Sep 17 00:00:00 2001
From: Andy Lee
Date: Wed, 13 Aug 2025 14:01:32 -0700
Subject: [PATCH] docs: add SkyPilot template and instructions for running embeddings/index build on cloud GPU

---
 README.md                   | 11 +++++++++++
 docs/configuration-guide.md | 25 +++++++++++++++++++++++++
 sky/leann-build.yaml        | 28 ++++++++++++++++++++++++++++
 3 files changed, 64 insertions(+)
 create mode 100644 sky/leann-build.yaml

diff --git a/README.md b/README.md
index 06403ae..627fec0 100755
--- a/README.md
+++ b/README.md
@@ -545,6 +545,17 @@ Options:
 
 **Backends:** HNSW (default) for most use cases, with optional DiskANN support for billion-scale datasets.
 
+### Cloud Builds with SkyPilot (Optional)
+
+If your local machine lacks a GPU or you want faster embedding/index builds, you can run LEANN builds on a cloud GPU VM with SkyPilot. A ready-to-use template is provided at `sky/leann-build.yaml`.
+
+```bash
+sky launch -c leann-gpu sky/leann-build.yaml
+sky exec leann-gpu -- "leann build my-index --docs ~/leann-data --backend hnsw --complexity 64 --graph-degree 32"
+```
+
+See [Running Builds on SkyPilot (Optional)](docs/configuration-guide.md#running-builds-on-skypilot-optional) in the configuration guide for details.
+
 
 ## Benchmarks
 
diff --git a/docs/configuration-guide.md b/docs/configuration-guide.md
index 22dcaa8..989bdf4 100644
--- a/docs/configuration-guide.md
+++ b/docs/configuration-guide.md
@@ -278,6 +278,31 @@ LEANN's recomputation feature provides exact distance calculations but can be di
 - Need extremely low latency (< 100ms)
 - Running a read-heavy workload where storage cost is acceptable
 
+## Running Builds on SkyPilot (Optional)
+
+You can offload embedding generation and index building to a cloud GPU VM with SkyPilot, without changing any LEANN code. This is useful when your local machine lacks a GPU or you want higher throughput.
+
+### Quick Start
+
+1) Install SkyPilot by following its documentation (e.g., `pip install "skypilot[aws]"` for AWS), configure your cloud credentials, and run `sky check` to confirm access.
+
+2) Put the documents to index in a local `./data` directory (or edit `file_mounts` in `sky/leann-build.yaml`), then launch the cluster from the provided template; SkyPilot copies that directory to `~/leann-data` on the VM and runs the template's `setup` step:
+
+```bash
+sky launch -c leann-gpu sky/leann-build.yaml
+```
+
+3) Run the LEANN build on the cluster:
+
+```bash
+sky exec leann-gpu -- "leann build my-index --docs ~/leann-data --backend hnsw --complexity 64 --graph-degree 32"
+```
+
+Notes:
+- The template's `setup` installs `uv` and then the `leann` CLI as a uv tool (under `~/.local/bin`) on the remote instance.
+- Adjust `accelerators` in `sky/leann-build.yaml` to match your budget and availability (e.g., `A10G:1` or `A100:1`, or remove the line for a CPU-only VM), and uncomment `cloud` to pin a provider.
+- To build a DiskANN index instead, pass `--backend diskann`.
+
 ## Further Reading
 
 - [Lessons Learned Developing LEANN](https://yichuan-w.github.io/blog/lessons_learned_in_dev_leann/)
diff --git a/sky/leann-build.yaml b/sky/leann-build.yaml
new file mode 100644
index 0000000..f5e04e7
--- /dev/null
+++ b/sky/leann-build.yaml
@@ -0,0 +1,28 @@
+name: leann-build
+
+resources:
+  # Pick a GPU for fast embeddings (e.g., L4, A10G, A100). Remove `accelerators` for a CPU-only VM (slower).
+  accelerators: L4:1
+  # Optionally pin a cloud; otherwise SkyPilot picks the cheapest available option.
+  # cloud: aws
+  disk_size: 100  # GB
+
+# Copy local paths to the remote VM at launch (remote path: local path). Adjust as needed.
+file_mounts:
+  # Example: copy the local ./data directory (your documents) to ~/leann-data on the VM
+  ~/leann-data: ./data
+
+setup: |
+  set -e
+  # Install uv (Python package manager)
+  curl -LsSf https://astral.sh/uv/install.sh | sh
+  export PATH="$HOME/.local/bin:$PATH"
+
+  # Install the LEANN CLI as a uv tool (executable goes to ~/.local/bin)
+  uv tool install leann
+
+# Optional: kick off a build here at launch time, or use `sky exec` later.
+# run: |
+#   export PATH="$HOME/.local/bin:$PATH"
+#   # Example build using the mounted data directory
+#   leann build my-index --docs ~/leann-data --backend hnsw --complexity 64 --graph-degree 32
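The SkyPilot workflow in this patch can be wrapped as a minimal end-to-end sketch. It launches the template, runs the build, copies the result back, and tears the VM down. The helper `leann_cloud_build`, the `run` wrapper, and the `DRY_RUN` switch are hypothetical names used only for illustration; they are not part of LEANN or SkyPilot. The copy step relies on SkyPilot registering the cluster name as an SSH host. It also assumes that `leann build`, when run from the remote home directory, writes its index under `~/.leann/`. Check where your LEANN version actually stores indexes before relying on that step.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of LEANN or SkyPilot: launch the template,
# build an index remotely, copy it back, and tear the VM down.
# ASSUMPTION: `leann build` run from the remote home directory writes its
# index under ~/.leann/; verify this for your LEANN version.
set -euo pipefail

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: leann_cloud_build <cluster-name> <index-name>
leann_cloud_build() {
  local cluster="$1" index="$2"

  # Provision the VM; SkyPilot copies file_mounts and runs the `setup` block.
  run sky launch -y -c "$cluster" sky/leann-build.yaml

  # Mirror the commented `run:` block by putting uv's tool dir on PATH.
  # \$ and the quoted ~ are left for the remote shell to expand.
  run sky exec "$cluster" \
    "export PATH=\$HOME/.local/bin:\$PATH && leann build $index --docs ~/leann-data --backend hnsw"

  # The cluster name doubles as an SSH host, so rsync can pull results back.
  run rsync -Pavz "$cluster:~/.leann/" ./.leann/

  # Terminate the cluster so the GPU stops billing.
  run sky down -y "$cluster"
}
```

After sourcing the file, `DRY_RUN=1 leann_cloud_build leann-gpu my-index` prints the four commands without touching the cloud, which makes it easy to review them before paying for a GPU.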