docs: add SkyPilot template and instructions for running embeddings/index build on cloud GPU

Andy Lee
2025-08-13 14:01:32 -07:00
parent 46565b9249
commit a69464eb16
3 changed files with 64 additions and 0 deletions


@@ -545,6 +545,17 @@ Options:
**Backends:** HNSW (default) for most use cases, with optional DiskANN support for billion-scale datasets.
### Cloud Builds with SkyPilot (Optional)
If your local machine lacks a GPU or you want faster embedding/index builds, you can run LEANN builds on a cloud GPU VM using SkyPilot. A ready-to-use template is provided at `sky/leann-build.yaml`.
```bash
sky launch -c leann-gpu sky/leann-build.yaml
sky exec leann-gpu -- "leann build my-index --docs ~/leann-data --backend hnsw --complexity 64 --graph-degree 32"
```
See the configuration guide section “Running Builds on SkyPilot (Optional)” for details.
## Benchmarks


@@ -278,6 +278,31 @@ LEANN's recomputation feature provides exact distance calculations but can be di
- Need extremely low latency (< 100ms)
- Running a read-heavy workload where storage cost is acceptable
## Running Builds on SkyPilot (Optional)
You can offload embedding generation and index building to a cloud GPU VM using SkyPilot, without changing any LEANN code. This is useful when your local machine lacks a GPU or you want faster throughput.
### Quick Start
1) Install SkyPilot by following its documentation (`pip install skypilot`, then configure cloud credentials and confirm them with `sky check`).
2) Use the provided SkyPilot template:
```bash
sky launch -c leann-gpu sky/leann-build.yaml
```
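3) Put your documents in the local `./data` directory before launching; the template's `file_mounts` entry syncs it to `~/leann-data` on the remote VM. To use a different location, adjust `file_mounts` in `sky/leann-build.yaml`. Then run the LEANN build: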
```bash
sky exec leann-gpu -- "leann build my-index --docs ~/leann-data --backend hnsw --complexity 64 --graph-degree 32"
```
Notes:
- The template installs `uv` and the `leann` CLI globally on the remote instance.
- Change the `accelerators` and `cloud` settings in `sky/leann-build.yaml` to match your budget/availability (e.g., `A10G:1`, `A100:1`, or CPU-only if you prefer).
- To build with DiskANN instead of HNSW, pass `--backend diskann`.
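
The notes above can be sketched as a small dry-run wrapper. This is an illustration, not part of the LEANN repository: the script itself, its environment variables (`CLUSTER`, `GPU`, `INDEX_NAME`, `BACKEND`, `DRY_RUN`) and the `run` helper are hypothetical names introduced here. It assumes your SkyPilot version accepts the `--gpus` flag on `sky launch` to override the template's `accelerators` value. The HNSW-specific tuning flags from the Quick Start are left out because this guide does not say whether they apply to DiskANN.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the two Quick Start commands.
# By default it only prints the commands; set DRY_RUN=0 to execute them.
set -euo pipefail

CLUSTER="${CLUSTER:-leann-gpu}"        # cluster name used in this guide
GPU="${GPU:-A10G:1}"                   # overrides `accelerators` in the template
INDEX_NAME="${INDEX_NAME:-my-index}"   # index name used in this guide
BACKEND="${BACKEND:-hnsw}"             # hnsw (default) or diskann
DRY_RUN="${DRY_RUN:-1}"                # 1 = print only, 0 = actually run

# The remote build command; `~` is left unexpanded so the remote shell resolves it.
build_cmd="leann build ${INDEX_NAME} --docs ~/leann-data --backend ${BACKEND}"

run() {
  # Echo the command, then execute it only when DRY_RUN=0.
  echo "+ $*"
  if [ "${DRY_RUN}" = "0" ]; then
    "$@"
  fi
}

run sky launch -c "${CLUSTER}" --gpus "${GPU}" sky/leann-build.yaml
run sky exec "${CLUSTER}" -- "${build_cmd}"
```

For example, saving this as a script and invoking it with `BACKEND=diskann GPU=A100:1 DRY_RUN=0` would launch on an A100 and build with DiskANN. Keeping the dry run as the default lets you review the exact commands before any cloud resources are provisioned.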
## Further Reading
- [Lessons Learned Developing LEANN](https://yichuan-w.github.io/blog/lessons_learned_in_dev_leann/)

sky/leann-build.yaml

@@ -0,0 +1,28 @@
name: leann-build

resources:
  # Choose a GPU for fast embeddings (examples: L4, A10G, A100). CPU also works but is slower.
  accelerators: L4:1
  # Optionally pin a cloud, otherwise SkyPilot will auto-select
  # cloud: aws
  disk_size: 100

# Sync local paths to the remote VM. Adjust as needed.
file_mounts:
  # Example: mount your local data directory used for building
  ~/leann-data: ./data

setup: |
  set -e
  # Install uv (package manager)
  curl -LsSf https://astral.sh/uv/install.sh | sh
  export PATH="$HOME/.local/bin:$PATH"
  # Install the LEANN CLI globally on the remote machine
  uv tool install leann

# Optional: you can immediately kick off a build here, or use `sky exec` later.
# run: |
#   export PATH="$HOME/.local/bin:$PATH"
#   # Example build using the mounted data directory
#   leann build my-index --docs ~/leann-data --backend hnsw --complexity 64 --graph-degree 32