docs: add SkyPilot template and instructions for running embeddings/index build on cloud GPU
11
README.md
@@ -545,6 +545,17 @@ Options:
**Backends:** HNSW (default) for most use cases, with optional DiskANN support for billion-scale datasets.

### Cloud Builds with SkyPilot (Optional)

If your local machine lacks a GPU or you want faster embedding/index builds, you can run LEANN builds on a cloud GPU VM using SkyPilot. A ready-to-use template is provided at `sky/leann-build.yaml`.

```bash
sky launch -c leann-gpu sky/leann-build.yaml
sky exec leann-gpu -- "leann build my-index --docs ~/leann-data --backend hnsw --complexity 64 --graph-degree 32"
```

See the configuration guide section “Running Builds on SkyPilot (Optional)” for details.

## Benchmarks
@@ -278,6 +278,31 @@ LEANN's recomputation feature provides exact distance calculations but can be di
- Need extremely low latency (< 100ms)
- Running a read-heavy workload where storage cost is acceptable

## Running Builds on SkyPilot (Optional)

You can offload embedding generation and index building to a cloud GPU VM using SkyPilot, without changing any LEANN code. This is useful when your local machine lacks a GPU or you want faster throughput.

### Quick Start

1) Install SkyPilot by following its documentation. For example, run `pip install skypilot`, adding the extra for your cloud (such as `pip install "skypilot[aws]"`). Then configure your cloud credentials and run `sky check` to confirm SkyPilot can use them.

2) Launch a cluster from the provided SkyPilot template:

```bash
sky launch -c leann-gpu sky/leann-build.yaml
```

3) Make sure your documents are available at `~/leann-data` on the remote. You can either place them in the local `./data` directory before running `sky launch` (the template's `file_mounts` syncs that directory to `~/leann-data`), or adjust `file_mounts` in `sky/leann-build.yaml`. Then run the LEANN build:

```bash
sky exec leann-gpu -- "leann build my-index --docs ~/leann-data --backend hnsw --complexity 64 --graph-degree 32"
```
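The quick-start steps can be chained into one script. This is a hypothetical helper, not part of the template. The cluster name `leann-gpu`, index name `my-index`, and data path come from the steps above, and the `export PATH` prefix follows the template's commented-out `run` section. The `DRY_RUN` switch is an assumption of this sketch: by default the script only prints the commands, and `DRY_RUN=0` executes them.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the template): run the SkyPilot quick start end to end.
set -euo pipefail

CLUSTER="${CLUSTER:-leann-gpu}"   # cluster name used in this guide
INDEX="${INDEX:-my-index}"        # index name used in this guide
DRY_RUN="${DRY_RUN:-1}"           # 1 = only print commands, 0 = also execute them

# Print a command, then execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [[ "$DRY_RUN" == "0" ]]; then
    "$@"
  fi
}

main() {
  # Confirm SkyPilot can access your cloud credentials.
  run sky check
  # Provision the VM, sync ./data to ~/leann-data, and install uv + leann.
  run sky launch -y -c "$CLUSTER" sky/leann-build.yaml
  # Build the index; the escaped PATH prefix is expanded on the remote, not locally.
  run sky exec "$CLUSTER" -- "export PATH=\"\$HOME/.local/bin:\$PATH\" && leann build $INDEX --docs ~/leann-data --backend hnsw --complexity 64 --graph-degree 32"
  # Show running clusters as a reminder that the VM keeps billing until removed.
  run sky status
}

# Run main only when the script is executed directly, not when sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  main
fi
```

The script deliberately stops short of tearing the VM down, because the built index lives on the remote disk. Once you have copied the index off the VM, `sky down leann-gpu` deletes the cluster.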
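Notes:
- The template installs `uv` (via its install script) and the `leann` CLI (via `uv tool install`) on the remote instance. Both are expected under `~/.local/bin`, which the setup step adds to `PATH`. That `export` only applies to the setup step. If `sky exec` reports `leann: command not found`, prefix the command with `export PATH="$HOME/.local/bin:$PATH" &&`, as the commented-out `run` section of the template does.
- Adjust `accelerators` in `sky/leann-build.yaml` (e.g., `A10G:1`, `A100:1`) and optionally uncomment `cloud` to match your budget and availability. For a CPU-only VM, remove the `accelerators` line; embedding generation will be slower.
- To build a DiskANN index instead of HNSW, pass `--backend diskann`.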
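Because the backend is only a `leann build` flag, one running cluster can build an index per backend, for example to compare HNSW and DiskANN on the same documents. The sketch below is a hypothetical dry run that assumes the `leann-gpu` cluster from the quick start is still up. The index names `my-index-hnsw` and `my-index-diskann` are made up for illustration, and the remaining build flags are the ones used above.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: build one LEANN index per backend on an existing SkyPilot cluster.
set -euo pipefail

CLUSTER="${CLUSTER:-leann-gpu}"   # cluster launched in the quick start
DRY_RUN="${DRY_RUN:-1}"           # 1 = only print commands, 0 = also execute them

# Print the remote build command for one backend (index name is illustrative).
build_cmd() {
  local backend="$1"
  # Single quotes keep $HOME, $PATH and ~ unexpanded so the remote shell resolves them.
  printf 'export PATH="$HOME/.local/bin:$PATH" && leann build my-index-%s --docs ~/leann-data --backend %s --complexity 64 --graph-degree 32' \
    "$backend" "$backend"
}

for backend in hnsw diskann; do
  cmd="$(build_cmd "$backend")"
  echo "+ sky exec $CLUSTER -- $cmd"
  if [[ "$DRY_RUN" == "0" ]]; then
    sky exec "$CLUSTER" -- "$cmd"
  fi
done
```

Each `sky exec` call is submitted as a separate job on the cluster, and `sky queue leann-gpu` lists them.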
## Further Reading

- [Lessons Learned Developing LEANN](https://yichuan-w.github.io/blog/lessons_learned_in_dev_leann/)
28
sky/leann-build.yaml
Normal file
@@ -0,0 +1,28 @@
name: leann-build

resources:
  # Choose a GPU for fast embeddings (examples: L4, A10G, A100). CPU also works but is slower.
  accelerators: L4:1
  # Optionally pin a cloud, otherwise SkyPilot will auto-select
  # cloud: aws
  disk_size: 100

# Sync local paths to the remote VM. Adjust as needed.
file_mounts:
  # Example: mount your local data directory used for building
  ~/leann-data: ./data

setup: |
  set -e
  # Install uv (package manager)
  curl -LsSf https://astral.sh/uv/install.sh | sh
  export PATH="$HOME/.local/bin:$PATH"

  # Install the LEANN CLI globally on the remote machine
  uv tool install leann

# Optional: you can immediately kick off a build here, or use `sky exec` later.
# run: |
#   export PATH="$HOME/.local/bin:$PATH"
#   # Example build using the mounted data directory
#   leann build my-index --docs ~/leann-data --backend hnsw --complexity 64 --graph-degree 32