TBNilles 399acabd58 feat(model-manager): "Free GPU memory" button to unload ComfyUI models
ComfyUI caches the last model when RAM is plentiful (unified memory), so
memory doesn't drop after switching models even though models are being
swapped, not accumulated. Add a sidebar "Free GPU memory" button that
proxies ComfyUI's POST /free (unload_models + free_memory) via a new
/api/comfyui/free endpoint (COMFYUI_URL env). Verified it releases ~7GB.
README documents this plus the --disable-smart-memory auto-unload option.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 17:14:37 -04:00


# SparkyUI
**ComfyUI + SageAttention for NVIDIA DGX Spark (Blackwell GB10)**
A Docker-based ComfyUI setup specifically engineered for the DGX Spark's unique ARM64 + Blackwell architecture.
## Why This Exists
The NVIDIA DGX Spark uses the **GB10 GPU** with compute capability **12.1 (sm_121)** - Blackwell architecture. This creates challenges:
| CUDA Version | Max Compute Capability | Can compile for GB10? |
|--------------|------------------------|----------------------|
| CUDA 12.8 | sm_120 | **No** |
| CUDA 13.0+ | sm_121 | **Yes** |
Standard ComfyUI containers and PyTorch wheels don't support sm_121. SparkyUI solves this by:
1. Using **CUDA 13.0.2** base image (supports sm_121)
2. Installing **PyTorch cu130** ARM64 wheels
3. Compiling **SageAttention** with `TORCH_CUDA_ARCH_LIST="12.1"`
4. Disabling **Triton/torch.compile** (doesn't support sm_121 yet)
5. **Optimized for Grace-Blackwell unified memory architecture**
## What's Included
- **ComfyUI** (latest master branch)
- **ComfyUI-Manager** - auto-installed on first run for easy custom node management
- **ComfyUIMini** - mobile-friendly web UI for phones/tablets (separate container)
- **Model Manager** - StabilityMatrix-style UI to download/manage models (separate container)
- **SageAttention** - compiled natively for sm_121 (Blackwell tensor cores)
- **PyTorch 2.9.1+cu130** - ARM64 wheels with CUDA 13.0 support
## Unified Memory Architecture
The DGX Spark's Grace-Blackwell architecture uses **unified memory** - a coherent memory fabric shared between CPU and GPU. This is fundamentally different from discrete GPUs and requires different optimization strategies.
**Key insight: Don't fight the fabric.** Forcing everything GPU-side (`--gpu-only`, `--cache-none`) actually hurts performance.
**Optimized flags (default in SparkyUI):**
```bash
--disable-pinned-memory # Reduces overhead on unified fabric
--force-fp16 # Enables SageAttention optimization
--fp16-unet --fp16-text-enc # FP16 precision for UNet + text encoder
--fp32-vae # VAE in fp32 - fp16 VAE causes NaNs -> BLACK images
--dont-upcast-attention # Keeps attention in FP16 for speed
```
> **Black/blank images?** That's the classic fp16-VAE NaN issue, not an NSFW
> filter (there is none). Keep `--fp32-vae` (default). `--bf16-vae` is a faster
> alternative that also avoids the NaNs.
**What NOT to use:**
- `--gpu-only` - fights the unified memory fabric, hurts performance
- `--cache-none` - disables natural caching, slows model loading
- `--disable-mmap` - prevents memory-mapped model loading
**CUDA environment variables** are also tuned for unified memory:
- `CUDA_MANAGED_FORCE_DEVICE_ALLOC=1` - prefer GPU allocation
- `PYTORCH_NO_CUDA_MEMORY_CACHING=1` - let fabric manage memory
- `OMP_NUM_THREADS=20` - utilize all 20 ARM cores
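
If you run a script outside the compose setup and want the same environment, the variables above can be applied as defaults. This is a minimal sketch, not SparkyUI's own code: the helper name `apply_unified_memory_env` is hypothetical, and it assumes the variables are set before `torch` is imported, since such settings are generally read at startup.

```python
import os

# Values copied from the list above; the dict and helper names are hypothetical.
UNIFIED_MEMORY_ENV = {
    "CUDA_MANAGED_FORCE_DEVICE_ALLOC": "1",
    "PYTORCH_NO_CUDA_MEMORY_CACHING": "1",
    "OMP_NUM_THREADS": "20",
}


def apply_unified_memory_env(env=None):
    """Fill in missing variables without overriding values already set."""
    env = os.environ if env is None else env
    for name, value in UNIFIED_MEMORY_ENV.items():
        env.setdefault(name, value)
    return env


if __name__ == "__main__":
    apply_unified_memory_env()
    for name in UNIFIED_MEMORY_ENV:
        print(f"{name}={os.environ[name]}")
```

Using `setdefault` keeps anything you already exported (for example a lower `OMP_NUM_THREADS`) intact.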
## Quick Start
```bash
# Clone
git clone https://github.com/ecarmen16/SparkyUI.git
cd SparkyUI
# Configure paths
cp .env.example .env
# Edit .env with your paths
# Build (compiles SageAttention for sm_121 - takes ~10 min)
docker compose build
# Start
docker compose up -d
# View logs
docker compose logs -f
```
**Access:**
- **ComfyUI (Desktop):** http://localhost:8188
- **ComfyUIMini (Mobile):** http://localhost:3000
- **Model Manager:** http://localhost:8189
## Requirements
- **NVIDIA DGX Spark** (or other GB10-based system)
- **Docker** with NVIDIA Container Toolkit
- **NVIDIA Driver** 580+ (CUDA 13.0 needs the R580 driver branch or newer; tested with 580.95)
- **~15GB** disk for Docker image
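- **Models** - optional: point `COMFYUI_HOST_PATH` at an existing ComfyUI install to reuse its models (ComfyUI mounts them read-only), or download new ones with the Model Manager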
## Configuration
Copy `.env.example` to `.env` and edit:
```bash
# Base path holding the models/ directory (defaults to the project root).
# The Model Manager downloads into <COMFYUI_HOST_PATH>/models; ComfyUI reads it.
COMFYUI_HOST_PATH=.
# Path for SparkyUI data (custom_nodes, outputs, inputs, manager DB).
# Defaults to the project root.
SPARKYUI_DATA_PATH=.
# Ports
COMFYUI_PORT=8188
COMFYUIMINI_PORT=3000
MODEL_MANAGER_PORT=8189
# Optional: pin to specific versions
COMFYUI_REF=master
SAGEATTN_REF=main
```
Both paths default to the project root, so out of the box models are stored in
`./models` and the Model Manager's database in `./sparkyui-data`. Point
`COMFYUI_HOST_PATH` at an existing ComfyUI install if you'd rather reuse its models.
## Architecture
```
┌──────────────────────────────────────────────────────────────────┐
│ DGX Spark Host │
│ Ubuntu 24.04 (DGX OS 7) / Driver 580.x │
│ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Docker Network (sparky_net) │ │
│ │ │ │
│ │ ┌─────────────────────────┐ ┌──────────────────────────┐ │ │
│ │ │ comfyui (sparkyui:cu130)│ │ comfyuimini (node:20) │ │ │
│ │ │ │ │ │ │ │
│ │ │ CUDA 13.0.2 + PyTorch │◄─┤ Mobile-friendly UI │ │ │
│ │ │ SageAttention (sm_121) │ │ REST + WebSocket proxy │ │ │
│ │ │ ComfyUI + Manager │ │ │ │ │
│ │ │ │ │ Shares /output volume │ │ │
│ │ └───────────┬─────────────┘ └────────────┬─────────────┘ │ │
│ │ │ │ │ │
│ └──────────────┼─────────────────────────────┼────────────────┘ │
│ │ │ │
│ Port 8188 (Desktop) Port 3000 (Mobile) │
└──────────────────────────────────────────────────────────────────┘
```
## Version Compatibility
Tested combinations:
| Component | Version | Notes |
|-----------|---------|-------|
| CUDA Base | 13.0.2 | Required for sm_121 |
| PyTorch | 2.9.1+cu130 | ARM64 wheel from PyTorch index |
| torchvision | 0.24.1+cu130 | ARM64 wheel |
| SageAttention | 2.2.0 | Compiled with sm_121 |
| ComfyUI | 0.7.0 | master branch |
| Driver | 580.95 | DGX OS 7 default |
## Known Limitations
1. **PyTorch Warning**: You'll see a warning about compute capability 12.1 being "outside supported range (8.0-12.0)". This is harmless - PyTorch works, and SageAttention's custom kernels are compiled natively.
2. **torch.compile Disabled**: Triton doesn't support sm_121 yet. `torch.compile()` is disabled via environment variables. Some nodes may run slower than on supported architectures.
3. **No GitHub Actions CI**: Can't build for ARM64 + sm_121 in GitHub's hosted runners. Must build locally on DGX Spark.
## Troubleshooting
### "no kernel image is available for execution on the device"
Your SageAttention wasn't compiled for sm_121. Rebuild:
```bash
docker compose build --no-cache
```
### PyTorch can't find CUDA
Ensure NVIDIA Container Toolkit is installed:
```bash
nvidia-ctk --version
docker run --rm --gpus all nvidia/cuda:13.0.2-base-ubuntu24.04 nvidia-smi
```
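
Once the container toolkit works, it can help to confirm what PyTorch itself sees. The following is a small diagnostic sketch, not part of SparkyUI; the function name `cuda_report` is hypothetical. It assumes you run it with the container's Python (for example via `docker compose exec`, assuming the service is named `comfyui` as in the architecture diagram). On a GB10 the reported compute capability should be 12.1.

```python
def cuda_report():
    """Return a one-line summary of what PyTorch sees, or why it sees nothing."""
    try:
        import torch  # imported lazily so the script also runs where torch is absent
    except ImportError:
        return "torch is not installed in this environment"
    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: CUDA not available"
    major, minor = torch.cuda.get_device_capability(0)
    name = torch.cuda.get_device_name(0)
    return f"torch {torch.__version__}: {name}, compute capability {major}.{minor}"


if __name__ == "__main__":
    print(cuda_report())
```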
### ComfyUI-Manager missing
The entrypoint auto-clones it. Check logs:
```bash
docker compose logs | grep -i manager
```
## Host-Level GPU Optimizations (Optional)
For maximum performance, apply these optimizations on the **host** (not in Docker):
```bash
# Lock GPU clocks to maximum (3003 MHz) - prevents throttling
sudo nvidia-smi -lgc 3003,3003
# Enable core clock boost (GPU core > memory clock for compute)
sudo nvidia-smi boost-slider --vboost 1
# Enable persistence mode (reduces driver load latency)
sudo nvidia-smi -pm 1
# Verify settings
nvidia-smi --query-gpu=clocks.sm,clocks.max.sm,persistence_mode --format=csv
```
**Note:** GPU clock settings don't persist across reboots due to GB10 firmware behavior. Re-apply after each boot.
## ComfyUIMini (Mobile UI)
SparkyUI includes [ComfyUIMini](https://github.com/ImDarkTom/ComfyUIMini) - a lightweight, mobile-friendly web UI that runs in a separate container.
**Features:**
- Responsive design optimized for phones and tablets
- Simplified workflow execution interface
- Built-in image gallery (reads from shared output directory)
- Import workflows from ComfyUI in "API Format"
- Multiple themes (dark, light, aurora, nord, etc.)
**How it works:**
- Runs as a Node.js Express server in its own container (~150MB)
- Connects to ComfyUI via internal Docker network (`http://comfyui:8188`)
- Proxies REST API calls and WebSocket connections
- Shares the output directory for gallery viewing
**Access:** `http://<your-dgx-ip>:3000`
**Build only ComfyUIMini** (if ComfyUI already built):
```bash
docker compose build comfyuimini
docker compose up -d comfyuimini
```
## Model Manager
SparkyUI includes a **StabilityMatrix-style Model Manager** - a lightweight FastAPI web app
(separate container) for downloading and managing models without touching the command line.
**Access:** `http://<your-dgx-ip>:8189`
**Features:**
- **Gallery** - browse generated photos from ComfyUI's `output/` in a large desktop grid,
  click for a full-size lightbox view, and **permanently delete** photos one at a time or
  all at once (with confirm).
- **Browse CivitAI** - search the CivitAI catalog in a thumbnail grid (filter by type,
  base model (multi-select), sort, period, NSFW toggle) and **click a model to download
  it** - no URL pasting needed.
  Multi-version models get a version picker on the card. **Early Access** versions are
  flagged (they require purchased access on CivitAI and otherwise fail with 401).
- **Installed Models** - browse what's on disk, grouped by type, with size and delete actions
- **Add / Download** - paste a download URL and pick a type; live progress bars
  - **Direct URLs** - any direct download link
  - **CivitAI** - paste a model page link (`civitai.com/models/...`, the `civitai.red`
    mirror, or an `api/download/models/...` link); the type and filename are auto-detected
  - **HuggingFace** - paste a `resolve` URL (works with gated repos via your token)
- **Settings** - store your **CivitAI API key** and **HuggingFace token** persistently
  (saved to a SQLite DB under `./sparkyui-data`, never committed to git)
- **Free GPU memory** - a sidebar button that unloads all models from ComfyUI and releases
  memory (proxies ComfyUI's `/free`). ComfyUI keeps the last model cached for fast reuse when
  RAM is plentiful, so memory won't drop on its own after switching models - use this to
  release it on demand. (For automatic unload after every generation, add
  `--disable-smart-memory` to `COMFYUI_FLAGS`, at the cost of reloading each run.)
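
The **Free GPU memory** button boils down to one HTTP call against ComfyUI. The sketch below shows that call made directly with the standard library, which can be handy in scripts. It is not the Model Manager's code: the function names are hypothetical, and the `COMFYUI_URL` variable with its `http://localhost:8188` default is an assumption you should adjust to your setup.

```python
import json
import os
import urllib.request

# Assumed env var and default; point this at your ComfyUI instance.
COMFYUI_URL = os.environ.get("COMFYUI_URL", "http://localhost:8188")


def build_free_request(base_url=COMFYUI_URL):
    """Build the POST /free request asking ComfyUI to unload models and free memory."""
    payload = json.dumps({"unload_models": True, "free_memory": True}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/free",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def free_gpu_memory(base_url=COMFYUI_URL, timeout=10.0):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(build_free_request(base_url), timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    print("ComfyUI responded with HTTP", free_gpu_memory())
```

Splitting request construction from sending keeps the payload easy to inspect without a running ComfyUI.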
**How it works:**
- Runs as a FastAPI server in its own container (`python:3.12-slim`)
- Downloads land in the shared `models/` folder, sorted into ComfyUI's standard sub-folders
by type (`checkpoints/`, `loras/`, `vae/`, `controlnet/`, `upscale_models/`, …) - these are
created automatically on first start
- ComfyUI mounts the same `models/` folder read-only, so new downloads appear in its loaders
- Mounts the shared `output/` folder read-write for the Gallery's delete feature
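
The folder layout described above can be sketched as a small mapping from model type to sub-folder. This is an illustration under assumptions, not the Model Manager's implementation: the type keys (`"checkpoint"`, `"lora"`, ...) and both function names are hypothetical, and only the folder names come from the list above.

```python
from pathlib import Path

# Keys are hypothetical type labels; values are the sub-folders named above.
MODEL_SUBDIRS = {
    "checkpoint": "checkpoints",
    "lora": "loras",
    "vae": "vae",
    "controlnet": "controlnet",
    "upscaler": "upscale_models",
}


def ensure_model_dirs(models_root):
    """Create every known sub-folder under models_root if it is missing."""
    root = Path(models_root)
    for sub in MODEL_SUBDIRS.values():
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root


def destination_for(models_root, model_type, filename):
    """Return the path a download of the given type should be written to."""
    try:
        sub = MODEL_SUBDIRS[model_type]
    except KeyError:
        raise ValueError(f"unknown model type: {model_type!r}") from None
    # Path(filename).name drops directory components, so a crafted
    # filename cannot escape the models folder.
    return Path(models_root) / sub / Path(filename).name
```

For example, `destination_for("models", "lora", "style.safetensors")` yields `models/loras/style.safetensors`.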
**Device-aware entry point:** open `http://<host>:8189/start` and it detects your device -
**phones** are sent to the mobile UI (ComfyUIMini), **desktops** land on the Model Manager's
Gallery. Append `?force=mobile` or `?force=desktop` to override. Bookmark `/start` as your
single SparkyUI link.
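
The routing decision behind `/start` can be sketched as a pure function. This is a rough heuristic written for illustration, not the actual endpoint: the function name `pick_ui` and the list of User-Agent markers are assumptions, and real device detection may treat tablets or unusual browsers differently.

```python
def pick_ui(user_agent, force=None):
    """Return "mobile" or "desktop" for a request to /start."""
    # An explicit ?force=mobile or ?force=desktop always wins.
    if force in ("mobile", "desktop"):
        return force
    ua = (user_agent or "").lower()
    # Assumed markers: most phone browsers include one of these tokens.
    phone_markers = ("iphone", "ipod", "mobi", "windows phone")
    return "mobile" if any(marker in ua for marker in phone_markers) else "desktop"
```

An unknown or missing User-Agent falls through to `"desktop"`, so the `force` parameter remains the escape hatch when the guess is wrong.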
ComfyUIMini also gets a **"Manage Photos"** link in its sidebar that jumps to this Gallery,
so you can delete generated photos from the mobile UI too (its built-in gallery is view-only).
**Build only the Model Manager** (if the rest is already built):
```bash
docker compose build model-manager
docker compose up -d model-manager
```
## SageAttention Notes
SageAttention PR #297 added sm_121 support, but it was merged and then reverted due to stability issues. Our approach:
- Build SageAttention from main branch with `TORCH_CUDA_ARCH_LIST="12.1"`
- Disable `torch.compile` (and the Triton code generation behind it) via `TORCHDYNAMO_DISABLE=1`, since Triton doesn't support sm_121a
- This gives working SageAttention without the unstable PR #297 changes
For full Triton support (more complex), see [HurbaLurba's DGX-SPARK-COMFYUI-DOCKER](https://github.com/HurbaLurba/DGX-SPARK-COMFYUI-DOCKER) which builds custom Triton from source.
## Future
When these land, SparkyUI can be simplified:
- [ ] PyTorch native sm_121 support → remove explicit `TORCH_CUDA_ARCH_LIST`
- [ ] Triton sm_121 support → remove `TORCHDYNAMO_DISABLE`
- [ ] SageAttention prebuilt ARM64 wheels → remove source build
## Credits
- Unified memory architecture insights from [HurbaLurba's DGX-SPARK-COMFYUI-DOCKER](https://github.com/HurbaLurba/DGX-SPARK-COMFYUI-DOCKER)
- [ComfyUIMini](https://github.com/ImDarkTom/ComfyUIMini) by ImDarkTom
- [SageAttention](https://github.com/thu-ml/SageAttention) by thu-ml
- [ComfyUI](https://github.com/comfyanonymous/ComfyUI) by comfyanonymous
## License
MIT