# SparkyUI

**ComfyUI + SageAttention for NVIDIA DGX Spark (Blackwell GB10)**

A Docker-based ComfyUI setup specifically engineered for the DGX Spark's unique ARM64 + Blackwell architecture.

## Why This Exists

The NVIDIA DGX Spark uses the **GB10 GPU** with compute capability **12.1 (sm_121)** - Blackwell architecture. This constrains which CUDA toolkits can build for it:

| CUDA Version | Max Compute Capability | Can compile for GB10? |
|--------------|------------------------|----------------------|
| CUDA 12.8 | sm_120 | **No** |
| CUDA 13.0+ | sm_121 | **Yes** |

Standard ComfyUI containers and PyTorch wheels don't support sm_121. SparkyUI solves this by:

1. Using a **CUDA 13.0.2** base image (supports sm_121)
2. Installing **PyTorch cu130** ARM64 wheels
3. Compiling **SageAttention** with `TORCH_CUDA_ARCH_LIST="12.1"`
4. Disabling **Triton/torch.compile** (Triton doesn't support sm_121 yet)
5. Tuning flags and CUDA settings for the **Grace-Blackwell unified memory architecture**

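The table above boils down to comparing compute capabilities. As a minimal sketch of that comparison (the helper names `parse_capability` and `toolkit_can_target` are hypothetical and not part of SparkyUI, and the lookup holds only the two rows from the table):

```python
import re

# Highest compute capability each CUDA toolkit can compile for.
# Only the two rows from the table above; not an exhaustive list.
TOOLKIT_MAX_ARCH = {"12.8": "sm_120", "13.0": "sm_121"}


def parse_capability(arch: str) -> tuple[int, int]:
    """Turn 'sm_121', 'sm_121a' or '12.1' into a (major, minor) tuple."""
    dotted = re.fullmatch(r"(\d+)\.(\d)", arch)
    if dotted:
        return int(dotted.group(1)), int(dotted.group(2))
    # In the sm_XY form the last digit is the minor version.
    sm = re.fullmatch(r"sm_(\d+)(\d)[a-z]?", arch)
    if sm:
        return int(sm.group(1)), int(sm.group(2))
    raise ValueError(f"unrecognized architecture string: {arch!r}")


def toolkit_can_target(cuda_version: str, target: str) -> bool:
    """True if the given CUDA toolkit can compile kernels for `target`."""
    max_arch = TOOLKIT_MAX_ARCH[cuda_version]
    return parse_capability(target) <= parse_capability(max_arch)


print(toolkit_can_target("12.8", "sm_121"))  # False
print(toolkit_can_target("13.0", "12.1"))    # True
```

Comparing `(major, minor)` tuples rather than raw strings keeps `sm_121` and `12.1` interchangeable, which matters because the README uses both spellings.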
## What's Included

- **ComfyUI** (latest master branch)
- **ComfyUI-Manager** - auto-installed on first run for easy custom node management
- **ComfyUIMini** - mobile-friendly web UI for phones/tablets (separate container)
- **Model Manager** - StabilityMatrix-style UI to download/manage models (separate container)
- **SageAttention** - compiled natively for sm_121 (Blackwell tensor cores)
- **PyTorch 2.9.1+cu130** - ARM64 wheels with CUDA 13.0 support

## Unified Memory Architecture

The DGX Spark's Grace-Blackwell architecture uses **unified memory** - a coherent memory fabric shared between CPU and GPU. This is fundamentally different from discrete GPUs and requires different optimization strategies.

**Key insight: Don't fight the fabric.** Forcing everything GPU-side (`--gpu-only`, `--cache-none`) actually hurts performance.

**Optimized flags (default in SparkyUI):**
```bash
--disable-pinned-memory      # Reduces overhead on unified fabric
--force-fp16                 # Enables SageAttention optimization
--fp16-unet --fp16-text-enc  # FP16 precision for UNet + text encoder
--fp32-vae                   # VAE in fp32 - fp16 VAE causes NaNs -> BLACK images
--dont-upcast-attention      # Keeps attention in FP16 for speed
```

> **Black/blank images?** That's the classic fp16-VAE NaN issue, not an NSFW
> filter (there is none). Keep `--fp32-vae` (default). `--bf16-vae` is a faster
> alternative that also avoids the NaNs.

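Why fp16 can produce NaNs at all: half precision cannot represent magnitudes above 65504, so a large intermediate value overflows to infinity, and arithmetic on infinities then yields NaN. The sketch below imitates that in plain Python. `to_fp16_range` is a hypothetical stand-in for a real fp16 cast (it ignores rounding), and the sample values are invented for illustration, not measured from any VAE.

```python
import math

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16_range(x: float) -> float:
    """Crude stand-in for an fp16 cast: out-of-range values become +/-inf."""
    if abs(x) > FP16_MAX:
        return math.copysign(math.inf, x)
    return x


# Invented activations: two of them exceed the fp16 range.
activations = [70000.0, -80000.0, 12.5]

wide_sum = sum(activations)                            # finite at higher precision
fp16_sum = sum(to_fp16_range(a) for a in activations)  # inf + (-inf) -> nan

print(wide_sum)              # -9987.5
print(math.isnan(fp16_sum))  # True
```

Once a NaN appears, every later operation that touches it also returns NaN, which is consistent with the all-black output described above. bf16 has a much wider exponent range than fp16, which is why `--bf16-vae` sidesteps the overflow.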
**What NOT to use:**
- `--gpu-only` - fights the unified memory fabric, hurts performance
- `--cache-none` - disables natural caching, slows model loading
- `--disable-mmap` - prevents memory-mapped model loading

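If you customize the launch flags, a small check can catch the discouraged ones before you start the container. This is only a sketch: `check_flags` is a hypothetical helper, not something SparkyUI ships, and it also flags `--fp16-vae` because of the black-image issue described above.

```python
import shlex

# Flags this README advises against, with its stated reasons.
DISCOURAGED = {
    "--gpu-only": "fights the unified memory fabric",
    "--cache-none": "disables natural caching",
    "--disable-mmap": "prevents memory-mapped model loading",
    "--fp16-vae": "can produce NaNs and black images",
}


def check_flags(flags: str) -> list[str]:
    """Return one warning per discouraged flag found in `flags`."""
    present = shlex.split(flags)
    return [f"{flag}: {why}" for flag, why in DISCOURAGED.items() if flag in present]


print(check_flags("--force-fp16 --fp32-vae --gpu-only"))
# ['--gpu-only: fights the unified memory fabric']
```

`shlex.split` is used instead of `str.split` so that quoted arguments in a flag string are tokenized the way a shell would tokenize them.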
**CUDA environment variables** are also tuned for unified memory:
- `CUDA_MANAGED_FORCE_DEVICE_ALLOC=1` - prefer GPU allocation
- `PYTORCH_NO_CUDA_MEMORY_CACHING=1` - let fabric manage memory
- `OMP_NUM_THREADS=20` - utilize all 20 ARM cores

## Quick Start

```bash
# Clone
git clone https://github.com/ecarmen16/SparkyUI.git
cd SparkyUI

# Configure paths
cp .env.example .env
# Edit .env with your paths

# Build (compiles SageAttention for sm_121 - takes ~10 min)
docker compose build

# Start
docker compose up -d

# View logs
docker compose logs -f
```

**Access:**
- **ComfyUI (Desktop):** http://localhost:8188
- **ComfyUIMini (Mobile):** http://localhost:3000
- **Model Manager:** http://localhost:8189

## Requirements

- **NVIDIA DGX Spark** (or other GB10-based system)
- **Docker** with NVIDIA Container Toolkit
- **NVIDIA Driver** 560+ (tested with 580.95)
- **~15GB** disk for Docker image
- **Models** - downloaded through the Model Manager or reused from an existing ComfyUI install (ComfyUI mounts them read-only)

## Configuration

Copy `.env.example` to `.env` and edit:

```bash
# Base path holding the models/ directory (defaults to the project root).
# The Model Manager downloads into <COMFYUI_HOST_PATH>/models; ComfyUI reads it.
COMFYUI_HOST_PATH=.

# Path for SparkyUI data (custom_nodes, outputs, inputs, manager DB).
# Defaults to the project root.
SPARKYUI_DATA_PATH=.

# Ports
COMFYUI_PORT=8188
COMFYUIMINI_PORT=3000
MODEL_MANAGER_PORT=8189

# Optional: pin to specific versions
COMFYUI_REF=master
SAGEATTN_REF=main
```

Both paths default to the project root, so out of the box models are stored in
`./models` and the Model Manager's database in `./sparkyui-data`. Point
`COMFYUI_HOST_PATH` at an existing ComfyUI install if you'd rather reuse its models.

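To see how the port settings map to the URLs you open in a browser, here is a minimal sketch that reads `KEY=VALUE` lines and falls back to the defaults shown above. `parse_env` and `service_urls` are hypothetical helpers for illustration; Docker Compose does its own `.env` parsing, and this sketch does not handle quoting or inline comments.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Defaults taken from the example above.
DEFAULT_PORTS = {
    "COMFYUI_PORT": "8188",
    "COMFYUIMINI_PORT": "3000",
    "MODEL_MANAGER_PORT": "8189",
}


def service_urls(env: dict[str, str], host: str = "localhost") -> dict[str, str]:
    """Build the URL for each service from the configured (or default) port."""
    return {
        name: f"http://{host}:{env.get(name, default)}"
        for name, default in DEFAULT_PORTS.items()
    }


sample = """
# Ports
COMFYUI_PORT=8288
MODEL_MANAGER_PORT=8189
"""
for name, url in service_urls(parse_env(sample)).items():
    print(name, url)
```

With the sample above, ComfyUI would be reachable on port 8288 while ComfyUIMini keeps its default of 3000, since that key is not set.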
## Architecture

```
┌──────────────────────────────────────────────────────────────────┐
│                          DGX Spark Host                          │
│              Ubuntu 24.04 (DGX OS 7) / Driver 580.x              │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                Docker Network (sparky_net)                 │  │
│  │                                                            │  │
│  │  ┌─────────────────────────┐  ┌──────────────────────────┐ │  │
│  │  │ comfyui (sparkyui:cu130)│  │ comfyuimini (node:20)    │ │  │
│  │  │                         │  │                          │ │  │
│  │  │ CUDA 13.0.2 + PyTorch   │◄─┤ Mobile-friendly UI       │ │  │
│  │  │ SageAttention (sm_121)  │  │ REST + WebSocket proxy   │ │  │
│  │  │ ComfyUI + Manager       │  │                          │ │  │
│  │  │                         │  │ Shares /output volume    │ │  │
│  │  └───────────┬─────────────┘  └────────────┬─────────────┘ │  │
│  │              │                             │               │  │
│  └──────────────┼─────────────────────────────┼───────────────┘  │
│                 │                             │                  │
│        Port 8188 (Desktop)           Port 3000 (Mobile)          │
└──────────────────────────────────────────────────────────────────┘
```

## Version Compatibility

Tested combinations:

| Component | Version | Notes |
|-----------|---------|-------|
| CUDA Base | 13.0.2 | Required for sm_121 |
| PyTorch | 2.9.1+cu130 | ARM64 wheel from PyTorch index |
| torchvision | 0.24.1+cu130 | ARM64 wheel |
| SageAttention | 2.2.0 | Compiled with sm_121 |
| ComfyUI | 0.7.0 | master branch |
| Driver | 580.95 | DGX OS 7 default |

## Known Limitations

1. **PyTorch Warning**: You'll see a warning about compute capability 12.1 being "outside supported range (8.0-12.0)". This is harmless - PyTorch works, and SageAttention's custom kernels are compiled natively.

2. **torch.compile Disabled**: Triton doesn't support sm_121 yet, so `torch.compile()` is disabled via environment variables. Some nodes may run slower than on supported architectures.

3. **No GitHub Actions CI**: GitHub's hosted runners can't build for ARM64 + sm_121, so the image must be built locally on a DGX Spark.

## Troubleshooting

### "no kernel image is available for execution on the device"
Your SageAttention wasn't compiled for sm_121. Rebuild:
```bash
docker compose build --no-cache
```

### PyTorch can't find CUDA
Ensure NVIDIA Container Toolkit is installed:
```bash
nvidia-ctk --version
docker run --rm --gpus all nvidia/cuda:13.0.2-base-ubuntu24.04 nvidia-smi
```

### ComfyUI-Manager missing
The entrypoint auto-clones it. Check logs:
```bash
docker compose logs | grep -i manager
```

## Host-Level GPU Optimizations (Optional)

For maximum performance, apply these optimizations on the **host** (not in Docker):

```bash
# Lock GPU clocks to maximum (3003 MHz) - prevents throttling
sudo nvidia-smi -lgc 3003,3003

# Enable core clock boost (GPU core > memory clock for compute)
sudo nvidia-smi boost-slider --vboost 1

# Enable persistence mode (reduces driver load latency)
sudo nvidia-smi -pm 1

# Verify settings
nvidia-smi --query-gpu=clocks.sm,clocks.max.sm,persistence_mode --format=csv
```

**Note:** GPU clock settings don't persist across reboots due to GB10 firmware behavior. Re-apply after each boot.

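Because the settings have to be re-applied after every boot, it can help to keep them in one script. The following is a sketch only: `apply_gpu_settings` is a hypothetical helper, the commands are the three from the block above, and how you trigger it at boot (a systemd unit, a cron `@reboot` entry, or by hand) is left as an assumption. It defaults to a dry run so that nothing changes until you opt in.

```python
import subprocess

# The three host commands from the block above, in order.
COMMANDS = [
    ["nvidia-smi", "-lgc", "3003,3003"],
    ["nvidia-smi", "boost-slider", "--vboost", "1"],
    ["nvidia-smi", "-pm", "1"],
]


def apply_gpu_settings(dry_run: bool = True) -> list[str]:
    """Run (or, with dry_run, only list) the host-level GPU commands."""
    issued = []
    for cmd in COMMANDS:
        issued.append(" ".join(cmd))
        if not dry_run:
            # check=True raises and stops at the first failing command.
            subprocess.run(cmd, check=True)
    return issued


if __name__ == "__main__":
    # Dry run by default; call apply_gpu_settings(dry_run=False) as root to apply.
    for line in apply_gpu_settings():
        print(line)
```

The commands are passed as argument lists rather than a shell string, so no shell quoting is involved. Run it with root privileges when applying for real, since the original commands use `sudo`.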
## ComfyUIMini (Mobile UI)

SparkyUI includes [ComfyUIMini](https://github.com/ImDarkTom/ComfyUIMini) - a lightweight, mobile-friendly web UI that runs in a separate container.

**Features:**
- Responsive design optimized for phones and tablets
- Simplified workflow execution interface
- Built-in image gallery (reads from shared output directory)
- Import workflows from ComfyUI in "API Format"
- Multiple themes (dark, light, aurora, nord, etc.)

**How it works:**
- Runs as a Node.js Express server in its own container (~150MB)
- Connects to ComfyUI via internal Docker network (`http://comfyui:8188`)
- Proxies REST API calls and WebSocket connections
- Shares the output directory for gallery viewing

**Access:** `http://<your-dgx-ip>:3000`

**Build only ComfyUIMini** (if ComfyUI already built):
```bash
docker compose build comfyuimini
docker compose up -d comfyuimini
```

## Model Manager

SparkyUI includes a **StabilityMatrix-style Model Manager** - a lightweight FastAPI web app
(separate container) for downloading and managing models without touching the command line.

**Access:** `http://<your-dgx-ip>:8189`

**Features:**
- **Gallery** - browse generated photos from ComfyUI's `output/` in a large desktop grid,
  click for a full-size lightbox view, and **permanently delete** photos one at a time or
  all at once (with confirm).
- **Browse CivitAI** - search the CivitAI catalog in a thumbnail grid (filter by type,
  base model (multi-select), sort, period, NSFW toggle) and **click a model to download
  it** - no URL pasting needed.
  Multi-version models get a version picker on the card. **Early Access** versions are
  flagged (they require purchased access on CivitAI and otherwise fail with 401).
- **Installed Models** - browse what's on disk, grouped by type, with size and delete actions
- **Add / Download** - paste a download URL and pick a type; live progress bars
  - **Direct URLs** - any direct download link
  - **CivitAI** - paste a model page link (`civitai.com/models/...`, the `civitai.red`
    mirror, or an `api/download/models/...` link); the type and filename are auto-detected
  - **HuggingFace** - paste a `resolve` URL (works with gated repos via your token)
- **Settings** - store your **CivitAI API key** and **HuggingFace token** persistently
  (saved to a SQLite DB under `./sparkyui-data`, never committed to git)

**How it works:**
- Runs as a FastAPI server in its own container (`python:3.12-slim`)
- Downloads land in the shared `models/` folder, sorted into ComfyUI's standard sub-folders
  by type (`checkpoints/`, `loras/`, `vae/`, `controlnet/`, `upscale_models/`, …) - these are
  created automatically on first start
- ComfyUI mounts the same `models/` folder read-only, so new downloads appear in its loaders
- Mounts the shared `output/` folder read-write for the Gallery's delete feature

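The sorting of downloads by type can be pictured as a lookup from model type to sub-folder. This is a sketch under assumptions, not the Model Manager's actual code: the sub-folder names come from the list above, while the type keys (`"checkpoint"`, `"lora"`, and so on) and the function names are hypothetical.

```python
from pathlib import Path

# Sub-folder names come from the list above; the type keys are hypothetical.
SUBFOLDERS = {
    "checkpoint": "checkpoints",
    "lora": "loras",
    "vae": "vae",
    "controlnet": "controlnet",
    "upscaler": "upscale_models",
}


def destination(models_root: str, model_type: str, filename: str) -> Path:
    """Return where a download of `model_type` would be stored."""
    if model_type not in SUBFOLDERS:
        raise ValueError(f"unknown model type: {model_type!r}")
    # Keep only the final path component to drop any directory parts.
    safe_name = Path(filename).name
    return Path(models_root) / SUBFOLDERS[model_type] / safe_name


def ensure_subfolders(models_root: str) -> None:
    """Create every standard sub-folder if it does not exist yet."""
    for sub in SUBFOLDERS.values():
        (Path(models_root) / sub).mkdir(parents=True, exist_ok=True)


print(destination("models", "lora", "style.safetensors").as_posix())
# models/loras/style.safetensors
```

Because ComfyUI reads the same `models/` tree, a file written to `models/loras/` is what makes a new LoRA show up in its loader nodes.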
**Device-aware entry point:** open `http://<host>:8189/start` and it detects your device -
**phones** are sent to the mobile UI (ComfyUIMini), **desktops** land on the Model Manager's
Gallery. Append `?force=mobile` or `?force=desktop` to override. Bookmark `/start` as your
single SparkyUI link.

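One plausible way to implement this kind of routing is a User-Agent check with an explicit override. The sketch below is an assumption about the general technique, not the Model Manager's actual detection logic: `pick_target` and the `MOBILE_HINTS` substrings are hypothetical.

```python
from typing import Optional

# Substrings treated as "phone" here are an assumption for illustration.
MOBILE_HINTS = ("mobi", "iphone")


def pick_target(user_agent: str, force: Optional[str] = None) -> str:
    """Return 'mobile' or 'desktop' for a request to /start."""
    if force in ("mobile", "desktop"):
        return force  # an explicit ?force=... override wins
    ua = user_agent.lower()
    return "mobile" if any(hint in ua for hint in MOBILE_HINTS) else "desktop"


phone = "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) Mobile/15E148"
laptop = "Mozilla/5.0 (X11; Linux x86_64)"

print(pick_target(phone))                   # mobile
print(pick_target(laptop))                  # desktop
print(pick_target(laptop, force="mobile"))  # mobile
```

Checking the override first means a bookmarked `?force=desktop` link keeps working even when the browser's User-Agent would otherwise be classified as a phone.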
ComfyUIMini also gets a **"Manage Photos"** link in its sidebar that jumps to this Gallery,
so you can delete generated photos from the mobile UI too (its built-in gallery is view-only).

**Build only the Model Manager** (if the rest is already built):
```bash
docker compose build model-manager
docker compose up -d model-manager
```

## SageAttention Notes

SageAttention PR #297 added sm_121 support but was merged then reverted due to stability issues. Our approach:

- Build SageAttention from main branch with `TORCH_CUDA_ARCH_LIST="12.1"`
- Disable Triton via `TORCHDYNAMO_DISABLE=1` (Triton doesn't support sm_121a)
- This gives working SageAttention without the unstable PR #297 changes

For full Triton support (more complex), see [HurbaLurba's DGX-SPARK-COMFYUI-DOCKER](https://github.com/HurbaLurba/DGX-SPARK-COMFYUI-DOCKER) which builds custom Triton from source.

## Future

When these land, SparkyUI can be simplified:
- [ ] PyTorch native sm_121 support → remove explicit `TORCH_CUDA_ARCH_LIST`
- [ ] Triton sm_121 support → remove `TORCHDYNAMO_DISABLE`
- [ ] SageAttention prebuilt ARM64 wheels → remove source build

## Credits

- Unified memory architecture insights from [HurbaLurba's DGX-SPARK-COMFYUI-DOCKER](https://github.com/HurbaLurba/DGX-SPARK-COMFYUI-DOCKER)
- [ComfyUIMini](https://github.com/ImDarkTom/ComfyUIMini) by ImDarkTom
- [SageAttention](https://github.com/thu-ml/SageAttention) by thu-ml
- [ComfyUI](https://github.com/comfyanonymous/ComfyUI) by comfyanonymous

## License

MIT