SparkyUI
ComfyUI + SageAttention for NVIDIA DGX Spark (Blackwell GB10)
A Docker-based ComfyUI setup specifically engineered for the DGX Spark's unique ARM64 + Blackwell architecture.
Why This Exists
The NVIDIA DGX Spark uses the GB10 GPU with compute capability 12.1 (sm_121) - Blackwell architecture. This creates challenges:
| CUDA Version | Max Compute Capability | Can compile for GB10? |
|---|---|---|
| CUDA 12.8 | sm_120 | No |
| CUDA 13.0+ | sm_121 | Yes |
Standard ComfyUI containers and PyTorch wheels don't support sm_121. SparkyUI solves this by:
- Using CUDA 13.0.2 base image (supports sm_121)
- Installing PyTorch cu130 ARM64 wheels
- Compiling SageAttention with TORCH_CUDA_ARCH_LIST="12.1"
- Disabling Triton/torch.compile (doesn't support sm_121 yet)
- Optimizing for the Grace-Blackwell unified memory architecture
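The compatibility table above can be expressed as a small check. This is only an illustrative sketch: the function name and the version parsing are assumptions, and the limits encode nothing beyond the two rows of the table (CUDA 12.8 tops out at sm_120, CUDA 13.0+ reaches sm_121).

```python
from typing import Tuple


def _major_minor(version: str) -> Tuple[int, int]:
    """Parse '13.0.2' or '12.1' into (major, minor); a missing minor counts as 0."""
    parts = version.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def can_compile_for(cuda_version: str, compute_capability: str) -> bool:
    """Return True if the CUDA toolkit can target the given compute capability.

    Only the two rows of the README table are encoded here; anything older
    than CUDA 12.8 is rejected as unknown rather than guessed.
    """
    cuda = _major_minor(cuda_version)
    if cuda >= (13, 0):
        max_cc = (12, 1)  # CUDA 13.0+ -> sm_121
    elif cuda >= (12, 8):
        max_cc = (12, 0)  # CUDA 12.8 -> sm_120
    else:
        raise ValueError(f"CUDA {cuda_version} is not covered by the table")
    return _major_minor(compute_capability) <= max_cc
```

For example, can_compile_for("13.0.2", "12.1") holds while can_compile_for("12.8", "12.1") does not, which is why the image is built on the CUDA 13.0.2 base.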
What's Included
- ComfyUI (latest master branch)
- ComfyUI-Manager - auto-installed on first run for easy custom node management
- ComfyUIMini - mobile-friendly web UI for phones/tablets (separate container)
- Model Manager - StabilityMatrix-style UI to download/manage models (separate container)
- SageAttention - compiled natively for sm_121 (Blackwell tensor cores)
- PyTorch 2.9.1+cu130 - ARM64 wheels with CUDA 13.0 support
Unified Memory Architecture
The DGX Spark's Grace-Blackwell architecture uses unified memory - a coherent memory fabric shared between CPU and GPU. This is fundamentally different from discrete GPUs and requires different optimization strategies.
Key insight: Don't fight the fabric. Forcing everything GPU-side (--gpu-only, --cache-none) actually hurts performance.
Optimized flags (default in SparkyUI):
--disable-pinned-memory # Reduces overhead on unified fabric
--force-fp16 # Enables SageAttention optimization
--fp16-unet --fp16-text-enc # FP16 precision for UNet + text encoder
--fp32-vae # VAE in fp32 - fp16 VAE causes NaNs -> BLACK images
--dont-upcast-attention # Keeps attention in FP16 for speed
Black/blank images? That's the classic fp16-VAE NaN issue, not an NSFW filter (there is none). Keep --fp32-vae (default). --bf16-vae is a faster alternative that also avoids the NaNs.
What NOT to use:
- --gpu-only - fights the unified memory fabric, hurts performance
- --cache-none - disables natural caching, slows model loading
- --disable-mmap - prevents memory-mapped model loading
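If you customize the launch flags, a quick lint can catch the three flags listed above before the container starts. This is a hypothetical helper, not part of SparkyUI: the function name is made up, and it assumes the flags arrive as a single shell-style string such as the COMFYUI_FLAGS value.

```python
import shlex
from typing import List

# Flags the README advises against on unified memory, with the stated reason.
DISCOURAGED_FLAGS = {
    "--gpu-only": "fights the unified memory fabric, hurts performance",
    "--cache-none": "disables natural caching, slows model loading",
    "--disable-mmap": "prevents memory-mapped model loading",
}


def find_discouraged_flags(comfyui_flags: str) -> List[str]:
    """Return the discouraged flags present in a flag string, in order of appearance."""
    tokens = shlex.split(comfyui_flags)
    return [token for token in tokens if token in DISCOURAGED_FLAGS]


if __name__ == "__main__":
    for flag in find_discouraged_flags("--fp32-vae --gpu-only"):
        print(f"warning: {flag} {DISCOURAGED_FLAGS[flag]}")
```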
CUDA environment variables are also tuned for unified memory:
- CUDA_MANAGED_FORCE_DEVICE_ALLOC=1 - prefer GPU allocation
- PYTORCH_NO_CUDA_MEMORY_CACHING=1 - let fabric manage memory
- OMP_NUM_THREADS=20 - utilize all 20 ARM cores
Quick Start
# Clone
git clone https://github.com/ecarmen16/SparkyUI.git
cd SparkyUI
# Configure paths
cp .env.example .env
# Edit .env with your paths
# Build (compiles SageAttention for sm_121 - takes ~10 min)
docker compose build
# Start
docker compose up -d
# View logs
docker compose logs -f
Access:
- ComfyUI (Desktop): http://localhost:8188
- ComfyUIMini (Mobile): http://localhost:3000
- Model Manager: http://localhost:8189
Requirements
- NVIDIA DGX Spark (or other GB10-based system)
- Docker with NVIDIA Container Toolkit
- NVIDIA Driver 560+ (tested with 580.95)
- ~15GB disk for Docker image
- Models from existing ComfyUI install (mounted read-only)
Configuration
Copy .env.example to .env and edit:
# Base path holding the models/ directory (defaults to the project root).
# The Model Manager downloads into <COMFYUI_HOST_PATH>/models; ComfyUI reads it.
COMFYUI_HOST_PATH=.
# Path for SparkyUI data (custom_nodes, outputs, inputs, manager DB).
# Defaults to the project root.
SPARKYUI_DATA_PATH=.
# Ports
COMFYUI_PORT=8188
COMFYUIMINI_PORT=3000
MODEL_MANAGER_PORT=8189
# Optional: pin to specific versions
COMFYUI_REF=master
SAGEATTN_REF=main
Both paths default to the project root, so out of the box models are stored in
./models and the Model Manager's database in ./sparkyui-data. Point
COMFYUI_HOST_PATH at an existing ComfyUI install if you'd rather reuse its models.
Architecture
┌──────────────────────────────────────────────────────────────────┐
│ DGX Spark Host │
│ Ubuntu 24.04 (DGX OS 7) / Driver 580.x │
│ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Docker Network (sparky_net) │ │
│ │ │ │
│ │ ┌─────────────────────────┐ ┌──────────────────────────┐ │ │
│ │ │ comfyui (sparkyui:cu130)│ │ comfyuimini (node:20) │ │ │
│ │ │ │ │ │ │ │
│ │ │ CUDA 13.0.2 + PyTorch │◄─┤ Mobile-friendly UI │ │ │
│ │ │ SageAttention (sm_121) │ │ REST + WebSocket proxy │ │ │
│ │ │ ComfyUI + Manager │ │ │ │ │
│ │ │ │ │ Shares /output volume │ │ │
│ │ └───────────┬─────────────┘ └────────────┬─────────────┘ │ │
│ │ │ │ │ │
│ └──────────────┼─────────────────────────────┼────────────────┘ │
│ │ │ │
│ Port 8188 (Desktop) Port 3000 (Mobile) │
└──────────────────────────────────────────────────────────────────┘
Version Compatibility
Tested combinations:
| Component | Version | Notes |
|---|---|---|
| CUDA Base | 13.0.2 | Required for sm_121 |
| PyTorch | 2.9.1+cu130 | ARM64 wheel from PyTorch index |
| torchvision | 0.24.1+cu130 | ARM64 wheel |
| SageAttention | 2.2.0 | Compiled with sm_121 |
| ComfyUI | 0.7.0 | master branch |
| Driver | 580.95 | DGX OS 7 default |
Known Limitations
- PyTorch Warning: You'll see a warning about compute capability 12.1 being "outside supported range (8.0-12.0)". This is harmless - PyTorch works, and SageAttention's custom kernels are compiled natively.
- torch.compile Disabled: Triton doesn't support sm_121 yet. torch.compile() is disabled via environment variables. Some nodes may run slower than on supported architectures.
- No GitHub Actions CI: Can't build for ARM64 + sm_121 in GitHub's hosted runners. Must build locally on DGX Spark.
Troubleshooting
"no kernel image is available for execution on the device"
Your SageAttention wasn't compiled for sm_121. Rebuild:
docker compose build --no-cache
PyTorch can't find CUDA
Ensure NVIDIA Container Toolkit is installed:
nvidia-ctk --version
docker run --rm --gpus all nvidia/cuda:13.0.2-base-ubuntu24.04 nvidia-smi
ComfyUI-Manager missing
The entrypoint auto-clones it. Check logs:
docker compose logs | grep -i manager
Host-Level GPU Optimizations (Optional)
For maximum performance, apply these optimizations on the host (not in Docker):
# Lock GPU clocks to maximum (3003 MHz) - prevents throttling
sudo nvidia-smi -lgc 3003,3003
# Enable core clock boost (GPU core > memory clock for compute)
sudo nvidia-smi boost-slider --vboost 1
# Enable persistence mode (reduces driver load latency)
sudo nvidia-smi -pm 1
# Verify settings
nvidia-smi --query-gpu=clocks.sm,clocks.max.sm,persistence_mode --format=csv
Note: GPU clock settings don't persist across reboots due to GB10 firmware behavior. Re-apply after each boot.
ComfyUIMini (Mobile UI)
SparkyUI includes ComfyUIMini - a lightweight, mobile-friendly web UI that runs in a separate container.
Features:
- Responsive design optimized for phones and tablets
- Simplified workflow execution interface
- Built-in image gallery (reads from shared output directory)
- Import workflows from ComfyUI in "API Format"
- Multiple themes (dark, light, aurora, nord, etc.)
How it works:
- Runs as a Node.js Express server in its own container (~150MB)
- Connects to ComfyUI via internal Docker network (http://comfyui:8188)
- Proxies REST API calls and WebSocket connections
- Shares the output directory for gallery viewing
Access: http://<your-dgx-ip>:3000
Build only ComfyUIMini (if ComfyUI already built):
docker compose build comfyuimini
docker compose up -d comfyuimini
Model Manager
SparkyUI includes a StabilityMatrix-style Model Manager - a lightweight FastAPI web app (separate container) for downloading and managing models without touching the command line.
Access: http://<your-dgx-ip>:8189
Features:
- Gallery - browse generated photos from ComfyUI's output/ in a large desktop grid, click for a full-size lightbox view, and permanently delete photos one at a time or all at once (with confirm).
- Browse CivitAI - search the CivitAI catalog in a thumbnail grid (filter by type, base model (multi-select), sort, period, NSFW toggle) and click a model to download it - no URL pasting needed. Multi-version models get a version picker on the card. Early Access versions are flagged (they require purchased access on CivitAI and otherwise fail with 401).
- Installed Models - browse what's on disk, grouped by type, with size and delete actions
- Add / Download - paste a download URL and pick a type; live progress bars
  - Direct URLs - any direct download link
  - CivitAI - paste a model page link (civitai.com/models/..., the civitai.red mirror, or an api/download/models/... link); the type and filename are auto-detected
  - HuggingFace - paste a resolve URL (works with gated repos via your token)
- Settings - store your CivitAI API key and HuggingFace token persistently (saved to a SQLite DB under ./sparkyui-data, never committed to git)
- Free GPU memory - a sidebar button that unloads all models from ComfyUI and releases memory (proxies ComfyUI's /free). ComfyUI keeps the last model cached for fast reuse when RAM is plentiful, so memory won't drop on its own after switching models - use this to release it on demand. (For automatic unload after every generation, add --disable-smart-memory to COMFYUI_FLAGS, at the cost of reloading each run.)
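The Free GPU memory button boils down to one POST against ComfyUI. The sketch below shows roughly what such a call could look like using only the standard library. It is not the Model Manager's actual code: the function names are hypothetical, the default base URL reuses the internal address http://comfyui:8188 mentioned for ComfyUIMini, and the JSON fields unload_models and free_memory reflect our understanding of what ComfyUI's /free route accepts.

```python
import json
import os
import urllib.request
from typing import Optional


def build_free_request(comfyui_url: str) -> urllib.request.Request:
    """Build the POST request that asks ComfyUI to unload models and free memory."""
    payload = json.dumps({"unload_models": True, "free_memory": True}).encode("utf-8")
    return urllib.request.Request(
        url=comfyui_url.rstrip("/") + "/free",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def free_gpu_memory(comfyui_url: Optional[str] = None, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code.

    Falls back to the COMFYUI_URL environment variable, then to the
    internal Docker address (assumed default).
    """
    base = comfyui_url or os.environ.get("COMFYUI_URL", "http://comfyui:8188")
    with urllib.request.urlopen(build_free_request(base), timeout=timeout) as response:
        return response.status
```

From the host you would call free_gpu_memory("http://localhost:8188") instead, since the comfyui hostname only resolves inside the Docker network.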
How it works:
- Runs as a FastAPI server in its own container (python:3.12-slim)
- Downloads land in the shared models/ folder, sorted into ComfyUI's standard sub-folders by type (checkpoints/, loras/, vae/, controlnet/, upscale_models/, …) - these are created automatically on first start
- ComfyUI mounts the same models/ folder read-only, so new downloads appear in its loaders
- Mounts the shared output/ folder read-write for the Gallery's delete feature
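To make the "sorted by type" step concrete, here is a minimal sketch of how a download could be routed to its sub-folder. The sub-folder names come from the list above; the type keys (checkpoint, lora, upscaler, and so on) and the function name are assumptions for illustration and may not match the Model Manager's real identifiers.

```python
from pathlib import Path

# Hypothetical type keys mapped to the ComfyUI sub-folders named in this README.
MODEL_SUBFOLDERS = {
    "checkpoint": "checkpoints",
    "lora": "loras",
    "vae": "vae",
    "controlnet": "controlnet",
    "upscaler": "upscale_models",
}


def destination_for(models_root: str, model_type: str, filename: str) -> Path:
    """Return the path a downloaded file should be written to."""
    try:
        subfolder = MODEL_SUBFOLDERS[model_type.lower()]
    except KeyError:
        raise ValueError(f"unknown model type: {model_type!r}") from None
    # Keep only the final path component so a crafted filename
    # cannot escape the models/ tree.
    safe_name = Path(filename).name
    if not safe_name:
        raise ValueError("filename is empty")
    return Path(models_root) / subfolder / safe_name
```

Stripping the filename down to its last component is a defensive choice for the sketch, since filenames taken from remote URLs should not be trusted as paths.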
Device-aware entry point: open http://<host>:8189/start and it detects your device -
phones are sent to the mobile UI (ComfyUIMini), desktops land on the Model Manager's
Gallery. Append ?force=mobile or ?force=desktop to override. Bookmark /start as your
single SparkyUI link.
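The routing decision behind /start can be pictured as a small function. This is a sketch under stated assumptions, not the actual implementation: the User-Agent markers are a guess at what counts as a phone, and the function name is hypothetical. Only the ?force=mobile and ?force=desktop overrides come from the description above.

```python
from typing import Optional

# Assumed substrings that identify a phone browser; the real check may differ.
MOBILE_MARKERS = ("iphone", "android", "mobile")


def pick_start_target(user_agent: Optional[str], force: Optional[str] = None) -> str:
    """Return 'mobile' (ComfyUIMini) or 'desktop' (Model Manager Gallery)."""
    # An explicit ?force=... value wins over detection.
    if force in ("mobile", "desktop"):
        return force
    ua = (user_agent or "").lower()
    if any(marker in ua for marker in MOBILE_MARKERS):
        return "mobile"
    return "desktop"
```

A request with no User-Agent header falls through to the desktop Gallery, which keeps the default predictable for scripts and unusual browsers.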
ComfyUIMini also gets a "Manage Photos" link in its sidebar that jumps to this Gallery, so you can delete generated photos from the mobile UI too (its built-in gallery is view-only).
Build only the Model Manager (if the rest is already built):
docker compose build model-manager
docker compose up -d model-manager
SageAttention Notes
SageAttention PR #297 added sm_121 support but was merged then reverted due to stability issues. Our approach:
- Build SageAttention from the main branch with TORCH_CUDA_ARCH_LIST="12.1"
- Disable Triton via TORCHDYNAMO_DISABLE=1 (Triton doesn't support sm_121a)
- This gives working SageAttention without the unstable PR #297 changes
For full Triton support (more complex), see HurbaLurba's DGX-SPARK-COMFYUI-DOCKER which builds custom Triton from source.
Future
When these land, SparkyUI can be simplified:
- PyTorch native sm_121 support → remove explicit TORCH_CUDA_ARCH_LIST
- Triton sm_121 support → remove TORCHDYNAMO_DISABLE
- SageAttention prebuilt ARM64 wheels → remove source build
Credits
- Unified memory architecture insights from HurbaLurba's DGX-SPARK-COMFYUI-DOCKER
- ComfyUIMini by ImDarkTom
- SageAttention by thu-ml
- ComfyUI by comfyanonymous
License
MIT