TBNilles 399acabd58 feat(model-manager): "Free GPU memory" button to unload ComfyUI models
ComfyUI caches the last model when RAM is plentiful (unified memory), so
memory doesn't drop after switching models even though models are being
swapped, not accumulated. Add a sidebar "Free GPU memory" button that
proxies ComfyUI's POST /free (unload_models + free_memory) via a new
/api/comfyui/free endpoint (COMFYUI_URL env). Verified it releases ~7GB.
README documents this plus the --disable-smart-memory auto-unload option.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 17:14:37 -04:00

SparkyUI

ComfyUI + SageAttention for NVIDIA DGX Spark (Blackwell GB10)

A Docker-based ComfyUI setup specifically engineered for the DGX Spark's unique ARM64 + Blackwell architecture.

Why This Exists

The NVIDIA DGX Spark uses the GB10 GPU with compute capability 12.1 (sm_121) - Blackwell architecture. This creates challenges:

| CUDA Version | Max Compute Capability | Can compile for GB10? |
|--------------|------------------------|-----------------------|
| CUDA 12.8    | sm_120                 | No                    |
| CUDA 13.0+   | sm_121                 | Yes                   |

Standard ComfyUI containers and PyTorch wheels don't support sm_121. SparkyUI solves this by:

  1. Using CUDA 13.0.2 base image (supports sm_121)
  2. Installing PyTorch cu130 ARM64 wheels
  3. Compiling SageAttention with TORCH_CUDA_ARCH_LIST="12.1"
  4. Disabling torch.compile, because Triton doesn't support sm_121 yet
  5. Tuning launch flags and CUDA environment variables for the Grace-Blackwell unified memory architecture
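
As a rough illustration of step 3, the sketch below checks whether a TORCH_CUDA_ARCH_LIST value explicitly names a given compute capability. The helper name arch_list_covers is hypothetical and not part of SparkyUI; it does a simple exact match and ignores PTX forward compatibility.

```python
import re


def arch_list_covers(arch_list: str, capability: tuple[int, int]) -> bool:
    """Return True if arch_list explicitly names the given compute capability.

    Hypothetical helper: exact match only, ignoring PTX forward compatibility.
    """
    wanted = f"{capability[0]}.{capability[1]}"
    # Entries are separated by spaces or semicolons and may carry a "+PTX" suffix.
    entries = [e for e in re.split(r"[;\s]+", arch_list.strip()) if e]
    return any(e.removesuffix("+PTX") == wanted for e in entries)


if __name__ == "__main__":
    # GB10 reports compute capability 12.1.
    print(arch_list_covers("12.1", (12, 1)))
    print(arch_list_covers("8.0;9.0;12.0", (12, 1)))
```

A check like this could run before a long build to catch an arch list that would produce kernels the GB10 cannot execute.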

What's Included

  • ComfyUI (latest master branch)
  • ComfyUI-Manager - auto-installed on first run for easy custom node management
  • ComfyUIMini - mobile-friendly web UI for phones/tablets (separate container)
  • Model Manager - StabilityMatrix-style UI to download/manage models (separate container)
  • SageAttention - compiled natively for sm_121 (Blackwell tensor cores)
  • PyTorch 2.9.1+cu130 - ARM64 wheels with CUDA 13.0 support

Unified Memory Architecture

The DGX Spark's Grace-Blackwell architecture uses unified memory - a coherent memory fabric shared between CPU and GPU. This is fundamentally different from discrete GPUs and requires different optimization strategies.

Key insight: Don't fight the fabric. Forcing everything GPU-side (--gpu-only, --cache-none) actually hurts performance.

Optimized flags (default in SparkyUI):

--disable-pinned-memory   # Reduces overhead on unified fabric
--force-fp16              # Enables SageAttention optimization
--fp16-unet --fp16-text-enc  # FP16 precision for UNet + text encoder
--fp32-vae               # VAE in fp32 - fp16 VAE causes NaNs -> BLACK images
--dont-upcast-attention   # Keeps attention in FP16 for speed

Black/blank images? That's the classic fp16-VAE NaN issue, not an NSFW filter (there is none). Keep --fp32-vae (default). --bf16-vae is a faster alternative that also avoids the NaNs.

What NOT to use:

  • --gpu-only - fights the unified memory fabric, hurts performance
  • --cache-none - disables natural caching, slows model loading
  • --disable-mmap - prevents memory-mapped model loading
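
The list above can be turned into a small pre-flight check. The sketch below is not part of SparkyUI; the function name discouraged_flags is hypothetical, and it assumes the flags arrive as a single shell-style string such as the value of COMFYUI_FLAGS.

```python
import shlex

# Flags the section above advises against on unified memory, with the stated reason.
DISCOURAGED = {
    "--gpu-only": "fights the unified memory fabric",
    "--cache-none": "disables natural caching",
    "--disable-mmap": "prevents memory-mapped model loading",
}


def discouraged_flags(flag_string: str) -> list[str]:
    """Return the discouraged flags found in a shell-style flag string, in order."""
    tokens = shlex.split(flag_string)
    return [token for token in tokens if token in DISCOURAGED]


if __name__ == "__main__":
    for flag in discouraged_flags("--fp32-vae --gpu-only --force-fp16"):
        print(f"warning: {flag} {DISCOURAGED[flag]}")
```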

CUDA environment variables are also tuned for unified memory:

  • CUDA_MANAGED_FORCE_DEVICE_ALLOC=1 - prefer GPU allocation
  • PYTORCH_NO_CUDA_MEMORY_CACHING=1 - disable PyTorch's caching allocator and let the fabric manage memory
  • OMP_NUM_THREADS=20 - utilize all 20 ARM cores

Quick Start

# Clone
git clone https://github.com/ecarmen16/SparkyUI.git
cd SparkyUI

# Configure paths
cp .env.example .env
# Edit .env with your paths

# Build (compiles SageAttention for sm_121 - takes ~10 min)
docker compose build

# Start
docker compose up -d

# View logs
docker compose logs -f

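Access (default ports from .env):

  • ComfyUI (desktop): http://<your-dgx-ip>:8188
  • ComfyUIMini (mobile): http://<your-dgx-ip>:3000
  • Model Manager: http://<your-dgx-ip>:8189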

Requirements

  • NVIDIA DGX Spark (or other GB10-based system)
  • Docker with NVIDIA Container Toolkit
  • NVIDIA Driver 560+ (tested with 580.95)
  • ~15GB disk for Docker image
  • Optional: an existing ComfyUI install whose models you want to reuse (ComfyUI mounts the models folder read-only)

Configuration

Copy .env.example to .env and edit:

# Base path holding the models/ directory (defaults to the project root).
# The Model Manager downloads into <COMFYUI_HOST_PATH>/models; ComfyUI reads it.
COMFYUI_HOST_PATH=.

# Path for SparkyUI data (custom_nodes, outputs, inputs, manager DB).
# Defaults to the project root.
SPARKYUI_DATA_PATH=.

# Ports
COMFYUI_PORT=8188
COMFYUIMINI_PORT=3000
MODEL_MANAGER_PORT=8189

# Optional: pin to specific versions
COMFYUI_REF=master
SAGEATTN_REF=main

Both paths default to the project root, so out of the box, models are stored in ./models and the Model Manager's database in ./sparkyui-data. Point COMFYUI_HOST_PATH at an existing ComfyUI install if you'd rather reuse its models.
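
To make the defaulting behaviour concrete, here is a minimal sketch of how the models directory follows from .env. It is an illustration only: the parser is deliberately simplified (no quoting, no inline comments), Docker Compose does its own parsing, and the names parse_env and models_dir are hypothetical.

```python
from pathlib import Path

# Defaults taken from the .env example above.
DEFAULTS = {
    "COMFYUI_HOST_PATH": ".",
    "SPARKYUI_DATA_PATH": ".",
    "COMFYUI_PORT": "8188",
    "COMFYUIMINI_PORT": "3000",
    "MODEL_MANAGER_PORT": "8189",
}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and comments, on top of DEFAULTS."""
    values = dict(DEFAULTS)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def models_dir(values: dict[str, str], project_root: str) -> Path:
    """Resolve <COMFYUI_HOST_PATH>/models; an absolute host path replaces the root."""
    return Path(project_root) / values["COMFYUI_HOST_PATH"] / "models"


if __name__ == "__main__":
    print(models_dir(parse_env(""), "/srv/sparkyui"))
    print(models_dir(parse_env("COMFYUI_HOST_PATH=/data/comfyui"), "/srv/sparkyui"))
```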

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                        DGX Spark Host                             │
│  Ubuntu 24.04 (DGX OS 7) / Driver 580.x                          │
│                                                                   │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                    Docker Network (sparky_net)              │  │
│  │                                                             │  │
│  │  ┌─────────────────────────┐  ┌──────────────────────────┐ │  │
│  │  │  comfyui (sparkyui:cu130)│  │  comfyuimini (node:20)   │ │  │
│  │  │                         │  │                          │ │  │
│  │  │  CUDA 13.0.2 + PyTorch  │◄─┤  Mobile-friendly UI      │ │  │
│  │  │  SageAttention (sm_121) │  │  REST + WebSocket proxy  │ │  │
│  │  │  ComfyUI + Manager      │  │                          │ │  │
│  │  │                         │  │  Shares /output volume   │ │  │
│  │  └───────────┬─────────────┘  └────────────┬─────────────┘ │  │
│  │              │                             │                │  │
│  └──────────────┼─────────────────────────────┼────────────────┘  │
│                 │                             │                    │
│          Port 8188 (Desktop)           Port 3000 (Mobile)         │
└──────────────────────────────────────────────────────────────────┘

Version Compatibility

Tested combinations:

| Component     | Version      | Notes                          |
|---------------|--------------|--------------------------------|
| CUDA Base     | 13.0.2       | Required for sm_121            |
| PyTorch       | 2.9.1+cu130  | ARM64 wheel from PyTorch index |
| torchvision   | 0.24.1+cu130 | ARM64 wheel                    |
| SageAttention | 2.2.0        | Compiled with sm_121           |
| ComfyUI       | 0.7.0        | master branch                  |
| Driver        | 580.95       | DGX OS 7 default               |

Known Limitations

  1. PyTorch Warning: You'll see a warning about compute capability 12.1 being "outside supported range (8.0-12.0)". This is harmless - PyTorch works, and SageAttention's custom kernels are compiled natively.

  2. torch.compile Disabled: Triton doesn't support sm_121 yet. torch.compile() is disabled via environment variables. Some nodes may run slower than on supported architectures.

  3. No GitHub Actions CI: GitHub's hosted runners can't build for ARM64 + sm_121, so the image must be built locally on a DGX Spark.

Troubleshooting

"no kernel image is available for execution on the device"

Your SageAttention wasn't compiled for sm_121. Rebuild:

docker compose build --no-cache

PyTorch can't find CUDA

Ensure NVIDIA Container Toolkit is installed:

nvidia-ctk --version
docker run --rm --gpus all nvidia/cuda:13.0.2-base-ubuntu24.04 nvidia-smi

ComfyUI-Manager missing

The entrypoint auto-clones it. Check logs:

docker compose logs | grep -i manager

Host-Level GPU Optimizations (Optional)

For maximum performance, apply these optimizations on the host (not in Docker):

# Lock GPU clocks to maximum (3003 MHz) - prevents throttling
sudo nvidia-smi -lgc 3003,3003

# Enable core clock boost (favors GPU core clock over memory clock for compute)
sudo nvidia-smi boost-slider --vboost 1

# Enable persistence mode (reduces driver load latency)
sudo nvidia-smi -pm 1

# Verify settings
nvidia-smi --query-gpu=clocks.sm,clocks.max.sm,persistence_mode --format=csv

Note: GPU clock settings don't persist across reboots due to GB10 firmware behavior. Re-apply after each boot.
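
Because the settings have to be re-applied after every boot, it can help to keep them in one script. The sketch below is an assumption about how one might do that, not something SparkyUI ships: it wraps the three commands above, defaults to a dry run, and leaves the choice of boot hook (for example a systemd unit) to you.

```python
import subprocess


def build_commands(clock_mhz: int = 3003) -> list[list[str]]:
    """Return the host-level tuning commands from this section as argv lists."""
    return [
        ["sudo", "nvidia-smi", "-lgc", f"{clock_mhz},{clock_mhz}"],  # lock GPU clocks
        ["sudo", "nvidia-smi", "boost-slider", "--vboost", "1"],     # core clock boost
        ["sudo", "nvidia-smi", "-pm", "1"],                          # persistence mode
    ]


def apply(dry_run: bool = True) -> list[str]:
    """Run each command in order, or only report them when dry_run is True."""
    lines = []
    for cmd in build_commands():
        lines.append(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if nvidia-smi reports failure
    return lines


if __name__ == "__main__":
    for line in apply(dry_run=True):
        print(line)
```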

ComfyUIMini (Mobile UI)

SparkyUI includes ComfyUIMini - a lightweight, mobile-friendly web UI that runs in a separate container.

Features:

  • Responsive design optimized for phones and tablets
  • Simplified workflow execution interface
  • Built-in image gallery (reads from shared output directory)
  • Import workflows from ComfyUI in "API Format"
  • Multiple themes (dark, light, aurora, nord, etc.)

How it works:

  • Runs as a Node.js Express server in its own container (~150MB)
  • Connects to ComfyUI via internal Docker network (http://comfyui:8188)
  • Proxies REST API calls and WebSocket connections
  • Shares the output directory for gallery viewing

Access: http://<your-dgx-ip>:3000

Build only ComfyUIMini (if ComfyUI already built):

docker compose build comfyuimini
docker compose up -d comfyuimini

Model Manager

SparkyUI includes a StabilityMatrix-style Model Manager - a lightweight FastAPI web app (separate container) for downloading and managing models without touching the command line.

Access: http://<your-dgx-ip>:8189

Features:

  • Gallery - browse generated photos from ComfyUI's output/ in a large desktop grid, click for a full-size lightbox view, and permanently delete photos one at a time or all at once (with confirm).
  • Browse CivitAI - search the CivitAI catalog in a thumbnail grid, filtering by type, base model (multi-select), sort order, period, and an NSFW toggle, then click a model to download it with no URL pasting. Multi-version models get a version picker on the card. Early Access versions are flagged, since they require purchased access on CivitAI and otherwise fail with a 401.
  • Installed Models - browse what's on disk, grouped by type, with size and delete actions
  • Add / Download - paste a download URL and pick a type; live progress bars
    • Direct URLs - any direct download link
    • CivitAI - paste a model page link (civitai.com/models/..., the civitai.red mirror, or an api/download/models/... link); the type and filename are auto-detected
    • HuggingFace - paste a resolve URL (works with gated repos via your token)
  • Settings - store your CivitAI API key and HuggingFace token persistently (saved to a SQLite DB under ./sparkyui-data, never committed to git)
  • Free GPU memory - a sidebar button that unloads all models from ComfyUI and releases memory (proxies ComfyUI's /free). ComfyUI keeps the last model cached for fast reuse when RAM is plentiful, so memory won't drop on its own after switching models - use this to release it on demand. (For automatic unload after every generation, add --disable-smart-memory to COMFYUI_FLAGS, at the cost of reloading each run.)
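
For scripting the same action outside the sidebar, the sketch below builds the POST /free request that the button proxies, asking ComfyUI to unload models and free memory. It uses only the standard library. The function names and the default URL http://comfyui:8188 (the internal Docker address used elsewhere in this README) are assumptions; from the host you would substitute your own address and port.

```python
import json
import urllib.request


def build_free_request(comfyui_url: str) -> urllib.request.Request:
    """Build a POST /free request asking ComfyUI to unload models and free memory."""
    body = json.dumps({"unload_models": True, "free_memory": True}).encode("utf-8")
    return urllib.request.Request(
        comfyui_url.rstrip("/") + "/free",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def free_gpu_memory(comfyui_url: str = "http://comfyui:8188", timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code (needs a running ComfyUI)."""
    with urllib.request.urlopen(build_free_request(comfyui_url), timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    req = build_free_request("http://comfyui:8188")
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Splitting request construction from sending keeps the first half easy to inspect without a GPU or a running container.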

How it works:

  • Runs as a FastAPI server in its own container (python:3.12-slim)
  • Downloads land in the shared models/ folder, sorted into ComfyUI's standard sub-folders by type (checkpoints/, loras/, vae/, controlnet/, upscale_models/, …) - these are created automatically on first start
  • ComfyUI mounts the same models/ folder read-only, so new downloads appear in its loaders
  • Mounts the shared output/ folder read-write for the Gallery's delete feature
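
The sorting step can be pictured as a lookup from model type to sub-folder. This is a minimal sketch, not the Model Manager's actual code: the folder names come from the list above, while the type labels on the left and the function name destination are hypothetical.

```python
from pathlib import Path

# Sub-folder names are from this README; the type labels are illustrative.
TYPE_TO_SUBFOLDER = {
    "checkpoint": "checkpoints",
    "lora": "loras",
    "vae": "vae",
    "controlnet": "controlnet",
    "upscaler": "upscale_models",
}


def destination(models_root: str, model_type: str, filename: str) -> Path:
    """Return where a download of the given type would land under models/."""
    subfolder = TYPE_TO_SUBFOLDER[model_type.lower()]
    # Keep only the final path component so a crafted name cannot escape models/.
    safe_name = Path(filename).name
    if safe_name in ("", ".", ".."):
        raise ValueError(f"unusable filename: {filename!r}")
    return Path(models_root) / subfolder / safe_name


if __name__ == "__main__":
    print(destination("/models", "LoRA", "style.safetensors"))
```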

Device-aware entry point: open http://<host>:8189/start and it detects your device - phones are sent to the mobile UI (ComfyUIMini), desktops land on the Model Manager's Gallery. Append ?force=mobile or ?force=desktop to override. Bookmark /start as your single SparkyUI link.
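
The routing decision behind /start can be sketched as below. This is an assumption about the logic, not the Model Manager's implementation: the User-Agent pattern is a common heuristic, and pick_target is a hypothetical name. Only the ?force=mobile and ?force=desktop overrides come from the description above.

```python
import re
from typing import Optional

# Heuristic: substrings commonly present in phone and tablet User-Agent strings.
MOBILE_HINT = re.compile(r"Mobi|Android|iPhone|iPad", re.IGNORECASE)


def pick_target(user_agent: str, force: Optional[str] = None) -> str:
    """Return "mobile" or "desktop"; a valid force value overrides detection."""
    if force in ("mobile", "desktop"):
        return force
    return "mobile" if MOBILE_HINT.search(user_agent or "") else "desktop"


if __name__ == "__main__":
    print(pick_target("Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X)"))
    print(pick_target("Mozilla/5.0 (X11; Linux x86_64)"))
```

An unrecognized force value falls through to detection rather than raising, which keeps a mistyped bookmark usable.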

ComfyUIMini also gets a "Manage Photos" link in its sidebar that jumps to this Gallery, so you can delete generated photos from the mobile UI too (its built-in gallery is view-only).

Build only the Model Manager (if the rest is already built):

docker compose build model-manager
docker compose up -d model-manager

SageAttention Notes

SageAttention PR #297 added sm_121 support, but it was merged and then reverted due to stability issues. Our approach:

  • Build SageAttention from main branch with TORCH_CUDA_ARCH_LIST="12.1"
  • Disable torch.compile via TORCHDYNAMO_DISABLE=1 so Triton is never asked to generate kernels (Triton doesn't support sm_121a)
  • This gives working SageAttention without the unstable PR #297 changes

For full Triton support (more complex), see HurbaLurba's DGX-SPARK-COMFYUI-DOCKER which builds custom Triton from source.

Future

When these land, SparkyUI can be simplified:

  • PyTorch native sm_121 support → remove explicit TORCH_CUDA_ARCH_LIST
  • Triton sm_121 support → remove TORCHDYNAMO_DISABLE
  • SageAttention prebuilt ARM64 wheels → remove source build

Credits

License

MIT