Evan Carmen 7e4d22e41c feat: Grace-Blackwell unified memory optimization for ComfyUI

- Add model_management.py patch: detects GB10 unified memory (VRAM ≈ RAM > 0.95)
- Set HIGH_VRAM mode: no pointless CPU offloading (same physical memory pool)
- Increase maximum_vram_for_weights from 88% to 95% (8.4GB headroom on 128GB)
- Skip torch.cuda.empty_cache() on unified memory (avoids page faults)
- Return GPU for text_encoder/vae/intermediate offload devices on unified memory
- MPS excluded from unified detection (has its own SHARED state)
- Remove PYTORCH_NO_CUDA_MEMORY_CACHING env var (patch handles caching properly)
- Mount patched file as read-only volume override in docker-compose.yml
- DeepSeek review: safe and correct for DGX Spark target

Co-authored-by: DeepSeek (code review)

2026-05-20 16:01:51 -05:00

Name	Last commit	Date
comfyuimini	Add ComfyUIMini mobile-friendly UI integration	2026-01-03 23:45:13 -06:00
custom_nodes	Initial commit: SparkyUI - ComfyUI for DGX Spark (Blackwell GB10)	2026-01-03 20:28:30 -06:00
input	Initial commit: SparkyUI - ComfyUI for DGX Spark (Blackwell GB10)	2026-01-03 20:28:30 -06:00
output	Initial commit: SparkyUI - ComfyUI for DGX Spark (Blackwell GB10)	2026-01-03 20:28:30 -06:00
patches	feat: Grace-Blackwell unified memory optimization for ComfyUI	2026-05-20 16:01:51 -05:00
wheels	Initial commit: SparkyUI - ComfyUI for DGX Spark (Blackwell GB10)	2026-01-03 20:28:30 -06:00
workflows	Initial commit: SparkyUI - ComfyUI for DGX Spark (Blackwell GB10)	2026-01-03 20:28:30 -06:00
.dockerignore	Initial commit: SparkyUI - ComfyUI for DGX Spark (Blackwell GB10)	2026-01-03 20:28:30 -06:00
.env.example	Add ComfyUIMini mobile-friendly UI integration	2026-01-03 23:45:13 -06:00
.gitignore	Initial commit: SparkyUI - ComfyUI for DGX Spark (Blackwell GB10)	2026-01-03 20:28:30 -06:00
docker-compose.yml	feat: Grace-Blackwell unified memory optimization for ComfyUI	2026-05-20 16:01:51 -05:00
Dockerfile	Initial commit: SparkyUI - ComfyUI for DGX Spark (Blackwell GB10)	2026-01-03 20:28:30 -06:00
entrypoint.sh	Initial commit: SparkyUI - ComfyUI for DGX Spark (Blackwell GB10)	2026-01-03 20:28:30 -06:00
README.md	Add ComfyUIMini mobile-friendly UI integration	2026-01-03 23:45:13 -06:00
SPARKY_REVIEW.md	feat: Grace-Blackwell unified memory optimization for ComfyUI	2026-05-20 16:01:51 -05:00

README.md

SparkyUI

ComfyUI + SageAttention for NVIDIA DGX Spark (Blackwell GB10)

A Docker-based ComfyUI setup specifically engineered for the DGX Spark's unique ARM64 + Blackwell architecture.

Why This Exists

The NVIDIA DGX Spark uses the GB10 GPU (Blackwell architecture) with compute capability 12.1 (sm_121). Older CUDA toolkits cannot compile for it:

CUDA Version	Max Compute Capability	Can compile for GB10?
CUDA 12.8	sm_120	No
CUDA 13.0+	sm_121	Yes

Standard ComfyUI containers and PyTorch wheels don't support sm_121. SparkyUI solves this by:

Using CUDA 13.0.2 base image (supports sm_121)
Installing PyTorch cu130 ARM64 wheels
Compiling SageAttention with TORCH_CUDA_ARCH_LIST="12.1"
Disabling Triton/torch.compile (Triton doesn't support sm_121 yet)
Optimizing for the Grace-Blackwell unified memory architecture
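
The compatibility rule in the table above can be written as a small check. This is a minimal sketch: the function names and the lookup table are illustrative, and the table only encodes the two CUDA releases listed in this README.

```python
# Highest compute capability each CUDA release can target, copied from the
# table in this README. Only these two rows are known here.
MAX_CAPABILITY = {
    (12, 8): (12, 0),  # CUDA 12.8 -> sm_120
    (13, 0): (12, 1),  # CUDA 13.0 -> sm_121
}


def sm_name(capability):
    """Format a (major, minor) capability as an sm_XY architecture name."""
    major, minor = capability
    return f"sm_{major}{minor}"


def arch_list(capability):
    """Format a capability the way TORCH_CUDA_ARCH_LIST expects it."""
    major, minor = capability
    return f"{major}.{minor}"


def can_compile_for(cuda_version, capability):
    """Return True if the given CUDA release can target the capability."""
    limit = MAX_CAPABILITY.get(cuda_version)
    if limit is None:
        raise KeyError(f"no data for CUDA {cuda_version}")
    # Tuples compare element by element, so (12, 1) <= (12, 0) is False.
    return capability <= limit


gb10 = (12, 1)
print(sm_name(gb10), arch_list(gb10))   # sm_121 12.1
print(can_compile_for((12, 8), gb10))   # False
print(can_compile_for((13, 0), gb10))   # True
```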

What's Included

ComfyUI (latest master branch)
ComfyUI-Manager - auto-installed on first run for easy custom node management
ComfyUIMini - mobile-friendly web UI for phones/tablets (separate container)
SageAttention - compiled natively for sm_121 (Blackwell tensor cores)
PyTorch 2.9.1+cu130 - ARM64 wheels with CUDA 13.0 support

Unified Memory Architecture

The DGX Spark's Grace-Blackwell architecture uses unified memory - a coherent memory fabric shared between CPU and GPU. This is fundamentally different from discrete GPUs and requires different optimization strategies.

Key insight: Don't fight the fabric. Forcing everything GPU-side (--gpu-only, --cache-none) actually hurts performance.
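
One way to detect this layout at runtime is to compare the memory total the GPU reports with system RAM: on a unified pool the two are nearly equal. The sketch below is a hypothetical illustration, not the project's actual model_management.py patch. The function name and the byte counts in the usage lines are assumptions; the 0.95 default mirrors the "VRAM ≈ RAM > 0.95" rule in the commit description.

```python
def looks_unified(total_vram_bytes, total_ram_bytes, threshold=0.95):
    """Guess whether GPU and CPU share one physical memory pool.

    Returns True when the smaller total is more than `threshold` of the
    larger one. Non-positive inputs are treated as "unknown" and yield False.
    """
    if total_vram_bytes <= 0 or total_ram_bytes <= 0:
        return False
    smaller = min(total_vram_bytes, total_ram_bytes)
    larger = max(total_vram_bytes, total_ram_bytes)
    return smaller / larger > threshold


GIB = 1024 ** 3
# Made-up numbers for a unified system: both totals describe the same pool.
print(looks_unified(120 * GIB, 122 * GIB))  # True
# Made-up numbers for a discrete GPU: 24 GiB card in a 64 GiB host.
print(looks_unified(24 * GIB, 64 * GIB))    # False
```

On a real system the two totals could come from torch.cuda.mem_get_info() and psutil.virtual_memory().total, assuming both packages are installed.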

Optimized flags (default in SparkyUI):

--disable-pinned-memory   # Reduces overhead on unified fabric
--force-fp16              # Enables SageAttention optimization
--fp16-unet --fp16-vae --fp16-text-enc  # FP16 precision throughout
--dont-upcast-attention   # Keeps attention in FP16 for speed

What NOT to use:

--gpu-only - fights the unified memory fabric, hurts performance
--cache-none - disables natural caching, slows model loading
--disable-mmap - prevents memory-mapped model loading
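
The flag guidance above can be enforced with a small launcher helper. This is a sketch only: build_args and the two flag collections are hypothetical names, the flags are the ones listed in this section, and main.py is assumed to be ComfyUI's entry script.

```python
# Flags this README recommends on unified memory.
RECOMMENDED_FLAGS = [
    "--disable-pinned-memory",
    "--force-fp16",
    "--fp16-unet",
    "--fp16-vae",
    "--fp16-text-enc",
    "--dont-upcast-attention",
]

# Flags this README says to avoid on unified memory.
DISCOURAGED_FLAGS = {"--gpu-only", "--cache-none", "--disable-mmap"}


def build_args(extra=()):
    """Return the ComfyUI command line, rejecting discouraged flags."""
    extra = list(extra)
    bad = sorted(set(extra) & DISCOURAGED_FLAGS)
    if bad:
        raise ValueError(f"discouraged on unified memory: {', '.join(bad)}")
    return ["python", "main.py", *RECOMMENDED_FLAGS, *extra]


print(build_args(["--listen", "0.0.0.0"]))
```

Raising instead of silently dropping a discouraged flag keeps a misconfiguration visible in the container logs.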

CUDA environment variables are also tuned for unified memory:

CUDA_MANAGED_FORCE_DEVICE_ALLOC=1 - prefer GPU allocation
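PYTORCH_NO_CUDA_MEMORY_CACHING=1 - let fabric manage memory (removed by the Grace-Blackwell unified memory commit; the model_management.py patch handles caching instead)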
OMP_NUM_THREADS=20 - utilize all 20 ARM cores
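
As an illustration, the variables above could be assembled in Python before launching ComfyUI. This is a sketch under assumptions: build_env and its parameters are hypothetical, and PYTORCH_NO_CUDA_MEMORY_CACHING is left opt-in because the latest commit description says that variable was removed in favor of the patch.

```python
import os


def build_env(base=None, cores=None, no_caching=False):
    """Return a copy of the environment with the unified-memory settings.

    base: mapping to start from (defaults to the current environment).
    cores: thread count for OpenMP (defaults to the detected CPU count).
    no_caching: also set PYTORCH_NO_CUDA_MEMORY_CACHING=1 when True.
    """
    env = dict(os.environ if base is None else base)
    env["CUDA_MANAGED_FORCE_DEVICE_ALLOC"] = "1"
    env["OMP_NUM_THREADS"] = str(cores or os.cpu_count() or 1)
    if no_caching:
        env["PYTORCH_NO_CUDA_MEMORY_CACHING"] = "1"
    return env


env = build_env(base={}, cores=20)
print(env["OMP_NUM_THREADS"])  # 20
```

The returned mapping can be passed as the env argument of subprocess.run.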

Quick Start

# Clone
git clone https://github.com/ecarmen16/SparkyUI.git
cd SparkyUI

# Configure paths
cp .env.example .env
# Edit .env with your paths

# Build (compiles SageAttention for sm_121 - takes ~10 min)
docker compose build

# Start
docker compose up -d

# View logs
docker compose logs -f

Access:

ComfyUI (Desktop): http://localhost:8188
ComfyUIMini (Mobile): http://localhost:3000
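
To confirm from a script that ComfyUI has finished starting, a small probe like the following may help. It is a sketch, not part of SparkyUI: comfyui_is_up is a hypothetical helper, and it assumes the ComfyUI server answers GET /system_stats with JSON.

```python
import http.client
import json
import urllib.error
import urllib.request


def comfyui_is_up(base_url="http://localhost:8188", timeout=2.0):
    """Return True if the server answers /system_stats with valid JSON."""
    url = base_url.rstrip("/") + "/system_stats"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            json.load(resp)  # raises ValueError on a non-JSON body
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError,
            http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or malformed response.
        return False


if __name__ == "__main__":
    print("ComfyUI reachable:", comfyui_is_up())
```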

Requirements

NVIDIA DGX Spark (or other GB10-based system)
Docker with NVIDIA Container Toolkit
NVIDIA Driver 560+ (tested with 580.95)
~15GB disk for Docker image
Models from existing ComfyUI install (mounted read-only)

Configuration

Copy .env.example to .env and edit:

# Path to your existing ComfyUI models (mounted read-only)
COMFYUI_HOST_PATH=/path/to/your/ComfyUI

# Path for SparkyUI data (custom_nodes, outputs, inputs)
SPARKYUI_DATA_PATH=/path/to/SparkyUI

# Optional: pin to specific versions
COMFYUI_REF=master
SAGEATTN_REF=main
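
A quick sanity check of the .env file before building can catch a missing path early. The parser below is a minimal sketch, not a full dotenv implementation: parse_env, missing_keys, and the sample paths are hypothetical, and treating the two path variables as required is an assumption based on the example above.

```python
# Keys assumed to be required, based on the example configuration.
REQUIRED_KEYS = ("COMFYUI_HOST_PATH", "SPARKYUI_DATA_PATH")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


sample = """
# Path to your existing ComfyUI models (mounted read-only)
COMFYUI_HOST_PATH=/srv/ComfyUI
COMFYUI_REF=master
"""
print(missing_keys(parse_env(sample)))  # ['SPARKYUI_DATA_PATH']
```

For a real file, the text could be read with pathlib.Path(".env").read_text().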

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                        DGX Spark Host                             │
│  Ubuntu 24.04 (DGX OS 7) / Driver 580.x                          │
│                                                                   │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                    Docker Network (sparky_net)              │  │
│  │                                                             │  │
│  │  ┌─────────────────────────┐  ┌──────────────────────────┐ │  │
│  │  │  comfyui (sparkyui:cu130)│  │  comfyuimini (node:20)   │ │  │
│  │  │                         │  │                          │ │  │
│  │  │  CUDA 13.0.2 + PyTorch  │◄─┤  Mobile-friendly UI      │ │  │
│  │  │  SageAttention (sm_121) │  │  REST + WebSocket proxy  │ │  │
│  │  │  ComfyUI + Manager      │  │                          │ │  │
│  │  │                         │  │  Shares /output volume   │ │  │
│  │  └───────────┬─────────────┘  └────────────┬─────────────┘ │  │
│  │              │                             │                │  │
│  └──────────────┼─────────────────────────────┼────────────────┘  │
│                 │                             │                    │
│          Port 8188 (Desktop)           Port 3000 (Mobile)         │
└──────────────────────────────────────────────────────────────────┘

Version Compatibility

Tested combinations:

Component	Version	Notes
CUDA Base	13.0.2	Required for sm_121
PyTorch	2.9.1+cu130	ARM64 wheel from PyTorch index
torchvision	0.24.1+cu130	ARM64 wheel
SageAttention	2.2.0	Compiled with sm_121
ComfyUI	0.7.0	master branch
Driver	580.95	DGX OS 7 default

Known Limitations

PyTorch Warning: You'll see a warning about compute capability 12.1 being "outside supported range (8.0-12.0)". This is harmless - PyTorch works, and SageAttention's custom kernels are compiled natively.
torch.compile Disabled: Triton doesn't support sm_121 yet. torch.compile() is disabled via environment variables. Some nodes may run slower than on supported architectures.
No GitHub Actions CI: GitHub's hosted runners can't build for ARM64 + sm_121, so the image must be built locally on a DGX Spark.

Troubleshooting

"no kernel image is available for execution on the device"

Your SageAttention build wasn't compiled for sm_121. Rebuild the image without the cache:

docker compose build --no-cache

PyTorch can't find CUDA

Ensure NVIDIA Container Toolkit is installed:

nvidia-ctk --version
docker run --rm --gpus all nvidia/cuda:13.0.2-base-ubuntu24.04 nvidia-smi

ComfyUI-Manager missing

The entrypoint auto-clones it. Check logs:

docker compose logs | grep -i manager

Host-Level GPU Optimizations (Optional)

For maximum performance, apply these optimizations on the host (not in Docker):

# Lock GPU clocks to maximum (3003 MHz) - prevents throttling
sudo nvidia-smi -lgc 3003,3003

# Enable core clock boost (GPU core > memory clock for compute)
sudo nvidia-smi boost-slider --vboost 1

# Enable persistence mode (reduces driver load latency)
sudo nvidia-smi -pm 1

# Verify settings
nvidia-smi --query-gpu=clocks.sm,clocks.max.sm,persistence_mode --format=csv

Note: GPU clock settings don't persist across reboots due to GB10 firmware behavior. Re-apply after each boot.
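
Because the settings must be re-applied after every boot, they could be wrapped in a small script and run from a boot-time hook. The following is a sketch: apply_host_tuning is a hypothetical helper, the commands are the ones listed above, and it defaults to a dry run so nothing is executed unless requested.

```python
import shlex
import subprocess

# Host commands copied from the section above.
HOST_COMMANDS = [
    "sudo nvidia-smi -lgc 3003,3003",
    "sudo nvidia-smi boost-slider --vboost 1",
    "sudo nvidia-smi -pm 1",
]


def apply_host_tuning(dry_run=True):
    """Return the planned argv lists; execute them when dry_run is False."""
    planned = [shlex.split(command) for command in HOST_COMMANDS]
    if not dry_run:
        for argv in planned:
            # check=True stops at the first command that fails.
            subprocess.run(argv, check=True)
    return planned


for argv in apply_host_tuning():
    print(" ".join(argv))
```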

ComfyUIMini (Mobile UI)

SparkyUI includes ComfyUIMini - a lightweight, mobile-friendly web UI that runs in a separate container.

Features:

Responsive design optimized for phones and tablets
Simplified workflow execution interface
Built-in image gallery (reads from shared output directory)
Import workflows from ComfyUI in "API Format"
Multiple themes (dark, light, aurora, nord, etc.)

How it works:

Runs as a Node.js Express server in its own container (~150MB)
Connects to ComfyUI via internal Docker network (http://comfyui:8188)
Proxies REST API calls and WebSocket connections
Shares the output directory for gallery viewing
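
For illustration, this is roughly what a REST call forwarded to ComfyUI looks like when a workflow exported in "API Format" is queued. It is a Python sketch rather than ComfyUIMini's actual Node.js code: build_prompt_request and the sample workflow are hypothetical, and it assumes ComfyUI's POST /prompt endpoint accepts a JSON body with prompt and client_id fields.

```python
import json
import urllib.request
import uuid


def build_prompt_request(workflow, base_url="http://comfyui:8188",
                         client_id=None):
    """Build (but do not send) a request that queues an API-format workflow."""
    payload = {
        "prompt": workflow,
        "client_id": client_id or uuid.uuid4().hex,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/prompt",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder workflow: a real one is the JSON exported from ComfyUI.
workflow = {"1": {"class_type": "KSampler", "inputs": {}}}
request = build_prompt_request(workflow, client_id="demo")
print(request.get_method(), request.full_url)  # POST http://comfyui:8188/prompt
```

Passing the request to urllib.request.urlopen would submit it; that only works from inside the Docker network, where the comfyui hostname resolves.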

Access: http://<your-dgx-ip>:3000

Build only ComfyUIMini (if ComfyUI already built):

docker compose build comfyuimini
docker compose up -d comfyuimini

SageAttention Notes

SageAttention PR #297 added sm_121 support, but it was merged and then reverted due to stability issues. Our approach:

Build SageAttention from main branch with TORCH_CUDA_ARCH_LIST="12.1"
Disable torch.compile, and with it Triton code generation, via TORCHDYNAMO_DISABLE=1 (Triton doesn't support sm_121a)
The result is a working SageAttention without the unstable PR #297 changes

For full Triton support (more complex), see HurbaLurba's DGX-SPARK-COMFYUI-DOCKER which builds custom Triton from source.

Future

When these land, SparkyUI can be simplified:

PyTorch native sm_121 support → remove explicit TORCH_CUDA_ARCH_LIST
Triton sm_121 support → remove TORCHDYNAMO_DISABLE
SageAttention prebuilt ARM64 wheels → remove source build

Credits

Unified memory architecture insights from HurbaLurba's DGX-SPARK-COMFYUI-DOCKER
ComfyUIMini by ImDarkTom
SageAttention by thu-ml
ComfyUI by comfyanonymous

License

MIT