Initial commit: SparkyUI - ComfyUI for DGX Spark (Blackwell GB10)

Docker-based ComfyUI setup for NVIDIA DGX Spark ARM64 + sm_121:
- CUDA 13.0.2 base (required for compute_121 support)
- PyTorch 2.9.1+cu130 ARM64 wheels
- SageAttention compiled with TORCH_CUDA_ARCH_LIST="12.1"
- Triton/torch.compile disabled (no sm_121 support yet)
- ComfyUI-Manager auto-installed at runtime
- Configurable model/data paths via .env

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 20:13:46 -06:00
commit 1f5aeb5248
12 changed files with 391 additions and 0 deletions
@@ -0,0 +1,30 @@
+# Runtime data (mounted as volumes)
+custom_nodes/
+output/
+input/
+workflows/
+
+# Git
+.git/
+.gitignore
+
+# Environment and secrets
+.env
+*.env.local
+
+# Documentation (not needed in image)
+*.md
+CLAUDE.md
+README.md
+LICENSE
+
+# IDE
+.vscode/
+.idea/
+
+# Python cache
+__pycache__/
+*.pyc
+
+# Prebuilt wheels (built separately)
+wheels/
@@ -0,0 +1,16 @@
+# SparkyUI - ComfyUI for DGX Spark (Blackwell GB10)
+# Copy this to .env and customize paths as needed
+
+# Base path where your existing ComfyUI installation lives (for models)
+COMFYUI_HOST_PATH=/path/to/your/ComfyUI
+
+# Base path for SparkyUI data (custom_nodes, outputs, inputs, etc.)
+SPARKYUI_DATA_PATH=/path/to/SparkyUI
+
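+# ComfyUI settings (COMFYUI_PORT is the host-side port; keep --port 8188 in COMFYUI_FLAGS, since that is the port inside the container)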
+COMFYUI_PORT=8188
+COMFYUI_FLAGS=--listen 0.0.0.0 --port 8188 --gpu-only
+
+# Build refs (pin to specific commits/tags for reproducibility)
+COMFYUI_REF=master
+SAGEATTN_REF=main
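
docker-compose.yml (later in this commit) bind-mounts `${COMFYUI_HOST_PATH}/models` and five directories under `SPARKYUI_DATA_PATH`: custom_nodes, output, input, workflows and wheels. If a bind-mount source is missing, Docker creates it, and on a standard rootful install that directory is owned by root. The sketch below is a minimal pre-start helper that assumes bash and GNU coreutils on the host. `env_get`, `init_data_dirs` and `check_models_dir` are hypothetical names and are not part of this commit. The helper reads values with grep rather than `source`, because the unquoted `COMFYUI_FLAGS` line is not a valid shell assignment.

```bash
#!/usr/bin/env bash
# Hypothetical pre-start helper (not part of this commit).
set -euo pipefail

# env_get KEY FILE: print the last value assigned to KEY in a .env-style file.
# Uses grep/cut instead of `source`, since COMFYUI_FLAGS has unquoted spaces.
env_get() {
    grep -E "^$1=" "$2" | tail -n 1 | cut -d= -f2-
}

# init_data_dirs BASE: create the directories docker-compose.yml bind-mounts
# from SPARKYUI_DATA_PATH, so Docker does not create them as root.
init_data_dirs() {
    local base="$1" d
    for d in custom_nodes output input workflows wheels; do
        mkdir -p "$base/$d"
    done
}

# check_models_dir COMFY_ROOT: fail if COMFY_ROOT/models does not exist.
check_models_dir() {
    [[ -d "$1/models" ]] || { echo "missing: $1/models" >&2; return 1; }
}
```

From the repo root, a typical call is `check_models_dir "$(env_get COMFYUI_HOST_PATH .env)" && init_data_dirs "$(env_get SPARKYUI_DATA_PATH .env)"`.
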
@@ -0,0 +1,60 @@
+# Project-specific internal docs
+CLAUDE.md
+
+# Environment (contains local paths)
+.env
+*.env.local
+
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+*.egg-info/
+.installed.cfg
+*.egg
+.venv/
+venv/
+
+# Docker
+.docker/
+
+# OS
+.DS_Store
+Thumbs.db
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+
+# Runtime directories - ignore contents but keep .gitkeep
+custom_nodes/*
+!custom_nodes/.gitkeep
+
+output/*
+!output/.gitkeep
+
+input/*
+!input/.gitkeep
+
+workflows/*
+!workflows/.gitkeep
+
+# Wheels directory - for prebuilt ARM64/sm_121 binaries
+# Ignore contents except .gitkeep (add wheels explicitly if needed)
+wheels/*
+!wheels/.gitkeep
@@ -0,0 +1,58 @@
+# CUDA 13.0 for Blackwell GB10 (sm_121 / compute_121)
+# CUDA 12.8 only supports up to sm_120, but GB10 is sm_121.
+# "devel" includes nvcc so we can compile CUDA extensions like SageAttention.
+FROM nvidia/cuda:13.0.2-devel-ubuntu24.04
+
+ARG DEBIAN_FRONTEND=noninteractive
+ARG COMFYUI_REF=master
+ARG SAGEATTN_REF=main
+
+# Base system deps
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    git curl ca-certificates \
+    python3 python3-pip python3-venv python3-dev \
+    build-essential ninja-build cmake pkg-config \
+    && rm -rf /var/lib/apt/lists/*
+
+# Create venv (keeps python deps isolated inside container)
+ENV VENV=/opt/venv
+RUN python3 -m venv $VENV
+ENV PATH="$VENV/bin:$PATH"
+
+# Upgrade packaging tools
+RUN pip install -U pip setuptools wheel
+
+# ---- PyTorch (ARM64 + CUDA 13.0) ----
+# PyTorch cu130 wheels work with CUDA 13.0.x runtime.
+# ARM64 wheels available: torch-2.9.1+cu130, torchvision-0.24.1
+RUN pip install --index-url https://download.pytorch.org/whl/cu130 \
+    torch torchvision
+
+# ---- ComfyUI ----
+RUN git clone https://github.com/comfyanonymous/ComfyUI.git /opt/ComfyUI && \
+    cd /opt/ComfyUI && \
+    git checkout "${COMFYUI_REF}"
+
+RUN pip install -r /opt/ComfyUI/requirements.txt
+
+# ---- ComfyUI-Manager ----
+# Handled at runtime by entrypoint.sh (clones if missing in mounted volume)
+# This ensures latest version on each container start
+
+# ---- SageAttention ----
+# GB10 is compute capability 12.1 (sm_121).
+# CUDA 13.0 NVCC supports sm_121, so we compile directly for it.
+ENV TORCH_CUDA_ARCH_LIST="12.1"
+ENV CUDA_HOME=/usr/local/cuda
+
+# Build/install SageAttention from source for sm_121 ("|| true" lets the build continue if this step fails; check the build log)
+RUN pip install --no-build-isolation "git+https://github.com/thu-ml/SageAttention@${SAGEATTN_REF}" || true
+
+# Expose ComfyUI
+EXPOSE 8188
+
+# Entry script handles runtime updates / flags
+COPY entrypoint.sh /entrypoint.sh
+RUN chmod +x /entrypoint.sh
+
+ENTRYPOINT ["/entrypoint.sh"]
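
The Dockerfile above relies on the CUDA 13.0 `nvcc` being able to target compute capability 12.1. The sketch below checks a toolkit for that before you build. It assumes `nvcc --list-gpu-arch` prints one virtual architecture per line (for example `compute_120`). `has_gpu_arch` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
set -euo pipefail

# has_gpu_arch CC: read an `nvcc --list-gpu-arch` listing on stdin and
# succeed if the virtual architecture compute_<CC> appears as a full line.
has_gpu_arch() {
    grep -qx "compute_$1"
}

# Only query a toolkit when nvcc is on PATH (e.g. inside the devel image).
if command -v nvcc >/dev/null 2>&1; then
    archs="$(nvcc --list-gpu-arch)"   # capture first to avoid SIGPIPE with pipefail
    if has_gpu_arch 121 <<<"$archs"; then
        echo "nvcc can target sm_121 (GB10)"
    else
        echo "nvcc cannot target sm_121 (GB10); use a newer CUDA toolkit"
    fi
fi
```

You can run the same check against the base image without building anything: `docker run --rm nvidia/cuda:13.0.2-devel-ubuntu24.04 nvcc --list-gpu-arch | grep -x compute_121`.
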
@@ -0,0 +1,144 @@
+# SparkyUI
+
+**ComfyUI + SageAttention for NVIDIA DGX Spark (Blackwell GB10)**
+
+A Docker-based ComfyUI setup specifically engineered for the DGX Spark's unique ARM64 + Blackwell architecture.
+
+## Why This Exists
+
+The NVIDIA DGX Spark uses the **GB10 GPU** (Blackwell architecture) with compute capability **12.1 (sm_121)**. Not every CUDA toolkit can compile for it:
+
+| CUDA Version | Max Compute Capability | Can compile for GB10? |
+|--------------|------------------------|----------------------|
+| CUDA 12.8 | sm_120 | **No** |
+| CUDA 13.0+ | sm_121 | **Yes** |
+
+Standard ComfyUI containers and PyTorch wheels are not built for sm_121. SparkyUI works around this by:
+
+1. Using **CUDA 13.0.2** base image (supports sm_121)
+2. Installing **PyTorch cu130** ARM64 wheels
+3. Compiling **SageAttention** with `TORCH_CUDA_ARCH_LIST="12.1"`
+4. Disabling **Triton/torch.compile** (Triton doesn't support sm_121 yet)
+
+## Quick Start
+
+```bash
+# Clone
+git clone https://github.com/YOUR_USERNAME/SparkyUI.git
+cd SparkyUI
+
+# Configure paths
+cp .env.example .env
+# Edit .env with your paths
+
+# Build (compiles SageAttention for sm_121 - takes ~10 min)
+docker compose build
+
+# Start
+docker compose up -d
+
+# View logs
+docker compose logs -f
+```
+
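+**Access:** http://localhost:8188 on the Spark itself, or `http://<spark-ip>:8188` from another machine on your LAN (use your `COMFYUI_PORT` value if you changed it)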
+
+## Requirements
+
+- **NVIDIA DGX Spark** (or other GB10-based system)
+- **Docker** with NVIDIA Container Toolkit
+- **NVIDIA Driver** 580+ (CUDA 13.0 requires an R580-series driver; tested with 580.95)
+- **~15GB** of disk for the Docker image
+- **Models** from an existing ComfyUI install (mounted read-only)
+
+## Configuration
+
+Copy `.env.example` to `.env` and edit:
+
+```bash
+# Path to your existing ComfyUI install (its models/ folder is mounted read-only)
+COMFYUI_HOST_PATH=/path/to/your/ComfyUI
+
+# Path for SparkyUI data (custom_nodes, output, input, workflows, wheels)
+SPARKYUI_DATA_PATH=/path/to/SparkyUI
+
+# Optional: pin to specific versions
+COMFYUI_REF=master
+SAGEATTN_REF=main
+```
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│                     DGX Spark Host                          │
+│  Ubuntu 24.04 (DGX OS 7) / Driver 580.x                     │
+│                                                             │
+│  ┌─────────────────────────────────────────────────────┐    │
+│  │          Docker Container (sparkyui:cu130)          │    │
+│  │                                                     │    │
+│  │  CUDA 13.0.2 + PyTorch 2.9.1+cu130                  │    │
+│  │  SageAttention 2.2.0 (compiled for sm_121)          │    │
+│  │  ComfyUI 0.7.x + ComfyUI-Manager                    │    │
+│  │                                                     │    │
+│  │  Key env vars:                                      │    │
+│  │    TORCH_CUDA_ARCH_LIST="12.1"                      │    │
+│  │    TORCHDYNAMO_DISABLE="1"                          │    │
+│  └─────────────────────────────────────────────────────┘    │
+│                           │                                 │
+│                    Port 8188 (LAN)                          │
+└─────────────────────────────────────────────────────────────┘
+```
+
+## Version Compatibility
+
+Tested combinations:
+
+| Component | Version | Notes |
+|-----------|---------|-------|
+| CUDA Base | 13.0.2 | Required for sm_121 |
+| PyTorch | 2.9.1+cu130 | ARM64 wheel from PyTorch index |
+| torchvision | 0.24.1+cu130 | ARM64 wheel |
+| SageAttention | 2.2.0 | Compiled for sm_121 |
+| ComfyUI | 0.7.0 | master branch |
+| Driver | 580.95 | DGX OS 7 default |
+
+## Known Limitations
+
+1. **PyTorch Warning**: PyTorch warns that compute capability 12.1 is "outside supported range (8.0-12.0)". This is harmless: CUDA binaries built for sm_120 run on sm_121 devices (same major version, equal or higher minor), so PyTorch works, and SageAttention's custom kernels are compiled natively for sm_121.
+
+2. **torch.compile Disabled**: Triton doesn't support sm_121 yet, so `docker-compose.yml` sets `TORCH_COMPILE_DISABLE=1` and `TORCHDYNAMO_DISABLE=1` to disable `torch.compile()`. Nodes that use it may run slower than on architectures Triton supports.
+
+3. **No GitHub Actions CI**: GitHub's hosted runners can't build for ARM64 + sm_121, so the image must be built locally on the DGX Spark.
+
+## Troubleshooting
+
+### "no kernel image is available for execution on the device"
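+SageAttention wasn't compiled for sm_121. The Dockerfile's `|| true` lets the image build even when the SageAttention step fails, so rebuild without the cache and check that step's output: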
+```bash
+docker compose build --no-cache
+```
+
+### PyTorch can't find CUDA
+Ensure NVIDIA Container Toolkit is installed:
+```bash
+nvidia-ctk --version
+docker run --rm --gpus all nvidia/cuda:13.0.2-base-ubuntu24.04 nvidia-smi
+```
+
+### ComfyUI-Manager missing
+The entrypoint auto-clones it. Check logs:
+```bash
+docker compose logs | grep -i manager
+```
+
+## Future
+
+When these land, SparkyUI can be simplified:
+- [ ] PyTorch native sm_121 support → remove explicit `TORCH_CUDA_ARCH_LIST`
+- [ ] Triton sm_121 support → remove `TORCHDYNAMO_DISABLE`
+- [ ] SageAttention prebuilt ARM64 wheels → remove source build
+
+## License
+
+MIT
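
You can check the README's Requirements list on the host before the first build. The sketch below assumes bash, GNU `sort -V` and `nvidia-smi` on PATH. `version_ge` and `preflight` are hypothetical helper names. The 580 threshold reflects that CUDA 13.0 needs an R580-series driver.

```bash
#!/usr/bin/env bash
# Hypothetical host preflight for the README's Requirements list.

MIN_DRIVER="580"  # CUDA 13.0 needs an R580-series (or newer) driver

# version_ge A B: succeed if dotted version A >= B (GNU sort -V ordering).
version_ge() {
    [[ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" == "$2" ]]
}

# preflight: check Docker, the NVIDIA Container Toolkit CLI, and the driver.
preflight() {
    local tool driver
    for tool in docker nvidia-ctk nvidia-smi; do
        command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool" >&2; return 1; }
    done
    driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    if version_ge "$driver" "$MIN_DRIVER"; then
        echo "driver $driver OK (>= $MIN_DRIVER)"
    else
        echo "driver $driver is older than $MIN_DRIVER" >&2
        return 1
    fi
}
```

If you save it as `preflight.sh` (a hypothetical name), run it with `bash -c '. ./preflight.sh && preflight'`.
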
@@ -0,0 +1,53 @@
+services:
+  comfyui:
+    build:
+      context: .
+      dockerfile: Dockerfile
+      args:
+        # Pin ComfyUI to a known-good commit/tag if desired
+        COMFYUI_REF: "${COMFYUI_REF:-master}"
+        # SageAttention ref (e.g., "main", "v2.2.0", or specific commit)
+        SAGEATTN_REF: "${SAGEATTN_REF:-main}"
+
+    image: sparkyui:cu130
+    container_name: comfyui
+
+    # GPU enablement
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [gpu]
+
+    # LAN exposure
+    ports:
+      - "${COMFYUI_PORT:-8188}:8188"
+
+    environment:
+      COMFYUI_PORT: "${COMFYUI_PORT:-8188}"
+      COMFYUI_FLAGS: "${COMFYUI_FLAGS:---listen 0.0.0.0 --port 8188 --gpu-only}"
+      NVIDIA_VISIBLE_DEVICES: "all"
+      NVIDIA_DRIVER_CAPABILITIES: "compute,utility"
+      # Disable torch.compile/inductor - Triton doesn't support Blackwell sm_121a yet
+      TORCH_COMPILE_DISABLE: "1"
+      TORCHDYNAMO_DISABLE: "1"
+
+    volumes:
+      # Models from existing ComfyUI install (read-only)
+      - ${COMFYUI_HOST_PATH}/models:/opt/ComfyUI/models:ro
+
+      # Custom nodes - comment out to use container-only (fresh) custom_nodes
+      # If mounted, ComfyUI-Manager installs persist across container restarts
+      - ${SPARKYUI_DATA_PATH}/custom_nodes:/opt/ComfyUI/custom_nodes
+
+      # Outputs/inputs/workflows - persistent across restarts
+      - ${SPARKYUI_DATA_PATH}/output:/opt/ComfyUI/output
+      - ${SPARKYUI_DATA_PATH}/input:/opt/ComfyUI/input
+      - ${SPARKYUI_DATA_PATH}/workflows:/opt/ComfyUI/workflows
+
+      # Wheel cache (optional - for prebuilt wheels)
+      - ${SPARKYUI_DATA_PATH}/wheels:/opt/wheels
+
+    restart: unless-stopped
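
The `environment:` block above, together with the Dockerfile's `ENV TORCH_CUDA_ARCH_LIST="12.1"`, is what keeps PyTorch off the Triton path. The sketch below confirms those values in the running container. It assumes the `comfyui` service name from this file. `check_key_env` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
set -euo pipefail

# check_key_env: read `env` output (NAME=value lines) on stdin and report any
# of the expected settings that are missing or have a different value.
check_key_env() {
    local listing kv status=0
    listing="$(cat)"
    for kv in TORCH_CUDA_ARCH_LIST=12.1 TORCHDYNAMO_DISABLE=1 TORCH_COMPILE_DISABLE=1; do
        if ! grep -qxF "$kv" <<<"$listing"; then
            echo "missing or different: $kv" >&2
            status=1
        fi
    done
    return "$status"
}
```

Pipe the container's environment into it with `docker compose exec -T comfyui env | check_key_env`. The `-T` flag disables TTY allocation, so the lines end in plain `\n` and match exactly.
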
@@ -0,0 +1,30 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+COMFY_DIR="/opt/ComfyUI"
+PORT="${COMFYUI_PORT:-8188}"
+FLAGS="${COMFYUI_FLAGS:---listen 0.0.0.0 --port ${PORT}}"
+
+echo "[entrypoint] Python: $(python --version)"
+echo "[entrypoint] Torch:  $(python -c 'import torch; print(torch.__version__)')"
+echo "[entrypoint] CUDA:   $(python -c 'import torch; print(torch.version.cuda)')"
+echo "[entrypoint] Flags:  ${FLAGS}"
+
+# Ensure ComfyUI-Manager exists in mounted custom_nodes
+# Check for __init__.py to detect corrupted/partial installs
+if [[ ! -f "${COMFY_DIR}/custom_nodes/ComfyUI-Manager/__init__.py" ]]; then
+    echo "[entrypoint] ComfyUI-Manager missing or corrupted, cloning latest..."
+    rm -rf "${COMFY_DIR}/custom_nodes/ComfyUI-Manager" 2>/dev/null || true
+    git clone https://github.com/ltdrdata/ComfyUI-Manager.git \
+        "${COMFY_DIR}/custom_nodes/ComfyUI-Manager" || true
+fi
+
+# Install any requirements from custom nodes
+for req in "${COMFY_DIR}"/custom_nodes/*/requirements.txt; do
+    if [[ -f "$req" ]]; then
+        echo "[entrypoint] Installing deps from: $req"
+        pip install -q -r "$req" || true
+    fi
+done
+
+exec python "${COMFY_DIR}/main.py" ${FLAGS}