7 Commits

Author SHA1 Message Date
Evan Carmen c803ea6146 fix: intermediate_device() returns cuda on unified memory
On Grace-Blackwell (GB10), CPU and GPU share the same physical RAM.
intermediate_device() was returning 'cpu', which means ComfyUI allocates
output buffers (like VAE decode) through the CPU allocator on the same
physical memory pool it thinks is free VRAM. This causes:

1. Memory accounting mismatch: ComfyUI treats intermediates as living in
   separate CPU memory and so overestimates available VRAM
2. Unnecessary .to(device) copies through separate allocator heaps
3. Heap fragmentation across the unified memory pool

Now matches text_encoder_offload_device() and vae_offload_device(), which
already return get_torch_device() on UNIFIED_MEMORY.
2026-05-21 11:02:06 -05:00
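
The device selection described in c803ea6146 can be sketched roughly as follows. This is a simplified illustration, not the patched model_management.py: the `VRAMState` enum layout, the string device names, and the non-unified `"cpu"` fallback are assumptions made so the example runs without PyTorch. Only the function names and the UNIFIED_MEMORY state come from the commit message.

```python
from enum import Enum


class VRAMState(Enum):
    # Hypothetical enum layout; only UNIFIED_MEMORY is named in the commit.
    NORMAL_VRAM = 0
    HIGH_VRAM = 1
    UNIFIED_MEMORY = 2


def get_torch_device() -> str:
    # Stand-in: the real function returns a torch.device, not a string.
    return "cuda:0"


def vae_offload_device(state: VRAMState) -> str:
    # Per the commit message, this already returned the GPU on unified memory.
    if state is VRAMState.UNIFIED_MEMORY:
        return get_torch_device()
    return "cpu"


def intermediate_device(state: VRAMState) -> str:
    # After the fix: same rule as the offload devices, so output buffers
    # are allocated through the same allocator that VRAM accounting tracks.
    if state is VRAMState.UNIFIED_MEMORY:
        return get_torch_device()
    return "cpu"
```

With this shape, intermediates and offloaded models resolve to the same device on unified memory, which is the consistency the commit message is after.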
Evan Carmen 31939a9710 fix: revert intermediate_device to cpu for unified memory
intermediate_device() controls where large output tensors (decoded video
frames) are accumulated. On unified memory, cpu and cuda:0 share the same
physical RAM, but the CUDA allocator has different fragmentation behavior.

With intermediate_device=cuda:0, LTX video VAE decode hung because
tiled_scale_multidim allocates the full output tensor on cuda:0 upfront,
and the CUDA allocator can't efficiently reclaim space during tiled
decode. Reverting to cpu fixes the hang.

vae_offload_device() and text_encoder_offload_device() remain cuda:0
since those model-loading paths benefit from GPU allocation.
2026-05-20 19:30:53 -05:00
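
The allocation pattern that 31939a9710 refers to (full output buffer allocated up front, then filled tile by tile) can be illustrated with a one-dimensional toy. This is not the real tiled_scale_multidim, which works on tensors and blends overlapping tiles; the function `tiled_decode` and its parameters are hypothetical, and plain Python lists stand in for tensors on the intermediate device.

```python
from typing import Callable, List


def tiled_decode(
    samples: List[float],
    tile: int,
    scale: int,
    decode_tile: Callable[[List[float]], List[float]],
) -> List[float]:
    # The whole output is allocated once, before any tile is decoded.
    # In the real code this buffer lives on intermediate_device().
    out = [0.0] * (len(samples) * scale)
    for start in range(0, len(samples), tile):
        chunk = samples[start:start + tile]
        decoded = decode_tile(chunk)  # must have len(chunk) * scale items
        out[start * scale:(start + len(chunk)) * scale] = decoded
    return out


def upsample2(chunk: List[float]) -> List[float]:
    # Toy "decoder": repeat every value twice.
    return [v for v in chunk for _ in range(2)]


frames = tiled_decode([1.0, 2.0, 3.0, 4.0, 5.0], tile=2, scale=2,
                      decode_tile=upsample2)
```

Because the output buffer exists for the entire loop, whichever allocator owns it has to hold that memory alongside every per-tile temporary, which is why the choice of intermediate device matters for large video decodes.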
Evan Carmen 7e4d22e41c feat: Grace-Blackwell unified memory optimization for ComfyUI
- Add model_management.py patch: detects GB10 unified memory (VRAM ≈ RAM, i.e. VRAM/RAM ratio > 0.95)
- Set HIGH_VRAM mode: no pointless CPU offloading (same physical memory pool)
- Increase maximum_vram_for_weights from 88% to 95% (8.4GB headroom on 128GB)
- Skip torch.cuda.empty_cache() on unified memory (avoids page faults)
- Return GPU for text_encoder/vae/intermediate offload devices on unified memory
- MPS excluded from unified detection (has its own SHARED state)
- Remove PYTORCH_NO_CUDA_MEMORY_CACHING env var (patch handles caching properly)
- Mount patched file as read-only volume override in docker-compose.yml
- DeepSeek review: safe and correct for DGX Spark target

Co-authored-by: DeepSeek (code review)
2026-05-20 16:01:51 -05:00
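
A minimal sketch of the detection and budgeting rules listed in 7e4d22e41c, assuming the "> 0.95" threshold applies to the ratio of the smaller to the larger of the two memory totals. The helper names `is_unified_memory`, `weights_budget`, and `should_empty_cache` are hypothetical and do not appear in the patch; the 0.88 and 0.95 fractions, the MPS exclusion, and the skipped cache flush are taken from the commit message.

```python
def is_unified_memory(total_vram: float, total_ram: float,
                      is_mps: bool = False, threshold: float = 0.95) -> bool:
    # MPS is excluded: per the commit, it has its own SHARED state.
    if is_mps or total_vram <= 0 or total_ram <= 0:
        return False
    # Assumption: "VRAM ≈ RAM" means the two totals are within 5% of each other.
    ratio = min(total_vram, total_ram) / max(total_vram, total_ram)
    return ratio > threshold


def weights_budget(total_vram: float, unified: bool) -> float:
    # 95% of VRAM for weights on unified memory, 88% otherwise.
    fraction = 0.95 if unified else 0.88
    return total_vram * fraction


def should_empty_cache(unified: bool) -> bool:
    # The commit skips torch.cuda.empty_cache() on unified memory.
    return not unified
```

For example, a machine reporting nearly equal VRAM and RAM totals is classified as unified, while a discrete 24 GB GPU next to 64 GB of system RAM is not.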
Evan Carmen 15fc70663f Add ComfyUIMini mobile-friendly UI integration
New features:
- ComfyUIMini container (Node.js Alpine, ~150MB) for mobile/tablet access
- Separate container architecture with shared Docker network
- Health checks on both services with proper dependency ordering
- Shared output volume for image gallery feature

Files added:
- comfyuimini/Dockerfile - Node.js 20 Alpine with tsx runtime
- comfyuimini/.dockerignore - Build context filtering

Files updated:
- docker-compose.yml - Added comfyuimini service, network, health checks
- .env.example - Added COMFYUIMINI_PORT and COMFYUIMINI_REF
- README.md - Architecture diagram, ComfyUIMini docs, updated credits

Access points:
- ComfyUI (Desktop): http://<host>:8188
- ComfyUIMini (Mobile): http://<host>:3000

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 23:45:13 -06:00
Evan Carmen 434a90741c Add What's Included section, fix repo URL
- Add "What's Included" section listing ComfyUI, ComfyUI-Manager,
  SageAttention, and PyTorch versions
- Update clone URL to actual GitHub repo (ecarmen16/SparkyUI)
- ComfyUI-Manager is auto-installed on first container run

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 21:27:42 -06:00
Evan Carmen 687ce72dd3 Add Grace-Blackwell unified memory optimizations
Key changes based on HurbaLurba's DGX Spark research:

- Remove --gpu-only flag (fights unified memory fabric)
- Add --disable-pinned-memory, --force-fp16, --dont-upcast-attention
- Add CUDA env vars for unified memory: CUDA_MANAGED_FORCE_DEVICE_ALLOC,
  PYTORCH_NO_CUDA_MEMORY_CACHING, OMP_NUM_THREADS=20
- Document unified memory architecture best practices
- Add host-level GPU optimization instructions (clock locking, vboost)
- Document SageAttention PR #297 status (merged then reverted)
- Add credits section

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 21:01:25 -06:00
Evan Carmen 1f5aeb5248 Initial commit: SparkyUI - ComfyUI for DGX Spark (Blackwell GB10)
Docker-based ComfyUI setup for NVIDIA DGX Spark ARM64 + sm_121:
- CUDA 13.0.2 base (required for compute_121 support)
- PyTorch 2.9.1+cu130 ARM64 wheels
- SageAttention compiled with TORCH_CUDA_ARCH_LIST="12.1"
- Triton/torch.compile disabled (no sm_121 support yet)
- ComfyUI-Manager auto-installed at runtime
- Configurable model/data paths via .env

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 20:28:30 -06:00