Files
SparkyUI/patches
Evan Carmen 6fa6c5041b feat: NORMAL_VRAM + AIMDO + copy=False patch + kernel caching
Major unified memory optimization changes:

1. model_management.py: HIGH_VRAM → NORMAL_VRAM
   - GB10 unified memory: offloading to CPU doesn't save physical RAM
     (same pool), but NORMAL_VRAM allows per-layer partial loading when
     memory is tight instead of all-or-nothing OOM
   - text_encoder_offload_device() and vae_offload_device() now return
     CPU (allows ComfyUI to offload unused models)
   - intermediate_device() still returns GPU (VAE outputs must stay in
     CUDA allocator for honest memory tracking)
   - User can force HIGH_VRAM with --highvram if models fit

2. utils.py: copy=True → copy=False for tensor.to(device)
   - On GB10 unified memory, copy=True creates a full duplicate in both
     CPU and CUDA allocators simultaneously (ComfyUI issue #10896)
   - copy=False makes .to(device) a zero-copy device label change since
     both allocators draw from the same physical LPDDR5X
   - Halves model loading memory usage when --disable-mmap is set

3. Removed --disable-dynamic-vram from ComfyUI flags
   - Was preventing AIMDO (comfy_aimdo) from initializing
   - AIMDO now activates: VBAR-based page-level VRAM management at 32MB
     granularity instead of blunt .to(cpu) copies
   - Falls back to NORMAL_VRAM per-layer loading if AIMDO has issues

4. Added CUDA_CACHE_MAXSIZE=4294967296 (4GB kernel cache)
   - PTX→SASS kernel caching for sm_121 (GB10 Blackwell)
   - 3x speedup on subsequent runs reported by DGX Spark community

5. System: vm.swappiness reduced from 60 to 1
   - Swap thrashing on unified memory causes silent system freezes
   - Near-zero swappiness ensures clean OOM kills instead
2026-05-21 19:04:25 -05:00
..