The mounted patches/model_management.py and patches/utils.py were authored
against an older ComfyUI, but COMFYUI_REF=master clones the latest upstream.
Upstream has since added the DynamicVRAM/AIMDO system, and main.py now calls
model_management.get_all_torch_devices(), which the stale patch does not
define (13 functions were missing in total). As a result, comfyui
crash-looped on startup with AttributeError.
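
For reference, this kind of drift can be caught before deploy by diffing
the top-level function names of the upstream file against the patched copy.
The following is a minimal sketch only: the helper names are hypothetical,
and the two source strings are tiny stand-ins for the real files.

```python
import ast


def top_level_functions(source: str) -> set[str]:
    """Return the names of functions defined at module level in `source`."""
    tree = ast.parse(source)
    return {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def missing_functions(upstream_source: str, patched_source: str) -> list[str]:
    """List functions that upstream defines but the patched copy lacks."""
    return sorted(
        top_level_functions(upstream_source) - top_level_functions(patched_source)
    )


if __name__ == "__main__":
    # Illustrative stand-ins; in practice, read the two files from disk.
    upstream = "def get_torch_device(): ...\ndef get_all_torch_devices(): ...\n"
    patched = "def get_torch_device(): ...\n"
    print(missing_functions(upstream, patched))
```

With the stand-in sources above, this prints ['get_all_torch_devices'].
A non-empty result means the patch needs regenerating.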
Regenerated both patches from the current master files and re-applied the
documented Sparky edits on top so they stay API-compatible:
- model_management.py: unified-memory detection, NORMAL_VRAM retention,
  95% weight ratio, intermediate_device() -> cuda, soft_empty_cache skip
- utils.py: copy=False tensor load on unified memory
comfyui now starts cleanly with DynamicVRAM enabled and the Sparky
unified-memory path active.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Major unified-memory optimization changes:
1. model_management.py: HIGH_VRAM → NORMAL_VRAM
   - On GB10 unified memory, offloading to CPU doesn't save physical RAM
     (same pool), but NORMAL_VRAM allows per-layer partial loading when
     memory is tight instead of an all-or-nothing OOM
   - text_encoder_offload_device() and vae_offload_device() now return
     CPU (allows ComfyUI to offload unused models)
   - intermediate_device() still returns GPU (VAE outputs must stay in
     the CUDA allocator for accurate memory tracking)
   - Users can still force HIGH_VRAM with --highvram if models fit
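
The difference between all-or-nothing loading and per-layer partial loading
can be illustrated with a toy model. This is a sketch only, not ComfyUI's
actual scheduler: the function names, the greedy in-order policy, and the
layer sizes are all assumptions made for illustration.

```python
def all_or_nothing_fits(layer_sizes_mb: list[int], budget_mb: int) -> bool:
    """All-or-nothing style: the whole model must fit, or loading fails."""
    return sum(layer_sizes_mb) <= budget_mb


def plan_partial_load(
    layer_sizes_mb: list[int], budget_mb: int
) -> tuple[list[int], list[int]]:
    """Per-layer style: keep layers resident, in order, while they fit.

    Returns (resident_indices, deferred_indices). Deferred layers would be
    loaded on demand rather than triggering an out-of-memory failure.
    """
    resident: list[int] = []
    deferred: list[int] = []
    used_mb = 0
    for index, size_mb in enumerate(layer_sizes_mb):
        if used_mb + size_mb <= budget_mb:
            resident.append(index)
            used_mb += size_mb
        else:
            deferred.append(index)
    return resident, deferred


if __name__ == "__main__":
    layers = [400, 400, 400, 400]  # hypothetical layer sizes in MB
    budget = 1000                  # hypothetical free memory in MB
    print(all_or_nothing_fits(layers, budget))
    print(plan_partial_load(layers, budget))
```

With these made-up numbers the whole model does not fit (False), while the
partial plan keeps layers 0 and 1 resident and defers layers 2 and 3.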
2. utils.py: copy=True → copy=False for tensor.to(device)
   - On GB10 unified memory, copy=True leaves a full duplicate in both
     the CPU and CUDA allocators at the same time (ComfyUI issue #10896)
   - copy=False lets .to(device) skip that duplicate wherever PyTorch can
     reuse the existing storage, since both allocators draw from the same
     physical LPDDR5X
   - Halves model-loading memory usage when --disable-mmap is set
3. Removed --disable-dynamic-vram from the ComfyUI flags
   - It was preventing AIMDO (comfy_aimdo) from initializing
   - AIMDO now activates: VBAR-based page-level VRAM management at 32 MB
     granularity instead of blunt .to(cpu) copies
   - Falls back to NORMAL_VRAM per-layer loading if AIMDO has issues
4. Added CUDA_CACHE_MAXSIZE=4294967296 (4 GiB kernel cache)
   - PTX→SASS kernel caching for sm_121 (GB10 Blackwell)
   - 3x speedup on subsequent runs, as reported by the DGX Spark community
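
The value 4294967296 is exactly 4 * 1024**3 bytes. When the variable cannot
be set in the container environment, one option is to set it from Python
before CUDA is initialized in the process. A minimal sketch, assuming a
hypothetical helper name (configure_cuda_cache) and that nothing has
touched CUDA yet:

```python
import os

FOUR_GIB = 4 * 1024**3  # 4294967296 bytes


def configure_cuda_cache(
    env: dict[str, str], size_bytes: int = FOUR_GIB
) -> dict[str, str]:
    """Return a copy of `env` with CUDA_CACHE_MAXSIZE set.

    An existing value is left untouched so an operator override still wins.
    """
    updated = dict(env)
    updated.setdefault("CUDA_CACHE_MAXSIZE", str(size_bytes))
    return updated


if __name__ == "__main__":
    # Apply before any library initializes CUDA (for example, before
    # importing torch), otherwise the setting may be read too late.
    os.environ.update(configure_cuda_cache(dict(os.environ)))
    print(os.environ["CUDA_CACHE_MAXSIZE"])
```

Using setdefault rather than plain assignment is a design choice here: it
keeps a value already exported by the deployment from being overwritten.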
5. System: vm.swappiness reduced from 60 to 1
   - Swap thrashing on unified memory causes silent system freezes
   - Near-zero swappiness favors clean OOM kills instead
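
Since the swappiness change lives on the host rather than in this repo, a
startup check can confirm it is in effect. The sketch below reads the
standard Linux procfs entry; the helper names and the limit of 1 are
assumptions for illustration, not part of this change.

```python
from pathlib import Path
from typing import Optional

SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")  # standard Linux location


def parse_swappiness(text: str) -> Optional[int]:
    """Parse the contents of the swappiness file; None if not an integer."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def read_swappiness(path: Path = SWAPPINESS_PATH) -> Optional[int]:
    """Read the current swappiness, or None if unavailable (e.g. non-Linux)."""
    try:
        return parse_swappiness(path.read_text())
    except OSError:
        return None


def swappiness_ok(value: Optional[int], limit: int = 1) -> bool:
    """True when swappiness is known and at or below the expected limit."""
    return value is not None and value <= limit


if __name__ == "__main__":
    current = read_swappiness()
    if not swappiness_ok(current):
        print(f"warning: vm.swappiness is {current}, expected <= 1")
```

The value itself is typically changed with sysctl (vm.swappiness=1) and
persisted through the host's sysctl configuration.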