fix: intermediate_device() returns cuda on unified memory

On Grace-Blackwell (GB10), CPU and GPU share the same physical RAM.
intermediate_device() was returning 'cpu', so ComfyUI allocated
intermediate output buffers (such as VAE decode output) through the CPU
allocator, out of the same physical memory pool it counts as free VRAM.
This causes:

1. Memory accounting mismatch: ComfyUI treats intermediates as living in
   separate CPU memory and overestimates available VRAM
2. Unnecessary .to(device) copies through separate allocator heaps
3. Heap fragmentation across the unified memory pool

Now matches text_encoder_offload_device() and vae_offload_device(), which
already return get_torch_device() when UNIFIED_MEMORY is set.
Evan Carmen
2026-05-21 11:02:06 -05:00
parent 31939a9710
commit c803ea6146
+1 -1
@@ -1106,7 +1106,7 @@ def text_encoder_dtype(device=None):
 def intermediate_device():
-    if args.gpu_only:
+    if args.gpu_only or UNIFIED_MEMORY:
         return get_torch_device()
     else:
         return torch.device("cpu")
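
The decision made by the patched function can be sketched in isolation. This is a minimal illustration, not the ComfyUI code: pick_intermediate_device and its parameters are hypothetical stand-ins for intermediate_device(), args.gpu_only, UNIFIED_MEMORY and get_torch_device(), and it returns device name strings rather than torch.device objects so that it runs without PyTorch installed.

```python
def pick_intermediate_device(gpu_only: bool, unified_memory: bool,
                             compute_device: str = "cuda") -> str:
    """Hypothetical stand-in for intermediate_device().

    Returns a device *name*; the real function returns a torch.device.
    compute_device stands in for whatever get_torch_device() would return.
    """
    # With the patch, unified memory is treated like --gpu-only:
    # intermediates stay on the compute device instead of going to "cpu".
    if gpu_only or unified_memory:
        return compute_device
    return "cpu"


# Discrete GPU, default settings: intermediates go to CPU memory.
print(pick_intermediate_device(gpu_only=False, unified_memory=False))
# Unified memory (e.g. GB10): intermediates stay on the compute device.
print(pick_intermediate_device(gpu_only=False, unified_memory=True))
```

Only the unified-memory case changes behavior: with both flags false the result is still "cpu", so discrete-GPU setups are unaffected.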