Add Grace-Blackwell unified memory optimizations

Key changes based on HurbaLurba's DGX Spark research:

- Remove --gpu-only flag (it fights the unified memory fabric)
- Add --disable-pinned-memory, --force-fp16, --fp16-unet, --fp16-vae,
  --fp16-text-enc, --dont-upcast-attention
- Add CUDA env vars for unified memory: CUDA_MANAGED_FORCE_DEVICE_ALLOC,
  PYTORCH_NO_CUDA_MEMORY_CACHING, OMP_NUM_THREADS=20
- Document unified memory architecture best practices
- Add host-level GPU optimization instructions (clock locking, vboost)
- Document SageAttention PR #297 status (merged then reverted)
- Add credits section
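
For illustration, the environment variables named above could be exported as in the sketch below. Only `OMP_NUM_THREADS=20` has its value stated in this message; the value `1` for the other two is an assumption based on how those switches are usually enabled, not something this commit confirms.

```shell
# Sketch only: values marked "assumed" are not given in the commit message.

# Assumed value 1: asks the CUDA driver to place managed allocations in device memory.
export CUDA_MANAGED_FORCE_DEVICE_ALLOC=1

# Assumed value 1: disables PyTorch's caching allocator for CUDA memory.
export PYTORCH_NO_CUDA_MEMORY_CACHING=1

# Stated in the commit message.
export OMP_NUM_THREADS=20
```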

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Evan Carmen
2026-01-03 21:01:25 -06:00
parent 1f5aeb5248
commit 687ce72dd3
3 changed files with 82 additions and 2 deletions
+7 -1
@@ -9,7 +9,13 @@ SPARKYUI_DATA_PATH=/path/to/SparkyUI
 # ComfyUI settings
 COMFYUI_PORT=8188
-COMFYUI_FLAGS=--listen 0.0.0.0 --port 8188 --gpu-only
+# Optimized flags for Grace-Blackwell unified memory architecture
+# Key: DON'T use --gpu-only - it fights the unified memory fabric
+# --disable-pinned-memory: reduces overhead on unified fabric
+# --force-fp16 + --fp16-*: enables SageAttention optimization
+# --dont-upcast-attention: keeps attention in FP16 for speed
+COMFYUI_FLAGS=--listen 0.0.0.0 --port 8188 --disable-pinned-memory --force-fp16 --fp16-unet --fp16-vae --fp16-text-enc --dont-upcast-attention
 # Build refs (pin to specific commits/tags for reproducibility)
 COMFYUI_REF=master
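
Since the key point of this change is that `--gpu-only` must not come back, a small guard can make that explicit. The following is a minimal sketch, not part of this commit: `check_flags` is a hypothetical helper, and the flag string is copied from the new `COMFYUI_FLAGS` line.

```shell
#!/usr/bin/env bash
# Flag string copied from the updated env file in this commit.
COMFYUI_FLAGS="--listen 0.0.0.0 --port 8188 --disable-pinned-memory --force-fp16 --fp16-unet --fp16-vae --fp16-text-enc --dont-upcast-attention"

# check_flags (hypothetical helper): returns non-zero if --gpu-only appears
# as a whole word in the given flag string.
check_flags() {
  case " $1 " in
    *" --gpu-only "*)
      echo "error: --gpu-only is set; remove it on unified memory systems" >&2
      return 1
      ;;
  esac
  return 0
}

if check_flags "$COMFYUI_FLAGS"; then
  echo "flags ok"
fi
```

Padding the string with spaces before matching keeps the check from tripping on a longer flag that merely contains `--gpu-only` as a prefix.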