diff --git a/.env.example b/.env.example
index 57226c9..24b5bd1 100644
--- a/.env.example
+++ b/.env.example
@@ -16,8 +16,10 @@ COMFYUI_PORT=8188
 # Key: DON'T use --gpu-only - it fights the unified memory fabric
 # --disable-pinned-memory: reduces overhead on unified fabric
 # --force-fp16 + --fp16-*: enables SageAttention optimization
+# --fp32-vae: run the VAE in fp32 - fp16 VAE produces NaNs -> BLACK images on
+# many SD1.5/SDXL checkpoints. (Use --bf16-vae instead for a faster compromise.)
 # --dont-upcast-attention: keeps attention in FP16 for speed
-COMFYUI_FLAGS=--listen 0.0.0.0 --port 8188 --disable-pinned-memory --force-fp16 --fp16-unet --fp16-vae --fp16-text-enc --dont-upcast-attention
+COMFYUI_FLAGS=--listen 0.0.0.0 --port 8188 --disable-pinned-memory --force-fp16 --fp16-unet --fp32-vae --fp16-text-enc --dont-upcast-attention
 
 # Build refs (pin to specific commits/tags for reproducibility)
 COMFYUI_REF=master
diff --git a/README.md b/README.md
index da69b81..a5d0ec6 100644
--- a/README.md
+++ b/README.md
@@ -40,10 +40,15 @@ The DGX Spark's Grace-Blackwell architecture uses **unified memory** - a coheren
 ```bash
 --disable-pinned-memory     # Reduces overhead on unified fabric
 --force-fp16                # Enables SageAttention optimization
---fp16-unet --fp16-vae --fp16-text-enc  # FP16 precision throughout
+--fp16-unet --fp16-text-enc # FP16 precision for UNet + text encoder
+--fp32-vae                  # VAE in fp32 - fp16 VAE causes NaNs -> BLACK images
 --dont-upcast-attention     # Keeps attention in FP16 for speed
 ```
 
+> **Black/blank images?** That's the classic fp16-VAE NaN issue, not an NSFW
+> filter (there is none). Keep `--fp32-vae` (default). `--bf16-vae` is a faster
+> alternative that also avoids the NaNs.
+
 **What NOT to use:**
 - `--gpu-only` - fights the unified memory fabric, hurts performance
 - `--cache-none` - disables natural caching, slows model loading
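
Reviewer note: the change swaps `--fp16-vae` for `--fp32-vae` in `COMFYUI_FLAGS`, so a stale local `.env` could still carry the old flag. The snippet below is a minimal sketch of a sanity check for that case. It is not part of this change; `check_vae_flags` and `VAE_FLAGS` are hypothetical names introduced for illustration, and the check only inspects the three VAE precision flags mentioned in the diff.

```python
import shlex

# The three VAE precision flags named in this change (assumed to be the
# only VAE precision options worth checking here).
VAE_FLAGS = {"--fp16-vae", "--fp32-vae", "--bf16-vae"}


def check_vae_flags(flags):
    """Return a list of warning strings for the VAE flags found in `flags`.

    Hypothetical helper: `flags` is the value of COMFYUI_FLAGS as a string.
    """
    tokens = shlex.split(flags)
    chosen = [t for t in tokens if t in VAE_FLAGS]
    warnings = []
    if "--fp16-vae" in chosen:
        # Per the diff: an fp16 VAE can produce NaNs, which decode to black images.
        warnings.append(
            "--fp16-vae can produce NaNs (black images); "
            "use --fp32-vae or --bf16-vae instead"
        )
    if len(set(chosen)) > 1:
        warnings.append("multiple VAE precision flags set: " + " ".join(chosen))
    return warnings


if __name__ == "__main__":
    # Old and new values of COMFYUI_FLAGS, copied from the diff.
    old = ("--listen 0.0.0.0 --port 8188 --disable-pinned-memory --force-fp16 "
           "--fp16-unet --fp16-vae --fp16-text-enc --dont-upcast-attention")
    new = old.replace("--fp16-vae", "--fp32-vae")
    for label, value in (("old", old), ("new", new)):
        print(label, check_vae_flags(value))
```

Something like this could run from a container entrypoint before launch, though whether that is worth the extra moving part is a judgment call for the maintainers.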