first commit

2026-03-22 17:26:26 -04:00
commit c05cb71816
15 changed files with 2644 additions and 0 deletions
--- a/CRITICAL_FIX_ANALYSIS.md
+++ b/CRITICAL_FIX_ANALYSIS.md
@@ -0,0 +1,134 @@
+# Critical Blackwell GB10 Fixes for vLLM
+
+## Overview
+
+Three critical fixes are required for vLLM on Blackwell GB10 (sm_121a) GPUs with CUDA 13.0+:
+
+1. **CMakeLists.txt SM120 Support** - Add missing architecture
+2. **vLLM Commit Version** - Use commit with Blackwell/Triton fixes
+3. **Triton Version Pinning** - Use tested working commit
+
+## Fix 1: CMakeLists.txt SM120 Support
+
+### Root Cause
+
+vLLM v0.11.1rc3 CMakeLists.txt has **incomplete architecture support** for Blackwell GB10 (sm_121a) MOE kernels when using CUDA 13.0+.
+
+## The Problem
+
+For CUDA 13.0+, the code uses these branches:
+- **Line 490**: Regular MOE kernels
+- **Line 671**: Grouped MM MOE kernels
+
+Original v0.11.1rc3:
+```cmake
+# Line 490
+cuda_archs_loose_intersection(SCALED_MM_ARCHS "10.0f;11.0f" "${CUDA_ARCHS}")
+
+# Line 671
+cuda_archs_loose_intersection(SCALED_MM_ARCHS "10.0f;11.0f" "${CUDA_ARCHS}")
+```
+
+**BOTH lines are missing `12.0f` (SM120) support!**
+
+## The Fix
+
+Both lines need `12.0f` added:
+```cmake
+# Line 490
+cuda_archs_loose_intersection(SCALED_MM_ARCHS "10.0f;11.0f;12.0f" "${CUDA_ARCHS}")
+
+# Line 671
+cuda_archs_loose_intersection(SCALED_MM_ARCHS "10.0f;11.0f;12.0f" "${CUDA_ARCHS}")
+```
+
+## Error Symptoms
+
+Without this fix:
+```
+ImportError: undefined symbol: _Z20cutlass_moe_mm_sm100RN2at6TensorERKS0_S3_S3_S3_S3_S3_S3_S3_S3_bb
+```
+
+The MOE kernels for SM100/SM120 aren't compiled, causing import failures.
+
+## Why install.sh Works
+
+The sed command on line 323:
+```bash
+sed -i 's/cuda_archs_loose_intersection(SCALED_MM_ARCHS "10.0f;11.0f"/cuda_archs_loose_intersection(SCALED_MM_ARCHS "10.0f;11.0f;12.0f"/' CMakeLists.txt
+```
+
+This replaces **ALL** occurrences, fixing both lines 490 and 671 in one command.
+
+## Verified Solution
+
+Tested on NVIDIA DGX Spark with Blackwell GB10, CUDA 13.0:
+- [OK] Line 490 fixed: `"10.0f;11.0f;12.0f"`
+- [OK] Line 671 fixed: `"10.0f;11.0f;12.0f"`
+- [OK] vLLM imports successfully
+- [OK] No cutlass_moe_mm_sm100 symbol errors
+- [OK] Build time: ~19 minutes
+
+## Fix 2: vLLM Commit Version
+
+### Issue
+
+vLLM tag `v0.11.1rc3` lacks critical Triton/PyTorch Inductor fixes for Blackwell.
+
+### Solution
+
+Use commit `66a168a197ba214a5b70a74fa2e713c9eeb3251a` (6 commits ahead of v0.11.1rc3):
+- Contains Triton JIT compilation fixes
+- Includes PyTorch Inductor optimizations for Blackwell
+- Adds proper backend registration handling
+
+### Installation
+
+```bash
+cd vllm
+git checkout 66a168a197ba214a5b70a74fa2e713c9eeb3251a
+git submodule update --init --recursive
+```
+
+## Fix 3: Triton Version Pinning
+
+### Issue
+
+Latest Triton main branch (as of late October 2025) has intermittent JITFunction compilation issues with PyTorch Inductor on Blackwell.
+
+### Solution
+
+Pin to tested working commit: `4caa0328bf8df64896dd5f6fb9df41b0eb2e750a` (October 25, 2025)
+- Verified stable with Blackwell GB10
+- Passes all compilation tests
+- No JITFunction.constexprs errors
+
+### Installation
+
+```bash
+cd triton
+git checkout 4caa0328bf8df64896dd5f6fb9df41b0eb2e750a
+git submodule update --init --recursive
+python -m pip install --no-build-isolation -v .
+```
+
+## Complete Verified Configuration
+
+| Component | Version/Commit | Notes |
+|-----------|---------------|-------|
+| **vLLM** | `66a168a197ba214a5b70a74fa2e713c9eeb3251a` | 6 commits ahead of v0.11.1rc3 |
+| **Triton** | `4caa0328bf8df64896dd5f6fb9df41b0eb2e750a` | October 25, 2025 |
+| **PyTorch** | `2.9.0+cu130` | From vLLM requirements |
+| **CUDA** | `13.0` (V13.0.88) | System CUDA |
+| **Python** | `3.12.3` | |
+
+## Testing
+
+Verified working with:
+```bash
+python -c "from vllm import LLM, SamplingParams; \
+llm = LLM(model='Qwen/Qwen2.5-0.5B-Instruct', max_model_len=512); \
+print(llm.generate(['Hello'], SamplingParams(max_tokens=20)))"
+```
+
+**All tests pass**: Import, compilation, CUDA graphs, and text generation all work correctly.