From Loss=36 to Convergence: Integrating Whisper+Gemma2 into Megatron's TransformerEngine

Four bugs we had to fix to get our AudioLLM training stably inside Megatron's TransformerEngine

Apr 26, 2026 · 9 min read