Video: Transformer training shootout, part 2: AWS Trainium vs. NVIDIA V100
In this video, I compare the cost/performance of AWS Trainium with the NVIDIA V100 GPU.
I first launch a trn1.32xlarge instance (16 Trainium chips) and a p3dn.24xlarge instance (8 V100 GPUs). Then, I run three benchmarks: language pretraining with GPT-2, token classification with BERT Large, and image classification with the Vision Transformer.
The results? Trainium is 2 to 5x faster, and 3 to 8x cheaper!
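For readers who want to reproduce this kind of comparison, here is a minimal sketch of how "x faster" and "x cheaper" figures can be derived from a pair of runs. It assumes each benchmark is timed end to end on both instances and billed at a flat hourly on-demand rate. The function names and the numbers in the usage example are hypothetical placeholders, not the measurements from the video.

```python
def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate finishes the same job."""
    return baseline_seconds / candidate_seconds


def job_cost(seconds: float, hourly_price_usd: float) -> float:
    """Cost of one job, assuming flat per-hour billing (an assumption)."""
    return seconds / 3600.0 * hourly_price_usd


def cost_ratio(
    baseline_seconds: float,
    baseline_hourly_usd: float,
    candidate_seconds: float,
    candidate_hourly_usd: float,
) -> float:
    """How many times cheaper the candidate is for the same job."""
    baseline_cost = job_cost(baseline_seconds, baseline_hourly_usd)
    candidate_cost = job_cost(candidate_seconds, candidate_hourly_usd)
    return baseline_cost / candidate_cost


if __name__ == "__main__":
    # Placeholder values for illustration only: replace them with your own
    # measured training times and the current prices for your region.
    gpu_seconds, gpu_hourly_usd = 3600.0, 30.0
    trn_seconds, trn_hourly_usd = 1200.0, 20.0

    print("speedup:", speedup(gpu_seconds, trn_seconds))
    print(
        "cost ratio:",
        cost_ratio(gpu_seconds, gpu_hourly_usd, trn_seconds, trn_hourly_usd),
    )
```

The cost ratio is the speedup multiplied by the ratio of hourly prices, which is why the "cheaper" range can be wider than the "faster" range when the faster instance also has the lower hourly price.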