In this video, I compare the cost-performance of AWS Trainium, a new custom chip designed by AWS, with NVIDIA A10G GPUs.
I first launch a trn1.32xlarge instance (16 Trainium chips) and a g5.48xlarge instance (8 A10G GPUs). Then, I run a natural language processing job, fine-tuning the BERT Large model on the full Yelp review dataset. I use the BF16 data format with the maximum sequence length supported by the model (512). The results? The Trainium job is 5x faster. As the trn1 instance is only about 30% more expensive per hour, each job costs roughly a quarter as much, which is close to a 4x improvement in cost-performance!
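Here's a quick back-of-the-envelope check of that cost-performance claim. It's only a sketch: it assumes job cost is simply hourly price times training time, uses the rounded figures above (5x faster, 1.3x the hourly price), and ignores instance startup time, storage, and pricing options other than a flat hourly rate. The function name is just for illustration.

```python
def relative_cost_per_job(speedup: float, price_ratio: float) -> float:
    """Cost of one job on the new instance, relative to the baseline instance.

    speedup: how many times faster the job runs on the new instance.
    price_ratio: hourly price of the new instance divided by the baseline's.
    """
    # Job cost = hourly price * job duration, and duration scales as 1 / speedup.
    return price_ratio / speedup


# Rounded figures from the video: ~5x faster, ~30% more expensive per hour.
ratio = relative_cost_per_job(speedup=5.0, price_ratio=1.3)
print(f"trn1 job cost relative to g5: {ratio:.2f}")        # 0.26
print(f"cost-performance improvement: {1 / ratio:.2f}x")   # 3.85
```

In other words, a 5x speedup at 1.3x the hourly price works out to about 3.85x more training per dollar.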