In this video, I show you how to use Amazon SageMaker to train a Transformer model with AWS Trainium and compile it for AWS Inferentia.

Starting from a BERT model and the Yelp review dataset, I first train a multi-class classification model on an ml.trn1.2xlarge instance. I also show you how to reuse the Neuron SDK model cache from one training job to the next, in order to save time and money on repeated jobs. Then, I compile the trained model for Inferentia with a SageMaker Processing batch job, making it easy to automate such tasks.
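
Here’s a minimal sketch of what such a training job can look like with the SageMaker Python SDK. The training script, execution role, DLC image URI, and S3 paths are placeholders rather than the exact values from the video, and the cache trick relies on SageMaker keeping /opt/ml/checkpoints in sync with S3 between jobs:

```python
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",               # hypothetical training script
    source_dir="code",
    image_uri=neuron_training_image_uri,  # a PyTorch Neuron training DLC
    instance_type="ml.trn1.2xlarge",      # 1 Trainium chip, 2 NeuronCores
    instance_count=1,
    role=role,                            # your SageMaker execution role
    # Point the Neuron compiler cache at the checkpoint directory, which
    # SageMaker syncs with S3: the next job starts from a warm cache and
    # skips recompilation.
    environment={"NEURON_CC_FLAGS": "--cache_dir=/opt/ml/checkpoints"},
    checkpoint_s3_uri="s3://my-bucket/neuron-cache",  # hypothetical bucket
    hyperparameters={"model_name": "bert-base-uncased", "epochs": 3},
)
estimator.fit("s3://my-bucket/yelp-review-dataset")   # hypothetical input
```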

--

--

In this video, I show you how to accelerate Transformer inference with Optimum, an open-source library by Hugging Face, and Better Transformer, a PyTorch extension available since PyTorch 1.12.

Using an AWS instance equipped with an NVIDIA V100 GPU, I start from a couple of models that I previously fine-tuned: a DistilBERT model for text classification and a Vision Transformer model for image classification. I first benchmark the original models, then I use Optimum and Better Transformer to optimize them with a single line of code, and I benchmark them again. This simple process delivers a 20–30% speedup with no accuracy drop!
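
The one-liner in question looks like this; a minimal sketch, where the checkpoint name is a placeholder for the fine-tuned models used in the video:

```python
from transformers import AutoModelForSequenceClassification
from optimum.bettertransformer import BetterTransformer

# Load a fine-tuned model (hypothetical checkpoint name)
model = AutoModelForSequenceClassification.from_pretrained(
    "my-finetuned-distilbert"
).to("cuda").eval()

# Swap the vanilla Transformer layers for fused Better Transformer kernels
model = BetterTransformer.transform(model)
```

Benchmarking the model before and after this single call is all it takes to measure the speedup.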

--

--

In this video, I show you how to accelerate Transformer inference with Inferentia, a custom chip designed by AWS.

Starting from a Hugging Face BERT model that I fine-tuned on AWS Trainium (https://youtu.be/HweP7OYNiIA), I compile it with the Neuron SDK for Inferentia. Then, using an inf1.6xlarge instance (4 Inferentia chips, 16 Neuron Cores), I show you how to use pipeline mode to predict at scale, reaching over 4,000 predictions per second at 3-millisecond latency 🤘
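
Here’s a minimal sketch of the compilation step with the torch-neuron API for inf1; the checkpoint name and sequence length are assumptions:

```python
import torch
import torch_neuron  # registers the torch.neuron namespace
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "my-bert-finetuned-on-trainium"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, torchscript=True  # required for tracing
)

# Build an example input with a fixed shape for the compiler
inputs = tokenizer(
    "This restaurant is amazing!", max_length=128,
    padding="max_length", truncation=True, return_tensors="pt",
)
example = (inputs["input_ids"], inputs["attention_mask"])

# Pipeline mode: shard the model graph across all 16 NeuronCores
neuron_model = torch.neuron.trace(
    model, example,
    compiler_args=["--neuroncore-pipeline-cores", "16"],
)
neuron_model.save("bert_neuron_pipeline.pt")
```

At inference time, loading the saved model with torch.jit.load spreads execution across the cores, and parallel client threads keep the pipeline busy.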

--

--

In this video, I show you how to accelerate Transformer inference with Optimum, an open-source library by Hugging Face, and ONNX.

I start from a DistilBERT model fine-tuned for text classification, export it to ONNX format, then optimize it, and finally quantize it. Running benchmarks on an AWS c6i instance (Intel Ice Lake architecture), we speed up the original model more than 2.5x and cut its size in half, with just a few lines of simple Python code and without any accuracy drop!
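
A minimal sketch of the three steps with Optimum’s onnxruntime package; the checkpoint name and output directories are placeholders, and API details vary slightly across Optimum versions:

```python
from optimum.onnxruntime import (
    ORTModelForSequenceClassification, ORTOptimizer, ORTQuantizer,
)
from optimum.onnxruntime.configuration import (
    OptimizationConfig, AutoQuantizationConfig,
)

model_id = "my-finetuned-distilbert"  # hypothetical checkpoint

# Step 1: export the PyTorch model to ONNX
ort_model = ORTModelForSequenceClassification.from_pretrained(
    model_id, export=True
)

# Step 2: apply graph optimizations (operator fusion, etc.)
optimizer = ORTOptimizer.from_pretrained(ort_model)
optimizer.optimize(
    save_dir="onnx-optimized",
    optimization_config=OptimizationConfig(optimization_level=99),
)

# Step 3: dynamic quantization tuned for AVX512-VNNI (Ice Lake)
quantizer = ORTQuantizer.from_pretrained(
    "onnx-optimized", file_name="model_optimized.onnx"
)
quantizer.quantize(
    save_dir="onnx-quantized",
    quantization_config=AutoQuantizationConfig.avx512_vnni(is_static=False),
)
```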

--

--

Following my test of GitHub Copilot, I was curious how Amazon CodeWhisperer would perform on the same example. So I gave it a try!

In a nutshell, things didn’t go well. I found the service slow, and most of the suggestions it generated were irrelevant. It also struggled to keep generating additional lines of code. Its failure to trigger on copy-pasted prompts was very annoying (probably an issue with the AWS extension for VS Code).

Unlike Copilot, I wouldn’t use CodeWhisperer to get real work done. The service is still in preview at the time of recording, and one can only hope that it will get much better over time. I’ll give it another try when it becomes generally available.

--

--

And now for something completely different! With a few hours to kill in the speaker room, I decided to take a stab at writing Hugging Face code with GitHub Copilot. There’s no speech on this one, just a light rock music track. I hope you’ll enjoy it!

This video was recorded in one take, with very little editing (kernel crashes, etc.). I didn’t script anything to make Copilot look good or bad: I just opened VS Code, picked a simple example, and went with the flow. IMHO, Copilot did very well. Some suggestions definitely felt like it was reading my mind.

--

--

Building image datasets is hard work. Instead of scraping, cleaning and labeling images, why not generate them directly with a Stable Diffusion model?

In this video, I show you how to generate new images with a Stable Diffusion model and the diffusers library, in order to augment an image classification dataset. Then, I add the new images to the original dataset, and push the augmented dataset to the Hugging Face Hub. Finally, I fine-tune an existing model on the augmented dataset.
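
Here’s a minimal sketch of the generation and upload steps with diffusers and datasets, assuming a GPU instance; the model, prompt, folder, and repository names are placeholders:

```python
import os
import torch
from diffusers import StableDiffusionPipeline
from datasets import load_dataset

# Load a Stable Diffusion checkpoint (hypothetical choice of model)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Generate synthetic images for one class of the dataset
prompt = "a photo of a golden retriever"  # hypothetical class prompt
os.makedirs("images/golden_retriever", exist_ok=True)
images = pipe(prompt, num_images_per_prompt=8).images
for i, image in enumerate(images):
    image.save(f"images/golden_retriever/synthetic_{i}.png")

# Load the folder (original + synthetic images) and push it to the Hub
dataset = load_dataset("imagefolder", data_dir="images")
dataset.push_to_hub("my-username/augmented-dataset")  # hypothetical repo
```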

--

--