Julien Simon

In this video, I demo Amazon SageMaker Studio Lab, a free, managed JupyterLab-based notebook service running on AWS.

Using a CPU runtime, I first run a simple Hugging Face example based on the Pipeline API. Then, I switch to a GPU runtime and fine-tune DistilBERT with the Trainer API.
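
If you'd like to follow along, here's a minimal sketch of both steps. The exact notebook in the video may differ; the dataset (a small IMDB subset) and the training settings here are illustrative choices, not necessarily what's on screen:

from transformers import pipeline

# CPU runtime: the Pipeline API downloads a default sentiment-analysis
# model and runs inference in a couple of lines.
classifier = pipeline("sentiment-analysis")
print(classifier("Studio Lab notebooks are easy to set up."))

# GPU runtime: fine-tune DistilBERT with the Trainer API.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("imdb")  # illustrative dataset choice
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

# Small subsets keep the example quick on a single GPU.
train_ds = dataset["train"].shuffle(seed=42).select(range(2000)).map(tokenize, batched=True)
eval_ds = dataset["test"].shuffle(seed=42).select(range(500)).map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)
args = TrainingArguments(
    output_dir="distilbert-imdb",
    per_device_train_batch_size=16,
    num_train_epochs=1,
)
trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
trainer.train()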

Along the way, I inspect the infrastructure resources available in both runtimes. I also show you how to create your own conda environment to keep your dependencies neatly organized.
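
Inspecting the runtime from a notebook cell can be done along these lines. Note that psutil and PyTorch are assumptions here (both install with pip), and the environment name and package list in the conda comments are just examples:

import psutil
import torch

print("CPUs:", psutil.cpu_count())
print("RAM (GB):", round(psutil.virtual_memory().total / 1e9, 1))
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

# Creating a dedicated conda environment is done from the Studio Lab
# terminal (shell commands shown as comments here):
#   conda create -n huggingface python=3.9 -y
#   conda activate huggingface
#   pip install transformers datasets ipykernel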

⭐️⭐️⭐️ Don’t forget to subscribe to be notified of future videos ⭐️⭐️⭐️

New to Transformers? Check out the Hugging Face course at https://huggingface.co/course

--

In this video, I demo Amazon SageMaker Serverless Inference, a newly launched deployment option. Starting from a pre-trained DistilBERT model on the Hugging Face model hub, I fine-tune it for sentiment analysis on the IMDB movie review dataset. Then, I deploy the model to a serverless endpoint and run multi-threaded benchmarks with short and long token sequences. Finally, I plot the latency numbers and compute latency quantiles.
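
Here is a rough sketch of the deployment and benchmarking steps with the SageMaker Python SDK. The S3 path, IAM role, and container versions are placeholders, and the thread count and request mix are illustrative rather than the exact settings from the video:

import statistics
import time
from concurrent.futures import ThreadPoolExecutor

from sagemaker.huggingface import HuggingFaceModel
from sagemaker.serverless import ServerlessInferenceConfig

# Package the fine-tuned model for deployment.
model = HuggingFaceModel(
    model_data="s3://my-bucket/model.tar.gz",  # placeholder: your fine-tuned model artifact
    role="my-sagemaker-execution-role",        # placeholder: your IAM role
    transformers_version="4.17",               # example versions; pick a supported combination
    pytorch_version="1.10",
    py_version="py38",
)

# Deploy without choosing an instance type: serverless capacity is
# defined by memory size and maximum concurrency instead.
predictor = model.deploy(
    serverless_inference_config=ServerlessInferenceConfig(
        memory_size_in_mb=4096,
        max_concurrency=10,
    )
)

def timed_invoke(text):
    start = time.time()
    predictor.predict({"inputs": text})
    return time.time() - start

# Multi-threaded benchmark; worker and request counts are illustrative.
with ThreadPoolExecutor(max_workers=8) as pool:
    latencies = list(pool.map(timed_invoke, ["I loved this movie!"] * 100))

# Latency quantiles (p50 / p90 / p99).
q = statistics.quantiles(latencies, n=100)
print(f"p50={q[49]:.3f}s  p90={q[89]:.3f}s  p99={q[98]:.3f}s")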

⭐️⭐️⭐️ Don’t forget to subscribe to be notified of future videos ⭐️⭐️⭐️

Notebook: https://gitlab.com/juliensimon/huggingface-demos/-/tree/main/serverless-inference

Documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html

New to Transformers? Check out the Hugging Face course at https://huggingface.co/course
