Video deep dive: Advanced distributed training with Hugging Face LLMs and AWS Trainium

Julien Simon
Jan 23, 2024

Following up on my recent “Hugging Face on AWS accelerators” deep dive, this new video zooms in on distributed training with NeuronX Distributed, Optimum Neuron, and AWS Trainium.

First, we explain the basics and benefits of advanced distributed training techniques such as tensor parallelism, pipeline parallelism, sequence parallelism, and DeepSpeed ZeRO. Then, we discuss how these techniques are implemented in NeuronX Distributed and Optimum Neuron. Finally, we launch an Amazon EC2 instance powered by AWS Trainium and demonstrate these techniques with distributed training runs on the TinyLlama and Llama 2 7B models.
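To give a flavor of the tensor-parallelism idea discussed in the video, here is a minimal NumPy sketch of a column-parallel linear layer. This is a conceptual illustration only, not the NeuronX Distributed API: the function name, shapes, and shard count are all made up for the example. Real frameworks shard the weight matrix across accelerator cores and gather the partial outputs over the interconnect; here we simulate the shards with a Python list.

```python
import numpy as np

def column_parallel_linear(x, weight, n_devices):
    """Conceptual tensor parallelism: split the weight matrix
    column-wise across n_devices, compute each shard's partial
    output, then concatenate the results (the 'all-gather' step).
    Each output column depends on only one weight shard, so the
    result matches the unsharded matmul."""
    shards = np.array_split(weight, n_devices, axis=1)  # one shard per device
    partial_outputs = [x @ w for w in shards]           # runs in parallel on real hardware
    return np.concatenate(partial_outputs, axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))        # a batch of activations
weight = rng.standard_normal((16, 32))  # full (unsharded) weight matrix

full = x @ weight
sharded = column_parallel_linear(x, weight, n_devices=4)
print(np.allclose(full, sharded))  # → True
```

Because each device holds only a slice of the weights, the per-device memory footprint shrinks roughly in proportion to the number of shards, which is what makes training 7B-class models feasible on a single multi-core instance.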

Of course, we share results on training time and cost, which will probably surprise you!

