Video: Llama 3 on Amazon SageMaker

Julien Simon
Apr 18, 2024

In this video, I walk you through the simple process of deploying a Llama 3 8B model with Amazon SageMaker.
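The deployment itself takes only a few lines with the SageMaker Python SDK. Here is a minimal sketch, assuming you have a SageMaker execution role, a Hugging Face token with access to the gated Llama 3 weights, and quota for a g5 instance (the model ID, container version, instance type, and environment values below are illustrative, not the exact ones from the video):

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# Pick the Hugging Face LLM (TGI) Deep Learning Container for this region.
# The version string is an assumption; list available versions in your region.
image_uri = get_huggingface_llm_image_uri("huggingface", version="2.0.0")

model = HuggingFaceModel(
    image_uri=image_uri,
    role=role,
    env={
        "HF_MODEL_ID": "meta-llama/Meta-Llama-3-8B-Instruct",  # gated repo
        "HF_TOKEN": "<your Hugging Face token>",               # placeholder
        "SM_NUM_GPUS": "1",            # number of GPUs to shard across
        "MAX_INPUT_LENGTH": "4096",    # TGI request limits (illustrative)
        "MAX_TOTAL_TOKENS": "8192",
    },
)

# Deploy to a real-time endpoint; this takes several minutes.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

# Synchronous inference against the endpoint.
output = predictor.predict({
    "inputs": "What is Amazon SageMaker?",
    "parameters": {"max_new_tokens": 128, "temperature": 0.7},
})
print(output[0]["generated_text"])
```

When you are done, `predictor.delete_endpoint()` tears the endpoint down so you stop paying for the instance.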

I use the latest version of Hugging Face's Text Generation Inference container (TGI 2.0), and show you how to run both synchronous inference and streaming inference.
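Streaming goes through the SageMaker runtime's response-stream API: with `"stream": true` in the request, TGI emits server-sent events, and each `PayloadPart` chunk carries `data: {...}` lines holding the next generated token under `token.text`. A minimal sketch, assuming an already-deployed TGI endpoint (the endpoint name and the parsing helper are illustrative):

```python
import json


def parse_tgi_stream_chunk(raw: bytes) -> list[str]:
    """Pull generated-token texts out of one TGI server-sent-events chunk.

    TGI streams lines of the form `data: {...}`; each JSON payload holds the
    next token under token.text. (A production parser would also buffer
    partial lines split across PayloadPart boundaries.)
    """
    tokens = []
    for line in raw.decode("utf-8").splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = json.loads(line[len("data:"):].strip())
        text = payload.get("token", {}).get("text")
        if text is not None:
            tokens.append(text)
    return tokens


def stream_completion(endpoint_name: str, prompt: str) -> None:
    """Print tokens from a deployed TGI endpoint as they are generated."""
    import boto3  # needs AWS credentials; only imported for the live call

    smr = boto3.client("sagemaker-runtime")
    response = smr.invoke_endpoint_with_response_stream(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({
            "inputs": prompt,
            "parameters": {"max_new_tokens": 256},
            "stream": True,  # ask TGI for server-sent events
        }),
    )
    # The response Body is an event stream of PayloadPart chunks.
    for event in response["Body"]:
        chunk = event.get("PayloadPart", {}).get("Bytes", b"")
        for token in parse_tgi_stream_chunk(chunk):
            print(token, end="", flush=True)
```

Printing tokens as they arrive is what gives the chat-style "typing" effect, instead of waiting for the full completion as in the synchronous case.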
