Video: SLM inference on AWS Graviton4

Julien Simon
Nov 20, 2024

CPU inference? Hell yes.

In this episode, Lorenzo Winfrey, Jeff Underhill, and I discuss how there's hope beyond huge closed models and expensive GPU instances. Yes, AWS Graviton4 packs a punch, and it may well be the most cost-effective platform for SLM inference. To prove the point, I show how to quantize and run our Llama-3.1-SuperNova-Lite model on a small Graviton4 instance. You won't believe the text generation speed 😃
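
If you'd like to try something similar yourself, here's a minimal sketch using llama-cpp-python (llama.cpp is a common way to run quantized GGUF models on Arm CPUs). This is not the exact setup from the video: the model filename, the Q4_0 quantization level, the instance type, and the thread count are all assumptions for illustration.

```python
# Minimal sketch: running a quantized GGUF build of Llama-3.1-SuperNova-Lite
# on a Graviton4 instance with llama-cpp-python. The filename and settings
# below are assumptions, not the exact configuration shown in the video.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-3.1-supernova-lite-q4_0.gguf",  # hypothetical filename
    n_ctx=4096,     # context window
    n_threads=16,   # match the instance's vCPU count, e.g. a c8g.4xlarge
)

output = llm(
    "Explain why CPU inference can be cost-effective for small language models.",
    max_tokens=256,
)
print(output["choices"][0]["text"])
```

Q4_0-style quantization tends to pair well with Arm's NEON/SVE integer kernels, which is part of why small models can run surprisingly fast on Graviton-class CPUs.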
