Video: SLM inference on AWS Graviton4

Nov 20, 2024

CPU inference? Hell yes.

In this episode, Lorenzo Winfrey, Jeff Underhill, and I discuss why there’s hope beyond huge closed models and expensive GPU instances. Yes, AWS Graviton4 packs a punch and is possibly the most cost-effective platform for small language model (SLM) inference. To prove our point, I show how to quantize and run our Llama-3.1-SuperNova-Lite model on a small Graviton4 instance. You won’t believe the text generation speed 😃
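If "quantize" is a new word for you, here is a minimal sketch of the general idea in pure Python: split the weights into blocks, then store each block as small signed integers plus one floating-point scale. This is an illustration only, not the tooling or the exact format used in the video. The function names, the 4-bit symmetric scheme, and the block size of 32 are all assumptions made for the example.

```python
# Illustrative only: symmetric 4-bit block quantization in pure Python.
# The names, block size, and rounding scheme are assumptions for this sketch,
# not the method or file format used in the video.

def quantize_block(weights):
    """Map floats to signed 4-bit integers in [-8, 7] plus one float scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero block, where any scale would do.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return scale, q


def dequantize_block(scale, q):
    """Recover approximate floats from the integers and the scale."""
    return [scale * v for v in q]


def quantize(weights, block_size=32):
    """Quantize a flat list of weights block by block."""
    return [
        quantize_block(weights[i:i + block_size])
        for i in range(0, len(weights), block_size)
    ]


def dequantize(blocks):
    """Rebuild the flat list of approximate weights."""
    out = []
    for scale, q in blocks:
        out.extend(dequantize_block(scale, q))
    return out


if __name__ == "__main__":
    weights = [0.1 * i - 1.0 for i in range(40)]
    restored = dequantize(quantize(weights))
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"largest round-trip error: {worst:.4f}")
```

Each weight now needs 4 bits instead of 16 or 32, plus a small per-block overhead for the scale. That smaller memory footprint is the reason quantization matters for CPU inference.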


Written by Julien Simon
