Deploying SuperNova-Lite on Inferentia2: the best 8B model for $1 an hour!

Julien Simon
Sep 19, 2024

--

In this video, you will learn about Llama-3.1-SuperNova-Lite, the best open-source 8B model available today according to the Hugging Face Open LLM Leaderboard.

Llama-3.1-SuperNova-Lite is an 8B parameter model developed by Arcee.ai, based on the Llama-3.1-8B-Instruct architecture. It is a distilled version of the larger Llama-3.1-405B-Instruct model, leveraging offline logits extracted from the 405B parameter variant. This 8B variation of Llama-3.1-SuperNova maintains high performance while offering exceptional instruction-following capabilities and domain-specific adaptability.
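
For readers curious what "leveraging offline logits" looks like in practice, here is a minimal, illustrative sketch of logit-based distillation: the student (the 8B model) is trained to match probability distributions precomputed by the teacher (the 405B model). The function name, temperature value, and tensor shapes are assumptions for illustration only, not Arcee's actual training code.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the teacher's and student's softened distributions.

    Both tensors are assumed to have shape [batch, seq_len, vocab_size],
    with teacher_logits loaded from offline storage rather than computed live.
    """
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature**2
```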

I’ll show you how to compile the model on the fly and deploy SuperNova-Lite on a SageMaker endpoint powered by an inf2.xlarge instance, the smallest Inferentia2 instance, available at just $0.99 an hour!
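
If you'd rather read than watch, here is a minimal sketch of that deployment flow using the SageMaker Python SDK and the Hugging Face TGI Neuronx container. The environment variables, token limits, and timeout are illustrative values I've assumed for a single inf2.xlarge (2 Neuron cores), not the exact settings from the video.

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()   # IAM role used by the endpoint
session = sagemaker.Session()

# TGI container with Neuronx support: it compiles the model for Inferentia2
# at startup when no precompiled artifacts are available.
image_uri = get_huggingface_llm_image_uri("huggingface-neuronx")

# Illustrative configuration sized for inf2.xlarge (2 Neuron cores).
config = {
    "HF_MODEL_ID": "arcee-ai/Llama-3.1-SuperNova-Lite",
    "HF_NUM_CORES": "2",
    "HF_AUTO_CAST_TYPE": "fp16",
    "MAX_BATCH_SIZE": "1",
    "MAX_INPUT_TOKENS": "3686",
    "MAX_TOTAL_TOKENS": "4096",
}

model = HuggingFaceModel(
    image_uri=image_uri,
    env=config,
    role=role,
    sagemaker_session=session,
)

# On-the-fly compilation happens at container startup, so allow a generous
# health-check timeout before SageMaker marks the endpoint as failed.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.inf2.xlarge",
    container_startup_health_check_timeout=1800,
)

# Quick smoke test once the endpoint is in service.
print(predictor.predict({
    "inputs": "What is AWS Inferentia2?",
    "parameters": {"max_new_tokens": 128},
}))
```

The video walks through these same steps interactively, including watching the compilation logs and cleaning up the endpoint when you're done.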
