Julien Simon

7K Followers


Pinned

Next public talks

Here’s the current list of public events I’ll be speaking at. I will keep it as up-to-date as possible! I’m always open to speaking at public events (online or in-person). I’m also happy to explore opportunities for in-house talks. Don’t hesitate to get in touch with details at julsimon@huggingface.co. September-November…

3 min read


Nov 2

Retrieval Augmented Chatbot, part 2! LangChain, Hugging Face, Amazon SageMaker, and Amazon OpenSearch Serverless 😀

We start by deploying Mistral 7B, a state-of-the-art open-source LLM, on a SageMaker endpoint. We then load the Reuters dataset, a Hugging Face dataset of about 20,000 news articles, split the articles into smaller chunks, and embed each chunk with bge-small, a compact open-source embedding model. Next, we index the chunks into an Amazon OpenSearch Serverless vector index and query it through LangChain. Beyond the RAG demo itself, we also cover some important and often overlooked authentication and security steps for OpenSearch Serverless.
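The chunking step mentioned above can be sketched in a few lines. This is a minimal illustration, not the code from the video: the chunk size and overlap values are assumptions, and a real pipeline would chunk on token or sentence boundaries rather than raw characters.

```python
# Minimal sketch of splitting articles into fixed-size, overlapping chunks
# before embedding. Chunk size and overlap are illustrative assumptions.
def chunk_text(text, chunk_size=500, overlap=50):
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

article = "word " * 300  # stand-in for a Reuters article
chunks = chunk_text(article)
print(len(chunks), len(chunks[0]))
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk, at the cost of indexing slightly more text.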

AWS

1 min read


Oct 24

Building a Retrieval-Augmented Generation (RAG) Chatbot with LangChain, Hugging Face, and AWS

In this video, I guide you through the process of creating a Retrieval-Augmented Generation (RAG) chatbot using open-source tools and AWS services: LangChain, Hugging Face, FAISS, Amazon SageMaker, and Amazon Textract. We begin by working with PDF files in the Energy domain. Our first step involves leveraging…
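At the heart of any RAG chatbot is the retrieval step: embed the question, rank document chunks by similarity, and feed the best matches to the LLM. Here is a toy sketch of that step with made-up 3-dimensional "embeddings"; a real pipeline would use a proper embedding model and a FAISS index instead of a linear scan.

```python
# Toy retrieval: rank chunks by cosine similarity to the query embedding.
# The vectors below are invented for illustration only.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

chunks = {
    "solar output rose 12% last quarter": [0.9, 0.1, 0.2],
    "the board approved a new logo":      [0.1, 0.8, 0.3],
}
query_vec = [0.85, 0.15, 0.25]  # pretend embedding of a solar-related question
best = max(chunks, key=lambda c: cosine(chunks[c], query_vec))
print(best)
```

FAISS does exactly this ranking, but with approximate-nearest-neighbor indexes that stay fast over millions of vectors.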

AWS

1 min read


Oct 18

Maximize Hugging Face training efficiency with QLoRA

In this video, I delve into optimizing the fine-tuning of a Google FLAN-T5 model for legal text summarization. The focus is on employing QLoRA for parameter-efficient fine-tuning: all it takes is a few extra lines of simple code in your existing script. This methodology allows us to train the model with remarkable cost efficiency, utilizing even modest GPU instances, which I demonstrate on AWS with Amazon SageMaker. Tune in for a detailed exploration of the technical nuances behind this process.
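The parameter savings behind QLoRA come from training two small low-rank factors instead of each full weight matrix. A quick back-of-the-envelope calculation shows the effect; the dimensions below are illustrative, not FLAN-T5's actual layer shapes.

```python
# For a d_out x d_in weight matrix, LoRA trains two low-rank factors
# A (r x d_in) and B (d_out x r) instead of the full matrix.
# Dimensions are illustrative assumptions.
d_in, d_out, r = 1024, 1024, 8

full = d_in * d_out              # parameters updated by full fine-tuning
lora = r * d_in + d_out * r      # parameters updated by LoRA adapters
print(full, lora, full / lora)   # 64x fewer trainable parameters here
```

Since optimizer state scales with the number of trainable parameters, this reduction is what lets the fine-tuning job fit on a modest GPU.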

AWS

1 min read


Oct 10

For a fistful of dollars: fine-tune LLaMA 2 7B with QLoRA

Fine-tuning large language models doesn’t have to be complicated and expensive. In this tutorial, I provide a step-by-step demonstration of the fine-tuning process for a LLaMA 2 7-billion parameter model. Thanks to LoRA, 4-bit quantization and a modest AWS GPU instance (g5.xlarge), total cost is just a fistful of dollars 🤠 🤠 🤠
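The memory arithmetic explains why 4-bit quantization matters on a g5.xlarge, whose single A10G GPU has 24 GB of memory. Counting weights only (optimizer state and activations add more), the figures are approximate:

```python
# Rough weight-memory footprint of a 7B-parameter model.
params = 7e9
fp16_gb = params * 2 / 1e9   # 2 bytes per weight in fp16
q4_gb = params * 0.5 / 1e9   # 4 bits per weight after quantization
print(fp16_gb, q4_gb)        # 14.0 vs 3.5 GB
```

At 4 bits, the frozen base model leaves plenty of the 24 GB free for the LoRA adapters, their optimizer state, and activations.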

AWS

1 min read


Oct 9

Fine-tune Stable Diffusion with LoRA for as low as $1

Fine-tuning large models doesn’t have to be complicated or expensive. In this tutorial, I provide a step-by-step demonstration of fine-tuning a Stable Diffusion model for Pokemon image generation. Using an existing training script from the Hugging Face diffusers library, configured to apply the LoRA algorithm from the Hugging Face PEFT library, training runs on a modest AWS GPU instance (g4dn.xlarge). Thanks to EC2 Spot Instances, the total cost is as low as $1.
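A quick sanity check on how Spot pricing gets the bill down to about a dollar. Every number below is an assumption for illustration (Spot prices and discounts vary by region and time, and the training duration is a guess, not a figure from the video):

```python
# Illustrative cost estimate; all inputs are assumptions.
on_demand = 0.526          # assumed g4dn.xlarge on-demand $/hour
spot_discount = 0.70       # assume roughly 70% off with Spot
hours = 6                  # assumed training duration
cost = on_demand * (1 - spot_discount) * hours
print(round(cost, 2))      # on the order of $1
```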

AWS

1 min read


Oct 8

Azure ML: start experimenting with Hugging Face models in minutes!

In this video, I show you how to deploy Hugging Face models in one click on Azure, thanks to the model catalog in Azure ML Studio. Then, I run a small Python example to predict with the model. To get started, you simply need to navigate to the Azure ML Studio website and open the model catalog. Then, you can click on a model to select it. This will initiate the setup process, which takes care of all the required infrastructure for you. Once the setup is complete, Azure ML Studio provides a sample program and you can start testing the model immediately!
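Once the endpoint is up, predicting is an authenticated HTTP POST. Here is a hypothetical sketch of building such a request with the standard library; the URL, key, and payload schema are placeholders, and the sample program Azure ML Studio generates shows the exact format for each model.

```python
# Hypothetical request to an Azure ML online endpoint.
# Scoring URI, key, and payload schema are placeholders.
import json
import urllib.request

def build_request(scoring_uri, api_key, inputs):
    body = json.dumps({"input_data": inputs}).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

req = build_request("https://example.invalid/score", "KEY", ["Hello world"])
print(json.loads(req.data))  # {'input_data': ['Hello world']}
# urllib.request.urlopen(req) would send it to a real endpoint.
```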

Azure

1 min read


Oct 8

SageMaker JumpStart: start experimenting with large language models in minutes!

Experimenting with the latest and greatest models doesn’t have to be difficult. With SageMaker JumpStart, you can easily access and experiment with cutting-edge large language models without the hassle of setting up complex infrastructure or writing deployment code. All it takes is a single click. In this particular video, I…

AWS

1 min read


Sep 15

Accelerating Stable Diffusion with Optimum Neuron and AWS Inferentia2

In this video, I show you how to accelerate Stable Diffusion and Stable Diffusion XL inference with the Hugging Face Optimum Neuron library and AWS Inferentia 2. A few lines of code is all it takes, and of course, we run some benchmarks.

Hugging Face

1 min read


May 17

Video: Transformer training shootout, part 2: AWS Trainium vs. NVIDIA V100

In this video, I compare the cost/performance of AWS Trainium with the NVIDIA V100 GPU. I first launch a trn1.32xlarge instance (16 Trainium chips) and a p3dn.24xlarge (8 V100s). Then, I run 3 benchmarks: language pretraining with GPT2, token classification with BERT Large, and image classification with the Vision Transformer. The results? Trainium is 2 to 5x faster, and 3 to 8x cheaper!
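The "3 to 8x cheaper" figure follows directly from the speedups and the hourly prices of the two instances. The prices below are assumed on-demand us-east-1 figures for illustration, not quotes from the video:

```python
# Reconstructing the cost ratio from speedup and hourly price.
trn1_price = 21.50    # assumed trn1.32xlarge $/hour
p3dn_price = 31.22    # assumed p3dn.24xlarge $/hour

for speedup in (2, 5):   # Trainium is 2-5x faster across the benchmarks
    # Finishing a job 'speedup' times faster on a cheaper instance compounds:
    cheaper = speedup * p3dn_price / trn1_price
    print(f"{speedup}x faster -> {cheaper:.1f}x cheaper")
```

With these assumptions, the 2x and 5x speedups work out to roughly 2.9x and 7.3x lower cost per job, consistent with the 3-8x range quoted above.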

Deep Learning

1 min read


Chief Evangelist, Hugging Face - Follow me on Substack at https://julsimon.substack.com/

Following
  • Netflix Technology Blog
  • Adrian Hornsby
  • Danilo Poccia
  • Jeff Barr
  • Less Wright
