In this deep-dive video, we zoom in on two popular techniques for parameter-efficient fine-tuning: LoRA/QLoRA and Spectrum.
We discuss their mathematical foundations in detail, including Singular Value Decomposition (SVD). Then we look at benchmarks on two popular Small Language Models, Mistral-7B and Llama-3.1-8B. We conclude that Spectrum is the better choice in terms of both training speed and model quality, and that it is even competitive with the accuracy of full fine-tuning.
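To give a flavor of the SVD intuition behind LoRA discussed in the video, here is a minimal NumPy sketch (not code from the video): LoRA parameterizes a weight update as a low-rank product, and SVD shows why a small rank can already capture most of a matrix, since truncating to the top-r singular values gives the best rank-r approximation.

```python
import numpy as np

# Illustrative sketch: how much of a matrix a rank-r factorization captures.
# LoRA learns delta_W = B @ A with small inner rank r; SVD truncation is
# the optimal such factorization (Eckart-Young theorem).
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))  # stand-in for a weight matrix

U, S, Vt = np.linalg.svd(W, full_matrices=False)

def rank_r_approx(r):
    # Keep only the top-r singular values/vectors.
    return (U[:, :r] * S[:r]) @ Vt[:r, :]

for r in (4, 16, 64):
    err = np.linalg.norm(W - rank_r_approx(r)) / np.linalg.norm(W)
    print(f"rank {r:2d}: relative reconstruction error {err:.3f}")
```

For a random matrix the error decays slowly with r, but trained weight updates tend to have a much steeper singular-value spectrum, which is the empirical justification for training only a low-rank adapter.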