Demo: audio classification with the Audio Spectrogram Transformer

Multi-modal transformers are rising fast. A great example is the Audio Spectrogram Transformer (AST), an audio classification model that was just added to the Hugging Face Transformers library. The model first converts an audio clip into a spectrogram, then treats that spectrogram as an image and classifies it with a Vision Transformer architecture. The results are impressive!
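To make the "audio clip to image to patches" idea concrete, here is a minimal NumPy sketch, not the model's actual preprocessing. It assumes a 16 kHz clip, a 25 ms Hamming window with a 10 ms hop, and non-overlapping 16x16 patches. The linked paper describes log-mel filterbank features and overlapping patches; the plain log-magnitude FFT below is a simplification to keep the example dependency-free, and the function names are hypothetical.

```python
import numpy as np


def log_spectrogram(waveform, sample_rate=16000, win_ms=25, hop_ms=10):
    """Return a log-magnitude spectrogram of shape (frames, frequency_bins)."""
    win = int(sample_rate * win_ms / 1000)  # samples per analysis window
    hop = int(sample_rate * hop_ms / 1000)  # samples between window starts
    window = np.hamming(win)
    n_frames = 1 + (len(waveform) - win) // hop
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack(
        [waveform[i * hop : i * hop + win] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + 1e-6)  # small constant avoids log(0)


def to_patches(spec, patch=16):
    """Cut a 2D spectrogram into non-overlapping patch x patch tiles,
    flattening each tile into one vector (one 'token' per tile)."""
    t = (spec.shape[0] // patch) * patch  # drop the ragged edges
    f = (spec.shape[1] // patch) * patch
    tiles = spec[:t, :f].reshape(t // patch, patch, f // patch, patch)
    return tiles.transpose(0, 2, 1, 3).reshape(-1, patch * patch)


# Stand-in input (assumption): one second of a 440 Hz tone instead of speech.
sample_rate = 16000
time_axis = np.arange(sample_rate) / sample_rate
clip = np.sin(2 * np.pi * 440 * time_axis).astype(np.float32)

spec = log_spectrogram(clip, sample_rate)
patches = to_patches(spec)
print(spec.shape, patches.shape)
```

Each row of `patches` plays the role of one input token for the transformer. The real model adds a learned projection, position embeddings, and a classification head on top.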

✅ Spaces demo: https://huggingface.co/spaces/juliensimon/keyword-spotting
✅ Model: https://huggingface.co/MIT/ast-finetuned-speech-commands-v2
✅ Paper: https://arxiv.org/abs/2104.01778
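If you would rather try the model locally than in the Space, the sketch below shows one way to do it with the Transformers `audio-classification` pipeline. It assumes `transformers` and a PyTorch backend are installed, and that clips are sampled at 16 kHz. The `classify` helper and the synthetic tone are illustrative stand-ins, not code from the demo; in practice you would pass a real recording.

```python
import numpy as np

MODEL_ID = "MIT/ast-finetuned-speech-commands-v2"  # checkpoint linked above
SAMPLE_RATE = 16000  # assumption: 16 kHz mono audio


def classify(waveform, sample_rate=SAMPLE_RATE, top_k=3):
    """Return the top_k predicted labels and scores for a 1D float waveform."""
    # Imported here so the rest of the file works without transformers installed.
    from transformers import pipeline

    # The first call downloads the model weights from the Hugging Face Hub.
    classifier = pipeline("audio-classification", model=MODEL_ID)
    return classifier({"raw": waveform, "sampling_rate": sample_rate}, top_k=top_k)


# Placeholder clip (assumption): one second of a 440 Hz tone.
# Replace it with a real recording of a spoken keyword.
time_axis = np.arange(SAMPLE_RATE) / SAMPLE_RATE
clip = np.sin(2 * np.pi * 440 * time_axis).astype(np.float32)
```

Calling `classify(clip)` returns a list of dictionaries, each with a `label` and a `score` key. A pure tone is not a spoken keyword, so expect meaningful predictions only on real speech.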

Chief Evangelist, Hugging Face (https://huggingface.co)
