Yet another 10 Deep Learning projects based on Apache MXNet
#1 — Dual Path Networks
This is an implementation of the architecture described in the paper of the same name by Yunpeng Chen, Jianan Li, Huaxin Xiao, Xiaojie Jin, Shuicheng Yan and Jiashi Feng.
This architecture won the ImageNet 2017 object localization competition with a top-5 error of 6.22%.
Quoting from the paper: “On the ImageNet-1k dataset, a shallow DPN surpasses the best ResNeXt-101(64x4d) with 26% smaller model size, 25% less computational cost and 8% lower memory consumption, and a deeper DPN (DPN-131) further pushes the state-of-the-art single model performance with about 2 times faster training speed”.
#2 — Squeeze-and-Excitation Networks
This is an implementation of the architecture described in the paper of the same name by Jie Hu, Li Shen and Gang Sun.
This architecture won the ImageNet 2017 classification competition with a top-5 error of 2.251%.
SENet.mxnet — :fire::fire: A MXNet implementation of Squeeze-and-Excitation Networks (SE-ResNext, SE-Resnet…
#3 — Capsule Networks (Symbolic API)
This project implements the CapsNet architecture presented in the “Dynamic Routing Between Capsules” paper by Sara Sabour, Nicholas Frosst, and Geoffrey E Hinton. In a nutshell, capsule networks are an exciting new development designed to overcome the limitations of convolutional neural networks.
This code achieves 99.71% accuracy on the MNIST dataset, which is in line with the scores reported in the paper.
#4 — Capsule Networks (Gluon API)
This project also implements the CapsNet architecture, but it does so using the imperative Gluon API (here’s an introduction to Gluon if you’re not familiar with it).
This implementation achieves 99.53% accuracy on MNIST, which the author suggests could be improved by adding more data augmentation.
#5 — MobileNets
This is an implementation of the architecture described in “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto and Hartwig Adam.
Quoting from the paper: MobileNets are “a class of efficient models (…) for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depth-wise separable convolutions to build light weight deep neural networks”.
A model pre-trained on ImageNet is provided, with a top-5 accuracy of 90.15%.
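To get a feel for why depth-wise separable convolutions make a network lighter, a quick weight count helps. The snippet below is only an illustrative sketch, not code from the project: the function names and the layer sizes (3x3 kernel, 256 input and output channels) are assumptions picked for the example, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depth-wise k x k convolution followed by a 1 x 1 point-wise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer, chosen only for illustration.
k, c_in, c_out = 3, 256, 256

standard = standard_conv_params(k, c_in, c_out)
separable = separable_conv_params(k, c_in, c_out)
ratio = separable / standard

print(standard)          # 589824
print(separable)         # 67840
print(round(ratio, 3))   # 0.115
```

For this layer, the separable version needs roughly 11.5% of the weights. In general the ratio works out to 1/c_out + 1/k², so the saving grows with the number of output channels and the kernel size.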
#6 — Face Recognition
This is an implementation of the architecture described in “ArcFace: Additive Angular Margin Loss for Deep Face Recognition” by Jiankang Deng, Jia Guo, and Stefanos Zafeiriou.
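The "additive angular margin" in the title can be sketched in a few lines. This is a toy, pure-Python illustration and not the project's code: the function names are hypothetical, the cosine values are made up, and the scale `s=64` and margin `m=0.5` are assumed values. The idea is that the logit of the correct class is computed from cos(θ + m) instead of cos(θ), which makes the correct class harder to score and pushes the network towards more discriminative embeddings.

```python
import math


def arcface_logits(cosines, target, s=64.0, m=0.5):
    """Scale cosine similarities, adding an angular margin m to the target class.

    cosines: cos(theta_j) between a normalized embedding and each class weight.
    Note: this sketch ignores the edge case where theta + m exceeds pi.
    """
    logits = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # guard acos against rounding errors
        if j == target:
            theta = math.acos(c)
            logits.append(s * math.cos(theta + m))
        else:
            logits.append(s * c)
    return logits


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single sample."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


# Made-up similarities between one face embedding and three identities.
cosines = [0.8, 0.3, -0.1]

plain_loss = cross_entropy([64.0 * c for c in cosines], target=0)
margin_logits = arcface_logits(cosines, target=0)
margin_loss = cross_entropy(margin_logits, target=0)

print(plain_loss < margin_loss)  # True: the margin makes the same sample "harder"
```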
#7 — Speech to Text
This is an implementation of the architecture described in “Deep Speech 2: End-to-End Speech Recognition in English and Mandarin” by Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Yi Wang, Zhiqian Wang, Chong Wang, Bo Xiao, Dani Yogatama, Jun Zhan and Zhenyao Zhu (phew!).
This is a great project if you want to build a speech-to-text model. Please bear in mind that you will need a very large dataset. Quoting from the original paper: “our English speech system is trained on 11,940 hours of speech, while the Mandarin system is trained on 9,400 hours. We use data synthesis to further augment the data during training”.
deepspeech.mxnet - A MXNet implementation of Baidu's DeepSpeech architecture
#8 — 3D face reconstruction
This is an implementation of the architecture described in “End-to-end 3D face reconstruction with deep neural networks” by Pengfei Dou, Shishir K. Shah and Ioannis A. Kakadiaris.
Thanks to this project, you can build a 3D model of a face using only a single 2D image. Quite impressive!
mxnet-E2FAR - MXNET/Gluon Implementation of End-to-end 3D Face Reconstruction with Deep Neural Networks
#9 — Deepo
Deepo is a set of pre-built containers for Deep Learning. It supports MXNet as well as other frameworks. Containers will run on Linux (CPU/GPU), Windows (CPU) and macOS (CPU) with either Python 2.7 or Python 3.6.
deepo - A series of Docker images (and their generator) that allows you to quickly set up your deep learning research…
#10 — MXNet finetuner
This tool simplifies the process of fine-tuning an image classification model on your own dataset (here’s an introduction to fine-tuning if you’re unfamiliar with this technique).
It will automatically build RecordIO files from a tree of images, download pre-trained models, replace the last layer according to the number of classes in your dataset, add data augmentation, run fine-tuning, visualize results, etc.
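To make the first of these steps more concrete, here is a minimal sketch of how a tree of images (one sub-directory per class) can be turned into a tab-separated list of `index`, `label` and `path`, which is the kind of list file MXNet's `im2rec` tool takes as input when building RecordIO files. This is not the finetuner's actual code: the `build_image_list` helper, the accepted file extensions and the tiny fake dataset are all assumptions made for the example. The number of sub-directories is also the number of classes the new last layer would need.

```python
import os
import tempfile

# Assumed set of extensions; adjust to your own dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def build_image_list(root):
    """Return (class names, list lines) for a tree laid out as root/<class>/<image>."""
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    lines = []
    for label, class_name in enumerate(classes):
        class_dir = os.path.join(root, class_name)
        for file_name in sorted(os.listdir(class_dir)):
            if os.path.splitext(file_name)[1].lower() in IMAGE_EXTENSIONS:
                rel_path = os.path.join(class_name, file_name)
                # One line per image: index <TAB> label <TAB> relative path
                lines.append(f"{len(lines)}\t{label}\t{rel_path}")
    return classes, lines


# Build a tiny fake dataset (empty files) just to exercise the function.
fake_tree = {"cats": ["a.jpg", "b.png"], "dogs": ["c.jpg", "notes.txt"]}
with tempfile.TemporaryDirectory() as root:
    for class_name, files in fake_tree.items():
        os.makedirs(os.path.join(root, class_name))
        for file_name in files:
            open(os.path.join(root, class_name, file_name), "wb").close()
    classes, lines = build_image_list(root)

num_classes = len(classes)  # size of the replacement output layer
print(num_classes)
print("\n".join(lines))
```

Non-image files such as `notes.txt` are skipped, and labels follow the alphabetical order of the class directories.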
That’s it for today. Kudos to all project authors for their fascinating work. I hope they will inspire you to get started with Deep Learning.
As always, thanks a lot for reading!
One of the most addictive albums I’ve heard in years. Listen once, sing it forever \m/