Yet another 10 Deep Learning projects based on Apache MXNet

Julien Simon
Feb 25, 2018


In previous articles, I listed 10 Deep Learning projects based on Apache MXNet… and then 10 more… and what do you know, here is another batch!

Oh, we’ll get there… eventually.


#1 — Dual Path Networks

This is an implementation of the architecture described in the paper of the same name by Yunpeng Chen, Jianan Li, Huaxin Xiao, Xiaojie Jin, Shuicheng Yan and Jiashi Feng.

This architecture won the ImageNet 2017 object localization competition with a top-5 error of 6.22%.

Quoting from the paper: “On the ImageNet-1k dataset, a shallow DPN surpasses the best ResNeXt-101(64x4d) with 26% smaller model size, 25% less computational cost and 8% lower memory consumption, and a deeper DPN (DPN-131) further pushes the state-of-the-art single model performance with about 2 times faster training speed”.

#2 — Squeeze-and-Excitation Networks

This is an implementation of the architecture described in the paper of the same name by Jie Hu, Li Shen and Gang Sun.

This architecture won the ImageNet 2017 classification competition with a top-5 error of 2.251%.
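If you're wondering what "top-5 error" actually measures: a prediction counts as correct when the true label is among the five classes the model scores highest. Here is a minimal sketch in plain Python. The function name and the toy scores are mine, purely for illustration, and have nothing to do with the competition's evaluation code.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is NOT among the k best-scored classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy data: 3 samples, 6 classes (made-up numbers).
scores = [
    [0.10, 0.50, 0.20, 0.04, 0.06, 0.10],  # true class 1 is ranked first
    [0.25, 0.05, 0.10, 0.12, 0.08, 0.40],  # true class 0 is ranked second
    [0.00, 0.05, 0.10, 0.20, 0.30, 0.35],  # true class 0 is ranked last
]
labels = [1, 0, 0]

print("top-1 error:", top_k_error(scores, labels, k=1))
print("top-5 error:", top_k_error(scores, labels, k=5))
```

On this toy data, two of the three samples are wrong at top-1, but only one is wrong at top-5, which is why top-5 numbers always look friendlier.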

#3 — Capsule Networks (Symbolic API)

This project implements the CapsNet architecture presented in the “Dynamic Routing Between Capsules” paper by Sara Sabour, Nicholas Frosst, and Geoffrey E. Hinton. In a nutshell, capsule networks are an exciting new development designed to overcome the limitations of convolutional neural networks.

This code achieves 99.71% accuracy on the MNIST dataset, which is in line with the scores reported in the paper.

#4 — Capsule Networks (Gluon API)

This project also implements the CapsNet architecture, but it does so using the imperative Gluon API (here’s an introduction to Gluon if you’re not familiar with it).

This implementation achieves 99.53% accuracy on MNIST, which the author suggests could be improved by adding more data augmentation.

#5 — MobileNets

This is an implementation of the architecture described in “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications” by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto and Hartwig Adam.

Quoting from the paper: MobileNets are “a class of efficient models (…) for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depth-wise separable convolutions to build light weight deep neural networks”.
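To get a feel for why depth-wise separable convolutions are so much lighter, here is a back-of-the-envelope parameter count. It compares a standard convolution with its separable equivalent (a per-channel depth-wise filter followed by a 1x1 point-wise convolution). Biases are ignored, and the layer shape (3x3 kernel, 256 channels in and out) is just an assumption picked for illustration.

```python
def standard_conv_params(kernel, in_channels, out_channels):
    # One kernel x kernel filter per (input channel, output channel) pair.
    return kernel * kernel * in_channels * out_channels


def depthwise_separable_params(kernel, in_channels, out_channels):
    # Depth-wise step: one kernel x kernel filter per input channel.
    depthwise = kernel * kernel * in_channels
    # Point-wise step: a 1x1 convolution that mixes channels.
    pointwise = in_channels * out_channels
    return depthwise + pointwise


std = standard_conv_params(3, 256, 256)        # 589,824 weights
sep = depthwise_separable_params(3, 256, 256)  # 67,840 weights

print("standard: ", std)
print("separable:", sep)
print("ratio:    ", sep / std)  # equals 1/out_channels + 1/kernel**2
```

For this layer, the separable version needs roughly 11.5% of the weights of the standard one.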

A model pre-trained on ImageNet is provided, with a top-5 accuracy of 90.15%.


#6 — Face Recognition

This is an implementation of the architecture described in “ArcFace: Additive Angular Margin Loss for Deep Face Recognition” by Jiankang Deng, Jia Guo, and Stefanos Zafeiriou.

InsightFace is a new face recognition method, which achieves state-of-the-art scores of 99.80%+ on LFW and 98%+ on Megaface.
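The "additive angular margin" in the paper title boils down to a small change in the softmax logits: for the true class, the angle between the embedding and the class weight is increased by a margin before taking the cosine, which makes the training objective harder and pushes classes apart. Below is a simplified sketch in plain Python, not the project's code. The function names are hypothetical, the margin (0.5) and scale (64) are values I'm assuming as typical defaults, and real implementations add extra handling for angles close to π.

```python
import math


def arcface_logits(cosines, target, margin=0.5, scale=64.0):
    """cosines[j] is cos(theta_j) between a normalized embedding and class j's weight."""
    logits = []
    for j, cos_t in enumerate(cosines):
        if j == target:
            # Add the angular margin to the true class only.
            theta = math.acos(max(-1.0, min(1.0, cos_t)))
            cos_t = math.cos(theta + margin)
        logits.append(scale * cos_t)
    return logits


def cross_entropy(logits, target):
    # Numerically stable log-sum-exp minus the target logit.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


cosines = [0.8, 0.1, -0.2]  # made-up similarities, class 0 is the true class
plain = cross_entropy(arcface_logits(cosines, 0, margin=0.0), 0)
with_margin = cross_entropy(arcface_logits(cosines, 0, margin=0.5), 0)

print("loss without margin:", plain)
print("loss with margin:   ", with_margin)
```

The loss with the margin is larger for the same embedding, so the network has to produce a tighter angle to the true class to reach the same loss.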

#7 — Speech to Text

This is an implementation of the architecture described in “Deep Speech 2: End-to-End Speech Recognition in English and Mandarin” by Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Yi Wang, Zhiqian Wang, Chong Wang, Bo Xiao, Dani Yogatama, Jun Zhan and Zhenyao Zhu (phew!).

This is a great project if you want to build a speech-to-text model. Please bear in mind that you will need a very large dataset. Quoting from the original paper: “our English speech system is trained on 11,940 hours of speech, while the Mandarin system is trained on 9,400 hours. We use data synthesis to further augment the data during training”.

#8 — 3D face reconstruction

This is an implementation of the architecture described in “End-to-end 3D face reconstruction with deep neural networks” by Pengfei Dou, Shishir K. Shah and Ioannis A. Kakadiaris.

Thanks to this project, you can build a 3D model of a face using only a single 2D image. Quite impressive!

Examples taken from the original paper


#9 — Deepo

Deepo is a set of pre-built containers for Deep Learning. It supports MXNet as well as other frameworks. Containers will run on Linux (CPU/GPU), Windows (CPU) and macOS (CPU) with either Python 2.7 or Python 3.6.

This is pretty handy if you want to work locally, and of course on AWS with one of our Docker services: ECS, EKS or Fargate.

#10 — MXNet finetuner

This tool simplifies the process of fine-tuning an image classification model on your own dataset (here’s an introduction to fine-tuning if you’re unfamiliar with this technique).

It will automatically build RecordIO files from a tree of images, download pre-trained models, replace the last layer according to the number of classes in your dataset, add data augmentation, run fine-tuning, visualize results, etc.
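To give an idea of what the first step involves, here is a rough sketch of turning a tree of images into labeled entries. This is not the tool's code: the function names and the `root/<class_name>/<image>` layout are my assumptions. The tab-separated index / label / path lines follow the list-file format that MXNet's im2rec utility reads before packing images into RecordIO files.

```python
import os

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def build_image_list(root):
    """Scan root/<class_name>/<image> and return (classes, entries).

    Each entry is (index, label, relative_path); labels follow the
    alphabetical order of the class directories.
    """
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    entries = []
    for label, class_name in enumerate(classes):
        class_dir = os.path.join(root, class_name)
        for file_name in sorted(os.listdir(class_dir)):
            # Skip anything that does not look like an image.
            if os.path.splitext(file_name)[1].lower() in IMAGE_EXTENSIONS:
                path = os.path.join(class_name, file_name)
                entries.append((len(entries), label, path))
    return classes, entries


def format_lst(entries):
    # One "index<TAB>label<TAB>path" line per image.
    return "\n".join(f"{i}\t{label}\t{path}" for i, label, path in entries)
```

Calling `build_image_list` on your image directory gives you the class names as well as the entries, and `len(classes)` is exactly the number you need when sizing the new last layer for fine-tuning.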

Good stuff!

That’s it for today. Kudos to all project authors for their fascinating work. I hope they will inspire you to get started with Deep Learning.

As always, thanks a lot for reading!

One of the most addictive albums I’ve heard in years. Listen once, sing it forever \m/