Apache MXNet support in Keras

Julien Simon
3 min read · Jun 8, 2017

Pretty exciting news last week: François Chollet, author of the popular Keras Deep Learning library, announced that Apache MXNet is now available as a backend, alongside TensorFlow and Theano.

This is still beta code, but that won’t stop us from exploring, will it? :)

A word about Keras

In their own words, Keras is “a high-level neural networks API, written in Python and developed with a focus on enabling fast experimentation”.

Indeed, Keras is pretty popular in the Deep Learning community, no doubt because it provides a higher-level API that lets beginners and seasoned practitioners alike quickly build and train networks without messing with low-level details.

Keras is also the main tool used in the equally popular — and highly recommended — Deep Learning course by fast.ai.

Keras backends

In a nutshell, Keras is a wrapper around Deep Learning backends. Initially, it supported TensorFlow and Theano, but sure enough, the community asked for MXNet support.

Lo and behold, the good folks at DMLC got to work and MXNet support is now available in their forked repository. If you’re curious about the Keras-to-MXNet API, it’s here. Well done, guys!

Setting up Keras with MXNet

For the next steps, I’ll be using a g2.2xlarge instance on AWS, running the very latest Deep Learning AMI, Ubuntu edition (lots of updates!).

I’m assuming that you already have MXNet installed: if not, please read this.
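A quick sanity check never hurts:

```python
# Check that MXNet itself is installed and importable.
import mxnet as mx
print(mx.__version__)
```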

First, we need to install the forked Keras version. Please use virtualenv if you’re concerned about messing up your Python environment.
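Something along these lines should do it (my own commands, assuming you’re happy installing straight from the fork):

```bash
# Install the DMLC fork of Keras (still beta).
# Do this inside a virtualenv if you want to keep your environment clean.
git clone https://github.com/dmlc/keras.git
cd keras
sudo pip install .
```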

Then, we need to declare MXNet as the backend for Keras. Let’s edit ~/.keras/keras.json.
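Mine looks like this. The backend field is the one that matters; the other values are the Keras 1 defaults, and since MXNet is channels-first like Theano, I’m assuming ‘th’ ordering is the right choice:

```json
{
    "backend": "mxnet",
    "image_dim_ordering": "th",
    "epsilon": 1e-07,
    "floatx": "float32"
}
```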

Now, let’s start Python and make sure our setup is correct.
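If everything is wired up properly, the import message should name MXNet (output paraphrased from my own session, so the exact wording may differ):

```python
>>> import keras
Using MXNet backend.
>>> import keras.backend as K
>>> K.backend()
'mxnet'
```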

Looking good. As far as I can see, the fork is based on Keras v1: until it’s merged with the upstream repository, Keras v2 features won’t be available.

Let’s run a quick example. We’ve talked at length about the MNIST dataset, so how about we use it again? You’ll find the Keras code here, and it’s pretty interesting to compare it to its MXNet equivalent. C’mon, isn’t the MXNet version neater? ;)
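For reference, here’s a condensed sketch of that Keras code (based on the classic mnist_mlp example from Keras 1, trimmed by me, not the verbatim file):

```python
# A simple fully-connected network trained on MNIST, Keras 1 style.
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop
from keras.utils import np_utils

nb_classes = 10

# Load the images, flatten them to 784-dim vectors, scale pixels to [0, 1].
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(60000, 784).astype('float32') / 255
X_test = X_test.reshape(10000, 784).astype('float32') / 255

# One-hot encode the labels.
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()
model.add(Dense(512, input_shape=(784,), activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(nb_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])

# Keras 1 uses nb_epoch rather than epochs.
model.fit(X_train, Y_train, batch_size=128, nb_epoch=10,
          verbose=1, validation_data=(X_test, Y_test))

score = model.evaluate(X_test, Y_test, verbose=0)
print('Test accuracy:', score[1])
```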

All right, that worked. So there you go: MXNet on Keras!

GPU training

The previous example ran on the instance’s CPU. Although Keras does support GPU training, the setup is unfortunately backend-dependent, and multi-GPU support is awkward at best (long discussion here). Hopefully, this will be fixed elegantly in future releases.
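For instance, the TensorFlow backend grabs a GPU automatically when one is visible, while Theano wants an environment flag (the usual incantation; the script name is just an example):

```bash
# Theano backend: select the GPU via THEANO_FLAGS.
THEANO_FLAGS=device=gpu,floatX=float32 python mnist_mlp.py
```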

When it comes to MXNet, GPU support doesn’t seem to be complete right now. GPUs are not detected automatically (as they are with the TensorFlow backend) and I couldn’t find a reliable way to enable a GPU context.

Code is definitely in place: you’ll find the relevant snippet in https://github.com/dmlc/keras/blob/master/keras/engine/training.py, where a context parameter seems to be handled.

This should allow me to modify the MNIST example above to use a GPU context.
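Here’s the change I tried (hypothetical usage: I’m assuming compile() forwards a context argument to the underlying MXNet module, which is what the code above suggests, but I couldn’t confirm the exact API):

```python
import mxnet as mx

# Assumed API, not confirmed: pass an MXNet GPU context through compile().
# `model` and RMSprop come from the MNIST example above.
# This is the modified call that segfaults for me below.
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'],
              context=[mx.gpu(0)])
```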

But the damn thing segfaults when I run the modified example :-/ Maybe it hasn’t been fully tested on the latest AMI, who knows. Or maybe I’m too dumb to figure it out: that’s always a strong possibility. Or too impatient. Please reach out if you have the answer :)

I’m sure this will be working soon enough. I’ll keep you posted!

Next steps

Having MXNet support in Keras is great news. Thanks again to everyone involved.

I can’t wait to get proper GPU and multi-GPU support for all backends. I think we’d all love a solution as simple and elegant as the MXNet way, i.e. context=(mx.gpu(0), mx.gpu(1), mx.gpu(2)). It would definitely allow Keras users to enjoy the near-linear scalability of MXNet while using their existing Keras code.
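For the record, here’s what that idiom looks like with MXNet’s own Module API (a minimal sketch, assuming a symbol net and a data iterator train_iter already exist):

```python
import mxnet as mx

# Multi-GPU training in native MXNet: just list the contexts.
# `net` (a Symbol) and `train_iter` (a DataIter) are assumed to exist.
module = mx.mod.Module(symbol=net,
                       context=[mx.gpu(0), mx.gpu(1), mx.gpu(2)])
module.fit(train_iter, num_epoch=10)
```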

And once this is available, I’ll run some more benchmarks ;)

That’s it for today. Thanks for reading.
