An introduction to the MXNet API — part 2
In part 1, we covered some MXNet basics and then discussed the NDArray API (tl;dr: NDArrays are where we store data, parameters and so on).
Now that we’ve got data covered, it’s time to look at how MXNet defines computation steps.
Computation steps? You mean code, right?
That’s a fair question! Haven’t we all learned that “program = data structures + code”? NDArrays are our data structures, let’s just add code!
Well yes, we could do that. We’d have to define all the steps explicitly and run them sequentially on our data. This is called “imperative programming” and it’s how Fortran, Pascal, C, C++ and so on work. Nothing wrong with that.
However, neural networks are intrinsically parallel beasts: inside a given layer, all outputs can be computed simultaneously. Independent layers could also run in parallel. So, in order to get good performance, we’d have to implement parallel processing ourselves using multithreading or something similar. We know how that usually works out. And even if we got the code right, how reusable would it be if data size or network layout kept changing?
Fortunately, there is an alternative.
Dataflow programming
“Dataflow programming” is a flexible way of defining parallel computation, where data flows through a graph. The graph defines the order of operations, i.e. whether they need to be run sequentially or whether they may be run in parallel. Each operation is a black box: we only define its input and output, without specifying its actual behaviour.
This might sound like Computer Science mumbo jumbo, but this model is exactly what we need to define neural networks: let input data flow through an ordered sequence of operations called “layers”, with each layer running many instructions in parallel.
Enough talk. Let’s look at an example. This is how we would define E as (A*B) + (C*D): a graph where two multiplication nodes feed into a single addition node.
What A, B, C and D are is irrelevant at this point. They are symbols.
No matter what the inputs are (integers, vectors, matrices, etc.), this graph tells us how to compute the output value — provided that operations “+” and “*” are defined.
This graph also tells us that (A*B) and (C*D) can be computed in parallel.
Of course, MXNet will use this information for optimisation purposes.
The Symbol API
So now we know why these things are called symbols (not a minor victory!). Let’s see if we can code the example above.
>>> import mxnet as mx
>>> a = mx.symbol.Variable('A')
>>> b = mx.symbol.Variable('B')
>>> c = mx.symbol.Variable('C')
>>> d = mx.symbol.Variable('D')
>>> e = (a*b)+(c*d)
See? This is perfectly valid code. We can assign a result to e without knowing what a, b, c and d are. Let’s keep going.
>>> (a,b,c,d)
(<Symbol A>, <Symbol B>, <Symbol C>, <Symbol D>)
>>> e
<Symbol _plus1>
>>> type(e)
<class 'mxnet.symbol.Symbol'>
a, b, c and d are symbols which we explicitly declared. e is different: it is a symbol as well, but one that is the result of a ‘+’ operation. Let’s try to learn more about e.
>>> e.list_arguments()
['A', 'B', 'C', 'D']
>>> e.list_outputs()
['_plus1_output']
>>> e.get_internals().list_outputs()
['A', 'B', '_mul0_output', 'C', 'D', '_mul1_output', '_plus1_output']
What this tells us is that:
- e depends on variables a, b, c and d,
- the operation that computes e is a sum,
- e is indeed (a*b)+(c*d).
Of course, we can do much more with symbols than ‘+’ and ‘*’. Just like for NDArrays, a lot of operations are defined (math, formatting, etc.). You should take some time to explore the API.
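For instance, here’s a quick sketch (not part of the example above) using a couple of the other operators that the Symbol API provides, such as mx.symbol.dot and mx.symbol.exp:
>>> x = mx.symbol.Variable('X')
>>> y = mx.symbol.Variable('Y')
>>> # element-wise exponential of X, plus the matrix product of X and Y
>>> z = mx.symbol.exp(x) + mx.symbol.dot(x, y)
>>> z.list_arguments()
['X', 'Y']
Just like e above, z is only a description of a computation: nothing actually runs until we bind data to X and Y.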
So now we know how to define our computing steps. Let’s see how we can apply them to actual data.
Binding NDArrays and Symbols
Applying computing steps defined with Symbols to data stored in NDArrays requires an operation called ‘binding’, i.e. assigning an NDArray to each input variable of the graph.
Let’s continue with the example above. Here, I’d like to set ‘A’ to 1, ‘B’ to 2, ‘C’ to 3 and ‘D’ to 4, which is why I’m creating four NDArrays, each holding a single integer.
>>> import numpy as np
>>> a_data = mx.nd.array([1], dtype=np.int32)
>>> b_data = mx.nd.array([2], dtype=np.int32)
>>> c_data = mx.nd.array([3], dtype=np.int32)
>>> d_data = mx.nd.array([4], dtype=np.int32)
Next, I’m binding each NDArray to its corresponding Symbol. Please note that I have to select the context (CPU or GPU) where execution will take place.
>>> executor=e.bind(mx.cpu(), {'A':a_data, 'B':b_data, 'C':c_data, 'D':d_data})
>>> executor
<mxnet.executor.Executor object at 0x10da6ec90>
Now, it’s time to let our input data flow through the graph in order to get a result: the forward() function will get things going. It returns a list of NDArrays, because a graph could have multiple outputs. Here, we have a single output, holding the value 14, which is reassuringly equal to (1*2)+(3*4).
>>> e_data = executor.forward()
>>> e_data
[<NDArray 1 @cpu(0)>]
>>> e_data[0]
<NDArray 1 @cpu(0)>
>>> e_data[0].asnumpy()
array([14], dtype=int32)
Let’s apply the same graph to four 1000 x 1000 matrices filled with random floats between 0 and 1. All we have to do is define new input data: binding and computing are identical.
>>> a_data = mx.nd.uniform(low=0, high=1, shape=(1000,1000))
>>> b_data = mx.nd.uniform(low=0, high=1, shape=(1000,1000))
>>> c_data = mx.nd.uniform(low=0, high=1, shape=(1000,1000))
>>> d_data = mx.nd.uniform(low=0, high=1, shape=(1000,1000))
>>> executor=e.bind(mx.cpu(), {'A':a_data, 'B':b_data, 'C':c_data, 'D':d_data})
>>> e_data = executor.forward()
>>> e_data
[<NDArray 1000x1000 @cpu(0)>]
>>> e_data[0]
<NDArray 1000x1000 @cpu(0)>
>>> e_data[0].asnumpy()
array([[ 0.89252722, 0.46442914, 0.44864511, ..., 0.08874825,
0.83029556, 1.15613985],
[ 0.10265817, 0.22077513, 0.36850023, ..., 0.36564362,
0.98767519, 0.57575727],
[ 0.24852338, 0.6468209 , 0.25207704, ..., 1.48333383,
0.1183901 , 0.70523977],
...,
[ 0.85037285, 0.21420079, 1.21267629, ..., 0.35427764,
0.43418071, 1.12958288],
[ 0.14908466, 0.03095067, 0.19960476, ..., 1.13549757,
0.22000578, 0.16202438],
[ 0.47174677, 0.19318949, 0.05837669, ..., 0.06060726,
1.01848066, 0.48173574]], dtype=float32)
Pretty cool, isn’t it? This clean separation between data and computation aims to give us the best of both worlds:
- data is loaded and prepared using the imperative programming model that we’re all very familiar with. We can even use any external library in the process (it’s just good old code!), as sketched below.
- computation is performed using the symbolic programming model, which lets MXNet not only decouple code and data, but also perform parallel execution and graph optimisation.
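To make that split concrete, here’s a small sketch reusing the graph e from above (the NumPy preprocessing is just an illustration): the data preparation is plain imperative code, and MXNet only takes over once we bind and run the graph.
>>> import numpy as np
>>> # imperative part: build the inputs with any library we like
>>> raw = np.arange(4).reshape(2, 2).astype(np.float32)
>>> inputs = {name: mx.nd.array(raw) for name in ['A', 'B', 'C', 'D']}
>>> # symbolic part: MXNet schedules (A*B) and (C*D), possibly in parallel
>>> executor = e.bind(mx.cpu(), inputs)
>>> executor.forward()[0].asnumpy()
array([[  0.,   2.],
       [  8.,  18.]], dtype=float32)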
That’s it for today. In the next article, we’ll look at the Module API, the last one we need to cover before we can start training and using neural networks!