Exploring (ahem) AWS DeepLens
Before you ask: everything in this post is based on publicly available information. No secrets, animals or Deep Learning hardware have been harmed during the writing process :-P
AWS DeepLens was one of the most surprising launches at re:Invent 2017. Built on top of AWS services such as Lambda and Greengrass, this Intel-powered camera lets developers experiment with Deep Learning in a fun and practical way.
Out of the box, a number of projects can be deployed to the camera in just a few clicks: face detection, object recognition, activity detection, neural art, etc.
The overall process looks like this:
- Train a Deep Learning model in the cloud with Apache MXNet.
- Write a Lambda function using the DeepLens SDK to run inference on images coming from the camera (see the sketch right after this list).
- Bundle both in a DeepLens project.
- Deploy the project to your DeepLens camera (using Greengrass, although this is completely transparent).
- Run the project on your DeepLens camera, view the project video stream and receive AWS IoT messages sent by the Lambda function.
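Before we go further, here's a rough sketch of what such an inference function looks like, modeled on the published sample projects. Treat the awscam and greengrasssdk calls, the model path and the IoT topic below as my own assumptions, not as official documentation.

# Rough sketch of a DeepLens inference Lambda (assumption: the awscam and
# greengrasssdk modules behave like in the sample projects; path and topic are made up).
import json
import awscam          # DeepLens device SDK
import greengrasssdk   # used to publish messages to AWS IoT

iot = greengrasssdk.client('iot-data')

def run():
    # Load the optimized model deployed with the project (hypothetical path).
    model = awscam.model('/opt/awscam/artifacts/my_model.xml', {'GPU': 1})
    while True:
        ret, frame = awscam.getLastFrame()   # grab the latest camera frame
        if not ret:
            continue
        output = model.doInference(frame)    # run inference on the device
        iot.publish(topic='deeplens/inference',
                    payload=json.dumps({'output': str(output)}))

run()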
This is seriously cool and fun, but I want to know how this really works. Don’t you? Yeah, I thought so :)
Models are not always what they seem
In the DeepLens console, we can easily see that project models are deployed from S3. Here’s how it looks for the face detection project.
Let’s take a look, then.
$ aws s3 ls s3://deeplens-managed-resources/models/SSDFacialDetect/
2017-11-23 08:57:20 54559092 mxnet_deploy_ssd_FP16_FUSED.bin
2017-11-23 08:57:20 127913 mxnet_deploy_ssd_FP16_FUSED.xml
Huh? This is not what an MXNet model looks like. As explained before, we should see a JSON file holding the model definition and a PARAMS file storing the model weights.
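As a quick reminder, this is what loading a vanilla MXNet model looks like (the file names are just an example):

# Loading a standard MXNet checkpoint: a -symbol.json file (model definition)
# plus a -0000.params file (model weights).
import mxnet as mx

sym, arg_params, aux_params = mx.model.load_checkpoint('vgg16', 0)
mod = mx.mod.Module(symbol=sym, label_names=None)
mod.bind(for_training=False, data_shapes=[('data', (1, 3, 224, 224))])
mod.set_params(arg_params, aux_params, allow_missing=True)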
Let’s ssh to the camera and try to figure this out.
Exploring DeepLens
After a few minutes, well… bingo.
aws_cam@Deepcam:/opt/intel$ ls
deeplearning_deploymenttoolkit deeplearning_deploymenttoolkit_2017.1.0.5675
intel_sdp_products.db
intel_sdp_products.tgz.db
ism
opencl
Intel Deep Learning Deployment Toolkit. This sounds exciting. A few seconds of googling later, we learn here that this SDK includes:
- A Model Optimizer, which converts our trained model into an optimized Intermediate Representation (IR).
- An Inference Engine optimized for the underlying hardware platform.
This makes a lot of sense. Although it's perfectly capable of running in resource-constrained environments (as demonstrated by my Raspberry Pi experiment), Apache MXNet is not the best option here. First, it carries a lot of code (training, data loading, etc.) which is useless in an inference context. Second, it simply cannot compete with a platform-specific implementation making full use of dedicated hardware, special instructions and so on.
So now, the model files in S3 make sense. The XML file is the model description and the BIN file is the model in IR form.
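If you're curious, grab the XML file from the bucket and take a quick peek at its structure. I'm not assuming anything about the schema here, just counting element tags:

# Peek at the IR description file downloaded from the public S3 bucket above.
import xml.etree.ElementTree as ET
from collections import Counter

root = ET.parse('mxnet_deploy_ssd_FP16_FUSED.xml').getroot()
print(root.tag, root.attrib)
print(Counter(elem.tag for elem in root.iter()).most_common(10))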
What kind of hardware is it optimized for? Let's take a look for a second.
Under the hood
The CPU is pretty obvious. It’s a dual-core Atom E3930.
$ dmesg |grep Intel
[ 0.108336] smpboot: CPU0: Intel(R) Atom(TM) Processor E3930 @ 1.30GHz (family: 0x6, model: 0x5c, stepping: 0x9)
This baby comes with an Intel HD Graphics 500 chip, so we have a GPU in there too. This one has 12 “execution units” capable of running 7 threads each (SIMD architecture). 84 “cores”, then: not a monster, but surely better than running inference on the Atom itself.
Now it’s starting to make sense. The Inference Engine is certainly able to leverage specific instructions on the Atom (with Intel MKL, no doubt) as well as the GPU architecture (with OpenCL).
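If you want to check that the GPU is indeed visible through OpenCL, a few lines of pyopencl will do (assuming you install it on the device first, it's not there by default):

# List the OpenCL platforms and devices visible on the DeepLens.
import pyopencl as cl

for platform in cl.get_platforms():
    print('Platform:', platform.name)
    for device in platform.get_devices():
        print('  Device:', device.name, '-', cl.device_type.to_string(device.type))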
Now what about the model optimizin’ thing?
Model optimization
Says the Intel doc: “(the model optimizer) performs static model analysis and automatically adjusts deep learning models for optimal execution on end-point target device”.
OK. It optimizes. Nice job explaining it :-/ Let’s figure it out.
After a bit of installing (and cursing at python), we’re able to run the optimizer.
Most parameters make sense, but two are a bit intriguing.
- --precision: the precision of the output model. Valid values: FP32 (by default) or FP16. Depending on the selected precision, weights would be aligned accordingly.
- --fuse: flag which enables fusion (combination) of layers to boost topology execution. The idea is to join layers to reduce calculations during inference. Valid values: ON (by default) or OFF.
OK, this explains the model name we saw earlier.
mxnet_deploy_ssd_FP16_FUSED.bin
This model uses 16-bit floating point values for weights (and probably activations too). Obviously, 16-bit arithmetic is both faster and more energy-efficient than 32-bit arithmetic, so this makes sense.
As a side note, it’s possible to train MXNet directly with FP16 precision. My gut feeling tells me that this will probably yield more accurate models than training with FP32 and then converting to FP16, but who knows. More information on this NVIDIA page.
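For reference, here's a minimal sketch of what FP16 training could look like with Gluon. This is my own toy example, nothing DeepLens-specific: cast the network and the data to float16, and use a multi-precision optimizer so that weight updates stay stable.

# Minimal FP16 training sketch with Gluon (illustration only).
import mxnet as mx
from mxnet import gluon, autograd

net = gluon.nn.HybridSequential()
net.add(gluon.nn.Dense(128, activation='relu'))
net.add(gluon.nn.Dense(10))
net.initialize(mx.init.Xavier())
net.cast('float16')                       # weights and activations in FP16

trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.1, 'multi_precision': True})
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

data = mx.nd.random.uniform(shape=(8, 512)).astype('float16')
labels = mx.nd.zeros((8,), dtype='float16')

with autograd.record():
    loss = loss_fn(net(data), labels)
loss.backward()
trainer.step(batch_size=8)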
OK, now let’s run this thing on existing models. First, we’ll download Inception v3 and VGG-16 from the MXNet model zoo. Then, we’ll convert them.
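Here's how I'd grab VGG-16, for instance (the URLs below are the ones listed on the model zoo page at the time of writing, so double-check them if the download fails); Inception v3 works the same way. I'll leave out the exact converter command line since it depends on where you installed the toolkit, but it boils down to pointing the MXNet converter at the JSON and PARAMS files with --precision FP16 and --fuse ON.

# Download VGG-16 from the MXNet model zoo (the URLs may have moved since).
import mxnet as mx

base = 'http://data.mxnet.io/models/imagenet/'
for f in ('vgg/vgg16-symbol.json', 'vgg/vgg16-0000.params'):
    mx.test_utils.download(base + f)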
Let’s now compare the original models to their optimized version.
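Nothing fancy here, a few lines of Python are enough to compare file sizes. The optimized file name below is just an example, use whatever the converter produced on your side.

# Compare the size of the original (FP32) and optimized (FP16) models.
import os

for f in ('vgg16-0000.params', 'vgg16_FP16_FUSED.bin'):   # example file names
    print(f, round(os.path.getsize(f) / (1024 * 1024), 1), 'MB')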
As you can see, moving to FP16 definitely halves model size. Less storage, less RAM, less compute!
Now there's only one question left: do these models work? Should we give this Inference Engine a spin? Of course we should.
Predicting with IR models
The toolkit comes with a bunch of samples, so let's build them.
Our models are classification models, so let’s try this.
Let’s grab an image and resize it to 224x224.
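Something like this does the trick (OpenCV here, but PIL would work just as well; the file name is mine):

# Resize the test image to the 224x224 input size both networks expect.
import cv2

img = cv2.imread('image.jpg')             # any test image will do
img = cv2.resize(img, (224, 224))
cv2.imwrite('image_224.jpg', img)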
How well do our networks do on this one?
At the moment, I haven’t figured out how to get this code to run on the GPU (but I will). My best guess is that the GPU is already busy running the actual DeepLens stuff. Since I’m not going to delete it, I’m running on the CPU instead with FP32 precision. Grumble grumble.
Both networks report category #292 as the top one. Let’s check the ImageNet categories to find out whether this prediction is correct. Line numbers start at one, so category #292 is on line 293 ;)
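If you'd rather not count lines by hand, a couple of lines of Python will do it (assuming the usual synset.txt file that ships with ImageNet-trained MXNet models):

# Print ImageNet category #292, i.e. line 293 of the class list.
with open('synset.txt') as f:
    classes = [line.strip() for line in f]
print(classes[292])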
Good call :)
So there you go. We answered a lot of questions:
- We now know that DeepLens is not running MXNet itself, but the Intel Inference Engine on an optimized model.
- We know how to convert models using the Model Optimizer.
- We know how to run image classification models on the Inference Engine.
Next step? How about deploying these tools on a SageMaker instance, training an MXNet model, converting it and deploying it to DeepLens? This should keep me busy during the holidays (#NoLifeTillDeath).
I hope you liked this crazy post. Thanks for reading!