This is the second part of a series of articles about Computer Vision for mobile and embedded devices. Last time I discussed ways to optimize image preprocessing before you even get to ML model inference on the device.
In this article I am going to talk about the most crucial step: on-device ML model execution.
What is the right ML tool for mobile?
This is a really good question! You can answer it by going through the following items:
- Performance: in other words, how often are you going to run your ML model on the device?
Some mobile apps (like a photo editor with smart ML effects) use the ML output once per session; others need to update the ML result once per minute or even once per second. But for a real-time Computer Vision application, it is a good idea to run the model 10–30 times per second (10–30 FPS), i.e. as often as a new frame arrives from the video source.
- ML operations (layers) support, or the ML framework used for server-side model training.
This item is mostly about the compatibility of ML operations (layers) across different training frameworks. In some cases it can be quite complicated to make your server-side trained model work in a mobile (embedded) ML environment because some necessary operations are missing. This can be a decisive factor in choosing a tool.
- Platform and hardware specifications.
Nowadays, efficient on-device ML inference is a highly hardware-specific task, and that creates obstacles. The iOS hardware market is more or less transparent, but the Android market is fragmented: different vendors ship different GPUs and SoCs, so the same ML model can behave quite differently depending on the board.
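To make the 10–30 FPS target from the performance item concrete, here is a small Python sketch of the per-frame time budget. The function names and the sample timings are hypothetical; the only fact it encodes is that at N frames per second the whole pipeline (preprocessing, inference, post-processing) has 1000/N milliseconds per frame.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available for one frame at the given frame rate, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def fits_budget(preprocess_ms: float, inference_ms: float,
                postprocess_ms: float, fps: float) -> bool:
    """True if the whole per-frame pipeline finishes before the next frame arrives."""
    total_ms = preprocess_ms + inference_ms + postprocess_ms
    return total_ms <= frame_budget_ms(fps)


if __name__ == "__main__":
    for fps in (10, 30):
        print(f"{fps} FPS -> {frame_budget_ms(fps):.1f} ms per frame")
    # Hypothetical timings, not measurements: 5 ms + 25 ms + 5 ms = 35 ms per frame.
    print(fits_budget(5, 25, 5, fps=30))  # False: 35 ms > 33.3 ms
    print(fits_budget(5, 25, 5, fps=10))  # True: 35 ms <= 100 ms
```

The same 35 ms pipeline is fine at 10 FPS but too slow for 30 FPS, which is why the FPS target matters so much when choosing a tool.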
Let’s take a closer look at the two most popular mobile platforms.
Apple gave iOS developers a brilliant gift: Core ML. But it is not the only solution for this platform, so let’s go through the options one by one.
The pure Apple solution: Core ML
- Performance: It delivers high performance by running through Metal shaders directly on the mobile GPU (or, on the latest iPhone models, on hardware dedicated to ML operations).
- ML operations (layers) support: Almost all modern server-side ML frameworks have ready-made scripts for converting models to the Core ML format. Even if the converter does not support a layer you need, you can implement that operation yourself using Metal shaders.
But be careful with that! Starting with the second generation of Core ML, it is better to use only built-in (“out of the box”) layers, at least on the latest iPhones (iPhone XS, XS Max, and XR). In that case all operations are executed on the dedicated hardware, which means faster inference and lower power consumption. Custom Metal-shader operations move your model back to the GPU, and you lose the advantages of Core ML 2 and above.
- Hardware specifications: Since all iOS phones come from essentially the same hardware vendor, this is a simple topic. There are around 10 hardware specifications, and they all support the same kinds of technologies.
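The rule of thumb from the Core ML bullets can be expressed as a quick check over a model's layer list. This is a hedged Python sketch, not a Core ML API: `BUILTIN_LAYERS`, `custom_layers`, and `expected_placement` are hypothetical names, and the built-in set is a placeholder to be replaced with the layer types your Core ML version actually supports.

```python
from __future__ import annotations

# Hypothetical stand-in for the layer types Core ML supports natively; the real
# list depends on the Core ML version and is not reproduced here.
BUILTIN_LAYERS = {"convolution", "batchnorm", "activation", "pooling",
                  "innerProduct", "softmax"}


def custom_layers(layer_types: list[str], builtin: set[str] = BUILTIN_LAYERS) -> list[str]:
    """Return the layer types that would need a custom Metal implementation."""
    return [t for t in layer_types if t not in builtin]


def expected_placement(layer_types: list[str]) -> str:
    """Rule of thumb from the text: a custom Metal layer sends the model back to the GPU."""
    if custom_layers(layer_types):
        return "GPU"
    return "dedicated ML hardware (on devices that have it)"


if __name__ == "__main__":
    model = ["convolution", "batchnorm", "activation", "myUpsample", "softmax"]
    print(custom_layers(model))       # ['myUpsample']
    print(expected_placement(model))  # GPU
```

Running a check like this before conversion tells you early whether a single exotic layer is going to cost you the dedicated hardware.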
TensorFlow Lite is associated in our minds with Google (Android) technologies, but it can also be a solution for the iOS platform.
- Performance: Core ML is good for running your models on the GPU and dedicated hardware, but what if you do not have enough free GPU resources, or you are targeting old iOS devices with a “weak” GPU?
TF Lite can be a good solution in those situations. It executes the model on the CPU with a bunch of optimizations prepared specifically for iPhones. It can also suit “light” ML models with fast inference times: for such models, copying the input image data to GPU memory can take more time than simply running the model on the CPU.
P.S. GPU acceleration support is available in the experimental branch of the TF Lite library.
- ML operations (layers) support: If you are going to use TF Lite in your mobile app, it is a good idea to use TensorFlow for server-side training. That makes conversion for mobile much easier, although even the official converter sometimes struggles. With other server-side frameworks, converting to the TF Lite format can be one of the most painful steps of your development process.
- Hardware specifications: As I mentioned above, most iOS devices have similar hardware specifications. For the most efficient execution it is better to target 64-bit ARMv8 (arm64) CPUs, keeping in mind that the TF Lite libraries must be compiled with the right parameters to take advantage of this architecture.
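The argument for running “light” models on the CPU comes down to comparing CPU latency with GPU latency plus the cost of moving data to and from GPU memory. A minimal Python sketch, assuming you have measured these timings on the target device; the `Timings` structure and the sample numbers are hypothetical, not benchmarks.

```python
from dataclasses import dataclass


@dataclass
class Timings:
    """Per-frame timings in ms, measured on the target device (hypothetical structure)."""
    cpu_inference_ms: float
    gpu_upload_ms: float          # copying the input image into GPU memory
    gpu_inference_ms: float
    gpu_download_ms: float = 0.0  # reading the result back to the CPU


def gpu_total_ms(t: Timings) -> float:
    """End-to-end GPU cost, including data transfers."""
    return t.gpu_upload_ms + t.gpu_inference_ms + t.gpu_download_ms


def prefer_cpu(t: Timings) -> bool:
    """True when the CPU wins once data transfer to and from the GPU is counted."""
    return t.cpu_inference_ms <= gpu_total_ms(t)


if __name__ == "__main__":
    light = Timings(cpu_inference_ms=4.0, gpu_upload_ms=3.0,
                    gpu_inference_ms=1.5, gpu_download_ms=0.5)
    heavy = Timings(cpu_inference_ms=60.0, gpu_upload_ms=3.0,
                    gpu_inference_ms=12.0, gpu_download_ms=0.5)
    print(prefer_cpu(light))  # True: 4.0 ms vs 5.0 ms
    print(prefer_cpu(heavy))  # False: 60.0 ms vs 15.5 ms
```

For the light model the GPU is faster at the computation itself but loses once the upload is counted; for the heavy model the transfer cost is negligible.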
The third alternative is Caffe2. Made in Facebook’s labs, it uses NNPACK and QNNPACK to run as fast as possible on ARM CPUs.
In terms of usage and performance, Caffe2 is similar to TF Lite but much more flexible when it comes to model conversion. Using ONNX as the intermediate format, you can easily bring your server-side model to the iOS platform.
As I mentioned above, efficient on-device ML inference is a hardware-specific task, and that causes a lot of trouble on Android. Nowadays there are more than 16,000 Google Play certified device models, and more than 24,000 overall! Each model can have its own hardware and software specifications, so the answer to the question of “the right tool” is application-specific.
Let’s look at our options, with a short description of each.
TensorFlow Lite is the Android ML tool most promoted by Google, and there are several reasons for that.
- Performance: This framework was originally created for ML inference on embedded and low-end hardware, so its main resource is the CPU. This means that “big” models can run slowly and drain a lot of battery. Such workloads can also make the phone overheat.
Despite these disadvantages, TF Lite is almost the only tool that works across the whole variety of Android ARM devices. It applies every available optimization to run your model efficiently on-device, and that can be enough for many Android ML apps.
P.S. The experimental branch of the TF Lite library includes GPU acceleration support via OpenGL, which shows good results on the latest phone models.
- ML operations (layers) support: Same as on iOS: it is a good idea to use TensorFlow for server-side training together with the official converter.
- Hardware specifications: Even though there are thousands of phone models, there is still a limited number of CPU architectures: 99 percent of the market is ARM-based. TF Lite uses common, efficient CPU instructions (such as NEON) for ML inference.
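What “slow” CPU execution means for a real-time app can be estimated with simple arithmetic: if one inference takes longer than the interval between camera frames, only every N-th frame can get an ML result. A minimal Python sketch with hypothetical latencies; no TF Lite calls are involved, and the function names are illustrative.

```python
import math


def frame_stride(inference_ms: float, camera_fps: float) -> int:
    """Smallest N such that running inference on every N-th frame keeps up with the camera."""
    if inference_ms <= 0 or camera_fps <= 0:
        raise ValueError("inference_ms and camera_fps must be positive")
    # Number of frame intervals (1000 / camera_fps ms each) that one inference occupies.
    intervals = inference_ms * camera_fps / 1000.0
    return max(1, math.ceil(intervals))


def effective_fps(inference_ms: float, camera_fps: float) -> float:
    """How many frames per second actually get an ML result."""
    return camera_fps / frame_stride(inference_ms, camera_fps)


if __name__ == "__main__":
    # Hypothetical CPU latencies for a light and a "big" model on a 30 FPS camera.
    for latency in (20.0, 80.0):
        print(latency, frame_stride(latency, 30), effective_fps(latency, 30))
```

A hypothetical 80 ms model on a 30 FPS camera only produces results on every third frame, i.e. 10 FPS, the lower end of the range mentioned at the beginning.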
Qualcomm Neural Processing SDK for AI is a brilliant example of excellent developer support from a hardware vendor. Qualcomm provides a set of efficient tools to build the whole on-device ML processing pipeline. I am talking not only about their fast ML libraries but also about tools for digital signal processing, video stream processing, compilation, etc.
- Performance: This solution has a set of hardware and software limitations, which we will discuss later, but once you can run your model with this framework, you get significant performance benefits. Qualcomm NP SDK gives you more hardware acceleration options than any other Android ML framework. Besides the GPU, you can use DSP-accelerated computation, which is probably better suited to ML operations. Both approaches use significantly less energy and run several (even ten) times faster than the CPU. Qualcomm is also upgrading its DSP chips specifically for AI purposes.
- ML operations (layers) support: It has converters for four well-known ML model formats: Caffe, Caffe2, TensorFlow, and ONNX. This covers most cases, so model conversion should not be complicated.
- Hardware specifications: This is the dark side of the framework. Qualcomm NP SDK can be used only on Snapdragon devices that support OpenCL, which rules out a big part of the Android market. The trickiest part is that Google Pixel phones are not on the list: even though they are based on Snapdragon boards, they ship without the OpenCL libraries!
So check your potential market before deciding to use it.
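The eligibility rule for Qualcomm NP SDK (a Snapdragon SoC and shipped OpenCL libraries, both required) is easy to apply to a device list when you check your potential market. A hedged Python sketch: the `Device` record, the function names, and the sample fleet are hypothetical; real device data would come from your own analytics.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical, minimal description of a target device for market analysis."""
    model: str
    soc_vendor: str   # e.g. "Qualcomm" for Snapdragon boards
    has_opencl: bool  # whether the OpenCL libraries ship with the firmware


def qualcomm_sdk_eligible(d: Device) -> bool:
    """Both conditions from the text: a Snapdragon (Qualcomm) SoC and OpenCL libraries."""
    return d.soc_vendor == "Qualcomm" and d.has_opencl


def eligible_share(devices: list[Device]) -> float:
    """Fraction of the listed devices on which the SDK can run at all."""
    if not devices:
        return 0.0
    return sum(qualcomm_sdk_eligible(d) for d in devices) / len(devices)


if __name__ == "__main__":
    fleet = [
        Device("snapdragon-with-opencl", "Qualcomm", True),
        Device("pixel-like (Snapdragon, no OpenCL)", "Qualcomm", False),
        Device("kirin-phone", "HiSilicon", True),
    ]
    print(eligible_share(fleet))
```

The Pixel-like entry shows why the OpenCL check matters as much as the SoC vendor.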
HUAWEI HiAI is another example of a hardware-specific solution. The good thing about this product is that it has an Android Studio plugin that does a lot of the work for you; the bad thing is that the plugin has plenty of bugs. Still, it lets you convert your ML model and generate Java code for using it through a UI tool.
- Performance: This tool is designed to work with only one special HUAWEI chip, the NPU. It is relatively new hardware designed specifically for ML operations, so it performs well and consumes much less power than the CPU. Its speed is comparable to GPU execution, which is enough for a lot of AI/CV mobile applications.
- ML operations (layers) support: It has converters for two ML model formats: Caffe and TensorFlow. This means that if you use, for example, PyTorch or Microsoft Cognitive Toolkit as your main training framework, it will be hard to export the model to a format HiAI accepts. Also, it appears impossible to have several output layers in your model: this tool simply does not support it.
- Hardware specifications: As I mentioned, this framework works only with the HUAWEI NPU chip, which is part of the Kirin platform by HiSilicon (a Huawei subsidiary). What surprised me is that each version of the HiAI library is designed for only one Kirin model: if you add HiAI 2.0 to your app, it will work only on the Kirin 980, and for older chips you have to use an older version of HiAI. I suspect this causes trouble when using it in production.
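The HiAI limitations listed above can be collected into one pre-flight check. This Python sketch encodes only what the text states (Caffe and TensorFlow converters, a single output layer, HiAI 2.0 for the Kirin 980); the function name is hypothetical, and other version/chip pairs are deliberately not filled in.

```python
from __future__ import annotations

# Facts taken from the text. Only one version/chip pair is stated there; other
# pairs are intentionally left out and would come from HUAWEI's documentation.
HIAI_SOURCE_FORMATS = {"Caffe", "TensorFlow"}
HIAI_VERSION_FOR_CHIP = {"Kirin 980": "2.0"}


def hiai_blockers(source_format: str, num_outputs: int, chip: str) -> list[str]:
    """Return every reason (from the text) why this model/device pair can't use HiAI."""
    problems = []
    if source_format not in HIAI_SOURCE_FORMATS:
        problems.append(f"no converter for {source_format} models")
    if num_outputs > 1:
        problems.append("multiple output layers are not supported")
    if chip not in HIAI_VERSION_FOR_CHIP:
        problems.append(f"HiAI version for {chip} not listed here")
    return problems


if __name__ == "__main__":
    print(hiai_blockers("TensorFlow", 1, "Kirin 980"))
    print(hiai_blockers("PyTorch", 2, "Kirin 970"))
```

Returning all blockers at once, rather than failing on the first, makes it easier to see whether a model is one fix away from HiAI or not a candidate at all.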
MACE by Xiaomi is a good attempt to create a unified ML solution for ARM-based devices.
- Performance: It contains several runtimes for different hardware accelerators, such as the GPU (through OpenCL), the Hexagon DSP, and the APU (MediaTek’s AI chip), as well as the CPU, and it shows good results on all of them. So you can probably find the right option for your app here.
- ML operations (layers) support: The official repo has converters for the TensorFlow, ONNX, and Caffe model formats. They work well, but you may find that some fairly popular layers are missing. Overall, the conversion process is acceptable.
- Hardware specifications: Most ARM-based devices do well, but remember OpenCL! Devices without it can use only the CPU runtime, which is not the best one.
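MACE’s set of runtimes suggests a simple fallback strategy: use an accelerator when the device has one, otherwise drop to the CPU. In this Python sketch the preference order and the runtime labels are assumptions for illustration, not MACE’s own API; the one constraint taken from the text is that the GPU runtime needs OpenCL.

```python
from __future__ import annotations

# Assumed preference order for illustration: dedicated AI hardware first, CPU last.
# These strings are labels for this sketch, not MACE's own API identifiers.
PREFERENCE = ("apu", "dsp", "gpu", "cpu")


def available_runtimes(has_opencl: bool, has_hexagon_dsp: bool,
                       has_mediatek_apu: bool) -> set[str]:
    """Runtimes a device can use, following the constraints described in the text."""
    runtimes = {"cpu"}         # the CPU runtime works on any ARM device
    if has_opencl:
        runtimes.add("gpu")    # the GPU runtime goes through OpenCL
    if has_hexagon_dsp:
        runtimes.add("dsp")
    if has_mediatek_apu:
        runtimes.add("apu")
    return runtimes


def pick_runtime(has_opencl: bool, has_hexagon_dsp: bool, has_mediatek_apu: bool) -> str:
    """Pick the most preferred runtime the device supports, falling back to the CPU."""
    available = available_runtimes(has_opencl, has_hexagon_dsp, has_mediatek_apu)
    return next(r for r in PREFERENCE if r in available)
```

Because "cpu" is always available and always last in the preference tuple, `pick_runtime` never runs out of options.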
Caffe2 can also be an option for Android devices. Its CPU runtime performs well, in many cases better than the CPU runtimes of other frameworks. And as we remember, it is optimized for ARM-based devices, so it should work on almost all Android phones.
From all of the above, you can see that “the right mobile ML tool” really depends on the needs of your app. It is a good idea to investigate your potential market and find out which mobile hardware and software dominate it before you start your mobile ML project.
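As a starting point for that investigation, the Android options discussed in this article can be summarized in a small lookup table and filtered by your training framework’s export format. This Python sketch simplifies the text (for example, it treats TF Lite as TensorFlow-only because that is the painless path); the table and the `shortlist` function are illustrative, not an authoritative compatibility matrix.

```python
from __future__ import annotations

# Summary of the Android tools discussed above (a simplification of the text):
# name -> (source formats with a converter, runs on almost any ARM Android device?)
ANDROID_TOOLS = {
    "TF Lite": ({"TensorFlow"}, True),                 # smoothest path: TensorFlow training
    "Caffe2": ({"Caffe2", "ONNX"}, True),              # ONNX as the intermediate format
    "Qualcomm NP SDK": ({"Caffe", "Caffe2", "TensorFlow", "ONNX"}, False),  # Snapdragon + OpenCL
    "HUAWEI HiAI": ({"Caffe", "TensorFlow"}, False),   # Kirin NPU only
    "MACE": ({"TensorFlow", "ONNX", "Caffe"}, True),   # CPU runtime everywhere
}


def shortlist(source_format: str, need_broad_coverage: bool) -> list[str]:
    """Tools that accept the model's format and, if required, cover most of the market."""
    return sorted(
        name
        for name, (formats, broad) in ANDROID_TOOLS.items()
        if source_format in formats and (broad or not need_broad_coverage)
    )


if __name__ == "__main__":
    print(shortlist("TensorFlow", need_broad_coverage=True))
    print(shortlist("Caffe", need_broad_coverage=False))
```

The hardware-specific SDKs only appear when broad coverage is not required, which mirrors the trade-off described above: faster accelerators in exchange for a smaller share of the market.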
In the next article, I am going to talk about the typical outputs of ML models in Computer Vision and ways to process them efficiently. Output post-processing can be tricky in terms of application performance.