Deep Neural Networks (DNNs) are the key technology enabling modern AI, and they have gained widespread attention that continues to grow at a rapid pace. However, running computation-intensive DNN-based tasks on mobile edge devices is challenging because of their limited computation resources. This is one of the central challenges of Edge AI.
Traditional cloud-assisted AI inference has therefore been the preferred choice, but it is heavily affected by wide-area network (WAN) latency, leading to poor real-time performance and, ultimately, a poor user experience. Moreover, sending data over the internet to cloud-based servers has raised many privacy and security concerns over time.
This is where the concept of Edge AI excels. By pushing inference, and occasionally even model training, to edge nodes, Edge AI has recently emerged as a promising alternative to traditional cloud-assisted inference. In Edge AI, inference takes place where the data is collected, or at the closest local point, such as on a nearby server.
What exactly is Edge AI?
In “Edge AI”, AI inference is done locally, either directly on the user’s device (on-device AI) or on a server near the device.
What is a Deep Neural Network (DNN)?
Deep Neural Networks are simply Artificial Neural Networks with multiple hidden layers between the input and output layers.
DNNs are resource-hungry due to the millions of parameters they possess. AlexNet, a well-known Convolutional Neural Network (a type of Deep Neural Network), has over 60 million parameters!
DNNs are a powerful approach to AI, but they are very compute-intensive algorithms. So, uncompressed DNNs are best suited to cloud-based processors and hardware.
Do you want to find out how to create a simple Deep Neural Network with Keras? Find out here: Keras Flatten with a DNN example in Python
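To make the "millions of parameters" point concrete, here is a minimal sketch of a fully connected DNN forward pass in plain NumPy (framework-agnostic; the layer sizes are arbitrary, chosen only for illustration). Even this toy network has roughly 235,000 parameters:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)

# A tiny DNN: 784 inputs -> two hidden layers -> 10 outputs.
# Layer sizes are arbitrary, for illustration only.
layer_sizes = [784, 256, 128, 10]
weights = [rng.standard_normal((m, n)) * 0.01
           for m, n in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    # Hidden layers apply an affine map followed by ReLU.
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(x @ W + b)
    return x @ weights[-1] + biases[-1]  # raw output logits

x = rng.standard_normal(784)
logits = forward(x)

n_params = sum(W.size + b.size for W, b in zip(weights, biases))
print(logits.shape, n_params)  # (10,) 235146
```

Scaling the hidden layers up toward the sizes used in real models like AlexNet is exactly what pushes parameter counts into the tens of millions.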
Cloud AI Vs Edge AI Inference
As mentioned earlier, when we talk about Edge AI, we are mainly focusing on AI inference, where the prediction takes place. To train AI models, we still rely heavily on cloud infrastructure, since training can be done offline. Surely we don’t want to train DNNs on Edge AI devices, as training would take far too long to complete!
The following two figures should give you a clear idea of how cloud AI inference and Edge AI inference take place.
Let’s talk about the Drawbacks and Challenges of Cloud AI and Edge AI
Drawbacks of Cloud AI
Although cloud AI offers high accuracy, we can’t deny the following drawbacks it comes bundled with…
- WAN Latency.
- Reliability issues.
- Privacy and trust concerns.
Challenges of Edge AI
So, our savior is Edge AI, right? Yes, but with Edge AI we still have to solve a few challenges. I would call them challenges rather than issues. What are these challenges, though?
- Limited power.
- Limited memory.
- Limited processing capacity.
- Accuracy vs. efficiency trade-offs.
- More research needed.
Why Edge AI over Cloud AI? (The Great Benefits)
At the beginning of this article, we discussed why we need to bring Cloud AI to the Edge in the first place. Beyond that, the following is a list of benefits we can achieve with Edge AI.
- Faster response times.
- Speeds up decision-making in real-time applications.
- Eliminates possible data reliability issues.
- Ensures user data privacy.
- Scalability of IoT ecosystems.
- Increased security.
- Reduced ISP costs.
Do you remember how your phone’s Google Assistant, Siri, or Bixby used to work only when the phone was connected to the internet? They would warn you that you were offline and needed to connect. But nowadays, all these personal AI assistants can work offline. They only need the internet to do a Google search or to synchronize with their cloud servers for model updates. But did you ever wonder why?
It’s because they moved from the cloud to the Edge! (They are Down To Earth applications now, lol) They can do AI inference right on your mobile now!
How to bring AI to the Edge?
Ok, now we know how important bringing AI to the Edge is, right? But how exactly are we going to do it? Do we just implement our original DNN models directly on our edge device hardware?
No! That’s a very bad idea. If you implement original, uncompressed DNNs on edge devices, with their very limited power, processing capability, and memory, you are asking for trouble. Simply put, it won’t work. You will end up with a useless model that does not work as intended, or you will fail to implement it altogether.
See, DNNs require a massive amount of resources to run, and your edge device simply can’t provide them. To make matters worse, we often need real-time applications, such as object detection, installed on our edge and mobile devices!
This is why we need to compress our original deep neural network models, so they can fit and work with edge devices.
How to compress Deep Neural Network models
This area is being heavily researched at the moment. Pioneers in the field have identified several ways to compress deep neural networks so that they fit on our resource-constrained edge or mobile devices.
- DNN pruning
- DNN quantization
- Knowledge distillation
Wait, pruning? Just like you prune a tree?
While there are lots of differences between pruning a tree and a deep neural network, the idea is the same!
After all, researchers have confirmed that many of the parameters in a deep neural network can be cut without losing much accuracy, simply because all those millions of parameters aren’t actually required by many deployed models. But again, this highly depends on the application. If the AI application is relatively small and solves easy problems, why use a whole model with millions of parameters, right?
Think about your brain. When you solve a mathematical problem, you don’t use your whole brain!
So the idea is that we can cut some parameters. In particular, after training a model you are left with many parameters that don’t really contribute to the final output; these are mostly values close to zero, or exactly zero. They can be individual weights or even whole sets of neurons!
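To make the idea concrete, here is a minimal, framework-agnostic sketch of unstructured magnitude pruning in NumPy (the matrix shape and target sparsity are arbitrary choices for illustration): the smallest-magnitude weights are zeroed out until a target fraction of the matrix is sparse.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.8):
    """Zero out the smallest-magnitude weights until roughly
    `sparsity` fraction of the entries are zero (unstructured pruning)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    # Threshold = magnitude of the weight at sorted position k.
    threshold = np.partition(flat, k)[k] if k > 0 else -np.inf
    mask = np.abs(weights) >= threshold   # keep only large weights
    return weights * mask, mask

rng = np.random.default_rng(42)
W = rng.standard_normal((128, 64))        # a made-up weight matrix
W_pruned, mask = magnitude_prune(W, sparsity=0.8)

print(1.0 - mask.mean())  # fraction of weights removed, ~0.8
```

In practice, libraries store the surviving weights in a sparse format (or prune whole neurons and channels) so that the memory and compute savings are actually realized, and the model is usually fine-tuned afterwards to recover accuracy.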
Deep neural networks usually use 32-bit single-precision floating-point arithmetic (FP32) for their calculations, which makes them resource-hungry. Researchers have found that you can lower the precision of these calculations without sacrificing much model accuracy. This is the idea behind DNN quantization. FP16, FP8, and bfloat16 are some of the lower-precision arithmetic formats successfully used in quantized neural networks.
By reducing the precision, you can cut down the processing power and memory consumption of these quantized deep neural networks.
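As a rough sketch of the idea (a simple affine int8 scheme written from scratch, not any specific framework's implementation), quantization maps float weights to 8-bit integers plus a scale and zero-point, then dequantizes them at compute time. The storage drops to a quarter of FP32, at the cost of a small rounding error:

```python
import numpy as np

def quantize_int8(x):
    """Affine (asymmetric) quantization of a float array to int8."""
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / 255.0                 # one step of the int8 grid
    zero_point = np.round(-lo / scale) - 128  # maps lo -> about -128
    q = np.clip(np.round(x / scale + zero_point), -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)  # made-up weights
q, scale, zp = quantize_int8(w)

w_hat = dequantize(q, scale, zp)
max_err = np.abs(w - w_hat).max()
print(q.nbytes / w.nbytes)  # 0.25: int8 takes 4x less memory than FP32
print(max_err <= 2 * scale)  # error bounded by a couple of grid steps
```

Real toolchains go further: they calibrate the scale per layer or per channel, and run the matrix multiplications themselves in integer arithmetic, which is where the speed-up on edge hardware comes from.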
Knowledge distillation is a very interesting approach for creating smaller DNN models that behave like their original counterparts. The smaller deep neural network is called the student model; it is taught by the larger, trained teacher model to mimic the teacher’s behavior.
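A minimal sketch of the core training signal (temperature-softened softmax matching; the logits, temperature, and mixing weight below are made up for illustration): the student is trained to match the teacher's softened output distribution in addition to the hard labels.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """alpha * soft-target cross-entropy (at temperature T)
       + (1 - alpha) * hard-label cross-entropy."""
    p_teacher = softmax(teacher_logits, T)
    p_student_T = softmax(student_logits, T)
    # T^2 rescales the soft term so its gradient magnitude matches the hard term.
    soft_loss = -(p_teacher * np.log(p_student_T)).sum(axis=-1).mean() * T * T
    p_student = softmax(student_logits)
    hard_loss = -np.log(p_student[np.arange(len(labels)), labels]).mean()
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Made-up logits for a batch of 2 examples over 3 classes.
teacher = np.array([[4.0, 1.0, 0.5], [0.2, 3.5, 0.1]])
student = np.array([[2.0, 0.5, 0.3], [0.1, 1.8, 0.2]])
labels = np.array([0, 1])

loss = distillation_loss(student, teacher, labels)
print(loss)
```

The high temperature exposes the teacher's "dark knowledge": the relative probabilities it assigns to wrong classes, which tell the student how the classes relate to one another.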
Accuracy Vs Efficiency trade-offs in Edge AI
As we all know, deep neural networks were originally meant to run where they can consume plenty of power and memory. Given those resources, they work in our favor, producing the best accuracy they can. But when we compress DNNs to suit edge devices, we start losing those top accuracy margins. As the compression increases, the accuracy tends to go downhill…
Therefore, the trade-off between accuracy and how efficient the neural network should be has become a great subject of interest lately. When we develop an Edge AI application, we always have to think about how much power and memory we can allocate to it, and hence decide how much accuracy we can afford to lose under those resource constraints.
The following case study illustrates this accuracy drop against the model compression ratio. The study surveyed over 81 papers on deep neural network pruning and categorized the pruning methods into five main categories.
- Global weight pruning
- Layer-wise weight pruning
- Global gradient pruning
- Layer-wise gradient pruning
- Random pruning
Refer to the following compression ratio vs. accuracy chart. The model used in this case is a VGG-16 trained on the CIFAR-10 dataset.
One thing common to all the pruning methods is that, beyond a certain compression threshold, the compressed model’s accuracy starts to decrease significantly. Moreover, random pruning appears to be the worst-performing pruning method of the lot.
Do we need more research for Edge AI?
Do we have sufficient research done on Edge AI? The simple answer is no. Although improving the accuracy of Edge AI models is a heavily researched area, a research gap remains visible, as the demand for real-time Edge AI applications keeps increasing!
Undoubtedly, there is very successful research out there with very practical solutions, but these solutions highly depend on the platform, the model, and the scenario being addressed. For example, one researcher may come up with a highly compressed VGG-16 model for a specific drone model to help with drone surveillance. But this implementation may not work the same on another drone with a completely different set of hardware resources. And the model might become completely irrelevant if you want to build a mobile application with it.
There are impressive compact deep neural network models out there, such as MobileNet and SqueezeNet. But again, if you want a perfect Edge AI model for a specific scenario, with its own set of resources and platforms, there’s not much to do other than conduct your own research and start from scratch.