Self-driving cars need specialized hardware for AI algorithms to meet performance, power, and cost requirements.
Advances in Artificial Intelligence (AI) and Machine Learning (ML) are arguably the biggest technical innovation of the last decade. Although the algorithms for AI have existed for many years, the recent explosion of available data, combined with faster compute, has made it possible to apply those algorithms to many real-life use cases. One of the most prominent of these use cases is the fully automated driverless car.
Neural networks, a subfield of AI, play a key role in achieving fully autonomous driving. But driving without human intervention requires a sophisticated framework of sensors that capture data not only about the vehicle but also about its surroundings. These sensors include LiDAR, radar, and video cameras, among others, and they continuously generate a high volume of real-time data about the environment surrounding the car. Neural networks help synthesize that data into meaningful information, which in turn enables the vehicle to react in real time.
This creates another problem that needs to be solved. Until recently, most ML algorithms were executed in a cloud or a data center with a large array of processors or GPUs and extensive cooling. That is not possible for an autonomous vehicle. Because of the real-time nature of the problem, object detection and other tasks need to happen in the vehicle itself; sending the data to the cloud will not work. As a result, self-driving cars need specialized hardware that can implement an ML algorithm while meeting the performance, power, and cost requirements that make it feasible.
Implementing ML algorithms in hardware is a challenge in itself. For example, one common algorithm for object detection is based on CNNs (Convolutional Neural Networks), which aid in ‘Adaptive Cruise Control’ and in ‘Forward/Rear Collision Warning Systems’ – obviously crucial capabilities for achieving a fully autonomous vehicle. The CNN is made up of multiple layers, where each layer performs multiple sets of convolutions. The convolutional filters for each layer are “feature detectors” programmed to look for certain characteristics such as horizontal lines, vertical lines, etc.
It is common for the number of input and output channels to double as data progresses through the layers, resulting in an explosion in the number of convolutional filters as well as the filter weights (coefficients). Figure 1 shows that the first layer has 1 input channel and 16 output channels and it requires 16 different convolutional filter kernels. The second layer has 16 input channels and 36 output channels requiring 36×16 = 576 2-d convolutional filters. In many CNNs the last layer(s) consist of fully connected layers which are usually implemented using matrix multiplication.
Figure 1. Simple 2-layer CNN for character recognition
Beyond the algorithm itself, an inference chip for an autonomous vehicle must deliver the required accuracy while addressing the following challenges:
- Performance: A single high definition camera can capture a 1920×1080 image at 60 frames per second. A car can have 10 or more such cameras.
- Power: AI inference can be a massively power-intensive operation, especially because of the high volume of accesses to remote memories.
- Functional Safety: The chip must detect functional safety issues that might creep in because of various faults in the hardware.
The biggest challenge is the turnaround time of the traditional ASIC design flow. It takes anywhere from several months to a year to implement a new ASIC.
Initially, an autonomous system architect or designer relies on tools like TensorFlow, Caffe, MATLAB, and Theano to capture, collect, and verify data in a high-level abstract environment. These high-level deep learning frameworks let the designer explore a multitude of parameters, analyze the results, and select the optimal solution for the algorithm (Figure 2).
Once the algorithm is determined, the designer captures the flow in C++ or SystemC. The next step is to design the actual hardware algorithmic block for autonomous applications. The most efficient way to do this is to use High-Level Synthesis (HLS) to generate RTL from the C++ or SystemC description.
HLS separates functionality from implementation, so the same source description can be retargeted and re-implemented at any time. As a result, HLS shortens algorithmic design time by working at a higher level of abstraction, resulting in 50x less code than RTL. That means smaller design teams, shorter development time, and faster verification.
Figure 2. High Level Synthesis design flow
The next step is verification, which includes formal property checking and linting to ensure that the source code is “clean” for both synthesis and simulation. The flow also requires tools that can measure code coverage, including line, branch, and expression coverage. The goal is RTL that is correct by construction, with the representation and simulation results of the synthesized RTL matching those of the C++ algorithm exactly.
The Catapult HLS Platform from Mentor, a Siemens Business, is the industry’s leading HLS platform with proven quality of results, and it is complemented by Mentor’s PowerPro solutions. Catapult empowers designers to use industry-standard ANSI C++ and SystemC to describe functional intent and to move up to a more productive abstraction level. The Catapult Platform combines high-level synthesis with PowerPro for measurement, exploration, analysis, and optimization of RTL power, along with a verification infrastructure for seamless verification of C++ and RTL.
To learn more, download our whitepaper High-Level Synthesis for Autonomous Drive. Find out how algorithm-intensive designs for autonomous vehicles are a perfect fit for HLS, and how the methodology has been successfully adopted by major semiconductor suppliers in the automotive space such as Bosch, STMicroelectronics, and Chips&Media.
Anoop Saha manages the ecosystem, growth and strategy for Mentor’s Catapult High Level Synthesis tool. He has been with Mentor for over 12 years, working in various capacities ranging from product R&D and product management to sales and business development. Earlier, Saha was involved in hardware emulation and worked to create Mentor’s emulation solution targeted at networking ASIC verification, which enabled Veloce to gain the largest market share in this segment. Saha also conceptualized SystemVerilog testbench acceleration using hardware emulators. He is based in Fremont, California.