When NVIDIA announced breakthroughs in language understanding to enable real-time conversational AI, we were caught off guard. We were still trying to digest the proceedings of ACL, one of the biggest research events for computational linguistics worldwide, at which Facebook, Salesforce, Microsoft and Amazon were all present.
While these represent two different sets of achievements, they are still closely connected. Here is what NVIDIA’s breakthrough is about, and what it means for the world at large.
NVIDIA does BERT
As ZDNet reported yesterday, NVIDIA says its AI platform now has the fastest training record, the fastest inference, and largest training model of its kind to date. NVIDIA has managed to train a large BERT model in 53 minutes, and to have other BERT models produce results in 2.2 milliseconds. But we need to put that into context to understand its significance.
BERT (Bidirectional Encoder Representations from Transformers) is research (paper, open source code and datasets) published by researchers at Google AI Language in late 2018. BERT has been among a number of breakthroughs in natural language processing recently, and has caused a stir in the AI community by presenting state-of-the-art results in a wide variety of natural language processing tasks.
What NVIDIA did was to work with the models Google released (in two flavors, BERT-Large and BERT-Base) and its own GPUs to slash the time needed to train the BERT machine learning model and then use it in applications. This is how machine learning works: first there is a training phase, in which the model learns by being shown lots of data, and then an inference phase, in which the model processes new data.
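To make the two phases concrete, here is a minimal sketch in plain Python. It has nothing to do with BERT itself: the toy linear model, the example data, and the function names `train` and `infer` are all illustrative assumptions.

```python
# Toy illustration of the two phases of machine learning.
# This is NOT BERT: a one-variable linear model stands in for the network.

def train(xs, ys, lr=0.05, epochs=2000):
    """Training phase: fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def infer(model, x):
    """Inference phase: apply the already-trained parameters to new data."""
    w, b = model
    return w * x + b

# Hypothetical training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

model = train(xs, ys)          # expensive, done once
prediction = infer(model, 10)  # cheap, done for every new input
print(round(prediction, 3))
```

The asymmetry is the point: training loops over the data thousands of times, while inference is a single cheap evaluation. NVIDIA's two headline numbers measure these two different phases.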
NVIDIA used different configurations, producing different results. Training BERT-Large took 53 minutes on the NVIDIA DGX SuperPOD, using 92 NVIDIA DGX-2H systems running 1,472 NVIDIA V100 GPUs, while the same task took one NVIDIA DGX-2 system 2.8 days. The 2.2 millisecond inference result was obtained on a different system and model (NVIDIA T4 GPUs running NVIDIA TensorRT, with BERT-Base).
The bottom line is that NVIDIA has cut BERT training time from what used to be the norm, several days, to under an hour. But the magic here was a combination of hardware and software, and this is why NVIDIA is releasing its own tweaks to BERT, which may be the biggest win for the community at large.
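A quick back-of-the-envelope calculation puts those figures side by side. The sketch below uses only the numbers quoted in this article, plus one assumption: 16 GPUs per DGX-2H system, which is consistent with 1,472 GPUs across 92 systems. It also treats a DGX-2 and a DGX-2H as equivalent, which is a simplification.

```python
# Back-of-the-envelope arithmetic on the training figures quoted above.

GPUS_PER_DGX2H = 16          # assumption, consistent with 1,472 / 92
superpod_systems = 92
superpod_gpus = superpod_systems * GPUS_PER_DGX2H

superpod_minutes = 53                    # DGX SuperPOD training time
single_dgx2_minutes = 2.8 * 24 * 60      # 2.8 days on one DGX-2

speedup = single_dgx2_minutes / superpod_minutes

# Rough scaling efficiency: speedup achieved per unit of extra hardware.
# Treats one DGX-2 as equal to one DGX-2H, which is only approximately true.
efficiency = speedup / superpod_systems

print(f"GPUs in the SuperPOD: {superpod_gpus}")
print(f"Speedup over one DGX-2: {speedup:.1f}x")
print(f"Approximate scaling efficiency: {efficiency:.0%}")
```

Under these assumptions, 92 times the hardware buys a speedup of roughly 76x, or about 83% scaling efficiency. That is a rough estimate and not a figure reported by NVIDIA.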
We asked NVIDIA about how and why it chose to address this. NVIDIA spokespeople said they believe conversational AI is an essential building block of human interactions with intelligent machines and applications. However, it’s an incredibly challenging problem to solve both computationally and algorithmically; and this, they added, is what makes it very interesting for them.
This was a cross-company effort, with a number of different teams contributing to making these breakthroughs possible. These teams included NVIDIA AI research, data center scale infrastructure, AI software and engineering. NVIDIA said this shows how it can extend the market-leading performance of its AI platform to emerging use cases.
There are two sides to this: the technical marvel that it is, and its actual applicability. Let’s unpack those.
Optimizing software to take advantage of hardware
As far as training BERT is concerned, NVIDIA clarified that the software optimizations included Automatic Mixed Precision implemented in PyTorch and the use of the LAMB large-batch optimization technique described in a paper. For more details, there is a blog post on this, and people can also access the code in NVIDIA’s BERT GitHub repository.
To achieve the 2.2 millisecond latency for BERT inference on the inference-optimized NVIDIA T4 GPU, NVIDIA developed several optimizations for TensorRT, NVIDIA’s inference compiler and runtime. The effort focused on efficient implementations and fusions for the Transformer layer, which is a core building block of BERT (BERT-Base has 12 Transformer layers) and of the state-of-the-art NLU models available today.
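Mixed precision means doing much of the arithmetic in 16-bit floating point while keeping sensitive steps in 32-bit. One known pitfall is that very small gradients vanish in 16 bits, and the usual remedy is loss scaling. The sketch below illustrates only that idea, using Python's standard `struct` module and not PyTorch. The gradient value and scale factor are hypothetical, and this is not NVIDIA's implementation.

```python
import struct

def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", value))[0]

tiny_gradient = 1e-8      # hypothetical gradient value
loss_scale = 65536.0      # hypothetical scale factor (2**16)

# Stored directly in half precision, the gradient underflows to zero.
naive = to_fp16(tiny_gradient)

# Scaling first keeps the value representable in half precision...
scaled = to_fp16(tiny_gradient * loss_scale)
# ...and the unscaling step is done in higher precision before the weight update.
recovered = scaled / loss_scale

print(naive)
print(recovered)
```

Without scaling, the update is lost entirely. With scaling, it survives with only a small rounding error. Automatic Mixed Precision is "automatic" in that bookkeeping of this kind is handled by the framework and not by the model author.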
TensorRT contains several key functions to enable very high inference throughput, from fusing kernels to automatically selecting precision and more. NVIDIA has further added new optimizations to speed up NLU models, and plans to continue improving libraries to support conversational AI workloads.
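Kernel fusion is easier to grasp with an analogy. The plain Python sketch below is not TensorRT code and runs on no GPU. It only shows the general idea of merging two element-wise steps (a bias addition followed by the GELU activation that BERT uses) into a single pass, so that no intermediate result has to be written out and read back. The function names and input values are hypothetical.

```python
import math

def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def unfused(inputs, bias):
    # Two separate passes, with an intermediate list written out in between.
    biased = [x + b for x, b in zip(inputs, bias)]   # "kernel" 1: bias add
    return [gelu(v) for v in biased]                 # "kernel" 2: activation

def fused(inputs, bias):
    # One pass: bias add and activation are applied per element,
    # so no intermediate list is materialized.
    return [gelu(x + b) for x, b in zip(inputs, bias)]

inputs = [-1.0, 0.0, 0.5, 2.0]   # hypothetical activations
bias = [0.1, 0.1, 0.1, 0.1]      # hypothetical bias values

same = fused(inputs, bias) == unfused(inputs, bias)
print(same)
```

Both versions compute the same values. On a GPU, the saving comes from launching fewer kernels and making fewer round trips to memory, which matters when the latency budget is a couple of milliseconds.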
What all of that means, in a nutshell, is that you can now train linguistic models that are better and faster than ever, and have them deployed and working in conversational AI applications also faster than ever. Which is great, of course.
In theory, what NVIDIA has done may benefit everyone. Optimizations to BERT are released as open source, and NVIDIA hardware is available for everyone to use. But the usual caveats apply. Being able to train a language model like BERT in what is practically no time, compared to the previous state of the art, is great, but it’s not enough.
How can this benefit everyone? Expertise, resources, and data
Even assuming that what NVIDIA released is usable out of the box, how many organizations would be able to actually do this?
First off, getting those open source models from their repositories, getting them to run, feeding them with the right data, and then integrating them in conversational AI applications is not something a lot of people can do. Yes, the lack of data science skills in the enterprise has been mentioned time and again. But it’s useful to keep that in mind — it’s not exactly easy for the average organization.
And then, taken out of their GitHub box, NVIDIA’s BERT models work with specific datasets. What this means is that if you follow the prescribed process to the letter, and your competitor does the same, you will both end up with a conversational AI application that responds in the same way.
That’s not to say that what NVIDIA released is a toy example. It is however just that: a toolkit, with some examples. The real value comes not just from using these BERT models and datasets, but from adding your own, domain specific and custom data to it. This is what could give your conversational AI application its own domain expertise and personality.
Which brings us back to where we started, in a way. Who’s got the data, and the expertise to feed that data to BERT, and the resources to train BERT on GPUs, and the awareness to do this? Well, a handful of names come to mind: Facebook, Salesforce, Microsoft and Amazon.
They happen to be the same ones who dominate the computational linguistics scene, and the same ones who are working on conversational AI assistants, by the way. Truth be told, they are probably the ones who are rejoicing the most at yesterday’s news from NVIDIA.
Everyone else can marvel, but going from that to applying NVIDIA’s breakthroughs may be challenging. To address this, NVIDIA has created the Inception program. Startups participating in the program are using NVIDIA’s AI platform to build conversational AI services for third parties. As long as they can access the data they need, that may be a good way to diffuse innovation.