Credit: Google News
As AI has become one of the hottest must-have technologies, companies have rushed to build deep learning solutions to almost every problem. The industry has become particularly fixated on monolithic models and end-to-end learning, in which algorithms are simply fed a database of training examples and turned loose on their problem domain without ever requiring human assistance. Yet, as Waymo reminds us, when it comes to building truly robust, complex deep learning systems that must interact with the chaotic cacophony of the real world and seamlessly operate alongside humans, the most successful systems today combine multiple deep learning models with traditional hand-coded rulesets.
Like all forms of machine learning, deep learning offers a seductively simplistic beginner experience that requires little effort from newcomers to produce reasonably high-quality results right from the start. The problem lies in the long road of incremental improvements to make those out-of-the-box models sufficiently robust and accurate for production use.
This seductive beginner’s simplicity, coupled with almost a century of science fiction portrayals of machine intelligences that can learn like humans, has led many companies to focus their efforts on building massive, singular, all-encompassing end-to-end AI models that can do absolutely everything the company needs without ever requiring a moment of human assistance.
The end result has been situations like social media’s failed content moderation efforts in which companies have attempted to build singular models to recognize all “bad speech” worldwide across all content categories, languages, demographics and cultures, instead of building an array of specialized models each focusing on a specific class of content and combining them with explicit human guidance.
Indeed, the most successful applications of deep learning have learned to construct their complex systems as collections of specialized deep learning models combined with traditional hand-coded rulesets and sometimes additional classical machine learning algorithms into ensembles that leverage the unique advantages of each technology and the benefits of specialization.
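The hybrid pattern described above can be sketched in a few lines of Python. This is purely illustrative: the classifier stubs, scores, thresholds and the trusted-domain rule are all invented for the example and do not reflect any company's actual architecture.

```python
# Illustrative sketch of a hybrid pipeline: specialized learned models
# feed their scores into a hand-coded rule layer that makes the final call.
# All names, scores and thresholds here are hypothetical.

def hate_speech_score(text):
    """Stand-in for a specialized learned classifier (returns 0.0-1.0)."""
    return 0.9 if "badword" in text else 0.1

def spam_score(text):
    """Stand-in for a second specialized classifier."""
    return 0.8 if "free money" in text.lower() else 0.05

TRUSTED_DOMAINS = {"example.org"}  # explicit, human-maintained ruleset

def moderate(text, source_domain):
    # Hand-coded rules run first and can short-circuit the models.
    if source_domain in TRUSTED_DOMAINS:
        return "allow"
    # Each specialized model covers one narrow class of content.
    if hate_speech_score(text) > 0.8:
        return "block:hate"
    if spam_score(text) > 0.7:
        return "block:spam"
    return "allow"

print(moderate("free money now!", "unknown.com"))  # block:spam
```

The design point is that no single model is asked to understand everything; each learned component handles one narrow class of content, and explicit human-written rules arbitrate between them.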
Take Waymo’s driverless cars. Contrary to popular belief, they are not powered by one massive monolithic model connecting sensors to the steering wheel.
Instead, their incredible capabilities are provided by a collage of specialized deep learning and hand-coded algorithms working in concert.
For example, a spokesperson cited an observation from the company’s CTO, Dmitri Dolgov: it makes more sense to explicitly tell a self-driving car to stop at red stoplights than to have it try to learn that rule purely from observing expert drivers.
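That design choice can be illustrated with a toy planner in which an explicit stoplight rule overrides whatever a learned policy proposes. The function names, state fields and values below are invented for illustration, not Waymo's actual interfaces.

```python
# Toy sketch: an explicit, hand-coded traffic rule takes precedence over
# a learned policy. The learned_policy stub and state fields are hypothetical.

def learned_policy(state):
    """Stand-in for a neural planner's proposed action."""
    return {"action": "proceed", "target_speed": 12.0}

def plan(state):
    proposal = learned_policy(state)
    # Hand-coded rule: always stop at a red light, regardless of
    # what the learned model proposes.
    if state.get("light") == "red":
        return {"action": "stop", "target_speed": 0.0}
    return proposal

print(plan({"light": "red"}))    # {'action': 'stop', 'target_speed': 0.0}
print(plan({"light": "green"}))  # {'action': 'proceed', 'target_speed': 12.0}
```

Encoding the rule explicitly guarantees the behavior in every case, whereas a model trained only on demonstrations could, in principle, fail on a red light that looks unlike any in its training data.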
Indeed, this actually matches the way humans learn to drive.
New drivers learn to drive through a combination of memorizing the written rules of the road, passively observing others and supervised experiential learning.
Memorizing the rules of the road ensures new drivers are familiar with how to interpret the myriad signs, markings, intersections and other elements of the roadways they may never have confronted before in their own lives. Some roadway elements are so rare that most drivers will never encounter them organically and thus must learn about them through the written driving rules rather than watching how others navigate them.
In turn, observational and experiential learning helps new drivers learn to generalize the myriad ways in which real-world driving differs from the textbook.
Part of learning to drive is reconciling the rules of the road with the realities of life.
One of the greatest challenges confronting driverless cars is not building systems that can drive perfect straightaways in ideal sunny weather, but rather building systems that can safely navigate all of the things they have never seen, from a deer jumping out in front of the car from behind a tree on a rural road to the rule-breaking road neighbors and general chaos of the urban environment.
A content moderation algorithm that only recognizes what it has seen before and misses everything else might still be useful. A driverless car, on the other hand, must be able to safely navigate the road, no matter how bizarre or unexpected the situations it encounters.
In short, driverless cars place a much higher demand on the ability to handle unexpected circumstances not well represented in their training data, whether through simulators, explicit rulesets or other mechanisms.
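One common mechanism for coping with inputs a model was never trained on is a confidence gate with a conservative hand-coded fallback. The sketch below is a generic illustration of that idea; the classifier stub, labels and the 0.6 threshold are all hypothetical.

```python
# Minimal sketch of a confidence-gated fallback: when the learned
# perception model is unsure, defer to a conservative hand-coded behavior.
# The classifier stub, its labels and the threshold are hypothetical.

def classify_obstacle(sensor_reading):
    """Stand-in for a learned classifier returning (label, confidence)."""
    known = {"pedestrian": 0.95, "vehicle": 0.9}
    return (sensor_reading, known.get(sensor_reading, 0.2))

def react(sensor_reading, confidence_threshold=0.6):
    label, confidence = classify_obstacle(sensor_reading)
    if confidence < confidence_threshold:
        # Out-of-distribution input: fall back to the safest behavior
        # rather than trusting a low-confidence prediction.
        return "slow_and_yield"
    return f"tracked:{label}"

print(react("pedestrian"))  # tracked:pedestrian
print(react("deer"))        # slow_and_yield
```

The point is not the threshold itself but the architecture: the system never has to be right about everything it sees, only to recognize when it might be wrong and behave conservatively.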
Moreover, one of the purposes of driverless cars is to improve the safety of the driving experience beyond the limitations of human drivers. This means driverless car algorithms must be able to handle the myriad edge cases that confound even human drivers and for which there is almost no training data.
As the company put it this past December, “simple imitation of a large number of expert demonstrations is not enough to create a capable and reliable self-driving technology. Instead, we’ve found it valuable to bootstrap from good perception and control to simplify the learning task, to inform the model with additional losses, and to simulate the bad rather than just imitate the good.”
For all their hauntingly accurate results, deep learning algorithms are not the general intelligences of science fiction the public often perceives them to be. Today’s deep learning systems are nothing more than powerful inference systems that learn statistical correlations between their inputs and outputs.
In the case of driverless cars, this means that deep learning models cannot build the kinds of generalized causality models that underlie human reasoning. A model can learn that a specific set of inputs typically leads to a given outcome, but it does so purely through recording a correlation, rather than through abstracting to a causal model relating high order concepts and their interactions.
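A trivial toy example makes the correlation-versus-causation distinction concrete. The observational "model" below is just a co-occurrence table over invented data; it predicts well within its data but contains nothing that represents why the correlation holds.

```python
# Toy illustration of correlation-only learning: a co-occurrence table
# learns that visible brake lights correlate with the lead car slowing,
# but records only the correlation, not the causal direction.
# All data here is invented for illustration.

observations = [
    # (brake_lights_visible, lead_car_slowing)
    (True, True), (True, True), (True, True),
    (False, False), (False, False), (True, False),
]

def p_slowing_given_lights(data, lights):
    matching = [slowing for seen, slowing in data if seen == lights]
    return sum(matching) / len(matching)

# The correlation is real and useful for prediction...
print(p_slowing_given_lights(observations, True))  # 0.75

# ...but the table cannot answer the counterfactual "what if the brake
# lights are burned out while the car still slows?" -- that case never
# appears in the data, and nothing in the table encodes the causal link
# (the deceleration causes the lights, not the reverse).
```

A human driver, by contrast, reasons from the causal model directly: a slowing car without working brake lights is still a slowing car.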
This makes it difficult for monolithic systems to reason about the world in a way that is sufficiently robust for the kinds of complex interaction predictions that underlie the driving experience based purely on collected observational data.
As the company put it, “deep learning identifies correlations in the training data, but it arguably cannot build causal models by purely observing correlations, and without having the ability to actively test counterfactuals in simulation. Knowing why an expert driver behaved the way they did and what they were reacting to is critical to building a causal model of driving. For this reason, simply having a large number of expert demonstrations to imitate is not enough.”
Thus, according to Waymo “the planner that runs on Waymo vehicles today uses a combination of machine learning and explicit reasoning … the bar for a completely machine-learned system to replace the Waymo planner is incredibly high … handling of long-tail situations, understanding causality, and continual lifelong learning are subjects of active research at Waymo, as well as in the broader machine learning community.”
A spokesperson noted that the company’s long-term goal, as the technology advances, is ultimately to build an end-to-end system that can learn purely from observation with a monolithic model, but that this remains very much an active area of research for the foreseeable future.
Putting this all together, the industry has become fixated on the ideal of massive monolithic models that can accomplish every task in a single system and learn without any input beyond an unmodified observational training dataset.
The reality is that today’s deep learning algorithms are not quite there yet.
In the end, Waymo reminds us that even the world’s best AI minds still turn to ensembles of deep learning models and traditional code to build their most sophisticated, robust and successful systems.