Over the last few years, artificial intelligence (AI) has advanced rapidly, with developers regularly reporting new breakthroughs. AI algorithms work ever faster. Until recently, skeptics argued it would take ages for robots to come anywhere close to moving like humans. It was easier to program a computer to defeat humans in the Chinese game of GO than to construct a machine that could move like us. But the skeptics have been proven wrong. Today, we can see creatures made by Boston Dynamics jump and run and perform acrobatics. Robots have become as agile as we are. In fact, AI has been acting ever more human-like, and not only in robotics. Time to talk to our machines.
For years, developers have been honing computers’ human-speech-processing capabilities. A great deal of thought and effort has gone into devising ways to decode natural language and support man-machine interactions. Intensive research into speech recognition began in the 1980s. The IBM computer used in early experiments could recognize thousands of words but managed to understand only a handful of complete sentences. It was not until 1997 that a breakthrough was made, when the Dragon NaturallySpeaking software surprised everyone with its ability to recognize continuous speech at the rate of a hundred words per minute. The biggest challenge faced by experts seeking to achieve a breakthrough was (and, to a certain extent, still is) the fact that human speech relies not only on inner logic but also on references to external situational contexts and/or emotions.
Today, it is easy for a computer to understand and answer the question “What is today’s weather?”. It is far harder to wrap their processors around the meaning of, “So, I suppose I’m going to need an umbrella again next time I go out? Yes?”. The challenge lies in that question’s irony, allusiveness and the reference to the past. Such rhetorical forms, common in human communications, continue to pose the biggest challenge for smart machines. Yet, the progress being made in the field is absolutely dramatic.
Today’s computers can process voice messages with excellent accuracy (an error rate of merely 5 percent). Their growing capacity to comprehend complex contexts represents a major advance in the development of algorithm-based voice-recognition technology. The huge effort put into training bots by feeding them samples of human speech has made communication with electronic devices considerably more natural. We can now ask a table-top speaker about the weather or command it to adjust room temperature or make a purchase in an online store. Meanwhile, voice-enabled bots are speaking in perfectly structured sentences. It is hard to deny they are graceful and skillful in dealing with complex communication problems. To learn more, check out this video from Google.
One of the key milestones in speech technology has been the development of the Siri smart application from Apple. Soon after Siri demonstrated its capabilities to the general public, it was followed by the launches of Microsoft’s Cortana and Amazon’s Alexa. More recently, Google Assistant has been taking the market by storm. Voice-operated interfaces have been establishing themselves in banking and commerce. Other industries are showing growing interest in jumping on the bandwagon.
Encouraged by this favorable market response, Microsoft, Amazon, Apple, Google, and Facebook have engaged in a race to launch new applications. Google has joined forces with Starbucks to develop an assistant to place orders on behalf of regular customers. Drivers will be able to use a voice assistant to communicate with Google Maps. Amazon is working to develop a system and machines that will enable users to sell and/or buy products by simply talking to their computer. A year ago, Amazon’s sales people realized that the new technology has the potential to astound individual users.
Yet, in 2017, even the biggest voice recognition optimists did not anticipate what would happen on Black Friday. Mainly the day after Thanksgiving, when Americans are traditionally offered huge discounts. On that day, interest in Alexa speakers exceeded all expectations. Consumers ended up buying millions of Alexa and Echo devices. This, admittedly, was partly driven by a large-scale promotional campaign and deep discounts. Nevertheless, the numbers seem to indicate an interest that surpasses the urge to take advantage of a deal.
The 2018 Voice Labs Report estimated that by end of 2017 there were 33 million “voice-first” devices in circulation. According to the investment fund RBC Capital, nearly 130 million devices networked directly to Alexa will operate around the world by 2020. Over the next two years, Alexa sales will generate $10 billion in revenues for Amazon. Google claims that 20 percent of its users rely on voice for searching the internet on mobile devices. Over the next two years, this number is expected to increase by another 10 percent. According to the Mintel Digital Trends report, 62 percent of UK would like to use voice to control devices, and 16 percent have done so already. These numbers reveal a great deal about the underlying trend.
Only two years ago, corporate failures to develop new technologies received more media coverage than successes. In 2016, Microsoft jettisoned its Tay chatbot project after it found the chatbot “fed” on profanities from web users, which it then spread itself. At the time, the media made fun of bots. The web was awash with reports from users complaining about Siri or Echo activating themselves unexpectedly. Some critics point to the danger of smart speakers leaking recorded user conversations online. Such records can be deleted as long as one knows and remembers to do so. This leads us to the issue of personal data protection and the safe use of cameras and speakers.
Other doubts have arisen over the reliability of voice assistants. Could the answers from Alexa, Cortana, or Google Assistant to some of the more complex customer queries be manipulated for marketing purposes? And, speaking of marketing, think about voice-controlled searching. Will those searches and machines be steered to sell products? And what about search engine optimization (SEO) in a voice-controlled environment? Websites that rely on visual/textual and all-textual advertising may lose significant value.
I began this article wondering whether a major change, including a departure from manually-operated controls, was imminent. Considering the technology’s track record over the last few years, that seems likely.
One of the key drivers behind this trend is the increasingly popular idea of “the smart home,” enabled by the Internet of Things. Apple, Google and Amazon — the heavyweights — are all on board, believing the use of voice to operate devices aligns perfectly with the preferences of today’s consumers. What we want from shopping in terms of information access and interaction is convenience, pleasure and quick results. Voice control seems positioned to satisfy all those needs. A model relying on short, quick statements and commands from shoppers and fast-responding applications and assistants is undoubtedly viable.
Given the pace of technology advancement, I don’t see why the next few years could not bring a change as radical as the transformative impact of smart phones. We’ll be able to give our eyes and hands a rest as we increasingly talk (and listen) to our electronic friends.