I had the pleasure of attending the Voice Summit last week here in New Jersey. It’s a bit odd for me to say I enjoyed it, given that I am the ultimate conference cynic. Too many bad experiences have led me to believe conferences are where you go to hear experts hedge on what’s happening with technologies. There was undoubtedly some of that. Mostly, however, there were lots of knowledgeable people offering great insights and enlightening conversations.
Here are my key takeaways:
People have been trying to get computers to recognize their voices for a very long time. As far back as 1997, software entered the market to help busy professionals do away with the tappity-tap of the keyboard and start dictating their communications. Unfortunately, dictation software was really crappy. Not anymore…
A confluence of things has altered the voice landscape:
- Increasing computing power enables smarter voice recognition algorithms
- Mobile devices make voice recognition ubiquitous, providing an ever-growing dataset on which to train recognition algorithms
- Modern artificial intelligence (AI), machine learning (ML) and natural language processing (NLP) make it possible not just to recognize language but also to deduce context and build relationships
- Fast, cheap networks make it possible to offload processing to cloud servers, enabling rapid “understanding” of people’s intents
The result: voice is everywhere. It’s in our phones, on our computers, on our remote controls, in our voice assistants, in our cars, in call centers and in kids’ toys. And, unlike in the old dictation days, we’re having conversations. If you’re experiencing this, so are your customers. Voice is happening!
There’s pretty strong consensus amongst all of the cool kids that voice is the next great interface. Makes sense… I much prefer saying, “Hey, Google…” over typing queries on my phone or tablet. Not having to use one’s hands to get information seems pretty natural. The question, then, becomes: now what? That’s the million-dollar question, along with…
- How do brands actually take advantage of this in a way that moves the needle?
- Which technologies will users and customers settle on, and which will ultimately be worth the investment?
- How do we raise awareness of our voice products so customers actually engage with them?
- Are customers even ready for this now?
The answers to those questions vary by brand and industry. Some brands (perhaps those in retail and information services) should jump in with both feet right now, while others can afford to wait a bit. That said, as innovators and early adopters begin jumping in, there’s an opportunity to capture behavioral and demographic data about actual users. That data could prove to be a very valuable asset as reliance on voice increases.
This specific point comes from a Mercedes-Benz presentation I saw at Voice Summit. It was one of three points the speaker wanted us to walk away with. It is the one that resonated most with me.
Brands have a tendency to force people to use technologies their way. Brands want us to “Go here. Press this button. Then do this.” For voice to be successful, it can’t work that way. The most successful voice products will seamlessly integrate into both customers’ journeys and their lives.
As I explained above, modern voice is about conversations. Generally, we don’t have to go through steps to have conversations with people. They just happen. When it comes to voice, companies will have to determine how to remove the friction generally associated with engaging with their brands. They’ll also have to be aware of context and, just as a person would, remember what the user talked about previously. Supporting spontaneous conversations, backed by a memory of earlier exchanges, is the ultimate in seamless integration.
One of the most interesting demonstrations I saw at Voice Summit came during a session by a representative from Microsoft. The presenter, a developer, had taken the Microsoft AI team’s work on the JFK Files and added a twist. She took audio recordings of J. Edgar Hoover’s voice and used them to create a voice model that sounded like Hoover reading JFK Files documents, complete with his distinctive accent. She did this using only Microsoft’s Azure cloud infrastructure, exactly as anyone else could.
Amazon, a sponsor of the event, held numerous sessions on how both Amazon Web Services (AWS) and, of course, the Alexa virtual assistant can be used to create voice products.
The ability to create intelligent, conversational voice products exists on infrastructure from any of the three well-known IaaS/PaaS providers: Amazon, Google and Microsoft. Decide which one is best for you (which one are you using elsewhere in your organization?) and build something.
All of the points above lead to a common near-term solution. Amazon’s Alexa, Google’s Assistant, Apple’s Siri and Samsung’s Bixby (especially the first two, domestically) are great low-barrier-to-entry options for building voice products.
The reasons why are pretty clear:
- The companies have invested significant resources to provide very strong voice and AI capabilities
- Amazon’s and Google’s infrastructures support both building voice assistant products (“skills” for Alexa and “actions” for Google) and hosting the services those products need to work effectively
- There’s a large user base of people already familiar with voice assistants. Analysts predict that more than 200 million smart speakers will have been sold by the end of 2019, up from over 100 million at the start of the year. Additionally, over one billion devices are voice-enabled out of the box with Assistant, Siri or Bixby.
- The platforms currently provide, arguably, the best opportunity to offer a voice experience to your customers while minimizing the onboarding needed to make the products valuable. After all, people using smart assistants are already familiar with the voice interface, and that familiarity removes a huge barrier to usage.