Today’s voice recognition technology lets us give our phones and tablets individual commands or ask them for individual pieces of information (“Play ‘Come Together’ by The Beatles,” for example). In time, these machines will get better at understanding our natural speech patterns and intent (“I’m in the mood for some Beatles”). They’ll know who we are and what our past preferences are, and they will be able to “listen” to multiple voices at once and single out the important details.
The development of software that can replicate the cocktail party effect – the human ability to pay selective attention, say, to one’s own name, even in a loud room – is tantalizingly close, according to officials at Nuance, the Boston-based company behind speech recognition software such as Dragon NaturallySpeaking.
Vlad Sejnoha, Nuance’s chief technology officer, said that the technology could have an immediate impact on electronic medical records. “We foresee a wide range of applications, including transcribing doctor-patient conversations, and automatically extracting key facts and entering these on the patient’s chart or electronic health record,” Sejnoha said.
The technology would also allow devices to tell users apart and to understand multiple people at once. It could even make decisions about whose voice to heed (imagine a family with four kids sitting on the couch, each of them shouting a different preference at their smart TV – Mom’s voice wins?).
Beyond Smartphones: Cross-Device Voice Control
The smartphone may be where most people first get acquainted with voice control, but the technology has quickly expanded beyond phones. Apple’s Siri, for example, has migrated from the iPhone to the iPad. Google is busily integrating voice control into its Android platforms, and Microsoft is expanding its speech recognition functionality.
Not only will voice control work with more devices, but it will function across those devices.
Last month, Nuance introduced its Wintermute project, which features smart cross-device voice recognition that learns its user’s preferences over time. For example, if you ask your phone about a TV show or the score of a game and then verbally refer to “the show” or “the game” while sitting on the couch later, your Nuance-enabled smart TV will know exactly what you’re talking about. And of course, the possibilities only expand from there.
The technology itself, in addition to improving dramatically, will also become more widespread. Before long, voice control will be standard on $99 smartphones and entry-level tablets. Not long after that, our Internet-connected coffee makers, TVs, cars and electronic window shades will also understand our commands.
Our devices will soon not only do a better job of understanding individual words and phrases, but also know more about our intent, preferences and even identities. Just like today’s search engines, voice recognition technology will get smarter as we use it.
Where things get interesting is with advances in how well computers can comprehend our sentiment and linguistic nuances, and even tell us apart from other people.
“That means that we are able to act on the spoken input in increasingly sophisticated and useful ways,” said Sejnoha.