"Hey Siri, When Will Voice Assistants Improve?"

Guest Writer

- Last Updated: December 2, 2024

“Siri, can you take anything off my to-do list?”

“I’m afraid not.”

It’s a frequent and irritating experience that we’ve all had. Siri's purpose—to support our lives in every way possible the way Rosie does for the Jetsons—is one we champion. But in reality, voice assistants are frustrating. The improvement of voice assistants is something we would all love to see. Right now, Siri remains clunky and functionally confined in most of our eyes.

Siri isn't alone. Smart speaker sales may number in the tens of millions, and Apple may have declared that 500 million device holders actively use Siri, but there’s currently no human-replacing voice assistant on the market. Natural Language Processing (NLP), the technology driving these products, is still evolving. We’re often still left defeated rather than enthusiastic after our interactions with voice assistants like Siri.

The Technology Powering Siri

Siri, Alexa, and Google Home were all designed to interact with us the way another human would. So why shouldn’t we expect them to behave as a human would? After all, it’s been almost a decade since Siri first debuted.

When we unravel the steps Siri has to take to execute a simple command, however, we find that the ease with which our brain solves problems or responds to commands is not so easily replicated. Artificial Intelligence (AI) doesn't process information the same way we humans do.

The fact that Siri currently supports 20 languages is pretty incredible, especially when compared to Alexa’s four languages and Google Assistant’s 11. From this perspective, what Siri and her digital assistant kin are capable of executing is quite extensive, notwithstanding limitations.

Dissecting Siri's Challenges

Siri’s constraints lie in the different modules required to make a digital assistant tick. When we give Siri a command—when we say, for example, “schedule a meeting”—a few things must happen to prompt the right action.

First, the audio signal emitted from the user needs to be recorded, digitized, and then transformed into a text representation or a sequence of words. Second, the words need to be analyzed syntactically, and the words' meaning then must be transformed into a semantic representation. Lastly, the semantic representation must be interpreted as a sequence of operations to be performed.
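To make those three steps concrete, here is a minimal sketch in Python. It is not how Siri is built. Every name in it (`transcribe`, `parse`, `plan`, `Intent`) is hypothetical, and the speech recognition step is stubbed out by assuming the "audio" is already text.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """Hypothetical semantic representation of a spoken command."""
    action: str
    slots: dict = field(default_factory=dict)


def transcribe(audio: bytes) -> str:
    # Step 1 (stubbed): a real assistant would run speech recognition here.
    # For this sketch we assume the "audio" is simply UTF-8 text.
    return audio.decode("utf-8").lower().strip()


def parse(text: str) -> Intent:
    # Step 2 (toy version): treat the first word as the verb and the rest,
    # minus articles, as the object of the command.
    words = [w for w in text.split() if w not in {"a", "an", "the"}]
    if not words:
        return Intent(action="unknown")
    return Intent(action=words[0], slots={"object": " ".join(words[1:])})


def plan(intent: Intent) -> list:
    # Step 3: turn the semantic representation into a sequence of operations.
    if intent.action == "schedule" and intent.slots.get("object") == "meeting":
        return ["open_calendar", "create_event", "ask_for_time"]
    return ["say_cannot_help"]


def handle(audio: bytes) -> list:
    # Chain the three steps; a mistake in any one of them changes the result.
    return plan(parse(transcribe(audio)))
```

Calling `handle(b"Schedule a meeting")` walks through all three steps and returns the calendar operations, while anything the toy parser does not recognize falls through to `say_cannot_help`.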

Natural Language

A mistake at any step in this process will cause Siri (or any digital assistant) to misunderstand the user's intent and prompt the wrong response. There are many triggers for such mistakes. Let's focus on the variability of individuals' vocabulary as an example. Each one of us uses slightly different words and expressions with unique pronunciations and certain grammatical errors or nuances that we as humans don’t notice in day-to-day conversation. Siri must understand all of these variances.

Ambiguity

There’s also ambiguity to deal with. Digital assistants interpret “commands” as a sequence of words with no initial or inherent relationship or meaning. They have to figure out what the words actually mean. Ambiguity is the inversion of the variance problem. While one semantic fact can be expressed with hundreds of simple sentences, one expression can refer to millions of semantic entities. For true improvement of voice assistants, Siri and other assistants must be able to interpret this correctly.
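Variance and ambiguity can be pictured as two lookup tables pointing in opposite directions. The sketch below is only an illustration: the phrases, intent names, and entities are made up, and a real assistant would rely on learned models instead of hand-written tables.

```python
# Variance: many surface forms, one meaning (hypothetical phrase table).
PHRASE_TO_INTENT = {
    "schedule a meeting": "create_event",
    "set up a meeting": "create_event",
    "book a meeting": "create_event",
    "put a meeting on my calendar": "create_event",
}

# Ambiguity: one surface form, many candidate meanings (hypothetical entities).
EXPRESSION_TO_ENTITIES = {
    "call john": [
        "John Smith (contact)",
        "John Doe (contact)",
        "John's Pizza (business)",
    ],
}


def intents_for(phrases):
    # Collapse differently worded commands into the set of intents they express.
    return {PHRASE_TO_INTENT.get(p.lower(), "unknown") for p in phrases}


def candidates_for(expression):
    # List every entity the expression could refer to; something still has
    # to choose among them, which is where context comes in.
    return EXPRESSION_TO_ENTITIES.get(expression.lower(), [])
```

Here four different phrasings collapse into the single intent `create_event`, while the single expression "call john" fans out into three candidates.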

Context

Context is yet another challenge. It comes so naturally to us that we often don’t recognize the need for clarification. At any moment, we're integrating an unimaginable number of nested and independent contexts into our conversations. Sometimes this can be negative (prejudice is a form of cultural context), but most often it provides useful clues as to how to interpret a sentence.

The role of context is evident in a phrase as simple as “how are you?” Siri must be aware of the cultural context of this phrase to understand that it is a greeting, rather than a question, to properly engage in a conversation.

The idea that women are bad drivers, on the other hand, is a form of cultural context at its worst; this is not a fact but a prejudiced belief. Siri wouldn't interpret this cultural context the way a human would. In fact, recent research shows that males accounted for 71 percent of all vehicle crash deaths in 2014.

How Can We Improve Voice Assistants?

Our brains, for the most part, have no trouble processing these concepts. Siri, on the other hand, does not behave like our brains. She's unable to decipher context. She doesn't know that “Friday” could refer to three weeks from now because the user is currently planning the week after his vacation.
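The “Friday” example can be sketched in a few lines of Python. The function names and the `planning_from` parameter are hypothetical stand-ins for whatever context a real assistant would need to track. The point is only that the same word resolves to different dates depending on that context.

```python
from datetime import date, timedelta
from typing import Optional

FRIDAY = 4  # date.weekday() counts Monday as 0, so Friday is 4


def next_weekday(on_or_after: date, weekday: int) -> date:
    """Return the first date with the given weekday on or after `on_or_after`."""
    days_ahead = (weekday - on_or_after.weekday()) % 7
    return on_or_after + timedelta(days=days_ahead)


def resolve_friday(today: date, planning_from: Optional[date] = None) -> date:
    # Without context, "Friday" is taken to mean the upcoming Friday.
    # With context (the start of the week being planned), it shifts.
    anchor = planning_from or today
    return next_weekday(anchor, FRIDAY)
```

With `today = date(2024, 12, 2)`, `resolve_friday(today)` gives December 6. If the user is planning the week that starts on December 23, `resolve_friday(today, planning_from=date(2024, 12, 23))` gives December 27 instead. The hard part, which this sketch skips entirely, is knowing that the user has switched to planning that later week.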

Technology has provided some workarounds, but for now, they're limited. Google Assistant knows that “what is the quickest way home?” refers to the home of the user and can probably look that address up in the user’s Google Maps history. But that comes with some loss of privacy because it requires access to the user’s app and transaction data.

Machine learning techniques help to address variance and ambiguity in spoken language. Siri uses algorithms trained on a large database of speech and corresponding text representation to build a model for picking words out from sounds. Statistics (not semantics and context) are used to make predictions, so she knows that it's much more likely that “schedule a...” will be followed by “meeting” and not by “meat thing.” It’s a limited solution to a vast challenge that will only improve with time.
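The “meeting” versus “meat thing” choice can be illustrated with a very small word-pair (bigram) count model. This is a crude sketch under stated assumptions, not Siri's actual method: the five-sentence corpus is invented, and real systems use far larger data and proper probabilities, not raw counts.

```python
from collections import Counter

# Tiny, made-up corpus standing in for the "large database" of text.
CORPUS = [
    "schedule a meeting with anna",
    "schedule a meeting for friday",
    "schedule a call with bob",
    "cancel a meeting",
    "buy a meat pie",
]


def train_bigrams(sentences):
    # Count how often each pair of adjacent words appears.
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        counts.update(zip(words, words[1:]))
    return counts


def score(candidate, counts):
    # Crude score: sum of the counts of the candidate's word pairs.
    # Pairs never seen in the corpus contribute zero.
    words = candidate.split()
    return sum(counts[pair] for pair in zip(words, words[1:]))


def pick_transcription(candidates, counts):
    # Choose the candidate word sequence the statistics favor.
    return max(candidates, key=lambda c: score(c, counts))


BIGRAMS = train_bigrams(CORPUS)
```

Given the two candidate transcriptions “schedule a meat thing” and “schedule a meeting”, `pick_transcription` prefers the second because “a meeting” appears three times in the corpus while “meat thing” never does. No meaning is involved, only counting.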

It has taken immeasurable amounts of data, countless designs, and technology upgrades to get to where we are today. And it will take countless more to make improvements. Today Siri can perform just a tiny fraction of the tasks that we humans can accomplish. More data and more user engagement will fuel NLP technologies, just as increased search queries aided Google’s search.

There's still hope for the improvement of voice assistants, Siri and all. We as users need some patience and appreciation for the impact of time. There will probably never be a monumental voice revolution. Instead, we’ll continue to see small incremental updates to Siri, Alexa, Google Home, and the like. In five years, they’ll be drastically different from what they are right now.

There may not be one “smartest” voice assistant in existence today, but the race towards more human-like qualities is still on. The future is bright for NLP and it should not be underestimated.

Written by Johannes Stiehler, CTO at ayfie.