By looking at all the coverage of Voice User Interfaces (VUIs), we could easily think that a massive shift to this new way of human-computer interaction has already happened.

But it hasn't, and we still have a long way to go before we can throw out the mouse, keyboard, or touch screen and replace them completely with VUIs.

For some of us it will take a long time to get used to voice-based interfaces, particularly the lack of a screen. But speaking is the natural way we communicate, and once the technology becomes seamless enough, voice could easily become the best way to interact with computers.

However, we shouldn't look at voice interaction as a replacement for graphical user interfaces (GUIs) but rather as an alternative, no matter the inflated promises behind the likes of Amazon's Alexa, Apple's Siri, and Google Home.

Visual interfaces aren't going anywhere and here's why:

VUI is super useful for issuing simple commands, but once the system needs to return complex information, the graphical interface ends up being a far better option.

So, our task as UX designers and product makers is to figure out the relationship between GUIs and VUIs and how these interfaces should interact.

The good and the bad of Voice Interfaces

The AI technology that's enabling voice user interfaces is advancing, but still not at a game-changing pace. Voice recognition and interfacing are pushed forward by machine learning techniques that enable a system to recognize and process a person's specific speech patterns.

The idea of building systems that feel more human-like is strong enough to keep the industry focused on figuring out the challenges associated with Voice UIs. And if we look at this from the UX design perspective, voice is simply another way to reduce friction in using digital products, whether we're talking about VUIs for apps or smart assistants. It's all about getting more done faster and easier.

But...do users even know what VUIs can and cannot do?

This seems to be one of the main issues with voice interfaces - people don't really understand their capabilities because they can't see what's available or easily explore the options the way they can with a graphical UI.

So most people use their devices to perform simple tasks or ask simple questions, like asking Siri to tell them the weather. This is something a graphical interface could do with the same efficiency and speed.

Not knowing what Siri or a device like Google Home can and cannot do stops users from relying on them more. Not to mention that if you want to go into more detail, the system can't parse information at that level of granularity. It's designed for linear responses, and the AI behind it is still not mature enough to make the system truly conversational.

So, we're not 'speaking' with machines yet - we're just using voice-based tech to get answers to basic questions or perform simple, mundane tasks. At the end of the day, it seems like voice is still just a supplementary technology.

How can voice-based technology become smarter?

The answer is understanding users' intent, context, and sentiment. It may sound like something we should have figured out by now, but in reality things have been advancing at a slower pace.

Why? Because language, or more specifically the way we humans use it, is incredibly complex. Just think about it: there are so many ways to articulate even a basic intent via voice, so interpreting a user's intent at each point in the conversation and delivering the right response is incredibly hard. Here's how AI technologies can help:

  • Natural Language Processing (NLP) - breaks down a statement to understand the user's intent and the entities they're referring to
  • Sentiment Analysis - drills down into the choice of words to uncover the more subtle meaning of what users are saying (see the sketch below)
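To make these two ideas a bit more concrete, here's a minimal, purely illustrative Python sketch of how a voice backend might combine intent detection, entity extraction, and sentiment scoring. The intent names, keyword rules, and word lists are hypothetical stand-ins for what a trained NLP model (or a library such as spaCy or NLTK) would actually provide.

```python
import string

# Toy illustration of intent detection, entity extraction, and sentiment scoring.
# The keyword rules and word lists are hypothetical; a production system would
# use a trained NLP model rather than keyword matching.

INTENT_KEYWORDS = {
    "get_weather": ["weather", "forecast", "rain", "temperature"],
    "set_reminder": ["remind", "reminder", "remember"],
}

POSITIVE_WORDS = {"great", "love", "awesome", "thanks"}
NEGATIVE_WORDS = {"hate", "terrible", "annoying", "wrong"}


def tokenize(utterance: str) -> list:
    """Lowercase the utterance and strip surrounding punctuation from each word."""
    return [w.strip(string.punctuation) for w in utterance.lower().split()]


def detect_intent(utterance: str) -> str:
    """Return the first intent whose keywords appear in the utterance."""
    words = set(tokenize(utterance))
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & set(keywords):
            return intent
    return "unknown"


def extract_entities(utterance: str) -> dict:
    """Very rough entity extraction against small, hard-coded vocabularies."""
    known_cities = {"london", "paris", "belgrade"}
    known_days = {"today", "tomorrow", "monday", "friday"}
    words = tokenize(utterance)
    return {
        "city": next((w for w in words if w in known_cities), None),
        "day": next((w for w in words if w in known_days), None),
    }


def score_sentiment(utterance: str) -> float:
    """Crude sentiment score in [-1, 1] based on positive/negative word counts."""
    words = set(tokenize(utterance))
    pos = len(words & POSITIVE_WORDS)
    neg = len(words & NEGATIVE_WORDS)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


if __name__ == "__main__":
    utterance = "What's the weather in Belgrade tomorrow?"
    print(detect_intent(utterance))       # get_weather
    print(extract_entities(utterance))    # {'city': 'belgrade', 'day': 'tomorrow'}
    print(score_sentiment("I love this, thanks!"))  # 1.0
```

Even this toy version shows why the two techniques complement each other: intent and entities tell the system what to do, while sentiment hints at how the user feels about the interaction.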

At SpiceFactory, we've developed several advanced AI chatbot solutions for different industries, like events and banking, and we've tackled these challenges head-on. You can read more about how our team leverages NLP and sentiment analysis to build intelligent chatbots here.

Information still needs to be structured

When you think about creating a frictionless user experience with a Voice UI, there's a lot that can be tackled with technology, e.g. better speech recognition algorithms, but some proven UI design practices also play a key role.

As with visual interfaces, if you want to create a smooth user flow with a VUI, you need to start with the basics - structuring the information in a logical, clear, and easy-to-understand way.

But how do you build the information architecture for a voice app when the user cannot see anything? You need to adapt your process for conversational interactions by:

  • Understanding user goals and intent
  • Mapping the conversation flows (see the sketch after this list)
  • Using the right context to sound more 'human' and build user trust
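As a rough illustration of what mapping the conversation flows can look like in practice, here's a small, hypothetical sketch that models a flow as a state machine: each state has a prompt and a map from recognized intents to the next state. The state names, intents, and prompts are made up for the example; a real voice app would also track slots, confirmations, and error recovery.

```python
# Hypothetical conversation flow for a simple weather skill, modelled as a
# state machine: each state has a prompt and a map from recognized intents
# to the next state.

CONVERSATION_FLOW = {
    "start": {
        "prompt": "Hi! You can ask me about the weather. What would you like to know?",
        "transitions": {"get_weather": "ask_city", "goodbye": "end"},
    },
    "ask_city": {
        "prompt": "Which city are you interested in?",
        "transitions": {"provide_city": "give_forecast"},
    },
    "give_forecast": {
        "prompt": "Here's the forecast. Anything else?",
        "transitions": {"get_weather": "ask_city", "goodbye": "end"},
    },
    "end": {"prompt": "Goodbye!", "transitions": {}},
}


def next_state(current: str, intent: str) -> str:
    """Advance the flow; stay in the current state when the intent isn't handled."""
    return CONVERSATION_FLOW[current]["transitions"].get(intent, current)


if __name__ == "__main__":
    state = "start"
    for intent in ["get_weather", "provide_city", "goodbye"]:
        state = next_state(state, intent)
        print(intent, "->", state, ":", CONVERSATION_FLOW[state]["prompt"])
```

Writing the flow down in this form forces you to decide, for every state, which user intents you expect and what the system should say next - which is exactly the information architecture work described above, just without a screen.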

If a voice-based system fails to emulate human-to-human conversation, it will simply fail to differentiate itself from customer service voice menus and similar systems that are merely activated by a user's voice.

Conclusion

By the looks of things we're going to see voice interactions proliferate across software and hardware products.
However, many of the old models we've used to build graphical UIs don't even apply to this new medium. So we need to embrace and refine this new technology, working to better understand our users' intent at every step so we can provide the experience they expect and deserve - which means combining the benefits of voice and graphical UIs.