A few days ago, Facebook CEO Mark Zuckerberg blogged about his experience building Jarvis, a digital butler he created as part of his yearly personal challenge. Among the many lessons learned, Zuckerberg highlighted the importance of supporting both voice and text as conversational interfaces. I found that point particularly interesting, as there is not a lot of content about the differences between voice and text when it comes to designing conversational interfaces.
Most conversational interface platforms support both text and voice as cognitive input-output mechanisms. However, from the user experience and design standpoints, there are very important differences between text and voice interactions. At a high level, here are some of the key differences that should be taken into consideration when designing voice and text conversational interfaces.
1-Text is Better for Richer UX
Textual conversations allow the use of rich UI formats such as HTML to display information. From that perspective, text conversations such as the ones that take place with chatbots in a messaging client are better at displaying complex information structures than the equivalent voice models. Facebook Messenger is a great example of a platform that enables rich UX in textual conversations.
2-Voice is Better for Communicating Emotions
Voice bots are more efficient at communicating and interpreting emotions such as calm, excitement, sadness, etc. By effectively using elements such as tone, speed, and accent, voice bots can be very good at establishing an emotional connection with users.
3-Text is Better to Avoid Repetitions
A user interacting with a chatbot in a messaging application can access the entire conversation thread by simply scrolling up and down, which avoids the unnecessary repetition that is so common in voice conversations. That model of rapidly accessing past information is absent from voice conversations.
4-Voice is Better for Group Conversations
Voice bots such as the ones powered by Amazon Alexa are very effective at interacting with multiple users at the same time. That characteristic makes voice bots better equipped to handle real-time collaboration scenarios such as dialogs in a meeting room.
5-Text is Better to Implement Complex User Actions
By leveraging rich UX artifacts such as login forms, links, or data lists, chatbots are a better mechanism for modeling complex user actions such as logging into a back-office system, booking a flight, etc.
6-Voice is a Better Form of Authentication
Voice bots are intrinsically more secure, as the unique characteristics of the human voice can be used as an authentication mechanism. Additionally, voice bots can effectively single-sign-on into back-office systems using the user's voice as the initial identity.
7-Text is Better for Long, Data-Centric Communications
Text is a robust channel for delivering long texts or data payloads. Attempting to accomplish that with voice bots can result in very complex conversation models subject to behaviors such as interruptions, subject changes, and other frequent dynamics of human conversations.
8-Voice is Better for Succinct, Real Time User Actions
Voice is a great channel for communicating succinct, real-time actions such as buying shares of a public stock or requesting an Uber.
9-Translating Voice to Text Misses Context and Emotion
Although simple, the process of translating voice into text often misses elements such as emotion, tone of voice, or other intrinsic aspects of voice conversations that have no direct representation in a textual dialog.
10-Translating Text to Voice Misses UI Elements or Data Structures
Converting textual conversations to voice often loses UI structures, which are very hard to translate into plain-text narratives. As a result, text-to-voice translations are only effective for short forms of text.
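As a rough illustration of why this conversion is lossy, here is a minimal Python sketch that flattens a list of flight options (the kind of structure a chatbot might render as cards) into a single sentence a voice bot could read aloud. The `FlightOption` type and the `to_speech` function are invented for this example and do not correspond to any particular bot platform.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FlightOption:
    # Hypothetical structure a chatbot might render as a rich card.
    airline: str
    departure: str
    price_usd: int


def to_speech(options: List[FlightOption], max_items: int = 3) -> str:
    """Flatten a list of options into one sentence for a voice channel."""
    if not options:
        return "I found no flights."
    # Only read the first few items aloud; long lists are hard to follow by ear.
    spoken = options[:max_items]
    parts = [
        f"option {i}, {o.airline} at {o.departure} for {o.price_usd} dollars"
        for i, o in enumerate(spoken, start=1)
    ]
    sentence = "I found " + "; ".join(parts) + "."
    remaining = len(options) - len(spoken)
    if remaining > 0:
        sentence += f" I left out {remaining} more."
    return sentence
```

Even with only three items, the listener has to hold every option in memory, whereas the card version can be scanned at a glance, and anything past the cutoff is simply dropped.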