“I can type faster than you can talk” Not Likely.
Numerous academic papers and studies have shown that speech is now markedly more efficient than the classic smartphone keyboard as an entry modality. Researchers Sherry Ruan, Jacob O. Wobbrock, Kenny Liou, Andrew Ng, and James Landay crafted a great empirical study to show just how efficient voice is. The researchers at Stanford University demonstrated that, with modern voice recognition systems, voice is about 3 times faster than typical typing on a smartphone. Although this was an impartial academic study, the researchers actually expected that voice would have just a slight advantage. It turned out that speech recognition has improved so much relative to text input that the researchers were rather shocked by the results:
“In our study, we found that text entry speeds, in words per minute (WPM), using speech were about 3.0 times faster than the keyboard for English (161.20 vs. 53.46 WPM) and about 2.8 times faster than the keyboard for Mandarin Chinese (108.43 vs. 31.31 WPM). Total error rates were also favorable to speech, with speech error rates being 20.4% lower than the keyboard error rates in English (2.93% vs. 3.68%), and 63.4% lower in Mandarin (7.51% vs. 20.54%). Thus, speech was demonstrably faster and more accurate than the keyboard.”
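The arithmetic behind those headline numbers is easy to check. Here is a minimal sketch in Python that recomputes the speedups and the relative error reductions from the figures exactly as quoted above. The function names and the `figures` table are my own, purely for illustration, and are not from the paper.

```python
# Recompute the quoted ratios from the quoted raw figures.
# Names here (speedup, relative_reduction, figures) are illustrative only.

def speedup(speech_wpm, keyboard_wpm):
    """How many times faster speech is than the keyboard."""
    return speech_wpm / keyboard_wpm

def relative_reduction(speech_err, keyboard_err):
    """Percent by which the speech error rate is lower than the keyboard's."""
    return (keyboard_err - speech_err) / keyboard_err * 100

# (speech WPM, keyboard WPM, speech error %, keyboard error %), as quoted
figures = {
    "English": (161.20, 53.46, 2.93, 3.68),
    "Mandarin": (108.43, 31.31, 7.51, 20.54),
}

for language, (s_wpm, k_wpm, s_err, k_err) in figures.items():
    print(
        f"{language}: {speedup(s_wpm, k_wpm):.1f}x faster, "
        f"error rate {relative_reduction(s_err, k_err):.1f}% lower"
    )
```

The English speedup and both error reductions come out as quoted (3.0x, 20.4% and 63.4%). The Mandarin WPM pair as quoted here works out to roughly 3.5x rather than 2.8x, so one of those two figures may have been garbled somewhere along the way. Either way, speech wins by a wide margin.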
There has been an aggressive argument from what I call Voice First deniers that “Keyboards are faster and more efficient,” and they use this unsupported claim as the basis for saying Voice First is “decades away,” even as the explosive growth of Echo/Alexa and the commanding rise of Siri contradict these ideas.
There are many reasons why just about every claim that Voice First is decades away is as mistaken as the idea, presented in the late 1960s, that we would only ever need 10,000 computers. This study addresses just one element: it clearly demonstrates that efficient, accurate voice input is possible with current technology:
“With laptops and desktops, the dominant method of text entry is the full-size keyboard; now with the ubiquity of mobile devices like smartphones, two new widely used methods have emerged: miniature touch screen keyboards and speech-based dictation. It is currently unknown how these two modern methods compare. We therefore evaluated the text entry performance of both methods in English and in Mandarin Chinese on a mobile smartphone. In the speech input case, our speech recognition system gave an initial transcription, and then recognition errors could be corrected using either speech again or the smartphone keyboard. We found that with speech recognition, the English input rate was 3.0x faster, and the Mandarin Chinese input rate 2.8x faster, than a state-of-the-art miniature smartphone keyboard. Further, with speech, the English error rate was 20.4% lower, and Mandarin error rate 63.4% lower, than the keyboard. Our experiment was carried out using Deep Speech 2, a deep learning-based speech recognition system, and the built-in Qwerty or Pinyin (Mandarin) Apple iOS keyboards. These results show that a significant shift from typing to speech might be imminent and impactful. Further research to develop effective speech interfaces is warranted.” 
Accuracy and efficiency are just one element of why Voice will be the fundamental input modality over the arc of the next 10 years. This empirical academic study goes a very long way toward explaining just how superior Voice has become. It is not likely to stop many of the Voice First deniers, especially the ones on Sand Hill Road who have intellectually invested in a future where Voice is not prominent. However, at some point, you cannot stop a revolution whose time has come.