On April 1st, 2016, I was asked on Quora about Apple creating Voice First devices similar to the Amazon Echo system. My answer was controversial and contrarian. It turns out I was also inaccurate: I said 12 months, and it will be about 14 months. As we enter a new phase of history, I think this calm before the Voice First storm is a good time to reflect on how far we have journeyed.
April 1st, 2016:
How long will it take for Apple to have an Amazon Echo-like product?
In the next 12 months with very little effort. In fact I assert Apple is already 90% along the way.
I Have Seen The Future And It Has A Voice
Today (12 noon April 1st, 2016) I created a simple yet crude hardware and software hack to have Apple TV with Siri order and pay for a Pizza from a local restaurant (twitter video):
It always starts with Pizza 🍕 pic.twitter.com/sKQ3BdK6uH
— Brian Roemmele (@BrianRoemmele) April 1, 2016
I didn’t ask permission and this is exploratory frontier research. However the results have been rather robust.
I call this Voice Interaction using Quasi-Voice First™ platforms; Voice Commerce and Voice Payments are a subset. These platforms are certain to become the most important way to interface with a computer over the next 10 years. Simply put, 80% of common computer use can ultimately be distilled to a yes or no spoken answer.
Specimen of early breadboarding hardware extension of Apple TV using Raspberry Pi.
Of course, Apple already has an always-on mode for Siri as an appendage to iOS devices like the iPad and iPhone. However, this is not a standalone system, nor is it cost effective. The lowest cost modern iOS device is the current version of the Apple TV with the voice-enabled remote. I chose this device as it most closely resembles the atypical device Apple should invent. With this in mind, I have been successful in modifying the new Apple TV remote to become a powered always-on system using Siri voice recognition and voice synthesis (with some significant artificial limitations). Additionally, I have been able to construct test apps (with tremendous parkour) that can be selected via voice and, with external effort, connected together in a series to complete just about any task. In fact, as it stands today, Apple TV is an order of magnitude more powerful than the Echo platform from a programmatic standpoint.
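The idea of voice-selected apps connected in a series, each gated by a spoken yes or no, can be sketched roughly as follows. This is a hypothetical illustration and not the hack described above: the names `Step`, `heard_yes`, `run_chain`, and `pizza_chain` are invented for this sketch, the spoken replies are assumed to arrive already transcribed as text, and no Siri or Apple TV API is involved.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Words treated as a spoken "yes" (an assumption made for this sketch).
YES_WORDS = {"yes", "yeah", "yep", "sure", "ok", "okay"}


@dataclass
class Step:
    """One voice-selectable app in the chain (hypothetical)."""
    prompt: str                  # question the system would speak
    action: Callable[[], str]    # what runs if the user answers yes


def heard_yes(reply: str) -> bool:
    """Reduce an already-transcribed spoken reply to yes or no."""
    words = reply.lower().split()
    return bool(words) and words[0].strip(".,!?") in YES_WORDS


def run_chain(steps: Iterable[Step], replies: Iterable[str]) -> List[str]:
    """Run steps in order and stop at the first reply that is not a yes."""
    log = []
    for step, reply in zip(steps, replies):
        if not heard_yes(reply):
            log.append(f"stopped at: {step.prompt}")
            break
        log.append(step.action())
    return log


# A two-step chain loosely modeled on the pizza example.
pizza_chain = [
    Step("Order your usual pizza?", lambda: "order placed"),
    Step("Pay with the card on file?", lambda: "payment sent"),
]

if __name__ == "__main__":
    print(run_chain(pizza_chain, ["Yes.", "sure"]))
```

Each step only needs a yes or no to proceed, which is why a chain of small single-purpose apps can complete a larger task such as ordering and paying.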
Specimen of Apple TV remote with cover removed.
I have also been building and modifying hardware on Amazon Echo since November 2014. Recently, with the Alexa Skills Kit (ASK), Amazon has opened up the Echo and Alexa to development. Clearly Amazon is ahead of Apple; however, we are in the very early days of Voice Commerce.
Amazon has also recently virtualized Alexa on the very cost effective Raspberry Pi, with astounding results thus far in my research. This one move by Amazon will unleash a torrent of creativity.
Specimen of Amazon Echo deconstructed.
Specimen of my research on Amazon Echo and Alexa on Raspberry Pi.
Apple has not moved as rapidly as it could have with Siri. Frankly, Siri did more before Apple acquired the company. Apple has taken a very slow and methodical approach to voice interfaces. Today it is a number of steps behind Amazon. I set out to do something about it.
Apple’s roots start with two people in a Los Altos garage who did not ask permission and had their way with chips that were designed for calculators. In many ways Apple should unleash the creativity that will be the next computer revolution.
In this spirit I have focused on Apple TV for Siri research for a number of fundamental reasons. It is clear that this is the lowest cost modern iOS platform available today. My Voice Commerce thesis is also based on the prospect of a rich and robust developer community. This informed the foundation of my research.
I have discovered that, with just the existing Apple TV as the basis of an Echo-like platform, Apple could be in a position to surpass Amazon in a matter of months. The paradigm Apple would use is based on the existing App economy and the vibrant developer community.
Apple would only need to take the basic electronics of Apple TV and use (when needed) AirPlay for a screen, perhaps the main TV screen, to present any visual information. My voice thesis presents the concept of no keyboard or touch screen and an on-demand, perhaps ephemeral screen.
The Rise Of The Voice First™ Device
If Apple were so inclined, with about 1 month of work (perhaps with my research notes) and a simple WiFi/Bluetooth speaker and multi-axis microphone extension, Apple TV offers all that is needed to create an Amazon Echo-like platform today.
This is not an Apple vs. Amazon situation. The market is the entire computer industry. In fact, I assert that these companies will have a “voice first device” in the next 24 months:
- Many more...
Indeed, those companies that understand this shift early will have first-mover advantage. However, it is very early days, as I have stated above.
Thus all that Apple needs is the vision and the mandate to lead the next paradigm shift in computing in 50 years: voice. This is such a massive shift that I predicted in 2012, and predict more intensely today, that 50% of computer interactions will be via voice in the next 10 years.
In the next 10 years, your voice is not going to navigate your device, it is going to replace your device. The screen will be ephemeral and situational.
Unofficially Apple is already there.