Apple Is Solving The Far-Field Voice First Identity Problem.



The Apple Head Fake

“No one wants to watch movies on a small screen”—Steve Jobs, 2002

Today it was announced [1] that Apple has acquired RealFace, an Israeli facial recognition AI company. RealFace is the creator of the Pickeez app, a fun, innovative way to enjoy photos, which uses the company's facial recognition software to automatically choose the best photos from every platform where the user's photos are stored.

This acquisition completes a trifecta of Israeli facial recognition and imaging AI companies acquired by Apple in the last few years, including:

PrimeSense: PrimeSense's technology was originally applied to gaming but was later applied in other fields. The company was best known for licensing the hardware design and chip used in Microsoft's Kinect motion-sensing system for the Xbox 360 in 2010.

LinX: LinX's technology uses software to extract depth information for each pixel, creating a depth map that can also be used for 3D image reconstruction.

Just as with the acquisition that produced TouchID, Apple is close to cornering the market in far-field and near-field [2] facial recognition AI technology.  And just as with TouchID, most observers will see only a limited use case.  This new technology will become central to nearly every Apple product, both existing and unreleased.

There are many reasons for these acquisitions, including the ability to unlock your iPhone simply by raising it and looking at the screen; Apple has patents covering this technology [3].  However, I think the surprising use of this rather strong facial recognition technology in a Voice First, Echo-ish type of device will be the ultimate way for Apple to enter the far-field Voice First category.  Although TouchID will remain in future iPhones and be used for a number of purposes, there will also be image-based recognition.

~—~

~—~

There Are No Passwords In The Voice First World

One of the largest challenges in the far-field Voice First environment is “who am I talking to?”  This is a huge problem for an array of reasons, from security to financial protection.  Today, anyone can ask the current far-field devices a question about personal aspects of your device, or worse yet, place orders and move money from your bank.  This problem will only get worse.  I identified this exact issue in my Voice Manifesto from 1989 and have spent a considerable amount of time thinking about it and doing research in hardware and software to solve it.  There are about 30 solutions; with the acquisitions mentioned above, Apple now holds some powerful ones.

It is unacceptable to have to go to a screen device to enter a password or be cleared by fingerprint in order to use all the features of a far-field Voice First device. There are three fundamental ways to determine who is speaking to or using a device in the far-field:

  1. Voice pattern identification (audio)
  2. Facial recognition (video)
  3. Distant biometric identification (video)

Today Apple has technology that covers video-based identity recognition using facial and biometric data.  The interesting aspect is that Apple can use both video technologies together to validate a user with nearly 100% certainty, in theory even better than TouchID.
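To make that fusion claim concrete, here is a hypothetical sketch; the function name and error rates are illustrative assumptions, not Apple figures. If each factor's false accepts are independent, an impostor must fool every factor at once, so the combined false-accept rate is the product of the individual rates.

```python
# Hypothetical sketch: fusing independent biometric checks.
# All names and rates here are illustrative assumptions.

def fused_false_accept_rate(*false_accept_rates: float) -> float:
    """If each factor's false-accept events are independent, an impostor
    must fool every factor, so the combined rate is the product."""
    combined = 1.0
    for rate in false_accept_rates:
        combined *= rate
    return combined

# Face recognition alone: 1-in-1,000 false accepts (illustrative).
face_far = 1e-3
# Distant biometrics (body geometry, gait) alone: 1-in-100 (illustrative).
body_far = 1e-2

# Combined, the two video checks are far stronger than either alone.
print(fused_false_accept_rate(face_far, body_far))
```

This is why two moderately reliable video signals can, in principle, outperform a single strong sensor like a fingerprint reader: errors multiply down, not add up.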

For the Voice First revolution to become omniscient and omnipresent, there must be identity verification.  Otherwise, as the AI gets to know you better and better over the years, your personal data would be presented to the world, just for the asking.

You Look Upset, What's Wrong?

Apple also owns the very important technology from Emotient, a company it acquired in 2016. Here is how Emotient described its technology:

“Emotient’s proprietary algorithms make it possible to discern the most subtle expression or changes in expression and translate that into a defined emotional reaction. With a camera-enabled device or external webcam, our system can quickly process facial detection and automated expression analysis in real-time, or, for non time-sensitive requirements, it can scan images and videos in batch mode to deliver in-depth analysis of single-subject and multiple-subject videos. FACET leverages machine learning in large datasets in order to achieve optimal speed and utility.” —January, 2014

Building on the work of Paul Ekman, Ph.D. [4], a pioneer in the study of emotions and facial expressions and a professor emeritus of psychology in the Department of Psychiatry at the University of California, San Francisco (UCSF), where he was active for 32 years, Emotient used machine learning to extend his groundbreaking research in micro-expressions. Ekman's coding scheme is called FACS (the Facial Action Coding System), and Emotient's FACET system applied machine learning methods to high-volume datasets carefully constructed by a team of behavioral scientists, including Dr. Ekman, who later joined Emotient's advisory board along with Dr. Terrence Sejnowski.

[Image: specimen of the data output stream from FACS analysis of a live video.]

I have been a student of Dr. Ekman's work for many years and also a researcher in elements of AI, including emotional intent extraction systems. I took immediate note when Emotient first began to demonstrate its technology, as well as the patents the company filed. It was groundbreaking. I tested their systems up to the moment Apple acquired the company on January 7th, 2016. I wrote about this in detail in 2016 [5].
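As a toy illustration of the FACS idea, the sketch below maps detected Action Units (AUs) to a prototypical emotion. The AU combinations follow Ekman's published prototypes (for example, AU6 + AU12 signals happiness), but a production system like FACET uses machine-learned classifiers over live video, not a hand-written lookup table, so treat this purely as an illustration of the coding scheme.

```python
# Toy sketch of FACS-style emotion coding. The AU-to-emotion prototypes
# below follow Ekman's published combinations; real systems learn
# classifiers from data rather than matching a fixed table.

EMOTION_PROTOTYPES = {
    frozenset({6, 12}): "happiness",       # cheek raiser + lip corner puller
    frozenset({1, 4, 15}): "sadness",      # inner brow raiser + brow lowerer + lip corner depressor
    frozenset({1, 2, 5, 26}): "surprise",  # brow raisers + upper lid raiser + jaw drop
    frozenset({9, 15, 16}): "disgust",     # nose wrinkler + lip depressors
}

def classify(action_units):
    """Return the first emotion whose prototype AUs are all present."""
    for prototype, emotion in EMOTION_PROTOTYPES.items():
        if prototype <= action_units:
            return emotion
    return "neutral"

# A Duchenne smile (AU6 + AU12, with lips parted, AU25) reads as happiness.
print(classify({6, 12, 25}))
```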

Apple Already Owns The Near-Field Voice First Segment

Apple has clearly invented the leading near-field Voice First device in AirPods, and it may be on a very conservative path to transitioning them into a full Voice First system.  The elements to make AirPods as useful in the near-field as Alexa is in the far-field are already at hand.  The use cases for near-field devices are clear: fundamentally, near-field is about more personal interactions and information.  Examples are messages and financial interactions that you may not want to share in an open room.  Yet with this new identity technology, Apple may find the fine line for dealing with the identity complications of the far-field.

The near-field will allow for a wide range of new and not-yet-imagined applications, spanning from the consumer level to the commercial and industrial levels.  In the commercial area alone I have surfaced over 1,200 direct use cases after studying this field for over 30 years.  The right technology has finally converged.  One example of a commercial use case is for service workers to have instant access to any question simply by saying: “Let me ask our Voice AI system and check.” It is not hard to imagine any service situation where this could be applied.

It is in Apple's DNA not necessarily to be first, but to enter a new device market with the considered and deliberate approach that has defined Apple products from MP3 players to smartphones.

The Classic Apple Head Fake

With Apple executives and mainstream media outlets insisting that Apple has no plans for a far-field Voice First device [6] while also saying that Voice First is the future [7], Apple is setting up the classic perfect storm: a “no one wants to watch movies on a small screen” perception that smokescreens development in plain sight.

With Apple’s facial recognition and emotional facial reading technology, Apple will have an Apple worthy far-field Voice First device.

I expect a far-field Voice First Echo-ish device from Apple in the next 24 months.  It will be like no other far-field Voice First device. It will be insanely great.

 

 

____

[1] Apple Acquires RealFace

[2] The Near-Field And Far-Field Voice First Devices

[3] Apple Patent

[4] Paul Ekman

[5] Apple AI Secret: How Apple will decode 43 muscles and read your emotions

[6] Apple May Not Make An Amazon Echo Type Device

[7] Eddy Cue Of Apple Talks About The Power Of A Voice First OS

 

 
