Voice – the next big paradigm shift

June 14, 2016

I just worked my way through the 213 slides of Mary Meeker’s annual Internet Trends publication. If you’re at all interested in how the tech world is evolving, I recommend you find the time to read it yourself. This time round, the most thought-provoking slides for me were 115-133, where she talks about voice as a computing input.

The way I see it, in the next five years voice will replace typing to become the dominant input method for quick commands and short passages of text. Here’s why:

  • Speaking is much quicker than typing – according to Mary Meeker it’s 3.75x, and the difference will be greater on mobile
  • Speech recognition is starting to work now – as evidenced by the success of Amazon’s Echo and the fact that 20% of Google Android app searches in the US are now voice
  • The keys to success are accuracy and latency, and both are improving fast. Google, Baidu and Hound are reporting accuracy rates of 90%+ and speculating that adoption will skyrocket when accuracy reaches 99%. Moore’s law is taking care of latency.

However, audio out sucks in comparison with text on screens, so I think we will see ‘voice in – text out’, most obviously on mobile phones, where we already have a great screen. Amazon’s Echo and Google’s Home are interesting in this regard. At the moment they feel a bit uncomfortable to use because they don’t offer much feedback to show whether they are understanding what you are saying, and audio out isn’t a great way to confirm purchases or other actions. However, they are designed to be operated at distances that preclude reading screens, so any solution has to be simple. A bar of LEDs on the side of the device that lights green when comprehension is high and red when it’s low would help solve the first problem.
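As a rough illustration of the LED idea, here is a minimal Python sketch. It assumes the recogniser exposes a confidence score between 0 and 1, which is an assumption for the sake of the example rather than anything Amazon or Google document. The function name `led_bar`, the eight-LED bar and the 0.9 cut-off are likewise hypothetical.

```python
# Hypothetical sketch: turn a speech recogniser's confidence score into an LED bar.
# The score range, the 0.9 threshold and the bar length are illustrative assumptions.

def led_bar(confidence: float, num_leds: int = 8, threshold: float = 0.9) -> list[str]:
    """Return one colour per LED for a confidence score between 0.0 and 1.0."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    # Green when comprehension is high, red when it is low.
    colour = "green" if confidence >= threshold else "red"
    # Light more LEDs as confidence rises, but always show at least one.
    lit = max(1, round(confidence * num_leds))
    return [colour] * lit + ["off"] * (num_leds - lit)


if __name__ == "__main__":
    print(led_bar(0.95))
    print(led_bar(0.25))
```

The point of the design is that the colour and the number of lit LEDs can both be read at a glance from across a room, which is the distance these devices are meant to work at.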

‘Voice in – text out’ throws up some interesting design challenges, and big rewards will accrue to the companies that crack them first.

Additionally, for some time voice dictation will remain inferior to typing for longer-form documents where the precise choice of words is important. I might be able to speak at 150 words per minute, but I can’t read and correct what I’m saying at that pace.

Amazon, Google and Apple are the ones pushing voice the hardest, which is perhaps not surprising, as voice will favour large companies and platform owners. Convenience is one of the big benefits of voice, and after a voice search we will collectively choose the default option much more often than we do today. I can imagine asking my Echo to buy flowers for my mum and simply saying ‘yes’ when I’m offered something similar in price and style to my last purchase. That gives Amazon amazing power to decide who gets my business, and that’s power they will leverage to take a bigger cut. That’s best for Amazon and second best for large existing brands. It’s bad news for startups, which will struggle to get promoted because of a preference for recognisable brands and concerns over their ability to handle volume. (This effect will be less pronounced if Amazon do put a screen on the side of future versions of the Echo – then I’m more likely to browse through multiple options.)