Deep Learning Powers BING Voice Input

Written by Mike James

Tuesday, 18 June 2013

Deep neural networks seem to be entering a new phase and AI is going mainstream. Microsoft has implemented its Bing voice search using artificial neural networks, and the resulting system is both faster and more accurate.

Following on from Google's use of Deep Neural Networks (DNNs) to implement image search in Google+, Microsoft has now rolled out a DNN-based voice recognition system.

As you can probably guess, the groundwork has been done by Microsoft Research:

"Over the past few years, Frank Seide, senior researcher at Microsoft Research Asia, and Dong Yu, senior researcher in the Conversational Systems Research Center at Microsoft Research Redmond, have been at the forefront of this advance, working with scientists and engineers from the Bing Speech team to provide vast improvements in the speed and the accuracy of Bing Voice Search."

It is claimed that the use of DNNs has halved the recognition time and cut the word error rate by 15%. In addition, the new system is less sensitive to background noise. The type of DNN deployed is a context-dependent DNN used in combination with a Hidden Markov Model (HMM). The HMM has a long history of almost-success in speech recognition, but when coupled with a DNN it seems to deliver on its promise. It isn't clear exactly what design of DNN is being used, but you can read about some of the candidate systems in Conversational Speech Transcription Using Context-Dependent Deep Neural Networks.
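To give a feel for how a DNN and an HMM can be coupled, here is a minimal sketch of the hybrid recipe commonly described in the research literature. It is not Microsoft's production system, whose design has not been published in detail. The assumption is that the network outputs a posterior probability for each HMM state on every audio frame, that these are divided by the state priors to give scaled likelihoods, and that a standard Viterbi search then finds the most likely state sequence. The three-state model, the function names and all the numbers are made up for illustration.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """Natural log that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF

def scaled_log_likelihoods(posteriors, priors):
    """Turn per-frame DNN state posteriors P(s|x) into log(P(s|x) / P(s)).

    Dividing by the state prior gives a quantity proportional to the
    likelihood P(x|s), which is what the HMM decoder expects.
    """
    return [[safe_log(p) - safe_log(q) for p, q in zip(frame, priors)]
            for frame in posteriors]

def viterbi(log_obs, log_init, log_trans):
    """Return the most likely HMM state sequence for the given frame scores."""
    n_states = len(log_init)
    # best[s] is the best log score of any path ending in state s so far
    best = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    backpointers = []
    for frame in log_obs[1:]:
        new_best, pointers = [], []
        for s in range(n_states):
            prev = max(range(n_states), key=lambda r: best[r] + log_trans[r][s])
            new_best.append(best[prev] + log_trans[prev][s] + frame[s])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final state
    state = max(range(n_states), key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical data: a 3-state left-to-right HMM and 5 frames of "DNN output"
priors = [0.5, 0.3, 0.2]
posteriors = [
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.20, 0.70, 0.10],
    [0.05, 0.85, 0.10],
    [0.10, 0.20, 0.70],
]
initial = [1.0, 0.0, 0.0]
transitions = [
    [0.6, 0.4, 0.0],
    [0.0, 0.6, 0.4],
    [0.0, 0.0, 1.0],
]

log_obs = scaled_log_likelihoods(posteriors, priors)
log_init = [safe_log(p) for p in initial]
log_trans = [[safe_log(p) for p in row] for row in transitions]
path = viterbi(log_obs, log_init, log_trans)
print(path)
```

In a real recognizer the states would number in the thousands and the posteriors would come from a trained network rather than a hand-written table, but the division of labor is the same: the DNN scores each frame and the HMM handles the timing.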

As Dong Yu reports, the results were impressive and exciting:

“I first realized the effect of the DNN when we successfully achieved significant error-rate reduction on the voice-search data set after implementing the context-dependent deep-neural-network hidden Markov model.

I was so excited that I did not sleep that night. I realized that we had made a breakthrough and called Qiang Huo [a Microsoft Research Asia research manager who also has worked on speech recognition] late at night—daytime in China—to describe the ideas and results.”

Video: Bing Makes Voice Search on Windows Phone More Accurate and Twice As Fast (http://www.bing.com/videos/browse?mkt=en-us&vid=5c9155cc-c40d-45ed-9ee0-64327142e1e5&from=shareembed-syndication&src=v5:embed:syndication:)

This is the same approach to speech recognition that was used in the demo of live voice translation at the Computing in the 21st Century Conference, in which Rick Rashid, Microsoft's chief research officer, spoke in English that was recognized and translated into Chinese in real time.

One odd finding is that training the system using speech in one language improves recognition in another language. This is not only of practical importance where the availability of digitized audio for a language is low, but it must be saying something about the universality of human speech.

Where will DNNs make the next breakthrough?

More Information

Conversational Speech Transcription Using Context-Dependent Deep Neural Networks

DNN Research Improves Bing Voice Search

Related Articles

Google Explains How AI Photo Search Works

Near Instant Speech Translation In Your Own Voice

Google's Deep Learning - Speech Recognition

The Triumph Of Deep Learning

A Neural Network Learns What A Face Is

Speech Recognition Breakthrough

The Paradox of Artificial Intelligence

McCulloch-Pitts Neuron

Neural networks

Last Updated ( Tuesday, 18 June 2013 )