Google's Deep Learning - Speech Recognition
Written by Alex Armstrong   
Monday, 13 August 2012

There is a revolution going on and no one seems to be taking much notice. The revolution is based on deep learning and it is creating a resurgence in the use of neural networks, but this time to solve real problems - and one of the front runners in the race is Google.


Neural networks have long seemed the right solution to the AI problem. They are an analog of what the human brain does and they learn all on their own. To make a neural network useful all you have to do is train it. This has to be the right way forward and yet there was a problem - the ideas didn't work out too well in practice. What actually happened was that you spent hours of training to end up with a network that learned what you told it but it only did the job well if you spent a lot of time tuning it. Neural networks worked but not well enough in most cases.

Now it turns out that they probably did work all along, but we weren't doing things in quite the right way and we had no clear idea of the scale needed. To make neural networks fulfill their promise you first need to give them some deep structure and not rely on a random or simplistic architecture. Next you need to train big systems with big data - lots of it. Until quite recently, finding enough data in the right form, and finding the large amounts of computer power to do the training, was a difficult problem. The data problem has been eased by the growth of the web and the computing problem by the growth of cloud computing.

The result is that neural networks are starting to work like never before.

Google recently made headlines with work that involved letting a deep neural network teach itself what a face was. Unfortunately, the data came from YouTube video stills and, as you might expect, the network also taught itself to recognize cats' faces. As you can also guess, this resulted in headlines about AI and kitten videos rather than "breakthrough in AI".




A recent Google blog post starts out with:

"The New York Times recently published an article about Google’s large scale deep learning project, which learns to discover patterns in large datasets, including... cats on YouTube! 
What’s the point of building a gigantic cat detector you might ask?"

Sad, isn't it?

It goes on to explain how the same techniques have resulted in a neural network that you might well be using now, assuming you own an Android device that runs Jelly Bean.

"With the launch of the latest Android platform release, Jelly Bean, we’ve taken a significant step towards making that technology useful: when you speak to your Android phone, chances are, you are talking to a neural network trained to recognize your speech."

If you want the back story you need to turn to a survey paper by Geoffrey Hinton et al. at the University of Toronto that is currently in press.

Neural networks have been used for speech recognition before, but they never developed to the point of being practical. After the initial burst of activity there was a 20-year dry spell in which other, more ad hoc and specifically engineered, approaches took over. Now, with the help of the much larger repository of audio available and with computational facilities like Google Compute Engine, all this has changed.

The neural networks previously applied to speech recognition problems were typically small, with a single layer of neurons. Today's multi-layered networks solve the problem much better thanks to new deep training algorithms that work layer by layer through the network. To give you some idea of the size of the task - the network used four layers with 2500 nodes per layer. The data was derived from 6000 hours of recorded voice search data and 1400 hours from YouTube. Much of the work was done on a Google cluster using a map/reduce algorithm. The result was a 20% increase in accuracy compared to other methods.
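To get a feel for what four layers of 2500 nodes means, here is a rough back-of-the-envelope sketch of the number of trainable parameters in a fully connected network of that shape. Only the layer count and layer width come from the figures above; the input and output sizes (INPUT_DIM and OUTPUT_DIM) are purely illustrative assumptions, as is the idea that every layer is fully connected to the next.

```python
# Rough size estimate for a fully connected network with
# four hidden layers of 2500 units each (figures quoted in the article).
# INPUT_DIM and OUTPUT_DIM are illustrative assumptions, not from the article.

HIDDEN_LAYERS = 4
HIDDEN_UNITS = 2500
INPUT_DIM = 440     # assumed size of the acoustic feature vector fed in
OUTPUT_DIM = 8000   # assumed number of output classes


def layer_sizes(input_dim, hidden_layers, hidden_units, output_dim):
    """List of layer widths from input to output."""
    return [input_dim] + [hidden_units] * hidden_layers + [output_dim]


def count_parameters(sizes):
    """Weights plus biases for each pair of consecutive layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


sizes = layer_sizes(INPUT_DIM, HIDDEN_LAYERS, HIDDEN_UNITS, OUTPUT_DIM)
total = count_parameters(sizes)

# The three hidden-to-hidden connections do not depend on the assumed
# input and output sizes, so this part follows from the article's figures.
hidden_to_hidden = (HIDDEN_LAYERS - 1) * (HIDDEN_UNITS * HIDDEN_UNITS + HIDDEN_UNITS)

print(f"layer sizes: {sizes}")
print(f"hidden-to-hidden parameters: {hidden_to_hidden:,}")
print(f"total parameters (with assumed input/output sizes): {total:,}")
```

Whatever the real input and output sizes were, the connections between the hidden layers alone account for nearly 19 million parameters, which goes some way to explaining why thousands of hours of audio and a cluster were needed for training.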

A paper on the research is to be presented at Interspeech 2012 in September, but there is already a University of Toronto report that you can download that has some of the details. It is co-authored by Navdeep Jaitly and Google researchers Patrick Nguyen, Andrew Senior and Vincent Vanhoucke.

This is an exciting time for AI and not just for kittens.

More Information

Application of Pretrained Deep Neural Networks to Large Vocabulary Conversational Speech Recognition (pdf)

Deep Neural Networks for Acoustic Modeling in Speech Recognition (pdf)

Google Research Blog

Related Articles

A Neural Network Learns What A Face Is

The Paradox of Artificial Intelligence

Neural networks






Last Updated ( Monday, 13 August 2012 )