Microsoft Research has joined in the deep learning gold rush and has produced a photo classifier based on neural networks. Project Adam is claimed to be fifty times better than its rivals - but is it enough to be claimed as a highly disruptive breakthrough.
The goal of Project Adam is to:
"create a brain-scale neural network to power the applications of tomorrow, inspired by the brains of today"
As explained in its promo video, deep learning draws inspiration from how the brain learns to train software to perform human activity, in particular to recognize speech, interpret and classify images and read documents, which it claims,
are some of the first steps towards true artificial intelligence:
In common with otherdeep learning initiatives, Microsoft research has concentrated on a photo classifier and has produced one, based on an asynchronous technique based on a distributed system, that it is faster, more efficient and more than twice as accurate.
According to Trishul Chilimbi, one of the Microsoft researchers who spearheaded the Project Adam effort:
“We wanted to build a highly efficient, highly scalable distributed system from commodity PCs that has world-class training speed, scalability, and task accuracy for an important large-scale task. “We focused on vision because that was the task for which we had the largest publicly available data set ... Our system is general-purpose and supports training a wide variety of deep-neural-network [DNN] architectures. It also can be used to train large-scale DNNs for tasks such as speech recognition and text processing.”
To exhibit its prowess Microsoft Research points to the fact that not only can it distinguish between types of dog it can tell the difference between two breeds of Welsh Corgi - the Pembroke Welsh Corgi and the Cardigan Welsh Corgi.
As with Google's Deep Neural Net that we learned about in June 2012, the key to this ability is the use of "convolutional layers". Chilimbi says:
“What we found was that as you add levels to the DNN, you get better accuracy until a certain point. Going from two convolutional layers to three convolutional layers to five or six seems the sweet spot. And the interesting thing is that people have done studies on the human visual cortex, and they’ve found that it’s around six layers deep in terms of layers of neurons in the brain.
“The reason it’s interesting is that each layer of this neural network learns automatically a higher-level feature based on the layer below it. The top-level layer learns high-level concepts like plants, written text, or shiny objects. It seems that you come to a point where there’s diminishing returns to going another level deep. Biologically, it seems the right thing, as well.”
Explaining Project Adam’s ability to identify corgis, the layers appear to work like this:
The first layer learns the contours of a dog’s shape. The next layer might learn about textures and fur, and the third then could learn body parts - shapes of ears and eyes. The fourth layer would learn about more complex body parts, and the fifth layer would be reserved for high-level recognizable concepts such as dog faces. The information bubbles to the top, gaining increasingly complex visual comprehension at each step along the way.
Project Adam isn't just about dog breed classification. In the video two examples are provided of what might materialize from this type of research. The first is that you might photograph your food to get an instant analysis of its nutritional information; the second is that photographing a skin lesion and sending it and basic lab results for early diagnosis.
So is this a game changer? Chilimbi thinks so:
“Computers until now have been really good number crunchers. Now, we’re starting to teach them to be pattern recognizers. Marrying these two things together will open a new world of applications that we couldn’t imagine doing otherwise."
Recently Google's neural network team demoed a system that can recognize a lot of different things in photos. In this year's ILSRVC competition they have a neural network that can recognize multiple t [ ... ]