|Microsoft Wins ImageNet Using Extremely Deep Neural Networks|
|Written by Mike James|
|Tuesday, 15 December 2015|
While just about everyone else is forming foundations and institutes to further AI, some researchers are actually getting on with doing it. This year's ImageNet competition has been won by Microsoft, which comes as something of a surprise.
It is a surprise because, overall, it is Google that makes the most noise about AI and, in the popular mind at least, Google is miles ahead of the competition. In truth, all of the big companies engaged in the race to bring AI to the masses are really just fine-tuning the same basic approach to the problem - the Deep Neural Network.
So how did Microsoft do it?
The main ImageNet competition is just about who can turn in the best, i.e. lowest, error rates on a 100,000 photo database classified into 1000 object categories. A side task is to locate the object in the picture. Microsoft managed a classification error rate of 3.5% and a localization error of 9%. Google's previously winning network turned in a similar classification error rate, but for localization the difference was larger, with a 19% error.
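To make "error rate" concrete, here is a minimal Python sketch of how a classification error can be computed. The article does not say which metric its figures use; results of this kind are often reported as a top-k error, where a guess counts as correct if the true label is among the network's k most confident answers, so k is left as a parameter. The function name `top_k_error` and the toy data are invented for illustration and are not taken from the competition's own tooling.

```python
def top_k_error(ranked_predictions, true_labels, k=1):
    """Fraction of images whose true label is not among the first k guesses."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("need one ranked guess list per label")
    misses = sum(
        1
        for guesses, truth in zip(ranked_predictions, true_labels)
        if truth not in guesses[:k]
    )
    return misses / len(true_labels)


# Hypothetical toy data: four images, guesses ordered from most to least confident.
ranked = [
    ["cat", "dog", "fox"],
    ["dog", "cat", "fox"],
    ["fox", "cat", "dog"],
    ["car", "bus", "van"],
]
truth = ["cat", "cat", "dog", "van"]

top1 = top_k_error(ranked, truth, k=1)  # only the single best guess counts
top3 = top_k_error(ranked, truth, k=3)  # any of the three guesses counts
print(top1, top3)
```

Allowing more guesses can only lower the error, which is why the choice of k matters when comparing published figures.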
In previous years, neural networks with 30 or so layers came in first. This year the same neural network approach yielded improvements by going deeper. Microsoft's network was really deep, at about 150 layers. To do this the team had to overcome a fundamental problem inherent in training deep neural networks. As the network gets deeper, training becomes more difficult, so you encounter the seemingly paradoxical situation that adding layers makes the performance worse.
The solution proposed is called deep residual learning. While the general idea of deep residual learning is motivated by reasonable assumptions, exactly why it works so well in practice is still not well understood.
The idea is that if an n-layer network learns a task reasonably well, adding more layers should produce at least as good a performance - because that's what you get if the extra layers are set to the identity transformation.
The proposed method changes the learning task so that it is easier for the standard learning algorithm to learn an identity transformation. Of course, in practice it is unlikely that an identity transformation is optimal, but the method seems to work more generally and finds better solutions.
To quote from the paper explaining the work:
"In real cases, it is unlikely that identity mappings are optimal, but our reformulation may help to precondition the problem."
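The identity argument is easier to see in code. What follows is a minimal sketch in plain Python, not the team's implementation. It assumes the usual description of residual learning, in which a block outputs F(x) + x, so that its layers only have to learn the residual F(x). It also leaves out details a real network would have, such as biases, convolutions and any activation applied after the addition. All function names are made up for illustration.

```python
def matvec(weights, x):
    """Multiply a square weight matrix (a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def plain_block(x, w1, w2):
    """Two stacked layers that have to learn the whole mapping directly."""
    return matvec(w2, relu(matvec(w1, x)))


def residual_block(x, w1, w2):
    """The same two layers learn only a residual F(x); the input is added back."""
    fx = plain_block(x, w1, w2)
    return [f + v for f, v in zip(fx, x)]


def zeros(n):
    """An n-by-n matrix of zeros, standing in for 'untrained' extra layers."""
    return [[0.0] * n for _ in range(n)]


x = [0.5, -1.0, 2.0]
w_zero = zeros(3)

# With all weights at zero the plain block wipes out its input ...
plain_out = plain_block(x, w_zero, w_zero)
# ... while the residual block passes the input through unchanged.
residual_out = residual_block(x, w_zero, w_zero)

print(plain_out)     # [0.0, 0.0, 0.0]
print(residual_out)  # [0.5, -1.0, 2.0]
```

In this sketch a plain block has to find very specific weights to reproduce its input, whereas the residual block gets the identity simply by driving its weights toward zero. That is one way to read the "preconditioning" in the quote above.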
The new architecture can be implemented using existing systems, and the team explored even deeper networks - up to 1000 layers - but the results weren't as good, presumably due to overfitting. For this size of model the dataset was comparatively small.
So it seems we are entering the era of not just Deep Neural Networks but of Extremely Deep Neural Networks.
One of the recurring themes of the development of neural networks, a point often made by Geoffrey Hinton, is that we have had the answer all along. The neural network invented back in the 1970s was just not deep enough. Since then, each breakthrough has involved finding ways of effectively training ever deeper networks - and so the trend continues.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
|Last Updated ( Tuesday, 15 December 2015 )|