Never Ending Image Learner Sees, Understands, Learns
Written by Mike James   
Monday, 25 November 2013

The real progress in AI will only occur when the different areas come together to work on understanding and interacting with the world. NEIL is a program that scans the web 24/7 and looks at photographs to build up common sense knowledge of the world. It's machine vision meets semantic graph.




It seems that machine vision has reached the point where it can be used as a tool to enable a program to begin understanding the world at a very basic level. There have been many attempts to teach computers the common sense relationships we take for granted. Perhaps the best known is Cyc, a project started in 1984 by Douglas Lenat to build a knowledge base by typing in facts and rules by hand. The problem is that this is labour intensive, and it is very difficult for a human to know what a computer needs to know.

A much better idea would be to allow the program to learn concepts on its own, but how can it interact with a world that we simply take for granted?

The most direct way that we interact with the world and discover common sense is via vision. We also have the advantage that we can interact with what we see, but there is a lot of knowledge to be gained about what there is in the world and how things relate simply by looking.  

This is what NEIL - Never Ending Image Learner - is doing. It is the creation of Xinlei Chen, Abhinav Shrivastava and Abhinav Gupta of Carnegie Mellon University, with funding from the Office of Naval Research and Google. By looking at pictures stored on the web the program attempts to extract not only objects but their relationships, and from these the underlying concepts. Doing this needs the ability to recognize particular objects - car, aeroplane, person and so on. Training such recognizers is time-consuming, but again the web can be used. The detectors are trained by using Google Image search to return photos tagged with a particular label. These are then used to train classifiers for the object.

This is a fairly automatic process in that if you want to add a new object all you do is search for it and train on what Google returns. For example, if you want to recognize a hat you could search for pictures labeled "hat" and train the classifier using them. In practice things are a bit more complicated and the overall method includes a clustering step on the returned images to select groups that really do represent good interpretations of the label. 
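The clustering step can be sketched in a few lines. This is a minimal illustration, not NEIL's actual pipeline: it assumes the candidate images have already been reduced to feature vectors, and the cluster count and size threshold are made-up parameters.

```python
import numpy as np

def kmeans(features, k, iters=20, seed=0):
    """A tiny k-means, just enough to illustrate the idea."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = features[labels == c].mean(axis=0)
    return labels

def select_training_clusters(features, k=4, min_size=5):
    """Keep only the dense clusters, on the assumption that a coherent
    visual concept forms a large, tight cluster while mislabeled
    search results scatter."""
    labels = kmeans(features, k)
    return [features[labels == c] for c in range(k)
            if np.sum(labels == c) >= min_size]

# Toy data: two tight groups of "good" images plus a few outliers.
rng = np.random.default_rng(0)
good_a = rng.normal(0.0, 0.1, size=(20, 8))
good_b = rng.normal(5.0, 0.1, size=(20, 8))
noise = rng.normal(2.5, 3.0, size=(3, 8))
clusters = select_training_clusters(np.vstack([good_a, good_b, noise]))
print(len(clusters), "clusters kept for training")
```

Only the surviving clusters would then be used as positive examples, which filters out images that Google returned for the wrong sense of a label.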

The actual classifiers used are based on the recently revived use of Linear Discriminant Analysis - a classical technique whose appeal here is that the expensive part of training, the statistics of the negative examples, can be computed once over a large pool of generic images and then shared, making each new detector cheap to train.
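The trick is easy to see in code. The following is a sketch of the general LDA-detector idea, not NEIL's implementation; the dimensions and the shifted "positive" features are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 16

# Background statistics: computed once over a large pool of generic
# image features, then shared by every detector that is ever trained.
negatives = rng.normal(0.0, 1.0, size=(1000, dim))
mu_neg = negatives.mean(axis=0)
cov = np.cov(negatives, rowvar=False) + 1e-3 * np.eye(dim)  # regularised

def train_lda_detector(positives):
    """Training a new detector is a single linear solve:
    w = Sigma^{-1} (mu_pos - mu_neg) -- no per-class negative mining."""
    return np.linalg.solve(cov, positives.mean(axis=0) - mu_neg)

# Illustrative positive features, shifted away from the background.
positives = rng.normal(0.5, 1.0, size=(50, dim))
w = train_lda_detector(positives)
print("positives score higher:", (positives @ w).mean() > (negatives @ w).mean())
```

Because the covariance and the negative mean never change, adding a detector for a new object category costs one mean and one linear solve, which is what makes training at NEIL's scale feasible.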

Once the classifiers are trained they are used to examine what objects, attributes and scene types occur in general images downloaded from the web. The program doesn't try to understand every image; it is more interested in the statistical relationships. It currently extracts Object-Object relationships - "Eye is part of baby", "BMW is a kind of car" and "swan looks similar to goose"; Object-Attribute relationships - "Pizza has Round Shape"; Scene-Object relationships - "Bus is found in Bus depot"; and Scene-Attribute relationships - "Ocean is blue".
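At its simplest, this kind of relationship mining is co-occurrence counting. Here is a toy sketch - the detection sets and the threshold are invented for illustration, and the score is a crude stand-in for whatever NEIL actually uses:

```python
from collections import Counter
from itertools import combinations

# Hypothetical per-image detection sets, as trained detectors
# might report them; the labels are illustrative.
detections = [
    {"car", "road", "wheel"},
    {"car", "road"},
    {"car", "wheel", "person"},
    {"person", "road"},
]

single = Counter()
pairs = Counter()
for objs in detections:
    single.update(objs)
    pairs.update(combinations(sorted(objs), 2))

# Propose a relationship when two objects co-occur in most of the
# images where either one appears.
relations = [(a, b) for (a, b), n in pairs.items()
             if n / min(single[a], single[b]) >= 0.6]
print(relations)
```

Run over millions of images rather than four, the pairs that survive the threshold are exactly the "X is found with Y" style facts that NEIL reports.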




You can readily see how these relationships can be extracted if you have enough images and if your object detection is good enough. The final trick is that all of this can be put together to improve and extend the object recognition. For example, if the car detector labels something new as a car, but it has no wheels and isn't found on a road, then it is unlikely to be a car. The whole thing starts to feed back to extend and improve the object recognition. In particular, the detectors can be tuned to detect sub-categories of object, such as particular makes of car.
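The feedback idea can be sketched as gating a raw detector score with the learned relationships. This is an illustration only - the penalty weights and the relationship lists are invented, not NEIL's:

```python
def contextual_score(raw_score, parts, scene,
                     required_parts=frozenset({"wheel"}),
                     likely_scenes=frozenset({"road", "car park"})):
    """Downweight a 'car' detection that contradicts learned
    relationships: no wheels, and nowhere near a road, is suspect.
    The 0.5 penalties and the relationship sets are illustrative."""
    score = raw_score
    if not required_parts & parts:
        score *= 0.5      # missing a part the concept normally has
    if scene not in likely_scenes:
        score *= 0.5      # found in an unlikely scene
    return score

print(contextual_score(0.9, {"wheel", "window"}, "road"))  # consistent detection
print(contextual_score(0.9, {"window"}, "kitchen"))        # suspect detection
```

Detections that survive the contextual check can then be fed back in as fresh training examples, which is how the detectors improve without further human labelling.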

The big problem is that this is a huge data processing task. The object recognizers use 3912-dimensional feature vectors across 1152 object categories, and counting. To date NEIL has examined 5 million images and extracted 3000 common sense relationships; you can view the current state of things on the NEIL website. To do all this it runs on two clusters of 200 processing cores.

It is when an AI starts to find ways of improving its own performance and extending its abilities that the real payoff of the approach becomes apparent. NEIL teaches itself common sense that grows in sophistication as it is exposed to new images of the world.




More Information


NEIL: Extracting Visual Knowledge from Web Data


Related Articles

Google Has Another Machine Vision Breakthrough?

A Billion Neuronal Connections On The Cheap

Deep Learning Powers BING Voice Input

Google Explains How AI Photo Search Works

Near Instant Speech Translation In Your Own Voice

Google's Deep Learning - Speech Recognition

The Triumph Of Deep Learning      

A Neural Network Learns What A Face Is











