Microsoft Research has been hard at work doing impossible things again. This time it's a library of code that converts a 2D video or still image into a 3D depth image.
This is all about taking a simple 2D image and working out how far away from the viewer each object in it is. The algorithm attempts to construct a depth map of the sort the Kinect creates, but without using a Kinect.
Once you have a depth map, you can apply it to the single image to create a stereo pair, which can then be viewed as a stereoscopic image.
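To make the idea concrete, here is a minimal sketch of how a second view might be synthesized from one image plus its depth map. It is not the Microsoft Research code: the function name `synthesize_right_view`, the `max_disparity` parameter, the depth convention (0 is nearest, 1 is farthest) and the crude hole filling are all assumptions made for illustration.

```python
def synthesize_right_view(image, depth, max_disparity=3):
    """Shift each pixel left by an amount that grows as it gets closer.

    image, depth: lists of rows with the same shape.
    depth values are assumed to lie in [0, 1], with 0 the nearest (hypothetical convention).
    """
    height, width = len(image), len(image[0])
    right = [[None] * width for _ in range(height)]
    for y in range(height):
        # Draw far pixels first so that nearer pixels overwrite them.
        order = sorted(range(width), key=lambda x: depth[y][x], reverse=True)
        for x in order:
            disparity = round(max_disparity * (1.0 - depth[y][x]))
            target = x - disparity
            if 0 <= target < width:
                right[y][target] = image[y][x]
        # Fill the holes left behind by shifted pixels, preferring the
        # neighbour on the right (usually background).
        for x in range(width - 2, -1, -1):
            if right[y][x] is None:
                right[y][x] = right[y][x + 1]
        # Any holes at the right edge fall back to the neighbour on the left.
        for x in range(1, width):
            if right[y][x] is None:
                right[y][x] = right[y][x - 1]
    return right


# The original image serves as the left view; the result is the right view.
left = [[10, 20, 30, 40, 50]]
depth = [[1.0, 1.0, 0.0, 1.0, 1.0]]  # only the middle pixel is close
right = synthesize_right_view(left, depth, max_disparity=1)
```

The near pixel moves one step to the left while the far pixels stay put, which is the disparity a viewer's two eyes would see. A real renderer would fill the uncovered region far more carefully than this.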
You might at first think that all of this is an impossible task, because you need stereo vision, and hence a pair of stereo photos, to work out depth. After all, we have two eyes to be able to judge depth. However, try looking at a scene with one eye closed: you will still be able to judge distances. Even more impressive, you can judge distances in a single photo.
Single frames converted to depth maps - darker is closer
So how is this achieved?
The new software does the job by keeping a database of objects with known depths that it can recognize in the photo. It then estimates warping functions, which indicate how the object in the target photo differs from the library version. The matched object is assumed to be at the same depth as the library object. A label-based smoothing procedure is then used to improve the depth estimates. If the input is a video, then motion flow is also used to improve the depth estimates: basically, pixels that are in motion are assumed to be in front of background pixels.
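The transfer step described above can be sketched roughly as follows. This is a simplified illustration rather than the published algorithm: it skips the warping and smoothing stages, and simply fuses the depth maps of the closest database matches with a per-pixel median. The names `feature_distance` and `transfer_depth`, the feature vectors and the parameter `k` are all hypothetical.

```python
from statistics import median


def feature_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def transfer_depth(query_features, database, k=3):
    """Estimate a depth map from the k most similar database entries.

    database: list of (features, depth_map) pairs with known depth.
    All depth maps are assumed to share one shape, and warping is omitted.
    """
    nearest = sorted(
        database,
        key=lambda entry: feature_distance(query_features, entry[0]),
    )[:k]
    depth_maps = [depth_map for _, depth_map in nearest]
    height, width = len(depth_maps[0]), len(depth_maps[0][0])
    # The per-pixel median is robust to a single bad match.
    return [
        [median(d[y][x] for d in depth_maps) for x in range(width)]
        for y in range(height)
    ]


# Toy database: three similar entries and one outlier.
database = [
    ([0, 0], [[1.0, 2.0]]),
    ([0, 1], [[3.0, 2.0]]),
    ([1, 0], [[2.0, 8.0]]),
    ([9, 9], [[100.0, 100.0]]),
]
estimate = transfer_depth([0, 0], database, k=3)
```

The outlier entry is never selected because its features are far from the query, and the median keeps one unusual depth value among the matches from dominating the estimate.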
The method was trained and tested using a set of videos that were gathered in stereo using a Kinect to measure depth where possible. You can see the arrangement used in the photo above.
You can see the algorithm in action in the following video:
From video to depth map to 3D stereo images
You can download both the Matlab implementation of the algorithm and the training data.