Microsoft has released details of the PC version of the Kinect depth camera that is such an important part of the new Xbox One.
The bad news is that the Kinect for Windows sensor isn't going to be available until some time after the Xbox One is released to the game-playing world. How long after, we will have to wait and see; all Microsoft will say is that it will be released in 2014. There is also going to be a new SDK, but details of what software it will include are even sketchier than the hardware details of the new Kinect.
The key advance in the new sensor is resolution. The whole principle of operation of the depth camera has been changed. The original, i.e. current, Kinect works using structured light: a pattern of dots is projected onto the scene and depth is obtained by measuring how far each dot shifts due to parallax. This is a good method - it is cheap to reproduce once designed - but its accuracy is limited by the quality and stability of the optics. For example, it needs thermal stabilization to stop expansion from distorting the pattern mask.
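The parallax idea reduces to simple triangulation: the further away a surface is, the less its projected dot shifts between the emitter's and camera's viewpoints. A minimal sketch, assuming illustrative values for focal length and baseline (the real Kinect's calibration constants are not public):

```python
# Structured-light depth from dot displacement (triangulation).
# z = f * b / d, where f is focal length in pixels, b the
# emitter-to-camera baseline in meters, and d the observed
# dot shift (disparity) in pixels.

def depth_from_disparity(disparity_px, focal_px=580.0, baseline_m=0.075):
    """Distance in meters for a dot shifted by disparity_px pixels."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# With these assumed constants, a 50-pixel shift maps to 0.87 m.
print(depth_from_disparity(50.0))
```

Notice that depth resolution falls off with distance: at large range a big change in z produces only a sub-pixel change in disparity, which is one reason optical quality matters so much for this approach.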
For the first generation Kinect, Microsoft bought in structured light technology from PrimeSense. Now they have implemented their own depth measurement system based on a "time of flight" camera. This works by measuring the time it takes for light to make the round trip from the emitter to the sensor.
That is, it works like radar (or, more accurately, lidar), but it doesn't send out a single pulse and then wait for it to return. Instead it sends out a modulated light beam and uses a 2D sensor to image the returning light. The phase of the modulation received at each pixel gives the travel time of the light, and hence the distance it has traveled. In principle, this can provide higher depth and spatial resolution. Interestingly, the Kinect blog says:
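The phase-to-distance relation can be sketched in a few lines. The modulation frequency below is an assumption for illustration - Microsoft hasn't published the new sensor's actual figure - but the geometry is standard for continuous-wave time-of-flight cameras:

```python
import math

# Phase-based time-of-flight ranging. The returning light's phase
# shift phi encodes the round trip, so one-way distance is
#   d = c * phi / (4 * pi * f_mod)
# where f_mod is the modulation frequency (assumed 30 MHz here).

C = 299_792_458.0  # speed of light, m/s

def distance_from_phase(phi_rad, f_mod_hz=30e6):
    """One-way distance in meters for a measured phase shift phi_rad."""
    return C * phi_rad / (4 * math.pi * f_mod_hz)

def max_unambiguous_range(f_mod_hz=30e6):
    """Beyond this distance the phase wraps past 2*pi and aliases."""
    return C / (2 * f_mod_hz)

print(distance_from_phase(math.pi / 2))  # quarter-cycle shift, about 1.25 m
print(max_unambiguous_range())           # just under 5 m at 30 MHz
```

The aliasing function hints at a real engineering trade-off: a higher modulation frequency gives finer depth resolution but a shorter unambiguous range, which is why practical time-of-flight cameras often combine measurements at more than one frequency.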
Also included is Microsoft’s proprietary Time-of-Flight technology,
Note the use of the word "proprietary". This strongly suggests that there won't be a competing device based on the same technology for a while.
Microsoft hasn't released any exact details of the new depth sensor, but it is clear that it is much better than the original Kinect - to quote the Kinect blog:
"All of this means that the new sensor recognizes precise motions and details, such as slight wrist rotation, body position, and even the wrinkles in your clothes."
Judging from the action photos, this appears to be no exaggeration.
As well as having increased resolution, the depth camera also has a greater field of view. In this case field of view means the volume of space it can work with. Again we have no exact data, but it seems that you can get to within 3 feet (1 meter) of the sensor and it carries on working. This is important because it means that it could be used as a gesture input device that works with desktop machines. In this role it competes with the Leap, the PrimeSense Capri and Intel's perceptual computing project.
The improved resolution is also used to provide better skeletal tracking. The AI algorithms used in the original tracking must have been improved to take account of the finer detail because now it can track more joints including the tip of the hand - notice that this isn't quite saying that it can track finger movements. We will have to wait for more data to say just how good it is. It can also track up to six users at the same time.
The improvements to the depth sensor are clearly the most important news, but the new IR camera is also intriguing. Not only can it see in the dark, it also provides temperature measurements of objects in the view. It is difficult to know exactly what this will allow, but the Kinect blog makes some interesting claims, including that it will be able to recognize facial features, hand positions and more. Exactly how isn't clear. At the Xbox launch it was claimed that the IR camera can detect the user's pulse, and this can presumably be used to estimate something to do with excitement in a game. It also raises the question of what sort of image processing is being used to detect a pulse - could it be Eulerian video magnification?
So the new Kinect brings the interesting prospect of a multi-modal input device - just right for robots and all sorts of new intelligent applications. You have a hi-def RGB feed, an accurate depth camera, and an IR view of the same scene. These could be used together to extract more information than the original Kinect allowed. Now all we need is more information and some actual devices. Of course there is no information on pricing or on what support we can expect for the existing Kinect. We have to hope that the new SDK supports it as well as the new device.
More information is promised at some special sessions at BUILD 2013 at the end of June.