What the team did next was to train a type of classifier called a decision forest, i.e. a collection of decision trees. Each tree was trained on a set of features computed from depth images that were pre-labeled with the target body parts. That is, the decision trees were modified until they gave the correct classification for a particular body part across the training set of images. Training just three trees using 1 million training images took about a day using a 1000-core cluster.
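To make the idea of a "collection of decision trees" concrete, here is a minimal Python sketch of how a forest can combine its trees: each tree returns a probability for every body part and the forest averages them. The three trees below are hand-written stand-ins with made-up thresholds, not anything learned from data, and the names (forest_predict, tree_a and so on) are purely illustrative.

```python
from typing import Callable, Dict, List

# A "tree" here is any function that maps a feature vector to a probability
# distribution over body-part labels. Real trees would be learned from data.
Tree = Callable[[List[float]], Dict[str, float]]


def forest_predict(trees: List[Tree], features: List[float]) -> Dict[str, float]:
    """Average the per-tree distributions to get the forest's distribution."""
    totals: Dict[str, float] = {}
    for tree in trees:
        for label, p in tree(features).items():
            totals[label] = totals.get(label, 0.0) + p
    return {label: total / len(trees) for label, total in totals.items()}


# Three stand-in trees with hypothetical thresholds (assumptions, not learned).
def tree_a(f: List[float]) -> Dict[str, float]:
    return {"hand": 0.8, "leg": 0.2} if f[0] > 0.5 else {"hand": 0.1, "leg": 0.9}


def tree_b(f: List[float]) -> Dict[str, float]:
    return {"hand": 0.6, "leg": 0.4} if f[1] > 0.0 else {"hand": 0.3, "leg": 0.7}


def tree_c(f: List[float]) -> Dict[str, float]:
    return {"hand": 0.7, "leg": 0.3} if f[0] + f[1] > 0.5 else {"hand": 0.2, "leg": 0.8}


# Classify one pixel described by two made-up feature values.
probs = forest_predict([tree_a, tree_b, tree_c], [0.9, 0.2])
best = max(probs, key=probs.get)  # the most probable body part for this pixel
print(probs, best)
```

Averaging means that no single tree has to be right on its own, which is one reason a forest of only three trees can still be useful.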
The trained classifiers assign a probability of a pixel being in each body part, and the next stage of the algorithm simply picks out areas of maximum probability for each body part type. So an area will be assigned to the category "leg" if the leg classifier has a probability maximum in the area. The final stage is to compute suggested joint positions relative to the areas identified as particular body parts. In the diagram below the different body part probability maxima are indicated as colored areas:
Notice that all of this is easy to calculate, as it involves the depth values at just three pixels, and it can be handled by the GPU. For this reason the system can run at 200 frames per second, and it doesn't need an initial calibration pose. Because each frame is analysed independently and there is no tracking, there is no problem with loss of the body image, and the system can handle multiple body images at the same time.
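As a rough illustration of why a three-pixel feature is so cheap, here is a minimal Python sketch of one possible depth-comparison feature: it reads the depth at the pixel being classified and at two probe pixels offset from it, and returns the difference between the two probes. The exact form used here (offsets divided by the depth at the pixel, a large constant for probes that fall outside the image) is an assumption made for the example, and the names depth_feature, depth_at and BACKGROUND are hypothetical.

```python
from typing import List, Tuple

# Depth reported for probes that fall outside the image (an assumption).
BACKGROUND = 1e6


def depth_at(depth: List[List[float]], r: int, c: int) -> float:
    """Depth at (row, col), or BACKGROUND if the probe is off the image."""
    if 0 <= r < len(depth) and 0 <= c < len(depth[0]):
        return depth[r][c]
    return BACKGROUND


def depth_feature(depth: List[List[float]], pixel: Tuple[int, int],
                  u: Tuple[float, float], v: Tuple[float, float]) -> float:
    """Difference of the depths at two probes offset from `pixel`.

    The offsets u and v are divided by the depth at `pixel`, so the probes
    move closer together for distant bodies. Assumes that depth is non-zero.
    """
    r, c = pixel
    d = depth[r][c]                                   # first pixel read
    r1, c1 = r + round(u[0] / d), c + round(u[1] / d)
    r2, c2 = r + round(v[0] / d), c + round(v[1] / d)
    return depth_at(depth, r1, c1) - depth_at(depth, r2, c2)  # two more reads


# A tiny made-up depth image in metres: near surface on the left, far on the right.
depth = [
    [2.0, 2.0, 4.0],
    [2.0, 2.0, 4.0],
    [2.0, 2.0, 4.0],
]
feature = depth_feature(depth, (1, 1), (0, 2), (0, -2))
print(feature)
```

Each feature costs three memory reads and a subtraction, and every pixel can be processed independently, which is the kind of workload a GPU handles well.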
Now that you know some of the detail of how it all works the following Microsoft Research video should make good sense:
The Kinect is a remarkable achievement and it is all based on fairly standard classical pattern recognition, but well applied. You also have to take into account the way that the availability of large multicore computational power allows the training set to be very large. One of the properties of pattern recognition techniques is that they might take ages to train, but once trained the actual classification can be performed very quickly. Perhaps we are entering a new golden age in which the computer power needed to make pattern recognition and machine learning work well enough to be practical is at last available.
What to do with a Kinect
You can use a Kinect to just play games - but there is more fun and perhaps even profit to be had from putting it to other uses.
So what else can you do with a Kinect?
A device that measures the depth of every point in a scene may not seem to have much potential but this isn't the case. Inventive uses of a Kinect fall into a number of different categories.
You can use it to track a human and then respond to movements and gestures as a free-form user interface.
You can use it to create something artistic by responding to depth or human movements.
You can use it with a robot as a navigation device.
You can use the depth map to create virtual and augmented reality applications.
You can use it to make 3D measurements and model construction.
Because a depth image can be compressed to a smaller size than a color video you can attempt to use it to send realtime images over low bandwidth connections.
No doubt there are many more uses, and we haven't even discussed the fact that the Kinect has a powerful audio capability that mostly goes ignored.
The skills you will need to master the Kinect and build something useful, fun or exciting are many. You do need to program and for this book you need to be able to program in C#. Ideally you also need to have some idea of how 2D and 3D graphics work - the more you know the more you are likely to see how the Kinect might be used in new ways.
If you really want to do something groundbreaking then some idea of how artificial intelligence works, and how artificial vision in particular works, would be an advantage. The big problem here is that to implement anything that is completely new you are going to need lots of computer time to train any recognition algorithms. So if you have plans to design, say, a hand recognition algorithm, be prepared to find out as much as you can about AI and pattern recognition.
You probably could also make use of some hardware skills - not to hack the Kinect but to build devices that can be controlled by it. Some skill with a development board such as the Arduino would be ideal but this isn't the only way to do things.
In short, to do creative things with a Kinect system you need lots of different skills. This is one of the reasons that a Kinect makes a good resource in education.
Whatever you plan to do with your Kinect project, the following chapters will take you through using its video, depth, skeleton detection and tracking, and finally its audio input.