The ability to extract and track the positions of the human body is a remarkable feature of the Kinect, but how do you use it? It seems more complicated than the more basic video and depth outputs. Fortunately, once you understand the structure of the data returned, it isn't much more involved. We look at the simplest possible example.
Practical Windows Kinect in C#
- Introduction to Kinect
- Getting started with Microsoft Kinect SDK 1
- Using the Depth Sensor
- The Player Index
- Depth and Video Space
- The Full Skeleton
- A 3D Point Cloud
In this chapter we make explicit use of the basic information described in Chapter 2, in particular how to display a video image, and some of the ideas in Chapter 5 on converting between co-ordinate systems.
Working with skeletons
One of the big attractions of the Kinect is that it not only provides raw video and depth data, but also processes that data to produce player indexes - see Chapter 4 - or even complete skeletons of each player.
Using the skeleton engine seems more difficult than using the other facilities simply because what is detected is a skeleton - an apparently complex data structure. In fact, calling it a skeleton is most of the problem. As will become clear, it is much simpler than you might think.
To explain how it all works we are going to construct the simplest possible example. There are a number of example programs that show you how to display a complete skeleton, with different color coding for different limbs. This is impressive, but it doesn't make it easy to see what the operating principles are. The example program in this article does just one thing - it tracks a player's head. This might not seem as impressive, but it is easy to follow, and once you can track a head, the rest of the body - the complete skeleton - becomes easy.
First the video
We first need to construct a program that displays the video from the camera so that we can mark the position of the player's head. This is just the basic video display program that was introduced in Chapter 2, so if you need detailed explanations of how it all works, read that chapter.
Start a new C# Windows Forms project. A WPF-based project would be more or less the same, apart from the way the bitmap is processed for display.
For simplicity we will use Windows Forms.
Make sure you have added a reference to the Kinect DLL and add the line
using Microsoft.Kinect;
to the start of the program. In the Form's constructor we get a KinectSensor object so that we can use the Kinect:
sensor = KinectSensor.KinectSensors[0];
The sensor variable is global, allowing us to get at the Kinect from anywhere in the program - not good design, but simpler for an example.
Next you have to initialize it to use the video camera, the depth camera and skeleton tracking:
The new AllFramesReady event can be used to trigger code when all of the frame types you have requested are ready to be processed. So we can simply use a single event handler:
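As a minimal sketch of this step, assuming the Kinect SDK 1 stream API, the three streams could be enabled as follows. The helper class and method names are hypothetical, and the resolutions chosen here are assumptions made for illustration rather than the settings this chapter necessarily uses.

```csharp
using Microsoft.Kinect;

static class KinectSetup
{
    // Hypothetical helper: enables the three data sources the example relies on.
    public static void EnableStreams(KinectSensor sensor)
    {
        // Video (color) camera. The format is an assumption.
        sensor.ColorStream.Enable(ColorImageFormat.RgbResolution640x480Fps30);

        // Depth camera. The format is an assumption.
        sensor.DepthStream.Enable(DepthImageFormat.Resolution320x240Fps30);

        // Skeleton tracking with default settings.
        sensor.SkeletonStream.Enable();
    }
}
```

Calling KinectSetup.EnableStreams(sensor) from the Form's constructor, before the sensor is started, would be one way to use it.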
sensor.AllFramesReady += FramesReady;
Finally we can set the sensor running:
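A minimal sketch of starting the sensor, assuming the SDK 1 KinectSensor.Start method, is shown here. The wrapper class and method are hypothetical. The IOException handler covers the case where the sensor is already in use by another process.

```csharp
using System.IO;
using Microsoft.Kinect;

static class KinectStart
{
    // Hypothetical helper: returns true if the sensor started delivering frames.
    public static bool TryStart(KinectSensor sensor)
    {
        try
        {
            sensor.Start();
            return true;
        }
        catch (IOException)
        {
            // The sensor is in use by another process.
            return false;
        }
    }
}
```

In the simplest case a bare call to sensor.Start() in the constructor is enough. The wrapper only makes the failure case visible.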
Processing the Video data - GDI
When the FramesReady event handler is called both the depth and the video frame are ready to be processed:
void FramesReady(object sender, AllFramesReadyEventArgs e)
First we retrieve the video data. The idea is that we are going to mark the location of the head on the video data as the tracking follows the player around the frame.
ColorImageFrame VFrame = e.OpenColorImageFrame();
if (VFrame == null) return;
byte[] pixeldata = new byte[VFrame.PixelDataLength];
VFrame.CopyPixelDataTo(pixeldata);
We are going to want to draw a cross on the video data. In the previous chapters we have just used direct manipulation of the byte array to set pixels. This is fine when you only want to work with a few pixels and it has the advantage of not involving any other objects. However, once you need to start drawing lines to form a skeleton, direct manipulation becomes too difficult to work with.
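For comparison, direct manipulation of the pixel array might look something like the following sketch. It assumes a 32-bit layout with four bytes per pixel in blue, green, red, unused order. The class and method names are hypothetical.

```csharp
using System;

static class PixelDemo
{
    // Assumption: 4 bytes per pixel, stored as blue, green, red, unused.
    public const int BytesPerPixel = 4;

    // Sets a single pixel in a raw frame buffer.
    // Coordinates outside the frame are ignored.
    public static void SetPixel(byte[] pixeldata, int width, int height,
                                int x, int y, byte r, byte g, byte b)
    {
        if (x < 0 || x >= width || y < 0 || y >= height) return;

        // Offset of the first byte of pixel (x, y) in a row-major buffer.
        int i = (y * width + x) * BytesPerPixel;
        pixeldata[i] = b;
        pixeldata[i + 1] = g;
        pixeldata[i + 2] = r;
    }
}
```

Setting one pixel is simple enough, but a line between two arbitrary points would need a line-drawing algorithm on top of this, which is why a graphics library is the better choice for a skeleton.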
At this point you have to use whatever graphics facilities the framework you are using provides. The problem is that there is a split between Windows Forms and WPF. For this example we are going to use Windows Forms and the GDI because it is closer to the same facility in C++. In the next chapter we will look at using WPF graphics.
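To give an idea of what the GDI approach involves, here is a minimal sketch that draws a cross on a Bitmap using System.Drawing. The class name, method name, pen color and pen width are all assumptions made for illustration. It also assumes the video frame has already been converted to a Bitmap.

```csharp
using System.Drawing;

static class CrossDemo
{
    // Hypothetical helper: draws a cross centered on (x, y).
    // 'size' is the length of each arm in pixels.
    public static void DrawCross(Bitmap bmap, int x, int y, int size)
    {
        using (Graphics g = Graphics.FromImage(bmap))
        using (Pen pen = new Pen(Color.Red, 3))
        {
            g.DrawLine(pen, x - size, y, x + size, y); // horizontal arm
            g.DrawLine(pen, x, y - size, x, y + size); // vertical arm
        }
    }
}
```

Once the head position is known in video co-ordinates, a call such as CrossDemo.DrawCross(bmap, x, y, 10) would mark it. GDI clips anything that falls outside the bitmap, so no bounds checks are needed.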