Kinect SDK 1 - Skeletons
Written by Mike James   

The ability to extract and track the positions of the human body is a remarkable feature of the Kinect, but how do you use it? It seems more complicated than the more basic video and depth outputs. Fortunately, once you understand the structure of the data returned, it isn't much more involved. We look at the simplest possible example.

Practical Windows Kinect in C#
Chapter List

  1. Introduction to Kinect
  2. Getting started with Microsoft Kinect SDK 1
  3. Using the Depth Sensor
  4. The Player Index
  5. Depth and Video Space
  6. Skeletons
  7. The Full Skeleton
  8. A 3D Point Cloud

In this chapter, we make explicit use of the basic information described in chapter 2, in particular how to display a video image, and some of the ideas in chapter 5 on converting between co-ordinate systems.

Working with skeletons

One of the big attractions of the Kinect is that it not only provides raw video and depth data, but also processes that data to produce player indexes - see chapter 4 - or even complete skeletons of each player.

Using the skeleton engine seems more difficult than using the other facilities simply because what is detected is a skeleton - an apparently complex data structure. In fact, calling it a skeleton is most of the problem. As will become clear, it is much simpler than you might think.

To explain how it all works we are going to construct the simplest possible example. There are a number of example programs that show you how to display a full skeleton, complete with different color coding for different limbs. This is impressive, but it doesn't make it easy to see what the operating principles are. The example program in this article does just one thing - it tracks a player's head. This might not seem as impressive, but it is easy to follow, and once you can track a head the rest of the body, the complete skeleton, becomes easy.

First the video

We first need to construct a program that displays the video from the camera so that we can mark the position of the player's head. This is just the basic video display program that was introduced in chapter 2, so if you need detailed explanations of how it all works, read that chapter.

Start a new C# Windows Forms project. A WPF-based project would be more or less the same, apart from the way the bitmap is processed for display.

For simplicity we will use Windows Forms.

Make sure you have added a reference to the Kinect DLL:

Microsoft.Kinect.dll. 

and add:

using Microsoft.Kinect;

to the start of the program. In the Form's constructor we get a KinectSensor object so that we can use the Kinect:

public Form1()
{
    InitializeComponent();
    sensor = KinectSensor.KinectSensors[0];
    // the constructor continues with the setup code that follows

The sensor variable is declared as a field of the form, in effect a global, allowing us to get at the Kinect from anywhere in the program - not good design but simpler for an example.

KinectSensor sensor;
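Indexing KinectSensors[0] throws an exception if no Kinect is plugged in. A slightly more defensive sketch, assuming you would rather fail with a clear message, picks the first sensor whose Status is Connected. The class and method names here are invented for illustration and are not part of the SDK:

```csharp
using System;
using Microsoft.Kinect;

static class SensorHelper
{
    // Hypothetical helper: returns the first connected Kinect,
    // or throws if none is available.
    public static KinectSensor FindSensor()
    {
        foreach (KinectSensor candidate in KinectSensor.KinectSensors)
        {
            if (candidate.Status == KinectStatus.Connected)
            {
                return candidate;
            }
        }
        throw new InvalidOperationException(
            "No connected Kinect sensor was found.");
    }
}
```

With this in place, the assignment in the constructor could read sensor = SensorHelper.FindSensor(); instead of indexing the collection directly.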

Next, still within the constructor, you have to initialize it to use the video and depth cameras and skeletal tracking:

    sensor.ColorStream.Enable(
        ColorImageFormat.RgbResolution640x480Fps30);
    sensor.DepthStream.Enable(
        DepthImageFormat.Resolution320x240Fps30);
    sensor.SkeletonStream.Enable();

The new AllFramesReady event can be used to trigger code when all of the frame types you have requested are ready to be processed. So we can simply use a single event handler:

    sensor.AllFramesReady += FramesReady;

Finally we can set the sensor running:

    sensor.Start();
}
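Because the setup is presented in pieces, it may help to see it assembled in one place. The following is a sketch of how the fragments above fit together. The FormClosing handler that stops the sensor is an addition not covered in the walkthrough, and FramesReady is left as an empty placeholder:

```csharp
using System.Windows.Forms;
using Microsoft.Kinect;

public partial class Form1 : Form
{
    KinectSensor sensor;

    public Form1()
    {
        InitializeComponent();
        sensor = KinectSensor.KinectSensors[0];

        // Enable the three streams used in this chapter
        sensor.ColorStream.Enable(
            ColorImageFormat.RgbResolution640x480Fps30);
        sensor.DepthStream.Enable(
            DepthImageFormat.Resolution320x240Fps30);
        sensor.SkeletonStream.Enable();

        sensor.AllFramesReady += FramesReady;
        sensor.Start();

        // Addition (not in the walkthrough): stop the sensor
        // when the form closes.
        FormClosing += (s, e) => sensor.Stop();
    }

    void FramesReady(object sender, AllFramesReadyEventArgs e)
    {
        // Frame processing goes here.
    }
}
```

This assumes the usual designer-generated half of Form1 that supplies InitializeComponent.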

Processing the Video data - GDI

When the FramesReady event handler is called, the video, depth and skeleton frames are all ready to be processed:

void FramesReady(object sender,
 AllFramesReadyEventArgs e)
{

First we retrieve the video data. The idea is that we are going to mark the location of the head on the video data as the tracking follows the player around the frame.

    ColorImageFrame VFrame = e.OpenColorImageFrame();
    if (VFrame == null) return;
    byte[] pixeldata =
        new byte[VFrame.PixelDataLength];
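One detail worth keeping in mind is that ColorImageFrame is disposable. A minimal sketch of this step, assuming the frame is not needed once its bytes have been copied, wraps it in a using block and fills the array with CopyPixelDataTo. The class and method names are hypothetical:

```csharp
using Microsoft.Kinect;

static class FrameReader
{
    // Hypothetical helper: returns the raw color bytes for this
    // event, or null if no color frame was available.
    public static byte[] ReadColorPixels(AllFramesReadyEventArgs e)
    {
        using (ColorImageFrame frame = e.OpenColorImageFrame())
        {
            if (frame == null) return null;

            byte[] pixeldata = new byte[frame.PixelDataLength];
            frame.CopyPixelDataTo(pixeldata);
            return pixeldata;
        }   // the frame is disposed here
    }
}
```

The using statement copes with a null frame, so the early return is safe.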

We are going to want to draw a cross on the video data. In the previous chapters we have just used direct manipulation of the byte array to set pixels. This is fine when you only want to work with a few pixels, and it has the advantage of not involving any other objects. However, once you need to start drawing lines to form a skeleton, direct manipulation becomes too difficult to work with.

At this point you have to use whatever graphics facilities your framework provides. The problem is that there is a split between Windows Forms and WPF. For this example we are going to use Windows Forms and the GDI because it is closer to the equivalent facility in C++. In the next chapter we will look at using WPF graphics.
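To give a feel for what the GDI approach involves, here is a minimal sketch of drawing a cross on a Bitmap using System.Drawing. It is independent of the Kinect, and the helper name, pen color and pen width are illustrative choices rather than anything fixed by this chapter:

```csharp
using System.Drawing;

static class CrossDemo
{
    // Hypothetical helper: draws a red cross centred on (x, y)
    // with arms of the given length in pixels.
    public static void DrawCross(Bitmap bmp, int x, int y, int size)
    {
        using (Graphics g = Graphics.FromImage(bmp))
        using (Pen pen = new Pen(Color.Red, 3))
        {
            g.DrawLine(pen, x - size, y, x + size, y);  // horizontal arm
            g.DrawLine(pen, x, y - size, x, y + size);  // vertical arm
        }
    }
}
```

For example, CrossDemo.DrawCross(bmp, 320, 240, 10) marks the centre of a 640x480 bitmap. Note that Graphics.FromImage does not accept bitmaps with an indexed pixel format.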