Getting started with Microsoft Kinect SDK - Skeletons
Written by Mike James   
Monday, 19 December 2011

The ability to extract and track the positions of the human body is a remarkable feature of the Kinect, but how do you use it? It seems more complicated than the more basic video and depth outputs. Fortunately, once you understand the structure of the data returned, it isn't much more involved. We look at the simplest possible example.

UPDATE: A new version of the entire series for SDK 1.0 is being prepared and should be published soon.

The first part is Getting started with Windows Kinect SDK 1.0

Other Articles in this Series

  1. Getting started with Microsoft Kinect SDK
  2. Depth
  3. Player index
  4. Depth and Video space
  5. Skeletons (this article)
  6. The Full Skeleton

This is the fifth installment of a series on getting started with the Kinect SDK. Part 1 covered the initial steps: powering your Kinect, downloading and installing the SDK, and using the video camera. Part 2 looked at working with the raw depth data, and Part 3 used the player index. Part 4 looked at the problem of relating the different co-ordinate systems used by the depth and video cameras.

In this part, we make explicit use of the basic information in Part 1, in particular how to display a video image, and of some of the ideas in Part 4 on converting between co-ordinate systems.

Working with skeletons

One of the big attractions of the Kinect is that it not only provides raw video and depth data, but also processes that data to produce player indexes - see Part 3 of this series - or even complete skeletons of each player.

Using the skeleton engine seems more difficult than using the other facilities simply because what is detected is a skeleton - an apparently complex data structure. In fact, calling it a skeleton is most of the problem. As will become clear, it is much simpler than you might think.

To explain how it all works we are going to construct the simplest possible example. There are a number of example programs that show you how to display a complete skeleton, with different color coding for different limbs. This is impressive, but it doesn't make it easy to see what the operating principles are. The example program in this article does just one thing - it tracks a player's head. This might not seem as impressive, but it is easy to follow, and once you can track a head the rest of the body, the complete skeleton, becomes easy.

First the video

We first need to construct a program that displays the video from the camera so that we can mark the position of the player's head. This is just the basic video display program that was introduced in Part 1 of this series, so if you need detailed explanations of how it all works, read the first part.

Start a new C# Windows Forms project. A WPF-based project would be more or less the same, apart from the way the bitmap is processed for display. For simplicity we will use Windows Forms.

Make sure you have loaded a reference to the Kinect DLL and add:

using Microsoft.Research.Kinect.Nui;

to the start of the program. In the Form's constructor we obtain a Runtime object so that we can use the Kinect:

public Form1()
{
 InitializeComponent();
 nui = Runtime.Kinects[0];
}

The nui variable is global, allowing us to get at the Kinect from anywhere in the program - not good design but simpler for an example.

Runtime nui;

Next we have to set up the Kinect. This is more or less the same for each program, differing only in the facilities we are going to use. In this case we make use of all of the facilities:

nui.Initialize(
 RuntimeOptions.UseDepthAndPlayerIndex |
 RuntimeOptions.UseSkeletalTracking |
 RuntimeOptions.UseColor);

Next we open the video stream:

nui.VideoStream.Open(ImageStreamType.Video, 2,
 ImageResolution.Resolution640x480, ImageType.Color);

The Kinect will now return video images from the video camera and skeleton data. The simplest way to process this data is to handle the two events that signal when each kind of data is ready:

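nui.SkeletonFrameReady += new
 EventHandler<SkeletonFrameReadyEventArgs>(SkeletonFrameReady);
nui.VideoFrameReady += new
 EventHandler<ImageFrameReadyEventArgs>(FrameReady);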

Now, to display the image, we simply need to write the code for the FrameReady event handler. However, we are going to want to modify the video returned from the camera by adding a small cross at the position of the player's head. To do this we will store the video frame in a global variable and let the SkeletonFrameReady event handler do the actual displaying of the video frame.

So the FrameReady method is:

void FrameReady(object sender,
                ImageFrameReadyEventArgs e)
{
 videoimage = e.ImageFrame.Image;
}

The videoimage variable is just a global PlanarImage:

PlanarImage videoimage;

Now we need to define the SkeletonFrameReady event handler so that, for the moment, it simply shows the video. It first has to check that an image has been stored in videoimage:

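void SkeletonFrameReady(object sender,
                        SkeletonFrameReadyEventArgs e)
{
 if (videoimage.Bits == null) return;
 // pictureBox1 is assumed to be a PictureBox placed on the form
 pictureBox1.Image = PImageToBitmap(videoimage);
}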
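
The hand-off between the two handlers can be tried without a Kinect attached. What follows is only a sketch of the pattern: FakeSensor, VideoFrameArgs, SkeletonFrameArgs and Viewer are hypothetical stand-ins invented for this illustration and are not part of the Kinect SDK. One handler stores the latest video frame; the other does nothing until a frame has arrived, and then "displays" it, which here just means counting it.

```csharp
using System;

// Stand-in event arguments; NOT the Kinect SDK classes.
class VideoFrameArgs : EventArgs { public byte[] Bits; }
class SkeletonFrameArgs : EventArgs { public int FrameNumber; }

// Hypothetical sensor that raises the two events on demand.
class FakeSensor
{
    public event EventHandler<VideoFrameArgs> VideoFrameReady;
    public event EventHandler<SkeletonFrameArgs> SkeletonFrameReady;

    public void RaiseVideo(byte[] bits)
    {
        VideoFrameReady?.Invoke(this, new VideoFrameArgs { Bits = bits });
    }

    public void RaiseSkeleton(int frameNumber)
    {
        SkeletonFrameReady?.Invoke(
            this, new SkeletonFrameArgs { FrameNumber = frameNumber });
    }
}

class Viewer
{
    byte[] latestVideo;       // plays the role of the global videoimage
    public int FramesShown;   // how many times a frame was "displayed"

    public Viewer(FakeSensor sensor)
    {
        sensor.VideoFrameReady += OnVideo;
        sensor.SkeletonFrameReady += OnSkeleton;
    }

    // Video handler: only remember the most recent frame.
    void OnVideo(object sender, VideoFrameArgs e)
    {
        latestVideo = e.Bits;
    }

    // Skeleton handler: skip until a video frame exists, then show it.
    void OnSkeleton(object sender, SkeletonFrameArgs e)
    {
        if (latestVideo == null) return;
        FramesShown++;
    }
}
```

If a skeleton event is raised before any video event, FramesShown stays at zero, which is exactly the case the null check in SkeletonFrameReady guards against.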

We can't display a PlanarImage in, say, a PictureBox unless we first convert it to an Image object. How to do this was covered in Part 1 of this series, so the method that does the job, PImageToBitmap, is simply quoted:

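// Needs using System.Drawing.Imaging and System.Runtime.InteropServices
Bitmap PImageToBitmap(PlanarImage PImage)
{
 Bitmap bmap = new Bitmap(PImage.Width, PImage.Height,
                          PixelFormat.Format32bppRgb);
 BitmapData bmapdata = bmap.LockBits(
  new Rectangle(0, 0, PImage.Width, PImage.Height),
  ImageLockMode.WriteOnly, bmap.PixelFormat);
 IntPtr ptr = bmapdata.Scan0;
 Marshal.Copy(PImage.Bits, 0, ptr,
  PImage.Width *
  PImage.BytesPerPixel *
  PImage.Height);
 bmap.UnlockBits(bmapdata);
 return bmap;
}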

For an explanation of how this works see Part 1 - but essentially what it does is take a PlanarImage and return an equivalent Bitmap object.
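
The same lock-and-copy technique can be tried on its own, without the Kinect SDK. The sketch below is an illustration rather than the article's method: RawImage and ToBitmap are hypothetical names, and the input is assumed to be a tightly packed array of 32-bit pixels stored as blue, green, red and one unused byte. It copies one row at a time so that the bitmap's Stride is respected, and it unlocks the bitmap even if the copy fails.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

static class RawImage
{
    // Copies tightly packed 32-bit pixel data (B, G, R, unused)
    // into a new Bitmap of the given size.
    public static Bitmap ToBitmap(byte[] bits, int width, int height)
    {
        const int bytesPerPixel = 4;
        int rowBytes = width * bytesPerPixel;
        if (bits == null || bits.Length < rowBytes * height)
            throw new ArgumentException("Pixel buffer is too small.", nameof(bits));

        Bitmap bmap = new Bitmap(width, height, PixelFormat.Format32bppRgb);
        BitmapData data = bmap.LockBits(
            new Rectangle(0, 0, width, height),
            ImageLockMode.WriteOnly,
            bmap.PixelFormat);
        try
        {
            // Copy row by row; data.Stride is the distance in bytes
            // between the starts of consecutive rows in the bitmap.
            for (int y = 0; y < height; y++)
            {
                IntPtr rowStart = IntPtr.Add(data.Scan0, y * data.Stride);
                Marshal.Copy(bits, y * rowBytes, rowStart, rowBytes);
            }
        }
        finally
        {
            // Always release the lock, even if the copy throws.
            bmap.UnlockBits(data);
        }
        return bmap;
    }
}
```

Given a byte array and its dimensions, RawImage.ToBitmap(bits, width, height) returns a Bitmap that can be assigned to a PictureBox's Image property in the same way as the result of PImageToBitmap.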

Last Updated ( Monday, 06 February 2012 )