In the third chapter of our e-book for Version 1.0 of the Kinect SDK together with the Windows version of the Kinect hardware, we look at how to use the raw depth data that the Kinect provides to display and analyze a scene and create a live histogram of depth.
Practical Windows Kinect in C#
- Introduction to Kinect
- Getting started with Microsoft Kinect SDK 1
- Using the Depth Sensor
- The Player Index
- Depth and Video Space
- The Full Skeleton
- A 3D Point Cloud
In the previous chapter, Getting started with Windows Kinect SDK 1.0, we covered how to install the SDK and how to use the video camera.
If you haven't completed these steps then at least skim them. Many of the basic operations of working with the Kinect via the SDK are the same whether you are working with the video or the depth camera.
So, while we will go over some of the same ground, see Chapter Two for a detailed explanation of the initial steps.
The raw depth
The first thing to say is that the Kinect does a lot of processing of the depth data to give you more than a basic image that tells how far away each pixel in the image is.
It can take the depth field and label each pixel with the "player" it is part of, as well as performing a complete skeletonization to show where the players' limbs are. In this example we are going to concentrate on working with the basic depth data.
The basic depth data is important in building any really useful Kinect application. If you are going to write a program that detects your pet or steers a robot, then the raw depth is what you need to work with.
Start a new project - it doesn't matter really if it is a Windows Forms or WPF project but to make things easier let's start with Windows Forms. Information about how to make a WPF version work is included as we go on.
Note: if you have used an earlier Beta to create a project make sure you remove the reference to the old DLL.
Next you need to include a reference to
To avoid having to type fully qualified names add:
to the start of the code.
To get at the raw depth information you first have to create an instance of the KinectSensor class, initialize it to use the type of camera and open a data stream of the correct type. In this case we have:
sensor = KinectSensor.KinectSensors[0];
As in the case of the video or any Kinect stream the next thing to do is to set up an event handler that will be called when a frame of the appropriate type is ready. In this case we just need a single event handler:
sensor.DepthFrameReady += DepthFrameReady;
Now we are ready to start processing the data and this is a matter of writing the DepthFrameReady event handler. We also have to remember to start the sensor working using:
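The setup steps described so far can be put together as a minimal sketch. It assumes the SDK 1.0 assembly Microsoft.Kinect.dll is referenced, that at least one sensor is connected, and that a 320x240 depth format is wanted; the class name DepthSetup and the method name StartDepth are hypothetical and used only for illustration.

```csharp
using Microsoft.Kinect;   // assumes a reference to the Kinect SDK 1.0 assembly

class DepthSetup
{
    private KinectSensor sensor;

    public void StartDepth()
    {
        // Nothing to do if no Kinect is plugged in.
        if (KinectSensor.KinectSensors.Count == 0) return;

        // Take the first sensor and enable its depth stream.
        sensor = KinectSensor.KinectSensors[0];
        sensor.DepthStream.Enable(DepthImageFormat.Resolution320x240Fps30);

        // Register the handler, then start the sensor delivering frames.
        sensor.DepthFrameReady += DepthFrameReady;
        sensor.Start();
    }

    private void DepthFrameReady(object sender, DepthImageFrameReadyEventArgs e)
    {
        // Frame processing goes here.
    }
}
```

In a Windows Forms project, StartDepth would typically be called from the form's Load handler, although any point after the form is constructed should do.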
Now the tricky work begins.
Setting up the Kinect to deliver the data is easy but converting the data to the correct format to be useful is often hard.
As with the video data the raw depth data is packaged in the event argument. To get at it, you need to go through a set of standard steps which are more or less the same for each type of data. In this case:
- use the OpenDepthImageFrame method to retrieve a DepthImageFrame object.
- use the DepthImageFrame object's CopyPixelDataTo method to retrieve a short array of pixel data.
The DepthImageFrame also contains some useful properties and methods that tell you about the size and other properties of the data.
Also notice that the data is presented as an array of short integers, i.e. each element is 16 bits, which is different from the byte array used for the video data in the ColorImageFrame. In the beta version of the SDK, a byte array was used for all raw data so this is an important change.
In the case of the depth data, the format is just an array of short, i.e. 16-bit, pixel values stored in row order without any padding.
That is, if you want to get the data at x,y in the image this is stored in the 16-bit element:
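As a sketch of the row-order layout, the index of the pixel at x,y is x plus y times the frame width. The helper class DepthIndexing below is hypothetical and works on a plain short array, so it can be tried without a sensor attached.

```csharp
using System;

// Hypothetical helper, not part of the Kinect SDK.
static class DepthIndexing
{
    // Index of pixel (x, y) in a row-ordered array with no padding.
    public static int IndexOf(int x, int y, int width)
    {
        return x + y * width;
    }

    // Returns the raw 16-bit element for (x, y), checking the bounds first.
    public static short RawValueAt(short[] pixelData, int width, int height, int x, int y)
    {
        if (x < 0 || x >= width || y < 0 || y >= height)
            throw new ArgumentOutOfRangeException();
        return pixelData[IndexOf(x, y, width)];
    }
}
```

For a 320x240 frame the centre pixel (160, 120) is therefore element 160 + 120 * 320 = 38560.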
The high 13 bits of the 16-bit value give you the measured distance in millimeters. The low three bits code for the player that the device has identified, but this is only activated if skeleton tracking is enabled. If you don't enable skeleton tracking, the three bits are set to zero.
You can set the operating range for the Windows Kinect using the DepthImageStream's Range property to either Default or Near. If you are using a modified Xbox Kinect then you can only use Default.
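The bit layout can be illustrated with a small sketch that splits a raw value into its two fields. The class DepthBits and its constants are hypothetical names chosen for this example; the mask 0x7 and the shift of 3 follow directly from the 13 + 3 bit split described above.

```csharp
// Hypothetical helper, not part of the Kinect SDK.
static class DepthBits
{
    public const int PlayerIndexBits = 3;    // width of the player field
    public const int PlayerIndexMask = 0x7;  // binary 111, the low three bits

    // Player number, or 0 if no player was identified.
    public static int PlayerIndex(short raw)
    {
        return raw & PlayerIndexMask;
    }

    // Distance in millimeters from the high 13 bits, treated as unsigned.
    public static int DistanceMm(short raw)
    {
        return unchecked((ushort)raw) >> PlayerIndexBits;
    }

    // Builds a raw value from its two fields, for testing without a sensor.
    public static short Pack(int distanceMm, int playerIndex)
    {
        return unchecked((short)((distanceMm << PlayerIndexBits) | (playerIndex & PlayerIndexMask)));
    }
}
```

For example, a pixel 1000 mm away that belongs to player 2 would be stored as (1000 << 3) | 2 = 8002.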
The fact that the result is returned as a short causes some interesting problems, but you can use some bit manipulation to convert it into any form you require. The only thing you have to take care over is the fact that short is a signed representation. This means that if you have a value like 0xfff8, the maximum 13-bit distance, then this will be treated as -8 in signed representation.
If you want to treat the value as positive then you have to perform a logical, not arithmetic, shift right to move the top 13 bits down into the correct position. If you do an arithmetic shift on the signed value using the >> operator you will find that you get the result -1, i.e. 0xffff. To perform a logical shift you have to first cast to ushort as in:
which gives the result 8191, i.e. 0x1fff, the maximum distance.
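The difference between the two shifts can be checked with a short sketch that needs no sensor. The class and method names here are hypothetical; the only language feature relied on is that >> on a signed value propagates the sign bit, while the same operator on a value cast to ushort does not.

```csharp
// Hypothetical demonstration class, not part of the Kinect SDK.
static class ShiftDemo
{
    // Shifting the signed short directly propagates the sign bit.
    public static int ArithmeticShift(short raw)
    {
        return raw >> 3;
    }

    // Casting to ushort first makes the shift fill with zeros instead.
    public static int LogicalShift(short raw)
    {
        return unchecked((ushort)raw) >> 3;
    }
}
```

With raw set to unchecked((short)0xfff8), ArithmeticShift returns -1 and LogicalShift returns 8191. The unchecked keyword is only a precaution in case the project is compiled with overflow checking turned on.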
It would have been much simpler if the SDK returned a ushort, not a standard short, value.
In many cases you want to convert the depth value into a standard int.
Suppose, for example, that you want the distance of the pixel in the middle of the image:
void DepthFrameReady(object sender, DepthImageFrameReadyEventArgs e)
{
    using (DepthImageFrame imageFrame = e.OpenDepthImageFrame())
    {
        if (imageFrame == null) return;
        short[] pixelData = new short[imageFrame.PixelDataLength];
        imageFrame.CopyPixelDataTo(pixelData);
        int x = imageFrame.Width / 2;
        int y = imageFrame.Height / 2;
        // Cast to ushort first, then shift out the three player bits.
        int d = (ushort) pixelData[x + y * imageFrame.Width];
        d = d >> 3;
    }
}
Of course the obvious thing to do is to package this as a function:
int getValue(DepthImageFrame imageFrame, int x, int y)
{
    short[] pixelData = new short[imageFrame.PixelDataLength];
    imageFrame.CopyPixelDataTo(pixelData);
    return ((ushort) pixelData[x + y * imageFrame.Width]) >> 3;
}
The only change is the way the cast and the shift have been combined on a single line. Using this function you can easily write a small rangefinder program that shows the distance of any selected object in the frame.
However, it is important to note that this is not an efficient function. It retrieves the entire pixel data each time a single value is required. The correct way to do the job is to retrieve the short array of pixel values just once per frame.
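One way to arrange this, sketched under the assumption that a single buffer is reused between frames, is to keep the pixel array in a small helper object and load it once per frame. The DepthBuffer class below is hypothetical; it takes a plain short array so that it can be tried without a sensor, whereas in a real handler the copy would come from the frame's CopyPixelDataTo method.

```csharp
using System;

// Hypothetical helper, not part of the Kinect SDK.
class DepthBuffer
{
    private short[] pixelData = new short[0];
    private int width;

    // Call once per frame; the array is only reallocated if the size changes.
    public void Load(short[] framePixels, int frameWidth)
    {
        if (pixelData.Length != framePixels.Length)
            pixelData = new short[framePixels.Length];
        Array.Copy(framePixels, pixelData, framePixels.Length);
        width = frameWidth;
    }

    // Distance in millimeters at (x, y), read from the stored copy.
    public int GetDistance(int x, int y)
    {
        return unchecked((ushort)pixelData[x + y * width]) >> 3;
    }
}
```

Any number of GetDistance calls can then be made for the same frame at the cost of a single copy.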