How to use the raw depth data that the Kinect provides to display and analyze a scene and create a live histogram of depth.
UPDATE: A new version of the entire series for SDK 1.0 is being prepared and should be published soon.
The first part is Getting started with Windows Kinect SDK 1.0
Other Articles in this Series
- Getting started with Microsoft Kinect SDK
- Depth (This article)
- Player index
- Depth and Video space
- The Full Skeleton
This is the second installment of a series on getting started with the Kinect SDK. In this part we look at the basic use of the depth camera.
In the previous article, Getting started with Microsoft Kinect SDK, we covered how to power your Kinect, how to download and install the SDK, and how to use the video camera. If you haven't completed any of these steps then at least skim through it. In addition, many of the basic operations of working with the Kinect via the SDK are the same whether you are working with the video or the depth camera. So while we will go over some of the same ground, see Part One for a detailed explanation of the initial steps.
Note: this tutorial uses Beta 2 of the SDK
The raw depth
The first thing to say is that the Kinect does a lot of processing of the depth data to give you more than a basic image that tells you how far away each pixel in the image is. It can take the depth field and label each pixel with the "player" it is part of, as well as performing a complete skeletonization to show where the players' limbs are. In this example we are going to concentrate on working with the basic depth data, because it is the basic depth data that is most useful in building any really novel use of the Kinect camera. If you are going to write a program that detects your pet or steers a robot then the raw depth is what you need to work with.
To get at the raw depth information you first have to create an instance of the Runtime class, initialize it to use the type of camera you want and open a data stream of the correct type. In this case we have:
Runtime nui = Runtime.Kinects[0];
You can initialize more than one type of camera at a time simply by writing the options you want separated by | (vertical bar) in a single Initialize call. You can also open multiple data streams of different types by using more than one Open call. In this case we will keep things simple. All of the parameters used in the Open method should be self-explanatory apart from the "2", which gives the number of frames that are buffered if your program doesn't process them fast enough.
As in the case of the video or any Kinect stream the next thing to do is to set up an event handler that will be called when a frame of the appropriate type is ready. In this case we just need a single event handler:
Now we are all ready to start processing the data and this is a matter of writing the DepthFrameReady event handler.
Now the tricky work begins. Setting up the Kinect to deliver the data is easy but converting the data to the correct format to be useful is often hard.
The data is returned to the event handler in the ImageFrame.Image property of the event arguments as a PlanarImage structure. This is essentially just a Byte array plus some other useful fields. In the case of the depth data the format is just an array of two-byte pixel values stored in row order. That is, if you want to get the data at x,y in the image, it is stored in the two bytes:
Bits[x * 2 + y * Width * 2]
Bits[x * 2 + y * Width * 2 + 1]
with the low order byte stored in the first array element. The first 13 bits of the 16-bit value give you the distance in millimeters.
You can use this mapping to access the distance as a 16-bit number using a little bit manipulation. Suppose you want the distance of the pixel in the middle of the image, i.e.:
PlanarImage PImage = e.ImageFrame.Image;
int x = PImage.Width / 2;
int y = PImage.Height / 2;
Then first we need the high order byte:
int d = PImage.Bits[x * 2 +
         y * PImage.Width * 2 + 1];
then we shift it up 8 bits to become the high byte of the 16-bit word:
d = d << 8;
Finally we add the low order byte to give the final value:
d = d + PImage.Bits[x * 2 +
         y * PImage.Width * 2];
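The three steps above amount to treating the two bytes as a little-endian 16-bit number. As a minimal sketch that can be run without a Kinect, the following uses a plain byte array with made-up values; the DepthBytes class and its Combine method are hypothetical names for illustration and not part of the SDK. It also compares the result with the standard BitConverter.ToUInt16, which only agrees on little-endian hardware.

```csharp
using System;

static class DepthBytes
{
    // Combine two bytes, low-order byte first, into one 16-bit value.
    public static int Combine(byte low, byte high)
    {
        return (high << 8) | low;
    }

    static void Main()
    {
        // Made-up pixel: low byte 0xE8, high byte 0x03 gives 0x03E8 = 1000.
        byte[] bits = { 0xE8, 0x03 };
        int d = Combine(bits[0], bits[1]);
        Console.WriteLine(d); // prints 1000

        // BitConverter reads in the machine's byte order, so only
        // compare when running on little-endian hardware.
        if (BitConverter.IsLittleEndian)
        {
            Console.WriteLine(BitConverter.ToUInt16(bits, 0) == d); // prints True
        }
    }
}
```

Using | instead of + to merge the bytes gives the same result here because the shifted high byte and the low byte never share a set bit.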
Of course the obvious thing to do is to package this as a function:
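int getValue(PlanarImage PImage,
                       int x, int y)
{
 // high order byte of the pixel at x,y
 int d = PImage.Bits[x *
       PImage.BytesPerPixel +
       y * PImage.Width *
       PImage.BytesPerPixel + 1];
 d <<= 8;
 // add the low order byte
 d += PImage.Bits[x *
       PImage.BytesPerPixel +
       y * PImage.Width *
       PImage.BytesPerPixel];
 return d;
}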
The only changes are the use of <<= and += to make the expressions simpler and of the BytesPerPixel property to make the routine more general. The offset calculation works for any PlanarImage object, not just depth maps, although only the first two bytes of the pixel are combined. Using this function you can easily write a small rangefinder program that shows the distance of any selected object in the frame.
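To experiment with the lookup without a Kinect attached, here is a minimal sketch that applies the same offset calculation to a synthetic frame. DepthBuffer is a hypothetical stand-in for the PlanarImage fields used in this article, and RangefinderDemo and MakeFrame are likewise names invented for illustration. The bounds check is an addition that the function above does not perform.

```csharp
using System;

// Hypothetical stand-in for the PlanarImage fields used in this article.
class DepthBuffer
{
    public byte[] Bits;
    public int Width, Height, BytesPerPixel;

    // Returns the 16-bit value at (x, y), or -1 if the point is outside the frame.
    public int GetValue(int x, int y)
    {
        if (x < 0 || x >= Width || y < 0 || y >= Height) return -1;
        int offset = (y * Width + x) * BytesPerPixel;
        return (Bits[offset + 1] << 8) | Bits[offset];
    }
}

static class RangefinderDemo
{
    // Build a synthetic frame in which every pixel holds the same 16-bit value.
    public static DepthBuffer MakeFrame(int width, int height, int mm)
    {
        var frame = new DepthBuffer
        {
            Width = width,
            Height = height,
            BytesPerPixel = 2,
            Bits = new byte[width * height * 2]
        };
        for (int i = 0; i < frame.Bits.Length; i += 2)
        {
            frame.Bits[i] = (byte)(mm & 0xFF);   // low order byte first
            frame.Bits[i + 1] = (byte)(mm >> 8); // then the high order byte
        }
        return frame;
    }

    static void Main()
    {
        DepthBuffer frame = MakeFrame(320, 240, 1500);
        Console.WriteLine(frame.GetValue(160, 120)); // prints 1500
        Console.WriteLine(frame.GetValue(400, 10));  // prints -1
    }
}
```

In a real rangefinder the x and y values would come from something like a mouse click on the displayed depth image, and the frame would come from the DepthFrameReady event arguments rather than from MakeFrame.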