Getting started with Microsoft Kinect SDK - Depth and Video space
Written by Mike James   
Monday, 01 August 2011

The Kinect has two cameras - one video and one depth - and they have slightly different viewpoints on the world. Relating them together is the subject of this Kinect SDK article and we create a background remover along the way.

UPDATE: A new version of the entire series for SDK 1.0 is being prepared and should be published soon.

The first part is Getting started with Windows Kinect SDK 1.0

Other Articles in this Series

  1. Getting started with Microsoft Kinect SDK
  2. Depth
  3. Player index
  4. Depth and Video space (this article)
  5. Skeletons
  6. The Full Skeleton

This is the fourth installment of a series on getting started with the Kinect SDK. In Part 1 we covered the initial steps: how to power your Kinect, how to download and install the SDK, and how to use the video camera. In Part 2 we looked at working with the raw depth data, and in Part 3 we used the player index. In this part we take a break from looking at how to extract data from the cameras and concentrate on how to relate data from the depth camera to the video camera.

The Kinect has two cameras, one video and one depth. In both cases they return a bitmap corresponding to what they see. In the case of the video camera the bitmap is usually 640x480 and for the depth map it is 320x240, i.e. exactly half the resolution in each direction. A very common requirement is to pick out the video pixels that correspond to particular depth pixels. This sounds fairly easy, as you might think that the pixel at x,y in the depth image corresponds to the block of four pixels starting at 2x,2y in the video image. Unfortunately this simple mapping doesn't work, and this can be the cause of much wasted time trying to figure out why your perfectly reasonable program doesn't quite do what you expect. Fortunately, the solution is fairly easy - once you know how.
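
To make the naive assumption concrete, here is a minimal sketch of the x -> 2x, y -> 2y rule. The names NaiveMapping and VideoBlockFor are hypothetical and not part of the Kinect SDK. It only illustrates the simple mapping that, as stated above, does not line up in practice.

```csharp
using System;

// Illustrative only: NaiveMapping and VideoBlockFor are hypothetical names,
// not part of the Kinect SDK.
public static class NaiveMapping
{
    // Returns the four video pixels that the naive x -> 2x, y -> 2y rule
    // associates with the depth pixel at (x, y).
    public static (int X, int Y)[] VideoBlockFor(int x, int y)
    {
        return new (int X, int Y)[]
        {
            (2 * x,     2 * y),      // top-left
            (2 * x + 1, 2 * y),      // top-right
            (2 * x,     2 * y + 1),  // bottom-left
            (2 * x + 1, 2 * y + 1)   // bottom-right
        };
    }
}
```

Under this rule the last depth pixel, (319, 239), would cover video pixels (638, 478) to (639, 479), so the 320x240 depth map would tile the 640x480 video image exactly if the two cameras shared a viewpoint.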

As a simple example of using the data from the depth camera with the video camera, we construct a demo that uses the player index (see Part 3) to create masks that can be used to remove the background from a user's image.

Getting started

If you don't know how to set up the Kinect or the basic process of getting data from it, you need to read the earlier articles in this series first. In this article it is assumed that you know how to get started. It is also assumed that you know how the depth plus player index data is processed, as explained in Part 3.

Start a new C# Windows Forms project.

The program starts off in the usual way with the creation of a Runtime object:
nui = Runtime.Kinects[0];

Next you have to initialize it to use skeletal tracking, video, and depth with player index:

nui.Initialize(
 RuntimeOptions.UseDepthAndPlayerIndex |
 RuntimeOptions.UseSkeletalTracking |
 RuntimeOptions.UseColor);

Next you need to initialize the streams:


And finally we set up the event handlers to process the frames as they become ready:

nui.DepthFrameReady += new EventHandler<ImageFrameReadyEventArgs>(nui_DepthFrameReady);
nui.VideoFrameReady += new EventHandler<ImageFrameReadyEventArgs>(nui_ColorFrameReady);

The depth image

In this case we are going to use the depth image, specifically the player index, to derive a mask that can be applied to the video stream. This causes something of a problem in that we need both the depth image and the video image to complete the process. The simplest thing to do is to store the most up-to-date depth image and process everything in the video stream's event handler.

void nui_DepthFrameReady(
 object sender,
 ImageFrameReadyEventArgs e)
{
 // Just remember the latest depth image; it is used in the video handler
 depthimage = e.ImageFrame.Image;
}

Of course you also need to remember to define the depthimage variable ready to be used:

PlanarImage depthimage;
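
The "store the latest depth frame, process in the video handler" pattern can be sketched without the Kinect SDK at all. In the sketch below FramePairer, OnDepthFrame and OnVideoFrame are hypothetical stand-ins for the two event handlers, and plain byte arrays stand in for the frame data. It is only meant to show the control flow, under the assumption that using a depth frame that is slightly older than the video frame is acceptable.

```csharp
using System;

// Hypothetical stand-in for the two Kinect frame handlers; none of these
// names belong to the Kinect SDK.
public class FramePairer
{
    private byte[] latestDepth;   // most recent depth frame, null until one arrives

    public int PairsProcessed { get; private set; }

    // Called for every depth frame: just remember it.
    public void OnDepthFrame(byte[] depthBits)
    {
        latestDepth = depthBits;
    }

    // Called for every video frame: all the real work happens here,
    // using whichever depth frame was stored most recently.
    public bool OnVideoFrame(byte[] videoBits)
    {
        if (latestDepth == null) return false;  // no depth frame yet, skip this video frame
        PairsProcessed++;                        // placeholder for the mask processing
        return true;
    }
}
```

A video frame that arrives before any depth frame is simply skipped, which corresponds to the early return at the start of the video handler.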

A simple mask

The video stream event handler starts off simply enough. All we have to do is return if there isn't a depthimage to process:

void nui_ColorFrameReady(
 object sender,
 ImageFrameReadyEventArgs e)
{
 // No depth image stored yet, so there is nothing to build a mask from
 if (depthimage.Bits == null) return;

Next we need to retrieve the video Frame and Image:

ImageFrame VFrame=e.ImageFrame;
PlanarImage VImage = VFrame.Image;

In the first version of building a mask we will assume that the mapping between depth and video images is simple and just a matter of x -> 2x and y -> 2y. This will give you some idea of why this doesn't work.

First we need a flag that can be used for the player index:

byte player;

Next we scan the depth image row by row and extract the player index as explained in Part 3 - Getting started with Microsoft Kinect SDK - Player index.

for (int y = 0;
  y < depthimage.Height; y++)
{
 for (int x = 0;
   x < depthimage.Width; x++)
 {
  // The player index is the low three bits of the pixel's first byte
  player = (byte)(depthimage.Bits[
   indexOfPixelinBytes(x, y,
    depthimage.Width,
    depthimage.BytesPerPixel)] & 0x07);

Now we have the player index in the player variable. Notice that, to find the location in the byte array corresponding to the pixel at x,y, the function indexOfPixelinBytes has been used rather than the explicit calculation used in previous parts. This function is just:

int indexOfPixelinBytes(int x, int y,
  int width, int bpp)
{
 // Pixels are stored row by row, bpp bytes per pixel
 return (x + y * width) * bpp;
}

To convert the x,y coordinates into the location in byte array of the first of the two bytes associated with a pixel you would use:

indexOfPixelinBytes(x, y,
 depthimage.Width,
 depthimage.BytesPerPixel)
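
As a worked example of the index calculation, the sketch below repeats the same formula in a standalone class so that it can be tried without a Kinect attached. The class name PixelIndexDemo and the method name IndexOfPixelInBytes are hypothetical. The numbers assume the 320x240 depth image with two bytes per pixel described earlier.

```csharp
using System;

// Standalone copy of the article's index formula; PixelIndexDemo is a
// hypothetical name used only for this illustration.
public static class PixelIndexDemo
{
    // Byte offset of the first byte of pixel (x, y) in a row-by-row buffer
    // that is width pixels wide and uses bpp bytes per pixel.
    public static int IndexOfPixelInBytes(int x, int y, int width, int bpp)
    {
        return (x + y * width) * bpp;
    }
}
```

For a 320-pixel-wide image at two bytes per pixel, the pixel at (10, 2) starts at byte (10 + 2 * 320) * 2 = 1300, and the last pixel, (319, 239), starts at byte 153598 of a 153600-byte buffer.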

Now that we have the player index for the pixel at x,y we can convert it into either zero or all ones, i.e. 0xFF, depending on whether or not there is a player - any index other than zero - at the location:

if (player != 0) player = 0xFF;
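
The extraction and thresholding steps can be tried on synthetic data. The following is a minimal sketch, not the article's final code: PlayerMaskDemo and BuildMask are hypothetical names, the input is a plain byte array rather than a PlanarImage, and it assumes two bytes per depth pixel with the player index in the low three bits of the first byte, as in the loop above.

```csharp
using System;

// Hypothetical helper for illustration; not part of the Kinect SDK.
public static class PlayerMaskDemo
{
    // Builds one mask byte per depth pixel: 0xFF where a player is present,
    // 0 elsewhere. Assumes two bytes per pixel, with the player index in
    // the low three bits of the first byte.
    public static byte[] BuildMask(byte[] depthBits, int width, int height)
    {
        const int bpp = 2;
        var mask = new byte[width * height];
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                byte player = (byte)(depthBits[(x + y * width) * bpp] & 0x07);
                mask[x + y * width] = (player != 0) ? (byte)0xFF : (byte)0;
            }
        }
        return mask;
    }
}
```

For a 2x2 test buffer whose first bytes are 0x08, 0x0A, 0x01 and 0xF8, the low three bits are 0, 2, 1 and 0, so the mask comes out as 0, 0xFF, 0xFF, 0.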

Last Updated ( Monday, 06 February 2012 )