Kinect SDK 1 - Depth and Video Space
Written by Mike James   


Using the player index

First we need a flag that can be used for the player index and two variables to store the video co-ordinates:

byte player;
int vx, vy;

Next we scan the depth image row by row and extract the player index as explained in the previous chapters. As the depth data is stored in a short array, the data for the pixel at x,y is stored in element:

x + y * DFrame.Width

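The row-major layout can be checked with a couple of tiny helpers. This is only a sketch: `RowMajor`, `DepthIndex` and `DepthCoords` are hypothetical names, not part of the Kinect SDK, and the width of 320 used in the usage note is just an example value.

```csharp
using System;

// Hypothetical helpers (not SDK functions) for row-major pixel indexing.
static class RowMajor
{
    // Element of a width-wide, row-by-row array that holds pixel (x, y).
    public static int DepthIndex(int x, int y, int width)
    {
        return x + y * width;
    }

    // Inverse mapping: recover (x, y) from an array index.
    public static void DepthCoords(int index, int width, out int x, out int y)
    {
        x = index % width;
        y = index / width;
    }
}
```

For a 320-pixel-wide image, pixel (3,2) is element 3 + 2 * 320 = 643, and `DepthCoords(643, 320, ...)` gives back (3,2).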
All we have to do is to use two for loops to scan the array and compute the player index at each pixel:


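for (int y = 0; y < DFrame.Height; y++)
 for (int x = 0; x < DFrame.Width; x++)
  // the low three bits of each depth value hold the player index
  player = (byte)(depthimage[x + y * DFrame.Width] &
           DepthImageFrame.PlayerIndexBitmask);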

This is just the method used in the previous chapter applied to each pixel in turn.
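To try the scanning loop away from a sensor, the sketch below runs the same two loops over a synthetic short array. It assumes, as in SDK 1, that the player index sits in the low three bits of each depth value (with the depth in millimetres in the remaining upper bits). The constant `PlayerIndexBitmask = 7` is a local stand-in for `DepthImageFrame.PlayerIndexBitmask`, and `PlayerScan.Extract` is a hypothetical name.

```csharp
using System;

static class PlayerScan
{
    // Local stand-in for DepthImageFrame.PlayerIndexBitmask (low 3 bits).
    const int PlayerIndexBitmask = 7;

    // Returns the player index (0 = no player) for every depth pixel.
    public static byte[] Extract(short[] depthimage, int width, int height)
    {
        byte[] players = new byte[width * height];
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                players[x + y * width] =
                    (byte)(depthimage[x + y * width] & PlayerIndexBitmask);
        return players;
    }
}
```

A raw value for a depth of 2000 mm belonging to player 2 can be built as `(short)((2000 << 3) | 2)`; `Extract` returns 2 for that pixel and 0 for a pixel built as `(short)(1500 << 3)`.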

Now that we have the player index for the pixel at x,y we can convert it into either zero or all ones, i.e. 0xFF, to use as a byte mask, depending on whether or not there is a player (any index other than zero) at that location:

if (player != 0) player = 0xFF;

At this point we need to use player as a mask that will set all pixels that don't correspond to a player to zero. First we need to convert from depth space to video space:

vx = x*2;
vy = y*2;
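Writing vx = x*2 hard-codes the ratio between the two frame sizes. A slightly more general sketch derives the factor from the frame widths instead, assuming only that the video width is a whole-number multiple of the depth width and that the same factor applies vertically. `NaiveMap.DepthToVideo` is a hypothetical name and, like the code above, it is pure scaling, so it does nothing about the physical offset between the two cameras.

```csharp
using System;

static class NaiveMap
{
    // Naive depth-to-video mapping by pure scaling; ignores camera offset.
    public static void DepthToVideo(int x, int y, int depthWidth, int videoWidth,
                                    out int vx, out int vy)
    {
        if (videoWidth % depthWidth != 0)
            throw new ArgumentException("video width must be a multiple of depth width");
        int scale = videoWidth / depthWidth; // e.g. 2 when 320 maps to 640
        vx = x * scale;
        vy = y * scale;
    }
}
```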

Next we need to mask out the pixel using the player mask. Ignoring the details of retrieving the pixel values, this can be done with a simple logical AND:

pixel at vx,vy = pixel at vx,vy & player;

or more succinctly

pixel at vx,vy &= player;
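The effect of the 0x00/0xFF mask on a single colour byte is easy to check in isolation. This is a minimal sketch; `ByteMask`, `MaskFor` and `Apply` are hypothetical helper names.

```csharp
using System;

static class ByteMask
{
    // Any non-zero player index becomes 0xFF (keep); zero stays 0x00 (clear).
    public static byte MaskFor(byte player)
    {
        return player != 0 ? (byte)0xFF : (byte)0x00;
    }

    // ANDing with 0xFF leaves a byte unchanged; ANDing with 0x00 zeroes it.
    public static byte Apply(byte colour, byte mask)
    {
        return (byte)(colour & mask);
    }
}
```

So a colour byte of 0xAB survives unchanged where there is a player and becomes 0x00 where there is not.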

There is a small problem here in that the pixel in the depth image at x,y corresponds to four pixels in the video image:

(vx,vy) (vx+1,vy) (vx,vy+1) (vx+1,vy+1)

The reason for this is simple: the video image has twice the resolution of the depth image in each direction, so each pixel in the depth image corresponds to a 2x2 block of four pixels in the video image.
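The 2x2 block can be enumerated directly. This sketch uses a hypothetical helper that returns the video coordinates covered by one depth pixel for a general integer scale factor (2 in this case), in the same order as the list above.

```csharp
using System;
using System.Collections.Generic;

static class PixelBlock
{
    // All video pixels covered by depth pixel (x, y) when video = scale * depth.
    public static List<Tuple<int, int>> VideoPixels(int x, int y, int scale)
    {
        var result = new List<Tuple<int, int>>();
        for (int dy = 0; dy < scale; dy++)
            for (int dx = 0; dx < scale; dx++)
                result.Add(Tuple.Create(x * scale + dx, y * scale + dy));
        return result;
    }
}
```

For depth pixel (5,7) and a scale of 2 this gives (10,14), (11,14), (10,15) and (11,15).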

Also, each pixel corresponds to four bytes in the array and the pixels are stored in row order. Thus the two pixels at vx,vy and vx+1,vy are stored next to each other, i.e. as the eight bytes starting at vx,vy.
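The byte arithmetic can be made concrete with a one-line helper. Assuming four bytes per pixel, as in the text, the hypothetical `ByteOffset` below gives the offset of the first byte of a pixel in a row-ordered buffer.

```csharp
using System;

static class PixelOffsets
{
    // Offset of the first byte of pixel (vx, vy) in a row-ordered buffer.
    public static int ByteOffset(int vx, int vy, int width, int bytesPerPixel)
    {
        return (vx + vy * width) * bytesPerPixel;
    }
}
```

In a 640-pixel-wide frame, pixel (10,14) starts at byte (10 + 14 * 640) * 4 = 35880 and pixel (11,14) at 35884, so together they occupy bytes 35880 to 35887. Pixel (10,15) starts one full row (640 * 4 = 2560 bytes) later, which is why the row below has to be handled as a separate run of eight bytes.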

If you think about this for a moment it should be obvious that we need to process two sets of eight bytes, one starting at 2x,2y and the other at 2x,2y+1:

for (int k = 0; k < 8; k++)
{
 // row vy: pixels (vx,vy) and (vx+1,vy)
 pixeldata[(vx + vy * VFrame.Width) *
   VFrame.BytesPerPixel + k] &= player;
 // row vy+1: pixels (vx,vy+1) and (vx+1,vy+1)
 pixeldata[(vx + (vy + 1) * VFrame.Width) *
   VFrame.BytesPerPixel + k] &= player;
}

The first line ANDs the player mask with the eight bytes in row vy and the second does the same for the eight bytes in row vy+1.
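Putting the steps together, the sketch below masks a synthetic colour buffer with a half-resolution depth buffer. It is a self-contained stand-in for the Kinect version, not the article's exact code: the exact factor of 2 between the frames, the bytes per pixel and the low-three-bit player index are assumptions matching the text, and `PlayerMasker.MaskVideo` is a hypothetical name. Because it uses the same naive doubling, it inherits the same misalignment.

```csharp
using System;

static class PlayerMasker
{
    const int PlayerIndexBitmask = 7; // stand-in for DepthImageFrame.PlayerIndexBitmask

    // Zeroes every video byte whose (naively mapped) depth pixel has no player.
    // Assumes the video frame is exactly twice the depth frame in each direction.
    public static void MaskVideo(short[] depthimage, int dWidth, int dHeight,
                                 byte[] pixeldata, int bytesPerPixel)
    {
        int vWidth = dWidth * 2;
        for (int y = 0; y < dHeight; y++)
            for (int x = 0; x < dWidth; x++)
            {
                byte player = (byte)(depthimage[x + y * dWidth] & PlayerIndexBitmask);
                if (player != 0) player = 0xFF;
                int vx = x * 2, vy = y * 2;
                // Two adjacent pixels per row, in rows vy and vy+1.
                for (int k = 0; k < 2 * bytesPerPixel; k++)
                {
                    pixeldata[(vx + vy * vWidth) * bytesPerPixel + k] &= player;
                    pixeldata[(vx + (vy + 1) * vWidth) * bytesPerPixel + k] &= player;
                }
            }
    }
}
```

With a 2x1 depth frame where only the left pixel belongs to a player, the left 2x2 block of the 4x2 video frame keeps its colour and the right 2x2 block is zeroed.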

Finally, when the for loops are complete, we can display the result in a PictureBox:

 pictureBox1.Image = ByteToBitmap(pixeldata,
 VFrame.Width, VFrame.Height);

The ByteToBitmap function is a small modification to the similar functions used in earlier chapters:

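using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

Bitmap ByteToBitmap(Byte[] pixeldata, int w, int h)
{
 Bitmap bmap = new Bitmap(w, h, PixelFormat.Format32bppRgb);
 BitmapData bmapdata = bmap.LockBits(new Rectangle(0, 0, w, h),
   ImageLockMode.WriteOnly, bmap.PixelFormat);
 IntPtr ptr = bmapdata.Scan0;
 // copy the raw 4-bytes-per-pixel data straight into the bitmap's memory
 Marshal.Copy(pixeldata, 0, ptr, pixeldata.Length);
 bmap.UnlockBits(bmapdata);
 return bmap;
}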

If you try this out you will discover that it does work, sort of. The masked area of the video image does correspond to the shape of a player, but it is shifted to one side and the size of the shift changes as you move closer to or further from the sensor.

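What you are observing is a depth parallax effect, identical to the difference between the two views you get from two separated cameras used for 3D imaging. The Kinect's depth camera and video camera sit side by side rather than in the same place, so they see the scene from slightly different positions.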
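The standard pinhole stereo relation, which is not part of the original article, makes the distance dependence explicit. For two parallel cameras a baseline B apart, each with focal length f measured in pixels, a point at distance Z is displaced between the two images by approximately d pixels, where:

```latex
d \approx \frac{f\,B}{Z}
```

Halving the distance doubles the shift, which is why the misalignment grows as you step towards the sensor. The Kinect's actual baseline and focal lengths are not given here, and the real cameras also differ in field of view and lens distortion, which this simple model ignores.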

Clearly we need to make the connection between the pixels in the video and depth image in a more sophisticated way.