Intro to Kinect + Processing

Intro to the Kinect & Installation

The Microsoft Kinect sensor is a peripheral device designed for the Xbox and Windows PCs that functions much like a standard webcam and can be used as such. However, in addition to providing a standard RGB image, it also provides a depth map using its infrared sensor: for every pixel the Kinect detects, it also measures that pixel's distance from the sensor. In contrast to a standard webcam, this makes a variety of computer vision problems, such as background removal and blob detection, easier.

The Kinect sensor itself only measures color and depth. So the first step is to get this data into your computer and understand how to work with it. Once this is accomplished, a lot more can be done in an environment like Processing, e.g. skeleton tracking using external libraries.

Hardware

First you need a stand-alone Kinect, and a Mac or PC that meets the required specifications. You do not need to buy an Xbox.

  • Standalone Kinect Sensor v.1414. It comes with a power supply and a USB adapter + cables. This version of the Kinect works with USB 2.0 or greater.
  • Standalone Kinect Sensor v2. When you buy the stand-alone version, you need to make sure to also buy the Kinect Adapter to get the required power supply and USB adapter + cables. Note: your computer must support the USB 3.0 interface.

Kinect V1 specs:
The Kinect (v.1414) is a stereo camera (actually a triple camera, including the IR sensor) that has some pretty sophisticated firmware algorithms that can spit out a wide variety of depth and motion tracking data.
Hardware specs:
  • Array of 4 microphones supporting single-speaker voice recognition
  • Depth camera: 640×480 pixel resolution @ 30 FPS
  • Color VGA motion camera: 640×480 pixel resolution @ 30 FPS

Kinect V2 specs:
Functions in the same way as the Kinect V1 but has enhanced features and capabilities, including:

  • Improved body tracking
  • Higher depth fidelity
  • 1080p HD video
  • Unity Pro support
  • New active infrared capabilities

Software

If you are working on a Mac, you will need to download and install specific drivers (dependent on the Kinect version you are using) in order for the machine to even recognize the Kinect. Let's start with the easier option 🙂 – working on Windows with the Kinect v2:

  • If using the Kinect V2: download and install the native SDK.
  • Install Processing 3+ on the machine and download one of the following external libraries for Processing: Thomas Lengeling's Windows-only Kinect v2 Processing library or Daniel Shiffman's OpenKinect library.

To get the Kinect working on a Mac (regardless of the version), you should follow the first part of this tutorial.

  • Install XCode + command line tools
  • Install MacPorts
  • Install CMake (needed for compiling the next set of tools)
  • Install Libtool and Libusb
  • Now: if you are using the Kinect v1414, download libfreenect. If you are using the Kinect v2, download libfreenect2.

At this point, you can use the tools that came with the installation process above to test that the Mac recognizes the Kinect. Depending on what your requirements are, and on which Kinect you are using, the setup with Processing will differ:

For Kinect V1 + SimpleOpenNi:
You will need to continue with the tutorial above and install:

  • OpenNi
  • SensorKinect
  • PrimeSenseNite
  • Processing v2 (SimpleOpenNi is not supported in Processing 3)
  • SimpleOpenNi external library

Phew! Please note that the Computation Lab has two Mac minis available with all of this software installed and ready to go. So, if you would rather not go through the installation process yourself, please come and speak to the Computation Lab Coordinator.

For Kinect V1 + Open Kinect:
If you want to use the Kinect without the skeleton tracking and gesture recognition capabilities, you may just want to use Daniel Shiffman's OpenKinect library.
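For reference, a minimal sketch of what that looks like, assuming the Open Kinect for Processing API (the Kinect class with initDepth() and getDepthImage(); check the library's bundled examples for the exact, current method names):

// minimal Open Kinect for Processing sketch (assumed API - verify against the library's examples)
import org.openkinect.processing.*;

Kinect kinect;

void setup() {
  size(640, 480);
  kinect = new Kinect(this);
  // ask for the depth stream from a Kinect v1
  kinect.initDepth();
}

void draw() {
  // draw the depth image
  image(kinect.getDepthImage(), 0, 0);
}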

For Kinect V2 + Open Kinect:
Daniel Shiffman's OpenKinect library also works with the Kinect v2 on the Mac. However, SimpleOpenNi unfortunately does not, so if you are looking for access to more advanced capabilities (i.e. skeleton tracking), it is suggested that you work on a Windows PC with the Kinect V2.

Setting up the first Processing Sketch

For the purposes of this tutorial we are using the Kinect v1414, the SimpleOpenNi library, Processing v2 and a Mac.

Alright, let's open Processing and build something!

Set up and run our first sketch

First, you want to import the Kinect library and create a reference object for your Kinect:

//Import libraries:: -> sketch ->Import Library
import SimpleOpenNI.*;

Then, make a variable to hold the SimpleOpenNI object (to be able to access data from the Kinect):

SimpleOpenNI kinect;

Declare two PImage variables to hold and display the pixel data from the Kinect, and define void setup() and void draw():

PImage depthCam;
PImage colorCam;
void setup()
{
    // our setup code
}
 
void draw()
{
   // our looping code
}

Now, in your setup(), you need to declare the size of your sketch. The Kinect v1 outputs a 640×480 image from both the RGB and depth cameras. For the purposes of this tutorial we will display both the depth and RGB images side by side, so we need to declare a size of 640*2 by 480.

void setup()
{
 
  size(640*2, 480);
  // set a background color
  background(0);
}

Finally, in setup(), we need to make an instance of the SimpleOpenNI object and invoke the appropriate methods to allow access to both the RGB and depth cameras.

void setup()
{
 
  size(640*2, 480);
  // set a background color
  background(0);
  // instantiate the SimpleOpenNI object
  // parameter: the current context
  kinect  = new SimpleOpenNI(this);
  //put this in setup so that we can tell the lib in advance what type of data we want
 // invoke the method from the lib to allow access to the depth camera
 kinect.enableDepth();
 //invoke the method from the lib to allow access to the color camera
 kinect.enableRGB();
}

Ok – we have access to the cameras from the Kinect, so now we want to retrieve the data from the Kinect and display the output in the PImage containers that we previously defined.

void draw()
{
  //reset the background
  background(0);
  // get the next frame from the kinect
  kinect.update();
  // get the depth image and assign to the PImage var  (using the lib)
  depthCam = kinect.depthImage();
  // get the rgb image and assign to the PImage var (using the lib)
  colorCam = kinect.rgbImage();
 
  //display depth image
  image(depthCam, 0, 0);
  //display color image
  image(colorCam, 640, 0);
}

Ok – we can now run the sketch. Note: you may need to restart the sketch a couple of times …

Notes regarding this example

  • A single image is made up of pixels (the smallest unit) arranged in a grid
  • The grid has a specified number of columns (the width) and rows (the height)
  • Each pixel is a single solid unit of color (Processing uses 8 bits per channel), and its value depends on the type of image:
    • in a black and white image -> pixels are greyscale values from 0 to 255
    • in a color image -> each pixel has red, green and blue components, and each component ranges from black / no intensity (0) to full intensity for that color (255)
  • The RGB camera output consists of a grid of pixels, where each pixel has an r,g and b value
  • The depth camera output consists of a grid of pixels, where each pixel has a greyscale value. However, these values do not just represent a particular shade of grey; they also encode the distance of that pixel from the Kinect -> where 0 is very far away and 255 is very close.
  • Constraints associated with the resultant depth image:
    • min range: the depth camera has a minimum range of about 20 inches. Any closer than that and the Kinect cannot accurately calculate distances: it will give a depth value of 0
    • noise at the edges: if you look at the moving depth image, you may see splotches (holes of black) appearing and disappearing at object edges (which should be grey). The Kinect can only calculate depth where the dots from its infrared projector are reflected back to it, and the edges of objects may deflect some of those dots away at odd angles (so the Kinect cannot pick them up, as they are not sent back). Smoothing algorithms are needed
    • reflections (i.e. from mirrors or windows) can cause distortions
    • Occlusion problems: the Kinect cannot see through or around objects, so there will always be parts of the scene that are occluded or blocked from view. Therefore, there will be parts of the scene for which we have no depth data (and these will show up as black holes). Which parts of the scene are occluded is determined by the position and angle of the Kinect relative to the objects in the scene.
    • Misalignment between depth and color camera: the two cameras do not capture the frame from the exact same point of view. The two cameras on the Kinect are separated from each other by a couple of inches, so their points of view are slightly different (like the difference between your two eyes). So, if you need both, you will need to align the images (i.e. for overlaying); see the small sketch after this list.
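As a preview of the fix used at the very end of this tutorial, SimpleOpenNI can re-project the depth image so that it lines up with the RGB image with a single call. A minimal sketch (the full working example appears in the final section):

// minimal alignment sketch - see the final example for the full version
import SimpleOpenNI.*;
SimpleOpenNI kinect;

void setup()
{
  size(640, 480);
  kinect = new SimpleOpenNI(this);
  kinect.enableDepth();
  kinect.enableRGB();
  // re-project the depth image to line up with the RGB image
  kinect.alternativeViewPointDepthToImage();
}

void draw()
{
  kinect.update();
  // the depth and RGB images now share (approximately) the same point of view
  image(kinect.rgbImage(), 0, 0);
}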

Analyzing the pixels in the Image from the RGB camera

This next example will demonstrate how we can access a single pixel in the color image (using the mousePressed callback) – as well as extracting the color values for that pixel.

//1: import libraries:: -> sketch ->Import Library
import SimpleOpenNI.*;
//2: make a variable to hold the SimpleOpenNI object (to be able to access data from the kinect)
SimpleOpenNI kinect;
 
//3: declare PImage variable to hold and display the pixel data from the kinect
 
PImage colorCam;
 
//3:: setup
void setup()
{
  // make the sketch the same size as the Kinect image
  size(640, 480);
  // set a background color
  background(0);
  // instantiate the SimpleOpenNI object
  // parameter: the current context
  kinect  = new SimpleOpenNI(this);
  //put this in setup so that we can tell the lib in advance what type of data we want
 
  //invoke the method from the lib to allow access to the color camera
  kinect.enableRGB();
}
 
//4: our drawing loop
void draw()
{
  //reset the background
  background(0);
  // get the next frame from the kinect
  kinect.update();
  // get the rgb image and assign to the PImage var (using the lib)
  colorCam = kinect.rgbImage();
  //display color image
  image(colorCam, 0, 0);
}
 
// lets access a pixel in the image -> based on where we press the mouse ... 
void mousePressed()
{
  // in order to access the pixels: Processing has a helpful structure: the pixels array::
  // every PImage object has its own pixels array, which gets filled when we instantiate it with image data
  // the pixels array is 1 dimensional -> 640*480 = an array with 307,200 elements indexed from 0-307,199
  // so to access the top left pixel: colorCam.pixels[0], and to access the bottom right: colorCam.pixels[307199]
 
  // if we have mouseX and mouseY we can calculate the location in the 1D array by loc =x+y*width;
  int loc = mouseX+(mouseY*colorCam.width);
  //lets get the pixel:
  color currentLocColor = colorCam.pixels[loc];
  // extract the single color components of the pixel using the inbuilt methods from Processing
  println("pixel location in array:: "+ loc);
  println("r:: "+red(currentLocColor)+ " g:: "+ green(currentLocColor)+ " b:: "+ blue(currentLocColor));
 
}

The only real difference in this example is that we are now ONLY capturing the RGB image (@ 30 FPS) and we have added the code to access the pixel (wherever the mouse is Pressed) using the inbuilt pixels array.

When you click around the resulting image, you extract the r, g, b values associated with that pixel. However, are these color values what we think they should be? No: what we think of as green or blue or any other color is relative to the human eye, whereas the computer may look at a green object and find that its green value is not 255 (what we think of as pure green), and that it has higher red and blue values than we expect.

  • Brightness :: in a color image this is the sum total of the r, g, b values. If all are 255 the resultant color is white (lightest), and if all are 0 then we have black (darkest). (See the short sketch after this list.)
  • Color :: in a color image this comes from there being a difference between the individual color components. The more equal the components are, the closer we are to a shade of grey.
  • Given the relationship between brightness and color, it is very difficult to analyze the color of something in darkness / low light (i.e. a red object in darkness will have a very low r value).
  • The digital image also tends to have higher reds and blues than what the human eye perceives, and these color shifts will also have an impact when we try to analyze the image.
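A small sketch of these two ideas (an illustrative variation, not part of the original example, reusing the colorCam image and the mousePressed() access from above):

void mousePressed()
{
  int loc = mouseX + (mouseY * colorCam.width);
  color c = colorCam.pixels[loc];
  float r = red(c);
  float g = green(c);
  float b = blue(c);
  // brightness: roughly the sum (here, the average) of the components
  float bright = (r + g + b) / 3.0;
  // "colorfulness": how far apart the components are
  // (0 means the pixel is a pure shade of grey)
  float spread = max(r, g, b) - min(r, g, b);
  println("brightness:: " + bright + "  spread:: " + spread);
}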

In short, conventional color images are ill-suited to this kind of processing, i.e. determining the "true" color of an object as seen by the human eye and interpreted by the mind; heuristics and sampling would be needed. Furthermore, if we wanted to determine distances, working out where an object begins and ends will not be solved easily using conventional color images, so we need an alternative, more efficient solution … ENTER the depth image…

Analyzing the pixels in the Image from the Depth camera

This next example will demonstrate how we can access a single pixel in the depth image (using the mousePressed callback), as well as how to extract the associated value for that pixel.

//1: import libraries:: -> sketch ->Import Library
import SimpleOpenNI.*;
//2: make a variable to hold the SimpleOpenNI object (to be able to access data from the kinect)
SimpleOpenNI kinect;
 
//3: declare PImage variable to hold and display the pixel data from the kinect
 
PImage depthCam;
 
//3:: setup
void setup()
{
  // make the sketch the same size as the Kinect image
  size(640, 480);
  // set a background color
  background(0);
  // instantiate the SimpleOpenNI object
  // parameter: the current context
  kinect  = new SimpleOpenNI(this);
  //put this in setup so that we can tell the lib in advance what type of data we want
 
  //invoke the method from the lib to allow access to the depth camera
  kinect.enableDepth();
}
 
//4: our drawing loop
void draw()
{
  //reset the background
  background(0);
  // get the next frame from the kinect
  kinect.update();
  // get the depth image and assign to the PImage var (using the lib)
  depthCam = kinect.depthImage();
  //display the depth image
  image(depthCam, 0, 0);
}
 
// lets access a pixel in the image -> based on where we press the mouse ... 
void mousePressed()
{
  // in order to access the pixels: Processing has a helpful structure: the pixels array::
  // every PImage object has its own pixels array, which gets filled when we instantiate it with image data
  // the pixels array is 1 dimensional -> 640*480 = an array with 307,200 elements indexed from 0-307,199
  // so to access the top left pixel: depthCam.pixels[0], and to access the bottom right: depthCam.pixels[307199]
 
  // if we have mouseX and mouseY we can calculate the location in the 1D array by loc =x+y*width;
  int loc = mouseX+(mouseY*depthCam.width);
  //lets get the pixel:
  color currentLocColor = depthCam.pixels[loc];
  // extract the single color components of the pixel using the inbuilt methods from Processing
  println("pixel location in array:: "+ loc);
  println("r:: "+red(currentLocColor)+ " g:: "+ green(currentLocColor)+ " b:: "+ blue(currentLocColor));
 
}

What do we notice?
In contrast to the color image, all component values are the same for a given pixel.
Since the depth image is greyscale, its pixels don't have color (the difference between the individual components is 0).

But a grayscale pixel DOES have a brightness value:

Note how, when clicking on the color image, values may not change much depending on the distance of the object from the Kinect or from another object; but with the depth image, the values do change depending on how far the object (or part of an object) is from the Kinect (255 == very close and 0 == far away) -> we can determine spatial measurements 🙂
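Since every depth pixel is a shade of grey, you can also read that single value with Processing's brightness() function instead of printing r, g and b separately. A minimal variation on the mousePressed() above:

void mousePressed()
{
  int loc = mouseX + (mouseY * depthCam.width);
  color c = depthCam.pixels[loc];
  // for a greyscale pixel, red(), green(), blue() and brightness() all return the same value
  println("brightness:: " + brightness(c));
}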

We will see now that we can achieve some quite complex and effective applications by just considering the brightness values given to us from the depth image.

Using the Raw Depth Values

The depth image gives us the brightness of each pixel, which we know is indicative of the distance of that pixel from the Kinect (255 == very close -> 0 == far).

But HOW far away is a pixel with a brightness value of 90, 112, or 255?
And HOW can we map the brightness values to actual distance measurements?
Is it the case that pixels with a value of 255 correspond to 20 inches and pixels with a value of 0 correspond to 25 feet (the max range)? For example: if a pixel has the value 96, and 96/255 equals .37, is this pixel 37% of the way between 20 inches and 25 feet? Not really.

The relationship between a brightness value and the actual distance it represents is a little more complicated than a simple ratio. The Kinect's depth readings have mm precision, and they need to report not just a number ranging from 0 to a few hundred inches, but a number from 0 to approx 8000 mm, which is a much larger range than can fit in the bit depth range (0-255) of the depth image.

In actual fact, the Kinect captures the raw depth values at a HIGHER resolution: 11 bits (0-2047).
Recall that all images in Processing have 8 bits per channel (the human eye would not really see the higher-resolution differences), so when rendering the depth image the raw values are downsampled.

If we want to access the data for computation, then the more accurate values are more helpful.
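To illustrate the downsampling (this is only a sketch of the concept, not the exact formula the library uses):

// hypothetical illustration: squeezing an 11-bit raw reading (0-2047) into the 8-bit display range (0-255)
int rawDepth = 1024;
int displayed = (int) map(rawDepth, 0, 2047, 0, 255);
println(displayed);   // about 127 - roughly 8 raw steps collapse into every single brightness step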

The following example will have the same functionality as before – but use the raw values:

//1: import libraries:: -> sketch ->Import Library
import SimpleOpenNI.*;
//2: make a variable to hold the SimpleOpenNI object (to be able to access data from the kinect)
SimpleOpenNI kinect;
 
//3: declare PImage variable to hold and display the pixel data from the kinect
 
PImage depthCam;
 
//3:: setup
void setup()
{
 
  // make the sketch the same size as the Kinect image
  size(640, 480);
  // set a background color
  background(0);
  // instantiate the SimpleOpenNI object
  // parameter: the current context
  kinect  = new SimpleOpenNI(this);
  //put this in setup so that we can tell the lib in advance what type of data we want
 
  //invoke the method from the lib to allow access to the depth camera
  kinect.enableDepth();
}
 
//4: our drawing loop
void draw()
{
  //reset the background
  background(0);
  // get the next frame from the kinect
  kinect.update();
  // get the depth image and assign to the PImage var (using the lib)
  depthCam = kinect.depthImage();
  //display the depth image
  image(depthCam, 0, 0);
}
 
// lets access a pixel in the image -> based on where we press the mouse ... 
void mousePressed()
{
  //NEW:: let us access the higher resolution, raw depth values from the kinect using the method from
  //the library:: the size of this array is again 640*480 elements, but instead of having values from 0 to 255
  // we have values from 0-2047.
 int [] depthValues = kinect.depthMap();
  // if we have mouseX and mouseY we can calculate the location in the 1D array by loc =x+y*width;
  int loc = mouseX+(mouseY*depthCam.width);
  //lets get the pixel:
  color currentLocColor = depthCam.pixels[loc];
  // extract the single color components of the pixel using the inbuilt methods from Processing
  println("pixel location in array from depth image:: "+ loc);
  println("r:: "+red(currentLocColor)+ " g:: "+ green(currentLocColor)+ " b:: "+ blue(currentLocColor));
 // in contrast:: lets print out the value from the depth MAP... NOTE the inversion --> 0 is close, 8000 is far
  // why are the numbers larger than 0-2047? -> because we are receiving the raw depth values converted into mm
  //(converted directly by the SimpleOpenNI library)...
  println("value from depth MAP:: "+ depthValues[loc]);
  //to convert to inches /25.4
  println("inches from kinect:: "+ depthValues[loc]/25.4);
 
}

We can see from the data given by the depth map that we now get different values: around 450 for the brightest and around 8000 for the darkest (note the INVERSION): smaller values are closer and larger values are further away.

Note:: The SimpleOpenNI library will automatically (using a standard formula) give you the raw value in mm (converted from the 0-2047 range).
As seen in the example, we can then take this value and convert it to inches.
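For convenience, the conversion can be wrapped in small helper functions (hypothetical helpers, not part of the library; 1 inch = 25.4 mm and 1 foot = 304.8 mm):

// hypothetical helpers for converting the mm values returned by kinect.depthMap()
float mmToInches(float mm) {
  return mm / 25.4;
}

float mmToFeet(float mm) {
  return mm / 304.8;
}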

Tracking the closest pixel

This is a very general algorithm that can be refined and altered according to the context.
The goal: we want to provide a simple and effective way to track user movement, and give the user the feeling that they are controlling something on the screen.

Solution:: track the point closest to the Kinect.
Imagine someone standing in front of the Kinect with nothing between them and the Kinect. If they extend their arm, their hand/finger will be the closest point, and if we track this point, the user can feel as if they are controlling something on the screen.

The algorithm:

  • Go through each value in the depth map and determine which one has the smallest value
  • Compare the current value to the current minimum value; if the current value is less than the current minimum, we have a new minimum, which will be used to compare against the next value
  • Repeat until we have gone through the entire map.
  • At the end, the current minimum is the closest point

The following example implements this algorithm and displays an ellipse where the closest pixel resides.
Note: you also need to ensure that we do not take into account values that are too close (as they have a reading of 0).

//1: import libraries:: -> sketch ->Import Library
import SimpleOpenNI.*;
//2: make a variable to hold the SimpleOpenNI object (to be able to access data from the kinect)
SimpleOpenNI kinect;
 
//3: declare PImage variable to hold and display the pixel data from the kinect
 
PImage depthCam;
  //variables to store closest x and closest y;
  int closestX=0;
  int closestY=0;
 
 
//3:: setup
void setup()
{
 
  // make the sketch the same size as the Kinect image
  size(640, 480);
  // set a background color
  background(0);
  // instantiate the SimpleOpenNI object
  // parameter: the current context
  kinect  = new SimpleOpenNI(this);
  //put this in setup so that we can tell the lib in advance what type of data we want
 
  //invoke the method from the lib to allow access to the depth camera
  kinect.enableDepth();
}
 
//4: our drawing loop
void draw()
{
  //reset the background
  background(0);
  // get the next frame from the kinect
  kinect.update();
  // get the depth image and assign to the PImage var (using the lib)
  depthCam = kinect.depthImage();
 
  // get the depthMap (mm) values
  int[] depthVals = kinect.depthMap();
  //set the current min to the highest possible val...
  int currentMin = 8000;
 
    //go through the matrix - for each row go through every column
  for (int y=0; y<depthCam.height; y++)
  {
    //go through each col
    for (int x =0; x<depthCam.width; x++)
    {
      // get the location in the depthVals array
      int loc = x+(y*depthCam.width);
      //check if currentVal is less than current min AND ensure that the depthVal is>0
      // remember, if we are TOO close then the depth reading will be 0
      if (depthVals[loc]>0 &&  depthVals[loc] < currentMin)
      {
        //if the condition is true then assign current val to current min
        currentMin = depthVals[loc];
        // and because this is the current min we need to assign the current x and y
        //to the closestX and closestY
        closestX =x;
        closestY= y;
      }
    }
  }
  //display the depth image
  image(depthCam, 0, 0);
  //lets display the closest tracked point
  fill(255,0,0);
  ellipse(closestX,closestY,10,10);
}

This example is not bad; however, we can see that the ellipse is a bit jumpy: it is too sensitive.

Tracking the closest pixel with smoothing (using averages)

The following example implements a basic smoothing technique whereby we let the closest sample be the average of the last n samples. Specifically, we have an array to hold 3 values, and each time we store the closestX and closestY in the array (at the next index); the resultant closestX and closestY (used for the ellipse) are then based on the average of the 3 values in the array.
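In isolation, the smoothing idea looks like this (hypothetical helper functions, just to show the pattern; the full sketch below keeps the same logic inline inside draw()):

// hypothetical helpers illustrating "average of the last 3 samples"
int[] recentXValues = new int[3];
int currentIndex = 0;

// store the newest sample and advance the wrapping index
void addSample(int newX) {
  recentXValues[currentIndex] = newX;
  currentIndex++;
  if (currentIndex > 2) {
    currentIndex = 0;
  }
}

// the smoothed value is the average of the last 3 samples
int smoothedX() {
  return (recentXValues[0] + recentXValues[1] + recentXValues[2]) / 3;
}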

//1: import libraries:: -> sketch ->Import Library
import SimpleOpenNI.*;
//2: make a variable to hold the SimpleOpenNI object (to be able to access data from the kinect)
SimpleOpenNI kinect;
 
//3: declare PImage variable to hold and display the pixel data from the kinect
 
PImage depthCam;
 
  //variables to store closest x and closest y;
  int closestX=0;
  int closestY=0;
 
int[] recentXValues = new int[3];
int[] recentYValues = new int[3];
int currentIndex =0;
 
//3:: setup
void setup()
{
 
  // make the sketch the same size as the Kinect image
  size(640, 480);
  // set a background color
  background(0);
  // instantiate the SimpleOpenNI object
  // parameter: the current context
  kinect  = new SimpleOpenNI(this);
  //put this in setup so that we can tell the lib in advance what type of data we want
 
  //invoke the method from the lib to allow access to the depth camera
  kinect.enableDepth();
}
 
//4: our drawing loop
void draw()
{
  //reset the background
  background(0);
  // get the next frame from the kinect
  kinect.update();
  // get the depth image and assign to the PImage var (using the lib)
  depthCam = kinect.depthImage();
 
  // get the depthMap (mm) values
  int[] depthVals = kinect.depthMap();
  //set the current min to the highest possible val...
  int currentMin = 8000;
 
    //go through the matrix - for each row go through every column
  for (int y=0; y<depthCam.height; y++)
  {
    //go through each col
    for (int x =0; x<depthCam.width; x++)
    {
      // get the location in the depthVals array
      int loc = x+(y*depthCam.width);
      //check if currentVal is less than current min AND ensure that the depthVal is>0
      // remember, if we are TOO close then the depth reading will be 0
      if (depthVals[loc]>0 &&  depthVals[loc] < currentMin)
      {
        //if the condition is true then assign current val to current min
        currentMin = depthVals[loc];
        // and because this is the current min we need to assign the current x and y
        //to the closestX and closestY
        //closestX =x;
        //closestY= y;
        recentXValues[currentIndex] =x;
        recentYValues[currentIndex] =y;
 
      }
    }
  }
  currentIndex++;
  if(currentIndex>2){currentIndex=0;}
 
  //make closestX and closestY the average of the last 3 samples
  closestX = (recentXValues[0]+recentXValues[1]+recentXValues[2])/3;
  closestY = (recentYValues[0]+recentYValues[1]+recentYValues[2])/3;
  //display depth image
  image(depthCam, 0, 0);
  //lets display the closest tracked point
  fill(255,0,0);
  ellipse(closestX,closestY,10,10);
}

Tracking the closest pixel with smoothing (using linear interpolation)

Another smoothing technique: instead of jumping from one value to the next in every frame, we do simple linear interpolation; Processing has a function for that 🙂
The idea is that in each frame we have a variable which holds the interpolated value between the previous value and the current value (the target), i.e. we go a third of the way; we then update the previous value to be the current interpolated value, and use the current interpolated value to draw the ellipse.
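The core of the pattern in isolation (Processing's lerp(start, stop, amt) returns the value amt of the way from start to stop; previousX and closestX are the variables used in the sketch below):

// move a third of the way from the previous position towards the new target
float interpolatedX = lerp(previousX, closestX, 0.3);
// the interpolated value becomes the starting point for the next frame
previousX = interpolatedX;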

//1: import libraries:: -> sketch ->Import Library
import SimpleOpenNI.*;
//2: make a variable to hold the SimpleOpenNI object (to be able to access data from the kinect)
SimpleOpenNI kinect;
 
//3: declare PImage variable to hold and display the pixel data from the kinect
 
PImage depthCam;
 
  //variables to store closest x and closest y;
  float closestX=0;
  float closestY=0;
  float previousX;
  float previousY;
 
//3:: setup
void setup()
{
 
  // make the sketch the same size as the Kinect image
  size(640, 480);
  // set a background color
  background(0);
  // instantiate the SimpleOpenNI object
  // parameter: the current context
  kinect  = new SimpleOpenNI(this);
  //put this in setup so that we can tell the lib in advance what type of data we want
 
  //invoke the method from the lib to allow access to the depth camera
  kinect.enableDepth();
}
 
//4: our drawing loop
void draw()
{
  //reset the background
  background(0);
  // get the next frame from the kinect
  kinect.update();
  // get the depth image and assign to the PImage var (using the lib)
  depthCam = kinect.depthImage();
 
  // get the depthMap (mm) values
  int[] depthVals = kinect.depthMap();
  //set the current min to the highest possible val...
  int currentMin = 8000;
 
    //go through the matrix - for each row go through every column
  for (int y=0; y<depthCam.height; y++)
  {
    //go through each col
    for (int x =0; x<depthCam.width; x++)
    {
      // get the location in the depthVals array
      int loc = x+(y*depthCam.width);
      //check if currentVal is less than current min AND ensure that the depthVal is>0
      // remember, if we are TOO close then the depth reading will be 0
      if (depthVals[loc]>0 &&  depthVals[loc] < currentMin)
      {
        //if the condition is true then assign current val to current min
        currentMin = depthVals[loc];
        // and because this is the current min we need to assign the current x and y
        //to the closestX and closestY
        closestX =x;
        closestY= y;
 
 
      }
    }
  }
 // "linear interpolation", i.e.
  // smooth transition between last point
  // and new closest point
  // a third of the way between previous and closest
  float interpolatedX = lerp(previousX, closestX, 0.3f); 
  float interpolatedY = lerp(previousY, closestY, 0.3f);
  // update the previous vals
  previousX =interpolatedX;
  previousY=interpolatedY;
 
 
  //display the depth image
  image(depthCam, 0, 0);
  //lets display the closest tracked point
  fill(255,0,0);
  ellipse(interpolatedX,interpolatedY,10,10);
}

Tracking the closest pixel with smoothing and thresholding

We can further refine the algorithm to check only pixels within a certain threshold. This restricts the search to pixels that are within a specified distance range…

//1: import libraries:: -> sketch ->Import Library
import SimpleOpenNI.*;
//2: make a variable to hold the SimpleOpenNI object (to be able to access data from the kinect)
SimpleOpenNI kinect;
 
//3: declare PImage variable to hold and display the pixel data from the kinect
 
PImage depthCam;
 
//variables to store closest x and closest y;
float closestX=0;
float closestY=0;
float previousX;
float previousY;
//3:: setup
void setup()
{
 
  // make the sketch the same size as the Kinect image
  size(640, 480);
  // set a background color
  background(0);
  // instantiate the SimpleOpenNI object
  // parameter: the current context
  kinect  = new SimpleOpenNI(this);
  //put this in setup so that we can tell the lib in advance what type of data we want
 
  //invoke the method from the lib to allow access to the depth camera
  kinect.enableDepth();
}
 
//4: our drawing loop
void draw()
{
  //reset the background
  background(0);
  // get the next frame from the kinect
  kinect.update();
  // get the depth image and assign to the PImage var (using the lib)
  depthCam = kinect.depthImage();
 
  // get the depthMap (mm) values
  int[] depthVals = kinect.depthMap();
  //set the current min to the highest possible val...
  int currentMin = 8000;
 
  //go through the matrix - for each row go through every column
  for (int y=0; y<depthCam.height; y++)
  {
    //go through each col
    for (int x =0; x<depthCam.width; x++)
    {
      // get the location in the depthVals array
      int loc = x+(y*depthCam.width);
 
      // if that pixel is the closest one we've seen so far (min)
      // extension::****
      // only look for the closestValue within a range
      // 610 (or 2 feet) is the minimum
      // 1525 (or 5 feet) is the maximum
      if (depthVals[loc] > 610 && depthVals[loc]< 1525 && depthVals[loc] < currentMin)
      {
        //if the condition is true then assign current val to current min
        currentMin = depthVals[loc];
        // and because this is the current min we need to assign the current x and y
        //to the closestX and closestY
        closestX =x;
        closestY= y;
      }
    }
  }
  // "linear interpolation", i.e.
  // smooth transition between last point
  // and new closest point
  // a third of the way between previous and closest
  float interpolatedX = lerp(previousX, closestX, 0.3f); 
  float interpolatedY = lerp(previousY, closestY, 0.3f);
 
  // update the previous vals
  previousX = interpolatedX;
  previousY= interpolatedY;
 
  //display the depth image
  image(depthCam, 0, 0);
  //lets display the closest tracked point
  fill(255, 0, 0);
  ellipse(interpolatedX, interpolatedY, 10, 10);
}

A drawing tool

We can implement a very simple (albeit rough) drawing tool using just the closest-pixel algorithm. In essence, we use the previous example with some additional features: we track the closest pixel and then draw a line between the previous closest pixel and the current one.
We then update the previous position variables AFTER we draw the line. For this example, we will display both the depth image and the line being drawn.

//1: import libraries:: -> sketch ->Import Library
import SimpleOpenNI.*;
//2: make a variable to hold the SimpleOpenNI object (to be able to access data from the kinect)
SimpleOpenNI kinect;
 
//3: declare PImage variable to hold and display the pixel data from the kinect
 
PImage depthCam;
 
//variables to store closest x and closest y;
float closestX=0;
float closestY=0;
float previousX;
float previousY;
//3:: setup
void setup()
{
 
  // make the sketch twice the width of the Kinect image (drawing on the left, depth image on the right)
  size(640*2, 480);
  // set a background color
  background(0);
  // instantiate the SimpleOpenNI object
  // parameter: the current context
  kinect  = new SimpleOpenNI(this);
  //put this in setup so that we can tell the lib in advance what type of data we want
 
  //invoke the method from the lib to allow access to the depth camera
  kinect.enableDepth();
}
 
//4: our drawing loop
void draw()
{
  // do NOT reset the background every frame - we want the drawing to persist
  //background(0);
  // get the next frame from the kinect
  kinect.update();
  // get the depth image and assign to the PImage var (using the lib)
  depthCam = kinect.depthImage();
 
  // get the depthMap (mm) values
  int[] depthVals = kinect.depthMap();
  //set the current min to the highest possible val...
  int currentMin = 8000;
 
  //go through the matrix - for each row go through every column
  for (int y=0; y<depthCam.height; y++)
  {
    //go through each col
    for (int x =0; x<depthCam.width; x++)
    {
      int loc = x + (y * depthCam.width);
 
      // if that pixel is the closest one we've seen so far (min)
      // extension::****
      // only look for the closestValue within a range
      // 610 (or 2 feet) is the minimum
      // 1525 (or 5 feet) is the maximum
      if (depthVals[loc] > 610 && depthVals[loc]< 1525 && depthVals[loc] < currentMin)
      {
        //if the condition is true then assign current val to current min
        currentMin = depthVals[loc];
        // and because this is the current min we need to assign the current x and y
        //to the closestX and closestY
        closestX =x;
        closestY= y;
      }
    }
  }
  // "linear interpolation", i.e.
  // smooth transition between last point
  // and new closest point
  // a third of the way between previous and closest
  float interpolatedX = lerp(previousX, closestX, 0.3f); 
  float interpolatedY = lerp(previousY, closestY, 0.3f);
 
 
  // no stroke for the reference rectangle and ellipse
  noStroke();
 
  //display depth image with ellipse for ref
  fill(0);
  rect(640, 0, 640, 480);
  image(depthCam, 640, 0);
  fill(255, 0, 0);
  ellipse(interpolatedX+640, interpolatedY, 5, 5);
 
 
 
  //lets display the closest tracked point
  noFill();
  strokeWeight(3);
  stroke(255, 0, 0);
  line(previousX, previousY, interpolatedX, interpolatedY);
  // update the previous vals after drawing line
  previousX = interpolatedX;
  previousY= interpolatedY;
}
void keyPressed()
{
  if (key == ' ')
  {
    background(0);
  }
}

A drawing tool with mirroring

Note, with all the examples so far, that if you move your hand right, the line moves left and vice versa. We need to implement a simple mirroring technique whereby the x value is reversed (if the x value is on the left side of the screen, move it to the equivalent position on the right side, and vice versa). This is done by simply taking the current x in the for loop and subtracting it from the width. We then use this reversed value to find the correct location in the depth map array.
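The reversal itself is just an index calculation; here it is as a two-line excerpt (using the variables from the sketch below):

// mirror: column x of the sketch corresponds to column (width - 1 - x) of the depth map
int reversedX = depthCam.width - x - 1;
int loc = reversedX + (y * depthCam.width);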

//1: import libraries:: -> sketch ->Import Library
import SimpleOpenNI.*;
//2: make a variable to hold the SimpleOpenNI object (to be able to access data from the kinect)
SimpleOpenNI kinect;
 
//3: declare PImage variable to hold and display the pixel data from the kinect
 
PImage depthCam;
 
//variables to store closest x and closest y;
float closestX=0;
float closestY=0;
float previousX;
float previousY;
//3:: setup
void setup()
{
  // make the sketch the same size as the Kinect image
  size(640, 480);
  // set a background color
  background(0);
  // instantiate the SimpleOpenNI object
  // parameter: the current context
  kinect  = new SimpleOpenNI(this);
  //put this in setup so that we can tell the lib in advance what type of data we want
 
  //invoke the method from the lib to allow access to the depth camera
  kinect.enableDepth();
}
 
//4: our drawing loop
void draw()
{
  // do NOT reset the background every frame - we want the drawing to persist
  //background(0);
  // get the next frame from the kinect
  kinect.update();
  // get the depth image and assign to the PImage var (using the lib)
  depthCam = kinect.depthImage();
 
  // get the depthMap (mm) values
  int[] depthVals = kinect.depthMap();
  //set the current min to the highest possible val...
  int currentMin = 8000;
 
  //go through the matrix - for each row go through every column
  for (int y=0; y<depthCam.height; y++)
  {
    //go through each col
    for (int x =0; x<depthCam.width; x++)
    {
      // reverse x by moving in from // the right side of the image 
      int reversedX = depthCam.width-x-1; 
      //use reversed
      int loc = reversedX + (y * depthCam.width);
 
      // if that pixel is the closest one we've seen so far (min)
      // extension::****
      // only look for the closestValue within a range
      // 610 (or 2 feet) is the minimum
      // 1525 (or 5 feet) is the maximum
      if (depthVals[loc] > 610 && depthVals[loc]< 1525 && depthVals[loc] < currentMin)
      {
        //if the condition is true then assign current val to current min
        currentMin = depthVals[loc];
        // and because this is the current min we need to assign the current x and y
        //to the closestX and closestY
        closestX =x;
        closestY= y;
      }
    }
  }
  // "linear interpolation", i.e.
  // smooth transition between last point
  // and new closest point
  // a third of the way between previous and closest
  float interpolatedX = lerp(previousX, closestX, 0.3f); 
  float interpolatedY = lerp(previousY, closestY, 0.3f);
 
 
  //display depth image (left commented out; only the drawing is shown)
  //image(depthCam, 0, 0);
  //lets display the closest tracked point
  strokeWeight(3);
  stroke(255, 0, 0);
  //ellipse(previousX,previousY,10,10);
  line(previousX, previousY, interpolatedX, interpolatedY);
  //point( interpolatedX,interpolatedY);
  // update the previous vals
  previousX = interpolatedX;
  previousY= interpolatedY;
}
void keyPressed()
{
  if (key == ' ')
  {
    background(0);
  }
}

Revisit the threshold

In this example we go slightly further with the threshold idea, to show that we can also extract only those parts of the depth map that are within a certain distance range. The idea is that we have a second image container, whose pixels will be black if the corresponding depth values from the depth map are out of range and white if they are within the range.

// EX 11:: further use with thresholds.... 
 
//1: import libraries:: -> sketch ->Import Library
import SimpleOpenNI.*;
//2: make a variable to hold the SimpleOpenNI object (to be able to access data from the kinect)
SimpleOpenNI kinect;
 
//3: declare PImage variable to hold and display the pixel data from the kinect
 
PImage depthCam;
 
//4: declare PImage variable to hold only the pixel data in range ...  
PImage result;
//3:: setup
void setup()
{
 
  // make the sketch the same size as the Kinect image
  size(640, 480);
  // set a background color
  background(0);
  // instantiate the SimpleOpenNI object
  // parameter: the current context
  kinect  = new SimpleOpenNI(this);
  //put this in setup so that we can tell the lib in advance what type of data we want
 
  //invoke the method from the lib to allow access to the depth camera
  kinect.enableDepth();
 
  // create an empty PImage container
  result = createImage(width,height,RGB);
}
 
//4: our drawing loop
void draw()
{
  //reset the background
  background(0);
  // get the next frame from the kinect
  kinect.update();
  // get the depth image and assign to the PImage var (using the lib)
  depthCam = kinect.depthImage();
 
  // get the depthMap (mm) values
  int[] depthVals = kinect.depthMap();
 
  // load the pixel array of the result image
  result.loadPixels();
 
 
  //go through the matrix - for each row go through every column
  for (int y=0; y<depthCam.height; y++)
  {
    //go through each col
    for (int x =0; x<depthCam.width; x++)
    {
      // get the location in the depthVals array
      int loc = x+(y*depthCam.width);
      // if the depth values of the sampled image are in range
      if (depthVals[loc] > 610 && depthVals[loc]< 900 )
      {
        //let the pixel value in the result image be white
        result.pixels[loc] = color(255);
      }
      else
      //otherwise let the pixel value in the result image be black
       result.pixels[loc] = color(0);
 
    }
  }
  // update
  result.updatePixels();
  //display the result
  image(result, 0, 0);
}

This example is useful for understanding ranges, as well as for starting to extract objects that are within a specified range for further processing, resulting in very simple background removal.

Tracking the average pixel

This next example uses the background removal technique and computes the average x and y location of all pixels that are in front of a given depth threshold (displayed as an ellipse). We need 3 more variables: one to hold the sum of the x values within the threshold, one for the sum of the y values within the threshold, and a count of how many pixels are within the threshold. We then go through our depth map, test the range and accumulate our sums. At the end we compute the average (if there is one) and finally display the ellipse.

//1: import libraries:: -> sketch ->Import Library
import SimpleOpenNI.*;
//2: make a variable to hold the SimpleOpenNI object (to be able to access data from the kinect)
SimpleOpenNI kinect;
 
//3: declare PImage variable to hold and display the pixel data from the kinect
 
PImage depthCam;
 
//4: declare PImage variable to hold only the pixel data in range ...  
PImage result;
//3:: setup
void setup()
{
 
  // make the sketch the same size as the Kinect image
  size(640, 480);
  // set a background color
  background(0);
  // instantiate the SimpleOpenNI object
  // parameter: the current context
  kinect  = new SimpleOpenNI(this);
  //put this in setup so that we can tell the lib in advance what type of data we want
 
  //invoke the method from the lib to allow access to the depth camera
  kinect.enableDepth();
 
  // create an empty PImage container
  result = createImage(width, height, RGB);
}
 
//4: our drawing loop
void draw()
{
  //reset the background
  background(0);
  // get the next frame from the kinect
  kinect.update();
  // get the depth image and assign to the PImage var (using the lib)
  depthCam = kinect.depthImage();
 
 
  // get the depthMap (mm) values
  int[] depthVals = kinect.depthMap();
 
  // load the pixel array of the result image
  result.loadPixels();
 
  //declare variables to hold the accumulated sums and how many samples we have
  // these variables are needed to calculate the average
  float sumX = 0;
  float sumY = 0;
  float count = 0;
 
  //go through the matrix - for each row go through every column
  for (int y=0; y<depthCam.height; y++)
  {
    //go through each col
    for (int x =0; x<depthCam.width; x++)
    {
      // get the location in the depthVals array
      int loc = x+(y*depthCam.width);
      // if the depth values of the sampled image are in range
      if (depthVals[loc] > 610 && depthVals[loc]< 900 )
      {
        //let the pixel value in the result image be white
        result.pixels[loc] = color(255);
        //lets accumulate our sum .. 
        sumX += x;
        sumY += y;
        count++;
      } else
        //otherwise let the pixel value in the result image be black
      result.pixels[loc] = color(0);
    }
  }
  // update
  result.updatePixels();
  //display the result
  image(result, 0, 0);
 
  //display the ellipse ONLY when we have a valid average
  if (count != 0) {
    float avgX = sumX / count;
    float avgY = sumY / count;
    fill(255, 0, 0);
    ellipse(avgX, avgY, 16, 16);
  }
}

Skeleton Tracking with SimpleOpenNI

The last part of this tutorial will focus on the skeleton tracking features made available by the SimpleOpenNI library.

OpenNI has the ability to process the depth image for us in order to detect and track people. Instead of having to loop through depth points to check their position, we can simply access the position of each body part of each user that OpenNI tracks for us. Once OpenNI has detected a user, it will tell us the position of each of the user’s visible “joints”: head, neck, shoulders, elbows, hands, torso, hips, knees, feet.

The following example demonstrates how to use the more advanced features of the SimpleOpenNI library in order to first detect a user, and then find the position of the hands belonging to the identified user. We can then use the hand positions to draw an ellipse at each one. The result is far more accurate than what we have seen before, as it relies not only on distances but also on form.
Note:: In order for OpenNI's algorithm to begin tracking a person's joints, it needs the person to be standing in a known pose. Specifically, you have to stand with your feet together and your arms raised above your shoulders at the sides of your head.

//"Code from Making Things See" by Greg Borenstein(MAKE)
//Copyright 2012 Greg Borenstein, 978-1-449-30707-3
 
import SimpleOpenNI.*;
SimpleOpenNI  kinect;
void setup() {
  size(640, 480);
  kinect = new SimpleOpenNI(this);
  kinect.enableDepth();
  // turn on user tracking
  kinect.enableUser();
}
void draw() {
  kinect.update();
  PImage depth = kinect.depthImage();
  image(depth, 0, 0);
  // make a vector of ints to store the list of user ids
  IntVector userList = new IntVector();
  // write the list of detected users
  // into our vector
  kinect.getUsers(userList);
  // if we found any users
  if (userList.size() > 0) {
    // get the first user
    int userId = userList.get(0);
    // if we’re successfully calibrated
    if ( kinect.isTrackingSkeleton(userId)) {
      // make a vector to store the right hand
      PVector rightHand = new PVector();
 
      // and one for the left hand:
      PVector leftHand = new PVector();
      // put the position of each hand into its vector
      float confidence = kinect.getJointPositionSkeleton(userId, 
      SimpleOpenNI.SKEL_RIGHT_HAND, 
      rightHand);
 
       float confidence2 = kinect.getJointPositionSkeleton(userId, 
      SimpleOpenNI.SKEL_LEFT_HAND, 
      leftHand);
      // convert the detected hand position
      // to "projective" coordinates
      // that will match the depth image
      PVector convertedRightHand = new PVector();
      kinect.convertRealWorldToProjective(rightHand, convertedRightHand);
 
       PVector convertedLeftHand = new PVector();
      kinect.convertRealWorldToProjective(leftHand, convertedLeftHand);
      // and display it
      fill(255, 0, 0);
      ellipse(convertedRightHand.x, convertedRightHand.y, 10, 10);
 
       // and display it
      fill(0, 255, 0);
      ellipse(convertedLeftHand.x, convertedLeftHand.y, 10, 10);
    }
  }
}
/* old version user-tracking callback
void onNewUser(int userId) {
  println("start pose detection");
  kinect.startTrackingSkeleton(userId);
}
 
void onEndCalibration(int userId, boolean successful) {
  if (successful) {
    println("  User calibrated !!!");
    kinect.startTrackingSkeleton(userId);
  } else {
    println("  Failed to calibrate user !!!");
    kinect.startTrackingSkeleton(userId);
  }
}
void onStartPose(String pose, int userId) {
  println("Started pose for user");
  kinect.stopTrackingSkeleton(userId);
  kinect.requestCalibrationSkeleton(userId, true);
}*/
 
void onNewUser(SimpleOpenNI curContext,int userId)
{
  println("onNewUser - userId: " + userId);
  println("\tstart tracking skeleton");
 
  kinect.startTrackingSkeleton(userId);
}
 
void onLostUser(SimpleOpenNI curContext,int userId)
{
  println("onLostUser - userId: " + userId);
}
void onVisibleUser(SimpleOpenNI curContext,int userId)
{
  //println("onVisibleUser - userId: " + userId);
}

Explanation::

  • We tell OpenNI that we want to turn on user tracking: using kinect.enableUser();
  • When the sketch first starts running, no users will be tracked, and the functions OpenNI provides for accessing users will return an empty list. The process begins when a user enters the scene. At that point, OpenNI detects that the user is present and is a candidate for tracking, and it calls our sketch's onNewUser function. Once we have detected a user, the list of users within our draw function will begin to be populated.
  • At this point, if there are users, OpenNI will attempt the calibration process in order to track the user's skeleton (we call kinect.startTrackingSkeleton(userId) in the onNewUser function).
  • If the skeleton is being successfully tracked, we can attempt to extract the joints. Note: each new user has a unique id associated with it.
  • Next we declare a PVector to store the position of each hand, and then we call the getJointPositionSkeleton() function for each hand. The PVector is passed into the function, and when the function completes, that PVector holds the coordinates of the corresponding hand. We also need to pass the constants SimpleOpenNI.SKEL_LEFT_HAND and SimpleOpenNI.SKEL_RIGHT_HAND to tell the library which joint we are looking for. NOTE:: OpenNI thinks of the orientation of the joints from the screen's point of view rather than the user's. Since the user is facing the Kinect, her right hand will show up on the left side of the screen, so the right hand will actually be on one's left and vice versa.
  • At this point we have the positions of the hands. However, they are in a different coordinate system from that of the depth image. We will not go into detail here, but basically the hand coordinates are real-world coordinates: a PVector that describes the hand position in relation to the 3D world constructed by the Kinect. The depth image, on the other hand, uses projective coordinates: a coordinate system which represents the 3D scene, as seen from a particular point of view, as a 2-dimensional image. There is a relationship between these two systems, and SimpleOpenNI provides a function to convert from one to the other: convertRealWorldToProjective().
  • And finally, we can take the converted coordinates and use them to draw the ellipses over the depth image.

Adding the z

Very briefly, we can extend the example by making the ellipse bigger as the hand comes closer and smaller as it moves away.
Our converted PVectors also contain a z-coordinate (the position on the projected z-axis), so we can simply take these values and map them to a size that works for an ellipse.
The example code above will be modified with the following:

 // note these PVectors are 3D - so we can use the z - to change the size of the ellipses:
 float ellipseSizeRed = map(convertedRightHand.z, 700, 2500,  50, 1);
 float ellipseSizeGreen = map(convertedLeftHand.z, 700, 2500,  50, 1);
// and display it
fill(255, 0, 0);
 ellipse(convertedRightHand.x, convertedRightHand.y, ellipseSizeRed, ellipseSizeRed);
// and display it
fill(0, 255, 0);
ellipse(convertedLeftHand.x, convertedLeftHand.y, ellipseSizeGreen, ellipseSizeGreen);

The User Map

Recall how previously we used basic thresholding to do background removal? Well, now it is even easier using other SimpleOpenNI functions.
The key to removing the background is the user map provided by SimpleOpenNI (obtained in the example below with kinect.userMap(); older versions of the library expose it as getUsersPixels). This returns a map of the scene in the form of an array of integers. The array has one element for each pixel in the depth image, and the value of each element tells us whether the corresponding pixel in the depth image is part of a user or part of the background.

So – this next example will take advantage of these values to draw the user’s silhouette. If a pixel is part of a user, we’ll draw it as a solid blue. Otherwise, we’ll leave it black.

import SimpleOpenNI.*;
SimpleOpenNI  kinect;      
int [] userMap;
 
void setup()
{
  size(640,480);
  kinect = new SimpleOpenNI(this);
  if(kinect.isInit() == false)
  {
     println("Can't init SimpleOpenNI, maybe the camera is not connected!"); 
     exit();
     return;  
  }
 
  // enable depthMap generation 
  kinect.enableDepth();
 
  // enable skeleton generation for all joints
  kinect.enableUser();
  background(0);
  }
 
void draw()
{
  // update the cam
  background(0);
  kinect.update();
  // draw the skeleton if it's available
  int[] userList = kinect.getUsers();
  if(userList.length>0)
  {
    userMap = kinect.userMap();
    // load sketches pixels
    loadPixels();
   for(int i=0; i<userMap.length; i++)
    {
      if(userMap[i]!=0)
      {
         pixels[i] = color(0,0,255);
      }
      else
      {
        pixels[i] = color(0);
      }
    }
   updatePixels();
 
  }  
}
// -----------------------------------------------------------------
// SimpleOpenNI events
 
void onNewUser(SimpleOpenNI curContext, int userId)
{
  println("onNewUser - userId: " + userId);
  println("\tstart tracking skeleton");
 
  curContext.startTrackingSkeleton(userId);
}
 
void onLostUser(SimpleOpenNI curContext, int userId)
{
  println("onLostUser - userId: " + userId);
}
 
void onVisibleUser(SimpleOpenNI curContext, int userId)
{
  //println("onVisibleUser - userId: " + userId);
}

The example is similar to before:

  • We call kinect.enableUser() within setup to turn on user tracking.
  • We then call getUsers() to check to see if any users have been detected. The onNewUser callback is used for new user detection. However – we do not need to track the skeleton (joints) for this example.
  • Once user tracking has begun, we’re ready to grab the data indicating which pixels belong to users and which are part of the background. At the top of the sketch, we declared userMap as an array of integers. Here, we load it with the map of the user pixels by calling kinect.userMap();
  • We now have an array of integers which is the same size as the sketch's pixel array. Therefore, we iterate through this map, and if the value at an index is not 0, this means we have a user pixel and we color it blue; otherwise we color it the same as the background.

A final example

This last example exploits the fact that we were just able to separate the user from the background. We can now use the Kinect's RGB camera to extract exactly those pixels that correspond to the user map pixels.

//user map
 
import SimpleOpenNI.*;
SimpleOpenNI  kinect;      
int [] userMap;
 
void setup()
{
  size(640,480);
  kinect = new SimpleOpenNI(this);
  if(kinect.isInit() == false)
  {
     println("Can't init SimpleOpenNI, maybe the camera is not connected!"); 
     exit();
     return;  
  }
 
  // enable depthMap generation 
  kinect.enableDepth();
 
  // enable RGB camera
  kinect.enableRGB(); 
 
  // enable skeleton generation for all joints
  kinect.enableUser();
 
  // turn on depth-color alignment 
  kinect.alternativeViewPointDepthToImage();
  background(0);
  }
 
void draw()
{
  // update the cam
  background(0);
  kinect.update();
 
  // get the Kinect color image
  PImage rgbImage = kinect.rgbImage(); 
  // prepare the color pixels 
  rgbImage.loadPixels();
 
 
  int[] userList = kinect.getUsers();
  if(userList.length>0)
  {
    userMap = kinect.userMap();
    // load sketches pixels
    loadPixels();
   for(int i=0; i<userMap.length; i++)
    {
      if(userMap[i]!=0)
      {
         // set the sketch pixel to the color pixel
          pixels[i] = rgbImage.pixels[i];
      }
      else
      {
        pixels[i] = color(0);
      }
    }
   updatePixels();
 
  }  
}
 
 
 
// -----------------------------------------------------------------
// SimpleOpenNI events
 
void onNewUser(SimpleOpenNI curContext, int userId)
{
  println("onNewUser - userId: " + userId);
  println("\tstart tracking skeleton");
 
  curContext.startTrackingSkeleton(userId);
}
 
void onLostUser(SimpleOpenNI curContext, int userId)
{
  println("onLostUser - userId: " + userId);
}
 
void onVisibleUser(SimpleOpenNI curContext, int userId)
{
  //println("onVisibleUser - userId: " + userId);
}

This example is very similar in structure to the previous one, with the following additions:

  • We enable the RGB Camera
  • In the draw() – we capture the RGB image
  • Recall how, at the beginning of the tutorial, we found that the depth camera and the RGB camera do not capture exactly the same image? Well, we can easily correct that by calling kinect.alternativeViewPointDepthToImage(). Now both images will be aligned.
  • Finally, instead of turning a user pixel blue – we assign the corresponding pixel on the screen to be that of the RGB image.

We have now successfully placed ourselves in the frame, even though a little clean-up is required.

The examples covered in this tutorial are sourced from Greg Borenstein's Making Things See book and Daniel Shiffman's very informative Processing + Kinect tutorial – thank you.