Parsable Space

Telling stories in moving pictures is the most powerful technology ever developed. Unfortunately, in the usual progression toward an even more powerful medium that can be parsed and then recombined like text, motion pictures are stuck in the age of scribes. Film and video have been kept from having their “movable type moment” because scenes never get broken down beyond the level of the frame. Each frame is merely a collection of colored pixels instead of a collection of separable and recombinable elements. We are seeing hints of progress here, with elements being captured and manipulated separately in video games and special effects. This week we are going to take some steps to find some structure in video images.

Getting an Image

From a URL or a File

Many of you have already done this. If you use a filename rather than a URL, you first have to put the file inside your sketch folder. You may even want to create an assets folder inside your sketch folder and store all your images there. Make sure you declare the variable at the top so you can load it in setup() but use it in draw(). If you load it over and over in draw(), your program will be slow. Very slow. The image() function takes two coordinates for the position of the image, but optionally you can supply two more to scale the width and height of the image.
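A minimal sketch along these lines might look like this (the file path and canvas size are placeholders; substitute your own):

```javascript
// Sketch: load an image once, then draw it scaled.
// "assets/puppy.jpg" is a placeholder path; use your own file or a URL.
let img; // declared at the top so both preload() and draw() can see it

function preload() {
  img = loadImage("assets/puppy.jpg"); // load once, never inside draw()
}

function setup() {
  createCanvas(640, 480);
}

function draw() {
  // the two extra arguments scale the image to a new width and height
  image(img, 0, 0, width / 2, height / 2);
}
```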


From the Camera

We are going to make a “capture” variable for pulling in the images from your webcam. You should declare it at the top so it can be created in setup() and used in draw().

In setup(), use the createCapture() function to store your camera’s images in your capture variable. Set the size of your capture using capture.size().

Finally, you want to show the images in draw(). You use the image() function just as if it were a still image. From this point on, video is treated the same as a still image.

Also notice I’ve commented out the capture.hide() function. Uncomment it to see what it does.
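Put together, a minimal capture sketch looks something like this (the 640×480 size is an assumption; match it to your canvas):

```javascript
// Sketch: pull frames from the webcam.
// Declared at the top so it can be created in setup() and used in draw().
let capture;

function setup() {
  createCanvas(640, 480);
  capture = createCapture(VIDEO); // ask the browser for the camera
  capture.size(640, 480);         // set the capture size to match the canvas
  // capture.hide();              // uncomment to hide the raw <video> element
}

function draw() {
  image(capture, 0, 0); // from here on, video is treated like a still image
}
```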

From a Movie

You can also bring movies into P5.  The process is nearly identical to playing sound.
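A sketch for that, using p5’s createVideo() (the file name is a placeholder):

```javascript
// Sketch: play a movie file on the canvas. "assets/clip.mp4" is a placeholder.
let movie;

function setup() {
  createCanvas(640, 480);
  // the callback runs once the video is ready to play
  movie = createVideo("assets/clip.mp4", () => movie.loop());
  movie.hide(); // draw the frames on the canvas instead of the <video> tag
}

function draw() {
  image(movie, 0, 0, width, height);
}
```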


Getting the Pixels Array from an Image

Images on their own are not very computationally interesting. They are take-it-or-leave-it, all-or-nothing entities with very little opportunity for interactivity. We want to break down beyond the frame into the pixels. Regardless of where your images came from (the internet, a file, a camera, a movie), in the draw loop they are all just images.

Behind every image there is an array of numbers, with 4 integers (R, G, B, A) for each pixel of the picture. It is very easy to use dot notation to get at the array of pixels inside the image object. We will use this notation to ask about a pixel’s color or to set a pixel’s color.

pixels[0] // the Red value of the first pixel in the array

pixels[1]  // the Green value of the first pixel in the array

pixels[2]  // the Blue value of the first pixel in the array

pixels[3]  // the Alpha value (transparency) of the first pixel in the array


Load Pixels and Update Pixels

Before you start operating on the pixels of an image object, you should call that object’s loadPixels() function to make sure the pixels are fresh. On the other side, once you are done changing the individual pixels of an image object, you should call updatePixels() to make sure the array refreshes the image when it is displayed. You can get away without doing this quite a lot, but don’t count on it.
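The loadPixels()/updatePixels() sandwich looks like this; as an illustrative operation the sketch fragment below inverts the colors of an image (the function name is mine, not p5’s):

```javascript
// Fragment: the loadPixels()/updatePixels() sandwich around a pixel change.
// img is assumed to be a p5.Image that was loaded elsewhere.
function invert(img) {
  img.loadPixels(); // make sure img.pixels is fresh
  for (let i = 0; i < img.pixels.length; i += 4) {
    img.pixels[i]     = 255 - img.pixels[i];     // R
    img.pixels[i + 1] = 255 - img.pixels[i + 1]; // G
    img.pixels[i + 2] = 255 - img.pixels[i + 2]; // B
    // leave img.pixels[i + 3] (alpha) alone
  }
  img.updatePixels(); // push the changed array back into the image
}
```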

Visit All the Pixels

There are a lot of pixels, so you will need a repeat loop to visit each one. It is a pretty powerful thing to be able to visit hundreds of thousands of pixels every 33 milliseconds (30 frames/sec), ask each one a question (what color are you?), and set it to a new color accordingly.
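Here is one sketch of that pattern, assuming a webcam capture as above (the question here, brightness above or below the midpoint, is just an example):

```javascript
// Sketch: visit every pixel of each camera frame, ask its brightness,
// and set it to pure black or pure white accordingly.
let capture;

function setup() {
  createCanvas(320, 240);
  capture = createCapture(VIDEO);
  capture.size(320, 240);
  capture.hide();
}

function draw() {
  capture.loadPixels();
  for (let i = 0; i < capture.pixels.length; i += 4) {
    const bright =
      (capture.pixels[i] + capture.pixels[i + 1] + capture.pixels[i + 2]) / 3;
    const v = bright > 127 ? 255 : 0; // the question, and the new color
    capture.pixels[i] = capture.pixels[i + 1] = capture.pixels[i + 2] = v;
  }
  capture.updatePixels();
  image(capture, 0, 0);
}
```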


But we can do more than swap colors. We can begin representing pixel information with other objects as well. Let’s look at each pixel and see how bright it is. Then, for each pixel, let’s map that brightness to the size of an ellipse. Since we will be changing the size of each ellipse, it’s overkill to look at every individual pixel; we should step over a certain number of pixels based on the maximum potential size of our ellipses:
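A sketch along those lines (the step of 10 pixels and the canvas size are assumptions; the step doubles as the maximum ellipse diameter):

```javascript
// Sketch: map each sampled pixel's brightness to the size of an ellipse.
// step is the sampling stride, and roughly the maximum ellipse diameter.
let capture;
const step = 10;

function setup() {
  createCanvas(320, 240);
  capture = createCapture(VIDEO);
  capture.size(320, 240);
  capture.hide();
  noStroke();
}

function draw() {
  background(0);
  capture.loadPixels();
  fill(255);
  for (let y = 0; y < capture.height; y += step) {
    for (let x = 0; x < capture.width; x += step) {
      const i = (y * capture.width + x) * 4; // offset into the flat array
      const bright =
        (capture.pixels[i] + capture.pixels[i + 1] + capture.pixels[i + 2]) / 3;
      const d = map(bright, 0, 255, 0, step); // brighter pixel, bigger dot
      ellipse(x, y, d, d);
    }
  }
}
```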


Image Processing vs Computer Vision

As powerful as that repeat loop is, it is not good enough. The code above does image processing, and then the results are delivered to your eyeballs to interpret. We want the machine to do a little more interpretation itself, for instance separate an object from the background and tell me the position of the object in the scene. Rather than delivering the results only to your eyeballs, I want it to deliver the results as numbers that can then be used in your code for other things. For instance, you could have an if statement that says if their x position is greater than 100, pour a virtual bucket of water on them. Or play a higher pitched note as your hand moves up. Your next goal should be to track the position of something in the camera’s view.

Art School vs Homeland Security

You should contrive the physical side of your tracking situation as much as possible to make the software side as easy as possible. Computer vision is pretty hard if you are trying to find terrorists at the Super Bowl. But your users have a stake in the thing working and will play along. Rather than tracking natural things, it is okay to have the user wear a big orange oven mitt.

The Classic: Rows and Columns and Questions

Unfortunately the pixels are given to you in one long array of numbers, but we think of images in terms of rows and columns. Instead of a single for loop we will have a “for loop” inside a “for loop.” The outer loop will visit every column and the inner loop will visit each row in the column (you can just as well do the reverse: visit each row and then each column in the row). Then you pull out the colors of the pixel and ask a question about them (more on the possible questions below). This “for loop” within a “for loop,” extracting color components followed by an if-statement question, will be the heart of every computer vision program you ever write.
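Here is that pattern as a standalone function, with an illustrative question (“are you mostly red?”) standing in for whatever question your program needs:

```javascript
// The nested-loop heart of a computer vision program: visit every row,
// every column in the row, pull out the colors, and ask a question.
// Here the question is "are you mostly red?", and we count the yeses.
function countReddish(pixels, w, h) {
  let count = 0;
  for (let y = 0; y < h; y++) {      // each row
    for (let x = 0; x < w; x++) {    // each column in the row
      const i = (y * w + x) * 4;     // offset into the flat pixel array
      const r = pixels[i];
      const g = pixels[i + 1];
      const b = pixels[i + 2];
      if (r > g && r > b) count++;   // the question
    }
  }
  return count;
}
```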

Formula for Location in Pixel Array for a Given Row and Column

There is just one trick in there: finding the offset in the long linear array of numbers for a given row and column location. The formula is: index = (row * width + column) * 4, where the times 4 is because each pixel takes up four slots (R, G, B, A).

You perform this formula every time you try to figure out how many people are in a theater. Pretending people filled all the seats in order, you would take the number of seats in a row, multiply it by how many rows are full, and then add how many seats are taken in the last, partially full row.
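The theater-seat formula as a small helper function:

```javascript
// row * seats-per-row + seat-in-row, then times 4 because each pixel
// holds four numbers (R, G, B, A) in the flat pixels array.
function pixelIndex(x, y, w) {
  return (y * w + x) * 4;
}
```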

Comparing Colors

As we will see, after you extract the colors of a pixel you might want to compare them to a color you are looking for, the color of the adjacent pixel, the same pixel in the last frame, etc. You could just subtract and take the absolute value to find the difference in color. But it turns out that in color space, as in real space, finding the distance as the crow flies between two colors is better done with the Pythagorean theorem. Coming to the rescue in Processing is the dist() function.
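In p5 you can call dist(r1, g1, b1, r2, g2, b2) directly, treating the two colors as points in 3D space; this standalone version just shows what it computes:

```javascript
// Straight-line ("as the crow flies") distance between two colors,
// treating (R, G, B) as coordinates in a 3D color space.
function colorDist(r1, g1, b1, r2, g2, b2) {
  return Math.sqrt((r1 - r2) ** 2 + (g1 - g2) ** 2 + (b1 - b2) ** 2);
}
```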

Things You Can Ask Pixels

Color Tracking: Which Pixel Is Closest to A Particular Color?

In this code we will use three variables: one to keep track of how close the best match so far is, and two more to store the location of that best match.

You click on a pixel to set the color you are tracking. Notice we use the row-and-column formula again in mousePressed() to find the offset in the big long array for the particular row and column you clicked on. Remember, it is usually better to insert a patch of unnatural color, say a brightly colored toy, into your scene than to make software that can be very discerning of nature.
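A sketch of the idea (sizes, the starting color, and the starting “world record” of 500, which is worse than any possible RGB distance, are all assumptions):

```javascript
// Sketch: find the single pixel closest to a tracked color.
// worldRecord holds the smallest distance so far; winX/winY its location.
let capture;
let trackR = 255, trackG = 0, trackB = 0; // start out tracking red

function setup() {
  createCanvas(320, 240);
  capture = createCapture(VIDEO);
  capture.size(320, 240);
  capture.hide();
}

function draw() {
  capture.loadPixels();
  image(capture, 0, 0);
  let worldRecord = 500; // worse than any possible color distance
  let winX = 0, winY = 0;
  for (let y = 0; y < capture.height; y++) {
    for (let x = 0; x < capture.width; x++) {
      const i = (y * capture.width + x) * 4;
      const d = dist(capture.pixels[i], capture.pixels[i + 1],
                     capture.pixels[i + 2], trackR, trackG, trackB);
      if (d < worldRecord) { // a new winner
        worldRecord = d;
        winX = x;
        winY = y;
      }
    }
  }
  noFill();
  ellipse(winX, winY, 20, 20); // mark the best match
}

function mousePressed() {
  // the same row-and-column formula, used to grab the clicked color
  const i = (mouseY * capture.width + mouseX) * 4;
  trackR = capture.pixels[i];
  trackG = capture.pixels[i + 1];
  trackB = capture.pixels[i + 2];
}
```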


Smoother Color Tracking: What is the Average Location of Pixels That are Close Enough to a Particular Color?

The key part of this code is that I am not looking for a single pixel but instead using a threshold to find a group of pixels that are close enough. In this case I will average the locations of all the pixels that were close enough. In the previous example, a bunch of similar pixels would take turns winning the contest of being closest in different frames, so the result was jumpy. Now the whole group of close pixels wins, so it will be smoother.

Another thing to notice in this code is the keyPressed() function. It is used to change variables, especially the threshold variable, dynamically while the program is running. This is important in computer vision because light conditions change so easily.

This part is not very important (feel free to skip it), but I am using a timer to test the performance. As fast as computers are these days, you will really be testing them with computer vision, where you need to visit millions of pixels per second. If you add a third repeat loop in there, you will really need to start checking performance. We have seen this method before: store millis() in a variable, then check millis() against that variable to see how much time has passed. There are about 33 milliseconds in a frame of video at 30 frames/second.

Okay, here is some smoother tracking.
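A sketch of the averaging approach (the starting threshold of 50 and the +/- keys for adjusting it are assumptions):

```javascript
// Sketch: average the locations of every pixel within a threshold of the
// tracked color; press + / - to adjust the threshold while running.
let capture;
let threshold = 50;
let trackR = 255, trackG = 0, trackB = 0;

function setup() {
  createCanvas(320, 240);
  capture = createCapture(VIDEO);
  capture.size(320, 240);
  capture.hide();
}

function draw() {
  capture.loadPixels();
  image(capture, 0, 0);
  let sumX = 0, sumY = 0, count = 0;
  for (let y = 0; y < capture.height; y++) {
    for (let x = 0; x < capture.width; x++) {
      const i = (y * capture.width + x) * 4;
      const d = dist(capture.pixels[i], capture.pixels[i + 1],
                     capture.pixels[i + 2], trackR, trackG, trackB);
      if (d < threshold) { // close enough: join the group
        sumX += x;
        sumY += y;
        count++;
      }
    }
  }
  if (count > 0) {
    noFill();
    ellipse(sumX / count, sumY / count, 20, 20); // the group's average location
  }
}

function keyPressed() {
  // tune the threshold live, since lighting conditions keep changing
  if (key === '+') threshold++;
  if (key === '-') threshold--;
}
```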



Looking for Change: How Does This Pixel Compare to the Background?

For this we need to store a second set of background pixels to compare the incoming frame against. If you use the previous frame as the background pixels, you will be tracking change. If you set the background in setup() or in mousePressed() (as in this example), you will be doing background removal.
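A background-removal sketch along these lines (the threshold of 40 is an assumption; click to store the current frame as the background):

```javascript
// Sketch: compare each incoming pixel to a stored background frame and
// show only the pixels that changed more than a threshold.
let capture;
let backgroundPixels = null; // filled in by mousePressed()
const threshold = 40;

function setup() {
  createCanvas(320, 240);
  pixelDensity(1); // keep the canvas pixel array the same size as the capture
  capture = createCapture(VIDEO);
  capture.size(320, 240);
  capture.hide();
}

function draw() {
  capture.loadPixels();
  loadPixels(); // the canvas's own pixel array
  for (let i = 0; i < capture.pixels.length; i += 4) {
    let changed = true; // before a background is stored, show everything
    if (backgroundPixels) {
      const d = dist(capture.pixels[i], capture.pixels[i + 1],
                     capture.pixels[i + 2], backgroundPixels[i],
                     backgroundPixels[i + 1], backgroundPixels[i + 2]);
      changed = d > threshold;
    }
    // keep foreground pixels, black out the background
    pixels[i]     = changed ? capture.pixels[i] : 0;
    pixels[i + 1] = changed ? capture.pixels[i + 1] : 0;
    pixels[i + 2] = changed ? capture.pixels[i + 2] : 0;
    pixels[i + 3] = 255;
  }
  updatePixels();
}

function mousePressed() {
  capture.loadPixels();
  backgroundPixels = capture.pixels.slice(); // remember this frame
}
```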



Skin Rectangles: Is this pixel skin colored and is it part of an existing group of pixels?

Okay, this example combines two new things. The most important is that we are looking for multiple groups of pixels, so finding a single average for the whole frame, as we were doing, won’t be enough. Instead we are going to ask each pixel first whether it qualifies against a threshold, and second whether it is close to an existing group or needs to start a new group. This could be used for pixels that qualify for things other than skin color; for example, the pixels could qualify for being bright or for being part of the foreground.

This example also happens to be about skin color. It is a heartwarming fact that regardless of race we are all pretty much the same color, because we all have the same blood. We are, however, different brightnesses. Because brightness is not a separable notion in the RGB color space, we use a “normalized” color space where we look for percentages of red and green (the third color will just be the remaining percentage).

Okay, here is the code in two parts. There is also a rectangle object for holding the qualifying pixels.
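The normalized-color question on its own looks something like this; the percentage ranges below are illustrative placeholders, not canonical values, and you would tune them for your lighting:

```javascript
// The "is this pixel skin colored?" question in a normalized color space:
// divide out brightness by converting to percentages of the total, then
// check the red and green percentages. The ranges here are assumptions.
function isSkin(r, g, b) {
  const total = r + g + b;
  if (total === 0) return false; // pure black: no percentages to take
  const pctR = r / total;
  const pctG = g / total;
  // blue is implicitly the remaining percentage
  return pctR > 0.35 && pctR < 0.55 && pctG > 0.25 && pctG < 0.4;
}
```

Because the test uses percentages rather than raw values, a dimly lit and a brightly lit version of the same tone give the same answer.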



Other Examples:

Face detection

Kinect For P5

Kinect For Processing

Network Camera

Related reading:
Learning Processing, Chapters 15-16

Pixels Project

Track a color and have some animation or sound change as a result.  Create a software mirror by designing an abstract drawing machine which you color according to pixels from live video.
Create a video player. Consider combining your pcomp media controller assignment and build a Processing sketch that allows you to switch between videos, process pixels of a video, scrub a video, etc.
Use the Kinect to track a skeleton. Can you “puppeteer” an avatar/animation with the Kinect?

