Image Processing
- Inside every image is an array of numbers that describe the color of each pixel.
- There are many conventions for describing colors with numbers. They vary in color space (e.g. RGB or HSB, among many others), the order of the color channels, the number of bits used for each channel, and compression (JPEG or PNG, among many others).
- Things will be pretty easy for us: we work with uncompressed arrays in the RGB color space, with one integer describing each pixel.
- The only problem is that we have one integer per pixel but want to keep track of at least three things: red, green and blue. In Java an integer is made up of four bytes, so we "pack" the separate colors into byte-sized compartments of each integer. The fourth byte is used for "alpha", which usually describes the transparency of the pixel, so the layout is often referred to as ARGB. Unpacking the parts of the integer is made easy by the red(), green(), blue() and alpha() functions in Processing, and packing them into an integer is done with the color(r, g, b) function.
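- To make the packing concrete, here is a minimal sketch in plain Java of what goes on inside the integer. The class and method names are made up for illustration and are not part of Processing. Processing's own red(), green() and blue() return floats and depend on the current colorMode(), so treat this as the idea rather than their implementation.

```java
// Illustrative helper, not Processing API: packs and unpacks one ARGB pixel.
final class ArgbPacking {
    // Pack four 0-255 channel values into one 32-bit int laid out as AARRGGBB.
    static int pack(int a, int r, int g, int b) {
        return ((a & 0xFF) << 24) | ((r & 0xFF) << 16) | ((g & 0xFF) << 8) | (b & 0xFF);
    }

    // Each getter shifts its byte down to the lowest position and masks off the rest.
    static int alpha(int argb) { return (argb >>> 24) & 0xFF; }
    static int red(int argb)   { return (argb >> 16) & 0xFF; }
    static int green(int argb) { return (argb >> 8) & 0xFF; }
    static int blue(int argb)  { return argb & 0xFF; }
}
```

- For example, ArgbPacking.pack(255, 255, 0, 0) gives 0xFFFF0000, which is fully opaque red.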
- There is an existing variable called "pixels" that holds the color values for the screen. Other images in Processing will also offer you their pixels as a field, for instance myPImage.pixels.
PImage myImage = loadImage("http://www.google.com/images/logo.gif");
int[] listOfPixels = myImage.pixels;
- If you are working with video, for our purposes it is treated as just a very fast succession of still images. The Capture object is also a PImage, so you can ask for its pixels.
Capture video = new Capture(this, 640, 480);
video.read();
int[] listOfVideoPixels = video.pixels;
- If you are using the video camera there is a lot going on as the pixels make their way up the pipeline to your software. It is sometimes helpful to use the Capture object to look at the default settings, especially the source setting. I usually include something like this in my code to bring up a dialog box when 's' is pressed.
public void keyPressed() {
  if (key == 's') {
    video.settings(); // open the capture settings dialog
  }
}
- One oddity of Processing is that you have to explicitly load the colors into the array with the loadPixels() function before you operate on them. After you are finished playing with the pixels you are supposed to call the updatePixels() function before you can expect them to appear on the screen. I think the reason for this is fairly obscure (rumors about OpenGL?) but I am sure it is ultimately a good one.
- Now you loop over the array and visit every pixel, that is, every integer in it. Typically each time you meet a pixel you ask it a question and then change the pixel depending on the answer.
loadPixels();
for(int i = 0; i < video.pixels.length; i++){
int thisPixel = video.pixels[i];
if (red(thisPixel) > threshold){
pixels[i] = color(255,0,0); // opaque red, same as 0xffff0000
}
}
updatePixels();
- The question you ask can compare each pixel against some threshold, against a pixel in another image (background removal), or against neighboring pixels (blur, edge detection).
- You could just use subtraction (and absolute value) to find the difference between two colors. A better way is to treat the colors as points in a three-dimensional color space and use the Pythagorean theorem, h^2 = x^2 + y^2 + z^2, where x, y and z are the differences in red, green and blue. Better yet, Processing's dist() function can be used to get the "distance" between colors.
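- As a rough sketch of that distance in plain Java, assuming pixels are packed as ARGB integers. The class and method names are hypothetical, and alpha is ignored on purpose. In a Processing sketch, dist(r1, g1, b1, r2, g2, b2) does the same arithmetic for you.

```java
// Illustrative helper, not Processing API: Euclidean distance between two packed colors.
final class ColorDistance {
    static double distance(int c1, int c2) {
        // Per-channel differences are the x, y and z of the Pythagorean formula.
        int dr = ((c1 >> 16) & 0xFF) - ((c2 >> 16) & 0xFF);
        int dg = ((c1 >> 8) & 0xFF) - ((c2 >> 8) & 0xFF);
        int db = (c1 & 0xFF) - (c2 & 0xFF);
        return Math.sqrt(dr * dr + dg * dg + db * db);
    }
}
```

- Identical colors give 0, and pure red against black gives 255. The largest possible value, black against white, is about 441.7, which is worth keeping in mind when picking a threshold.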
- To check each pixel against itself at a previous time you will need another PImage to keep a copy of the old pixels. This is what you do when noticing change or separating foreground from background.
PImage backgroundImage = new PImage(width,height);
public void grabBackground(){
backgroundImage.copy(video,0,0,video.width,video.height,0,0,video.width,video.height);
backgroundImage.updatePixels();
}
- Quite often you will want to call such a grabBackground() function in setup() or in mousePressed(), when you are sure nobody is standing in front of the camera. This forms a reference of what the background looks like, so any pixel that differs from that reference is considered a foreground pixel.
public void draw() {
if (video.available()) {
video.read();
loadPixels();
int numberOfChanges = 0;
for (int row = 0; row < video.height; row++) {
for (int col = 0; col < video.width; col++) {
int offset = row * video.width + col; // assumes the sketch window is the same size as the video
int fgColor = video.pixels[offset];
int bgColor = backgroundImage.pixels[offset];
float fgR = red(fgColor);
float fgG = green(fgColor);
float fgB = blue(fgColor);
float bgR = red(bgColor);
float bgG = green(bgColor);
float bgB = blue(bgColor);
float diff = dist(fgR, fgG, fgB, bgR, bgG, bgB);
if (diff > threshold) {
pixels[offset] = fgColor;
numberOfChanges++;
}
}
}
if (numberOfChanges > video.pixels.length/2) uploadPicture();
grabBackground(); // your choice: grab after every frame, or only in setup() and on mouse press
updatePixels();
}
}
- This same code can be used to look for movement, as opposed to foreground, if you use the immediately previous frame as the reference to check against. The previous frame gives you a very fresh reference, whereas the picture you took much earlier in setup() may no longer resemble the background because ambient conditions change. As long as the subject is moving you will get very robust results.
- In the above code you will also see a variable called numberOfChanges. This steps a little past image processing into image analysis, the topic of next week, but maybe you could combine that with last week's assignment to only upload a picture when there is a significant change.
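- Here is a minimal sketch of that previous-frame idea in plain Java, working on two pixel arrays of the same size. All names are hypothetical, and the threshold and the fraction that counts as a "significant change" are assumptions you would tune for your own camera and lighting.

```java
// Illustrative helper, not Processing API: compares the current frame with a remembered one.
final class FrameDifference {
    // Count pixels whose RGB distance between the two frames exceeds the threshold.
    static int countChanges(int[] current, int[] previous, double threshold) {
        int changes = 0;
        for (int i = 0; i < current.length; i++) {
            int dr = ((current[i] >> 16) & 0xFF) - ((previous[i] >> 16) & 0xFF);
            int dg = ((current[i] >> 8) & 0xFF) - ((previous[i] >> 8) & 0xFF);
            int db = (current[i] & 0xFF) - (previous[i] & 0xFF);
            if (Math.sqrt(dr * dr + dg * dg + db * db) > threshold) {
                changes++;
            }
        }
        return changes;
    }

    // True when more than the given fraction (0.0 to 1.0) of the pixels changed.
    static boolean significantChange(int[] current, int[] previous,
                                     double threshold, double fraction) {
        return countChanges(current, previous, threshold) > current.length * fraction;
    }

    // Copy the current frame into the reference so the next comparison is against it.
    static void remember(int[] current, int[] previous) {
        System.arraycopy(current, 0, previous, 0, current.length);
    }
}
```

- Calling remember() after every frame gives you motion detection. Calling it only once, while the scene is empty, gives you background removal.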
- You will often want to scan the image in rows and columns in cases where you are consulting the neighboring pixels. Also, beyond image processing, where the ultimate answer is a transformed image, you might move to tracking applications, where the answer comes in the form of coordinates within the image.
- Row and column scanning is usually done with one loop nested inside another.
- The array you are looping through is still one long list, so to convert from row and column to the offset in the linear array you use this formula: currentPlaceInLinearArray = currentRow*widthOfRows + currentColumn. This is pretty easy to think about and is the same math you would use to count the number of people in an auditorium.
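for (int row = 0; row < video.height; row++) {
  for (int col = 0; col < video.width; col++) {
    int offset = row * video.width + col; // position in the linear array
    float d = dist(mouseX, mouseY, col, row);
    if (d > threshold) {
      video.pixels[offset] = 0; // black out pixels far from the mouse
    }
  }
}
video.updatePixels(); // tell the image its pixel array has changed
image(video, 0, 0);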
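- Tracking applications need the reverse trip as well, from an offset in the linear array back to a row and column. A small sketch in plain Java, with hypothetical names, assuming rows are stored one after another as described above:

```java
// Illustrative helper, not Processing API: converts between (row, col) and a linear offset.
final class GridIndex {
    // Skip `row` full rows, then move `col` places into the next one.
    static int offset(int row, int col, int width) {
        return row * width + col;
    }

    // Integer division tells you how many full rows fit before this offset.
    static int row(int offset, int width) {
        return offset / width;
    }

    // The remainder is how far into that row you are.
    static int col(int offset, int width) {
        return offset % width;
    }
}
```

- In a 640-pixel-wide image, row 2 and column 3 land at offset 1283, and offset 1283 maps back to row 2, column 3.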
- In questions that require consulting your neighbors (aka convolution), you might end up adding yet another loop or two at each x,y location. For instance, to blur an image you can get the values of the surrounding pixels, average them, and use that value for each location. Edge detection marks locations that are very different from their neighbors.
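- A minimal sketch of the blur described above, in plain Java. To keep it short it assumes a grayscale image with one brightness value (0 to 255) per pixel rather than packed ARGB, and it simply skips neighbors that fall outside the image. The names are hypothetical. For a color image you would run the same averaging on red, green and blue separately.

```java
// Illustrative helper, not Processing API: 3x3 box blur over a grayscale pixel array.
final class BoxBlur {
    static int[] blur(int[] gray, int width, int height) {
        // Write into a new array so already-blurred values never feed later averages.
        int[] out = new int[gray.length];
        for (int row = 0; row < height; row++) {
            for (int col = 0; col < width; col++) {
                int sum = 0;
                int count = 0;
                // The two extra loops visit the pixel itself and its 8 neighbors.
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        int r = row + dy;
                        int c = col + dx;
                        if (r < 0 || r >= height || c < 0 || c >= width) {
                            continue; // neighbor is off the edge of the image
                        }
                        sum += gray[r * width + c];
                        count++;
                    }
                }
                out[row * width + col] = sum / count; // integer average
            }
        }
        return out;
    }
}
```

- On a 3 by 3 image that is black except for a center value of 90, the center becomes 10 (90 spread over 9 pixels), while a corner, which only has 4 pixels in its window, becomes 22.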
- Loops within loops look ugly and are slow to execute, so luckily you can use the built-in filter functions in Processing, such as filter(BLUR), which take only a single line of code and are probably better optimized than the code you would write.
- Visiting your neighbors, aka convolution, is so common that people make the additional loops generic and use a matrix (the kernel) to change the specific marching orders. You can do this manually in Processing. You can also look into Java's ConvolveOp class.
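- A rough sketch of the manual, matrix-driven version in plain Java. It assumes a grayscale image stored as floats and a square kernel with an odd side length, and it repeats the nearest edge pixel when a neighbor falls outside the image. The names are hypothetical. Strictly speaking this applies the kernel without flipping it, which makes no difference for symmetric kernels like the ones below.

```java
// Illustrative helper, not Processing API: applies a square kernel to a grayscale image.
final class Convolution {
    static float[] convolve(float[] gray, int width, int height, float[][] kernel) {
        int half = kernel.length / 2; // distance from the kernel center to its edge
        float[] out = new float[gray.length];
        for (int row = 0; row < height; row++) {
            for (int col = 0; col < width; col++) {
                float sum = 0;
                for (int ky = 0; ky < kernel.length; ky++) {
                    for (int kx = 0; kx < kernel[ky].length; kx++) {
                        // Clamp so that off-image neighbors reuse the nearest edge pixel.
                        int r = Math.min(Math.max(row + ky - half, 0), height - 1);
                        int c = Math.min(Math.max(col + kx - half, 0), width - 1);
                        sum += gray[r * width + c] * kernel[ky][kx];
                    }
                }
                out[row * width + col] = sum;
            }
        }
        return out;
    }
}
```

- Changing the marching orders then means changing only the matrix. A kernel of nine values of 1/9 blurs, while {{0,-1,0},{-1,4,-1},{0,-1,0}} responds to edges: it gives 0 wherever a pixel matches its neighbors and a large value where it does not.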
- See code samples under the Image Processing Heading