By Dan O'Sullivan
Having computers fast enough to analyze video on a pixel-by-pixel basis is relatively new. I find that people who experiment with this stuff experience a mind shift from which they never recover. When you treat video as an input to your software instead of the usual output of your software, lots of things open up to you. It solves one big problem: your software is boring your body. Your body is capable of so much subtle movement and response that is really just lost on the mouse and keyboard. A video camera, with its millions of pixels every second, now that is a volume and a speed that is a worthy match for your body. After you get hold of the pixels you start pulling objects out of the video image, perhaps just separating a foreground from the background. Pretty soon you are treating video more like a mutable vector than a "take it or leave it" bitmap, which opens up all sorts of interactive possibilities. You should know that you will never win against the pixels: there are just too many of them and they change too often. They will just beat you down and you will stop bathing.
Having a machine that is fast enough to go pixel by pixel through video is one thing; using that to, for instance, reliably track a face moving through the video is another. As you get into this stuff you will have a new appreciation of your body's visual and mental equipment. The thing that makes computer vision so difficult is that our eyes and brains automatically adjust but cameras and computers do not. For instance, your eye compensates when it sees a fire engine at noon and in the evening, so the truck appears to be the same color even though the changing color of the sunlight changes the color that reaches you. The computer will just see different colors. Making matters worse, we are using old video standards (NTSC, PAL, etc.) that are very noisy, so even if the color is not changing, blips in the signal will make it appear to change. Even if we could solve those problems with better equipment and spectrographic imaging (like Mitchell Rosen is doing at RIT), that still leaves us with the biggest problem: recognizing objects within the video. Our brains can so effortlessly pick out parts of a scene, for instance a head, a hand or a snake, but when you sit down to write software to do that, it is really hard to even know where to start.
Many of these techniques fall roughly into the fields of Computer Vision or Digital Image Processing. I suspect many of you are not anxious to go back to school to get a PhD in these subjects. In the examples I show you how to get pretty far, say 80% of the way, pretty easily and then expect you to contrive your situation so the remaining 20% never happens. For instance, it might be very difficult to find a person against some changing and arbitrary background (the stuff of PhD dissertations) but very easy to find them against a uniform white background (the stuff of art school projects). It is very difficult to recognize a person's face but trivial to find a particular red hat. If you want to make your life easy, have the person wear a red hat and ensure that there is a uniform white background. It does not have to be that contrived but you get the idea.
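To make the red hat idea concrete, here is a minimal sketch of color tracking: scan every pixel and remember the one closest to a target color. It uses the standard java.awt.image.BufferedImage as a stand-in for a captured video frame; the class name ColorTracker and the method findClosest are made up for this illustration and are not part of any capture library.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper for illustration only; a BufferedImage stands in for a video frame.
final class ColorTracker {

    /** Returns {x, y} of the pixel whose color is closest to the target color. */
    static int[] findClosest(BufferedImage img, int targetR, int targetG, int targetB) {
        int bestX = 0;
        int bestY = 0;
        long bestDist = Long.MAX_VALUE;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);       // packed 0xAARRGGBB
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Squared distance in RGB space; no square root is needed to compare.
                long dist = (long) (r - targetR) * (r - targetR)
                          + (long) (g - targetG) * (g - targetG)
                          + (long) (b - targetB) * (b - targetB);
                if (dist < bestDist) {
                    bestDist = dist;
                    bestX = x;
                    bestY = y;
                }
            }
        }
        return new int[] { bestX, bestY };
    }

    public static void main(String[] args) {
        // Contrived scene: uniform white background with one reddish "hat" pixel.
        BufferedImage img = new BufferedImage(20, 10, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                img.setRGB(x, y, 0xFFFFFF);
            }
        }
        img.setRGB(12, 4, 0xE01010);
        int[] best = findClosest(img, 255, 0, 0);
        System.out.println(best[0] + "," + best[1]);
    }
}
```

Against a plain white background the nearest-to-red pixel is almost certainly the hat, which is exactly why contriving the scene pays off. Against a cluttered background the same loop will happily lock onto a red exit sign.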
Java may not leap to mind as the perfect solution for pixel-by-pixel work because it tends to lag behind C++ in speed of execution, which is so crucial when you are talking about so many pixels per frame. Java is now fast enough for some video scanning applications, and as machines and just-in-time compilers get faster this will not be a problem. For example, I have been working on a 2.0 GHz PC with a FireWire camera (some cameras or input devices cannot provide you with 640 x 480). I can get the 640 x 480 pixels and display them at about 24fps. I can track a color at 640 x 480 at 15fps. I can blur (this is the most expensive act) and remove the background at 320 x 240 at 10fps. These are all using the convenient (but slower) setPixel and getPixel routines instead of unpacking things manually. The ConvolveOps for blurring are theoretically tapping your native graphics capabilities, so a better graphics card might also help in addition to pure processor speed. There are of course all the usual benefits to Java, like the fact that it is portable between platforms and can be networked like the dickens, but the main reason you might want to use Java for video by the pixel is that you already know how to use it or want to learn.
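As a rough illustration of the two expensive steps mentioned above, the sketch below blurs a frame with java.awt.image.ConvolveOp and then flags pixels that differ from a stored background frame, unpacking the packed integers by hand with bit shifts. The class name BackgroundRemover, the 3 x 3 box kernel and the threshold value are assumptions chosen for the example, not the settings used for the timings quoted here.

```java
import java.awt.image.BufferedImage;
import java.awt.image.ConvolveOp;
import java.awt.image.Kernel;
import java.util.Arrays;

// Hypothetical helper for illustration only.
final class BackgroundRemover {

    /** Blurs with a 3x3 box kernel (every neighbor weighted equally). */
    static BufferedImage blur(BufferedImage src) {
        float[] weights = new float[9];
        Arrays.fill(weights, 1f / 9f);
        ConvolveOp op = new ConvolveOp(new Kernel(3, 3, weights), ConvolveOp.EDGE_NO_OP, null);
        return op.filter(src, null); // null lets ConvolveOp allocate the destination
    }

    /** Marks pixels whose summed R+G+B difference from the background exceeds threshold. */
    static boolean[] foregroundMask(BufferedImage frame, BufferedImage background, int threshold) {
        int w = frame.getWidth();
        int h = frame.getHeight();
        // Fetch all pixels at once as packed 0xAARRGGBB ints, then unpack manually.
        int[] f = frame.getRGB(0, 0, w, h, null, 0, w);
        int[] b = background.getRGB(0, 0, w, h, null, 0, w);
        boolean[] mask = new boolean[w * h];
        for (int i = 0; i < mask.length; i++) {
            int diff = Math.abs(((f[i] >> 16) & 0xFF) - ((b[i] >> 16) & 0xFF))
                     + Math.abs(((f[i] >> 8) & 0xFF) - ((b[i] >> 8) & 0xFF))
                     + Math.abs((f[i] & 0xFF) - (b[i] & 0xFF));
            mask[i] = diff > threshold;
        }
        return mask;
    }

    public static void main(String[] args) {
        BufferedImage background = new BufferedImage(16, 16, BufferedImage.TYPE_INT_RGB);
        BufferedImage frame = new BufferedImage(16, 16, BufferedImage.TYPE_INT_RGB);
        // Something bright enters the otherwise black scene.
        for (int y = 6; y < 10; y++) {
            for (int x = 6; x < 10; x++) {
                frame.setRGB(x, y, 0xFFFFFF);
            }
        }
        boolean[] mask = foregroundMask(blur(frame), blur(background), 30);
        int count = 0;
        for (boolean m : mask) {
            if (m) count++;
        }
        System.out.println("foreground pixels: " + count);
    }
}
```

Blurring both frames before comparing them smooths out single-pixel noise, so fewer stray pixels get flagged as foreground. The mask is indexed as y * width + x.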
TrackThemColors Xtra -- Danny Rozin made this extension of Macromedia Director. Director is not fast enough to go through the pixels one by one but the Xtra (written in C) does most of the work for you. This is by far the easiest solution if you already use Director. Josh Nimoy has a similar Xtra for Director that I have never used but he is a smart guy and I like his web page.
Jitter or NATO -- These are additions to Max. If you are a Max or Pd programmer this may be just the ticket for you.
C++. This approach will give you the absolute best speed a given machine is able to deliver in going through all those many pixels. It also has the advantage that you can connect with certain useful libraries like Intel's OpenCV (computer vision) libraries. Here are some samples for how to do it using QuickTime.
Microcontrollers. Now even tiny single chip computers (for instance the SX) that you can buy for $10 are fast enough to process video on a pixel by pixel basis. CMU has a kit and companies like TYZX are developing interesting products.
Code for finding the width (which can be mapped to distance): after the loops that find the best x and y for a given color, run two more loops that walk outward from the center of the best color, one to the right and one to the left, until each reaches an edge. The distance between the two edges is the width.
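// Assumes x and y hold the best match from the color-tracking loops, and that
// rightEdge and leftEdge are int variables declared elsewhere.
int thresholdOfChange = 10; // largest jump between neighboring pixels still treated as the same object

// Walk right from the center until the value jumps or the frame ends.
rightEdge = x;
int[] nrgb = ps.getPixel(rightEdge, y);
int pRed = nrgb[1]; // index 1 is taken to be the red channel, as the name pRed suggests
while (rightEdge < kWidth) {
    nrgb = ps.getPixel(rightEdge, y);
    int diff = Math.abs(nrgb[1] - pRed);
    if (diff > thresholdOfChange) break;
    pRed = nrgb[1]; // compare with the previous pixel so gradual shading is tolerated
    rightEdge++;
}

// Walk left from the center in the same way.
leftEdge = x;
nrgb = ps.getPixel(leftEdge, y);
pRed = nrgb[1];
while (leftEdge >= 0) {
    nrgb = ps.getPixel(leftEdge, y);
    int diff = Math.abs(nrgb[1] - pRed);
    if (diff > thresholdOfChange) break;
    pRed = nrgb[1];
    leftEdge--;
}

// Both edges stop one pixel outside the object.
int width = rightEdge - leftEdge - 1;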
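The fragment above depends on a capture object (ps) and a frame width constant (kWidth) that are defined elsewhere. For experimenting without a camera, here is a self-contained sketch of the same edge walk that swaps in a standard java.awt.image.BufferedImage for the frame. The class name WidthFinder and its method names are hypothetical, and reading the red channel with a bit shift is an assumption about what index 1 of the pixel array holds.

```java
import java.awt.image.BufferedImage;

// Hypothetical stand-alone version of the edge walk; BufferedImage replaces the capture object.
final class WidthFinder {

    /** Red channel (0..255) of the pixel at (x, y). */
    static int red(BufferedImage img, int x, int y) {
        return (img.getRGB(x, y) >> 16) & 0xFF;
    }

    /** Walks left and right from (x, y) along row y and returns the object's width in pixels. */
    static int findWidth(BufferedImage img, int x, int y, int thresholdOfChange) {
        // Walk right until the red value jumps or the frame ends.
        int rightEdge = x;
        int previous = red(img, x, y);
        while (rightEdge < img.getWidth()) {
            int current = red(img, rightEdge, y);
            if (Math.abs(current - previous) > thresholdOfChange) break;
            previous = current;
            rightEdge++;
        }
        // Walk left in the same way.
        int leftEdge = x;
        previous = red(img, x, y);
        while (leftEdge >= 0) {
            int current = red(img, leftEdge, y);
            if (Math.abs(current - previous) > thresholdOfChange) break;
            previous = current;
            leftEdge--;
        }
        // Each edge stopped one pixel outside the object.
        return rightEdge - leftEdge - 1;
    }

    public static void main(String[] args) {
        // Black frame with a red bar covering x = 5..9 on row 2.
        BufferedImage img = new BufferedImage(20, 5, BufferedImage.TYPE_INT_RGB);
        for (int x = 5; x <= 9; x++) {
            img.setRGB(x, 2, 0xFF0000);
        }
        System.out.println(findWidth(img, 7, 2, 10)); // prints 5
    }
}
```

Because only the red channel is compared, this works for a red object on a dark background but not on a white one, where the red value is high on both sides of the edge. Comparing all three channels would be the obvious next step.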
The expressiveness of the individual (and therefore his capacity to give impressions) appears to involve two radically different kinds of sign activity: the expression that he gives, and the expression that he gives off. The first involves verbal symbols or their substitutes which he uses admittedly and solely to convey the information that he and the others are known to attach to these symbols. This is communication in the traditional and narrow sense. The second involves a wide range of action that others can treat as symptomatic of the actor, the expectation being that the action was performed for reasons other than the information conveyed in this way. As we shall have to see, this distinction has an only initial validity. The individual does of course intentionally convey misinformation by means of both of these types of communication, the first involving deceit, the second feigning.
Of the two kinds of communication - expressions given and expressions given off - this report will be primarily concerned with the latter, with the more theatrical and contextual kind, the non-verbal, presumably unintentional kind, whether this communication be purposely engineered or not.