Video By The Pixel

UPDATED VERSION
By Dan O'Sullivan

Short Story:

Download this and put it in the same folder as your class.
Try these examples.
If it doesn't work make sure you are configured correctly with QuickTime For Java and a VDIG Driver.

Long Story:

You may never come back:

Having computers fast enough to analyze video on a pixel by pixel basis is relatively new. I find that people who experiment with this stuff experience a mind shift from which they never recover. When you treat video as an input to your software instead of the usual output of your software, lots of things open up to you. It solves one big problem: your software is boring your body. Your body is capable of so much subtle movement and response that is really just lost on the mouse and keyboard. A video camera with its millions of pixels every second, now that is a volume and a speed that is a worthy match for your body. After you get a hold of the pixels you start pulling objects out of the video image, perhaps just removing a foreground from the background. Pretty soon you are treating video more like a mutable vector than a "take it or leave it" bitmap, which opens up all sorts of interactive possibilities. You should know that you will never win against the pixels: there are just too many of them and they change too often. They will just beat you down and you will stop bathing.

New Appreciation of your body:

Having a machine that is fast enough to go pixel by pixel through video is one thing; using that to, for instance, reliably track a face moving through the video is another. As you get into this stuff you will have a new appreciation of your body's visual and mental equipment. The thing that makes computer vision so difficult is that our eyes and brains automatically adjust but cameras and computers do not. For instance your eye compensates when it sees a fire engine at noon and in the evening, and it appears to be the same color even though the sun's color is changing the color of the truck. The computer will just see it as different colors. Making matters worse, we are using old video standards (NTSC, PAL etc) that are very noisy, so even if the color is not changing, blips in the signal will make it appear to change. Even if we could solve those problems with better equipment and spectrographic imaging (like Mitchell Rosen is doing at RIT), that still leaves us with the biggest problem: recognizing objects within the video. Our brains can so effortlessly pick out parts of a scene, for instance a head, a hand or a snake, but when you sit down to write software to do that, it is really hard to even know where to start.

80% Approach:

Many of these techniques fall roughly into the fields of Computer Vision or Digital Image Processing. I suspect many of you are not anxious to go back to school to get a PhD in these subjects. In the examples I show you how to get pretty far, say 80% of the way, pretty easily and then expect you to contrive your situation so the remaining 20% never happens. For instance it might be very difficult to find a person against some changing and arbitrary background (the stuff of PhD dissertations) but very easy to find them against a uniform white background (the stuff of Art School projects). It is very difficult to recognize a person's face but trivial to find a particular red hat. If you want to make your life easy, have the person wear a red hat and ensure that there is a uniform white background. It does not have to be that contrived but you get the idea.
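To make the red hat idea concrete, here is a minimal sketch of color tracking. It is not the code from the examples: it uses a plain java.awt.image.BufferedImage as a stand-in for a video frame, and the class name RedHatTracker, the track method and the tolerance value are all made up for illustration. It averages the positions of every pixel that is close enough to the target color.

```java
import java.awt.Color;
import java.awt.Point;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of the vbp classes.
class RedHatTracker {

    // Returns the average position of all pixels whose color is within
    // `tolerance` of `target`, or null if no pixel is close enough.
    static Point track(BufferedImage frame, Color target, int tolerance) {
        long sumX = 0, sumY = 0, count = 0;
        for (int y = 0; y < frame.getHeight(); y++) {
            for (int x = 0; x < frame.getWidth(); x++) {
                int rgb = frame.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Sum of channel differences: crude, but tolerant of noisy video.
                int diff = Math.abs(r - target.getRed())
                         + Math.abs(g - target.getGreen())
                         + Math.abs(b - target.getBlue());
                if (diff <= tolerance) {
                    sumX += x;
                    sumY += y;
                    count++;
                }
            }
        }
        if (count == 0) {
            return null;
        }
        return new Point((int) (sumX / count), (int) (sumY / count));
    }

    public static void main(String[] args) {
        // Fake frame: uniform white background with a red block for the hat.
        BufferedImage frame = new BufferedImage(64, 48, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 48; y++) {
            for (int x = 0; x < 64; x++) {
                frame.setRGB(x, y, 0xFFFFFF);
            }
        }
        for (int y = 5; y < 15; y++) {
            for (int x = 20; x < 30; x++) {
                frame.setRGB(x, y, 0xFF0000);
            }
        }
        Point hat = track(frame, Color.RED, 60);
        System.out.println("hat found at " + hat.x + "," + hat.y);
    }
}
```

The tolerance is the knob you contrive your situation around: a white wall and a saturated red hat let you keep it loose without picking up false matches.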

How these classes compare with other technologies:

Java may not leap to mind as the perfect solution for pixel by pixel work because it tends to lag behind C++ in speed of execution, which is so crucial when you are talking about so many pixels per frame. Java is now fast enough for some video scanning applications, and as machines and just-in-time compilers get faster this will not be a problem. For example I have been working on a 2.0 GHz PC with a firewire camera (some cameras or input devices cannot provide you with 640 x 480). I can get the 640 x 480 pixels and display them at about 24fps. I can track a color at 640 x 480 at 15fps. I can blur (this is the most expensive act) and remove the background at 320 x 240 at 10fps. These are all using the convenient (but slower) setPixel and getPixel routines instead of unpacking things manually. The ConvolveOps for blurring are theoretically tapping your native graphics capabilities, so a better graphics card might also help in addition to pure processor speed. There are of course all the usual benefits to Java, like the fact that it is portable between platforms and can be networked like the dickens, but the main reason you might want to use Java for video by the pixel is because you already know how to use it or want to learn.
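For the curious, here is roughly what a blur with a ConvolveOp looks like in standard Java 2D. This is only a sketch on a BufferedImage, not the code inside the video by the pixel classes; the class name BoxBlur and the choice of a 3 x 3 averaging kernel are my assumptions for illustration.

```java
import java.awt.image.BufferedImage;
import java.awt.image.ConvolveOp;
import java.awt.image.Kernel;
import java.util.Arrays;

// Hypothetical helper, not part of the vbp classes.
class BoxBlur {

    // Returns a blurred copy of src; each pixel becomes the average of
    // its 3 x 3 neighborhood. Edge pixels are copied through unchanged.
    static BufferedImage blur(BufferedImage src) {
        float[] weights = new float[9];
        Arrays.fill(weights, 1f / 9f);
        Kernel kernel = new Kernel(3, 3, weights);
        ConvolveOp op = new ConvolveOp(kernel, ConvolveOp.EDGE_NO_OP, null);
        // Passing null lets the op create a compatible destination image.
        return op.filter(src, null);
    }

    public static void main(String[] args) {
        // A black image with one white pixel in the middle.
        BufferedImage src = new BufferedImage(5, 5, BufferedImage.TYPE_INT_RGB);
        src.setRGB(2, 2, 0xFFFFFF);
        BufferedImage out = blur(src);
        int red = (out.getRGB(2, 2) >> 16) & 0xFF;
        System.out.println("center red after blur: " + red);
    }
}
```

A bigger kernel blurs more but touches more pixels per output pixel, which is why blurring is the expensive step in the frame rates above.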

Other solutions for video by the pixel:

TrackThemColors Xtra -- Danny Rozin made this extension of Macromedia Director. Director is not fast enough to go through the pixels one by one but the Xtra (written in C) does most of the work for you. This is by far the easiest solution if you already use Director. Josh Nimoy has a similar Xtra for Director that I have never used but he is a smart guy and I like his web page.

Jitter or NATO -- These are additions to Max. If you are a Max or PD programmer this may be just the ticket for you.

C++ -- This approach will give you the absolute best speed a given machine is able to deliver in going through all those pixels. It also has the advantage that you can connect with certain useful libraries like Intel's OpenCV (computer vision) libraries. Here are some samples for how to do it using QuickTime.

Java Media Framework -- This is a very similar approach in that it also uses Java. The difference is that I am using the "QuickTime For Java" classes instead of the JMF classes to get at the pixels. There are no JMF classes for the Mac. Shawn Van Every has kindly made this code available.

Microcontrollers -- Now even tiny single chip computers (for instance the SX) that you can buy for $10 are fast enough to process video on a pixel by pixel basis. CMU has a kit and companies like TYZX are developing interesting products.

Brass Tacks:

These Java classes will allow you to take in live incoming video, visit each pixel individually, analyse it, change it, and spit it back out to the screen. They can enable you to write software to, for instance, track something within the image or remove the background or change the colors or blur the image. These classes really just allow you to do things like getPixel(x,y) and setPixel(x,y,color); the schemes for using these functions to accomplish tracking or background removal are up to you (with a few examples from me). I could not resist throwing a bunch of other utilities into the classes for simple things like creating BufferedImages or jpegs from the video.
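As one possible scheme for background removal, here is a sketch that compares each live pixel against a stored picture of the empty scene. It works on plain BufferedImages rather than on live video, and the class name BackgroundRemover, the method names and the threshold value are all hypothetical. It also assumes both images are the same size.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of the vbp classes.
class BackgroundRemover {

    // Sum of the absolute differences of the red, green and blue channels.
    static int difference(int rgbA, int rgbB) {
        int dr = ((rgbA >> 16) & 0xFF) - ((rgbB >> 16) & 0xFF);
        int dg = ((rgbA >> 8) & 0xFF) - ((rgbB >> 8) & 0xFF);
        int db = (rgbA & 0xFF) - (rgbB & 0xFF);
        return Math.abs(dr) + Math.abs(dg) + Math.abs(db);
    }

    // Keeps pixels that differ from the stored background by more than
    // `threshold`; everything else is painted with `fillRgb`.
    static BufferedImage removeBackground(BufferedImage background,
                                          BufferedImage frame,
                                          int threshold, int fillRgb) {
        int w = frame.getWidth();
        int h = frame.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int live = frame.getRGB(x, y);
                boolean changed = difference(live, background.getRGB(x, y)) > threshold;
                out.setRGB(x, y, changed ? live : fillRgb);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        BufferedImage background = new BufferedImage(8, 6, BufferedImage.TYPE_INT_RGB);
        BufferedImage frame = new BufferedImage(8, 6, BufferedImage.TYPE_INT_RGB);
        frame.setRGB(3, 2, 0xFF0000); // something new walked into the scene
        BufferedImage out = removeBackground(background, frame, 30, 0xFFFFFF);
        System.out.println(Integer.toHexString(out.getRGB(3, 2) & 0xFFFFFF));
    }
}
```

The threshold is there because of the signal noise: set it too low and the background flickers through, too high and parts of the foreground disappear.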

The Classes: You should download this vbp.jar file. It contains all the necessary classes (essentially just PixelSource and ImageWrangler). The files are compressed in that one file and there is no need to uncompress them; Java can go inside and see them.

System Installation: It is possible that if some video input and QuickTime are installed on your machine you will not have to change a thing. On the other hand it may turn out that you spend more time getting drivers and files in the right places so that you can see any video at all than you do writing your software to do crazy things with that video. This is the part I really hate so here are some tips from my experience.

Your Coding Environment: If you are an experienced Java programmer you will skip this section. In addition to getting your machine's system set up to do QuickTime video, you will have to make sure whatever you use to compile and run your Java code is seeing the necessary classes (yourcode.java, QTJava.zip, vbp.jar, rt.jar) in its classpath. Here are some descriptions of how to do this in the command line, in Textpad and in Codewarrior.

Documentation: Java Docs for the video by the pixel classes: PixelSource, ImageWrangler, ImageWranglerOld. The documentation for QTJava, which was used to make these classes, is at http://qtj.apple.com/pub/doc/index.html and basic code for getting is here.

Your Code: The main class that you will use is called PixelSource. Basically you will make a repeat loop to visit all the rows and then another repeat loop inside that one to visit all the pixels in the row. Here is a snippet.
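In case the two loops are hard to picture, here is a bare sketch of the pattern. I am not reproducing the real PixelSource calls here; a BufferedImage stands in for the video frame, and its getRGB and setRGB play the role of getPixel and setPixel. The class name PixelLoop and the threshold example are made up for illustration.

```java
import java.awt.image.BufferedImage;

// Hypothetical example, not part of the vbp classes.
class PixelLoop {

    // Visits every pixel and turns it pure white or pure black
    // depending on whether its brightness reaches `cutoff`.
    static void threshold(BufferedImage image, int cutoff) {
        for (int y = 0; y < image.getHeight(); y++) {        // every row
            for (int x = 0; x < image.getWidth(); x++) {     // every pixel in the row
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                int brightness = (r + g + b) / 3;
                image.setRGB(x, y, brightness >= cutoff ? 0xFFFFFF : 0x000000);
            }
        }
    }

    public static void main(String[] args) {
        BufferedImage image = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        image.setRGB(1, 1, 0xE0E0E0); // one bright pixel on a black image
        threshold(image, 128);
        System.out.println(Integer.toHexString(image.getRGB(1, 1) & 0xFFFFFF));
    }
}
```

Whatever you put in the middle of the inner loop (compare, count, recolor) is where your own scheme goes; the loops themselves stay the same.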

Examples:

*Application that will work with Java 1.1 (i.e. on Mac OS 9)