# Thesis – Week 08 (progress)

Face Recognition, XML, initial clusters…
Closing in on a working backend system

We, being myself and Akira Shibata, accomplished a lot over the break. The current face recognition system, based on Phillip Wagner’s Eigen Face implementation, is working and gives us a good starting point that we can improve on later. For now it provides fit values that we can use to cluster faces.

Above is an Eigen Face reconstruction built from about 500 faces that we extracted. For fun, we ran our PCA (Principal Component Analysis) test with the test image included in the training set (below). The result is clear: the reconstruction rapidly converges on the image. Beside it is a colorized version of the grayscale Eigen Faces, a useful method for showing variation that the human eye has a hard time discerning in gray. There is also a short animation combining the progression of all the Eigen Faces.
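To make the convergence concrete, here is a minimal sketch of the idea using NumPy's SVD rather than the implementation we actually run. The function names `fit_eigenfaces` and `reconstruct` are made up for illustration, and the "faces" are random arrays standing in for real images. It projects a face that is part of the training set onto a growing number of Eigen Faces and measures how far the reconstruction is from the original.

```python
import numpy as np

def fit_eigenfaces(faces):
    """faces: (n_samples, n_pixels) array of flattened grayscale images."""
    mean = faces.mean(axis=0)
    centered = faces - mean
    # Rows of vt are the principal axes (the "eigenfaces"), ordered by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt

def reconstruct(face, mean, eigenfaces, k):
    """Project a face onto the first k eigenfaces and map it back to pixel space."""
    basis = eigenfaces[:k]
    weights = basis @ (face - mean)
    return mean + basis.T @ weights

# Synthetic stand-in data: 50 random "faces" of 16x16 pixels (not real images).
rng = np.random.default_rng(0)
faces = rng.random((50, 256))
mean, eigenfaces = fit_eigenfaces(faces)

test_face = faces[0]  # deliberately taken from the training set
errors = [np.linalg.norm(test_face - reconstruct(test_face, mean, eigenfaces, k))
          for k in (1, 10, 25, 49)]
```

Because the test face is one of the training samples, the error in `errors` can only shrink as more components are used, and it drops to roughly zero once all the meaningful components are included. A face from outside the training set would generally keep some residual error.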

XML

Next, we began anticipating the need to make data available to a front-end interactive system. We researched a number of methods, including existing XML schemas, but nothing fit our needs, so we created our own format, which our Python code can generate:

```
<Photo>
  <DayOfWeek>Wednesday</DayOfWeek>
  <Month>3</Month>
  <UnixTime>1268248703.0</UnixTime>
  <NumFaces>0</NumFaces>
  <Longitude>35.6816666667</Longitude>
  <Filename>IMG_2255.jpg</Filename>
  <Latitude>139.706833333</Latitude>
  <Year>2010</Year>
  <Date>10</Date>
  <TimeOfDay>14:18</TimeOfDay>
  <Size>(640, 480)</Size>
</Photo>
```
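As a rough illustration of how a record like this can be produced, the sketch below uses Python's standard `xml.etree.ElementTree` module. It is not our actual generator: the helper name `photo_to_xml` and the idea of passing the metadata as a plain dictionary are assumptions made for the example, and only a few of the fields are shown.

```python
import xml.etree.ElementTree as ET

def photo_to_xml(fields):
    """Build a <Photo> element with one child element per metadata field."""
    photo = ET.Element("Photo")
    for tag, value in fields.items():
        ET.SubElement(photo, tag).text = str(value)
    return photo

# A subset of the fields from the sample record above.
fields = {
    "Filename": "IMG_2255.jpg",
    "UnixTime": 1268248703.0,
    "NumFaces": 0,
    "Size": (640, 480),
}
photo = photo_to_xml(fields)
xml_text = ET.tostring(photo, encoding="unicode")
```

Every value is stored as text, so a front end reading the file has to convert fields such as `UnixTime` or `NumFaces` back to numbers itself.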

Clustering

Finally, we began what we had really been waiting for… clustering. We are still finding our way around Matplotlib, but we were able to build an initial time-based cluster and a combination plot with a dendrogram (seen above). Because we tend to think of time in a linear fashion, the plotted representation may seem counterintuitive at first, but this cluster is our first attempt at very rudimentary event detection. We are essentially trying to find a likeness among the photos based entirely on the EXIF time data, represented in UNIX time. The dendrogram method arranges the clusters in a rough sequence from left to right, but it also avoids intersecting itself, so there is some shuffling. Even so, we can interpret the dendrogram as illustrating a series of larger events containing various levels of sub-events. The dark blue represents more recent events, with the lighter colors receding into the past. One can imagine how this method of event detection could become more interesting with histogram and faces combined, but already it shows a different and possibly interesting way of interpreting one’s personal data.
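The nesting of events and sub-events can be sketched without any plotting. The snippet below is a simplified stand-in, not the clustering code behind the figure: it assumes single-linkage clustering, where cutting the dendrogram at a given height is equivalent, for one-dimensional data like timestamps, to splitting the sorted times wherever the gap between neighbors exceeds that height. The function name `split_into_events` and the timestamps are made up for illustration.

```python
def split_into_events(unix_times, max_gap_seconds):
    """Group timestamps into events; a gap larger than max_gap_seconds starts a new event."""
    events = []
    for t in sorted(unix_times):
        if events and t - events[-1][-1] <= max_gap_seconds:
            events[-1].append(t)   # close enough to the previous photo: same event
        else:
            events.append([t])     # large gap (or first photo): start a new event
    return events

# Made-up timestamps: two bursts of photos roughly a day apart.
times = [1268248703.0, 1268248760.0, 1268249000.0,
         1268335200.0, 1268335260.0]

events = split_into_events(times, max_gap_seconds=3600)     # coarse events
sub_events = split_into_events(times, max_gap_seconds=120)  # finer sub-events
```

With a one-hour threshold the photos fall into two events, and tightening the threshold to two minutes splits the first event further, which mirrors reading the dendrogram at a lower height.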

This entry was posted in Research Studio: Algorithms, Spring 2012, Thesis.