## December 03, 2007

### Someone stop me before I markov again

Since I had already written the necessary code for my last mapping project, here's the Thesis Title Generator. I scraped the ITP thesis pages for titles, dumped them all into a text file, and trained my Markov algorithm on them. The program generates strings that very much resemble, but are not (or at least, rarely) identical to, thesis titles from years past. Some recent favorites:

• massive Narrative, and their interactive Shoes
• Be My Father
• The Spectacular Interactive Network City

### Mapping: Weekly Response Project

A few weeks ago, Rachel had us fill in a sheet of "responses" to the content of every week of the course. The space allotted to each response was small: one line, maybe one or two sentences (more if your handwriting was small). Later, she directed us to use these responses as the raw material for a map. The class was divided into groups, each group taking responses from a different week (or combination of weeks). I worked with Riddhima, mapping the responses for weeks two and three.

The resulting map is here.

Okay, so it's only a map in a very abstract sense: It's a program that generates text from the weekly responses using a Markov chain algorithm. Here's how it works: the program parses all of the source text (in this case, the student "responses") and breaks it into groupings of n letters; these are called n-grams (or k-grams). It then calculates, for each n-gram, the probability of each letter occurring immediately after it. For example, given this source text:

and
animal
androgynous
animosity
anchor

The n-gram "an" would have a 40% probability of being followed by "d", a 40% probability of being followed by "i", and a 20% probability of being followed by "c". The program above then does a random walk through the map, printing out letters according to their probability and feeding the new n-gram back into the algorithm; the result is a generated text that shares many of the surface features of the source text while not being identical with any portion of it.
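The mechanics described above (build an n-gram table, then do a frequency-weighted random walk) fit in a few lines. The original was written in Perl; here's a rough Python equivalent of the same idea, using the five-word source text above:

```python
import random
from collections import defaultdict

def build_model(text, n=2):
    """Map each n-gram to the list of letters that follow it in the source."""
    model = defaultdict(list)
    for i in range(len(text) - n):
        model[text[i:i+n]].append(text[i+n])
    return model

def generate(model, seed, length=40):
    """Random walk: pick a follower, append it, slide the n-gram window."""
    out = seed
    gram = seed
    for _ in range(length):
        followers = model.get(gram)
        if not followers:  # dead end: this n-gram never appears mid-text
            break
        # random.choice is frequency-weighted here because duplicates are kept
        nxt = random.choice(followers)
        out += nxt
        gram = out[-len(seed):]
    return out

source = "and\nanimal\nandrogynous\nanimosity\nanchor\n"
model = build_model(source, n=2)
# "an" is followed by d, i, d, i, c in the source -- the 40/40/20 split above
print(sorted(model["an"]))  # ['c', 'd', 'd', 'i', 'i']
print(generate(model, "an"))
```

The generated output is different on every run, which is the point: the probability map stays fixed while the walk through it varies.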

In essence, the program is building a probability map of the raw text. In the process, it reveals the lexical and structural similarities in all of our responses. The resulting texts are humorous (or at least, I think they are!), but the process of generating them is subversive: just like a well-made map, it subjects the underlying topography to new readings.

Source code is available on request (if you really need to see a trivial implementation of a Markov chain in Perl...). The original transcription is here.

## September 18, 2007

### Mapping Week 2: Methods and Mapping

In this week's selection from Visual Explanations, Tufte concerns himself with graphing data—specifically, the data from John Snow's investigation of the Broad Street cholera outbreak in 1854. His main point is that (what he calls) aggregation—both spatial and temporal—can "mask relevant detail and generate misleading signals" (p. 36), which can in turn lead to an incorrect interpretation of the data. Tufte draws a distinction between "method" and "reality"—the former being bias introduced by aggregation techniques, and the latter being the "true story of the data." He goes on to note:

A further difficulty arises, a result of fast computing. It is easy now to sort through thousands of plausible varieties of graphical and statistical aggregations—and then to select for publication only those findings strongly favorable to the point of view being advocated. (p. 37)
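Tufte's worry about temporal aggregation is easy to reproduce. A quick sketch (the numbers are invented for illustration, not taken from Snow's data): a flat daily baseline with one dramatic spike looks unremarkable once the days are summed into weekly bins.

```python
# Hypothetical daily counts: a flat baseline of 5 with one 8x spike on day 11
daily = [5, 5, 5, 5, 5, 5, 5,   # week 1
         5, 5, 5, 40, 5, 5, 5]  # week 2

# Temporal aggregation: sum into weekly bins
weekly = [sum(daily[i:i+7]) for i in range(0, len(daily), 7)]
print(weekly)  # [35, 70]
```

The single-day spike (an 8x anomaly) flattens into a mere doubling between weeks; the "relevant detail" is masked by the choice of bin size.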

What interests me here is that Tufte seems to take for granted the accuracy of the data—as if the collection of data is free from political and rhetorical considerations. But the process of collecting data is itself a kind of mapping: you have to decide which chunks of reality are relevant, how to formalize those chunks, how to digitize them. So data visualization is, in a sense, a map of a map, doubly subject to the problems of subjectivity and arbitrariness that Tufte mentions.

So the question is this: can data visualizers take the data as basic? Or does the process (or potential) of data visualization itself have an effect on how data is collected? (The analogy here is with the observer's paradox, or even the uncertainty principle: observing the world to collect data also changes the world that you're observing.) Do researchers (unconsciously?) practice data collection techniques that create data that is more easily visualized? Conversely, do data visualizers seek out data that is easy to visualize? Or, taking a step back: do we organize our world in a way that encourages certain methods of data collection and data visualization?

An even better question: How do these questions come to bear on Tufte's assertion that "the reason we seek causal explanations is in order to intervene, to govern the cause so as to govern the effect" (p. 28)?

## September 10, 2007

The introduction to Else/Where is more or less a set of entry points, so I'm going to use it as an entry point into something I've been thinking about recently. Specifically, this article: Thoughts on the Social Graph by Brad Fitzpatrick (best known as the guy who created LiveJournal). The social graph, according to Fitzpatrick, is "the global mapping of everybody and how they're related," particularly in reference to social software. He goes on:

Unfortunately, there doesn't exist a single social graph (or even multiple which interoperate) that's comprehensive and decentralized. Rather, there exists hundreds of disperse social graphs, most of dubious quality and many of them walled gardens.

Fitzpatrick uses this article to present a project whose goal is to "make the social graph a community asset." In other words, if you're friends on LiveJournal, you should be friends on Facebook, and vice versa; it should, moreover, be trivial for emerging social networking sites to get their hands on this data. The idea rests on the assumption that all of these relationships are entirely analogous: if we could all just pull together and cooperate, we wouldn't have to rebuild our social networks on every site we join.

The problem is that these relationships aren't entirely analogous or, at least, there's room for questioning the analogy. Is being friends on Facebook really the same thing as being friends on LiveJournal? Are LiveJournal friends exactly like MySpace friends? Flickr contacts? Last.fm "neighbors"? It strikes me that the functionality and semantics of the "friend" relationship on each of these sites is very different. The software uses the relationship for different functions (hiding information, revealing information, making recommendations, etc.); different criteria must be met in order for someone to be counted as a "friend" (or "neighbor" or "contact" or whatever).
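One way to see the flattening at work: model each site's relationship as a set of site-specific properties, then watch what a unified "social graph" export has to throw away. The particular properties below are invented for the sketch, not taken from the actual sites' feature sets.

```python
# Illustrative only: each site's relationship carries different semantics.
# These property assignments are hypothetical, chosen just to show variation.
connection_semantics = {
    "Facebook friend":    {"mutual": True,  "grants_access": True},
    "LiveJournal friend": {"mutual": False, "grants_access": True},
    "Flickr contact":     {"mutual": False, "grants_access": False},
    "Last.fm neighbor":   {"mutual": True,  "grants_access": False},
}

# A naive unified graph keeps only the edge and discards the semantics:
exported = {name: "friend" for name in connection_semantics}
print(exported)
```

Every edge in the export is just "friend"—the map has erased exactly the distinctions that made each territory worth inhabiting.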

The belief that these relationships are somehow, deep down, essentially the same is, in my opinion, a mistake. It's a problem of mapping: a classic confusion of the map and the territory. The formalization of the system (the map) accounts for only a subset (maybe even an imaginary subset) of the "friend" relationship in practice (the territory).

It occurs to me that these subtle differences in the way relationships are created, maintained, and abandoned, and in what the relationship means in terms of the way the software works, are all factors in what makes a particular social site unique. (E.g., being able to easily find and grant privileges to schoolmates on Facebook is essentially what made Facebook popular.) Moreover, the fact that these social software sites exist in separate social "enclaves" may actually be a feature—it functions as a way of partitioning different social groups and different ways of socializing.