Reading and Writing Electronic Text – Midterm Project

Midterm Project (Sapphic stanza and fragmented text)

While considering some of the different methods for generating poetry and text in our readings, I was reminded of the Ancient Greek poet Sappho (630 – 570 BC). One of the things that has always intrigued me about Sappho is how little of her writing remains. Of the nine books of her poetry only one complete poem survives, the Hymn to Aphrodite; the rest are only fragments. I have always found that the incompleteness of the poems gives them a kind of incredible power, through their mystery and through speculation as to what might have preceded or followed them.

This made me consider the fragments outside of typical literary academia, and how we might reconsider the definition of an algorithm for generating text. The reasons Sappho’s poetry did not survive are varied, though much of the loss is due to it simply falling out of fashion in the schools of Ancient Rome. Though it would please many scholars to find the complete work of Sappho, its partial destruction could also be considered a very long, culturally driven algorithm toward a unique and wonderful text wholly separate from its author’s intentions.

This is how I decided to write a basic program that generates poems in approximated Sapphic stanza from a corpus of combined writings, and then to devise a means of partially destroying the generated text… with the intent of creating something unique in its apparent incompleteness.

The Code

The Sapphic stanza can be loosely approximated as 11, 11, 11, 5 syllables, though ancient Greek has more complex syllabic rules… so this was what I aimed for using a two part method for counting syllables:

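from nltk.corpus import cmudict

d = cmudict.dict()

# Syllable counter - If dictionary fails, vowel groups method guesses
def count_syllables(word):
	word = word.lower()
	if word in d:
		# count stress-marked (vowel) phonemes in the first listed pronunciation only;
		# summing over every pronunciation over-counts words that have variants
		return len([y for y in d[word][0] if y[-1].isdigit()])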
	else:
		word = word.lower()
		if word.endswith('e'):
			word = word[:-1]
		vowels = 'aeiou'
		in_vowel_group = False
		syllable_count = 0
		for letter in word:
			if letter in vowels:
				if not in_vowel_group:
					in_vowel_group = True
					syllable_count += 1
			else:
				in_vowel_group = False
		return syllable_count

The rest of the code iterates over a frequency dictionary built from a text file and builds the stanzas line by line. While I had been trying to implement a Markov-chain method to generate the lines more dynamically, I was unable to combine it successfully with the syllable counter. The current method uses word frequency (borrowed from Erik Mika’s Geo Haiku).
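The stanza-building code itself is not shown in the post, so the following is only a minimal sketch of the idea, written in Python 3 and assuming a few things that are not in the original: the helper names (`rough_syllables`, `build_line`, `build_stanza`), the tiny sample corpus, and the choice to pick words at random weighted by their frequency. It uses only the vowel-group estimate rather than the CMU dictionary so that it runs without NLTK.

```python
import random
from collections import Counter

def rough_syllables(word):
    """Vowel-group estimate only (the fallback method above); no dictionary lookup."""
    word = word.lower()
    if word.endswith('e'):
        word = word[:-1]
    count, in_group = 0, False
    for letter in word:
        if letter in 'aeiou':
            if not in_group:
                count += 1
            in_group = True
        else:
            in_group = False
    return count

def build_line(freq, target, rng):
    """Fill one line to exactly `target` syllables, favouring frequent words."""
    words, remaining = [], target
    while remaining > 0:
        # only consider words that still fit in the syllables left on this line
        fits = [w for w in sorted(freq) if 0 < rough_syllables(w) <= remaining]
        if not fits:
            raise ValueError('no word fits the remaining syllables')
        word = rng.choices(fits, weights=[freq[w] for w in fits], k=1)[0]
        words.append(word)
        remaining -= rough_syllables(word)
    return ' '.join(words)

def build_stanza(freq, rng, pattern=(11, 11, 11, 5)):
    """One approximated Sapphic stanza: three 11-syllable lines and one of 5."""
    return [build_line(freq, n, rng) for n in pattern]

# hypothetical stand-in for the frequency dictionary built from the corpus file
corpus = "honey garlands over beauty light dawn flower little friend roses gifts dance sleep"
freq = Counter(corpus.split())
print('\n'.join(build_stanza(freq, random.Random(0))))
```

Because the vocabulary contains one-syllable words, a line can always be completed; with a real corpus the same check (`fits` being empty) is where a line would need to be abandoned and retried.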

A few stanzas prior to destruction:

shook honey slender stand beauty mad man light
garlands over then them dika dawn out
flower care turn blossomed little their friend zeus
adonis flowers

as roses gifts age hesperus dance
sleep skin follow seemed whose tremors young seized far
shepherds sly heaven nectar did die troy
work regret

dance sleep skin follow seemed whose tremors young
seized shepherds far sly heaven nectar die
did troy work regret suppliants
go persuade stands

Destruction Methods

After generating a length of stanzas (of varying quality) I set out to find ways to partially destroy the text. I felt that the method should not be code-based, or at least that its logic should be separate from the method for creating the text, but still related to the text’s medium: the digital.

My first attempt, seen at top and in other forms, was to grow the body of poems to fill entire CD-R discs, essentially repeating them until the file size was just around 600MB. After burning the CD-Rs I employed various methods to damage the discs, including putting them into a microwave oven (the result is seen at top alongside a partial papyrus of Sappho). After this I attempted a forced, “no errors” RAW-read dump of the entire CD-R to an image file. I tried “readcd”, a Linux utility, as well as a Windows application for cloning Nintendo Wii games. With more time I think the Linux utility could work, but I was never able to get an image file successfully dumped. The plan after this was to use this Python script that recovers text from corrupted files.

Eventually I resorted to an equally ridiculous method: (1) creating a blank 1-bit black and white bitmap file in Photoshop; (2) opening the file in a hex editor and pasting the generated poems directly into the bitmap, as seen here:

(3) I then opened the bitmap in Photoshop and tried various methods of “messing” with the visual form of the text (this involved converting the bitmap to grayscale for editing and then back to bitmap before saving):

I tried manually smearing the text areas as well as applying different levels of visual noise (both Gaussian and uniform) to the image files. (4) I then reopened the bitmap files in the hex editor and extracted the text back to a text file. Surprisingly, much of the formatting was retained at times. Here are some of my resulting texts:

With 50% uniform noise:

against id make inch lie
honeybee dElight owment fierce weav% thought
colour without hearts shadowed flesh√°reing yoking

shivering hades humanmelodious woun`ing excites ply world
parents rosy easy ghnstly$has
unseen lady

wings cooleb
Because
often fathers immortal burl
altars scatters pasture plosades step
ever}th)ng gained

With 25% Gaussian noise:

dÈnce s|eep sKÈn fÔllow seemed whose uremoRs young
soÀázed`qhepherds fa2 sly(heataf nectqr(die
dit tRoy wgrk rmwret ss
p,iants
oo pessu·dE spand3

snaras {appho gges!flowing†ever bgoˇ
full ddd festife nera water others
alone path hoplitew boy0crc
u miughtar
ne`ves leavinÁ stray

sweetfgss bittercweet broıght/ˇpoke woumt anrwered
prefer wrongs meiory
`orses holy recall
bsc
zes pour fewjs

With some manual finger painting:

daughter glittering c≠ple this its
came will if hair moon by no bringing
aphrodite op[house one girls shell
mind deathless clear child

are tender fn_gotten voice
things when back down there stars dying
once crethcminded graces filled ardent
lesbos win armed high

shook honey)ûlender stand beauty mad man light
garlands over then them dika)Ô¨Çawn outflower care turn “loÀöso√åed,ittle
4heir friend zeus
adeÀúis floƒ±rsmoÀùnt`in 7¬∑rrioÀôs 3h ll,ef|”l)cker
cytherea brin`¬™earth&z ‘Àô`o|3ee !o¬∏d7ƒ±!z/Àö‚Ć
o˝ /˙{ıouo %qual loveliest wvÌath%{ˇıu~iz%
/ˇ˚Ä&n!e~!n(ap>!/ˇ˜˝w†/ff mother gods ynØ
ar/ˇˇÙ:+†!{˝z%˛ p 0:¸(/Û†˙x
e / was up
us an
vas /ÀáÀá‚ĆÀùw~;‚Ć#ƒ±¬∏>?‚Ć√íuÀá 4|Àù√Ño√è,ow seemed whotr |2%ÀáÀá?√ÑÀáÀáÀú‚İÀù|?ƒ±t ¬∏<> √Ñ?√Ñ0Ô£ø/ec|ar did die {√ôo}?ÀáÀá‚Ć?ÀáÔ£ø?ÀáÀõ Àá√ѬØÀá¬Ø?√ô;op:es sappho goe[ “?Àá?ÀáÀáÀáÀáÔ£ø¬∏?Àá√Ñ0Àá¬∏Àá√Ѭ∏xÀõ(%zs alone pas√ñ (ÀõÀáÀá√ѬØÀá‚İÀáÔ£øÀáÀá√®Ô£ø√ÑÀá¬ØÀõ√Ñ8Àú‚Ć√Çetness bc¬´< ÀáÀáÀõ√ÑÀáÀáÀá√ÑÀáÀá¬øÀáÀáÀá‚İ?ÀáÀáÀáÀá¬∏Àõ?√ÑÀá?Àù1pƒ±mory

hor}‡1ˇˇ¿ˇ‡?˛¯ˇˇ¸ˇˇ‡ˇ tsip end&ÿ ?˛‡Á¿‡ˇˇ¯ˇˇˇˇˇ¿ˇ¿¸0%member,E?Ùˇˇ˛ˇˇˇˇˇˇ¯ˇˇˇ£ˇ˛„ˇˇ¸?p3uffer hø<ˇ
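The point of the method above was to keep the destruction outside of code, but for anyone without Photoshop and a hex editor, the bitmap round trip can be very loosely imitated in software. The sketch below (Python 3) is an assumption-laden stand-in, not what was actually done: it treats every bit of the text as one pixel of a 1-bit image and flips each bit independently with a fixed probability. The function name `flip_bits` and the `rate` parameter are made up for illustration, and `rate` is not comparable to Photoshop's noise percentages.

```python
import random

def flip_bits(text, rate, seed=0):
    """Flip each bit of the text's bytes with probability `rate`."""
    rng = random.Random(seed)
    data = bytearray(text.encode('latin-1', errors='replace'))
    for i in range(len(data)):
        for bit in range(8):
            if rng.random() < rate:
                data[i] ^= 1 << bit  # invert one "pixel"
    # latin-1 maps every byte value to a character, so decoding never fails
    return data.decode('latin-1')

print(flip_bits("shook honey slender stand beauty mad man light", 0.02))
```

Low rates leave most words readable with the occasional damaged letter, which is roughly the texture of the noise results above; the smeared results, where whole runs of text are wiped out, would need damage applied to contiguous regions instead of independent bits.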

Posted in Reading and Writing Electronic Text, Spring 2011

Design Frontiers – Midterm Project

Midterm Project (Exploring the Vacuum)
a joint project with Johnny Lu

Our project focused on a less structured approach to materials exploration. We set out with a very general thesis… build a specialized environment, a vacuum chamber, and see what kind of interesting results could be achieved with unique materials and material states within it.

The Chamber

We fabricated our chamber using a 9 inch stainless steel vessel obtained from a restaurant supply, an air-driven vacuum pump, high-pressure ball valves, a negative pressure gauge, a silicone gasket, 0.7 inch acrylic, and plywood. The air-driven pump was rated at -28.3″ Hg (95 kPa).

Our initial tests achieved about -15″ Hg (50 kPa). Later we were able to bring the reading down to nearly -28″ Hg (93 kPa) after we increased the drive compressor from 90 psi to around 120 psi. The chamber was able to hold a vacuum for an extended period without any signs of leaking, a lucky first success!
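The readings here are quoted in inches of mercury with kPa in parentheses. As a quick check on the units, here is a small Python 3 sketch of the conversion, assuming the conventional factors of 1 inHg ≈ 3.386389 kPa and 1 psi ≈ 6.894757 kPa. The function names are only illustrative, and the kPa figures in the post are rounded gauge readings, so small differences from the computed values are to be expected.

```python
KPA_PER_INHG = 3.386389   # conventional inch of mercury
KPA_PER_PSI = 6.894757

def inhg_to_kpa(inches_hg):
    """Convert a vacuum reading in inches of mercury to kilopascals."""
    return inches_hg * KPA_PER_INHG

def psi_to_kpa(psi):
    """Convert a compressor pressure in psi to kilopascals."""
    return psi * KPA_PER_PSI

for reading in (15, 28, 28.3):
    print(reading, 'inHg of vacuum is about', round(inhg_to_kpa(reading), 1), 'kPa')
for drive in (90, 120):
    print(drive, 'psi drive pressure is about', round(psi_to_kpa(drive)), 'kPa')
```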

A short video of initial testing:

We decided that our first experiments would focus on the effects of a vacuum on super-saturated solutions, namely: sugar in water and plaster of paris mixed with CO2 saturated water. The idea was that super-saturated solutions are stabilized by specific atmospheric and thermal conditions and that we could potentially alter those conditions within a vacuum, particularly while the solutions were undergoing transitional phases such as the cooling of a sugar solution or the exothermic reaction of plaster into a solid.

Sugar

We prepared a heated solution of 300 ml of distilled water with about 1.3 kg of sugar (C12H22O11), which was the maximum amount that would dissolve.

We transferred the mixture to the vacuum chamber and began decompressing the chamber to 90 kPa. The reaction was nearly instantaneous… the solution bubbled over, releasing a large amount of the water as vapor, after which the remaining sugar seemed to instantly crystallize and harden. The water vapor was released so quickly that it obscured our viewing window:

After reaching maximum vacuum we let the mixture cool for 15 minutes in the vacuum, then opened the chamber to reveal a super-hardened, cavitied mass of solid sugar:

After drilling a sample core of the hardened sugar we could clearly see that the violent release of the mixture’s water had left a structure of cavities in the sugar.

We partially dissolved the hardened sugar with hot water to see it more clearly. The structure was very hard and resembled coral or lava rock:

Plaster

Next we prepared two mixtures of plaster (calcium sulphate hemihydrate, CaSO4·1/2H2O), one with distilled water and another with CO2 saturated water.

We expected little to happen with the distilled water mixture, but were not entirely sure what to expect from the super-saturated water.

As we evacuated the chamber both solutions began to bubble, but as the vacuum increased the CO2 saturated mixture began to expand rapidly:

We maintained the pressure and allowed the mixtures to set. After removing them from the chamber the CO2 mixture revealed a cavitied structure similar to the sugar.

While it is often standard to de-gas casting materials in a vacuum chamber prior to casting, we thought the resulting in-vacuum cast to be very interesting, essentially creating a lighter weight equivalent volume of cast material.

Ulterior Motives (X-rays)

The fabrication of the vacuum chamber was motivated in part by the possibility that it might facilitate an inexpensive X-ray source using scotch tape, a phenomenon researched at the University of California, Los Angeles in 2008.

If we are able to produce X-rays with the chamber we hope to use it to affect mutation rates in different plant seeds.

Another possibility would be to make a cloud chamber.

Posted in Design Frontiers, Spring 2011

Learning Bit by Bit – Hidden Markov Models

Part of Speech Tagging and HMM’s

I thought it might be appropriate to begin this posting on Hidden Markov Models and POS with the above video… I came home last night to find my roommate watching this on YouTube: a live concert of Hatsune Miku, a synthetic Japanese idol developed from Yamaha’s Vocaloid software. The software offers an interface for building synthesized vocals from lyric text. I’ve used various versions of the software in the past and found its interpretation of plain dictionary text, without using the custom phonology characters, to be quite impressive. In any case it’s amazing to see a crowd of people so immersed in a rear projection of an animated character: in this case a fan base and mythology created entirely from TTS. …It felt like a nice vignette after witnessing Watson’s victory a few weeks ago, and as we start to think more about language recognition and synthesis… in relative terms, a milestone achieved not through logic but through an emotive form.

Using POS tagging / HMM’s with text generation

I was able to tag “War and Peace” from last week with parts of speech with a decent level of accuracy (minus the Russian and French at times)… however, I struggled to integrate this with a language generation model in LingPipe or NLTK, something I am really interested in doing.
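For readers new to the idea, an HMM tagger picks the most probable sequence of hidden tags for a sentence, given how likely each tag is to follow another and how likely each tag is to emit each word. The Python 3 sketch below is a toy Viterbi decoder with three tags and hand-invented probabilities. It is not the LingPipe or NLTK tagger used on “War and Peace”, and every number and name in it is an assumption made purely for illustration.

```python
UNSEEN = 1e-6  # tiny probability for words a tag was never observed to emit

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words` under a toy HMM."""
    # best[i][t] = (probability of the best path ending in tag t at word i, previous tag)
    best = [{t: (start_p[t] * emit_p[t].get(words[0], UNSEEN), None) for t in tags}]
    for word in words[1:]:
        row = {}
        for t in tags:
            row[t] = max(
                (best[-1][p][0] * trans_p[p][t] * emit_p[t].get(word, UNSEEN), p)
                for p in tags
            )
        best.append(row)
    # backtrack from the most probable final tag
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for row in reversed(best[1:]):
        tag = row[tag][1]
        path.append(tag)
    return list(reversed(path))

TAGS = ['DET', 'NOUN', 'VERB']
START = {'DET': 0.6, 'NOUN': 0.3, 'VERB': 0.1}
TRANS = {
    'DET':  {'DET': 0.05, 'NOUN': 0.9, 'VERB': 0.05},
    'NOUN': {'DET': 0.1,  'NOUN': 0.3, 'VERB': 0.6},
    'VERB': {'DET': 0.5,  'NOUN': 0.4, 'VERB': 0.1},
}
EMIT = {
    'DET':  {'the': 0.9, 'a': 0.1},
    'NOUN': {'prince': 0.5, 'war': 0.4, 'sighs': 0.1},
    'VERB': {'sighs': 0.7, 'war': 0.3},
}

print(viterbi('the prince sighs'.split(), TAGS, START, TRANS, EMIT))
# prints ['DET', 'NOUN', 'VERB']
```

In a real tagger the three tables are estimated from a tagged corpus rather than typed in by hand, and the products are usually computed as sums of log probabilities to avoid underflow on long sentences.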

Text Mash Ups (“Paradise Lost” and Lanier’s “You are not a gadget”)

I found this passage to be pretty amazing:

I Of Man’s first enterprise , and the Facebook Of that forbidden lord whose mortal nonfiction Brought haphazard into the account , and all our woe , With accuser of Eden , till one greater Man Restore us , and regain the blissful mainstream , Sing , Heavenly worse , that , on the arena definition Of opening-of-everyoneâ , or of Sinai , didst Smith That Academic who first taught the chosen Hip-hop In the beginning how the heavens and Catalog Rose out of mobile : central , if stake Second Large thee more , and Siloa’s promotion that flowed Fast by the self-promotion of God , I thence Invoke crash aid to my adventurous silicon , That with no Second-Order childlike intends to soar Above th’ Aonian trolldom , while it pursues Things unattempted yet in interaction or doubt .

Posted in Learning Bit by Bit, Spring 2011

Learning Bit by Bit – Text Generation

Generating text with n-gram language models

Using n-gram probabilistic language models for generating text, I chose a lengthy corpus, Tolstoy’s “War and Peace”, as my main text, complementing it later with “Paradise Lost” by Milton, and finally generating text from three poems by Dylan Thomas.

An initial experiment using the LingPipe example code… below is 75 words generated using a 3-gram model:

swayed and fell asleep . Forgive me for troubling you … ” ” Oh , how splendid ! ” He took the pistol Makar Alexeevich by the French . The soldiers ‘ ward , with a sigh , and their voices reverberated now near to her that it might seem , be it whom it had to be sold , and it seemed to him that the enemy , and it seems to me . ” I dont

Next.. I experimented with a more simplified model in NLTK, also a 3-gram model generating 75 words:

# Adapted from work by Pedro Paulo Balage - http://nlpb.blogspot.com/
 
# Import the functions used from nltk library
# (written for Python 2 and NLTK 2.x; nltk.model is no longer part of NLTK 3)
from nltk.probability import LidstoneProbDist
from nltk.model import NgramModel
import re
 
filename = 'WarAndPeace.txt'
 
tokens = list(re.split('\s+', file(filename).read().lower()))
 
# estimator for smoothing the N-gram model
estimator = lambda fdist, bins: LidstoneProbDist(fdist, 0.2)
 
# N-gram language model with 3-grams
model = NgramModel(3, tokens,estimator)
 
# Apply the language model to generate 75 words in sequence
text_words = model.generate(75)
 
# Concatenate all words generated in a string.
text = ' '.join([word for word in text_words])
 
# print the text
print text

Generating the following text:

war and the weaker and more quickly, as if considering something, and that he would of course… but as his daughter’s distress, and pains in his husbandry. pierre remained for him in french. december 4. today when andrusha (her eldest boy) woke up on the contrary. but no rain. the ground ten paces ahead. bushes looked like a wound-up clock, by force of habit employed all his stewards to the young princess bolkonskaya had brought
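Since the `NgramModel` class used above is no longer shipped with current NLTK, here is a minimal sketch of the same idea in plain Python 3. It is not the NLTK implementation: there is no smoothing at all, the next word is simply sampled from the words that followed the previous two in the corpus, and the function names and the sample sentence are made up for illustration.

```python
import random
from collections import defaultdict

def train_trigrams(tokens):
    """Map each pair of consecutive words to the list of words seen after it."""
    model = defaultdict(list)
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        model[(a, b)].append(c)
    return model

def generate(model, n_words, rng):
    """Generate n_words by repeatedly sampling a follower of the last two words."""
    contexts = sorted(model)
    out = list(rng.choice(contexts))
    while len(out) < n_words:
        followers = model.get(tuple(out[-2:]))
        if not followers:
            # dead end (the pair only occurs at the very end of the corpus):
            # borrow the followers of a randomly chosen context instead
            followers = model[rng.choice(contexts)]
        out.append(rng.choice(followers))
    return out[:n_words]

tokens = "the prince looked at the princess and the prince smiled at the count".split()
model = train_trigrams(tokens)
print(' '.join(generate(model, 20, random.Random(3))))
```

Repeated followers stay in the lists, so frequent continuations are proportionally more likely to be picked. Swapping the sample sentence for the lower-cased tokens of a full novel gives output in the same spirit as the examples in this post.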

Then with the addition of “Paradise Lost”:

file1 = 'WarAndPeace.txt'
file2 = 'milton-paradise.txt'
 
tokens = list(re.split('\s+', file(file2).read().lower()))
tokens.extend(list(re.split('\s+', file(file1).read().lower())))

paradise lost and her surprise, nonrecognition, and with your excellency?” but still following announcement of three hills were lines more miserable.” prince andrew’s eyes continually changing their prayers could not at all, isn’t it?” asked these earthly life. when he declared, was five too mean i put his activities there. when the appointed place thyself aright. so he has to prevent looting, and accuses himself.” boris did not know you take them all foreigners who

And then with three poems by Dylan Thomas:

file1 = 'DeathShallHave.txt'
file2 = 'AfterTheFuneral.txt'
file3 = 'ProcessInWeather.txt'
 
tokens = list(re.split('\s+', file(file1).read().lower()))
tokens.extend(list(re.split('\s+', file(file2).read().lower())))
tokens.extend(list(re.split('\s+', file(file3).read().lower())))

and death shall have no dominion. dead mean naked they shall be one with the man in the flesh and bone is damp and dry; the golden shot storms in the dark of the fox twitch and cry love and the heart gives up its dead.  after the funeral, mule praises, brays, windshake of sailshaped ears, muffle-toed tap tap happily of one peg in the flesh and bone is damp and dry; the quick

Posted in Learning Bit by Bit, Spring 2011

Reading and Writing Electronic Text – NLTK

An Introductory Experiment with NLTK
(Natural Language Toolkit for Python)

Let me start by saying that while it was my every intention to play with Sets and Dictionaries and Comprehensions… I somehow could not escape the allure of NLTK and N-gram language models for generating text… Something that we will be covering soon. (so… I’ll spend the weekend doing what I was supposed to be doing).

Below is a simple implementation of NLTK’s Ngram model using Lidstone smoothing (which I don’t understand yet). I used the entire text of “War and Peace” as a corpus with a 3 gram model. With this I generated 75 words of unique text.

The Code:

# Adapted from work by Pedro Paulo Balage - http://nlpb.blogspot.com/
# Import the functions used from nltk library
 
from nltk.probability import LidstoneProbDist
from nltk.model import NgramModel
import re
 
filename = 'WarAndPeace.txt'
 
tokens = list(re.split('\s+', file(filename).read().lower()))
 
# estimator for smoothing the N-gram model <<This is beyond me
estimator = lambda fdist, bins: LidstoneProbDist(fdist, 0.2)
 
# N-gram language model with 3-grams <<Each word is predicted from the two words before it
model = NgramModel(3, tokens,estimator)
 
# Apply the language model to generate 75 words in sequence
text_words = model.generate(75)
 
# Put everything back together.
text = ' '.join([word for word in text_words])
 
# print the text
print text

Some examples of generated text:

3 gram Text:
war and the house. those who were quite changed. and i have been calling you all this horror and curiosity at his host, sorted his packages and asked when he was not dancing it was all that belongs to history, in theory, rejects both these principles. it would still remain piteous and plain. after two in the village and turn to look round, but his trembling, swollen lips could not speak of the order.

2 gram Text:
he had been with faces he needs fresh moral strength, affirming the greater part in speranski, either declare his glass of another, pointing to him and that princess mary was plain and frightened faces. (he saw that appellation, which were stationed inactive behind the staff, a law of the second staircase led him along the lives in such a red and indistinctly, without proper posts. boris had to start running, out
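On the Lidstone smoothing used in the estimator above: the idea is to add a small constant gamma to every count before turning counts into probabilities, so that words never seen after a given context still get a little probability instead of zero. With a count c for a word, N observations in total and B possible outcomes ("bins"), the estimate is (c + gamma) / (N + B * gamma). A small Python 3 sketch of that formula follows. The function name and the example counts are made up, and gamma = 0.2 matches the value passed to `LidstoneProbDist` in the code.

```python
def lidstone(count, total, bins, gamma=0.2):
    """Lidstone-smoothed probability: (count + gamma) / (total + bins * gamma)."""
    return (count + gamma) / (total + bins * gamma)

# hypothetical counts of the words seen after one two-word context
counts = {'prince': 3, 'princess': 7, 'count': 0, 'emperor': 0, 'general': 0}
total = sum(counts.values())
bins = len(counts)

for word, c in counts.items():
    print(word, round(lidstone(c, total, bins), 4))
```

The unseen words each receive 0.2 / 11, the seen words give up a little of their share, and the probabilities still sum to one. A gamma of 1 would be the familiar add-one (Laplace) smoothing.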

Posted in Reading and Writing Electronic Text, Spring 2011

Nature of Code – Particle Systems

Particle Systems (Blob Reactor)

A simple exercise with particle systems using inheritance and OpenGL.

Applet and Source

Posted in Nature of Code, Spring 2011

Design Frontiers – Midterm Proposal

Midterm Proposals:

#1: E-Field Camera

#2: The Effects of Aqueous Ferrofluid on Plant Growth

The proposed concept for the E-Field Camera is to employ the existing Van de Graaff generator from previous projects as a high energy field source in conjunction with a barrel or ring-shaped photo detector as a method for capturing spatial E-Field density and pathways with respect to environmental / structural / material characteristics.

The corona discharge over time will be detected on a photo sensitized barrel consisting of near radio transparent material and black and white photo paper. Depending on test results the ring detector may require an initial phosphor “primer” layer to help the photo detector.

Above is a rendering of what the captured E-Field signature might look like with CRT monitors and other conductive materials in the environment producing “hot spots”.

#2: The Effects of Aqueous Ferrofluid on Plant Growth

This proposal would hopefully expand on previous work done at Lucian Blaga University in Romania, in which researchers introduced water based ferrofluid into developing plants’ water supply; a published paper (2007) is available here.

While it’s unclear in the published findings whether or not the nano-scale ferrofluid particles were in fact able to cross the root membrane, I hope to replicate the root level results obtained previously, and to add a new group in which I plan to directly inject aqueous ferrofluid into the stem phloem at regular intervals during development with a permanent magnetic source present.

The magnetic source will be present on version 1 groups including the control and the root level. A secondary identical set of these groups without any magnetic source will also be present.

A vague hypothesis would be that the intravenous introduction of ferrofluid into the plant’s vascular system would have an effect on growth via a direct impact on vascular fluid flow by augmenting or compromising capillary action or through magnetic articulation through developing barrier tissue and ferrofluid.

Typically the vehicle and surfactants used in ferrofluid would make the substance detrimental to plant tissue. The aforementioned study used citric acid as a non-destructive surfactant, which may be an option.

Posted in Design Frontiers, Spring 2011

Design Frontiers – Tagny Duff

A brief presentation on artist and researcher
Tagny Duff

This presentation spotlights artist and researcher Tagny Duff’s work, specifically her recent project The Cryobook Archives.

Presentation slides here as a PDF.

Posted in Design Frontiers, Spring 2011

Bit by Bit – Stop Tokenization

Stop Word Sets – Tokenization – Search

When thinking about the importance of a “stop list” or “stop word list”,  essentially a low level gateway through which initial tokenization takes place, it’s hard not to consider on a more general level, what has become the increasingly competitive realm of search technology and how its respective players approach this space differently (or not) including the use of “stop lists” and other key points of search mediation.

We first have to understand that the body of information being searched is in no way finite or perfect, that is… that all the available information on the internet is a dynamically changing structure; a body or set that is changing constantly not only by additional information but by context and various cultural filters that are also changing… and so any effective search method will have to be adaptive and evolve in a number of ways.

This being said, it would appear that the competitive market for search technology is a great space for variation in which each player might take advantage of diverse market demands through specialization. This can be seen in some cases with services like Wolfram Alpha and Ask.com, though seen more often are competitive search models that veer towards homogenization. This could be seen as stemming directly from a comparative user analysis of one service against another… When a user notices that one engine can find something more easily than another, the general assumption is that it can find everything more easily than the other. Perhaps when ad sales hang in the balance we encounter such things as one search engine like Microsoft’s Bing.com directly copying the results of another like Google.com. Until recently such occurrences have been rare at best and are presumably the result of some hard-fisted executive demanding competitive results at whatever the cost. When this happens the question of one service’s approach, including their specific methods of stop tokenization, as opposed to another’s becomes null. The system and market become discreetly hierarchical, with one dominant presence defining the validity of others below it.

On the other hand, specialized approaches to tokenization are hugely important, especially with regard to personalization services in which a system learns specifically from the patterns of a single user, custom tailoring relevancy of search results to meet that individual. This however does bring forth another problem, being the possible constraining of an individual’s search experience in which they are limited to a stylized subset of information.

Generally speaking, filtering is a good thing, in that the growing body of searchable information overwhelms our ability to parse it. We require filters, even adaptive personalized ones, to return information that is relevant to us, but without unnecessarily constraining true discovery. While current search technology is impressive in its ability to return even faintly related strings of information, it is always related information in some proximal sense. What is missing is the ability to discover something that has no apparent logical or referential connection to the user but that is rewarding or stimulating all the same… This is the major disconnect with recommendation engines and there are a lot of different approaches to bridging it.

A mildly successful project called ipexplore.com (a collaborative effort between myself and Andrew Childs) approached this idea in much the same way that one might think of sequentially “flipping” through channels on late night television… perhaps stopping unexpectedly on some public access oddity. We saw the IP structure of the net as being a root level unbiased door to things that we wouldn’t even know how to search for… By randomizing IP addresses within the actively used range we found such things as http://98.130.162.140/ -what appears to be some kind of Korean gaming site and http://98.130.162.26/ a Zambian risk prevention company (read: guns for hire who know how to use computers)… We soon realized just how much of the IP band was dead space… large swaths owned by government agencies with an Apache server login now and then, which is perhaps why finding something… anything in the dark abyss of the IP felt more significant than a conventional search… they were context free… -true unknowns… without any connection to anything.
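How ipexplore.com chose its addresses is not described beyond “randomizing IP addresses within the actively used range”, so the following Python 3 sketch is only one plausible reading: draw a random 32-bit number, treat it as an IPv4 address, and throw it away if it falls in a private or otherwise reserved block. The function name is made up, and the `is_global` check from the standard `ipaddress` module stands in for whatever range filtering the site actually used.

```python
import ipaddress
import random

def random_public_ip(rng):
    """Draw random IPv4 addresses until one is in globally routable space."""
    while True:
        addr = ipaddress.IPv4Address(rng.getrandbits(32))
        if addr.is_global:  # rejects private, loopback, multicast, reserved ranges
            return addr

rng = random.Random()
for _ in range(5):
    print('http://%s/' % random_public_ip(rng))
```

Most addresses produced this way will not answer on port 80 at all, which matches the experience of dead space described above.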

In my opinion the real questions posed for the future of search are that we often don’t know what it is that we want when we search, or when we do know what we want we seldom know how to ask the right questions. Expert systems that can learn a user’s individual shortcomings in their ability to articulate a query, perhaps through machine learning rather than simply labeling each with a tailored genre, could someday bridge that inequality.. in other words a system that learns dialogically on a personal and mass level… always adapting to the crowd and the source, seeing over time that they are one and the same.

Assignment – To write a simple Stop Tokenizer:
While my intention was to utilize the LingPipe installation through the Eclipse IDE… a serious meltdown involving resources and file permissions made that more difficult… three rebuilds later.. I have a working install but resorted to a proof of concept in Python during the down time. I ran a simple word frequency count against the entirety of “War and Peace”… took the top candidate words to build my stop list.. then applied that list as a set against an input string… in this case: the first paragraph of the same text.

My stop list:

the and to of a he in his that was with had it her not him at i but as on you as are for she is said all from by be were what they who this one which have so dont an up them or when did been there their no would now only if me are out my could will do about into how we then

Derived from a simple word frequency script:

# WordFrequency - Common words
 
import re
 
filename = 'WarAndPeace.txt'
 
word_list = re.split('\s+', file(filename).read().lower())
print 'Words in text:', len(word_list)
 
freq_dic = {}
 
punctuation = re.compile(r'[.?!,":;]')
for word in word_list:
 
    word = punctuation.sub("", word)
 
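    # count the word, starting from zero the first time it is seen
    freq_dic[word] = freq_dic.get(word, 0) + 1
 
print 'Unique words:', len(freq_dic)
 
freq_list = freq_dic.items()
 
# sort by count, most frequent first, so the stop word candidates come out on top
# (a plain sort() would order the list alphabetically by word)
freq_list.sort(key=lambda item: item[1], reverse=True)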
 
for word, freq in freq_list:
    print word, freq

My input string:

“Well, Prince, so Genoa and Lucca are now just family estates of the Buonapartes. But I warn you, if you don’t tell me that this means war, if you still try to defend the infamies and horrors perpetrated by that Antichrist I really believe he is Antichrist I will have nothing more to do with you and you are no longer my friend, no longer my ‘faithful slave,’ as you call yourself! But how do you do? I see I have frightened you sit down and tell me all the news.”

Sets intersected – Output as difference against the stop set:

#WaR'n'PeAce::..Stop Set - Alex Dodge - ITP - Bit by Bit 2011
import sys
import re
 
f1 = open(sys.argv[1])
f2 = open(sys.argv[2])
 
punctuation = re.compile(r'[.?!,"\':-;@$%^&*()<>|\/]') 
 
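#Delimit and convert text to a Word Set
def stringSet(txt):
	words = []
	for line in txt:
		line = line.strip()
		line = line.lower()
		line = punctuation.sub("", line)
		# split() with no argument also drops the empty strings left by repeated spaces
		words.extend(line.split())
	# return after the loop so that every line of the file is read, not just the first
	return words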
 
t1 = stringSet(f1)
t2 = stringSet(f2)
 
#Input Set Against Stop Word Set (a set difference: input words not in the stop list)
remaining = set(t1).difference(t2)
 
#Put it all back together
output = " ".join(remaining)
print ""
print output

AND… finally the output.. in theory the token:

perpetrated infamies family yourself horrors antichrist still frightened prince try faithful defend call really tell friend more slave means warn nothing news believe me down longer just sit well see war estates genoa buonapartes lucca
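Because the output above comes from a set difference, the surviving words appear in arbitrary order and each only once. A minimal alternative sketch in Python 3 keeps the words in their original order by filtering a list against the stop set. The function name is made up for illustration, and the stop set here is only a short subset of the list above.

```python
import re

# short illustrative subset of the stop list in this post
STOP_WORDS = set("the and to of a he in his that was so are now".split())

def remove_stop_words(text, stop_words):
    """Lower-case the text, split it into words, and drop the stop words in order."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in stop_words]

sample = "Well, Prince, so Genoa and Lucca are now just family estates of the Buonapartes."
print(' '.join(remove_stop_words(sample, STOP_WORDS)))
# prints: well prince genoa lucca just family estates buonapartes
```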

Posted in Learning Bit by Bit, Spring 2011

Theory Club – Presentation

VALUE
(an introduction to market dynamics and cultural agency)

A presentation focusing on the production and regulation of value in the art market and how it applies to individual artists, new media, and emerging technologies.

A PDF version of the slides is available here.

Posted in Spring 2011, Theory Club