Source Files:

archive.zip

Includes:

• A2ZFileReader.java
• A2ZFileWriter.java
• A2ZUrlReader.java
• SourceFleschIndex.java
• SourceText.java
• SourceWord.java
• SyllableCounter.java
• Crawler.java
• Crawling.java
• MySpaceFleschIndex.java


Java Docs

J Docs - package a2z

J Docs - package mySpace



Programming from A to Z >>> Midterm >>>>>>>>>> Christian Croft


For my midterm, I am working towards an application in which mySpace profiles collectively "read" through Freud's text Three Contributions to the Theory of Sex. The application begins at a specified profile and analyzes that profile's "About Me" section. First, the program counts the words in the section. If the count is above 200, we have our first reader; otherwise, we spider to the next profile until we find someone verbose enough. (I read somewhere that Flesch Indexing only makes reliable approximations at word counts above 200, and many mySpacers can't think of anything to say about themselves.) When we find a 200-plus-word "About Me," the program determines the Flesch Index, or reading level, of this body of text [1].
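
As a rough illustration of the word-count check and the Flesch calculation described above, here is a minimal sketch. It assumes the standard Flesch Reading Ease formula (206.835 - 1.015 × words per sentence - 84.6 × syllables per word); the class name, method names, and the vowel-group syllable heuristic are hypothetical stand-ins, not the code from SyllableCounter.java or SourceFleschIndex.java.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the classes shipped in archive.zip.
class FleschSketch {
    // Threshold from the text: a profile needs more than 200 words.
    static final int MIN_WORDS = 200;

    private static final Pattern VOWEL_GROUP = Pattern.compile("[aeiouy]+");
    private static final Pattern SENTENCE_END = Pattern.compile("[.!?]+");

    // Counts whitespace-separated tokens.
    static int countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Counts runs of sentence-ending punctuation (at least 1).
    static int countSentences(String text) {
        Matcher m = SENTENCE_END.matcher(text);
        int n = 0;
        while (m.find()) n++;
        return Math.max(n, 1);
    }

    // Assumption: one syllable per group of consecutive vowels (crude heuristic).
    static int countSyllables(String word) {
        Matcher m = VOWEL_GROUP.matcher(word.toLowerCase());
        int n = 0;
        while (m.find()) n++;
        return Math.max(n, 1);
    }

    // True when the "About Me" text is long enough to be scored.
    static boolean isVerboseEnough(String aboutMe) {
        return countWords(aboutMe) > MIN_WORDS;
    }

    // Flesch Reading Ease, assuming the standard published constants.
    static double fleschIndex(String text) {
        int words = countWords(text);
        if (words == 0) {
            throw new IllegalArgumentException("text contains no words");
        }
        int syllables = 0;
        for (String w : text.trim().split("\\s+")) {
            syllables += countSyllables(w);
        }
        double wordsPerSentence = (double) words / countSentences(text);
        double syllablesPerWord = (double) syllables / words;
        return 206.835 - 1.015 * wordsPerSentence - 84.6 * syllablesPerWord;
    }
}
```

A crawler could call `FleschSketch.isVerboseEnough(aboutMe)` on each profile and only compute `FleschSketch.fleschIndex(aboutMe)` once that check passes. The vowel-group heuristic miscounts words with silent vowels, so a real syllable counter would need extra rules.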


When the program finds a worthy "reader," it loads in a text file that contains a paragraph of Freud's Three Contributions. Then, we measure the Flesch Index of this body of Freud text and determine the difference in the reading levels of the mySpace user's profile and the Freud paragraph that we want that user to "read."


Next, the program picks words with more than 3 syllables at random and replaces them with synonyms that have fewer syllables. For each syllable trimmed from the final text, the Flesch level difference is decremented by 0.75. So, for instance, if "aberration" gets replaced by "flaw," there is a 3-syllable difference between the two words, so 2.25 gets subtracted from the Flesch difference. The program continues to find and replace words at random until the Flesch difference is less than 1. [2]
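
The replacement loop can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the class and method names are hypothetical, the synonym lookup is passed in as a plain function (standing in for whatever WordNet lookup eventually gets wired in), and syllables are counted with a crude vowel-group heuristic. Shuffling the candidate positions once is one simple way to get random order while never revisiting a word, in the spirit of footnote [2].

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the replace-until-close-enough loop.
class SimplifierSketch {
    // From the text: each trimmed syllable lowers the difference by 0.75.
    static final double PER_SYLLABLE = 0.75;

    private static final Pattern VOWEL_GROUP = Pattern.compile("[aeiouy]+");

    // Assumption: one syllable per group of consecutive vowels.
    static int countSyllables(String word) {
        Matcher m = VOWEL_GROUP.matcher(word.toLowerCase());
        int n = 0;
        while (m.find()) n++;
        return Math.max(n, 1);
    }

    // Replaces long words in place and returns the remaining Flesch difference.
    static double simplify(String[] words, double difference,
                           Function<String, String> synonymFor, Random rng) {
        // Candidates are positions of words with more than 3 syllables.
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            if (countSyllables(words[i]) > 3) candidates.add(i);
        }
        // Random order; each position is visited at most once.
        Collections.shuffle(candidates, rng);

        for (int idx : candidates) {
            if (difference < 1.0) break; // close enough, stop replacing
            String replacement = synonymFor.apply(words[idx]);
            if (replacement == null) continue; // no synonym available
            int trimmed = countSyllables(words[idx]) - countSyllables(replacement);
            if (trimmed <= 0) continue; // only accept shorter synonyms
            words[idx] = replacement;
            difference -= PER_SYLLABLE * trimmed;
        }
        return difference;
    }
}
```

With a stub lookup such as `w -> "flaw"`, replacing "aberration" trims 3 syllables and subtracts 2.25 from the difference, matching the example above. The loop also ends when it runs out of candidates, so the returned difference may still be 1 or more for short paragraphs.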


At this point, we have a recombination of the original Freud text that this unique mySpace reader might be more likely to understand. This new String of text is passed to a text-to-speech engine (FreeTTS) that reads out this unique "reading" of Freud.

Here's a sample comparison of the original paragraph to the processed, or "mySpaced," paragraph that the program should someday output:



In this first version of the project, the application is still missing several things. Currently, the program only completes one round of the process: a sufficient mySpace profile is found, its Flesch Index is determined, a Freud paragraph is loaded, its Flesch Index is determined, and the text is reassembled and sent to the speech engine. I installed WordNet and the Java WordNet Library [3] and got their examples working, but I haven't yet figured out how to integrate them into my application. So the program isn't finding synonyms for the words yet; it just uses a dummy word as the replacement until the Flesch difference falls below 1. Other issues include tweaking the speech engine to read the text at a normal speed (right now it flies through the text incomprehensibly) and implementing threads so that the program can prepare itself for looping.



Footnotes:

[1] As mySpace profiles tend to be written informally, running a Flesch Index formula on them can return misleading results. The main difficulty is that the Flesch Index puts a high priority on the words-per-sentence count, and many mySpace users write run-on sentences and long lists of comma-separated items. I'm thinking of ways to work around this, perhaps by treating a run of words between two commas as a sentence when it exceeds a certain length threshold.

[2] This algorithm keeps track of which words it has already visited, so it won't replace a word's replacement synonym with another synonym.

[3] Installing FreeTTS, WordNet, and the Java WordNet Library was somewhat difficult. I took some notes, though, and I'll post some tutorials for those interested in using these tools as soon as I can.
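
The workaround floated in footnote [1] could look something like the sketch below. This is only one possible reading of the idea: the class name, method name, and threshold parameter are hypothetical, and the sketch also counts the clauses at the start and end of a sentence, not only those strictly between two commas.

```java
// Hypothetical sketch of comma-aware sentence counting for informal text.
class CommaSentenceSketch {
    // Counts sentences, treating any comma-delimited clause with more than
    // maxClauseWords words as a sentence of its own.
    static int countSentences(String text, int maxClauseWords) {
        int sentences = 0;
        for (String sentence : text.split("[.!?]+")) {
            if (sentence.trim().isEmpty()) continue;
            int longClauses = 0;
            for (String clause : sentence.split(",")) {
                String t = clause.trim();
                int words = t.isEmpty() ? 0 : t.split("\\s+").length;
                if (words > maxClauseWords) longClauses++;
            }
            // A sentence with no long clause still counts once.
            sentences += Math.max(longClauses, 1);
        }
        return Math.max(sentences, 1);
    }
}
```

Short list items such as "movies, books" stay inside one sentence, while a long run-on clause is counted separately, which would lower the words-per-sentence figure for profiles written as comma-spliced lists. The threshold itself would need tuning against real profiles.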