Category: BitxBit

The Dangling Node

The PageRank algorithm is a fairly complicated and incredibly ingenious thing.
The idea is to let the web itself determine the importance of a page: a page's importance depends on how many pages link to it, weighted by the importance of those linking pages, which is in turn determined the same way.
This aims at a communal, web-based recommendation system rather than a biased search engine approach.
I am trying to understand it fully, especially the probabilistic aspects of the algorithm.
The biggest concern I have, and what I think is the biggest drawback of using this algorithm for all page ranking, is the inevitable standardization of search results, since they are based on the importance of the sites. It is also an averaging tool, which works against any results that might be unique or unpopular. Does this approach homogenize the web?
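To make the probabilistic side more concrete, here is a minimal power-iteration sketch of PageRank. In PageRank terminology, a dangling node is a page with no outgoing links; left alone, the rank it receives would leak out of the system. This sketch assumes one common convention, spreading a dangling page's rank evenly over all pages, and assumes the often-quoted damping factor of 0.85. The function and parameter names are illustrative, not from any particular library.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank (illustrative sketch).

    links: dict mapping each page to the list of pages it links to.
    Pages with no outgoing links (dangling nodes) have their rank
    redistributed uniformly over every page, an assumed convention.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(max_iter):
        # Total rank currently sitting on dangling nodes.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Every page gets the "random jump" share plus its share of dangling rank.
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes its rank in equal parts along its outgoing links.
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new[q] += damping * rank[p] / len(outs)
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:  # stop once the ranks no longer change
            break
    return rank
```

For example, with `pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": []})`, page D is dangling and nobody links to it, so it ends up with the lowest score, while C, linked from both A and B, ranks highest. The scores always sum to 1, which is what lets them be read as the probability of a random surfer landing on each page.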

I did some research, and there are a few related link-analysis algorithms, such as the HITS algorithm.
I wonder how the search results would differ between these two approaches.
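One concrete difference: HITS (Kleinberg's hubs and authorities) gives every page two scores instead of one. A good authority is linked to by good hubs, and a good hub links to good authorities. HITS is also usually run at query time on a small subgraph of pages related to the query, whereas PageRank is computed once over the whole link graph, independent of any query. Below is a minimal sketch of the HITS iteration, assuming the subgraph is already given as a dict; the names are illustrative.

```python
import math


def hits(links, iterations=50):
    """Hub and authority scores via the HITS iteration (illustrative sketch).

    links: dict mapping each page to the pages it links to; assumed to be
    the query-specific subgraph HITS would normally be run on.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking in.
        auth = {p: 0.0 for p in pages}
        for p, outs in links.items():
            for q in outs:
                auth[q] += hub[p]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub score: sum of the authority scores of pages linked to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth
```

With `hits({"h1": ["a1", "a2"], "h2": ["a1"]})`, page a1 gets the higher authority score (two hubs point to it) and h1 gets the higher hub score (it points to both authorities). Because the scores depend on which pages are in the subgraph, the same page can rank differently for different queries, unlike its single global PageRank.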

Cyanide and Apples

A brief persuasive composition, a few paragraphs long, on “You Are Not a Gadget” by Jaron Lanier, a book that reads like a manifesto…

Having read parts of “You Are Not a Gadget” in various contexts, whilst seeking other connections, I tend to find Jaron Lanier’s thoughts elucidating in their brevity, both as a historical context for modern computing structures and for the corresponding social realities inherent in such a discussion.
One aspect of his intellectual formulations, the act of “christening” the fundamentals of interconnected co-existence in the modern world, frames a situation in which the status quo is at the mercy of a priori historical mandates and decisions. This raises concepts such as the implicit responsibility of software system design in relation to possible lock-in.

I think one of the most compelling moments for me in the book is when he discusses Alan Mathison Turing. He begins with adoration, offering a fleeting but evocative portrait of a human mathematician and the legacy he left behind, and follows it with a quick analysis and critique of the failures of the eponymous artificial intelligence test.

The interesting aspect of this test lies in the idea that if a human can be fooled into thinking a computer is another human, then we deduce that the machine possesses intelligence, i.e. it passes the Turing test.
What is interesting about Lanier's argument is that the Turing test can be passed not because a machine has intelligence rivaling a human's, but because we lower the standards by which such intelligence is perceived. This is an important distinction in deduction. The discussion here revolves around people degrading themselves and their “personhood” to make machines/computers appear more intelligent. “Degrading” may carry too negative a connotation, but the argument is a valid one: the test depends on an illusion of relativity.
This can be seen even in recent times with Watson's victory over its human opponents on Jeopardy!, or, historically, with Deep Blue's victory over world chess champion Garry Kasparov.

We require an illusion, even though it is clear that this observed intelligence is both singular and limited, applicable only to a specific domain.
We erase the teams of scientists who stand behind both personifications of AI in order to believe in their intelligence.
Lanier’s point is brief, and yet philosophically it resonates strongly with the way attribution of qualities becomes a function of how those qualities are interpreted.

POS_Visuals and Hidden Markov Chains

I continued my text experiments from last week, using hidden Markov models to analyze the same combination of texts. I am still using the IndoEuropean tokenizer, in addition to a regex tokenizer.
I made some slight modifications to the code to switch between outputting nouns and verbs (NN/V).
I found that the model trained on the Brown corpus (CK 01-027) gave me dubious results, which I assume come from the hidden statistical probabilities of the Markov model.
In a number of instances, words that are obviously either verbs or nouns are chunked, tokenized, and tagged as a different part of speech.
I am not sure why the precision is so low, but I am still trying to wrap my head around the code, and perhaps the CK portion of the Brown corpus is not the most appropriate training source.
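One way to see why an “obvious” noun can come out tagged as a verb: HMM taggers typically decode with the Viterbi algorithm, which picks the tag sequence that maximizes the product of tag-to-tag transition probabilities and word-given-tag emission probabilities. Context can therefore outweigh a word's own likelihood, and if the training section skews those numbers, tags shift. The sketch below is a plain bigram Viterbi decoder; the three tags and all probabilities are hypothetical toy values chosen for illustration, not anything trained on the Brown corpus, and this is not the code of the library used above.

```python
import math

# Hypothetical toy model: three tags with hand-picked probabilities.
TAGS = ["DET", "NN", "VB"]
START = {"DET": 0.6, "NN": 0.3, "VB": 0.1}
TRANS = {
    "DET": {"NN": 0.9, "VB": 0.1},
    "NN": {"VB": 0.6, "NN": 0.2, "DET": 0.2},
    "VB": {"DET": 0.5, "NN": 0.4, "VB": 0.1},
}
EMIT = {
    "DET": {"the": 0.7, "a": 0.3},
    "NN": {"dog": 0.4, "runs": 0.1, "park": 0.5},
    "VB": {"runs": 0.5, "walks": 0.5},
}


def viterbi(words, tags, start_p, trans_p, emit_p, unk=1e-6):
    """Most likely tag sequence for a non-empty list of words under a bigram HMM.

    Unseen word/tag pairs get the small probability `unk`.
    Scores are kept in log space to avoid underflow on long sentences.
    """
    def lg(x):
        return math.log(x) if x > 0 else float("-inf")

    # best[i][t] = (log prob of the best path ending in tag t at word i, previous tag)
    best = [{t: (lg(start_p.get(t, 0)) + lg(emit_p[t].get(words[0], unk)), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            # Best previous tag to transition from into tag t.
            prev, score = max(
                ((s, best[i - 1][s][0] + lg(trans_p[s].get(t, 0))) for s in tags),
                key=lambda pair: pair[1])
            column[t] = (score + lg(emit_p[t].get(words[i], unk)), prev)
        best.append(column)
    # Trace the backpointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return list(reversed(path))


if __name__ == "__main__":
    print(viterbi(["the", "dog", "runs"], TAGS, START, TRANS, EMIT))  # ['DET', 'NN', 'VB']
    print(viterbi(["the", "runs"], TAGS, START, TRANS, EMIT))         # ['DET', 'NN']
```

The same word, “runs”, is tagged VB after a noun but NN right after a determiner, purely because of the transition probabilities. Different training data changes exactly these numbers, which is one plausible reason a fiction-only training section could produce surprising tags.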

Visualizations based on random RGB values (screenshot from 2011-03-04).

NGram efyouen+ Automatic Writer

Reading through the concept and execution of N-Grams, I am reminded of a short story by Isaac Asimov about a writer and his relationship to a robot, his assistant. The relationship begins as a simple master-and-servant model and develops further as the writer nurtures the robot's literary aspirations and trains him on literary models (a dictionary, his own books, etc.), much like the concept of N-Grams. Of course, the robot is so successful that at a certain point the writer feels threatened and has to destroy it, as a threat to his unquantifiable humanness.

The idea of using N-Grams in addition to tokenizers is exciting just for the possibility of creation, a somewhat random yet methodical way of creating text.
The results are often quite tragic from a literary perspective.
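The “random yet methodical” process can be sketched as a word-level Markov chain: record which word followed each run of n−1 words in the training text, then generate by repeatedly sampling a follower of the last n−1 words. This is a minimal sketch, not the library used in the experiments here; the defaults of n = 3 and 30 words are chosen to match the first experiment, and the function names are illustrative.

```python
import random
from collections import defaultdict


def build_model(texts, n=3):
    """Map each (n-1)-word context to the list of words that followed it.

    Training on several texts simply pools their n-grams, so a context
    that occurs in both books can branch into either one.
    """
    model = defaultdict(list)
    for text in texts:
        words = text.split()
        for i in range(len(words) - n + 1):
            context = tuple(words[i:i + n - 1])
            model[context].append(words[i + n - 1])
    return model


def generate(model, length=30, seed=None):
    """Random walk over the model, restarting at a random context on a dead end."""
    rng = random.Random(seed)
    contexts = list(model)
    context = rng.choice(contexts)
    output = list(context)
    while len(output) < length:
        followers = model.get(context)
        if not followers:
            # No recorded continuation: jump to a fresh random context.
            context = rng.choice(contexts)
            output.extend(context)
            continue
        # Repeated followers appear multiple times, so frequent words are likelier.
        word = rng.choice(followers)
        output.append(word)
        context = context[1:] + (word,)
    return " ".join(output[:length])
```

Usage would look like `generate(build_model([dracula_text, art_of_war_text]), length=30)`, where the two text variables are placeholders for the Gutenberg files. The table holds one entry per position in the corpus, with longer keys as n grows, so memory use climbs with both corpus size and n; that is one plausible contributor to running out of heap space at higher orders.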

Experiment 1
Using an n-gram order of 3 (trigrams) with a 30-word output.

I was curious to see how a machine trained on two texts, separated both in time and in content, would behave. I chose Dracula by Bram Stoker and The Art of War by Sunzi. Both books are fascinating: Dracula is a classic, not to mention a hot topic nowadays, and The Art of War is an extremely intelligent book about war strategy with a very novel approach. Its basis lies in dealing with the context of the specific situation, in the moment. It prescribes malleability, quick change, and necessary adaptability based on any number of changeable factors, rather than a standard and constipated strategic formation forced to fit all situations.

Here are a few versions of what was generated using both texts (both can be found on Project Gutenberg).
The results tend to be quite poetic. Pushing to a higher n-gram order gives me a “Java heap space” memory error, which sounds like it's communicating with the ISS and getting an error. In any case, I need to look into this, in addition to getting more familiar with Java in general.
Another concept I would love to try is the inverse: whether combining two poems will create a more literal text. This is something I hope to try next week…

when men believe not even feel it wet against my darling ! But there was a strange thing happened , and as yet . ” Ho Shih gives as real instances of strength

long and black moustache and grasping anything on which the enemy ‘ s face darken and draw together , and I can see superficially how a battle makes many calculations lead to victory

criticism : ” When you have done . But he continues : ” A brazen face and dispelled altogether the gloom of horror ! What devil or what manner of rooks

Stop Word Tokenizers in Search Engines

After doing some research on search engine stop words, I came upon a very interesting concept: Google's approach to search as a multi-stage query process.
A quick search through the web, on Google no less, revealed a copy of their 2008 patent for this specific search process.
It appears that even back then, Google was fighting with the limitations of stop words, and perhaps stop words are becoming an antiquated and inflexible way of considering meaningful and relevant search.
An interesting point, specifically related to finite state transducers as they apply to search engines, is that at a very basic level of using the automaton, the specific “finite set of complex symbols” is actually based on the index built by each search engine, and so reflects the bias inherent in each index, whether algorithmic or simply based on paid sponsorship.
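To illustrate how an index can define the automaton's vocabulary, here is a minimal sketch of a trie-shaped deterministic automaton built from a toy inverted index. Its final states emit a posting list, which makes it a very simple transducer; real engines use far more compact minimized FSTs (Apache Lucene, for example, stores its term dictionary as an FST). The index contents and function names are hypothetical.

```python
def build_transducer(index):
    """Build a trie-shaped automaton from an inverted index.

    index: dict mapping term -> list of document IDs (hypothetical data).
    Each state is a dict of character transitions; a final state stores
    its output, the sorted posting list, under the key None.
    """
    root = {}
    for term, postings in index.items():
        state = root
        for ch in term:
            state = state.setdefault(ch, {})
        state[None] = sorted(postings)
    return root


def transduce(root, term):
    """Run a term through the automaton; return its postings, or None if rejected."""
    state = root
    for ch in term:
        state = state.get(ch)
        if state is None:
            return None  # no transition: the index never saw this term
    return state.get(None)  # None if we stopped on a non-final state (a mere prefix)
```

With `build_transducer({"the": [1, 2, 3], "dracula": [2], "war": [3, 1]})`, the term `"war"` maps to `[1, 3]`, while `"wa"` and `"zebra"` are rejected. Whatever the index leaves out simply has no path through the automaton, which is the sense in which each engine's index shapes what can be found.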

Nonetheless, it is very interesting to see how Google's methods apply to stop words in general, whether they are ignored or not. In fact, there appears to be an effort to look at meaningful relationships between stop words and actual search terms, rather than simply ignoring them.

One approach is the creation of an “exceptional” list which compares clusters of words/phrases that are known to be more relevant as a group. In this way, stop words with meaningful relationships are included in the search.
Another approach is to run the search both with and without the stop words.
The results from both lists are compared for similarity, and the most meaningful results are extracted.
Some searching revealed a basic short list specifically for Google when the methods above are not applied: I, a, about, an, are, as, at, be, by, com, for, from, how, in, is, it, of, on, or, that, the, this, to, was, what, when, where, who, will, with, www.
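Both approaches can be sketched together: a filter that drops stop words unless they form part of a phrase on an exception list, and a comparison that runs the same query with and without stop words. This is a toy illustration, not the method described in Google's patent. The exception phrases, the tiny in-memory document set, the AND-style matching, and the ranking policy (results found both ways first) are all hypothetical assumptions; only the stop-word list is taken from above.

```python
STOP_WORDS = {"i", "a", "about", "an", "are", "as", "at", "be", "by", "com",
              "for", "from", "how", "in", "is", "it", "of", "on", "or", "that",
              "the", "this", "to", "was", "what", "when", "where", "who",
              "will", "with", "www"}

# Hypothetical exception list: phrases whose stop words carry meaning.
EXCEPTIONS = {("the", "who"), ("to", "be", "or", "not", "to", "be")}


def filter_query(tokens):
    """Drop stop words, except those inside a phrase on the exception list."""
    tokens = [t.lower() for t in tokens]
    keep = [t not in STOP_WORDS for t in tokens]
    for phrase in EXCEPTIONS:
        k = len(phrase)
        for i in range(len(tokens) - k + 1):
            if tuple(tokens[i:i + k]) == phrase:
                keep[i:i + k] = [True] * k  # protect the whole phrase
    return [t for t, kept in zip(tokens, keep) if kept]


def search(docs, terms):
    """Toy AND search: IDs of documents containing every term."""
    return {doc_id for doc_id, text in docs.items()
            if set(terms) <= set(text.lower().split())}


def compare_searches(docs, tokens):
    """Search with and without stop words; return (ranking, Jaccard similarity)."""
    tokens = [t.lower() for t in tokens]
    with_stop = search(docs, tokens)
    without_stop = search(docs, filter_query(tokens))
    both, union = with_stop & without_stop, with_stop | without_stop
    similarity = len(both) / len(union) if union else 1.0
    # Hypothetical policy: results found both ways rank first.
    ranking = sorted(both) + sorted(union - both)
    return ranking, similarity
```

For instance, with `docs = {1: "the art of war by sunzi", 2: "art and war", 3: "dracula"}`, `compare_searches(docs, ["the", "art", "of", "war"])` returns `([1, 2], 0.5)`: document 1 matches both versions of the query, so it is ranked ahead of document 2, which only matches once the stop words are removed.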

Below is an informative, if outdated (2009), chart comparing how the major search engines (Bing, Yahoo, Google, and Ask) handle stop words by frequency. A big realization is that while the stop words themselves might be similar, the real difference lies in how each engine treats them when determining the relevance of results against its index.
Also, even now, it appears that Yahoo search is powered by Bing in the US and by Google in Japan.
Perhaps this is due to differences in how tokenizers operate on different characters/languages?

Chart comparing estimates of the number of results for common words in Google Caffeine, Google, Yahoo, Bing, and Ask.

Turing / Searle on Artificial Intelligence

A brief overview of two papers.


Computing Machinery and Intelligence, Alan Turing

Interesting concepts from Alan Turing.

Alan Turing aims to refute a series of arguments against the ability of a computer to think, now or in the future, or even to emulate thinking in a satisfactory way.
He sets up a test, as a control, later to be known as the Turing test.
The test involves three participants, one of whom acts as an interrogator trying to determine which of the other two is the imitator.
No voice, image, or other cue is provided; all responses are delivered in a standardized form (Turing suggests a teleprinter).
His proposal is that, for something to be considered intelligent or thinking, it must fool the interrogator. If it does, it passes the test.
He presents some very interesting ideas at the end of his paper on simulating human intelligence, suggesting that the child brain be used as a model. To him, a child's brain is something like a notebook as one buys it from the stationer's: rather little mechanism and lots of blank sheets, so little mechanism, he hopes, that something like it can be easily programmed.
By dividing the problem this way, into the child program and its education, he can separate the mechanism of the brain from what it learns and move forward with programming.
This is an interesting concept because it draws anthropomorphic parallels between evolution/mutation and subsequent education on one hand, and machine learning and intelligence on the other.
One of the funniest refutations, in my mind, is his refusal to even refute an objection like the “Heads in the Sand” objection.
Overall, I think he brings up a lot of very interesting arguments based on a test he created specifically to determine whether intelligence can be mimicked well enough to fool an interrogator.

Minds, Brains and Programs, Searle

Weak vs Strong AI

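Weak AI: the computer is a tool for formulating and testing psychological explanations.
Strong AI: the appropriately programmed computer can literally be said to understand and have other cognitive states, so the programs themselves are the psychological explanations.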

Criticism of the claims made for Schank's program:
1. that the machine can literally be said to understand the story and provide answers to questions about it;
2. that what the machine and its program do explains the human ability to understand the story and answer questions about it.

Searle's answer is the Gedankenexperiment (thought experiment) known as the Chinese Room.
Just because one can correlate symbols to language does not mean that the language is understood. Performing calculations on formally specified elements does not amount to understanding. The main point is that as long as the program is defined in terms of computational operations on purely formally defined elements, the example suggests that these operations by themselves have no interesting connection with understanding.
The very interesting aspect of his argument is that the room passes the Turing test, yet understanding is disproved by the lack of any relation between the two systems, English and Chinese.

Another response is the Robot Reply.
This is much like the Chinese Room, but with the addition of a robot body that perceives (through a camera) and acts, physical manifestations that help it appear to understand and be human. Even with these, we still do not have understanding, only more manipulation of formally specified elements: the sensory inputs and motor outputs are just more symbols.

The Brain Simulator Reply
If we mimic the synapse structure of the brain, then surely we are achieving understanding.
A digression on the idea of Strong AI: according to it, we don't need to know how the brain works in order to figure out how the mind works.
The rebuttal, of course, is wrapped up in the Chinese Room. We have the same man, but this time he operates an elaborate set of water pipes with valves that mimic the synapse structure of a Chinese speaker's brain.
The man is still following instructions without actually understanding what he is doing. Clearly, reproducing the formal structure of the brain does not amount to achieving understanding, or intentionality.