The PageRank algorithm introduced by Google uses, in simple terms, three strategies for evaluating the importance of a web page. First, it counts the links pointing to a page: the more other pages link to a page n, the higher n's ranking becomes. Second, the size of that boost depends on the ranking of the donor pages, the pages that contain a link to n. A donor's ranking value is divided by the total number of outgoing links on that page, and the corresponding fraction is passed on to each linked page. So if the NY Times site contains a link to your page, it raises your site's ranking by the value of the NY Times site divided by its total number of outgoing links. This is assumed to produce a general evaluation of web pages that corresponds to our subjective perception. The third element is personalized search: the ranking can be corrected further by adapting to a user's personal interests. This can only be done by collecting or observing a user's behavioural data, such as the default home page set in a browser, a user's bookmarks, and their search queries.
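To make the link-weighting idea concrete, here is a minimal sketch of rank propagation in Python. The damping factor, iteration count, and the toy graph are illustrative assumptions of mine, not Google's actual parameters.

```python
# Minimal sketch of the rank-propagation idea described above.
# The damping factor (0.85) and iteration count are common textbook
# choices, not Google's real parameters; the toy graph is invented.

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal rank
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if not outgoing:
                continue
            # A page donates its rank in equal fractions to each page
            # it links to: the more outgoing links, the smaller each share.
            share = rank[page] / len(outgoing)
            for target in outgoing:
                new_rank[target] += damping * share
        rank = new_rank
    return rank

# Toy example: a heavily linked "nytimes" page passes on more value
# per link than an obscure page would.
links = {
    "nytimes": ["yourpage", "otherpage"],
    "otherpage": ["nytimes"],
    "yourpage": ["nytimes"],
}
print(pagerank(links))
```

Run on the toy graph, the heavily linked page accumulates the largest rank, while the fraction it donates to each neighbour shrinks as its outgoing links multiply.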
The system is not an invention specific to web search; it claims to follow common subjective sense. However, it does not reflect general common sense so much as the capitalistic free-market thinking that dominates this part of the world. Similarly, if you work for a large corporation, your market value as a worker is enhanced by the reputation of that corporation, while at the same time it is diluted by the sheer number of employees the company hires, or by the number of equivalent positions available. The optimal position to be in is one where high prestige is coupled with relatively low competition. In other words, cleaning the toilets at Microsoft won't hugely improve your personal market value.
How effective is the system? We can only answer this question in a limited way. The problem is the tendency to believe in a system that creates, or represents, the truth about the quality or content of the web. As the internet becomes ever more embedded in our perception of reality, its content becomes ever more defining of how we see the world. What one needs to remember is that the algorithm is not based on mathematical or physical truths. It is a formula that attempts to recreate our subjective, human perception of value, adapted to match our judgement of the importance of content. In other words, it has no fundamental value without the subjective view of a human. And again, it is not designed to match universally applicable human values, but those of a specific society and age group. This is not to deny that the system performs well at the tasks it was designed for; the results are satisfactory, but not perfect. And, as always, we will forget that it is at its core an approximation of human perception. Thus the system will shape our perception, and we won't know which came first. But those are rather general issues that apply to any such system.
Let's get back to the realities of web queries. I see two issues with the current systems. One is that further optimization of search results cannot be had without accumulating user data, which raises privacy concerns. The other is that we are still not evaluating any actual content; we are only looking at the structure, or map, created by links. Moreover, the importance of a major website is already assumed, so its content value is already settled and not put into question. It is difficult to change this established hierarchy, so a small website with excellent content may have trouble being seen in the first place.
A solution would be to add some more chaos to the system, creating opportunities to constantly rebuild the link structures. The added chaos would ultimately make for a more dynamic and human representation, and one that doesn't claim to represent truth.
For instance, for every search the engine would randomly pick a low-ranked web page and display it among the top ten search results, thus giving it a chance to be seen. The engine would keep a record of when a low-ranked page was given this privilege, so as to avoid presenting one site as a high-ranked page too often. This would allow the user to see a potentially worthy page that would otherwise have had no chance of being seen. Additionally, the engine would need a system for collecting feedback on the relevance of a page's contents: after clicking on any page suggested for a search query, the user would have the option to rate its relevancy. The important point is that the search engine would not collect information about the user, but rather the correlation between a given search query and the subjectively rated content relevancy of a chosen website. That data would then feed into page ranking as a new factor.
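A rough sketch, in the same spirit, of how such a serendipity slot and feedback loop could be wired together. The function names, threshold, cooldown, and rating scheme below are hypothetical placeholders of my own, not a specification.

```python
import random
from collections import defaultdict

# Sketch of the proposed mechanism. The threshold, cooldown length,
# and feedback weighting are invented placeholders, not tuned values.

LOW_RANK_THRESHOLD = 0.001   # what counts as a "low-ranked" page (assumed)
PROMOTION_COOLDOWN = 1000    # queries to wait before re-promoting a page (assumed)

recently_promoted = {}        # page -> query count at its last promotion
feedback = defaultdict(list)  # (query, page) -> list of user ratings
query_counter = 0

def search(query, ranked_results, all_pages, rank):
    """Return the top ten results, with one slot reserved for a
    randomly chosen low-ranked page that was not shown recently."""
    global query_counter
    query_counter += 1
    top = ranked_results[:9]
    candidates = [
        p for p in all_pages
        if rank[p] < LOW_RANK_THRESHOLD
        and query_counter - recently_promoted.get(p, -PROMOTION_COOLDOWN)
            >= PROMOTION_COOLDOWN
    ]
    if candidates:
        lucky = random.choice(candidates)
        recently_promoted[lucky] = query_counter
        top.append(lucky)  # give an otherwise unseen page a chance
    return top

def rate_result(query, page, rating):
    """Store the user's relevancy rating for this query/page pair.
    Only the query-page correlation is kept, nothing about the user."""
    feedback[(query, page)].append(rating)

def feedback_boost(query, page):
    """Average rating for this query/page pair, usable as an extra
    ranking factor (the weighting scheme is an assumption)."""
    ratings = feedback.get((query, page))
    return sum(ratings) / len(ratings) if ratings else 0.0
```

The design choice worth noting is in rate_result: the feedback is keyed on the query-page pair alone, so the engine learns which pages satisfy which queries without building a profile of any individual user.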