Search engine algorithms on the web face a very interesting challenge: out of the enormous amount of data available, returning an accurate (or nearly accurate) result for a user’s query can be extremely complex.
The PageRank algorithm looks at the whole World Wide Web as a single global structure and aims to rank web pages globally. To do this, it models the web as a graph: each web page is treated as a node, and if a page contains a reference or link to another page, that serves as an edge connecting the two nodes.
Building on this, the algorithm then examines the incoming and outgoing links associated with each node (web page). A page with a higher number of incoming links would have a higher rank. Conversely, a link from a node (say, A) that has a large number of outgoing links to another node (say, B) may not significantly boost B’s ranking, since the probability of a user reaching node B from node A is lower when A has so many outgoing links.
The above is a very simplistic model of the algorithm.
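The simplistic model above can be sketched in a few lines of Python. This is a hypothetical example (the four-page graph and the function name are my own, purely for illustration): each page starts with an equal rank, and on every iteration a page splits its rank evenly across its outgoing links.

```python
# Hypothetical mini-web: each key is a page, each value its outgoing links.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def pagerank_simple(links, iterations=50):
    pages = list(links)
    # Start with equal rank for every page.
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: 0.0 for p in pages}
        for page, outgoing in links.items():
            share = rank[page] / len(outgoing)  # rank is split across out-links
            for target in outgoing:
                new_rank[target] += share
        rank = new_rank
    return rank

ranks = pagerank_simple(links)
# C has the most incoming links, but A (whose only in-link comes from the
# well-ranked C) ends up tied with it; D, with no incoming links, drops to zero.
print({p: round(r, 3) for p, r in ranks.items()})
```

Note how this bare version already hints at the edge cases discussed next: page D, which nothing links to, loses all its rank.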
Looking at the above algorithm, a few questions come up:
1. How does the algorithm scale when the number of websites is very large?
2. What if a website has no outgoing links?
3. What if a website has no incoming links? What would its rank be in that case?
4. Consider a cycle of nodes: say there are three nodes A, B and C, with A->B, B->C and C->A. Then, in some sense, A contributes to its own ranking. Is that a valid case?
5. Is there a way to artificially inflate a website’s ranking?
Those are my initial questions. Hopefully, a deeper study of the algorithm will answer them…!!
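The full algorithm tackles several of these questions with a "damping factor". The sketch below is my own illustration, assuming the commonly cited value d = 0.85: with probability d a random surfer follows a link, and with probability 1 - d they jump to any page at random. Pages with no outgoing links (question 2) spread their rank over every page, the random jump gives even link-less pages a small baseline rank (question 3), and it also keeps cycles (question 4) from trapping all the rank.

```python
def pagerank_damped(links, d=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by "dangling" pages (no out-links) is shared evenly.
        dangling = sum(rank[p] for p in pages if not links[p])
        # Every page gets the random-jump share plus the dangling share.
        new_rank = {p: (1 - d) / n + d * dangling / n for p in pages}
        for page, outgoing in links.items():
            for target in outgoing:
                new_rank[target] += d * rank[page] / len(outgoing)
        rank = new_rank
    return rank

# Hypothetical graph: "D" is a dangling page, "C" has no incoming links.
ranks = pagerank_damped({"A": ["B"], "B": ["A"], "C": ["A"], "D": []})
# Every page now keeps a positive rank, and the total rank still sums to 1.
print({p: round(r, 3) for p, r in ranks.items()})
```

Note that the random jump also answers question 4: a cycle like A->B->C->A is perfectly valid, because the surfer eventually teleports out of it instead of circulating forever.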