Stop Tokenization

Stop tokenization breaks text into smaller pieces called tokens and then discards the stop tokens: very common words, such as "the", "a" and "of", that appear on a stop list and carry little meaning on their own. The remaining tokens can then be passed on for further parsing, indexing and processing.
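
As a minimal sketch of the idea, assuming a simple regular-expression tokenizer and a small hand-picked stop list, the two steps might look like this in Python. The names `tokenize`, `remove_stop_tokens` and `STOP_TOKENS` are hypothetical and not taken from any particular library; real systems use larger, curated stop lists and more careful tokenizers.

```python
import re

# Hypothetical, deliberately tiny stop list used only for illustration.
STOP_TOKENS = {"a", "an", "and", "the", "of", "to", "is", "in", "into"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_tokens(tokens: list[str], stop_tokens: set[str]) -> list[str]:
    """Drop every token that appears in the stop list, keeping the original order."""
    return [t for t in tokens if t not in stop_tokens]


text = "The tokenizer breaks a sentence into the pieces of text."
tokens = tokenize(text)
print(tokens)
# → ['the', 'tokenizer', 'breaks', 'a', 'sentence', 'into', 'the', 'pieces', 'of', 'text']
print(remove_stop_tokens(tokens, STOP_TOKENS))
# → ['tokenizer', 'breaks', 'sentence', 'pieces', 'text']
```

Keeping tokenization and stop-token removal as separate functions makes it easy to swap in a different stop list without touching the tokenizer.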

Search engines are one place where stop tokenization is used. It gives a search engine simplified raw material to index and match queries against, which helps it return relevant results to its users. Each search engine maintains its own list of stop tokens and thus has control over the search results it provides. This is one reason why search engines such as Yahoo, Google and Bing may return different results for the same query.
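
To illustrate how different stop lists can change what an engine actually searches for, the sketch below runs one query through two stop lists. `ENGINE_A_STOP_LIST` and `ENGINE_B_STOP_LIST` are made-up examples, not the real lists used by Yahoo, Google, Bing or any other engine, and `index_terms` is a hypothetical helper.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def index_terms(query: str, stop_list: set[str]) -> list[str]:
    """Return the query terms that survive the given stop list."""
    return [t for t in tokenize(query) if t not in stop_list]


# Hypothetical stop lists for two imaginary engines (not real engine data).
ENGINE_A_STOP_LIST = {"the", "a", "an", "of", "who", "what", "is"}
ENGINE_B_STOP_LIST = {"a", "an", "of"}

query = "The Who tour dates"
print(index_terms(query, ENGINE_A_STOP_LIST))  # → ['tour', 'dates']
print(index_terms(query, ENGINE_B_STOP_LIST))  # → ['the', 'who', 'tour', 'dates']
```

With the larger stop list, a query about the band The Who loses the band's name entirely, while the smaller list keeps it, so the two engines would match the same query against different terms.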

In my opinion, this has both advantages and disadvantages. The advantage is that users can choose among several systems to get better results. Indirectly, this can foster competition among the engines and provide finer granularity for the data requested. At the same time, it can also be seen as a source of inconsistency in the data returned by the search engines (which can be thought of as APIs to the immense amount of data stored on the internet). These inconsistencies can mislead the user. A standardized stop list maintained by a centralized organization would be greatly beneficial.

Overall, I feel that stop lists are essential to the success of search engines, and of the internet as a whole, since the world today is largely data-driven. In this context, it is very important that access to this data is fast and robust, and stop tokenization helps achieve this.