Deep Search Explorer
Discovering the Unknown
Deep Search Explorer is an extension of a previous project titled IPExplore that randomly searches IP addresses. IPExplore was an attempt at regaining a less mediated experience of discovery on the internet, inspired in part by a childhood experience of the internet and its precursors, as well as by late-night TV channel surfing. During the days of dial-up and BBS, the net was a much less accessible sphere, which often meant that when you did find something interesting, the site had potentially more personal value simply through the ownership of "discovering" it. There is, without question, many times more information reachable now through efficient and pervasive search mediums like Google, but it is their inescapable pervasiveness that shapes our experience: Google is a very well engineered version of the internet, but it is still just one version.
Among the problems encountered by IPExplore is the fact that while a large amount of the active IP range has been claimed, most of it is publicly unused on port 80 (HTTP). This is made clear by how many IP addresses can be cycled through, often hundreds, before an active site is reached. A fitting analogy might be the distribution of matter in our universe, which at a relative human scale can seem like mostly empty space. This realization, along with Google's PageRank algorithm, is how I reconsidered the function of something like IPExplore. Deep Search finds inspiration and its moniker in the Hubble Space Telescope, particularly its Deep Field images, the most remote objects seen in our universe. Deep Search could be thought of as a specialized telescope that points to a random coordinate in the sky hoping to find what few knew was there… particularly objects that robust search engines cannot see.
Something to stress here is that much of Deep Search and IPExplore's aim is to facilitate the experience of personal discovery through a specific, reduced context; one that essentially carries no content expectation. Because the user has no idea what will be returned, the returned object is experienced in a reduced and, hopefully, unique sense.
How is Deep Search different?
While IPExplore does not repeat any addresses, its method of iteration over the potential 4 billion usable IP addresses is essentially random, with no way of recognizing recurrent usage patterns that might help predict which regions to look at next… other than a user option that limits the search to a specified address range such as 188.8.131.52 – 184.108.40.206. Deep Search hopes to implement forms of regression and classification to make its hit rate much better. Another issue with the previous project was that it was never clear what to do once a page was found with IPExplore. A tagging system was implemented, which in theory just replaces DNS and search indexing with content tags such as "timeout", "personal", "business", and "adult". While it did require active user engagement (a good thing), amassing a database of over 10,000 tagged IPs, it was never understood how those tags would be useful or accurate for other users. A revision to this comes directly from Google's PageRank method, which assigns a rank to a page based on the number of backlinks, or other pages linking to it. This approach makes sense when attempting to assign relevance and value to accessible information relative to a query, but Deep Search is in a sense interested in the opposite. Instead of a high backlink count, Deep Search is interested in low numbers of backlinks; primarily it is interested in a rank of zero, or true orphans.
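The non-repeating random walk described above can be sketched without storing every visited address: a small Feistel network gives a bijection over the 32-bit space, so stepping a plain counter through the permutation visits each IPv4 address exactly once in a scrambled order. This is an illustrative sketch, not IPExplore's actual code; the round count and key are arbitrary.

```python
import ipaddress

def feistel_permute(index: int, rounds: int = 4, key: int = 0xC0FFEE) -> int:
    """Map a 32-bit counter to a unique 32-bit value.

    A Feistel network is a bijection regardless of its round function,
    so indices 0..2**32-1 map to every IPv4 address exactly once,
    giving a random-looking walk with no repeats and no visited-set.
    """
    left, right = index >> 16, index & 0xFFFF
    for r in range(rounds):
        # Standard Feistel step: swap halves, mix one half into the other.
        left, right = right, left ^ (((right * 0x9E3779B1) + key + r) & 0xFFFF)
    return (left << 16) | right

# Walking the space: successive counter values yield scattered addresses.
sample = [str(ipaddress.IPv4Address(feistel_permute(i))) for i in range(5)]
```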
The discovery method applies a rank within a specified threshold based on the number of backlinks to the page. Backlink checking could simply use existing search services like Google, since our definition of mediated "knowns" and less mediated "unknowns" is tied closely to, or defined by, them. The optimal rank here is 0, but the probability of finding "zero objects" is something like finding dark matter, so we employ a "Discovery Rank" threshold to populate our "Discovery Feed".
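The thresholding step above is simple to express. A minimal sketch, where the backlink-counting function is a stand-in for whatever source is used (a search-engine query, a link index), and the threshold value is a hypothetical choice:

```python
def discovery_feed(pages, count_backlinks, threshold=3):
    """Keep only pages whose backlink count falls at or below the
    Discovery Rank threshold. A rank of 0 (a "true orphan") is the
    ideal; the threshold admits near-orphans so the feed isn't empty.

    count_backlinks: callable mapping a page to its backlink count
    (placeholder for any real backlink source).
    """
    return [p for p in pages if count_backlinks(p) <= threshold]
```

Ordering the surviving pages by ascending count would then put the closest-to-orphan discoveries at the top of the feed.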
Understanding range usage and trends should allow for an optimized iteration over IP addresses. Within the Class A, B, and C IP ranges (the ranges used on the internet) we exclude certain ranges from our searches, such as loopback (127.0.0.0), zero addresses (0.123.123.123), and private ranges (192.168.1.1).
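The exclusion check might look like the following. The text names loopback, zero, and private ranges; the exact block list here is an assumption drawn from the standard special-use IPv4 blocks (RFC 5735), and could be extended or trimmed.

```python
import ipaddress

# Special-use blocks excluded from the search. Loopback, zero, and
# private ranges come from the text; the rest are assumed additions
# from the standard special-use registry.
EXCLUDED = [
    ipaddress.ip_network("0.0.0.0/8"),       # zero addresses ("this network")
    ipaddress.ip_network("10.0.0.0/8"),      # private (Class A block)
    ipaddress.ip_network("127.0.0.0/8"),     # loopback
    ipaddress.ip_network("172.16.0.0/12"),   # private (Class B block)
    ipaddress.ip_network("192.168.0.0/16"),  # private (Class C block)
    ipaddress.ip_network("224.0.0.0/4"),     # multicast (Class D, not unicast)
]

def is_searchable(ip: str) -> bool:
    """True if the address is worth probing on port 80."""
    addr = ipaddress.ip_address(ip)
    return not any(addr in net for net in EXCLUDED)
```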
Within these ranges we can use existing sample IP data to generate candidate search IPs, modeling each byte range with a predictive n-gram model or regression.
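One way to realize the n-gram idea: treat an address as a sequence of four octets and learn, from previously discovered live IPs, the distribution of each octet conditioned on the one before it. This is a sketch of the predictive concept under that bigram assumption, not the project's actual model; the training data here is invented for illustration.

```python
import random
from collections import Counter, defaultdict

def train_byte_model(sample_ips):
    """Learn a first-order (bigram) model over the four octets of
    previously observed live IPs: the distribution of first octets,
    plus per-octet transition counts."""
    first = Counter()
    trans = defaultdict(Counter)
    for ip in sample_ips:
        octets = [int(o) for o in ip.split(".")]
        first[octets[0]] += 1
        for prev, cur in zip(octets, octets[1:]):
            trans[prev][cur] += 1
    return first, trans

def sample_ip(first, trans):
    """Draw a candidate address, octet by octet, weighted toward
    byte patterns seen in live samples; fall back to uniform when
    an octet was never observed as a predecessor."""
    octet = random.choices(list(first), weights=first.values())[0]
    ip = [octet]
    for _ in range(3):
        nxt = trans.get(ip[-1])
        if nxt:
            octet = random.choices(list(nxt), weights=nxt.values())[0]
        else:
            octet = random.randrange(256)
        ip.append(octet)
    return ".".join(map(str, ip))
```

Biasing generation toward observed byte patterns is what should raise the hit rate over a uniform draw from the 4-billion-address space.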
Rather than a single IP search instance, the "discoveries" would join a continually updated feed that could be followed and possibly searched. This opens the possibility of community exchange.
Web Client (collaborative discovery)
By allowing an active search instance of Deep Search to run in a browser tab, many users could collectively contribute discoveries in a volunteer-computing-like model.
Using a structured random search, with scarcity or remoteness as a ranking feature, is a potentially valuable approach in other cultural mediums. An example of a similar approach is research published earlier this year (2011) at The University of Texas Southwestern Medical Center, in which a viable Alzheimer's treatment (P7C3) was discovered through a brute-force approach, testing thousands of organic compounds with mice.