Clustering data from IPExplore
As an exercise, I chose to use a database log from a previous project called IPExplore, which visits IP addresses at random. The full data set comprised over 10,000 IP addresses, of which roughly 600 were “hits” (actual websites of some kind); the rest were logged as network timeouts. I was interested in seeing whether any discernible patterns would emerge from the “hit” IPs versus the timeouts. Below are two very simple sketches, one with 600 “hit” IP addresses and the other with 600 timeouts. In the “hits” data, clear stratification can be seen, which is encouraging as I work toward a more intelligent implementation of IPExplore for a final project.
While clusters are also present below in the “timeout” data, the distribution is, as expected, fairly even. Something to note, however, is that IPExplore can sub-explore a “hit” IP range, essentially iterating through the lower byte ranges successively, up through the scope of the IP. This probably accounts for some of the consistency seen above, but not all of it. This feature is not automated in the current build, but as I move IPExplore toward “Deep Search” it will be, since the search component will no longer be interactive.
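
To make the sub-exploration idea concrete, here is a minimal Python sketch of one way it could work. This is not the actual IPExplore code: the function name sub_explore, the max_octets parameter, and the choice to widen the range one whole byte at a time are all assumptions made for illustration. Starting from a “hit”, it first walks every address that differs only in the lowest byte, then widens to the next byte up, skipping anything already visited.

```python
import ipaddress


def sub_explore(hit_ip, max_octets=2):
    """Yield the neighbours of a hit, widening the range one byte at a time.

    Hypothetical helper: with max_octets=1 it covers the hit's /24,
    with max_octets=2 it goes on to cover the rest of the /16, and so on.
    """
    hit = ipaddress.IPv4Address(hit_ip)
    previous = None  # the narrower range that has already been walked
    for octets in range(1, max_octets + 1):
        prefix = 32 - 8 * octets
        # strict=False lets us pass a host address and get its enclosing network
        network = ipaddress.ip_network(f"{hit}/{prefix}", strict=False)
        for addr in network:
            if addr == hit:
                continue  # the hit itself is already known
            if previous is not None and addr in previous:
                continue  # already covered at the narrower level
            yield str(addr)
        previous = network


# Candidates sharing the first three bytes with a (made-up) hit address
neighbours = list(sub_explore("192.0.2.10", max_octets=1))
```

Because a hit produces up to 255 closely related candidates at the first level alone, any hits found this way will sit next to each other in the address space, which is the kind of consistency a plot of “hit” addresses would pick up.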

