In working with clusters, I was reminded of the old statistics saw that “correlation does not imply causation.” The point, among other things, is that just because clusters are identified does not mean that there is any underlying meaning behind them.
Unsurprisingly, what sounds at first glance like a relatively dry and academic point is actually a tremendously political issue and one that has serious implications for many people.
Human beings are “programmed” to identify patterns. This was an incredibly important evolutionary development. The human beings that only saw random stripes or dots in the tall grass were far less likely to reproduce and pass along their genes than the human beings who were able to “connect the dots” into something with a long tail and big teeth. Pattern recognition is a survival function.
However, sometimes the patterns we see don’t actually exist. Evolutionarily speaking, it’s better to think you saw a tiger and be wrong than to fail to see a tiger that’s actually there. So evolution favors false positives over false negatives.
This means that we often see patterns that simply aren’t there. If the issue is marketing a new soda to a demographic that actually has no interest in an exciting new soft drink, that just means lost money for a company. If clustering suggests that a new drug may increase the risk of heart attacks, then, unfortunately, an otherwise useful drug may not be brought to market. When clustering suggests that something is causing cancer in a population, you suddenly have a very serious issue.
Cancer clustering is one of the more controversial issues in oncology and law right now.
And to be clear, the issue is not one of protecting companies who may have been polluting the environment and causing cancer. The issue is one of resources for the rest of the medical community.
Human beings are very good at seeing patterns, even if none exist. Lawyers are similarly very good at convincing people that patterns exist.
As shown below, random dots will cluster. This graphic comes from the State of Delaware Department of Public Health. It’s a graphic intended to demonstrate the potentially misleading appearance of clusters.
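For anyone who wants to see this effect without a graphic, here is a minimal sketch in Python. It is not the Delaware method or data; the function name `grid_counts`, the 200 points, the 10-by-10 grid, and the seed are all assumptions chosen purely for illustration. It scatters points uniformly at random over a unit square, counts how many land in each grid cell, and compares the busiest cell with the average.

```python
import random
from collections import Counter


def grid_counts(n_points=200, grid=10, seed=0):
    """Scatter n_points uniformly over the unit square and count points per grid cell.

    All parameters are illustrative assumptions, not values from any real study.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter()
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        # Map each coordinate to a cell index in 0..grid-1.
        col = min(int(x * grid), grid - 1)
        row = min(int(y * grid), grid - 1)
        counts[(row, col)] += 1
    return counts


n_points, grid = 200, 10
counts = grid_counts(n_points, grid)

mean_per_cell = n_points / (grid * grid)   # what a perfectly even spread would give
busiest = max(counts.values())             # the most crowded cell
empty_cells = grid * grid - len(counts)    # cells that received no points at all

print(f"average per cell: {mean_per_cell:.1f}")
print(f"busiest cell:     {busiest}")
print(f"empty cells:      {empty_cells}")
```

With purely random placement, you should generally expect some cells to hold several times the average while others sit empty. Nothing caused those "hot spots"; they are what randomness looks like. Changing the seed moves the apparent clusters around, which is a decent hint that they carry no meaning.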
The following is an interesting article from Oxford University Press on the same issue. In short, clusters are real. They are also, at times, incorrectly identified. The resources required to investigate them (and here, I’m not concerned about the companies so much as the EPA, the CDC, etc.) are enormous.
However, I simply have no idea how you would go about telling people who have sick family members that it’s a statistical anomaly, or that there are inadequate resources to pursue it. And there are always lawyers, lawyers, lawyers looking to intervene on both sides of the argument.
On a totally unrelated note, there are amazing studies that have been done with clustering of the Enron email corpus. The data is readily available and quite a bit of fun to play with. Here are some interesting image links: