In working with clusters, I was reminded of the old statistics saw that “correlation does not necessitate causation” (or something to that effect). The point being, among other things, that just because clusters are identified does not mean that there is any underlying meaning behind the clusters. Unsurprisingly, what sounds at first glance like a [...]
Once I started really looking at this, I was utterly amazed at how it affects every aspect of our interaction with computers. And how it has become, in essence, more noise to be ignored than useful. Spell-Check / Autocorrect: Everywhere, every day. Occasionally useful in word processing, less useful and more annoying in Gmail, genuinely [...]
I had an argument once with a colleague about the quality of writing and communication and the popularity of those communications. In short, my colleague argued that even if another colleague (a “coac” or “colleague of a colleague”) wrote a blog that only 10 other people read, it was still a good blog. My counter was that, [...]
Oh, spelling and grammar. My greatest educational and professional bane. My (close to) greatest source of embarrassment. Until roughly the late 19th century, spelling for most people, even the “educated classes”, was essentially a process of deciding what looked good. It wasn’t even probabilistic so much as optimistic. Note that Terry Pratchett described one character’s [...]
I tried a few different datasets to generate the text. The first was a set of press releases. The second was from filings made by companies with the Securities and Exchange Commission. press-releases text-release-generated 10q_mda generated
“Stopwords” are those words (and potentially phrases) that search engines and search parsers filter out from the query. In my own experience, we frequently refer to them as “noise”. Typical examples include “a”, “an” and “the”. Most electronic content management systems (“ECMs”) that store large quantities of text data will remove these from the full text [...]
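To make the idea concrete, here is a minimal sketch in Python of what stopword filtering can look like. It is an illustration only, not how any particular search engine or ECM does it: the function name `remove_stopwords` and the tokenizing pattern are my own assumptions, and the three-word list is just the examples mentioned above (real lists are far longer).

```python
import re

# Assumption: a deliberately tiny stopword list, taken from the examples
# in the text. Production systems ship much longer lists.
STOPWORDS = frozenset({"a", "an", "the"})


def remove_stopwords(query, stopwords=STOPWORDS):
    """Lowercase the query, split it into word tokens, and drop stopwords.

    Hypothetical helper for illustration; the token pattern is a simple
    assumption (letters, digits, apostrophes), not a standard.
    """
    tokens = re.findall(r"[a-z0-9']+", query.lower())
    return [token for token in tokens if token not in stopwords]


print(remove_stopwords("The quick brown fox jumped over a lazy dog"))
# prints ['quick', 'brown', 'fox', 'jumped', 'over', 'lazy', 'dog']
```

Note that the filter is applied after lowercasing, so “The” and “the” are treated the same; whether that is desirable depends on the system.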
I’m still playing with this, and getting the code to work as an applet embedded in a WordPress page is proving more complex than I thought. But it is working and compiling, and it was a total hoot to play with. For those, like me, wrestling with RegEx, there are several extremely good websites on it [...]
I’m a little embarrassed to admit that I originally read Searle’s piece more than 25 years ago. I’ve diligently re-read it, together with re-reading Turing’s piece. And, in truth, I still think Searle is hiding the ball when it comes to the core issue. In short, Searle makes completely valid arguments through about 70% of [...]