I analyzed word frequency across my emails. First, I downloaded all of my sent email as .eml files using a Python script created by Sean McIntyre. I then imported sets of the .eml files into Processing. I had to limit the number of emails in each set so that I wouldn't run out of memory. I have a ton of attachments that were downloaded with each email.
As I continue to work on this, I’m going to modify things so that attachments are ignored and only the text is analyzed.
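Here is a rough sketch of how the attachment filtering could work in Python, using only the standard library `email` package. This is not the download script mentioned above. The function name `extract_text` and the sample message are made up for illustration, and I'm assuming that the text worth keeping lives in `text/plain` parts that are not marked as attachments.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def extract_text(raw_bytes):
    """Return the plain-text body of one .eml message, skipping attachments.

    Hypothetical helper: assumes the readable text is in text/plain parts.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    pieces = []
    for part in msg.walk():
        # Multipart containers hold other parts and have no text of their own.
        if part.is_multipart():
            continue
        # Skip anything explicitly flagged as an attachment.
        if part.get_content_disposition() == "attachment":
            continue
        # Keep only plain text (HTML bodies and images are ignored here).
        if part.get_content_type() == "text/plain":
            pieces.append(part.get_content())
    return "\n".join(pieces)


if __name__ == "__main__":
    # Build a small sample message with one attachment to try it out.
    sample = EmailMessage()
    sample["Subject"] = "test"
    sample.set_content("hello from the message body")
    sample.add_attachment(b"binarydata", maintype="application",
                          subtype="octet-stream", filename="a.bin")
    print(extract_text(sample.as_bytes()))
```

For real files, the bytes would come from opening each .eml file in binary mode and passing its contents to `extract_text`.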
I brought each set of emails into Processing with a sketch based on the WordFrequency across files sketch, then isolated the words between 4 and 8 letters long and turned them red to make them more visible. I made any word with a frequency above 25 pink.
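The counting and coloring logic can be sketched in a few lines of Python. This is not the Processing sketch itself, just an illustration of the same idea. The function names are hypothetical, and I'm assuming that "between 4 and 8" is inclusive and that the pink rule wins when a word matches both rules.

```python
import re
from collections import Counter


def word_frequencies(texts):
    """Count how often each word appears across a set of email texts."""
    counts = Counter()
    for text in texts:
        # Lowercase everything and treat runs of letters as words.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def color_words(counts, min_len=4, max_len=8, threshold=25):
    """Assign a display color to each word.

    Assumptions: the length range is inclusive, and a frequency above
    the threshold (pink) takes priority over the length rule (red).
    """
    colors = {}
    for word, count in counts.items():
        if count > threshold:
            colors[word] = "pink"
        elif min_len <= len(word) <= max_len:
            colors[word] = "red"
        else:
            colors[word] = "default"
    return colors


if __name__ == "__main__":
    counts = word_frequencies(["Email email the", "email words"])
    # A low threshold is used here so the tiny sample shows all three cases.
    print(color_words(counts, threshold=2))
```

Using a `Counter` keeps the tally in one pass, and the thresholds are parameters so the 4, 8, and 25 cutoffs are easy to change.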
Here’s an analysis of Feb. 11th to now:
Here’s an analysis of Jan. 8th to Feb. 11th:
I then compared those with email sets from around the same time last year.
logKext is also now installed, so I’m going to try some analysis with it once I have more data.




