email footprint with word frequency

I analyzed word frequency across my emails. First, I downloaded all of my sent email as .eml files using a Python script created by Sean McIntyre. I then imported sets of the .eml files into Processing. I had to limit the number of emails in each set to avoid running out of memory, since a ton of attachments were downloaded along with the emails.

As I continue to work on this, I’m going to modify the process so that attachments are ignored and only the text is analyzed.
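One way the attachment-skipping step could look, as a rough sketch in Python using only the standard library `email` package. The function name `extract_text` is hypothetical, and this is not the download script mentioned above; it only assumes that each .eml file can be read as raw bytes.

```python
from email import message_from_bytes, policy


def extract_text(raw_bytes):
    """Return the plain-text body of an .eml message, skipping attachments.

    Hypothetical helper for illustration only.
    """
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    parts = []
    for part in msg.walk():
        # Keep only text/plain parts that are not marked as attachments.
        if part.get_content_type() != "text/plain":
            continue
        if part.get_content_disposition() == "attachment":
            continue
        parts.append(part.get_content())
    return "\n".join(parts)
```

Reading a file with `open(path, "rb").read()` and passing the bytes to `extract_text` would give just the message text, which should keep memory use much lower than loading whole messages with their attachments.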

I brought each set of emails into Processing using a sketch based on the WordFrequency across files sketch, then isolated the words between 4 and 8 letters long and turned them red to make them more visible. I made any word with a frequency above 25 pink.
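The counting and coloring rules above can be sketched outside of Processing as well. This is a minimal Python illustration, not the sketch itself: the function names are hypothetical, the 4 to 8 letter range is assumed to be inclusive, and letting pink take priority over red is also an assumption.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each lowercase word appears in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def classify(word, count, min_len=4, max_len=8, high_freq=25):
    """Pick a display color for a word (hypothetical rule ordering)."""
    # Assumption: frequent words are pink even if they are also 4 to 8 letters.
    if count > high_freq:
        return "pink"
    # Assumption: the length range includes both 4 and 8.
    if min_len <= len(word) <= max_len:
        return "red"
    return "default"
```

Calling `word_frequencies` on the text of one email set and then `classify(word, count)` on each entry gives the color each word would be drawn in.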

Here’s an analysis of Feb. 11th to now:

feb1113_feb2713a

Here’s an analysis of Jan. 8th to Feb 11th:

jan0813_feb1113a

I then compared those with email sets from around the same time last year.

Jan – Feb 12
jan

Feb
feb12

logKext is also installed now, so I’m going to try some analysis with that once I have more data.
