Analyzing word frequency in my emails using Python (data collected over one week)

Yucefs-MacBook-Pro2:ITP-ROY yucefmerhi$ python wordfreq.py

This program analyzes word frequency in a file
and prints a report on the n most frequent words.

File to analyze: emails
Output analysis of how many words? 60
de           81
la           54
que          45
en           34
a            31
y            31
el           26
mi           24
por          23
es           20
un           20
una          19
http         17
i            17
las          16
para         16
los          15
como         14
me           14
the          14
con          13
to           13
com          12
no           11
tu           11
and          10
arte         10
lo           10
se           10
yucef        10
al            9
te            9
you           9
dos           8
cibernetic    7
del           7
is            7
it            7
muy           7
página        7
años          6
bien          6
he            6
hola          6
org           6
www           6
cualquier     5
espero        5
gracias       5
mis           5
mucho         5
my            5
número        5
sin           5
web           5
artista       4
at            4
entre         4
for           4
have          4
Yucefs-MacBook-Pro2:ITP-ROY yucefmerhi$

Code

# wordfreq.py

import string

def byFreq(item):
    # Sort key: highest count first; ties are broken alphabetically
    word, count = item
    return (-count, word)

def main():
    print("This program analyzes word frequency in a file")
    print("and prints a report on the n most frequent words.\n")

    # get the sequence of words from the file (assumes UTF-8 text)
    fname = input("File to analyze: ")
    with open(fname, 'r', encoding='utf-8') as infile:
        text = infile.read()
    text = text.lower()
    # replace every ASCII punctuation character with a space
    for ch in string.punctuation:
        text = text.replace(ch, ' ')
    # split once, after all punctuation has been removed
    words = text.split()

    # construct a dictionary of word counts
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1

    # output analysis of n most frequent words
    n = int(input("Output analysis of how many words? "))
    items = sorted(counts.items(), key=byFreq)
    # slicing avoids an IndexError when n exceeds the number of distinct words
    for word, count in items[:n]:
        print("%-10s%5d" % (word, count))

if __name__ == '__main__':
    main()
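To check the counting logic without typing answers at prompts, the same steps can be wrapped in a function that takes the text and n directly. This is a minimal sketch, not the script above: the name top_words is hypothetical, it uses collections.Counter in place of the hand-built dictionary and str.translate in place of the replace loop, and, like the script, it strips only ASCII punctuation.

```python
import string
from collections import Counter

def top_words(text, n):
    """Return the n most frequent words in text as (word, count) pairs.

    Hypothetical helper mirroring wordfreq.py's steps: lowercase,
    strip ASCII punctuation, split on whitespace, count, sort.
    """
    # Map every ASCII punctuation character to a space in one pass
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = text.lower().translate(table).split()
    counts = Counter(words)
    # Highest count first; ties broken alphabetically, as in the script
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]

if __name__ == "__main__":
    sample = "Hola, hola! Gracias por la página; la web es mi página."
    for word, count in top_words(sample, 3):
        print("%-10s%5d" % (word, count))
```

The sort is done explicitly because Counter.most_common orders words with equal counts by first occurrence, not alphabetically, so it would not reproduce the script's tie-breaking.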
