Hi everyone,
Thanks for all the suggestions. Let me preface this by saying that I'm new
to both Python and programming. I started learning 3 months ago with online
tutorials and by reading the questions you guys post. So, thank you all very,
very much, and I apologize if I'm doing something really stupid. :-)
OK. I've solved the problem of opening several files to process as a batch
with glob.glob(). Only now did I realize that, with a pattern like '*.txt',
the files need to be in the folder the program is run from.
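On the folder point: as far as I can tell, glob.glob() accepts any path pattern, and a bare '*.txt' is just matched against the current working directory. Here is a minimal sketch of pointing it at another folder. The helper name list_text_files and its folder argument are made up for illustration.

```python
import glob
import os


def list_text_files(folder):
    """Return the sorted paths of all .txt files in `folder`.

    Hypothetical helper: the name and the argument are assumptions,
    not part of the original program.
    """
    # Build a pattern such as 'corpus/*.txt' in a portable way.
    pattern = os.path.join(folder, '*.txt')
    # glob.glob() returns matches in arbitrary order, so sort them.
    return sorted(glob.glob(pattern))
```

Calling list_text_files('.') should behave like glob.glob('*.txt'), apart from the sorting and a leading './' on each path.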
Now I have another problem.
1- I want to open several files and count the total number of words. If I do
this with only 1 file, it works great. With several files (now with glob), it
outputs the total count for each file individually and not for the whole
corpus (see the comments in the program below).
2- I also want the program to output a word frequency list (we do this a lot
in corpus linguistics). When I do this with only one file, the program works
great (with a dictionary). With several files, I end up with several frequency
lists, one for each file. This sounds like a loop type of problem, doesn't it?
I looked at the indentation too and I can't find what the problem is. Your
comments, suggestions, etc. are greatly appreciated. Thanks again for all your
help.
Paulo
Here is what I have.
# The program is intended to output a word frequency list (including all
# words in all files) and the total word count
import glob

def sortfile(): # I created a function
    filename = glob.glob('*.txt') # this works great! Thanks!
    for allfiles in filename:
        infile = open(allfiles, 'r')
        lines = list(infile)
        infile.close()
        words = [] # initializes list of words
        wordcounter = 0
        for line in lines:
            # after this, I have some clunky code to get rid of punctuation
            line = line.lower()
            words = words + line.split()
        # counts the freq of each word in a list
        wordfreq = [words.count(wrd) for wrd in words]
        dictionary = dict(zip(words, wordfreq))
        frequency_list = [(dictionary[key], key) for key in dictionary]
        frequency_list.sort()
        frequency_list.reverse()
        for item in frequency_list:
            wordcounter = wordcounter + 1
            print item
        # if I put it here, I get the total count after each file
        print "Total # of words:", wordcounter
    # this will give the word count of the last file the program reads.
    print "Total # of words:", wordcounter

sortfile()
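One way to get a single list for the whole corpus is to create the dictionary and the running total once, before the loop over the files, so that neither is reset for each file. This is only a sketch under some assumptions: the name corpus_frequencies and its paths argument are made up for illustration, the punctuation stripping is left out, and the function returns its results instead of printing them so that it does not depend on the Python 2 print statement.

```python
def corpus_frequencies(paths):
    """Count words across every file in `paths` (hypothetical helper).

    Returns (total_words, frequency_list), where frequency_list holds
    (count, word) tuples sorted from most to least frequent.
    """
    counts = {}        # created once, so it accumulates over all files
    total_words = 0    # running total of word tokens in the whole corpus

    for path in paths:
        with open(path, 'r') as infile:
            for line in infile:
                for word in line.lower().split():
                    # dict.get() supplies 0 the first time a word is seen
                    counts[word] = counts.get(word, 0) + 1
                    total_words += 1

    # Build the list only after every file has been read.
    frequency_list = [(count, word) for word, count in counts.items()]
    frequency_list.sort(reverse=True)
    return total_words, frequency_list
```

It could be called as corpus_frequencies(glob.glob('*.txt')), after which the caller loops over the returned list once and prints the total once. Note that total_words counts every word occurrence, whereas incrementing a counter once per entry of the frequency list would count distinct words instead.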
_______________________________________________
Tutor maillist - [email protected]
http://mail.python.org/mailman/listinfo/tutor