On 16/10/13 19:49, Pinedo, Ruben A wrote:
I was given this code and I need to modify it so that it will:

#1. Error handling for the files to ensure reading only .txt file

I'm not sure what is meant here, since your code only ever opens
'emma.txt', which is presumably a text file. Or are you supposed
to make the filename a user-provided value (using raw_input, maybe)?
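
If the filename is meant to come from the user, a minimal sketch of the check might look like the following. The helper name open_text_file is made up for illustration, and I'm assuming "reading only .txt file" just means checking the extension and catching a failed open. It calls print() with a single argument so it behaves the same under Python 2 and Python 3.

```python
import os

def open_text_file(filename):
    """Return an open file object for a .txt file, or None on failure.

    Hypothetical helper, not part of the original code.
    """
    # Reject anything without a .txt extension (case-insensitive).
    if os.path.splitext(filename)[1].lower() != '.txt':
        print('Not a .txt file: %s' % filename)
        return None
    try:
        return open(filename)
    except IOError as err:  # e.g. the file is missing or unreadable
        print('Could not open %s: %s' % (filename, err))
        return None
```

The name could then come from raw_input() (input() in Python 3) and be passed to this helper before any counting starts.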

#2. Print a range of top words... ex: print top 10-20 words

I assume 'top' here means the most common? Whoever is writing the specification for this problem needs to be a bit more specific
in their definitions.

If so you need to fix the bugs in process_line() and
process_file(). I don't know if these are deliberate bugs
or somebody is just sloppy. But neither works as expected
right now. (Hint: Consider the return values of each.)

Once you've done that you can figure out how to extract
the required number of words from your (unsorted) dictionary,
put that in a reporting function and print the output.
You might be able to use the two common words functions,
although watch out because they don't do exactly what
you want and one of them is basically broken...
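
As a rough sketch of the extraction step, assuming "top 10-20" means ranks 10 through 20 by frequency: sort the dictionary items by count and slice out the wanted ranks. The function name top_words and its parameters are invented here, and the sample dictionary is made-up data, not counts from emma.txt.

```python
def top_words(hist, start=0, stop=10):
    """Return (count, word) pairs for ranks start..stop-1, most common first."""
    # Sort by descending count, breaking ties alphabetically.
    ranked = sorted(hist.items(), key=lambda item: (-item[1], item[0]))
    return [(count, word) for word, count in ranked[start:stop]]

# Made-up sample data for illustration only.
sample = {'the': 5, 'emma': 3, 'and': 4, 'a': 2, 'of': 1}
second_and_third = top_words(sample, 1, 3)
```

Slices count from zero, so ranks 10 to 20 inclusive would be top_words(hist, 9, 20).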

#3. Print only the words with > 3 characters

Modify the above to discard words of 3 letters or fewer.
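
One possible way to do the discarding, sketched as a separate filtering step on the dictionary. The name longer_words and the min_length parameter are my own invention.

```python
def longer_words(hist, min_length=4):
    """Return a new dict keeping only words with at least min_length letters."""
    return {word: count for word, count in hist.items()
            if len(word) >= min_length}

# Made-up sample data for illustration only.
sample = {'the': 5, 'emma': 3, 'and': 4, 'house': 2, 'of': 1}
filtered = longer_words(sample)
```

Filtering into a new dictionary leaves the original counts untouched, so the totals can still be reported from the unfiltered data.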

#4. Modify the printing function to print top 1 or 2 or 3 ....

I assume this means take a parameter that specifies the
number of words to print. Or it could be the length of
word to ignore. Again the specification is woolly.
In either case it's a small modification to your
reporting function.
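
Since the spec could mean either reading, here is a hedged sketch of a reporting function that takes both as parameters and keeps the printing in a separate function. The names report, print_report, num and min_length are assumptions of mine, not anything from the assignment.

```python
def report(hist, num=10, min_length=1):
    """Return up to num (count, word) pairs, most common first,
    ignoring words shorter than min_length letters."""
    pairs = [(count, word) for word, count in hist.items()
             if len(word) >= min_length]
    pairs.sort(reverse=True)  # highest count first
    return pairs[:num]

def print_report(pairs):
    """Print the (count, word) pairs one per line."""
    for count, word in pairs:
        print('%d\t%s' % (count, word))

# Made-up sample data for illustration only.
sample = {'the': 5, 'emma': 3, 'and': 4, 'a': 2, 'of': 1}
print_report(report(sample, 2))
```

Because report() only returns data, the same result can be printed, written to a file or checked in a test.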

#5. How many unique words are there in the book of length 1, 2, 3 etc

This is slicing the data slightly differently but
again not that different to the earlier requirement.
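
A small sketch of that slicing, assuming the keys of the histogram are the unique words: count how many keys there are of each length. The function name is hypothetical.

```python
def unique_words_by_length(hist):
    """Return a dict mapping word length to the number of unique words."""
    counts = {}
    for word in hist:  # each key is one unique word
        counts[len(word)] = counts.get(len(word), 0) + 1
    return counts

# Made-up sample data for illustration only.
sample = {'the': 5, 'emma': 3, 'and': 4, 'a': 2, 'of': 1}
by_length = unique_words_by_length(sample)
for length in sorted(by_length):
    print('%d\t%d' % (length, by_length[length]))
```

If stripping punctuation ever leaves an empty string as a key, it will show up here as a word of length 0.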

I am fairly new to Python and am completely lost. I looked in my book as
to how to do number one, but I cannot figure out what to modify and/or
delete to add the print selection. This is the code:

You need to modify the two broken functions and add a
new reporting function. (Despite the reference to a
printing function I'd suggest keeping the data extraction
and printing separate.)

import string

def process_file(filename):
     hist = dict()
     fp = open(filename)
     for line in fp:
         process_line(line, hist)
     return hist

def process_line(line, hist):
     line = line.replace('-', ' ')
     for word in line.split():
         word = word.strip(string.punctuation + string.whitespace)
         word = word.lower()
         hist[word] = hist.get(word, 0) + 1

def common_words(hist):
     t = []
     for key, value in hist.items():
         t.append((value, key))
     t.sort(reverse=True)
     return t

def most_common_words(hist, num=100):
     t = common_words(hist)
     print 'The most common words are:'
     for freq, word in t[:num]:
         print freq, '\t', word

hist = process_file('emma.txt')
print 'Total num of Words:', sum(hist.values())
print 'Total num of Unique Words:', len(hist)
most_common_words(hist, 50)

Any help would be greatly appreciated because I am struggling in this
class. Thank you in advance.

hth
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.flickr.com/photos/alangauldphotos

