On 09/30/2017 11:12 AM, Sri G. wrote: > I'm learning programming with Python. > > I’ve written the code below for finding the most common words in a text > file that has about 1.1 million words. It's working fine, but I believe > there is always room for improvement. > > When run, the function in the script gets a text file from the command-line > argument sys.argv[1], opens the file in read mode, converts the text to > lowercase, makes a list of words from the text after removing any > whitespaces or empty strings, and stores the list elements as dictionary > keys and values in a collections.Counter object. Finally, it returns a > dictionary of the most common words and their counts. The > words.most_common() method gets its argument from the optional top > parameter. > > import sysimport collections > def find_most_common_words(textfile, top=10): > ''' Returns the most common words in the textfile.''' > > textfile = open(textfile) > text = textfile.read().lower() > textfile.close() > words = collections.Counter(text.split()) # how often each word appears > > return dict(words.most_common(top)) > > filename = sys.argv[1] > top_five_words = find_most_common_words(filename, 5) > > I need your comments please.
Others have made some pertinent comments already. How much you spend time to "improve" a bit of code depends on what you're going to do with it. If you've solved your problem, and it's a one-shot: don't worry much about it. Nothing wrong with a bit of throwaway code (although things do sometimes take on a life much longer than you intended, I can say from experience!!!) I'd ask a question or two to think about: first off, if you know the intended use of this function always will be to get a "top 10 list" - then why convert it back to a dict (unsorted) to return after Counter.most_common() has just given you a sorted list? You're most likely going to have to take your top_five_words dictionary and turn it back into something sorted to report back out on the counts. Second, if your function is likely to be called from many places in your code, is it too specialized? Can you design it as a more general API that could be used for other things than this specific purpose? For example, perhaps you just want to return the Counter instance without the further processing of "most_common", and let the client decide what it wants to do with that object. _______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor