On May 4, 2014 8:31 PM, "Jake Blank" <jakenbl...@gmail.com> wrote:
>
> Hi,
> So I'm doing a problem on the Alice_in_wonderland.txt where I have to write a
> program that reads a piece of text from a file specified by the user, counts
> the number of occurrences of each word, and writes a sorted list of words and
> their counts to an output file. The list of words should be sorted based on
> the counts, so that the most popular words appear at the top. Words with the
> same counts should be sorted alphabetically.
>
> My code right now is
>
> word_count = {}
> file = open ('alice_in_wonderland.txt', 'r')
> full_text = file.read().replace('--',' ')
> full_text_words = full_text.split()
>
> for words in full_text_words:
>     stripped_words = words.strip(".,!?'`\"- ();:")
>     try:
>         word_count[stripped_words] += 1
>     except KeyError:
>         word_count[stripped_words] = 1
>
> ordered_keys = word_count.keys()
> sorted(ordered_keys)
> print ("All the words and their frequency in", 'alice in wonderland')
> for k in ordered_keys:
>     print (k, word_count[k])
>
> The Output here is just all of the words in the document NOT SORTED by amount
> of occurrence.
> I need help sorting this output of words in the Alice_in_wonderland.txt, as
> well as help asking the user for the input information about the files.
>
> If anyone could give me some guidance you will really be helping me out.
>
> Please and Thank you

Hi Jake,

You are sorting the dictionary keys by the keys themselves, whereas
what you want is the keys sorted by their associated values.

Look at the key parameter in
https://docs.python.org/3.4/library/functions.html#sorted

To get you started, here is an example in the vicinity:

>>> data = ['abiab', 'cdocd', 'efaef', 'ghbgh']
>>> sorted(data)
['abiab', 'cdocd', 'efaef', 'ghbgh']
>>> sorted(data, key=lambda x:x[2])
['efaef', 'ghbgh', 'abiab', 'cdocd']
>>> def get_third(x): return x[2]
...
>>> sorted(data, key=get_third)
['efaef', 'ghbgh', 'abiab', 'cdocd']

In case the lambda version is confusing, it is simply a way of doing
the get_third version without having to create a function outside of
the context of the sorted expression.

If that sorts you, great. If not, please do ask a follow-up. (I was
trying not to do it for you, but also not to frustrate by giving you
too little of a push.)

Best,

Brian vdB
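
If the hint above is not quite enough, here is one possible way to apply the key parameter to a dictionary of counts. It is only a minimal sketch: the sample counts and the helper name sort_by_count are made up for illustration and are not from the original code. Note also that sorted() returns a new list rather than sorting in place, so its result has to be kept in a variable. The key function returns a tuple, so items are compared by negated count first (highest count at the top) and by the word itself on ties.

```python
# Hypothetical sample counts standing in for the real word_count dictionary.
word_count = {'the': 3, 'alice': 2, 'rabbit': 2, 'a': 3, 'queen': 1}


def sort_by_count(counts):
    """Return (word, count) pairs, highest count first, ties alphabetical."""
    # Negating the count makes larger counts sort earlier; the word itself
    # is the second tuple element, so equal counts fall back to a-z order.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


# sorted() builds a new list, so the result must be assigned to a name.
ordered = sort_by_count(word_count)
for word, count in ordered:
    print(word, count)
```

The same loop could write to an output file instead of printing, which is what the assignment asks for.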