On Tue, Jan 20, 2009 at 10:33 AM, Paul McGuire <pt...@austin.rr.com> wrote:

> "Finding the shortest word among a list of words" sounds like something of
> a trick question to me. I think a more complete problem statement would be
> "Find the list of words that are the shortest", since there is no guarantee
> that the list does not contain two words of the same shortest length. If
> you just add "me" to your sample set, you now get misleading answers:
>
> words = "man woman children he me".split()
> print min(words, key=len)
>
> prints:
> he
>
> What happened to "me"? It is just as short as "he"!
>
> To get *all* the words that are the shortest length, I'll show two
> approaches. The first uses a defaultdict, from the collections module of
> the Python stdlib. With a defaultdict, I can have new dict entries get
> initialized with a default value using a specified factory function,
> without first having to check to see if that entry exists. I would like to
> create a dict of lists of words, keyed by the lengths of the words.
> Something that would give me this dict:
>
> { 2 : ['he', 'me'], 3 : ['man'], ... etc. }
>
> from which I could then find the minimum length using
> min(wordlendict.keys()). By using a defaultdict, I can just build things up
> by iterating over the list and adding each word to the entry for the
> current word's length - if the current word is the first one for this
> length to be found, the defaultdict will initialize the value to an empty
> list for me. This allows me to safely append the current word regardless of
> whether there was an existing entry or not. Here is the code that does
> this:
>
> from collections import defaultdict
> wordlendict = defaultdict(list)
> for w in words:
>     wordlendict[len(w)].append(w)
> minlen = min(wordlendict.keys())
> minlist = wordlendict[minlen]
> print minlist
>
> prints:
> ['he', 'me']
>
> Now we are getting a more complete answer!
>
> A second approach uses the groupby method of the itertools module in the
> stdlib.
> groupby usually takes me several attempts to get everything right:
> the input must be sorted by the grouping feature, and then the results
> returned by groupby are in the form of an iterator that returns
> key-iterator tuples. I need to go through some mental gymnastics to unwind
> the data to get the group I'm really interested in. Here is step-by-step
> code to use groupby:
>
> from itertools import groupby
> grpsbylen = groupby(sorted(words,key=len),len)
> mingrp = grpsbylen.next()
> minlen = mingrp[0]
> minlist = list(mingrp[1])
> print minlist
>
> The input list of words gets sorted, and then passed to groupby, which uses
> len as the grouping criteria (groupby is not inherently aware of how the
> input list was sorted, so len must be repeated). The first group tuple is
> pulled from the grpsbylen iterator using next(). The 0'th tuple element is
> the grouping value - in this case, it is the minimum length 2. The 1'th
> tuple element is the group itself, given as an iterator. Passing this to
> the list constructor gives us the list of all the 2-character words, which
> then gets printed:
> ['he', 'me']
>
> Again, 'me' no longer gets left out.
>
> Maybe this will get you some extra credit...
>
> -- Paul

Thank you so much for this, Paul. In my original post, I wrote word(s),
which means word or words. Your solutions look a little bit too advanced
for me. I have never used the collections module, and have used itertools
only once or twice. I will study these solutions for sure, though not for
extra credit: I'm simply a linguistics person who uses corpora to find
information, without any official interest in computer science or
programming classes. Maybe when I'm good enough at programming, I will take
a Python class for credit, although I'm past the classes thing now.

Thank you all again for helping me get a better understanding of Python.
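
For anyone reading this thread on Python 3: the snippets quoted above are
Python 2 (print statement, iterator .next() method). Below is a rough sketch
of the same two ideas in Python 3, plus a plainer two-pass version that needs
no imports and may be easier to start from. The function names are made up
for illustration and do not come from Paul's message, and returning an empty
list for empty input is my own assumption about what is wanted.

```python
from collections import defaultdict
from itertools import groupby


def shortest_words_simple(words):
    """Two passes: find the minimum length, then keep every word of that length."""
    if not words:
        return []  # assumption: empty input gives an empty result
    minlen = min(len(w) for w in words)
    return [w for w in words if len(w) == minlen]


def shortest_words_defaultdict(words):
    """Group words by length in a dict of lists, then look up the smallest key."""
    if not words:
        return []
    by_len = defaultdict(list)
    for w in words:
        by_len[len(w)].append(w)  # a missing key starts out as an empty list
    return by_len[min(by_len)]    # min() over a dict iterates over its keys


def shortest_words_groupby(words):
    """Sort by length, then take the first group that groupby yields."""
    if not words:
        return []
    groups = groupby(sorted(words, key=len), key=len)
    minlen, group = next(groups)  # next(it) replaces the Python 2 it.next()
    return list(group)


words = "man woman children he me".split()
print(shortest_words_simple(words))
print(shortest_words_defaultdict(words))
print(shortest_words_groupby(words))
```

All three calls should print ['he', 'me'] for this sample list. The two-pass
version walks the list twice, which is unlikely to matter unless the word list
is very large.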
>
> _______________________________________________
> Tutor maillist - Tutor@python.org
> http://mail.python.org/mailman/listinfo/tutor
>

--
لا أعرف مظلوما تواطأ الناس علي هضمه ولا زهدوا في إنصافه كالحقيقة.....محمد الغزالي
"No victim has ever been more repressed and alienated than the truth"
Emad Soliman Nawfal
Indiana University, Bloomington
--------------------------------------------------------