On Tue, Jan 20, 2009 at 10:33 AM, Paul McGuire <pt...@austin.rr.com> wrote:

> "Finding the shortest word among a list of words" sounds like something of
> a
> trick question to me.  I think a more complete problem statement would be
> "Find the list of words that are the shortest", since there is no guarantee
> that the list does not contain two words of the same shortest length.  If
> you just add "me" to your sample set, you now get misleading answers:
>
> words = "man woman children he me".split()
> print min(words, key=len)
>
> prints:
> he
>
> What happened to "me"?  It is just as short as "he"!
>
> To get *all* the words that are the shortest length, I'll show two
> approaches.  The first uses a defaultdict, from the collections module of
> the Python stdlib.  With a defaultdict, I can have new dict entries get
> initialized with a default value using a specified factory function,
> without
> first having to check to see if that entry exists.  I would like to create
> a
> dict of lists of words, keyed by the lengths of the words.  Something that
> would give me this dict:
>
> { 2 : ['he', 'me'], 3 : ['man'], ... etc. }
>
> from which I could then find the minimum length using
> min(wordlendict.keys()).  By using a defaultdict, I can just build things
> up
> by iterating over the list and adding each word to the entry for the
> current
> word's length - if the current word is the first one for this length to be
> found, the defaultdict will initialize the value to an empty list for me.
> This allows me to safely append the current word regardless of whether
> there
> was an existing entry or not.  Here is the code that does this:
>
> from collections import defaultdict
> wordlendict = defaultdict(list)
> for w in words:
>    wordlendict[len(w)].append(w)
> minlen = min(wordlendict.keys())
> minlist = wordlendict[minlen]
> print minlist
>
> prints:
> ['he', 'me']
>
> Now we are getting a more complete answer!
>
> A second approach uses the groupby method of the itertools module in the
> stdlib.  groupby usually takes me several attempts to get everything right:
> the input must be sorted by the grouping feature, and then the results
> returned by groupby are in the form of an iterator that returns
> key-iterator
> tuples.  I need to go through some mental gymnastics to unwind the data to
> get the group I'm really interested in.  Here is a step-by-step code to use
> groupby:
>
> from itertools import groupby
> grpsbylen = groupby(sorted(words,key=len),len)
> mingrp = grpsbylen.next()
> minlen = mingrp[0]
> minlist = list(mingrp[1])
> print minlist
>
> The input list of words gets sorted, and then passed to groupby, which uses
> len as the grouping criteria (groupby is not inherently aware of how the
> input list was sorted, so len must be repeated).  The first group tuple is
> pulled from the grpsbylen iterator using next().  The 0'th tuple element is
> the grouping value - in this case, it is the minimum length 2.  The 1'th
> tuple element is the group itself, given as an iterator.  Passing this to
> the list constructor gives us the list of all the 2-character words, which
> then gets printed:
> ['he', 'me']
>
> Again, 'me' no longer gets left out.
>
> Maybe this will get you some extra credit...
>
> -- Paul

Thank you so much Paul for this. In my original post, I wrote word(s), which
means word or words. Your solutions look  a little bit too advanced to me. I
never used the collections module, and used itertools only once or twice. I
will study these solutions for sure, not for extra credit, simply because
I'm simply a linguistics person who uses corpora to find information without
any official interest in computer science or programming classes. Maybe when
I'm good enough at programming, I will take a Python class for credit,
although I'm past the classes thing now.
Thank you all again for helping me get a better understanding of Python.

>
> _______________________________________________
> Tutor maillist  -  Tutor@python.org
> http://mail.python.org/mailman/listinfo/tutor
>



-- 
لا أعرف مظلوما تواطأ الناس علي هضمه ولا زهدوا في إنصافه كالحقيقة.....محمد
الغزالي
"No victim has ever been more repressed and alienated than the truth"

Emad Soliman Nawfal
Indiana University, Bloomington

--------------------------------------------------------
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor

Reply via email to