Alan Gauld wrote:
> The loop is a bit clunky. it would be clearer just to iterate over
> a_list:
>
> for item in a_list:
> words[item] = a_list.count(item)
This is a very inefficient approach because you repeat counting the number
of occurrences of a word that appears N times N times:
>>> words = {}
>>> a_list = "in the garden on the bank behind the tree".split()
>>> for word in a_list:
... print "counting", word
... words[word] = a_list.count(word)
...
counting in
counting the # <-- 1
counting garden
counting on
counting the # <-- 2
counting bank
counting behind
counting the # <-- 3
counting tree
>>> words
{'on': 1, 'garden': 1, 'tree': 1, 'behind': 1, 'in': 1, 'the': 3, 'bank': 1}
To avoid the duplicate effort you can check if the word was already counted:
>>> words2 = {}
>>> for word in a_list:
... if word not in words2:
... print "counting", word
... words2[word] = a_list.count(word)
...
counting in
counting the
counting garden
counting on
counting bank
counting behind
counting tree
>>> words == words2
True
Inside the count() method you are still implicitly iterating over the entire
list once for every distinct word. I would instead prefer counting manually
while iterating over the list once. This has the advantage that it will even
work if you don't keep the whole sequence of words in a list in memory (e.
g. if you read them from a file one line at a time):
>>> words3 = {}
>>> for word in a_list:
... if word in words3:
... words3[word] += 1
... else:
... words3[word] = 1
...
>>> words3 == words
True
Finally there's collections.defaultdict or, in Python 2.7,
collections.Counter when you are more interested in the result than the way
to achieve it.
Peter
_______________________________________________
Tutor maillist - [email protected]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor