On Fri, 20 Aug 2010 01:40:54 am Wayne Werner wrote:
> > age_dict = dict([(key.upper(), value) for key,value in
> > age_dict.items()])
>
> This is a bad place to use a list comprehension. This will create a
> list of values first and then create a dict from that list, so now
> you have a list floating around that you didn't need.
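[For reference, the snippet under discussion spelled both ways, plus the
dict comprehension available from Python 2.7 -- a minimal sketch; the
example data is made up, not from the thread:]

age_dict = {"alice": 30, "bob": 25}   # hypothetical example data

# list comprehension: builds the whole list of (key, value) pairs first,
# then passes that list to dict()
age_dict = dict([(key.upper(), value) for key, value in age_dict.items()])

# generator expression: dict() pulls the pairs one at a time
age_dict = dict((key.upper(), value) for key, value in age_dict.items())

# dict comprehension (Python 2.7+ / 3.x): no intermediate list either
age_dict = {key.upper(): value for key, value in age_dict.items()}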
How do you know that dict() doesn't need the entire list at once? Unless
you're an expert on the implementation, for all you know it looks
something like this:

class dict:
    def __new__(cls, arg):
        ...
        try:
            n = len(arg)
        except AttributeError:
            # probably a generator expression
            arg = list(arg)
            n = len(arg)
        allocate_memory_for_items(n)
        ...

(only written in C). I'm not saying it does, or that it doesn't, but
you're assuming a pattern of behaviour which might not be the case.

Here's a similarly common idiom: which of these is faster?

' '.join(gen_expr)
' '.join(list_comp)

[st...@sylar ~]$ python -m timeit "' '.join(str(n) for n in xrange(300000))"
10 loops, best of 3: 437 msec per loop
[st...@sylar ~]$ python -m timeit "' '.join([str(n) for n in xrange(300000)])"
10 loops, best of 3: 401 msec per loop

The list comprehension is consistently faster, because join() works more
efficiently if it knows how many items it needs to pre-allocate memory
for.

> Generator expressions, OTOH, generate the values on the fly, only as
> they're needed, so there's no extra list left over once the dict is
> created.

And sometimes that's a win, and sometimes it's not. Generator
expressions are more computationally expensive than lists -- they're
functions which remember their internal state so you can pause them and
restart them at will. That doesn't happen for free -- it takes memory
and time.

The reason Python has generator expressions is that for large amounts of
data, the saving you get by not needing to produce the entire list all
at once more than makes up for the extra cost. For small amounts of
data, though, that's not always the case:

[st...@sylar ~]$ python -m timeit "dict((k,k+1) for k in xrange(2))"
100000 loops, best of 3: 5.89 usec per loop
[st...@sylar ~]$ python -m timeit "dict([(k,k+1) for k in xrange(2)])"
100000 loops, best of 3: 4.78 usec per loop

Here, using a generator expression is a pessimization, not an
optimization.

> Sure it will eventually be garbage collected, but "waste not, want
> not", as my grandmother used to say.

Does your grandmother have a box labelled "Pieces of string, too short
to use" as well?

-- 
Steven D'Aprano
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
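[To reproduce the small-input comparison above without the shell, here is
a minimal sketch using the timeit module directly -- Python 2 syntax to
match the timings quoted in the post; the exact numbers will vary by
machine and interpreter version:]

import timeit

# Build a tiny dict from a generator expression vs. a list comprehension.
gen_expr  = "dict((k, k+1) for k in xrange(2))"
list_comp = "dict([(k, k+1) for k in xrange(2)])"

for label, stmt in [("genexpr ", gen_expr), ("listcomp", list_comp)]:
    # timeit.timeit runs the statement `number` times and returns total seconds
    seconds = timeit.timeit(stmt, number=100000)
    print "%s: %.3f usec per loop" % (label, seconds / 100000 * 1e6)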