Rance Hall wrote:
> On Mon, Apr 18, 2011 at 9:50 PM, Marc Tompkins <marc.tompk...@gmail.com> wrote:
>> On Mon, Apr 18, 2011 at 6:53 PM, Rance Hall <ran...@gmail.com> wrote:

>>> I'm going to go ahead and use this format even though it is deprecated
>>> and then later when we upgrade it I can fix it.

>> And there you have your answer.

>>> A list might make sense, but printing a message one word at a time
>>> doesn't seem to me like much of a time saver.

>> Did you try my example code?  It doesn't "print a message one word at a
>> time"; any time you print " ".join(message), you get the whole thing.  Put a
>> \n between the quotes, and you get the whole thing on separate lines.
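For anyone following along, Marc means something like this (the sample list is mine, not his):

>>> message = ['spam', 'eggs', 'ham']
>>> print " ".join(message)
spam eggs ham
>>> print "\n".join(message)
spam
eggs
ham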


> I think you misunderstood me. I simply meant that print " ".join(message)
> has to parse through each word in order to get any output; I didn't mean
> to suggest that you got output one word at a time. Sorry for the confusion.

Well, yes, but you have to walk over each word at some point. The join idiom merely puts that off until just before you need the complete string, instead of walking over the words again and again. That's why the join idiom is usually better: it walks over each string once, while repeated concatenation can walk over each one dozens, hundreds or thousands of times (depending on how many strings you have to concatenate).

To be precise: if there are N strings to add, the join idiom does work proportional to N, while repeated concatenation does work proportional to N*N.
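If it helps to see the two idioms side by side, here is a minimal sketch (the names are mine, not from the thread):

>>> words = ['spam'] * 5
>>> message = ''
>>> for word in words:
...     message = message + word  # may re-copy everything built so far, every time
...
>>> message
'spamspamspamspamspam'
>>> ''.join(words)  # walks the list once, with a single final copy
'spamspamspamspamspam'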

This is potentially *so* catastrophic for performance that recent versions of CPython actually go out of their way to protect you from it (other Python implementations, like Jython, IronPython and PyPy, might not). But with a little bit of extra work, we can shoot ourselves in the foot and see just how bad *repeated* string concatenation can be:


>>> from timeit import Timer
>>>
>>> class Magic:
...     # Magic() + s simply returns s, so passing m as sum()'s start
...     # value sidesteps sum()'s refusal to accept a str start value
...     # and forces genuine repeated string concatenation.
...     def __add__(self, other):
...         return other
...
>>> m = Magic()
>>> strings = ['a']*10000  # ten thousand one-character strings
>>>
>>> t1 = Timer('"".join(strings)', 'from __main__ import strings')
>>> t2 = Timer('sum(strings, m)', 'from __main__ import strings, m')
>>>
>>> t1.timeit(1000)  # one thousand timing iterations
1.0727810859680176
>>> t2.timeit(1000)  # repeated concatenation: nearly twenty times slower
19.48655891418457


In Real Life, the performance hit can be substantial. Some time ago (perhaps a year?) there was a bug report that copying files over the network was *really* slow in Python. From memory, the report was that downloading a smallish file took Internet Explorer about 0.1 second, the wget utility about the same, and the Python urllib module about TEN MINUTES. To cut a long story short, it turned out that the module in question was doing repeated string concatenation.

Most users never noticed the problem because Python now has a special optimization that detects repeated concatenation and does all sorts of funky magic to make it smarter and faster. But for this one user there was some strange interaction between how Windows manages memory and the Python optimization: the magic wasn't applied, and the full inefficiency of the algorithm was revealed in all its horror.
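That optimization, for what it's worth, is a CPython implementation detail: when the interpreter sees s += t (or s = s + t) and s holds the only reference to its string, it can often resize the buffer in place instead of copying. A rough sketch of the pattern that benefits (the function is mine, purely for illustration):

>>> def build(pieces):
...     s = ''
...     for piece in pieces:
...         # CPython can often grow s in place here, because s is the
...         # only reference to its string; other implementations may
...         # copy the whole thing on every pass.
...         s += piece
...     return s
...
>>> build(['a'] * 10000) == 'a' * 10000
True

Since that help is an implementation detail rather than a language guarantee, the join idiom remains the portable way to get linear behaviour.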


Bottom line: unless you have actually timed your code and have hard measurements showing otherwise, you should always expect repeated string concatenation to be slow and the join idiom to be fast.



--
Steven

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor