Rance Hall wrote:
> On Mon, Apr 18, 2011 at 9:50 PM, Marc Tompkins <marc.tompk...@gmail.com> wrote:
>> On Mon, Apr 18, 2011 at 6:53 PM, Rance Hall <ran...@gmail.com> wrote:

>>> I'm going to go ahead and use this format even though it is deprecated
>>> and then later when we upgrade it I can fix it.

>> And there you have your answer.

>>> A list might make sense, but printing a message one word at a time
>>> doesn't seem to me like much of a time saver.

>> Did you try my example code?  It doesn't "print a message one word at a
>> time"; any time you print " ".join(message), you get the whole thing.  Put a
>> \n between the quotes, and you get the whole thing on separate lines.
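For anyone following along, Marc means something like this (the sample list is mine, not his):

>>> message = ['spam', 'eggs', 'ham']
>>> print " ".join(message)
spam eggs ham
>>> print "\n".join(message)
spam
eggs
ham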


> I think you misunderstood me. I simply meant that print " ".join(message)
> has to parse through each word in order to get any output; I didn't mean
> to suggest that you got output one word at a time. Sorry for the confusion.

Well, yes, but you have to walk over each word at some point. The join idiom merely puts that off until just before you need the complete string, instead of walking over the words again and again. That's why the join idiom is usually better: it walks over each string once, while repeated concatenation can walk over each one dozens, hundreds or thousands of times (depending on how many strings you have to concatenate).

To be precise: if there are N strings to add, the join idiom does work proportional to N, while repeated concatenation does work proportional to N*N.
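If it helps to see the two idioms side by side, here is a minimal sketch (the names are mine, not from the thread):

>>> words = ['spam'] * 5
>>> message = ''
>>> for word in words:
...     message = message + word  # may re-copy everything built so far, every time
...
>>> message
'spamspamspamspamspam'
>>> ''.join(words)  # walks the list once, with a single final copy
'spamspamspamspamspam'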

This is potentially *so* catastrophic for performance that recent versions of CPython actually go out of their way to protect you from it (other Python implementations, like Jython, IronPython and PyPy, might not). But with a little bit of extra work, we can shoot ourselves in the foot and see just how bad *repeated* string concatenation can be:


>>> from timeit import Timer
>>>
>>> class Magic:
...     # Magic() + s simply returns s, so passing m as sum()'s start
...     # value sidesteps sum()'s refusal to accept a str start value
...     # and forces genuine repeated string concatenation.
...     def __add__(self, other):
...         return other
...
>>> m = Magic()
>>> strings = ['a']*10000  # ten thousand one-character strings
>>>
>>> t1 = Timer('"".join(strings)', 'from __main__ import strings')
>>> t2 = Timer('sum(strings, m)', 'from __main__ import strings, m')
>>>
>>> t1.timeit(1000)  # one thousand timing iterations
1.0727810859680176
>>> t2.timeit(1000)  # repeated concatenation: nearly twenty times slower
19.48655891418457


In Real Life, the performance hit can be substantial. Some time ago (perhaps a year?) there was a bug report that copying files over the network was *really* slow in Python. From memory, the report was that downloading a smallish file took Internet Explorer about 0.1 second, the wget utility about the same, and the Python urllib module about TEN MINUTES. To cut a long story short, it turned out that the module in question was doing repeated string concatenation.

Most users never noticed the problem because Python now has a special optimization that detects repeated concatenation and does all sorts of funky magic to make it smarter and faster. But for this one user there was some strange interaction between how Windows manages memory and the Python optimization: the magic wasn't applied, and the full inefficiency of the algorithm was revealed in all its horror.
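That optimization, for what it's worth, is a CPython implementation detail: when the interpreter sees s += t (or s = s + t) and s holds the only reference to its string, it can often resize the buffer in place instead of copying. A rough sketch of the pattern that benefits (the function is mine, purely for illustration):

>>> def build(pieces):
...     s = ''
...     for piece in pieces:
...         # CPython can often grow s in place here, because s is the
...         # only reference to its string; other implementations may
...         # copy the whole thing on every pass.
...         s += piece
...     return s
...
>>> build(['a'] * 10000) == 'a' * 10000
True

Since that help is an implementation detail rather than a language guarantee, the join idiom remains the portable way to get linear behaviour.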


Bottom line: unless you have actually timed your code and have hard measurements showing otherwise, you should always expect repeated string concatenation to be slow and the join idiom to be fast.



--
Steven

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor