Re: [Tutor] Joining all strings in stringList into one string

Steven D'Aprano Sat, 02 Jun 2012 07:32:48 -0700

Jordan wrote:

#Another version might look like this:


def join_strings2(string_list):
    final_string = ''
    for string in string_list:
        final_string += string
    print(final_string)
    return final_string

Please don't do that. This risks becoming slow. REALLY slow. Painfully slow. Like, 10 minutes versus 3 seconds slow. Seriously.

The reason for this is quite technical, and the reason why you might not notice is even more complicated, but the short version is this:


Never build up a long string by repeated concatenation of short strings.
Always build up a list of substrings first, then use the join method to
assemble them into one long string.
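Here is a minimal sketch of that pattern in Python 3 (the function names are made up for illustration). When the pieces already sit in a list, as in Jordan's function, a single call to ''.join is all you need; when they are produced inside a loop, append them to a list and join once at the end:

```python
def join_strings3(string_list):
    # The pieces are already in a list, so one join does all the work.
    return ''.join(string_list)


def build_report(numbers):
    # Hypothetical example: pieces generated inside a loop.
    parts = []
    for n in numbers:
        parts.append('item %d\n' % n)  # collect the small strings first...
    return ''.join(parts)              # ...then assemble them exactly once


print(join_strings3(['hello', 'world']))  # helloworld
```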


Why repeated concatenation is slow:

Suppose you want to concatenate two strings, "hello" and "world", to make a new string "helloworld". What happens?

Firstly, Python has to count the number of characters needed, which fortunately is fast in Python, so we can ignore it. In this case, we need 5+5 = 10 characters.

Secondly, Python sets aside enough memory for those 10 characters, plus a little bit of overhead:

    ----------

Then it copies the characters from "hello" into the new area:

    hello-----

followed by the characters of "world":

    helloworld

And now it is done. Simple, right? Concatenating two strings is pretty fast. You can't get much faster.
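To make the count / allocate / copy steps concrete, here is a toy model of concatenating two strings. It is only an illustration: concat_model is a hypothetical name, a Python list stands in for the "memory area", and CPython really does this in C (the overhead bytes are not modelled at all):

```python
def concat_model(a, b):
    """Toy model of a + b. Returns a snapshot of the buffer after each step."""
    # Step 1: count the characters needed.
    size = len(a) + len(b)
    # Step 2: set aside room for them; '-' marks a slot not yet filled.
    buf = ['-'] * size
    snapshots = [''.join(buf)]
    # Step 3a: copy the characters of `a` into the start of the new area.
    buf[0:len(a)] = a
    snapshots.append(''.join(buf))
    # Step 3b: copy the characters of `b` immediately after them.
    buf[len(a):] = b
    snapshots.append(''.join(buf))
    return snapshots


for state in concat_model('hello', 'world'):
    print(state)
# ----------
# hello-----
# helloworld
```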

Ah, but what happens if you do it *repeatedly*? Suppose we have SIX strings we want to concatenate, in a loop:


words = ['hello', 'world', 'foo', 'bar', 'spam', 'ham']
result = ''
for word in words:
    result = result + word

How much work does Python have to do?

Step one: add '' + 'hello', giving result 'hello'
Python needs to copy 0+5 = 5 characters.

Step two: add 'hello' + 'world', giving result 'helloworld'
Python needs to copy 5+5 = 10 characters, as shown above.

Step three: add 'helloworld' + 'foo', giving 'helloworldfoo'
Python needs to copy 10+3 = 13 characters.

Step four: add 'helloworldfoo' + 'bar', giving 'helloworldfoobar'
Python needs to copy 13+3 = 16 characters.

Step five: add 'helloworldfoobar' + 'spam', giving 'helloworldfoobarspam'
Python needs to copy 16+4 = 20 characters.

Step six: add 'helloworldfoobarspam' + 'ham', giving 'helloworldfoobarspamham'
Python needs to copy 20+3 = 23 characters.

So in total, Python has to copy 5+10+13+16+20+23 = 87 characters, just to build up a 23-character string. And as the number of loops increases, the amount of extra work needed just keeps expanding. Even though a single string concatenation is fast, repeated concatenation is painfully SLOW.
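The tally above is easy to reproduce. This sketch (hypothetical helper names) counts the characters copied by the naive loop, assuming every step really builds a brand-new string, and compares that with join. For n pieces of length k each, the loop copies k*(1 + 2 + ... + n) = k*n*(n+1)/2 characters, so the work grows roughly with the square of the number of pieces, while join copies just k*n:

```python
def chars_copied_by_concat(pieces):
    """Characters copied if each step builds a brand-new result string."""
    total = 0
    length_so_far = 0
    for piece in pieces:
        # Each step copies the whole existing result plus the new piece.
        length_so_far += len(piece)
        total += length_so_far
    return total


def chars_copied_by_join(pieces):
    """join copies every piece exactly once."""
    return sum(len(piece) for piece in pieces)


words = ['hello', 'world', 'foo', 'bar', 'spam', 'ham']
print(chars_copied_by_concat(words))  # 87
print(chars_copied_by_join(words))    # 23

# With 1000 five-character pieces the gap becomes dramatic.
many = ['hello'] * 1000
print(chars_copied_by_concat(many))   # 2502500
print(chars_copied_by_join(many))     # 5000
```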

In comparison, ''.join(words) only copies each substring once: it counts out that it needs 23 characters, allocates space for 23 characters, then copies each substring into the right place, instead of making a whole lot of temporary strings and doing redundant copying.
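A rough Python model of that strategy might look like the sketch below. join_model is a hypothetical name; the real str.join is written in C and also handles separators. The point is the shape: one pass to add up the lengths, a single allocation, then each piece copied straight into its final position:

```python
def join_model(pieces):
    """Toy model of ''.join(pieces): one allocation, one copy per piece."""
    # Pass 1: add up the lengths so the result can be allocated just once.
    size = sum(len(piece) for piece in pieces)
    buf = [''] * size  # the single allocation for the whole result
    # Pass 2: copy each piece directly into its final position.
    pos = 0
    for piece in pieces:
        buf[pos:pos + len(piece)] = piece
        pos += len(piece)
    # Turning the list of characters back into a str is only for display here.
    return ''.join(buf)


words = ['hello', 'world', 'foo', 'bar', 'spam', 'ham']
print(join_model(words))  # helloworldfoobarspamham
```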

So, join() is much faster than repeated concatenation. But you may never have noticed. Why not?

Well, for starters, for small enough pieces of data, everything is fast. The difference between copying 87 characters (the slow way) and 23 characters (the fast way) is trivial.

But more importantly, some years ago (Python 2.4, about 8 years ago?) the Python developers found a really neat trick that they can do to optimize string concatenation so it doesn't need to repeatedly copy characters over and over and over again. I won't go into details, but the thing is, this trick works well enough that repeated concatenation is about as fast as the join method MOST of the time.

Except when it fails. Because it is a trick, it doesn't always work. And when it does fail, your repeated string concatenation code will suddenly go from running in 0.1 milliseconds to taking a full second or two; or worse, from 20 seconds to over an hour. (Potentially; the actual slow-down depends on the speed of your computer, your operating system, how much memory you have, etc.)

Because this is a cunning trick, it doesn't always work, and when it doesn't work, you have slow code and no hint as to why.
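If you are curious how well the trick works on your own setup, a quick timeit comparison is one way to check. This is only a sketch: the test data and function names are invented for the example, and the numbers depend on your Python version and implementation, operating system and hardware, so no particular result is promised:

```python
import timeit

words = ['spam'] * 10000  # hypothetical test data: 10,000 short strings


def by_concat(pieces):
    result = ''
    for piece in pieces:
        result += piece  # fast only if the interpreter's trick applies
    return result


def by_join(pieces):
    return ''.join(pieces)


# Both approaches must build exactly the same string.
assert by_concat(words) == by_join(words)

# timeit.timeit accepts a zero-argument callable; each is run 20 times.
t_concat = timeit.timeit(lambda: by_concat(words), number=20)
t_join = timeit.timeit(lambda: by_join(words), number=20)
print('concat: %.4f s   join: %.4f s' % (t_concat, t_join))
```

If the two times come out close, the trick is working for you on that machine; it tells you nothing about the next machine your code runs on.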


What can cause it to fail?

- Old versions of Python, before 2.4, will be slow.

- Other implementations of Python, such as Jython and IronPython, will not have the trick, and so will be slow.

- The trick is highly dependent on internal details of the memory management of Python and the way it interacts with the operating system. So what's fast under Linux may be slow under Windows, or the other way around.

- The trick is highly dependent on specific circumstances to do with the substrings being added. Without going into details, if those circumstances are violated, you will have slow code.

- The trick only works when you are adding strings to the end of the new string, not when you are building it up by adding each new piece to the front.
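The last point is easy to trip over. This sketch (function names invented for the example) builds a string by adding each piece to the front, the case the trick does not cover according to the list above, next to an equivalent that collects the pieces and joins them in reverse order:

```python
def build_backwards_slow(pieces):
    result = ''
    for piece in pieces:
        # Each piece goes on the *front*, so every step copies the whole result.
        result = piece + result
    return result


def build_backwards_fast(pieces):
    # Same result: walk the pieces in reverse order and join them once.
    return ''.join(reversed(pieces))


print(build_backwards_slow(['a', 'bc', 'd']))  # dbca
print(build_backwards_fast(['a', 'bc', 'd']))  # dbca
```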



So even though your function works, you can't rely on it being fast.




--
Steven

_______________________________________________
Tutor maillist  -  Tutor@python.org
