Jordan wrote:

#Another version might look like this:

def join_strings2(string_list):
    final_string = ''
    for string in string_list:
        final_string += string
    print(final_string)
    return final_string

Please don't do that. This risks becoming slow. REALLY slow. Painfully slow. Like, 10 minutes versus 3 seconds slow. Seriously.

The reason for this is quite technical, and the reason why you might not notice is even more complicated, but the short version is this:

Never build up a long string by repeated concatenation of short strings.
Always build up a list of substrings first, then use the join method to
assemble them into one long string.


Why repeated concatenation is slow:

Suppose you want to concatenation two strings, "hello" and "world", to make a new string "helloworld". What happens?

Firstly, Python has to count the number of characters needed, which fortunately is fast in Python, so we can ignore it. In this case, we need 5+5 = 10 characters.

Secondly, Python sets aside enough memory for those 10 characters, plus a little bit of overhead: ----------

Then it copies the characters from "hello" into the new area: hello-----

followed by the characters of "world": helloworld

and now it is done. Simple, right? Concatenating two strings is pretty fast. You can't get much faster.

Ah, but what happens if you do it *repeatedly*? Suppose we have SIX strings we want to concatenate, in a loop:

words = ['hello', 'world', 'foo', 'bar', 'spam', 'ham']
result = ''
for word in words:
    result = result + word

How much work does Python have to do?

Step one: add '' + 'hello', giving result 'hello'
Python needs to copy 0+5 = 5 characters.

Step two: add 'hello' + 'world', giving result 'helloworld'
Python needs to copy 5+5 = 10 characters, as shown above.

Step three: add 'helloworld' + 'foo', giving 'helloworldfoo'
Python needs to copy 10+3 = 13 characters.

Step four: add 'helloworldfoo' + 'bar', giving 'helloworldfoobar'
Python needs to copy 13+3 = 16 characters.

Step five: add 'helloworldfoobar' + 'spam', giving 'helloworldfoobarspam'
Python needs to copy 16+4 = 20 characters.

Step six: add 'helloworldfoobarspam' + 'ham', giving 'helloworldfoobarspamham'
Python needs to copy 20+3 = 23 characters.

So in total, Python has to copy 5+10+13+16+20+23 = 87 characters, just to build up a 23 character string. And as the number of loops increases, the amount of extra work needed just keeps expanding. Even though a single string concatenation is fast, repeated concatenation is painfully SLOW.

In comparison, ''.join(words) one copies each substring once: it counts out that it needs 23 characters, allocates space for 23 characters, then copies each substring into the right place instead of making a whole lot of temporary strings and redundant copying.

So, join() is much faster than repeated concatenation. But you may never have noticed. Why not?

Well, for starters, for small enough pieces of data, everything is fast. The difference between copying 87 characters (the slow way) and 23 characters (the fast way) is trivial.

But more importantly, some years ago (Python 2.4, about 8 years ago?) the Python developers found a really neat trick that they can do to optimize string concatenation so it doesn't need to repeatedly copy characters over and over and over again. I won't go into details, but the thing is, this trick works well enough that repeated concatenation is about as fast as the join method MOST of the time.

Except when it fails. Because it is a trick, it doesn't always work. And when it does fail, your repeated string concatenation code will suddenly drop from running in 0.1 milliseconds to a full second or two; or worse, from 20 seconds to over an hour. (Potentially; the actual slow-down depends on the speed of your computer, your operating system, how much memory you have, etc.)

Because this is a cunning trick, it doesn't always work, and when it doesn't work, and you have slow code and no hint as to why.

What can cause it to fail?

- Old versions of Python, before 2.4, will be slow.

- Other implementations of Python, such as Jython and IronPython, will not have the trick, and so will be slow.

- The trick is highly-dependent on internal details of the memory management of Python and the way it interacts with the operating system. So what's fast under Linux may be slow under Windows, or the other way around.

- The trick is highly-dependent on specific circumstances to do with the substrings being added. Without going into details, if those circumstances are violated, you will have slow code.

- The trick only works when you are adding strings to the end of the new string, not if you are building it up from the beginning.


So even though your function works, you can't rely on it being fast.




--
Steven

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Reply via email to