At 05:57 AM 6/26/2008, Kent Johnson wrote:
On Thu, Jun 26, 2008 at 3:18 AM, Dick Moores <[EMAIL PROTECTED]> wrote:

> I thought I'd use this to compare the 2 ways of string concatenation. Ever
> since I began to learn Python I've been told that only one of these is the
> proper and efficient one to use, and especially so if the string to be
> stitched together is a very long one.

String concatenation was optimized in Python 2.4. You might like to
try this test in Python 2.3. See the last note here:
http://www.python.org/doc/2.4.4/whatsnew/node12.html#SECTION0001210000000000000000

Kent

Interesting.

Instead I've tried to find out if it's true what Alex Martelli writes on p. 484 in the section, "Building up a string from pieces" in his _Python in a Nutshell_, 2nd ed., which covers Python 2.4x.

======================================================================================
The single Python "anti-idiom" that's likeliest to kill your program's performance, to the point that you should _never_ use it, is to build up a large string from pieces by looping on string concatenation statements such as big_string+=piece. Python strings are immutable, so each such concatenation means that Python must free the M bytes previously allocated for big_string, and allocate and fill M+K bytes for the new version. Doing this repeatedly in a loop, you end up with roughly O(N**2) performance, where N is the total number of characters. More often than not, O(N**2) performance where O(N) is available is a performance disaster. On some platforms, things may be even bleaker due to memory fragmentation effects caused by freeing many memory areas of progressively larger sizes.

To achieve O(N) performance, accumulate intermediate pieces in a list rather than build up the string piece by piece. Lists, unlike strings, are mutable, so appending to a list has O(1) performance (amortized). Change each occurrence of big_string+=piece into temp_list.append(piece). Then, when you're done accumulating, use the following to build your desired string result in O(N) time:

   big_string = ''.join(temp_list)
======================================================================================
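Martelli's two idioms can be put side by side in a short sketch (the function and variable names here just mirror the quote; the piece contents are arbitrary):

```python
def concat_loop(pieces):
    # Anti-idiom: repeated += on an immutable string. Each step may
    # reallocate and copy the whole string so far, giving roughly
    # O(N**2) work overall (absent the CPython 2.4+ optimization).
    big_string = ''
    for piece in pieces:
        big_string += piece
    return big_string

def join_list(pieces):
    # Idiom: append to a list (amortized O(1) per append), then join
    # once at the end, which builds the result in O(N) time.
    temp_list = []
    for piece in pieces:
        temp_list.append(piece)
    return ''.join(temp_list)

pieces = ['x' * 6] * 1000
assert concat_loop(pieces) == join_list(pieces)
```

Both produce identical results; only the growth behavior differs as the pieces accumulate.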

Please see <http://py77.python.pastebin.com/f324475f5>.

It's probably obvious to you what I did. But just in case it's not: I successively modified lines 14 and 29 of the script so that the length of b varied from 6,000 to 60,000,000 characters for both ch1() and ch2(). The outputs show that the longer b (Martelli's big_string) is, the greater the performance hit taken by string concatenation (function ch2()) compared to the join idiom (function ch1()), with time ratios ranging from about 1 to about 9.
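In case the pastebin link no longer resolves, here is a minimal reconstruction of the comparison. The names ch1 and ch2 follow the post; the piece string, repetition counts, and timeit settings are my assumptions, not the original script:

```python
from timeit import timeit

def ch1(n):
    # Join idiom: accumulate pieces in a list, join once at the end.
    temp_list = []
    for _ in range(n):
        temp_list.append('abcdef')
    return ''.join(temp_list)

def ch2(n):
    # Concatenation idiom: grow the string with += in a loop.
    b = ''
    for _ in range(n):
        b += 'abcdef'
    return b

# Vary n so len(b) grows by factors of 100, as in the post (capped
# lower here to keep the run short).
for n in (1000, 100000):
    t1 = timeit(lambda: ch1(n), number=10)
    t2 = timeit(lambda: ch2(n), number=10)
    print('len=%d  join=%.4fs  concat=%.4fs  ratio=%.2f'
          % (6 * n, t1, t2, t2 / t1))
```

The absolute ratios depend heavily on interpreter version: on Python 2.3 the concatenation times should blow up quadratically, while 2.4+ narrows the gap considerably thanks to the in-place concatenation optimization Kent mentions.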

Dick Moores
Win XP Pro, Python 2.5.1

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
