Roundup Robot added the comment:
New changeset d396e0716bf4 by Serhiy Storchaka in branch 'default':
Issue #16061: Speed up str.replace() for replacing 1-character strings.
http://hg.python.org/cpython/rev/d396e0716bf4
--
nosy: +python-dev
Serhiy Storchaka added the comment:
Thanks to Ezio Melotti and Daniel Shahaf for their great help in correcting my
clumsy wording.
--
resolution: -> fixed
stage: patch review -> committed/rejected
status: open -> closed
Serhiy Storchaka added the comment:
Here is an updated patch. I added some comments (I would be grateful for help
improving them) and moved the implementation to stringlib (a new file,
Objects/stringlib/replace.h).
unicode_2.patch optimizes only an overly special case, and I
STINNER Victor added the comment:
str_replace_1char_2.patch looks good to me. Just one nit: please add a
reference to this issue in the comment (in replace.h).
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16061
Terry J. Reedy added the comment:
My experiments last September, before this was filed, showed that str.find
(index) had most of the relative slowdown of str.replace. I assumed at that
time that .replace used .find or .index to find substrings to replace, so that
the fix for .replace would
STINNER Victor added the comment:
How can we move this issue forward? I still prefer unicode_2.patch over
str_replace_1char.patch because the code is simpler and so easier to maintain.
str_replace_1char.patch has a bug: replace_1char() does not use pos for the
latin1 path.
--
Changes by Jesús Cea Avión j...@jcea.es:
--
nosy: +jcea
Serhiy Storchaka added the comment:
> str_replace_1char.patch: why not implement replace_1char_inplace() in
> stringlib, with one version per character type (UCS1, UCS2, UCS4)?
Because there is no benefit in doing so. The three versions (UCS1, UCS2, and
UCS4) share no common code. The best
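For context on the three character types: since PEP 393 (Python 3.3), CPython stores each str with 1, 2 or 4 bytes per character depending on its widest code point, which is why per-kind (UCS1/UCS2/UCS4) variants come up at all. A quick, CPython-specific way to observe this is sketched below; the helper `bytes_per_char` is our own illustration, not part of either patch.

```python
import sys


def bytes_per_char(ch: str) -> int:
    """Storage cost of one extra character of the same kind (CPython only).

    Adding one character to a string of a given kind grows the object by
    exactly the per-character width of that kind.
    """
    return sys.getsizeof(ch * 11) - sys.getsizeof(ch * 10)


for ch in ("a", "\u00e9", "\u0100", "\U0001F600"):
    print(repr(ch), bytes_per_char(ch))
```

On CPython 3.3 or later this should report 1, 1, 2 and 4: ASCII and latin1 both use the 1-byte (UCS1) kind, U+0100 forces UCS2, and a character outside the BMP forces UCS4.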
Serhiy Storchaka added the comment:
I am going to speed up other cases of replace(), but for now I have only this patch.
Is it good? Should I apply it to 3.3, since this is a 3.3 regression?
--
keywords: +3.3regression
Benjamin Peterson added the comment:
As __ap__ says, it would be nice to have a comment.
--
nosy: +benjamin.peterson
Antoine Pitrou added the comment:
64-bit Linux results:
3.2          3.3          patch  n  ch1 ch2 fill
 133 (-28%)  1343 (-93%)    96   1  'a' 'b' 'c'
 414 (-9%)    704 (-47%)   375   2  'a' 'b' 'c'
 319 (-8%)    491 (-40%)   293   3  'a' 'b' 'c'
 253 (-7%)    384 (-39%)   235   4  'a' 'b' 'c'
 216 (-8%)    320
Antoine Pitrou added the comment:
64-bit Windows results:
3.3          patched  n  ch1 ch2 fill
 925 (-90%)    97     1  'a' 'b' 'c'
 881 (-54%)   405     2  'a' 'b' 'c'
 623 (-51%)   308     3  'a' 'b' 'c'
 482 (-48%)   252     4  'a' 'b' 'c'
 396 (-44%)   223     5  'a' 'b' 'c'
 344 (-40%)   208     6  'a' 'b' 'c'
 306 (-38%)
Serhiy Storchaka added the comment:
> As __ap__ says, it would be nice to have a comment.
Oh, I thought I had already done this.
--
STINNER Victor added the comment:
str_replace_1char.patch: why not implement replace_1char_inplace() in
stringlib, with one version per character type (UCS1, UCS2, UCS4)?
I prefer the unicode_2.patch algorithm because it's simpler: only one loop (vs two
loops for str_replace_1char.patch, with
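To make the "one loop vs two loops" comparison concrete, here is a minimal Python sketch over a list of characters. It is not the C code from either patch: the function names `replace_1char_one_loop` and `replace_1char_two_phase` are hypothetical, and the assumption that the two-loop variant first locates the first occurrence (`pos`) and then rewrites from there is ours. It also shows why the rewrite phase has to honour `pos`.

```python
def replace_1char_one_loop(s: str, old: str, new: str) -> str:
    """Single pass: copy the string and rewrite every matching character."""
    buf = list(s)
    for i, ch in enumerate(buf):
        if ch == old:
            buf[i] = new
    return "".join(buf)


def replace_1char_two_phase(s: str, old: str, new: str) -> str:
    """Two phases: locate the first match, then rewrite from that position on."""
    pos = s.find(old)
    if pos < 0:
        # Nothing to replace: hand back the input unchanged, no copy needed.
        return s
    buf = list(s)
    # The rewrite loop starts at pos; starting anywhere later would
    # silently skip the first occurrence.
    for i in range(pos, len(buf)):
        if buf[i] == old:
            buf[i] = new
    return "".join(buf)
```

Both produce the same result as `str.replace` for single characters; the second one pays for an extra search but can avoid copying entirely when there is no match.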
Changes by Serhiy Storchaka storch...@gmail.com:
--
assignee: -> serhiy.storchaka
Changes by Antoine Pitrou pit...@free.fr:
--
stage: needs patch -> patch review
Serhiy Storchaka added the comment:
After much experimentation, I propose a new patch.
Benchmark results (time of replacing 1 out of every n characters (ch1 to ch2)
in a 10-char string).
Py3.2        Py3.3         patch  n  ch1 ch2 fill
 231 (-13%)  3025 (-93%)   200    1  'a' 'b' 'c'
 626 (-18%)  2035
Serhiy Storchaka added the comment:
The patch should be extended to also optimize the other Unicode kinds.
I'm working on it.
Here are the benchmark scripts I use. The first tests regular strings (replacing
every n-th char), the second tests random strings (replacing 1/n of the total randomly
distributed
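The scripts themselves are not reproduced in this thread. A minimal sketch of the same idea follows; the string length, fill character, use of `timeit.repeat`, and the helper names `regular_string`, `random_string` and `bench` are all our own assumptions, not Serhiy's scripts.

```python
import random
import timeit


def regular_string(length: int, n: int, ch1: str = "a", fill: str = "c") -> str:
    """Every n-th character is ch1; all others are the fill character."""
    return "".join(ch1 if i % n == 0 else fill for i in range(length))


def random_string(length: int, n: int, ch1: str = "a", fill: str = "c",
                  seed: int = 0) -> str:
    """length // n occurrences of ch1 placed at random positions."""
    rng = random.Random(seed)
    chars = [fill] * length
    for i in rng.sample(range(length), length // n):
        chars[i] = ch1
    return "".join(chars)


def bench(s: str, ch1: str = "a", ch2: str = "b", number: int = 100) -> float:
    """Best-of-5 time per replace() call, in microseconds."""
    times = timeit.repeat(lambda: s.replace(ch1, ch2), number=number, repeat=5)
    return min(times) / number * 1e6


if __name__ == "__main__":
    length = 10 ** 5  # assumed size, chosen only for illustration
    for n in (1, 2, 3, 4, 5, 10, 100):
        print(n, round(bench(regular_string(length, n))),
              round(bench(random_string(length, n))))
```

Separating the regular and random layouts matters because an algorithm that skips ahead with a search benefits from long gaps, which the regular layout controls precisely and the random layout does not.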
Antoine Pitrou added the comment:
The performance numbers are very nice, but the patch needs a comment about the
optimization, IMO.
--
nosy: +pitrou
Serhiy Storchaka added the comment:
> I compared the performance of the two methods: dummy loop vs find.
You can hybridize them: first just compare characters, and if they do not
match, use memcmp(). This speeds up the case of repeated characters.
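One possible reading of this hybrid, sketched in Python on a mutable list rather than in C: compare characters one by one while they keep matching, and on a mismatch jump ahead with a search instead of stepping. The comment above names memcmp(); here `list.index` stands in for a C-level fast search, and that substitution, along with the name `replace_1char_hybrid`, is our assumption. In pure Python this is not expected to be faster; it only shows the control flow.

```python
def replace_1char_hybrid(s: str, old: str, new: str) -> str:
    """Replace every `old` character with `new`, mixing a cheap per-character
    check with a jump-ahead search (a sketch of the idea, not the patch)."""
    buf = list(s)
    i, n = 0, len(buf)
    while i < n:
        if buf[i] == old:
            # Fast path for runs of repeated characters: plain comparison.
            buf[i] = new
            i += 1
        else:
            # Mismatch: skip straight to the next occurrence instead of
            # testing each intermediate character one at a time.
            try:
                i = buf.index(old, i)
            except ValueError:
                break  # no further occurrences
    return "".join(buf)
```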
--
Added file:
STINNER Victor added the comment:
> You can hybridize them: first just compare characters, and if they do not
> match, use memcmp(). This speeds up the case of repeated characters.
Oh, your patch is simple and it's amazingly fast! I compared unicode with
Python 2.7, 3.2, 3.4 and 3.4 patched, and bytes with 2.7.
STINNER Victor added the comment:
The code is now using the heavily optimized findchar() function.
I compared the performance of the two methods: dummy loop vs find. Results with a
string of 100,000 characters:
* Replace 100% (rewrite all characters): find is 12.5x slower than a loop
*
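The loop-vs-find trade-off can be explored at the Python level with a rough timing sketch. The helpers `positions_loop` and `positions_find` are hypothetical stand-ins, not the C code: the first visits every character, the second jumps between matches with `str.find`. Python-level call overhead dominates here, so the ratios it prints should not be read as reproducing the C measurements; it only illustrates why a per-match search loses when nearly every character matches and wins when matches are rare.

```python
import timeit


def positions_loop(s: str, ch: str) -> list:
    """'Dummy loop': test every character."""
    return [i for i, c in enumerate(s) if c == ch]


def positions_find(s: str, ch: str) -> list:
    """Jump from one match to the next with str.find."""
    out = []
    i = s.find(ch)
    while i >= 0:
        out.append(i)
        i = s.find(ch, i + 1)
    return out


if __name__ == "__main__":
    size = 100_000
    for every in (1, 10, 1000):  # 100%, 10% and 0.1% of characters match
        s = "".join("a" if i % every == 0 else "c" for i in range(size))
        t_loop = timeit.timeit(lambda: positions_loop(s, "a"), number=5)
        t_find = timeit.timeit(lambda: positions_find(s, "a"), number=5)
        print(f"every {every}: find/loop time ratio = {t_find / t_loop:.2f}")
```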
Changes by STINNER Victor victor.stin...@gmail.com:
--
nosy: +loewis
Changes by Kushal Das kushal...@gmail.com:
--
nosy: +kushaldas
Thomas Lee added the comment:
My results aren't quite as dramatic as yours, but there does appear to be a
regression:
$ ./python -V
Python 2.7.3+
$ ./python -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
10 loops, best of 3: 16.5 usec per loop
$ ./python -V
Python 3.3.0rc3+
$ ./python -m
STINNER Victor added the comment:
Python 3.3 is 2x faster than Python 3.2 to replace a character with
another if the string only contains the character 3 times. This is not
acceptable, Python 3.3 must be as slow as Python 3.2!
$ python3.2 -m timeit ch='é'; sp=' '*1000; s = ch+sp+ch+sp+ch;
New submission from Mark Lawrence:
Quoting Steven D'Aprano on c.l.p.
But add a call to replace, and things are very different:
[steve@ando ~]$ python2.7 -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
10 loops, best of 3: 9.3 usec per loop
[steve@ando ~]$ python3.2 -m timeit -s "s = 'b'*1000"
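The same measurement can be run from inside Python with the standard `timeit` module, which makes it easy to compare several inputs in one script. This is a sketch: the helper name `time_replace` and the extra latin1 input (`'é'`, which still uses the 1-byte kind) are our additions, and no timings are claimed here.

```python
import timeit


def time_replace(setup: str, stmt: str = "s.replace('b', 'a')",
                 number: int = 10_000) -> float:
    """Best of 3 runs, in microseconds per loop (mirrors `python -m timeit`)."""
    best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=3))
    return best / number * 1e6


if __name__ == "__main__":
    print("ascii  'b'*1000:", round(time_replace("s = 'b'*1000"), 2), "usec")
    print("latin1 'é'*1000:",
          round(time_replace("s = 'é'*1000", "s.replace('é', 'a')"), 2), "usec")
```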
Changes by Serhiy Storchaka storch...@gmail.com:
--
components: +Interpreter Core
nosy: +storchaka
Changes by Ezio Melotti ezio.melo...@gmail.com:
--
components: +Unicode
nosy: +ezio.melotti, haypo
stage: -> needs patch
versions: +Python 3.4 -Python 3.3