[issue17628] str==str: compare the first and last character before calling memcmp()

2013-04-13 Thread STINNER Victor
STINNER Victor added the comment: You must check that data is aligned. Did you run a benchmark?

[issue17628] str==str: compare the first and last character before calling memcmp()

2013-04-13 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Some crazy ideas. Try something like this:

    #define BLOCK unsigned long
    if (size >= sizeof(BLOCK)) {
        if (*(BLOCK*)data1 != *(BLOCK*)data2)
            return 0;
        return (memcmp((unsigned char*)data1 + sizeof(BLOCK),
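The block-compare idea above, combined with the alignment concern raised in reply, could be sketched roughly as follows. This is only an illustration, not the patch from the tracker: the names block_t and eq_block_prefix are hypothetical, the part after the first block comparison is an assumption (the snippet above is cut off), and memcpy() is used in place of the pointer cast so that the first block can be read without requiring aligned data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name; plays the role of BLOCK in the snippet above. */
typedef unsigned long block_t;

/* Returns 1 if the two buffers of `size` bytes are equal, 0 otherwise.
 * Assumption: the caller has already checked that both buffers have
 * the same length. */
static int eq_block_prefix(const void *data1, const void *data2, size_t size)
{
    if (size >= sizeof(block_t)) {
        block_t b1, b2;

        /* memcpy() avoids dereferencing a possibly misaligned block_t*. */
        memcpy(&b1, data1, sizeof b1);
        memcpy(&b2, data2, sizeof b2);
        if (b1 != b2)
            return 0;

        /* Compare whatever remains after the first block. */
        return memcmp((const unsigned char *)data1 + sizeof(block_t),
                      (const unsigned char *)data2 + sizeof(block_t),
                      size - sizeof(block_t)) == 0;
    }
    /* Buffers shorter than one block: fall back to memcmp(). */
    return memcmp(data1, data2, size) == 0;
}
```

Whether this is any faster than a bare memcmp() would have to be measured, which is exactly the question asked in the reply.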

[issue17628] str==str: compare the first and last character before calling memcmp()

2013-04-09 Thread Antoine Pitrou
Antoine Pitrou added the comment:

> Because most people agree that checking first and/or last
> byte/character is not a good idea (may be slower), here is a new patch
> removing code checking first/last byte or character in
> bytes_richcompare() and unicode_eq().

You misunderstood. Checking the

[issue17628] str==str: compare the first and last character before calling memcmp()

2013-04-09 Thread STINNER Victor
STINNER Victor added the comment: Because most people agree that checking first and/or last byte/character is not a good idea (may be slower), here is a new patch removing code checking first/last byte or character in bytes_richcompare() and unicode_eq(). It removes the usage of the "register"

[issue17628] str==str: compare the first and last character before calling memcmp()

2013-04-07 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Note that unicode_eq() is always called after an identity check and a hash check. I.e. the identity check in Victor's patch is redundant, and unicode_eq() is called only for strings which have the same hash. The probability to have the same first byte and be equal is a grea

[issue17628] str==str: compare the first and last character before calling memcmp()

2013-04-04 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:

On 04.04.2013 19:00, Eric Snow wrote:
>
> Eric Snow added the comment:
>
>> Marc-Andre Lemburg added the comment:
>> Same here. The heuristic may work for short strings that easily fit
>> into the CPU cache, but as soon as you use it on longer strings,
>> t

[issue17628] str==str: compare the first and last character before calling memcmp()

2013-04-04 Thread Eric Snow
Eric Snow added the comment:

> Marc-Andre Lemburg added the comment:
> Same here. The heuristic may work for short strings that easily fit
> into the CPU cache, but as soon as you use it on longer strings,
> this will result in much slower comparisons.

When testing both, would it help to test th

[issue17628] str==str: compare the first and last character before calling memcmp()

2013-04-04 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:

On 04.04.2013 11:21, STINNER Victor wrote:
>
> STINNER Victor added the comment:
>
> By the way, my initial concern was to merge unicode_compare_eq() and
> unicode_eq() functions.
>
> unicode_compare_eq() only uses memcmp(), whereas unicode_eq() checks
> i

[issue17628] str==str: compare the first and last character before calling memcmp()

2013-04-04 Thread STINNER Victor
STINNER Victor added the comment: By the way, my initial concern was to merge unicode_compare_eq() and unicode_eq() functions. unicode_compare_eq() only uses memcmp(), whereas unicode_eq() checks if the first byte is different before calling memcmp(). So the question is to decide which one is fa

[issue17628] str==str: compare the first and last character before calling memcmp()

2013-04-04 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:

On 04.04.2013 10:33, STINNER Victor wrote:
>>> I don't understand why the patch makes the comparison much slower,
>>> since most time is supposed to be spent in memcmp()?
>>
>> Because reading the last character evicts useful data from the CPU cache,
>> jus

[issue17628] str==str: compare the first and last character before calling memcmp()

2013-04-04 Thread STINNER Victor
STINNER Victor added the comment:

> In other words, I'm not convinced this is a useful heuristic.

Me neither, but we should use the same optimization strategy for all functions. If we don't compare first and/or last character for str==str, we should do the same for bytes==bytes and Py_UNICODE_MA

[issue17628] str==str: compare the first and last character before calling memcmp()

2013-04-03 Thread Antoine Pitrou
Antoine Pitrou added the comment:

> I don't understand why the patch makes the comparison much slower,
> since most time is supposed to be spent in memcmp()?

Because reading the last character evicts useful data from the CPU cache, just before memcmp() reads it again from memory? In other wor

[issue17628] str==str: compare the first and last character before calling memcmp()

2013-04-03 Thread STINNER Victor
STINNER Victor added the comment: benchmark2: Results on a slower computer. Comparing equal strings is much slower with the patch. Example:

equal, 'A', 100 | 945 us (*) | 1.25 ms (+32%)

I don't understand why the patch makes the comparison much slower, since most time is suppos

[issue17628] str==str: compare the first and last character before calling memcmp()

2013-04-03 Thread STINNER Victor
STINNER Victor added the comment: Attach the benchmark script. -- Added file: http://bugs.python.org/file29671/bench_unicode_eq.py

[issue17628] str==str: compare the first and last character before calling memcmp()

2013-04-03 Thread STINNER Victor
STINNER Victor added the comment: According to my benchmark, performance is almost the same with the patch. The major difference is when comparing two strings longer than 10 characters, of the same length, with a common prefix but a different suffix. See the attached benchmark for the result.

[issue17628] str==str: compare the first and last character before calling memcmp()

2013-04-03 Thread STINNER Victor
New submission from STINNER Victor: In Python 3.4, str==str is implemented by calling memcmp(). The unicode_eq() function, used by the dict and set types, checks the first byte before calling memcmp(). bytes==bytes uses the same check. The Py_UNICODE_MATCH macro checks the first *and* last character befor
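For readers who want to see the three strategies from this report side by side, here is a minimal sketch. It is not the CPython source: the function names eq_memcmp, eq_first_byte and eq_first_last are hypothetical, the buffers are treated as raw bytes rather than characters of a given width, and both buffers are assumed to have already been checked to be the same length.

```c
#include <stddef.h>
#include <string.h>

/* Strategy 1: call memcmp() directly, as described for str==str. */
static int eq_memcmp(const void *a, const void *b, size_t len)
{
    return memcmp(a, b, len) == 0;
}

/* Strategy 2: reject on a differing first byte before memcmp(),
 * as described for unicode_eq() and bytes==bytes. */
static int eq_first_byte(const void *a, const void *b, size_t len)
{
    const unsigned char *p = a, *q = b;

    if (len == 0)
        return 1;
    if (p[0] != q[0])
        return 0;
    return memcmp(p, q, len) == 0;
}

/* Strategy 3: reject on a differing first or last byte before
 * memcmp(), in the spirit of the Py_UNICODE_MATCH description. */
static int eq_first_last(const void *a, const void *b, size_t len)
{
    const unsigned char *p = a, *q = b;

    if (len == 0)
        return 1;
    if (p[0] != q[0] || p[len - 1] != q[len - 1])
        return 0;
    return memcmp(p, q, len) == 0;
}
```

All three return the same answer for any input; they differ only in which bytes are touched before memcmp() runs, which is what the benchmarks and cache arguments in the replies are about.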