[issue16427] Faster hash implementation

2013-10-28 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Since issue19183 supersedes it, yes. -- resolution: -> duplicate stage: patch review -> committed/rejected status: open -> closed superseder: -> PEP 456 Secure and interchangeable hash algorithm ___ Python tracker

[issue16427] Faster hash implementation

2013-10-28 Thread Antoine Pitrou
Antoine Pitrou added the comment: Shouldn't this issue be obsoleted? -- ___ Python tracker ___ ___ Python-bugs-list mailing list Unsub

[issue16427] Faster hash implementation

2013-10-27 Thread Christian Heimes
Changes by Christian Heimes : -- assignee: -> christian.heimes ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue16427] Faster hash implementation

2013-10-27 Thread Christian Heimes
Christian Heimes added the comment: Antoine, I have addressed your concern "Well, the quality of the hash function is clearly reduced" in patch http://hg.python.org/features/pep-456/rev/765930d944a5 >>> s = set() >>> for i in range(256): ... s.add(hash("abcdfeg" + chr(i)) & 0xff) ... >>>

[issue16427] Faster hash implementation

2013-06-05 Thread STINNER Victor
STINNER Victor added the comment: I'm not sure that micro-benchmarks are revelant for this issue. Hash collisions may have an higher cost than the gain of a faster hash function. Some people feel also concerned by the dict denial of service issue. It would be interesting to compare performance

[issue16427] Faster hash implementation

2013-06-05 Thread Lukas Lueg
Changes by Lukas Lueg : Added file: http://bugs.python.org/file30475/cityhash_fasthast3.txt ___ Python tracker ___ ___ Python-bugs-list mailin

[issue16427] Faster hash implementation

2013-06-05 Thread Lukas Lueg
Lukas Lueg added the comment: Here are more benchmarks of vanilla 3.4 vs. cityhash vs. fast_hash_3 on both arm7l and x86-64. The patch was applied varbatim, only caching disabled. On arm7l, the cpu was fixed to maximum freq (it seems to take ages to switch frequencies, at least there is a lot

[issue16427] Faster hash implementation

2013-06-02 Thread Lukas Lueg
Lukas Lueg added the comment: The 10**4-case is an error (see insane %), I've never been able to reproduce. Having done more tests with fixed cpu frequency and other daemons' process priority reduced, cityhash always comes out much slower on arm7l. -- _

[issue16427] Faster hash implementation

2013-06-02 Thread Antoine Pitrou
Antoine Pitrou added the comment: > Here are some benchmarks for a arm7l on a rk30-board. CityHash was > compiled with -mcpu=native -O3. The results look unbelievable. If you take "Length 10 ** 4", it means arm7l is able to hash 20 GB/s using the default unicode hash function. (did you disable

[issue16427] Faster hash implementation

2013-06-02 Thread Lukas Lueg
Lukas Lueg added the comment: Here are some benchmarks for a arm7l on a rk30-board. CityHash was compiled with -mcpu=native -O3. CityHash is around half as fast as the native algorithm for small strings and way, way slower on larger ones. My guess would be that the complex arithmetic in cityh

[issue16427] Faster hash implementation

2013-06-02 Thread Lukas Lueg
Lukas Lueg added the comment: It's a cache sitting between an informix db and and an internal web service. Stuff comes out of db, processed, json'ifed, cached and put on the wire. 10**6s of strings pass this process per request if uncached... I use CityHash64WithSeed, the seed being cpython's

[issue16427] Faster hash implementation

2013-06-02 Thread Antoine Pitrou
Antoine Pitrou added the comment: > I was investigating a callgrind dump of my code, showing how badly > unicode_hash() was affecting my performance. Can you tell us about your use case? There are several CityHash variants, which one did you use? CityHash64? --

[issue16427] Faster hash implementation

2013-06-02 Thread Lukas Lueg
Lukas Lueg added the comment: I was investigating a callgrind dump of my code, showing how badly unicode_hash() was affecting my performance. Using google's cityhash instead of the builtin algorithm to hash unicode objects improves overall performance by about 15 to 20 percent for my case - t

[issue16427] Faster hash implementation

2013-04-23 Thread Martin Morrison
Changes by Martin Morrison : -- nosy: +isoschiz ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.pyt

[issue16427] Faster hash implementation

2013-04-10 Thread Mark Dickinson
Mark Dickinson added the comment: > Note that the patch uses type punning through a union: while GCC allows > > this, it's not allowed by ANSI. I believe it's legal under C99 + TC3. -- ___ Python tracker

[issue16427] Faster hash implementation

2013-04-09 Thread Charles-François Natali
Charles-François Natali added the comment: > Is p guaranteed to be size_t aligned? > If not, unaligned access can segfault (e.g. on Sparc IIRC). Apparently yes. -- ___ Python tracker __

[issue16427] Faster hash implementation

2013-04-09 Thread Charles-François Natali
Charles-François Natali added the comment: > I'd expect just casting the pointer type before dereferencing: > > unsigned char *p; > ... > hash = (multiplier * hash) ^ *((Py_uhash_t *)p); > > (don't use size_t, use Py_uhash_t) Is p guaranteed to be size_t aligned? If not, unaligned access can seg

[issue16427] Faster hash implementation

2013-04-09 Thread Antoine Pitrou
Antoine Pitrou added the comment: > pybench and perf.py can be used to measure performances of the patch. > The speedup may not be detected, but a slowdown would be detected at > least. The slowdown would only occur for specific, well-chosen patterns. Also it may make DoS attacks easier. (remem

[issue16427] Faster hash implementation

2013-04-09 Thread Gregory P. Smith
Gregory P. Smith added the comment: > > Note that the patch uses type punning through a union > > What is the standard and portable way to cast an array of bytes to size_t? I'd expect just casting the pointer type before dereferencing: unsigned char *p; ... hash = (multiplier * hash) ^ *((Py_uh

[issue16427] Faster hash implementation

2013-04-09 Thread STINNER Victor
STINNER Victor added the comment: > Note that the patch uses type punning through a union What is the standard and portable way to cast an array of bytes to size_t? 2013/4/9 Charles-François Natali : > > Charles-François Natali added the comment: > > Note that the patch uses type punning throug

[issue16427] Faster hash implementation

2013-04-09 Thread Charles-François Natali
Charles-François Natali added the comment: Note that the patch uses type punning through a union: while GCC allows this, it's not allowed by ANSI (although since we're using a char [], it's somewhat a grey area). An aggresive compiler could optimiza the read/write away. -- nosy: +neolo

[issue16427] Faster hash implementation

2013-04-09 Thread STINNER Victor
STINNER Victor added the comment: > (dicts and sets have a sophisticated probing algorithm which takes into > account the whole hash value, not the masked one). Correct, so your specific example should not be a problem since the whole hash value is different for the 6 hash values. > Now to kno

[issue16427] Faster hash implementation

2013-04-09 Thread Antoine Pitrou
Antoine Pitrou added the comment: Well, the quality of the hash function is clearly reduced: >>> hash("abcdefgh") & 0xff 206 >>> hash("abcdefgi") & 0xff 206 >>> hash("abcdefgj") & 0xff 206 >>> hash("abxx") & 0xff 206 >>> hash("aa11") & 0xff 206 >>> hash("aa12") & 0xff 206 Now to kn

[issue16427] Faster hash implementation

2013-04-08 Thread Raymond Hettinger
Changes by Raymond Hettinger : -- nosy: +rhettinger ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail

[issue16427] Faster hash implementation

2013-04-08 Thread STINNER Victor
STINNER Victor added the comment: Does anyone know if fast_hash_3.patch may reduce the quality of the hash function? (May the patched hash function produce more collisions? The "Avalanche effect" thing.) -- ___ Python tracker

[issue16427] Faster hash implementation

2013-04-08 Thread STINNER Victor
Changes by STINNER Victor : Added file: http://bugs.python.org/file29743/bench_hash.py ___ Python tracker ___ ___ Python-bugs-list mailing lis

[issue16427] Faster hash implementation

2013-04-08 Thread STINNER Victor
STINNER Victor added the comment: fast_hash_3.patch is a litte bit (6%) slower for Unicode string shorter than 10 characters, but much faster for string equal or longer than 100 characters (up to 10x faster). I used the str type and disabled its cache ("_PyUnicode_HASH(self) = x;" in unicode_

[issue16427] Faster hash implementation

2013-04-08 Thread Serhiy Storchaka
Changes by Serhiy Storchaka : Removed file: http://bugs.python.org/file27947/fast_hash_2.patch ___ Python tracker ___ ___ Python-bugs-list mai

[issue16427] Faster hash implementation

2013-04-08 Thread Serhiy Storchaka
Changes by Serhiy Storchaka : Removed file: http://bugs.python.org/file27950/fast_str_hash.patch ___ Python tracker ___ ___ Python-bugs-list m

[issue16427] Faster hash implementation

2013-04-07 Thread STINNER Victor
Changes by STINNER Victor : -- nosy: +haypo ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.

[issue16427] Faster hash implementation

2012-11-15 Thread Andrew Svetlov
Changes by Andrew Svetlov : -- nosy: +asvetlov ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.pyth

[issue16427] Faster hash implementation

2012-11-11 Thread Roundup Robot
Roundup Robot added the comment: New changeset 797de1864fd9 by Antoine Pitrou in branch '3.3': Add a test for hashing of unaligned memory buffers (from issue #16427). http://hg.python.org/cpython/rev/797de1864fd9 New changeset 9cb1366b251b by Antoine Pitrou in branch 'default': Add a test for ha

[issue16427] Faster hash implementation

2012-11-11 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Stefan, thank you for the suggestion. The test showed that, in fact, at least under some x86 there is no performance decrease when using memcpy on nonaligned data. This is good news. The code can left simple and even some doubtful potential undefined beha

[issue16427] Faster hash implementation

2012-11-11 Thread Stefan Krah
Stefan Krah added the comment: > The code can be identical, but the time will differ significantly for > aligned and non-aligned data. Of course, but in most cases the data *is* aligned, so only code that does something quite special pays the performance penalty. -- ___

[issue16427] Faster hash implementation

2012-11-11 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > Unaligned accesses are not a problem on x86(-64), but they will segfault (bus error, IIRC) on other architectures such as SPARC, unfortunately. On x86(-64) this kills performance and makes the optimization be senseless. > FWIW, on x86/x64 gcc often generate

[issue16427] Faster hash implementation

2012-11-11 Thread Stefan Krah
Stefan Krah added the comment: FWIW, on x86/x64 gcc often generates identical code for x = y and memcpy(x, y, 8). See e.g. the PACK_SINGLE and UNPACK_SINGLE macros in Objects/memoryobject.c. I didn't look at the patch yet, just an idea. -- ___ Python

[issue16427] Faster hash implementation

2012-11-11 Thread Antoine Pitrou
Antoine Pitrou added the comment: > If a fast hash for bytes/memoryview is desirable, I can write a fast > robust implementation for nonaligned data. But this will be more > cumbersome and a bit slower. Unaligned accesses are not a problem on x86(-64), but they will segfault (bus error, IIRC) o

[issue16427] Faster hash implementation

2012-11-11 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Oh, I see, it's already here. -- ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscr

[issue16427] Faster hash implementation

2012-11-11 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > Here is a test case for the hash/alignment issue. I think here should be a test for a shifted data. Something like hash(b'abcd...') == hash(memoryview(b'xabcd...')[1:]). -- ___ Python tracker

[issue16427] Faster hash implementation

2012-11-11 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > The patch is too optimistic, it gives different results depending on the > alignment of the memory buffer: So this method is not applicable for a byte. Here is a patch only for strings. If a fast hash for bytes/memoryview is desirable, I can write a fast

[issue16427] Faster hash implementation

2012-11-10 Thread Antoine Pitrou
Antoine Pitrou added the comment: Here is a test case for the hash/alignment issue. -- Added file: http://bugs.python.org/file27948/test_hash_align.patch ___ Python tracker ___ _

[issue16427] Faster hash implementation

2012-11-10 Thread Antoine Pitrou
Changes by Antoine Pitrou : -- nosy: +mark.dickinson, skrah, tim_one ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscr

[issue16427] Faster hash implementation

2012-11-10 Thread Antoine Pitrou
Antoine Pitrou added the comment: The patch is too optimistic, it gives different results depending on the alignment of the memory buffer: >>> b = b"abcd"*100 >>> hash(b[1:]) 7264687738704559842 >>> hash(memoryview(b)[1:]) 9054791208347464792 >>> memoryview(b)[1:] == b[1:] True -- nosy

[issue16427] Faster hash implementation

2012-11-10 Thread Serhiy Storchaka
Changes by Serhiy Storchaka : Removed file: http://bugs.python.org/file27920/fast_hash.patch ___ Python tracker ___ ___ Python-bugs-list maili

[issue16427] Faster hash implementation

2012-11-10 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > Any reason you're using an unsigned int in your loop instead of a Py_uhash_t? In fact, there is no serious reason. This should be the type aligned as minimal alignment of void*, size_t and Py_hash_t. Since de facto Py_uhash_t is size_t, then we can use siz

[issue16427] Faster hash implementation

2012-11-10 Thread Gregory P. Smith
Gregory P. Smith added the comment: yes please! Any reason you're using an unsigned int in your loop instead of a Py_uhash_t? -- nosy: +gregory.p.smith ___ Python tracker ___ __

[issue16427] Faster hash implementation

2012-11-09 Thread Jesús Cea Avión
Changes by Jesús Cea Avión : -- nosy: +jcea ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.

[issue16427] Faster hash implementation

2012-11-07 Thread Christian Heimes
Changes by Christian Heimes : -- nosy: +christian.heimes ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: http:/

[issue16427] Faster hash implementation

2012-11-07 Thread Serhiy Storchaka
New submission from Serhiy Storchaka: In the discussion of issue14621 it was noted that much more complex hash algorithms can overtake the current one due to the fact that they process more data at a time. Here is a patch that implements this idea for the current algorithm. Also code duplica